ClinArena Orientation Guide
Thank you for participating in this important research. This guide will help you understand how to complete evaluations effectively and ensure your contributions are properly tracked for co-authorship.
Getting Started
Login Options
IMPORTANT: You can log in using either your National Provider Identifier (NPI) or your email address. Both methods ensure your evaluations are tracked and counted toward co-authorship.
Option 1: NPI Login
- Navigate to https://clinarena.com/
- Select "NPI" as your verification method
- Enter your 10-digit NPI in the "National Provider Identifier" field
- Click "Verify Identity"
- Your identity will be verified against the CMS NPI Registry
Option 2: Email Login
- Navigate to https://clinarena.com/
- Select "Email" as your verification method
- Enter your email address and full name
- Click "Verify Identity"
- Your identity will be verified via email
Note: If you do not have an NPI or prefer not to log in, you can vote as a guest. However, guest votes are recorded anonymously and will not be counted toward co-authorship.
How ClinArena Works
The Evaluation Interface
For each case, you will see:
- Ground Truth Explanation: A brief patient summary at the top
- FHIR Data (expandable): Complete EHR data, including:
  - Lab results
  - Active conditions and diagnoses
  - Current medications
  - Family history
  - Vital signs
  - Other clinical data
- Two Model Responses (Model A and Model B): Anonymized AI-generated analyses displayed side-by-side
Your Task
Compare the two model responses and select which one is better. The full evaluation set contains 500 pairwise comparisons (50 cases × 10 comparisons per case); you are not expected to complete all of them.
Evaluation Criteria
When comparing two model responses, consider the following dimensions:
1. Clinical Summary Quality
- Does the response accurately summarize the patient's clinical picture?
- Are the most important findings highlighted?
- Is the summary clear and well-organized?
2. Identification of Clinically Significant Findings
- Does the response identify critical lab abnormalities (e.g., acute kidney injury, hyperkalemia, severe anemia)?
- Are dangerous drug interactions or contraindications flagged?
- Are disease progression patterns or complications recognized?
3. Actionable Clinical Insights
- Does the response provide recommendations that would be useful to a clinician?
- Are the suggested next steps appropriate and evidence-based?
- Does the response prioritize urgent issues appropriately?
4. Evidence and Citations
- Are claims supported by citations from the medical literature?
- Are the citations relevant and from reputable sources?
- Is the evidence appropriately applied to this specific patient?
5. Depth and Completeness
- Does the response address the full complexity of the patient's condition?
- Are important comorbidities and their interactions considered?
- Is the analysis thorough without being unnecessarily verbose?
Making Your Selection
After reviewing both responses, you have several options:
- Model A if the left response is superior
- Model B if the right response is superior
- Tie if both responses are truly equivalent in quality
- Skip if the case is outside your specialty or you're not comfortable evaluating it
Skip Functionality
If a case falls outside your area of expertise or you're not comfortable evaluating it, you can use the "Skip" button. This allows you to:
- Move to the next case without submitting a vote
- Focus on cases within your specialty or comfort zone
- Ensure evaluations are completed by clinicians with appropriate expertise
Skipped cases are not recorded as votes, so you can skip as many cases as needed.
Optional Feedback
When making your selection, you can optionally provide feedback about the comparison. Click the "Feedback" button in the voting interface to expand a text area where you can:
- Share your thoughts on why you chose one model over another
- Note any concerns or observations about the responses
- Provide context about your clinical reasoning
- Highlight particularly strong or weak aspects of the responses
Feedback is completely optional but helps improve the quality of our research. Your feedback will be stored with your vote and can provide valuable insights for model development.
Important Guidance on Ties
Use the "Tie" button sparingly. We recognize that in some cases, two responses may be very similar, but we encourage you to make a choice whenever possible. Ask yourself:
- If I had to choose one response to present to a colleague, which would it be?
- Which response would I trust more in a real clinical scenario?
- Even if both are good, which has a slight edge in any dimension?
Only use "Tie" if:
- Both responses are truly indistinguishable in quality
- Both responses have equivalent strengths and weaknesses
- You absolutely cannot determine a preference after careful review
In practice, ties should be rare (ideally <10% of comparisons).
Best Practices
Review the Full EHR Data
While the ground truth explanation provides a summary, expand and review the full FHIR data to understand the complete clinical picture. The model responses may reference findings that are not in the summary but are present in the detailed EHR.
Take Your Time
Each comparison should take 1-2 minutes. Don't rush—your clinical judgment is what makes this evaluation valuable.
Work at Your Own Pace
You can complete evaluations in multiple sessions. The platform will save your progress. We recommend completing at least 50-100 comparisons, but more is always better.
Stay Objective
The models are anonymized (labeled only as "Model A" and "Model B") to prevent bias. You won't know which model is which, and the same model may appear as "Model A" in one comparison and "Model B" in another.
Trust Your Clinical Judgment
You are the expert. If something in a model response seems wrong, clinically inappropriate, or potentially harmful, that should heavily influence your decision.
Skip When Appropriate
Don't hesitate to skip cases that are outside your specialty or where you don't feel comfortable making an evaluation. It's better to skip than to provide an evaluation without appropriate expertise.
Provide Feedback When Helpful
While feedback is optional, it can be valuable for understanding your reasoning and improving the models. Consider providing feedback when you notice something particularly noteworthy or when your clinical judgment differs significantly from what might be expected.
Technical Tips
If You Encounter Issues
- Comparison won't load: Refresh the page and log in again (with your NPI or email)
- Can't expand FHIR data: Try a different browser (Chrome or Firefox recommended)
- Lost your place: The platform tracks your progress automatically
Browser Recommendations
- Recommended: Chrome, Firefox, Safari (latest versions)
- Screen size: Desktop or laptop recommended for optimal viewing
Progress Tracking
The platform will show you how many comparisons you've completed. We're asking for at least 50-100 comparisons per evaluator, but you're welcome to complete as many as you'd like. More evaluations mean more robust data and a stronger paper.
Thank You!
Your participation in ClinArena is essential to advancing the safe and effective deployment of AI in clinical practice. By contributing your expertise, you're helping to:
- Validate a new synthetic dataset for the research community
- Assess the accuracy and safety of current AI systems
- Guide the development of next-generation clinical AI tools
We're grateful for your time and expertise, and we look forward to collaborating with you on this important work.