
The Model Card Template That Passes FDA Pre-Cert Review
The FDA Submission That Got Rejected
Startup: "We're submitting our AI diagnostic tool for FDA Pre-Cert."
FDA Reviewer: "Provide documentation: training data, model architecture, evaluation metrics, clinical validation."
Startup: "We have a white paper..."
FDA: "We need structured documentation. Model card, data card, and clinical evaluation report. Resubmit in 6 months."
The Delay: 6 months of scrambling to create documentation that should've existed from day one.
What FDA Pre-Cert Requires (The Checklist)
Three Documents:
- Model Card: What the AI does, how it was trained, limitations
- Data Card: Where training data came from, bias testing, quality control
- Clinical Evaluation Report: Real-world validation, safety monitoring
Timeline:
- Without documentation: 12-18 months to approval
- With documentation: 6-9 months
Cost Savings: roughly six months of engineering time, plus faster time to market.

The FDA-Ready Model Card Template
Section 1: Intended Use
What FDA Wants:
- Medical condition/disease targeted
- Patient population (age, sex, comorbidities)
- Clinical setting (hospital, clinic, home use)
- User (physician, nurse, patient)
Example:
INTENDED USE
Medical Condition: Type 2 Diabetes screening
Patient Population: Adults 18-75, no prior diabetes diagnosis
Clinical Setting: Primary care clinic
Primary User: Primary care physician
Decision Support: AI flags high-risk patients for lab testing (HbA1c)
What NOT to Say: "General health screening" (too vague—FDA will reject)
Section 2: Model Architecture
What FDA Wants:
- Algorithm type (e.g., "Gradient boosting classifier")
- Input features (e.g., "Age, BMI, blood pressure, family history")
- Output (e.g., "Risk score 0-100, with threshold at 70 for high-risk")
Example:
MODEL ARCHITECTURE
Algorithm: XGBoost (gradient boosting decision trees)
Version: XGBoost 1.7.0
Inputs: 12 clinical features (age, BMI, systolic BP, fasting glucose, etc.)
Output: Diabetes risk score (0-100)
Threshold: Score ≥70 = High Risk (recommend HbA1c lab test)
Why This Matters: FDA needs to understand how the AI makes decisions (interpretability requirement).
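The output contract above is simple enough to pin down in code, which makes the card and the implementation trivially comparable in an audit. A minimal sketch, assuming the card's threshold and labels (the function name and the range check are illustrative, not part of any FDA template):

```python
# Sketch of the model card's output contract: a 0-100 risk score
# mapped to two labels at a fixed threshold. Only the threshold and
# labels come from the card; everything else is illustrative.
HIGH_RISK_THRESHOLD = 70  # card: score >= 70 = High Risk

def classify_risk(score: float) -> str:
    """Map a 0-100 diabetes risk score to the card's output labels."""
    if not 0 <= score <= 100:
        raise ValueError(f"score outside documented 0-100 range: {score}")
    return "High Risk" if score >= HIGH_RISK_THRESHOLD else "Low Risk"
```

Keeping the threshold as a single named constant means a reviewer can diff the documented value against the deployed one in seconds.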
Section 3: Training Data
What FDA Wants:
- Source (where data came from)
- Volume (how many patients)
- Demographics (age, sex, race, ethnicity)
- Date range (when data was collected)
- Quality control (how you ensured data accuracy)
Example:
TRAINING DATA
Source: Electronic Health Records from [Hospital System], IRB-approved (Protocol #12345)
Volume: 50,000 patients (2018-2023)
Demographics:
- Age: Mean 52 (range 18-75), SD 14
- Sex: 52% female, 48% male
- Race: 60% White, 20% Black, 12% Hispanic, 8% Asian
- Ethnicity: 85% Non-Hispanic, 15% Hispanic
Data Quality:
- Missing data: <5% per feature (imputed using median)
- Outliers: Values >99th percentile reviewed by clinician, corrected or removed
De-Identification: HIPAA-compliant (dates shifted, names removed, rare diagnoses aggregated)
Red Flag: If demographics don't match US population, FDA will ask about bias.
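The two quality-control rules in the example (median imputation for missing values, clinician review of anything above the 99th percentile) are mechanical enough to sketch. A minimal version, assuming plain Python lists with `None` marking a missing value; the nearest-rank percentile here is a simplification:

```python
import statistics

def impute_median(values):
    """Fill missing entries (None) with the median of observed values,
    per the card's Data Quality policy."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def flag_outliers(values, pct=0.99):
    """Return indices of values above the given percentile cutoff,
    queued for clinician review (the card's outlier policy).
    Uses a simple nearest-rank percentile."""
    ordered = sorted(values)
    cutoff = ordered[min(len(ordered) - 1, int(pct * len(ordered)))]
    return [i for i, v in enumerate(values) if v > cutoff]
```

Writing the policy as code (rather than only prose) gives you something you can unit-test and cite in the quality-control section.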
Section 4: Evaluation Metrics
What FDA Wants:
- Accuracy, sensitivity, specificity (clinical gold standards)
- Performance by demographic subgroup (fairness testing)
- Comparison to human clinicians (is AI better?)
- Clinical impact (does AI improve patient outcomes?)
Example:
EVALUATION METRICS
Test Set: 10,000 patients (held out, not used in training)
Overall Performance:
- Sensitivity (Recall): 87% (95% CI: 85-89%)
- Specificity: 82% (95% CI: 80-84%)
- AUC: 0.91
Subgroup Performance (Fairness Testing):
- Female: Sensitivity 88%, Specificity 83%
- Male: Sensitivity 86%, Specificity 81%
- White: Sensitivity 89%, Specificity 84%
- Black: Sensitivity 84%, Specificity 79% (within 5pp, acceptable)
Comparison to Physician:
- Physician sensitivity: 78% (AI +9pp improvement)
- Physician specificity: 85% (AI -3pp, acceptable trade-off)
Clinical Impact:
- Early detection: AI flags 12% more high-risk patients than physician alone
- Estimated prevented complications: 200 cases/year per 10,000 patients screened
Why This Matters: FDA cares about patient outcomes, not just model accuracy.
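The headline numbers reduce to confusion-matrix arithmetic, and keeping that arithmetic next to the card lets a reviewer re-derive every figure. A sketch (the counts are illustrative, chosen to match the card's percentages):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Of patients who truly have the condition, the fraction flagged."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Of patients without the condition, the fraction correctly cleared."""
    return tn / (tn + fp)

def max_subgroup_gap_pp(metric_by_group: dict) -> float:
    """Largest spread of a metric across demographic groups, in
    percentage points -- the figure the card reports as 'within 5pp'.
    Rounded to absorb floating-point noise."""
    vals = list(metric_by_group.values())
    return round((max(vals) - min(vals)) * 100, 6)

# 87% sensitivity means e.g. 870 of 1,000 true positives flagged:
# sensitivity(870, 130) -> 0.87
```

For the card's subgroup table, `max_subgroup_gap_pp({"White": 0.89, "Female": 0.88, "Male": 0.86, "Black": 0.84})` gives exactly the 5pp spread the card calls acceptable.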
Section 5: Limitations and Warnings
What FDA Wants:
- Known failure modes (when AI is unreliable)
- Contraindications (when NOT to use AI)
- Required human oversight (physician must review)
Example:
LIMITATIONS
Known Failure Modes:
- Lower accuracy for patients with rare comorbidities (<1% of population)
- Not validated for patients under 18 or over 75
- Not validated for Type 1 Diabetes (only Type 2)
Contraindications:
- Do NOT use for patients with pre-existing diabetes diagnosis
- Do NOT use as sole diagnostic tool (lab confirmation required)
Required Human Oversight:
- Physician must review all high-risk flags before ordering lab tests
- AI is decision support, not autonomous diagnosis
- Physician retains final clinical decision authority
Why This Matters: FDA wants proof you're not overselling the AI's capabilities.
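Contraindications like these are cheapest to enforce as a hard gate in front of the model rather than as a warning label buried in documentation. A hypothetical sketch of that gate, using the validated range and contraindication from the example above:

```python
def eligible_for_screening(age: int, prior_diabetes_diagnosis: bool) -> bool:
    """Gate derived from the Limitations section: the tool is only
    validated for adults 18-75 with no pre-existing diabetes diagnosis.
    Ineligible patients are routed to standard clinical workup instead
    of the AI -- the model never scores them."""
    return 18 <= age <= 75 and not prior_diabetes_diagnosis
```

Gating upstream means the "not validated for" populations can never generate a risk score, which is a much stronger claim to make in an FDA submission than "physicians are warned."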
Section 6: Post-Market Surveillance
What FDA Wants:
- How you'll monitor AI performance in production
- What triggers a safety alert (accuracy drop, adverse events)
- How often you'll retrain/update the model
Example:
POST-MARKET SURVEILLANCE
Monitoring Plan:
- Monthly accuracy tracking on production data (random sample of 500 patients)
- Alert trigger: Sensitivity drops below 80% OR specificity drops below 75%
- Physician feedback: Track overrides, false positives, false negatives
Safety Reporting:
- Adverse events (patient harm) reported to FDA within 30 days
- Quarterly summary report to FDA (performance metrics, user feedback)
Model Updates:
- Annual retraining with new data (subject to FDA review)
- Version control: All model versions documented, old versions archived
Why This Matters: FDA Pre-Cert assumes continuous improvement (not "set it and forget it").
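The alert trigger in the monitoring plan is a two-line predicate, and encoding it exactly as the card states it keeps the surveillance pipeline auditable against the submission. A sketch (constant names are mine; the floors are the card's):

```python
SENSITIVITY_FLOOR = 0.80  # from the surveillance plan's alert trigger
SPECIFICITY_FLOOR = 0.75

def needs_safety_alert(monthly_sensitivity: float,
                       monthly_specificity: float) -> bool:
    """True if the monthly production metrics breach either floor,
    which under the plan triggers an investigation."""
    return (monthly_sensitivity < SENSITIVITY_FLOOR
            or monthly_specificity < SPECIFICITY_FLOOR)
```

Usage: the monthly job computes the two metrics on its 500-patient sample and pages the clinical safety team whenever this returns `True`.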
Real Example: Diabetic Retinopathy Detection AI
Product: AI analyzes retinal images, flags diabetic retinopathy.
FDA Submission:
Intended Use: Screen diabetic patients for retinopathy in primary care settings (not ophthalmology clinics).
Model: Convolutional neural network (ResNet-50 architecture)
Training Data: 120,000 retinal images from 5 hospital systems (2015-2020)
Evaluation:
- Sensitivity: 92% (FDA target: >85%)
- Specificity: 88%
- Comparison: Ophthalmologist sensitivity 95% (AI -3pp, acceptable for screening)
Limitations:
- Not for patients with cataracts (image quality too poor)
- Requires human ophthalmologist to confirm positive findings
Post-Market:
- Monthly monitoring: Random sample of 1,000 images re-reviewed by ophthalmologist
- Alert: If AI sensitivity drops below 88%, auto-disable pending investigation
FDA Decision: Approved (6 months from submission to clearance).
Why It Worked: Documentation was complete upfront. No back-and-forth with FDA.
The Data Card (Companion to Model Card)
What FDA Wants (separate document):
- Data provenance: IRB approval, patient consent, HIPAA compliance
- Bias testing: Performance by race, sex, age, socioeconomic status
- Data retention: How long you keep training data, why
- Data security: Encryption, access controls, audit logs
Example Snippet:
DATA CARD
Provenance:
- Source: [Hospital System] EHR database
- IRB: Approved under Protocol #12345, waiver of consent (de-identified data)
- HIPAA: Compliant (Business Associate Agreement signed)
Bias Testing:
- Racial parity: Sensitivity within 5pp across racial groups
- Gender parity: Sensitivity within 3pp (female 88%, male 86%)
- Age: Lower sensitivity for patients >70 (79% vs. 87% for 40-60 age group)
→ Mitigation: Added warning for physicians treating elderly patients
Data Retention:
- Training data: Retained for 10 years (FDA device record requirement)
- Production data: De-identified logs retained for 3 years (monitoring)
Data Security:
- Encryption: AES-256 at rest, TLS 1.3 in transit
- Access: Role-based (PM, ML engineer, clinical validator—7 people total)
- Audit logs: Reviewed quarterly by compliance team
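The bias-testing rule in the snippet above (any group trailing the best performer by more than 5 percentage points needs a documented mitigation, as happened with the >70 age group) can be sketched as a small check. A hypothetical version; the tolerance and the numbers in the tests are the card's:

```python
def parity_violations(sensitivity_by_group: dict,
                      tolerance_pp: float = 5.0) -> list:
    """Return the demographic groups whose sensitivity trails the
    best-performing group by more than `tolerance_pp` percentage
    points -- the threshold this data card treats as acceptable.
    Gaps are rounded to absorb floating-point noise."""
    best = max(sensitivity_by_group.values())
    return sorted(
        group for group, sens in sensitivity_by_group.items()
        if round((best - sens) * 100, 6) > tolerance_pp
    )
```

Run against the card's age numbers this flags the >70 group (an 8pp gap), which is exactly the case that forced the mitigation warning for elderly patients.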
Checklist: Is Your Model Card FDA-Ready?
- [ ] Intended use (specific medical condition, patient population, clinical setting)
- [ ] Model architecture (algorithm, inputs, outputs, threshold)
- [ ] Training data (source, volume, demographics, quality control)
- [ ] Evaluation metrics (sensitivity, specificity, AUC, subgroup performance)
- [ ] Comparison to human clinician (is AI better/worse?)
- [ ] Clinical impact (does AI improve patient outcomes?)
- [ ] Limitations (failure modes, contraindications, required oversight)
- [ ] Post-market surveillance (monitoring plan, safety reporting, update schedule)
If any box is unchecked, FDA will request more documentation.
Common PM Mistakes
Mistake 1: Claiming "General Purpose" AI
- Reality: FDA requires narrow, well-defined medical use cases
- Fix: Specify exact condition, population, setting (not "health screening")
Mistake 2: No Bias Testing
- Reality: FDA will reject if you haven't tested performance across demographics
- Fix: Report sensitivity/specificity by race, sex, age (minimum)
Mistake 3: No Post-Market Plan
- Reality: FDA Pre-Cert assumes you'll monitor and update the AI
- Fix: Document monitoring frequency, alert triggers, update process
Alex Welcing is a Senior AI Product Manager in New York who writes FDA-ready model cards before submitting medical device AI. His regulatory approvals take 6 months, not 18, because documentation is a product requirement from day one.