Appendix G — Vendor AI Evaluation Toolkit
This appendix provides a comprehensive, actionable framework for evaluating commercial AI products for healthcare and public health settings. Use this before purchasing or deploying any vendor AI system.
Who should use this:
- Health department leaders considering AI tools
- Hospital administrators evaluating AI vendors
- Procurement officers assessing AI products
- Public health practitioners making technology decisions
- IRB/ethics committees reviewing AI deployments

What you’ll find:
- Structured evaluation scorecards
- Red flag identification guides
- Sample questions to ask vendors
- Decision frameworks for Go/No-Go
- Procurement contract language recommendations
Introduction: Why You Need This Checklist
The AI Morgue (Appendix E) documented $100M+ in failed AI investments. Many failures were predictable. Warning signs existed but were ignored. Vendor claims went unquestioned. Due diligence was insufficient.
This toolkit helps you avoid those mistakes.
The Vendor-Buyer Information Asymmetry
Vendor knows:
- Where the model was trained
- What its limitations are
- Where external validation failed
- Which patient populations it doesn’t work for
- What the false alarm rate is in real-world use

You know:
- What the vendor tells you in sales materials
- (Often: not much more)
This checklist helps level the playing field.
Quick Reference: The 6-Domain Evaluation Framework
Use this framework to systematically evaluate any health AI vendor:
| Domain | Key Questions | Red Flags |
|---|---|---|
| 1. Technical Validation | Is there external validation? Peer-reviewed publication? | Internal validation only; no publications; vendor-funded studies |
| 2. Clinical Safety | Safety testing? Adverse events tracked? | No safety data; no mention of harms; “100% safe” claims |
| 3. Fairness & Equity | Performance across demographics? Bias audits? | No fairness testing; “we don’t use race so it’s fair” |
| 4. Privacy & Security | HIPAA compliant? BAA provided? Encryption? | Vague privacy claims; no BAA; data sent to foreign servers |
| 5. Workflow Integration | End-user testing? Training provided? Support available? | No user research; “plug and play”; minimal training |
| 6. Business Viability | Company financially stable? Other customers? Roadmap? | Startup with no revenue; no references; unclear roadmap |
Domain 1: Technical Validation 🔍
The Questions to Ask
- Training Data
- Validation Studies
- Performance Metrics
- Generalizability
Red Flags 🚩
Proceed with extreme caution if:
- ❌ “Validated on 100,000 patients” - But all from the same institution (not external validation)
- ❌ “98% accuracy” - On cherry-picked test set; no external validation
- ❌ “Deployed in 150+ hospitals” - Deployment ≠ Effectiveness; no outcome data
- ❌ “Proprietary validation” - No peer-reviewed publications; trust us
- ❌ “Can’t share validation data” - Red flag for lack of transparency
- ❌ Internal validation only - Models always perform better on development data
- ❌ Vendor-funded validation studies - Conflicts of interest
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| External Validation | None or internal only | 1-2 sites | 3+ independent sites |
| Peer-Reviewed Publication | No | Conference abstract | Full peer-reviewed paper |
| Independent Researchers | Vendor employees only | Partial independence | Fully independent validation |
| Performance Reporting | AUC only | Sensitivity/specificity | Full confusion matrix + subgroup analysis |
| Generalizability Evidence | No evidence | Similar populations | Validated on your patient population |
Scoring:
- 8-10 points: Strong validation evidence
- 5-7 points: Moderate evidence; pilot testing recommended
- 0-4 points: Insufficient evidence; do NOT deploy
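The rubric above rewards full confusion-matrix reporting rather than AUC alone. As a quick reference, here is a minimal Python sketch of the metrics you should ask a vendor to report; the counts are illustrative, not from any real vendor study:

```python
# Deriving the metrics the rubric asks for from a confusion matrix.
# The counts below are hypothetical placeholders.

def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the headline metrics a vendor should report alongside AUC."""
    return {
        "sensitivity": tp / (tp + fn),          # recall: share of true cases flagged
        "specificity": tn / (tn + fp),          # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),                  # precision: share of alerts that are real
        "false_positive_rate": fp / (fp + tn),  # share of non-cases falsely flagged
    }

m = confusion_metrics(tp=80, fp=400, fn=20, tn=9500)
print({k: round(v, 3) for k, v in m.items()})
```

Note how this hypothetical model clears 96% specificity yet only about 1 in 6 of its alerts is a true case; that gap is exactly what AUC-only reporting hides.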
Domain 2: Clinical Safety 🏥
The Questions to Ask
- Safety Testing
- Clinical Outcomes
- Alert Burden
- Human Factors
Red Flags 🚩
Proceed with extreme caution if:
- ❌ “No reported adverse events” - Likely means no monitoring system, not that it’s safe
- ❌ “Clinicians love it” - No quantitative data on alert fatigue or response rates
- ❌ “Seamless integration” - No workflow analysis or user research
- ❌ High false positive rate (>20%) - Will cause alert fatigue
- ❌ No outcome data - Only technical performance (AUC) reported
- ❌ “100% safe” - Overconfident claims; every system has failure modes
- ❌ No failure mode analysis - Every AI system can fail; what happens when it does?
Lessons from Epic Sepsis Model
The Epic sepsis model had:
- ✅ High AUC (0.76-0.83): impressive technical performance
- ❌ An 88% false positive rate in external validation
- ❌ No evidence of improved patient outcomes
- ❌ Alert fatigue that led clinicians to ignore alerts
Don’t repeat this mistake. Demand outcome data, not just AUC.
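The Epic numbers are not a fluke: when a condition is rare, even a model with respectable sensitivity and specificity produces mostly false alarms. A short sketch using Bayes’ rule, with hypothetical numbers (an assumed 80%/80% operating point and 2% prevalence), makes the arithmetic concrete:

```python
# Illustration (hypothetical numbers): why a model with a decent AUC can still
# drown clinicians in false alarms when the target condition is rare.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Plausible-looking operating point; assume sepsis in ~2% of monitored patients:
p = ppv(sensitivity=0.80, specificity=0.80, prevalence=0.02)
print(f"PPV = {p:.1%}  ->  {1 - p:.1%} of alerts are false positives")
```

At these assumed numbers the PPV is about 7.5%, meaning roughly 92% of alerts are false positives. Ask vendors for PPV at your prevalence, not just AUC.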
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| Safety Testing | None mentioned | Some testing | Comprehensive failure mode analysis |
| Outcome Evidence | None | Retrospective analysis | Prospective RCT or quasi-experimental |
| Alert Burden | Unknown | <50% false positive | <20% false positive + manageable alert rate |
| Human Factors | No testing | Limited usability testing | Comprehensive workflow analysis |
| Adverse Event Monitoring | No system | Passive reporting | Active surveillance system |
Scoring:
- 8-10 points: Strong safety evidence
- 5-7 points: Moderate evidence; close monitoring required
- 0-4 points: Insufficient safety evidence; do NOT deploy
Domain 3: Fairness & Equity ⚖️
The Questions to Ask
- Bias Testing
- Training Data Representativeness
- Proxy Variables
- Health Equity Impact
Red Flags 🚩
Proceed with extreme caution if:
- ❌ “We don’t use race as a feature, so it’s fair” - Fairness through unawareness doesn’t work; race is correlated with many other features
- ❌ “Our algorithm is objective” - Algorithms encode human choices and societal biases
- ❌ “High accuracy means fair” - Accuracy ≠ Fairness (see OPTUM case)
- ❌ No fairness testing - Bias is default; fairness must be tested, not assumed
- ❌ Using costs as proxy for health needs - See OPTUM case: costs reflect access barriers, not just health needs
- ❌ Trained on non-representative data - Homogeneous training data (academic medical centers only, commercially insured only, etc.)
Lessons from OPTUM Algorithmic Bias
The OPTUM algorithm:
- ✅ Accurately predicted healthcare costs: high technical performance
- ❌ But costs ≠ health needs, especially for Black patients
- ❌ Systematically underestimated Black patients’ health needs
- ❌ 46.5% more Black patients should have been enrolled for equitable care
Don’t repeat this mistake. Test for fairness explicitly.
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| Fairness Audit | None conducted | Internal audit | Independent external audit |
| Subgroup Analysis | No reporting | Some subgroups | Comprehensive (race, ethnicity, age, sex, SES) |
| Training Data Diversity | Homogeneous | Somewhat diverse | Highly representative of your population |
| Proxy Variable Assessment | No assessment | Acknowledged | Validated against direct outcomes |
| Equity Impact Plan | No plan | Monitoring planned | Active mitigation strategies |
Scoring:
- 8-10 points: Strong fairness evidence
- 5-7 points: Moderate evidence; continuous monitoring essential
- 0-4 points: High bias risk; do NOT deploy without mitigation
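The 10% disparity threshold used in the rubric (and again in the contract language later in this appendix) is straightforward to operationalize. A minimal sketch, with hypothetical subgroup sensitivities and placeholder group names:

```python
# Sketch (hypothetical data): flagging subgroup performance gaps larger than
# the 10% disparity threshold used in the rubric and contract language.

def max_disparity(metric_by_group: dict[str, float]) -> tuple[float, str, str]:
    """Return the largest absolute gap between any two subgroups,
    plus the worst- and best-performing group names."""
    items = sorted(metric_by_group.items(), key=lambda kv: kv[1])
    (lo_group, lo), (hi_group, hi) = items[0], items[-1]
    return hi - lo, lo_group, hi_group

# Placeholder subgroup sensitivities from a hypothetical fairness audit:
sensitivity_by_group = {"Group A": 0.81, "Group B": 0.68, "Group C": 0.79}
gap, worst, best = max_disparity(sensitivity_by_group)
if gap > 0.10:
    print(f"Disparate impact: {gap:.0%} gap between {worst} and {best}")
```

The same check should be run for every reported metric (sensitivity, PPV, false positive rate) and every demographic axis, not just one.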
Domain 4: Privacy & Security 🔒
The Questions to Ask
- Regulatory Compliance
- Data Handling
- Security Measures
- Privacy by Design
- Transparency & Accountability
Red Flags 🚩
Proceed with extreme caution if:
- ❌ Refuses to sign BAA - Non-starter for HIPAA compliance
- ❌ “We’ll sign BAA later” - Must be in place BEFORE data sharing
- ❌ Vague about data storage location - “Cloud” is not specific enough; which cloud? Which region?
- ❌ Data sent to foreign servers - Compliance and privacy risks
- ❌ “We anonymize data so HIPAA doesn’t apply” - Re-identification risk is real; HIPAA applies until data meets its Safe Harbor or Expert Determination de-identification standards
- ❌ No SOC 2 or security certification - Unvetted security practices
- ❌ “Trust us with your data” - Trust requires verification
- ❌ Unclear data retention/deletion - Your data may persist indefinitely
Lessons from DeepMind & TraceTogether
DeepMind Streams:
- ❌ Collected entire medical histories (not just kidney-related data)
- ❌ Patients not informed
- ❌ No proper legal basis
- ❌ Result: ruled unlawful by the UK ICO

TraceTogether:
- ❌ “Data only for contact tracing” → became a crime-fighting tool
- ❌ Privacy promises broken
- ❌ Trust destroyed
Lesson: Privacy promises must be legally binding and technically enforced.
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| HIPAA Compliance | No BAA or refuses | Will sign BAA | BAA + SOC 2 + HITRUST |
| Data Minimization | Collects everything | Some minimization | Strict minimization; edge deployment option |
| Security Certifications | None | SOC 2 Type I | SOC 2 Type II + penetration testing |
| Transparency | Vague policies | Clear policies | Detailed + third-party audit |
| Data Control | Vendor retains indefinitely | Retention period defined | You control data; deletion guaranteed |
Scoring:
- 8-10 points: Strong privacy & security
- 5-7 points: Moderate; additional safeguards needed
- 0-4 points: Unacceptable risk; do NOT proceed
Domain 5: Workflow Integration 🔄
The Questions to Ask
- User Research
- Implementation
- Training & Support
- Customization
- Monitoring & Feedback
Red Flags 🚩
Proceed with extreme caution if:
- ❌ “Plug and play” - Healthcare is complex; no system is truly plug-and-play
- ❌ “Works with all EHRs” - Each EHR integration is custom; this is implausible
- ❌ “No training needed” - Users always need training for clinical decision support tools
- ❌ “One-size-fits-all” - Different institutions have different workflows and patient populations
- ❌ “We can implement in 2 weeks” - Unrealistic for complex systems; implementation takes months
- ❌ No user research - Designed in isolation from actual clinical workflows
- ❌ Minimal support - Email-only support; no phone; no dedicated account manager
- ❌ Black box, no customization - Can’t adjust thresholds or workflows to fit your institution
Lessons from Google Health India
Google’s diabetic retinopathy AI:
- ✅ 96% accuracy in the lab with research-grade cameras
- ❌ But 55% of images ungradable in the field with portable cameras
- ❌ Nurses couldn’t operate the system effectively (2-hour training was inadequate)
- ❌ A 5 min/patient workflow disruption overwhelmed clinics
- ❌ No offline mode; internet connectivity required
Lesson: Lab performance ≠ Field performance. Workflow integration matters.
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| User Research | None | Some user testing | Extensive ethnographic research |
| EHR Integration | No integration or manual entry | Some EHR support | Native integration with your EHR |
| Training Program | Minimal (<2 hours) | Half-day training | Comprehensive with ongoing support |
| Customization | Black box, no customization | Limited adjustments | Highly customizable |
| Support Quality | Email only | Email + phone | Dedicated account manager + on-site support |
Scoring:
- 8-10 points: Strong workflow integration
- 5-7 points: Moderate; expect implementation challenges
- 0-4 points: High risk of failure; don’t proceed without a pilot
Domain 6: Business Viability 💼
The Questions to Ask
- Company Stability
- Customer Base
- Product Maturity
- Regulatory Status
- Pricing & Contracts
Red Flags 🚩
Proceed with extreme caution if:
- ❌ Early-stage startup, no revenue - High risk of going out of business; your investment lost
- ❌ Can’t provide customer references - No one willing to vouch for them; bad sign
- ❌ Version 1.0 product - Expect bugs and instability; you’re the beta tester
- ❌ Vague about pricing - “It depends”; no transparency; potential for unexpected costs
- ❌ Long-term contract with no exit clause - You’re locked in even if it doesn’t work
- ❌ No regulatory clearance when required - Legal risk
- ❌ Leadership with no healthcare experience - Tech team with no domain expertise
Lessons from IBM Watson for Oncology
IBM Watson:
- ✅ IBM is a massive, stable company
- ❌ But even IBM couldn’t make Watson work
- ❌ MD Anderson spent $62M before the project failed
- ❌ Hospitals that bought in early lost millions
Lesson: Big company ≠ Good product. Validation matters more than brand name.
Scoring Rubric
Rate each item (0-2 points):
| Criterion | 0 Points | 1 Point | 2 Points |
|---|---|---|---|
| Company Stability | Startup, no revenue | Funded startup or small profitable company | Established, profitable, >5 years |
| Customer Base | <5 customers or no references | 5-20 customers, some references | 20+ customers, multiple willing references |
| Product Maturity | V1.0 | V2-3 | V4+ with track record |
| Regulatory | No clearance (when required) | Clearance in progress | FDA cleared/approved |
| Pricing Transparency | Vague or hidden | Somewhat clear | Fully transparent, fair terms |
Scoring:
- 8-10 points: Financially stable, low risk
- 5-7 points: Moderate risk; negotiate favorable contract terms
- 0-4 points: High financial risk; consider waiting
Putting It All Together: The Overall Evaluation Matrix
Use this to synthesize scores across all domains:
| Domain | Weight | Your Score (0-10) | Weighted Score |
|---|---|---|---|
| 1. Technical Validation | 25% | _____ | _____ |
| 2. Clinical Safety | 25% | _____ | _____ |
| 3. Fairness & Equity | 20% | _____ | _____ |
| 4. Privacy & Security | 15% | _____ | _____ |
| 5. Workflow Integration | 10% | _____ | _____ |
| 6. Business Viability | 5% | _____ | _____ |
| TOTAL | 100% |  | _____ / 10 |
Decision Framework
Overall Score Interpretation:
- 8.0 - 10.0: Proceed with Deployment
- Strong evidence across all domains
- Still: Start with pilot in 1-2 units before hospital-wide deployment
- Monitor closely for first 6 months
- 6.0 - 7.9: Conditional Deployment with Mitigation
- Identify weak domains and create mitigation plans
- Example: Weak fairness score → Implement continuous bias monitoring
- Pilot with intensive monitoring
- Re-evaluate after 6 months
- 4.0 - 5.9: Do NOT Deploy; Negotiate Improvements
- Too many gaps in evidence
- Go back to vendor with requirements:
- “We need external validation study before we’ll consider”
- “We need fairness audit before we’ll proceed”
- Consider waiting for product maturity
- 0 - 3.9: Do NOT Deploy
- Insufficient evidence
- High risk of failure or harm
- Wait for better products or invest in developing your own
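The matrix and decision bands above can be expressed directly as code, which is handy when comparing several vendors side by side. The domain scores below are illustrative placeholders, not real evaluations:

```python
# Sketch: the weighted evaluation matrix and decision bands as code.
# Domain scores (0-10 each) are hypothetical.

WEIGHTS = {
    "technical_validation": 0.25,
    "clinical_safety": 0.25,
    "fairness_equity": 0.20,
    "privacy_security": 0.15,
    "workflow_integration": 0.10,
    "business_viability": 0.05,
}

def overall_score(domain_scores: dict[str, float]) -> float:
    """Weighted sum over the six domains (result on a 0-10 scale)."""
    assert set(domain_scores) == set(WEIGHTS), "score every domain"
    return sum(WEIGHTS[d] * s for d, s in domain_scores.items())

def decision(score: float) -> str:
    """Map an overall score to the decision bands above."""
    if score >= 8.0:
        return "Proceed with deployment (pilot first)"
    if score >= 6.0:
        return "Conditional deployment with mitigation"
    if score >= 4.0:
        return "Do NOT deploy; negotiate improvements"
    return "Do NOT deploy"

scores = {"technical_validation": 4, "clinical_safety": 3, "fairness_equity": 2,
          "privacy_security": 7, "workflow_integration": 5, "business_viability": 8}
total = overall_score(scores)
print(f"{total:.1f}/10 -> {decision(total)}")
```

Keeping the weights in one place also makes it easy to document and justify any institution-specific reweighting.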
Sample Questions for Vendor Meetings
Use these scripts to extract critical information:
Technical Validation Questions
Script: > “Can you provide the peer-reviewed publication of your external validation study? We’d like to see performance metrics at institutions not involved in development, broken down by patient demographics.”
Follow-ups if the vendor hesitates:
- “If there’s no peer-reviewed external validation, when do you plan to conduct one?”
- “Can you share the names of institutions where validation occurred so we can contact them?”
- “What was the performance at hospitals most similar to ours?”
Fairness Questions
Script: > “We’re concerned about health equity. Can you show us the fairness audit results? Specifically, performance by race, ethnicity, socioeconomic status, and insurance type?”
Follow-ups:
- “If no fairness audit has been done, why not?”
- “What is your plan for ongoing bias monitoring?”
- “What happens if we discover bias after deployment?”
Privacy Questions
Script: > “Walk us through exactly what data leaves our institution, where it goes, how it’s stored, and how we can verify this. Can we see the Business Associate Agreement and SOC 2 report?”
Follow-ups:
- “What specific PHI elements does your system need?”
- “Can the system work on-premise without sending data to the cloud?”
- “What happens to our data if we terminate the contract?”
Safety Questions
Script: > “What patient outcomes have improved at hospitals using your system? Can you provide data on mortality, length of stay, readmissions, or quality of life?”
Follow-ups if the vendor only cites AUC:
- “AUC is a technical metric. Has deployment improved patient outcomes?”
- “What is the false positive rate in real-world use?”
- “What happens when the model fails? What are the failure modes?”
Workflow Integration Questions
Script: > “Has your system been tested with nurses and physicians at institutions like ours? What did the usability testing reveal?”
Follow-ups:
- “What is the typical implementation timeline?”
- “What ongoing support do you provide?”
- “Can we speak with your customers about their implementation experience?”
Procurement Contract Language Recommendations
If you decide to proceed, include these provisions in your contract:
1. Performance Guarantees
Vendor guarantees that the AI system will achieve the following performance metrics
at [Institution Name] during the pilot period:
- AUC ≥ [threshold] (or other appropriate metric)
- False positive rate ≤ [threshold]%
- Alert burden ≤ [number] alerts per day per unit
- User satisfaction ≥ [threshold] (measured by survey)
If performance falls below these thresholds for [timeframe], [Institution] may
terminate the contract without penalty and receive full refund.
2. Fairness Requirements
Vendor warrants that the AI system has undergone bias testing and demonstrates
equitable performance across patient demographic groups (race, ethnicity, age, sex,
socioeconomic status).
Vendor will provide [Institution] with quarterly bias audit reports showing
performance metrics stratified by demographic subgroups.
If disparate impact is identified (performance difference >10% across groups),
Vendor will work with [Institution] to mitigate bias within [timeframe] or
[Institution] may terminate without penalty.
3. Data Privacy & Security
Vendor agrees to:
- Sign Business Associate Agreement (BAA) prior to any PHI access
- Store all data in HIPAA-compliant infrastructure
- Encrypt data at rest (AES-256) and in transit (TLS 1.3+)
- Provide SOC 2 Type II audit report annually
- Not use [Institution] data for Vendor's own R&D without explicit written consent
- Delete all [Institution] data within 30 days of contract termination
- Provide audit logs of all data access quarterly
4. Validation & Monitoring
[Institution] has the right to:
- Conduct independent validation studies of the AI system
- Publish validation results (positive or negative)
- Access model performance dashboards in real-time
- Receive quarterly performance reports from Vendor
Vendor will provide:
- Technical documentation for independent validation
- API access for performance monitoring
- Support for [Institution]'s evaluation efforts
5. Liability & Indemnification
Vendor agrees to indemnify [Institution] for:
- Any patient harm caused by AI system errors or failures
- Regulatory fines resulting from Vendor's non-compliance (HIPAA, etc.)
- Data breaches resulting from Vendor's security failures
Liability cap: [Amount] (no less than annual contract value x 5)
Vendor maintains professional liability insurance of at least [Amount].
6. Termination Rights
[Institution] may terminate this agreement:
- For cause (breach of contract): Immediate termination, full refund
- For convenience: 90-day notice, pro-rated refund for remaining term
- For safety concerns: Immediate termination if AI system poses patient safety risk
- For non-performance: If system fails to meet performance guarantees after [timeframe]
Upon termination:
- Vendor must delete all [Institution] data within 30 days
- Vendor must provide data export in standard format (CSV, FHIR, etc.)
- [Institution] retains all rights to its data
Pilot Implementation Plan
Even after thorough evaluation, start small:
Phase 1: Controlled Pilot (Months 1-3)
Scope:
- 1-2 clinical units (ICU, primary care clinic, etc.)
- 10-50 patients/day
- Intensive monitoring

Metrics to Track:
- Technical performance (AUC, sensitivity, specificity, PPV)
- Alert burden (alerts/day, false positive rate)
- User experience (satisfaction, time spent per alert, response rate)
- Workflow impact (time added per patient)
- Clinical outcomes (compared to baseline)
- Equity impact (outcomes by race, ethnicity, SES)

Success Criteria (Define Before the Pilot):
- Technical: AUC ≥ [threshold], FP rate ≤ [threshold]%
- User: Satisfaction ≥ [threshold]/5, response rate ≥ [threshold]%
- Outcome: [Primary outcome] improved by ≥ [threshold]% vs. baseline
- Equity: No performance disparities >10% across groups

Go/No-Go Decision:
- ✅ Proceed to Phase 2 if ALL success criteria are met
- ⚠️ Iterate/adjust if some criteria are met (address the specific gaps)
- ❌ Terminate if major criteria are not met (don’t throw good money after bad)
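The Go/No-Go logic can be captured as an explicit checklist so the decision is pre-committed rather than improvised. Every threshold, measured value, and the “iterate if only one criterion fails” rule below are hypothetical placeholders; define your own before the pilot starts:

```python
# Sketch (hypothetical thresholds and pilot results): Phase 1 Go/No-Go
# as an explicit checklist. Each entry maps a criterion name to a
# (pass_test, measured_value) pair.

CRITERIA = {
    "auc":               (lambda v: v >= 0.75, 0.78),
    "fp_rate":           (lambda v: v <= 0.20, 0.31),
    "user_satisfaction": (lambda v: v >= 4.0,  4.2),   # survey score out of 5
    "response_rate":     (lambda v: v >= 0.80, 0.85),
    "equity_gap":        (lambda v: v <= 0.10, 0.06),
}

results = {name: passes(value) for name, (passes, value) in CRITERIA.items()}
failed = [name for name, ok in results.items() if not ok]

# Assumed decision rule: all pass -> GO; one failure -> iterate; more -> NO-GO.
if not failed:
    print("GO: proceed to Phase 2")
elif len(failed) <= 1:
    print(f"ITERATE: address {failed} before expanding")
else:
    print(f"NO-GO: terminate pilot; failed criteria: {failed}")
```

Writing the criteria down as code (or simply in the pilot protocol) before results arrive prevents post-hoc goalpost moving.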
Phase 2: Expanded Pilot (Months 4-9)
Scope:
- 5-10 units
- Continued intensive monitoring

Objectives:
- Validate Phase 1 results at larger scale
- Test in diverse clinical settings
- Identify implementation challenges
- Refine workflows
Phase 3: Full Deployment (Month 10+)
Scope:
- Hospital-wide or system-wide

Requirements:
- Phase 2 demonstrated sustained benefit
- User training completed
- Ongoing monitoring system in place
- Regular re-auditing planned (quarterly)
Never skip the pilot.
Real-World Case Study: Using the Checklist
Example: Evaluating a Hypothetical Sepsis Prediction Tool
Vendor Claims:
- “AI predicts sepsis 6 hours before clinical recognition”
- “90% sensitivity, 85% specificity”
- “Deployed in 200+ hospitals”
- “$500K/year for a hospital-wide license”
Your Evaluation Using This Checklist:
Domain 1: Technical Validation (Score: 4/10)
- ✅ Published in peer-reviewed journal
- ❌ Internal validation only (same hospital group)
- ❌ No independent external validation
- ❌ 90% sensitivity/85% specificity in paper, but what about external sites?
- Red flag: “Deployed in 200+ hospitals” ≠ Evidence of effectiveness (remember the Epic sepsis model)
Domain 2: Clinical Safety (Score: 3/10)
- ✅ Safety mentioned in paper
- ❌ No prospective outcome studies
- ❌ No data on whether deployment reduced mortality
- ❌ False positive rate not reported
- ❌ Alert burden unknown
- Red flag: Only technical metrics (AUC), no patient outcomes
Domain 3: Fairness & Equity (Score: 2/10)
- ❌ No fairness audit mentioned
- ❌ No performance by race/ethnicity
- ❌ When asked, vendor says “we don’t use race as a feature, so it’s fair”
- Red flag: Fairness through unawareness (doesn’t work!)
Domain 4: Privacy & Security (Score: 7/10)
- ✅ Will sign BAA
- ✅ SOC 2 Type II certified
- ✅ Data encrypted at rest and in transit
- ⚠️ Data stored in vendor cloud (not on-premise option)
Domain 5: Workflow Integration (Score: 5/10)
- ✅ Integrates with your EHR (Epic)
- ⚠️ Implementation takes 3-6 months
- ⚠️ Training: 2-hour online module
- ❌ No customization; one-size-fits-all
Domain 6: Business Viability (Score: 8/10)
- ✅ Established company, 7 years in business
- ✅ 200 customers (they say)
- ✅ Willing to provide 2 references
- ⚠️ Pricing seems high ($500K/year)
Overall Weighted Score: 4.1 / 10
(0.25×4 + 0.25×3 + 0.20×2 + 0.15×7 + 0.10×5 + 0.05×8 = 4.1)

Decision: Do NOT Deploy; negotiate improvements
- Insufficient validation (internal only)
- No outcome evidence (AUC is not enough)
- No fairness testing (high bias risk)
- Workflow concerns (alert burden unknown)

Recommendation to Leadership:
> “We evaluated [Vendor]’s sepsis prediction tool using the AI Vendor Evaluation Framework.
> The system scores 4.1/10, below our threshold for deployment.
>
> Key concerns:
> - No external validation (validation only at the vendor’s hospital group)
> - No evidence of improved patient outcomes (only technical metrics reported)
> - No fairness audit (risk of bias similar to the Epic sepsis model)
>
> We recommend:
> 1. Request an external validation study at independent hospitals
> 2. Request a fairness audit with performance by race/ethnicity
> 3. Pilot at 2-3 similar institutions before we consider purchase
> 4. Re-evaluate in 12 months if the vendor addresses these gaps
>
> Alternative: Invest in building our own sepsis prediction model using our data,
> which would be tailored to our patient population and workflows.”
Summary: Key Principles for Vendor Evaluation
- Trust, but verify - Vendor claims mean nothing without independent validation
- External validation is non-negotiable - Internal validation always looks better than real-world performance
- Outcomes > Accuracy - AUC doesn’t save lives; improved patient outcomes do
- Fairness testing is mandatory - Bias is the default; fairness must be proven
- Start small, scale slowly - Pilot → Evaluate → Scale only if successful
- Negotiate strong contracts - Performance guarantees, termination rights, data control
- You can say no - Bad AI is worse than no AI; don’t deploy systems that aren’t ready
The most important lesson: You are not obligated to buy a product just because it has “AI” in the name. Demand evidence. Ask hard questions. Walk away if the evidence isn’t there.
Your patients deserve better than unvalidated AI systems.
Additional Resources
- Appendix E (The AI Morgue): Detailed failure case studies showing what goes wrong
- AHRQ Health IT Safety Toolkit: https://www.ahrq.gov/patient-safety/resources/hitsafety/index.html
- FDA AI/ML Medical Device Guidance: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices
- WHO Ethics & Governance of AI for Health: https://www.who.int/publications/i/item/9789240029200
Remember: The best AI system is one that improves patient outcomes, operates fairly, respects privacy, integrates into workflows, and has strong evidence supporting its use. Don’t settle for less.