Appendix E — Code Repository Guide
This guide helps you navigate the companion code repository, run examples, and adapt code for your own projects.
Repository Overview
Repository URL: https://github.com/your-org/ai-public-health-code
Structure:
ai-public-health-code/
├── README.md
├── requirements.txt
├── environment.yml
├── setup.py
├── data/
│ ├── sample/ # Sample datasets for learning
│ ├── real/ # Links to real public datasets
│ └── synthetic/ # Synthetic data generators
├── notebooks/
│ ├── chapter01/ # Jupyter notebooks by chapter
│ ├── chapter02/
│ └── ...
├── src/
│ ├── preprocessing/ # Data preprocessing utilities
│ ├── models/ # Model implementations
│ ├── evaluation/ # Evaluation metrics and tools
│ ├── fairness/ # Fairness assessment tools
│ ├── explainability/ # XAI tools (LIME, SHAP, etc.)
│ └── deployment/ # Deployment utilities
├── tests/
│ └── ... # Unit tests
├── examples/
│ ├── disease_surveillance/
│ ├── outbreak_prediction/
│ ├── diagnostic_ai/
│ └── resource_allocation/
└── docs/
├── api/ # API documentation
└── tutorials/ # Step-by-step tutorials
Getting Started
Prerequisites
Required:
- Python 3.8 or higher
- pip or conda package manager
- 8GB RAM minimum (16GB recommended)
- Jupyter Notebook or JupyterLab

Optional:
- GPU (NVIDIA with CUDA, for deep learning)
- Docker (for containerized deployment)
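A quick way to confirm the Python and memory requirements before installing (an illustrative script, not part of the repository; psutil is an optional extra here, not a listed requirement):

```python
# check_env.py -- sanity-check the prerequisites above
import sys

def check_python(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

def check_ram_gb(min_gb=8):
    """Best-effort RAM check; degrades gracefully when psutil is absent."""
    try:
        import psutil
        return psutil.virtual_memory().total / 1e9 >= min_gb
    except ImportError:
        return None  # unknown -- install psutil to enable this check

if __name__ == "__main__":
    print("Python OK:", check_python())
    print("RAM OK:", check_ram_gb())
```

If either check fails, upgrade Python or adjust expectations (smaller datasets, smaller batch sizes) before continuing.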
Installation
Option 1: Using pip (recommended for most users)
# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install package in development mode
pip install -e .
# Verify installation
python -c "import ai_public_health; print('Installation successful!')"

Option 2: Using conda (recommended for data scientists)
# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code
# Create conda environment
conda env create -f environment.yml
# Activate environment
conda activate ai-public-health
# Install package in development mode
pip install -e .
# Verify installation
python -c "import ai_public_health; print('Installation successful!')"

Option 3: Using Docker (recommended for deployment)
# Pull Docker image
docker pull your-org/ai-public-health:latest
# Run container
docker run -it -p 8888:8888 your-org/ai-public-health:latest
# Jupyter will be available at http://localhost:8888

Quick Start
# Launch Jupyter
jupyter notebook
# Navigate to notebooks/chapter01/
# Open "01_python_setup.ipynb"
# Run all cells

Repository Components
1. Data Directory
Sample Datasets
data/sample/ contains small datasets for learning:
# Load sample disease surveillance data
from ai_public_health.data import load_sample_data
# Flu surveillance (synthetic)
flu_data = load_sample_data('flu_surveillance')
print(flu_data.head())
# Output: week, state, flu_cases, population, ...
# Outbreak dataset (synthetic)
outbreak_data = load_sample_data('outbreak_simulation')
# Contains: location, day, cases, interventions, ...
# Clinical dataset (synthetic ICU data, modeled on MIMIC-III)
clinical_data = load_sample_data('clinical_demo')
# Contains: demographics, vitals, labs, outcomes

Available sample datasets:
- flu_surveillance: Weekly flu cases by state (5 years)
- outbreak_simulation: Simulated disease outbreak
- clinical_demo: ICU patient data (synthetic)
- health_equity: Synthetic data for fairness analysis
- vaccination_coverage: Vaccination rates by county
Real Public Datasets
data/real/ contains download scripts and links:
# Download real public datasets
from ai_public_health.data import download_dataset
# CDC FluView data
download_dataset('cdc_fluview', years=[2018, 2019, 2020])
# COVID-19 data from Johns Hopkins
download_dataset('jhu_covid19', start_date='2020-01-01')
# MIMIC-III (requires credentialing)
# See data/real/mimic/README.md for access instructions

Synthetic Data Generators
# Generate synthetic data for development/testing
from ai_public_health.data import generators
# Generate synthetic outbreak
outbreak = generators.generate_outbreak(
n_locations=100,
n_days=180,
outbreak_start=30,
r0=2.5,
intervention_day=60
)
# Generate synthetic clinical data
patients = generators.generate_clinical_data(
n_patients=1000,
include_outcomes=['mortality', 'los', 'readmission'],
demographic_distribution='us_census'
)
# Generate with built-in fairness issues (for testing fairness tools)
biased_data = generators.generate_clinical_data(
n_patients=1000,
inject_bias={
'race': {'effect_size': 0.3, 'type': 'outcome_disparity'},
'sex': {'effect_size': 0.15, 'type': 'feature_correlation'}
}
)

2. Notebooks
Organized by chapter, each with theory + practice:
notebooks/
├── chapter01_introduction/
│ ├── 01_python_setup.ipynb
│ ├── 02_data_loading.ipynb
│ └── 03_first_ml_model.ipynb
├── chapter02_foundations/
│ ├── 01_supervised_learning.ipynb
│ ├── 02_evaluation_metrics.ipynb
│ └── 03_crossvalidation.ipynb
├── chapter03_disease_surveillance/
│ ├── 01_time_series_basics.ipynb
│ ├── 02_outbreak_detection.ipynb
│ └── 03_forecasting_models.ipynb
...
Notebook Features: - ✅ Step-by-step explanations - ✅ Executable code cells - ✅ Visualizations - ✅ Exercises with solutions - ✅ Links to relevant textbook sections
Example Notebook Structure:
# ===============================================
# Chapter 3: Disease Surveillance
# Notebook 2: Outbreak Detection
# ===============================================
# Learning Objectives:
# 1. Implement EARS algorithm
# 2. Apply CUSUM for anomaly detection
# 3. Evaluate outbreak detection performance
# %% Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ai_public_health.surveillance import EARS, CUSUM
# %% Load Data
data = pd.read_csv('../data/sample/flu_surveillance.csv')
# %% [THEORY] Outbreak Detection Methods
"""
Outbreak detection identifies when disease cases exceed expected levels.
Common methods:
1. EARS (Early Aberration Reporting System)
2. CUSUM (Cumulative Sum Control Chart)
3. Farrington algorithm
We'll implement and compare these methods.
"""
# %% [CODE] Implement EARS
ears = EARS(window=7, threshold=3)
alerts = ears.detect(data['cases'])
# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data['week'], data['cases'], label='Cases')
plt.scatter(data['week'][alerts], data['cases'][alerts],
color='red', label='Alerts', zorder=5)
plt.xlabel('Week')
plt.ylabel('Cases')
plt.legend()
plt.title('EARS Outbreak Detection')
plt.show()
# %% [EXERCISE] Apply CUSUM
"""
EXERCISE: Implement CUSUM outbreak detection
- Use threshold h=5
- Compare sensitivity/specificity to EARS
- Plot results
SOLUTION: (click to reveal)
"""
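One possible sketch for the CUSUM exercise above (a simple one-sided CUSUM on standardized counts, shown here as a self-contained cell; the intended solution and the packaged CUSUM API may differ):

```python
import numpy as np

def cusum_detect(cases, k=0.5, h=5.0):
    """One-sided CUSUM: accumulate upward deviations beyond an allowance k
    (in standard deviations) and alert once the sum crosses threshold h."""
    cases = np.asarray(cases, dtype=float)
    z = (cases - cases.mean()) / (cases.std() + 1e-9)  # standardize counts
    s = 0.0
    alerts = np.zeros(len(cases), dtype=bool)
    for t, zt in enumerate(z):
        s = max(0.0, s + zt - k)  # reset at zero; only excesses accumulate
        alerts[t] = s > h
    return alerts

# A flat baseline followed by a surge should alert only near the end
cases = [10] * 30 + [30, 40, 50, 60, 70]
alerts = cusum_detect(cases, k=0.5, h=5.0)
print("alert indices:", np.flatnonzero(alerts))
```

Comparing sensitivity/specificity against EARS, as the exercise asks, additionally requires labeled outbreak periods.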
# %% [ADVANCED] Real-time Deployment
"""
For production deployment, see:
- src/deployment/outbreak_detector.py
- examples/disease_surveillance/real_time_system.py
"""

3. Source Code (src/)
Modular, reusable implementations:
Preprocessing
# ai_public_health/preprocessing/
from ai_public_health.preprocessing import (
clean_surveillance_data,
handle_missing_values,
engineer_features,
prepare_time_series
)
# Clean surveillance data
cleaned = clean_surveillance_data(
raw_data,
remove_outliers=True,
impute_missing=True,
standardize_location_names=True
)
# Feature engineering for disease prediction
features = engineer_features(
clinical_data,
feature_sets=['demographics', 'vitals', 'labs'],
interaction_terms=True,
temporal_features=True
)

Models
# ai_public_health/models/
from ai_public_health.models import (
OutbreakDetector,
DiseaseForecaster,
ClinicalRiskPredictor,
ResourceAllocator
)
# Example: Clinical risk prediction
risk_model = ClinicalRiskPredictor(
model_type='xgboost',
outcome='mortality',
hyperparameters={'max_depth': 5, 'n_estimators': 100}
)
risk_model.fit(X_train, y_train)
predictions = risk_model.predict_proba(X_test)
# Built-in evaluation
metrics = risk_model.evaluate(X_test, y_test)
print(f"AUC: {metrics['auc']:.3f}")

Evaluation
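The evaluation helpers below wrap standard metrics. As a reminder of what sensitivity, specificity, PPV, and NPV reduce to, here is a minimal confusion-matrix sketch (illustrative only, not the package's implementation):

```python
import numpy as np

def basic_classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)    # true positives
    tn = np.sum(~y_true & ~y_pred)  # true negatives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # a.k.a. recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # a.k.a. precision
        "npv": tn / (tn + fn),
    }

m = basic_classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(m)  # each metric is 2/3 for this toy split
```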
# ai_public_health/evaluation/
from ai_public_health.evaluation import (
compute_classification_metrics,
calibration_analysis,
bootstrap_confidence_intervals,
temporal_validation
)
# Comprehensive evaluation
evaluation = compute_classification_metrics(
y_true=y_test,
y_pred=predictions,
y_proba=probabilities,
metrics=['accuracy', 'sensitivity', 'specificity',
'ppv', 'npv', 'auc', 'brier_score']
)
# Calibration analysis
calibration = calibration_analysis(y_true, y_proba)
calibration.plot() # Calibration curve
# Bootstrap CIs
ci = bootstrap_confidence_intervals(
y_true, y_proba,
metric='auc',
n_bootstrap=1000,
confidence_level=0.95
)
print(f"AUC: {ci['estimate']:.3f} (95% CI: {ci['ci_lower']:.3f}-{ci['ci_upper']:.3f})")

Fairness
# ai_public_health/fairness/
from ai_public_health.fairness import (
FairnessAuditor,
compute_fairness_metrics,
mitigate_bias
)
# Audit model for fairness
auditor = FairnessAuditor(
model=trained_model,
sensitive_attributes=['race', 'sex', 'age_group']
)
fairness_report = auditor.audit(
X_test,
y_test,
metrics=['demographic_parity', 'equalized_odds', 'equal_opportunity']
)
# Visualize disparities
fairness_report.plot_disparities()
# Mitigation
mitigated_model = mitigate_bias(
model=trained_model,
X_train=X_train,
y_train=y_train,
sensitive_features=sensitive_features,
constraint='equalized_odds'
)

Explainability
# ai_public_health/explainability/
from ai_public_health.explainability import (
explain_prediction,
global_feature_importance,
generate_counterfactuals
)
# Explain individual prediction
explanation = explain_prediction(
model=trained_model,
instance=patient_data,
method='shap' # or 'lime', 'integrated_gradients'
)
explanation.plot()
# Global feature importance
importance = global_feature_importance(
model=trained_model,
X=X_train,
method='permutation'
)
importance.plot_top_k(k=20)
# Generate counterfactuals
counterfactuals = generate_counterfactuals(
model=trained_model,
instance=patient_data,
desired_outcome=0, # Change from positive to negative prediction
n_counterfactuals=5
)
print("To change prediction, modify:")
for cf in counterfactuals:
    print(f"  {cf['feature']}: {cf['original_value']} → {cf['counterfactual_value']}")

4. Examples
End-to-end implementations:
Disease Surveillance System
# examples/disease_surveillance/real_time_system.py
"""
Real-time disease surveillance system
Demonstrates:
- Data ingestion from multiple sources
- Outbreak detection algorithms
- Alert generation
- Dashboard visualization
"""
from ai_public_health.surveillance import SurveillanceSystem
# Initialize system
system = SurveillanceSystem(
data_sources=['cdc_fluview', 'google_trends', 'twitter'],
detection_methods=['ears', 'cusum', 'farrington'],
alert_thresholds={'ears': 3, 'cusum': 5},
notification_channels=['email', 'sms', 'dashboard']
)
# Run monitoring loop
system.start_monitoring(interval_minutes=60)
# Access real-time dashboard at http://localhost:5000

Diagnostic AI Deployment
# examples/diagnostic_ai/chest_xray_classifier.py
"""
Chest X-ray pneumonia classifier
Demonstrates:
- Model loading and inference
- Image preprocessing
- Explainable predictions
- Clinical integration
"""
from ai_public_health.diagnostic import ChestXRayClassifier
# Load pre-trained model
classifier = ChestXRayClassifier(
model_path='models/chest_xray_pneumonia.h5',
model_type='densenet121',
input_size=(224, 224)
)
# Classify X-ray
result = classifier.predict(
image_path='data/xray_001.jpg',
return_explanation=True,
return_confidence=True
)
print(f"Prediction: {result['label']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Explanation: See heatmap at {result['explanation_path']}")
# Generate radiology report
report = classifier.generate_report(result)
print(report)

Running Examples
Example 1: Flu Forecasting
From scratch in 5 minutes:
# Navigate to notebook
cd notebooks/chapter03_disease_surveillance/
# Launch Jupyter
jupyter notebook 03_forecasting_models.ipynb
# Or run as script
python ../../examples/disease_surveillance/flu_forecast.py

What it does:
1. Loads CDC FluView data
2. Trains ARIMA, LSTM, and XGBoost models
3. Generates 4-week-ahead forecasts
4. Evaluates forecast accuracy
5. Creates visualization
Output:
Flu Forecasting Results
=======================
ARIMA MAE: 2,450 cases
LSTM MAE: 2,180 cases
XGBoost MAE: 1,920 cases ← Best
Forecast for next 4 weeks:
Week 1: 15,230 cases (95% CI: 12,500-18,200)
Week 2: 17,830 cases (95% CI: 14,100-22,100)
Week 3: 19,450 cases (95% CI: 14,800-25,600)
Week 4: 18,920 cases (95% CI: 13,200-26,800)
Visualization saved to: outputs/flu_forecast.png
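Step 2 frames forecasting as supervised learning on lagged case counts. The sketch below illustrates that framing with a plain least-squares autoregression instead of XGBoost, on a synthetic trend (hypothetical helper names; the repo's forecasters add seasonality and richer features):

```python
import numpy as np

def make_lag_matrix(series, n_lags=4):
    """Build (X, y) where each row of X holds the previous n_lags values
    and y is the value that follows."""
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = np.array(series[n_lags:], dtype=float)
    return X, y

def fit_and_forecast(series, n_lags=4, horizon=4):
    """Fit a linear autoregression by least squares, then roll it forward,
    feeding each forecast back in as the newest lag."""
    X, y = make_lag_matrix(series, n_lags)
    A = np.c_[np.ones(len(X)), X]  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        x = np.r_[1.0, history[-n_lags:]]
        forecasts.append(float(x @ coef))
        history.append(forecasts[-1])
    return forecasts

# Synthetic weekly counts with a steady upward trend
cases = [1000 + 50 * week for week in range(52)]
forecasts = fit_and_forecast(cases, horizon=4)
print([round(f) for f in forecasts])  # continues the trend: ~[3600, 3650, 3700, 3750]
```

Real surveillance series also need seasonality terms and holdout evaluation (the MAE comparison above); this only shows the lag-feature mechanics.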
Example 2: Fairness Audit
# Run fairness audit on sepsis prediction model
cd examples/fairness/
python audit_sepsis_model.py \
--model models/sepsis_predictor.pkl \
--data data/test_set.csv \
--sensitive-attrs race sex age_group \
--output-dir outputs/fairness_audit/

Output:
Fairness Audit Report
====================
Overall Performance:
- AUC: 0.82
- Sensitivity: 0.78
- Specificity: 0.75
Performance by Race:
- White: AUC 0.84, Sens 0.80, Spec 0.77
- Black: AUC 0.79, Sens 0.74, Spec 0.72 ⚠
- Hispanic: AUC 0.81, Sens 0.77, Spec 0.74
- Asian: AUC 0.83, Sens 0.79, Spec 0.76
Fairness Metrics:
- Demographic Parity: 0.12 (threshold: 0.10) ⚠ FAILED
- Equalized Odds: 0.08 (threshold: 0.10) ✓ PASSED
- Equal Opportunity: 0.06 (threshold: 0.10) ✓ PASSED
Recommendations:
1. Investigate lower sensitivity for Black patients
2. Consider rebalancing training data
3. Review for proxy discrimination
4. Monitor performance over time
Full report: outputs/fairness_audit/report.html
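The demographic parity figure in the report is, at heart, the largest gap in positive-prediction rates between groups. A minimal sketch (illustrative; the packaged FairnessAuditor may compute it differently):

```python
import numpy as np

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rates across groups.

    Mirrors the 'Demographic Parity' line above: a value over the chosen
    threshold (0.10 in the report) flags a disparity.
    """
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

preds = [1, 1, 0, 1, 0, 0, 0, 1]
race = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, race))  # 0.75 - 0.25 = 0.5
```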
Example 3: Model Deployment
# Deploy model as REST API
cd examples/deployment/
# Start API server
python serve_model.py \
--model models/sepsis_predictor.pkl \
--port 5000 \
--workers 4
# API available at http://localhost:5000
# Test API
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{
"age": 65,
"heart_rate": 110,
"temperature": 38.5,
"lactate": 3.2
}'
# Response:
# {
# "sepsis_risk": 0.72,
# "confidence": "high",
# "recommendation": "Consider sepsis protocol",
# "important_factors": ["lactate", "heart_rate", "age"]
# }

Testing
Run test suite:
# Run all tests
pytest
# Run specific test module
pytest tests/test_surveillance.py
# Run with coverage
pytest --cov=ai_public_health --cov-report=html
# View coverage report
open htmlcov/index.html

Write your own tests:
# tests/test_my_model.py
import pytest
from ai_public_health.models import MyModel
def test_model_initialization():
    model = MyModel()
    assert model is not None

def test_model_training():
    model = MyModel()
    model.fit(X_train, y_train)
    assert model.is_fitted

def test_model_prediction():
    model = MyModel()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    assert len(predictions) == len(X_test)
    assert all(0 <= p <= 1 for p in predictions)

Contributing Code
We welcome contributions! Here’s how:
1. Setup Development Environment
# Fork repository on GitHub
# Clone your fork
git clone https://github.com/YOUR-USERNAME/ai-public-health-code.git
# Add upstream remote
git remote add upstream https://github.com/your-org/ai-public-health-code.git
# Create development branch
git checkout -b feature/my-new-feature
# Install development dependencies
pip install -r requirements-dev.txt

2. Code Style
We follow PEP 8 with these tools:
# Format code
black src/
# Check style
flake8 src/
# Type checking
mypy src/
# All checks
make lint

3. Documentation
def my_function(param1: int, param2: str) -> float:
    """
    Brief description of function.

    Detailed description explaining what the function does,
    edge cases, and any important considerations.

    Args:
        param1: Description of param1
        param2: Description of param2

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is negative

    Example:
        >>> result = my_function(5, "test")
        >>> print(result)
        10.0
    """
    if param1 < 0:
        raise ValueError("param1 must be non-negative")
    # Implementation
    return param1 * len(param2) / 2.0

4. Submit Pull Request
# Commit changes
git add .
git commit -m "Add feature: brief description"
# Push to your fork
git push origin feature/my-new-feature
# Open Pull Request on GitHub
# Fill out PR template
# Wait for review

Troubleshooting
Common Issues
1. Import Error: ModuleNotFoundError
# Solution: Install package in development mode
pip install -e .

2. GPU not detected
# Check CUDA installation
python -c "import torch; print(torch.cuda.is_available())"
# If False, install CUDA-enabled PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

3. Out of Memory (OOM)
# Reduce batch size
model.fit(X_train, y_train, batch_size=16) # Instead of 32 or 64
# Use gradient accumulation
model.fit(X_train, y_train,
batch_size=16,
accumulation_steps=2) # Effective batch size = 32
# Clear memory
import gc
import torch
gc.collect()
torch.cuda.empty_cache()

4. Slow Training
# Use GPU if available
model.fit(X_train, y_train, device='cuda')
# Use mixed precision
model.fit(X_train, y_train, mixed_precision=True)
# Reduce model complexity
model = MyModel(layers=[64, 32])  # Instead of [256, 128, 64]

Getting Help
- Documentation: https://ai-public-health-code.readthedocs.io
- Issues: https://github.com/your-org/ai-public-health-code/issues
- Discussions: https://github.com/your-org/ai-public-health-code/discussions
- Email: support@ai-public-health.org
Additional Resources
Tutorials
Video Tutorials: YouTube playlist with step-by-step walkthroughs
Interactive Tutorials: Google Colab notebooks (no installation required)
API Documentation
Full API docs: https://ai-public-health-code.readthedocs.io/api/
Quick Reference:
# Import main modules
from ai_public_health import (
data, # Data loading and preprocessing
models, # ML models
evaluation, # Evaluation metrics
fairness, # Fairness auditing
explainability, # XAI tools
deployment # Deployment utilities
)
# See docstrings
help(models.ClinicalRiskPredictor)

Example Datasets
All example datasets documented at: https://ai-public-health-code.readthedocs.io/datasets/
Citation
If you use this code in your research, please cite:
@software{ai_public_health_code,
title = {AI in Public Health: Code Repository},
author = {Your Name and Contributors},
year = {2024},
url = {https://github.com/your-org/ai-public-health-code},
version = {1.0.0}
}

Ready to start coding? Open notebooks/chapter01/01_python_setup.ipynb and follow along!