Appendix C: Code Repository Guide
This guide helps you navigate the companion code repository, run examples, and adapt code for your own projects.
Repository Overview
Repository URL: https://github.com/your-org/ai-public-health-code
Structure:
ai-public-health-code/
├── README.md
├── requirements.txt
├── environment.yml
├── setup.py
├── data/
│ ├── sample/ # Sample datasets for learning
│ ├── real/ # Links to real public datasets
│ └── synthetic/ # Synthetic data generators
├── notebooks/
│ ├── chapter01/ # Jupyter notebooks by chapter
│ ├── chapter02/
│ └── ...
├── src/
│ ├── preprocessing/ # Data preprocessing utilities
│ ├── models/ # Model implementations
│ ├── evaluation/ # Evaluation metrics and tools
│ ├── fairness/ # Fairness assessment tools
│ ├── explainability/ # XAI tools (LIME, SHAP, etc.)
│ └── deployment/ # Deployment utilities
├── tests/
│ └── ... # Unit tests
├── examples/
│ ├── disease_surveillance/
│ ├── outbreak_prediction/
│ ├── diagnostic_ai/
│ └── resource_allocation/
└── docs/
  ├── api/ # API documentation
  └── tutorials/ # Step-by-step tutorials
Getting Started
Prerequisites
Required:
- Python 3.8 or higher
- pip or conda package manager
- 8GB RAM minimum (16GB recommended)
- Jupyter Notebook or JupyterLab
Optional:
- GPU (NVIDIA with CUDA for deep learning)
- Docker (for containerized deployment)
Installation
Option 1: Using pip (recommended for most users)
# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install package in development mode
pip install -e .
# Verify installation
python -c "import ai_public_health; print('Installation successful!')"
Option 2: Using conda (recommended for data scientists)
# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code
# Create conda environment
conda env create -f environment.yml
# Activate environment
conda activate ai-public-health
# Install package in development mode
pip install -e .
# Verify installation
python -c "import ai_public_health; print('Installation successful!')"
Option 3: Using Docker (recommended for deployment)
# Pull Docker image
docker pull your-org/ai-public-health:latest
# Run container
docker run -it -p 8888:8888 your-org/ai-public-health:latest
# Jupyter will be available at http://localhost:8888
Quick Start
# Launch Jupyter
jupyter notebook
# Navigate to notebooks/chapter01/
# Open "01_introduction.ipynb"
# Run all cells
Repository Components
1. Data Directory
Sample Datasets
data/sample/ contains small datasets for learning:
# Load sample disease surveillance data
from ai_public_health.data import load_sample_data

# Flu surveillance (synthetic)
flu_data = load_sample_data('flu_surveillance')
print(flu_data.head())
# Output: week, state, flu_cases, population, ...

# Outbreak dataset (synthetic)
outbreak_data = load_sample_data('outbreak_simulation')
# Contains: location, day, cases, interventions, ...

# Clinical dataset (MIMIC-III subset, de-identified)
clinical_data = load_sample_data('clinical_demo')
# Contains: demographics, vitals, labs, outcomes
Available sample datasets:
- flu_surveillance - Weekly flu cases by state (5 years)
- outbreak_simulation - Simulated disease outbreak
- clinical_demo - ICU patient data (synthetic)
- health_equity - Synthetic data for fairness analysis
- vaccination_coverage - Vaccination rates by county
Real Public Datasets
data/real/ contains download scripts and links:
# Download real public datasets
from ai_public_health.data import download_dataset
# CDC FluView data
download_dataset('cdc_fluview', years=[2018, 2019, 2020])

# COVID-19 data from Johns Hopkins
download_dataset('jhu_covid19', start_date='2020-01-01')
# MIMIC-III (requires credentialing)
# See data/real/mimic/README.md for access instructions
Synthetic Data Generators
# Generate synthetic data for development/testing
from ai_public_health.data import generators
# Generate synthetic outbreak
outbreak = generators.generate_outbreak(
    n_locations=100,
    n_days=180,
    outbreak_start=30,
    r0=2.5,
    intervention_day=60
)

# Generate synthetic clinical data
patients = generators.generate_clinical_data(
    n_patients=1000,
    include_outcomes=['mortality', 'los', 'readmission'],
    demographic_distribution='us_census'
)

# Generate with built-in fairness issues (for testing fairness tools)
biased_data = generators.generate_clinical_data(
    n_patients=1000,
    inject_bias={
        'race': {'effect_size': 0.3, 'type': 'outcome_disparity'},
        'sex': {'effect_size': 0.15, 'type': 'feature_correlation'}
    }
)
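A quick sanity check that the injected disparity shows up (illustrative only; that the generator returns a pandas DataFrame with race and mortality columns is an assumption):

# Illustrative check, assuming generate_clinical_data returns a pandas
# DataFrame containing 'race' and 'mortality' columns
print(biased_data.groupby('race')['mortality'].mean())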
2. Notebooks
Organized by chapter, each with theory + practice:
notebooks/
├── chapter01_introduction/
│ ├── 01_python_setup.ipynb
│ ├── 02_data_loading.ipynb
│ └── 03_first_ml_model.ipynb
├── chapter02_foundations/
│ ├── 01_supervised_learning.ipynb
│ ├── 02_evaluation_metrics.ipynb
│ └── 03_crossvalidation.ipynb
├── chapter03_disease_surveillance/
│ ├── 01_time_series_basics.ipynb
│ ├── 02_outbreak_detection.ipynb
│ └── 03_forecasting_models.ipynb
...
Notebook Features:
- ✅ Step-by-step explanations
- ✅ Executable code cells
- ✅ Visualizations
- ✅ Exercises with solutions
- ✅ Links to relevant textbook sections
Example Notebook Structure:
# ===============================================
# Chapter 3: Disease Surveillance
# Notebook 2: Outbreak Detection
# ===============================================
# Learning Objectives:
# 1. Implement EARS algorithm
# 2. Apply CUSUM for anomaly detection
# 3. Evaluate outbreak detection performance
# %% Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ai_public_health.surveillance import EARS, CUSUM
# %% Load Data
data = pd.read_csv('../data/sample/flu_surveillance.csv')
# %% [THEORY] Outbreak Detection Methods
"""
Outbreak detection identifies when disease cases exceed expected levels.
Common methods:
1. EARS (Early Aberration Reporting System)
2. CUSUM (Cumulative Sum Control Chart)
3. Farrington algorithm
We'll implement and compare these methods.
"""
# %% [CODE] Implement EARS
ears = EARS(window=7, threshold=3)
alerts = ears.detect(data['cases'])

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data['week'], data['cases'], label='Cases')
plt.scatter(data['week'][alerts], data['cases'][alerts],
            color='red', label='Alerts', zorder=5)
plt.xlabel('Week')
plt.ylabel('Cases')
plt.legend()
plt.title('EARS Outbreak Detection')
plt.show()
# %% [EXERCISE] Apply CUSUM
"""
EXERCISE: Implement CUSUM outbreak detection
- Use threshold h=5
- Compare sensitivity/specificity to EARS
- Plot results
SOLUTION: (click to reveal)
"""
# %% [ADVANCED] Real-time Deployment
"""
For production deployment, see:
- src/deployment/outbreak_detector.py
- examples/disease_surveillance/real_time_system.py
"""
3. Source Code (src/)
Modular, reusable implementations:
Preprocessing
# ai_public_health/preprocessing/
from ai_public_health.preprocessing import (
    clean_surveillance_data,
    handle_missing_values,
    engineer_features,
    prepare_time_series
)

# Clean surveillance data
cleaned = clean_surveillance_data(
    raw_data,
    remove_outliers=True,
    impute_missing=True,
    standardize_location_names=True
)

# Feature engineering for disease prediction
features = engineer_features(
    clinical_data,
    feature_sets=['demographics', 'vitals', 'labs'],
    interaction_terms=True,
    temporal_features=True
)
Models
# ai_public_health/models/
from ai_public_health.models import (
    OutbreakDetector,
    DiseaseForecaster,
    ClinicalRiskPredictor,
    ResourceAllocator
)

# Example: Clinical risk prediction
risk_model = ClinicalRiskPredictor(
    model_type='xgboost',
    outcome='mortality',
    hyperparameters={'max_depth': 5, 'n_estimators': 100}
)
risk_model.fit(X_train, y_train)
predictions = risk_model.predict_proba(X_test)

# Built-in evaluation
metrics = risk_model.evaluate(X_test, y_test)
print(f"AUC: {metrics['auc']:.3f}")
Evaluation
# ai_public_health/evaluation/
from ai_public_health.evaluation import (
    compute_classification_metrics,
    calibration_analysis,
    bootstrap_confidence_intervals,
    temporal_validation
)

# Comprehensive evaluation
evaluation = compute_classification_metrics(
    y_true=y_test,
    y_pred=predictions,
    y_proba=probabilities,
    metrics=['accuracy', 'sensitivity', 'specificity',
             'ppv', 'npv', 'auc', 'brier_score']
)

# Calibration analysis
calibration = calibration_analysis(y_true, y_proba)
# Plot calibration curve
calibration.plot()

# Bootstrap CIs
ci = bootstrap_confidence_intervals(
    y_true, y_proba,
    metric='auc',
    n_bootstrap=1000,
    confidence_level=0.95
)
print(f"AUC: {ci['estimate']:.3f} (95% CI: {ci['ci_lower']:.3f}-{ci['ci_upper']:.3f})")
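For intuition, bootstrap_confidence_intervals presumably follows the standard percentile bootstrap; here is a minimal sketch of that procedure using scikit-learn's roc_auc_score (the package's actual implementation may differ in details such as resample handling):

import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_proba, n_bootstrap=1000,
                     confidence_level=0.95, seed=0):
    # Percentile bootstrap: resample cases with replacement, recompute AUC
    rng = np.random.default_rng(seed)
    y_true, y_proba = np.asarray(y_true), np.asarray(y_proba)
    n = len(y_true)
    stats = []
    while len(stats) < n_bootstrap:
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample needs both classes for AUC
        stats.append(roc_auc_score(y_true[idx], y_proba[idx]))
    alpha = (1 - confidence_level) / 2
    lo, hi = np.quantile(stats, [alpha, 1 - alpha])
    return {'estimate': roc_auc_score(y_true, y_proba),
            'ci_lower': lo, 'ci_upper': hi}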
Fairness
# ai_public_health/fairness/
from ai_public_health.fairness import (
    FairnessAuditor,
    compute_fairness_metrics,
    mitigate_bias
)

# Audit model for fairness
auditor = FairnessAuditor(
    model=trained_model,
    sensitive_attributes=['race', 'sex', 'age_group']
)

fairness_report = auditor.audit(
    X_test,
    y_test,
    metrics=['demographic_parity', 'equalized_odds', 'equal_opportunity']
)

# Visualize disparities
fairness_report.plot_disparities()

# Mitigation
mitigated_model = mitigate_bias(
    model=trained_model,
    X_train=X_train,
    y_train=y_train,
    sensitive_features=sensitive_features,
    constraint='equalized_odds'
)
Explainability
# ai_public_health/explainability/
from ai_public_health.explainability import (
    explain_prediction,
    global_feature_importance,
    generate_counterfactuals
)

# Explain individual prediction
explanation = explain_prediction(
    model=trained_model,
    instance=patient_data,
    method='shap'  # or 'lime', 'integrated_gradients'
)
explanation.plot()

# Global feature importance
importance = global_feature_importance(
    model=trained_model,
    X=X_train,
    method='permutation'
)
importance.plot_top_k(k=20)

# Generate counterfactuals
counterfactuals = generate_counterfactuals(
    model=trained_model,
    instance=patient_data,
    desired_outcome=0,  # Change from positive to negative prediction
    n_counterfactuals=5
)
print("To change prediction, modify:")
for cf in counterfactuals:
    print(f"  {cf['feature']}: {cf['original_value']} → {cf['counterfactual_value']}")
4. Examples
End-to-end implementations:
Disease Surveillance System
# examples/disease_surveillance/real_time_system.py
"""
Real-time disease surveillance system
Demonstrates:
- Data ingestion from multiple sources
- Outbreak detection algorithms
- Alert generation
- Dashboard visualization
"""
from ai_public_health.surveillance import SurveillanceSystem
# Initialize system
system = SurveillanceSystem(
    data_sources=['cdc_fluview', 'google_trends', 'twitter'],
    detection_methods=['ears', 'cusum', 'farrington'],
    alert_thresholds={'ears': 3, 'cusum': 5},
    notification_channels=['email', 'sms', 'dashboard']
)

# Run monitoring loop
system.start_monitoring(interval_minutes=60)

# Access real-time dashboard at http://localhost:5000
Diagnostic AI Deployment
# examples/diagnostic_ai/chest_xray_classifier.py
"""
Chest X-ray pneumonia classifier
Demonstrates:
- Model loading and inference
- Image preprocessing
- Explainable predictions
- Clinical integration
"""
from ai_public_health.diagnostic import ChestXRayClassifier
# Load pre-trained model
classifier = ChestXRayClassifier(
    model_path='models/chest_xray_pneumonia.h5',
    model_type='densenet121',
    input_size=(224, 224)
)

# Classify X-ray
result = classifier.predict(
    image_path='data/xray_001.jpg',
    return_explanation=True,
    return_confidence=True
)

print(f"Prediction: {result['label']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Explanation: See heatmap at {result['explanation_path']}")

# Generate radiology report
report = classifier.generate_report(result)
print(report)
Running Examples
Example 1: Flu Forecasting
From scratch in 5 minutes:
# Navigate to notebook
cd notebooks/chapter03_disease_surveillance/
# Launch Jupyter
jupyter notebook 03_forecasting_models.ipynb
# Or run as script
python ../../examples/disease_surveillance/flu_forecast.py
What it does:
1. Loads CDC FluView data
2. Trains ARIMA, LSTM, and XGBoost models
3. Generates 4-week-ahead forecasts
4. Evaluates forecast accuracy
5. Creates a visualization
Output:
Flu Forecasting Results
=======================
ARIMA MAE: 2,450 cases
LSTM MAE: 2,180 cases
XGBoost MAE: 1,920 cases ← Best
Forecast for next 4 weeks:
Week 1: 15,230 cases (95% CI: 12,500-18,200)
Week 2: 17,830 cases (95% CI: 14,100-22,100)
Week 3: 19,450 cases (95% CI: 14,800-25,600)
Week 4: 18,920 cases (95% CI: 13,200-26,800)
Visualization saved to: outputs/flu_forecast.png
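The MAE figures above come from the script itself; for reference, here is a minimal, self-contained sketch of how 4-week-ahead forecasts are scored with MAE, using synthetic data and a naive carry-forward baseline in place of the trained models:

import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error between observed and forecast counts
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

rng = np.random.default_rng(0)
cases = rng.poisson(15000, size=104)  # two years of synthetic weekly counts

horizon = 4
preds = cases[:-horizon]   # naive: forecast week t+4 as the value at week t
actuals = cases[horizon:]
print(f"Naive 4-week-ahead MAE: {mae(actuals, preds):,.0f} cases")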
Example 2: Fairness Audit
# Run fairness audit on sepsis prediction model
cd examples/fairness/
python audit_sepsis_model.py \
--model models/sepsis_predictor.pkl \
--data data/test_set.csv \
--sensitive-attrs race sex age_group \
--output-dir outputs/fairness_audit/
Output:
Fairness Audit Report
====================
Overall Performance:
- AUC: 0.82
- Sensitivity: 0.78
- Specificity: 0.75
Performance by Race:
- White: AUC 0.84, Sens 0.80, Spec 0.77
- Black: AUC 0.79, Sens 0.74, Spec 0.72 ⚠
- Hispanic: AUC 0.81, Sens 0.77, Spec 0.74
- Asian: AUC 0.83, Sens 0.79, Spec 0.76
Fairness Metrics:
- Demographic Parity: 0.12 (threshold: 0.10) ⚠ FAILED
- Equalized Odds: 0.08 (threshold: 0.10) ✓ PASSED
- Equal Opportunity: 0.06 (threshold: 0.10) ✓ PASSED
Recommendations:
1. Investigate lower sensitivity for Black patients
2. Consider rebalancing training data
3. Review for proxy discrimination
4. Monitor performance over time
Full report: outputs/fairness_audit/report.html
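For intuition, the demographic parity value in the report is typically the largest gap in positive-prediction rates across groups; a minimal sketch of that computation (the package may define the metric slightly differently):

import numpy as np

def demographic_parity_difference(y_pred, groups):
    # Largest gap in positive prediction rate between any two groups
    y_pred, groups = np.asarray(y_pred), np.asarray(groups)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)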
Example 3: Model Deployment
# Deploy model as REST API
cd examples/deployment/
# Start API server
python serve_model.py \
--model models/sepsis_predictor.pkl \
--port 5000 \
--workers 4
# API available at http://localhost:5000
# Test API
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{
"age": 65,
"heart_rate": 110,
"temperature": 38.5,
"lactate": 3.2
}'
# Response:
# {
# "sepsis_risk": 0.72,
# "confidence": "high",
# "recommendation": "Consider sepsis protocol",
# "important_factors": ["lactate", "heart_rate", "age"]
# }
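serve_model.py itself lives in the repository; for orientation, a minimal Flask endpoint in the same spirit might look like the sketch below. The pickle path, feature order, and response fields are assumptions for illustration; the real script adds multi-worker serving, input validation, and explanations:

import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

with open('models/sepsis_predictor.pkl', 'rb') as f:  # assumed path
    model = pickle.load(f)

FEATURES = ['age', 'heart_rate', 'temperature', 'lactate']  # assumed order

@app.route('/predict', methods=['POST'])
def predict():
    payload = request.get_json()
    x = [[payload[name] for name in FEATURES]]
    risk = float(model.predict_proba(x)[0][1])  # probability of sepsis
    return jsonify({'sepsis_risk': round(risk, 2)})

if __name__ == '__main__':
    app.run(port=5000)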
Testing
Run test suite:
# Run all tests
pytest
# Run specific test module
pytest tests/test_surveillance.py
# Run with coverage
pytest --cov=ai_public_health --cov-report=html
# View coverage report
open htmlcov/index.html
Write your own tests:
# tests/test_my_model.py
import pytest
from ai_public_health.models import MyModel
# X_train/y_train/X_test are assumed to come from shared test setup
# (see the sketch after this block)

def test_model_initialization():
    model = MyModel()
    assert model is not None

def test_model_training():
    model = MyModel()
    model.fit(X_train, y_train)
    assert model.is_fitted

def test_model_prediction():
    model = MyModel()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    assert len(predictions) == len(X_test)
    assert all(0 <= p <= 1 for p in predictions)
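One simple way to make these tests self-contained is to build small synthetic splits at the top of the module (the shapes and names here are illustrative, not the repository's actual fixtures):

import numpy as np

# Illustrative module-level test data; a real suite might use pytest fixtures
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = rng.integers(0, 2, size=120)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]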
Contributing Code
We welcome contributions! Here’s how:
1. Setup Development Environment
# Fork repository on GitHub
# Clone your fork
git clone https://github.com/YOUR-USERNAME/ai-public-health-code.git
# Add upstream remote
git remote add upstream https://github.com/your-org/ai-public-health-code.git
# Create development branch
git checkout -b feature/my-new-feature
# Install development dependencies
pip install -r requirements-dev.txt
2. Code Style
We follow PEP 8 with these tools:
# Format code
black src/
# Check style
flake8 src/
# Type checking
mypy src/
# All checks
make lint
3. Documentation
def my_function(param1: int, param2: str) -> float:
    """
    Brief description of function.

    Detailed description explaining what the function does,
    edge cases, and any important considerations.

    Args:
        param1: Description of param1
        param2: Description of param2

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is negative

    Example:
        >>> result = my_function(5, "test")
        >>> print(result)
        10.0
    """
    if param1 < 0:
        raise ValueError("param1 must be non-negative")
    # Implementation
    return param1 * len(param2) / 2.0
4. Submit Pull Request
# Commit changes
git add .
git commit -m "Add feature: brief description"
# Push to your fork
git push origin feature/my-new-feature
# Open Pull Request on GitHub
# Fill out PR template
# Wait for review
Troubleshooting
Common Issues
1. Import Error: ModuleNotFoundError
# Solution: Install package in development mode
pip install -e .
2. GPU not detected
# Check CUDA installation
python -c "import torch; print(torch.cuda.is_available())"
# If False, install CUDA-enabled PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
3. Out of Memory (OOM)
# Reduce batch size
model.fit(X_train, y_train, batch_size=16)  # Instead of 32 or 64

# Use gradient accumulation
model.fit(X_train, y_train,
          batch_size=16,
          accumulation_steps=2)  # Effective batch size = 32

# Clear memory
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
4. Slow Training
# Use GPU if available
model.fit(X_train, y_train, device='cuda')

# Use mixed precision
model.fit(X_train, y_train, mixed_precision=True)

# Reduce model complexity
model = MyModel(layers=[64, 32])  # Instead of [256, 128, 64]
Getting Help
- Documentation: https://ai-public-health-code.readthedocs.io
- Issues: https://github.com/your-org/ai-public-health-code/issues
- Discussions: https://github.com/your-org/ai-public-health-code/discussions
- Email: support@ai-public-health.org
Additional Resources
Tutorials
Video Tutorials: YouTube playlist with step-by-step walkthroughs
Interactive Tutorials: Google Colab notebooks (no installation required)
API Documentation
Full API docs: https://ai-public-health-code.readthedocs.io/api/
Quick Reference:
# Import main modules
from ai_public_health import (
    data,            # Data loading and preprocessing
    models,          # ML models
    evaluation,      # Evaluation metrics
    fairness,        # Fairness auditing
    explainability,  # XAI tools
    deployment       # Deployment utilities
)
# See docstrings
help(models.ClinicalRiskPredictor)
Example Datasets
All example datasets documented at: https://ai-public-health-code.readthedocs.io/datasets/
Citation
If you use this code in your research, please cite:
@software{ai_public_health_code,
title = {AI in Public Health: Code Repository},
author = {Your Name and Contributors},
year = {2024},
url = {https://github.com/your-org/ai-public-health-code},
version = {1.0.0}
}
Ready to start coding? Open notebooks/chapter01/01_python_setup.ipynb and follow along!