Appendix D — Code Repository Guide

This guide helps you navigate the companion code repository, run examples, and adapt code for your own projects.


Repository Overview

Repository URL: https://github.com/your-org/ai-public-health-code

Structure:

ai-public-health-code/
├── README.md
├── requirements.txt
├── environment.yml
├── setup.py
├── data/
│   ├── sample/           # Sample datasets for learning
│   ├── real/             # Links to real public datasets
│   └── synthetic/        # Synthetic data generators
├── notebooks/
│   ├── chapter01/        # Jupyter notebooks by chapter
│   ├── chapter02/
│   └── ...
├── src/
│   ├── preprocessing/    # Data preprocessing utilities
│   ├── models/           # Model implementations
│   ├── evaluation/       # Evaluation metrics and tools
│   ├── fairness/         # Fairness assessment tools
│   ├── explainability/   # XAI tools (LIME, SHAP, etc.)
│   └── deployment/       # Deployment utilities
├── tests/
│   └── ...              # Unit tests
├── examples/
│   ├── disease_surveillance/
│   ├── outbreak_prediction/
│   ├── diagnostic_ai/
│   ├── resource_allocation/
│   ├── fairness/
│   └── deployment/
└── docs/
    ├── api/             # API documentation
    └── tutorials/       # Step-by-step tutorials

Getting Started

Prerequisites

Required:
- Python 3.8 or higher
- pip or conda package manager
- 8GB RAM minimum (16GB recommended)
- Jupyter Notebook or JupyterLab

Optional:
- GPU (NVIDIA with CUDA, for deep learning)
- Docker (for containerized deployment)

Installation

Option 1: Using pip (recommended for most users)

# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install package in development mode
pip install -e .

# Verify installation
python -c "import ai_public_health; print('Installation successful!')"

Option 2: Using conda (recommended for data scientists)

# Clone repository
git clone https://github.com/your-org/ai-public-health-code.git
cd ai-public-health-code

# Create conda environment
conda env create -f environment.yml

# Activate environment
conda activate ai-public-health

# Install package in development mode
pip install -e .

# Verify installation
python -c "import ai_public_health; print('Installation successful!')"

Option 3: Using Docker (recommended for deployment)

# Pull Docker image
docker pull your-org/ai-public-health:latest

# Run container
docker run -it -p 8888:8888 your-org/ai-public-health:latest

# Jupyter will be available at http://localhost:8888

Quick Start

# Launch Jupyter
jupyter notebook

# Navigate to notebooks/chapter01_introduction/
# Open "01_python_setup.ipynb"
# Run all cells

Repository Components

1. Data Directory

Sample Datasets

data/sample/ contains small datasets for learning:

# Load sample disease surveillance data
from ai_public_health.data import load_sample_data

# Flu surveillance (synthetic)
flu_data = load_sample_data('flu_surveillance')
print(flu_data.head())
# Output: week, state, flu_cases, population, ...

# Outbreak dataset (synthetic)
outbreak_data = load_sample_data('outbreak_simulation')
# Contains: location, day, cases, interventions, ...

# Clinical dataset (synthetic, modeled on MIMIC-III)
clinical_data = load_sample_data('clinical_demo')
# Contains: demographics, vitals, labs, outcomes

Available sample datasets:
- flu_surveillance: Weekly flu cases by state (5 years)
- outbreak_simulation: Simulated disease outbreak
- clinical_demo: ICU patient data (synthetic)
- health_equity: Synthetic data for fairness analysis
- vaccination_coverage: Vaccination rates by county

Real Public Datasets

data/real/ contains download scripts and links:

# Download real public datasets
from ai_public_health.data import download_dataset

# CDC FluView data
download_dataset('cdc_fluview', years=[2018, 2019, 2020])

# COVID-19 data from Johns Hopkins
download_dataset('jhu_covid19', start_date='2020-01-01')

# MIMIC-III (requires credentialing)
# See data/real/mimic/README.md for access instructions

Synthetic Data Generators

# Generate synthetic data for development/testing
from ai_public_health.data import generators

# Generate synthetic outbreak
outbreak = generators.generate_outbreak(
    n_locations=100,
    n_days=180,
    outbreak_start=30,
    r0=2.5,
    intervention_day=60
)

# Generate synthetic clinical data
patients = generators.generate_clinical_data(
    n_patients=1000,
    include_outcomes=['mortality', 'los', 'readmission'],
    demographic_distribution='us_census'
)

# Generate with built-in fairness issues (for testing fairness tools)
biased_data = generators.generate_clinical_data(
    n_patients=1000,
    inject_bias={
        'race': {'effect_size': 0.3, 'type': 'outcome_disparity'},
        'sex': {'effect_size': 0.15, 'type': 'feature_correlation'}
    }
)

2. Notebooks

Organized by chapter, each with theory + practice:

notebooks/
├── chapter01_introduction/
│   ├── 01_python_setup.ipynb
│   ├── 02_data_loading.ipynb
│   └── 03_first_ml_model.ipynb
├── chapter02_foundations/
│   ├── 01_supervised_learning.ipynb
│   ├── 02_evaluation_metrics.ipynb
│   └── 03_crossvalidation.ipynb
├── chapter03_disease_surveillance/
│   ├── 01_time_series_basics.ipynb
│   ├── 02_outbreak_detection.ipynb
│   └── 03_forecasting_models.ipynb
...

Notebook Features:
- ✅ Step-by-step explanations
- ✅ Executable code cells
- ✅ Visualizations
- ✅ Exercises with solutions
- ✅ Links to relevant textbook sections

Example Notebook Structure:

# ===============================================
# Chapter 3: Disease Surveillance
# Notebook 2: Outbreak Detection
# ===============================================

# Learning Objectives:
# 1. Implement EARS algorithm
# 2. Apply CUSUM for anomaly detection
# 3. Evaluate outbreak detection performance

# %% Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ai_public_health.surveillance import EARS, CUSUM

# %% Load Data
data = pd.read_csv('../data/sample/flu_surveillance.csv')

# %% [THEORY] Outbreak Detection Methods
"""
Outbreak detection identifies when disease cases exceed expected levels.

Common methods:
1. EARS (Early Aberration Reporting System)
2. CUSUM (Cumulative Sum Control Chart)
3. Farrington algorithm

We'll implement and compare these methods.
"""

# %% [CODE] Implement EARS
ears = EARS(window=7, threshold=3)
alerts = ears.detect(data['cases'])

# Visualize
plt.figure(figsize=(12, 4))
plt.plot(data['week'], data['cases'], label='Cases')
plt.scatter(data['week'][alerts], data['cases'][alerts],
           color='red', label='Alerts', zorder=5)
plt.xlabel('Week')
plt.ylabel('Cases')
plt.legend()
plt.title('EARS Outbreak Detection')
plt.show()

# %% [EXERCISE] Apply CUSUM
"""
EXERCISE: Implement CUSUM outbreak detection
- Use threshold h=5
- Compare sensitivity/specificity to EARS
- Plot results

SOLUTION: (click to reveal)
"""

# %% [ADVANCED] Real-time Deployment
"""
For production deployment, see:
- src/deployment/outbreak_detector.py
- examples/disease_surveillance/real_time_system.py
"""

3. Source Code (src/)

Modular, reusable implementations:

Preprocessing

# ai_public_health/preprocessing/

from ai_public_health.preprocessing import (
    clean_surveillance_data,
    handle_missing_values,
    engineer_features,
    prepare_time_series
)

# Clean surveillance data
cleaned = clean_surveillance_data(
    raw_data,
    remove_outliers=True,
    impute_missing=True,
    standardize_location_names=True
)

# Feature engineering for disease prediction
features = engineer_features(
    clinical_data,
    feature_sets=['demographics', 'vitals', 'labs'],
    interaction_terms=True,
    temporal_features=True
)
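
For context, the same kind of cleaning can be expressed with plain scikit-learn. The sketch below is an illustration of a typical imputation-and-scaling pipeline, not what engineer_features does internally; the column names are hypothetical.

# Plain scikit-learn imputation + scaling sketch (illustrative; column names hypothetical)
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "heart_rate", "lactate"]
categorical_cols = ["sex", "admission_type"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # fill numeric gaps with the median
        ("scale", StandardScaler()),                    # zero mean, unit variance
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# X_clean = preprocess.fit_transform(clinical_data)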

Models

# ai_public_health/models/

from ai_public_health.models import (
    OutbreakDetector,
    DiseaseForecaster,
    ClinicalRiskPredictor,
    ResourceAllocator
)

# Example: Clinical risk prediction
risk_model = ClinicalRiskPredictor(
    model_type='xgboost',
    outcome='mortality',
    hyperparameters={'max_depth': 5, 'n_estimators': 100}
)

risk_model.fit(X_train, y_train)
predictions = risk_model.predict_proba(X_test)

# Built-in evaluation
metrics = risk_model.evaluate(X_test, y_test)
print(f"AUC: {metrics['auc']:.3f}")

Evaluation

# ai_public_health/evaluation/

from ai_public_health.evaluation import (
    compute_classification_metrics,
    calibration_analysis,
    bootstrap_confidence_intervals,
    temporal_validation
)

# Comprehensive evaluation
evaluation = compute_classification_metrics(
    y_true=y_test,
    y_pred=predictions,
    y_proba=probabilities,
    metrics=['accuracy', 'sensitivity', 'specificity',
             'ppv', 'npv', 'auc', 'brier_score']
)

# Calibration analysis
calibration = calibration_analysis(y_true, y_proba)
calibration.plot()  # Calibration curve

# Bootstrap CIs
ci = bootstrap_confidence_intervals(
    y_true, y_proba,
    metric='auc',
    n_bootstrap=1000,
    confidence_level=0.95
)
print(f"AUC: {ci['estimate']:.3f} (95% CI: {ci['ci_lower']:.3f}-{ci['ci_upper']:.3f})")

Fairness

# ai_public_health/fairness/

from ai_public_health.fairness import (
    FairnessAuditor,
    compute_fairness_metrics,
    mitigate_bias
)

# Audit model for fairness
auditor = FairnessAuditor(
    model=trained_model,
    sensitive_attributes=['race', 'sex', 'age_group']
)

fairness_report = auditor.audit(
    X_test,
    y_test,
    metrics=['demographic_parity', 'equalized_odds', 'equal_opportunity']
)

# Visualize disparities
fairness_report.plot_disparities()

# Mitigation
mitigated_model = mitigate_bias(
    model=trained_model,
    X_train=X_train,
    y_train=y_train,
    sensitive_features=sensitive_features,
    constraint='equalized_odds'
)

Explainability

# ai_public_health/explainability/

from ai_public_health.explainability import (
    explain_prediction,
    global_feature_importance,
    generate_counterfactuals
)

# Explain individual prediction
explanation = explain_prediction(
    model=trained_model,
    instance=patient_data,
    method='shap'  # or 'lime', 'integrated_gradients'
)

explanation.plot()

# Global feature importance
importance = global_feature_importance(
    model=trained_model,
    X=X_train,
    method='permutation'
)

importance.plot_top_k(k=20)

# Generate counterfactuals
counterfactuals = generate_counterfactuals(
    model=trained_model,
    instance=patient_data,
    desired_outcome=0,  # Change from positive to negative prediction
    n_counterfactuals=5
)

print("To change prediction, modify:")
for cf in counterfactuals:
    print(f"  {cf['feature']}: {cf['original_value']}{cf['counterfactual_value']}")

4. Examples

End-to-end implementations:

Disease Surveillance System

# examples/disease_surveillance/real_time_system.py

"""
Real-time disease surveillance system

Demonstrates:
- Data ingestion from multiple sources
- Outbreak detection algorithms
- Alert generation
- Dashboard visualization
"""

from ai_public_health.surveillance import SurveillanceSystem

# Initialize system
system = SurveillanceSystem(
    data_sources=['cdc_fluview', 'google_trends', 'twitter'],
    detection_methods=['ears', 'cusum', 'farrington'],
    alert_thresholds={'ears': 3, 'cusum': 5},
    notification_channels=['email', 'sms', 'dashboard']
)

# Run monitoring loop
system.start_monitoring(interval_minutes=60)

# Access real-time dashboard at http://localhost:5000

Diagnostic AI Deployment

# examples/diagnostic_ai/chest_xray_classifier.py

"""
Chest X-ray pneumonia classifier

Demonstrates:
- Model loading and inference
- Image preprocessing
- Explainable predictions
- Clinical integration
"""

from ai_public_health.diagnostic import ChestXRayClassifier

# Load pre-trained model
classifier = ChestXRayClassifier(
    model_path='models/chest_xray_pneumonia.h5',
    model_type='densenet121',
    input_size=(224, 224)
)

# Classify X-ray
result = classifier.predict(
    image_path='data/xray_001.jpg',
    return_explanation=True,
    return_confidence=True
)

print(f"Prediction: {result['label']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Explanation: See heatmap at {result['explanation_path']}")

# Generate radiology report
report = classifier.generate_report(result)
print(report)
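
As a rough idea of what a wrapper like this does internally, the sketch below loads a Keras model and runs a single image through it. The file paths come from the example above; the single-sigmoid output layout is an assumption for illustration.

# Rough load/preprocess/predict sketch (output layout is an assumption)
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("models/chest_xray_pneumonia.h5")

img = Image.open("data/xray_001.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype="float32") / 255.0   # scale pixels to [0, 1]
x = np.expand_dims(x, axis=0)                  # add a batch dimension

prob = float(model.predict(x)[0][0])           # assuming a single sigmoid output
label = "pneumonia" if prob >= 0.5 else "normal"
print(f"{label} (p = {prob:.2f})")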

Running Examples

Example 1: Flu Forecasting

From scratch in 5 minutes:

# Navigate to notebook
cd notebooks/chapter03_disease_surveillance/

# Launch Jupyter
jupyter notebook 03_forecasting_models.ipynb

# Or run as script
python ../../examples/disease_surveillance/flu_forecast.py

What it does:
1. Loads CDC FluView data
2. Trains ARIMA, LSTM, and XGBoost models
3. Generates 4-week-ahead forecasts
4. Evaluates forecast accuracy
5. Creates a visualization

Output:

Flu Forecasting Results
=======================
ARIMA MAE: 2,450 cases
LSTM MAE: 2,180 cases
XGBoost MAE: 1,920 cases ← Best

Forecast for next 4 weeks:
Week 1: 15,230 cases (95% CI: 12,500-18,200)
Week 2: 17,830 cases (95% CI: 14,100-22,100)
Week 3: 19,450 cases (95% CI: 14,800-25,600)
Week 4: 18,920 cases (95% CI: 13,200-26,800)

Visualization saved to: outputs/flu_forecast.png
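
If you want a feel for the simplest version of this pipeline, the sketch below builds lagged features and fits a gradient-boosting regressor with scikit-learn. It is a stripped-down illustration (national weekly totals, a plain temporal split), not the flu_forecast.py script itself.

# Stripped-down lag-feature forecasting sketch (not the full flu_forecast.py pipeline)
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

data = pd.read_csv("data/sample/flu_surveillance.csv")
series = data.groupby("week")["flu_cases"].sum().sort_index()   # national weekly totals

# Predict next week's cases from the previous 4 weeks
lags = 4
frame = pd.DataFrame({f"lag_{i}": series.shift(i) for i in range(1, lags + 1)})
frame["target"] = series.values
frame = frame.dropna()

split = int(len(frame) * 0.8)                                   # simple temporal split
X_train, y_train = frame.iloc[:split, :lags], frame.iloc[:split]["target"]
X_test, y_test = frame.iloc[split:, :lags], frame.iloc[split:]["target"]

model = GradientBoostingRegressor().fit(X_train, y_train)
preds = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, preds):,.0f} cases")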

Example 2: Fairness Audit

# Run fairness audit on sepsis prediction model
cd examples/fairness/

python audit_sepsis_model.py \
    --model models/sepsis_predictor.pkl \
    --data data/test_set.csv \
    --sensitive-attrs race sex age_group \
    --output-dir outputs/fairness_audit/

Output:

Fairness Audit Report
====================

Overall Performance:
- AUC: 0.82
- Sensitivity: 0.78
- Specificity: 0.75

Performance by Race:
- White:    AUC 0.84, Sens 0.80, Spec 0.77
- Black:    AUC 0.79, Sens 0.74, Spec 0.72 ⚠
- Hispanic: AUC 0.81, Sens 0.77, Spec 0.74
- Asian:    AUC 0.83, Sens 0.79, Spec 0.76

Fairness Metrics:
- Demographic Parity: 0.12 (threshold: 0.10) ⚠ FAILED
- Equalized Odds: 0.08 (threshold: 0.10) ✓ PASSED
- Equal Opportunity: 0.06 (threshold: 0.10) ✓ PASSED

Recommendations:
1. Investigate lower sensitivity for Black patients
2. Consider rebalancing training data
3. Review for proxy discrimination
4. Monitor performance over time

Full report: outputs/fairness_audit/report.html
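
The core of such an audit is stratified metrics. The sketch below is not the FairnessAuditor implementation; it simply computes the positive-prediction rate per group and reports the demographic-parity difference (the largest gap between any two groups).

# Group-rate comparison sketch (not the FairnessAuditor implementation)
import numpy as np
import pandas as pd

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    df = pd.DataFrame({"pred": np.asarray(y_pred), "group": np.asarray(groups)})
    rates = df.groupby("group")["pred"].mean()   # P(prediction = 1) per group
    print(rates.round(3))
    return float(rates.max() - rates.min())

# Hypothetical usage:
# gap = demographic_parity_difference(model.predict(X_test), X_test["race"])
# print(f"Demographic parity difference: {gap:.2f} (flag if above 0.10)")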

Example 3: Model Deployment

# Deploy model as REST API
cd examples/deployment/

# Start API server
python serve_model.py \
    --model models/sepsis_predictor.pkl \
    --port 5000 \
    --workers 4

# API available at http://localhost:5000

# Test API
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "age": 65,
    "heart_rate": 110,
    "temperature": 38.5,
    "lactate": 3.2
  }'

# Response:
# {
#   "sepsis_risk": 0.72,
#   "confidence": "high",
#   "recommendation": "Consider sepsis protocol",
#   "important_factors": ["lactate", "heart_rate", "age"]
# }
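
A minimal version of such a service is only a few lines of Flask. The sketch below is an illustration, not the repository's serve_model.py: it loads a pickled model and exposes a /predict endpoint that returns the predicted risk.

# Minimal Flask serving sketch (illustrative; not the repository's serve_model.py)
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

with open("models/sepsis_predictor.pkl", "rb") as f:    # path taken from the example above
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                        # e.g. {"age": 65, "heart_rate": 110, ...}
    features = pd.DataFrame([payload])                  # single-row frame in training column order
    risk = float(model.predict_proba(features)[0, 1])   # probability of the positive class
    return jsonify({"sepsis_risk": round(risk, 2)})

if __name__ == "__main__":
    app.run(port=5000)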

Testing

Run test suite:

# Run all tests
pytest

# Run specific test module
pytest tests/test_surveillance.py

# Run with coverage
pytest --cov=ai_public_health --cov-report=html

# View coverage report
open htmlcov/index.html

Write your own tests:

# tests/test_my_model.py

import numpy as np
import pytest
from ai_public_health.models import MyModel

@pytest.fixture
def training_data():
    # Small synthetic dataset so the tests run without external files
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 5))
    y = rng.integers(0, 2, size=100)
    return X, y

def test_model_initialization():
    model = MyModel()
    assert model is not None

def test_model_training(training_data):
    X_train, y_train = training_data
    model = MyModel()
    model.fit(X_train, y_train)
    assert model.is_fitted

def test_model_prediction(training_data):
    X_train, y_train = training_data
    model = MyModel()
    model.fit(X_train, y_train)
    predictions = model.predict(X_train)
    assert len(predictions) == len(X_train)
    # Assumes predict() returns risk scores/probabilities in [0, 1]
    assert all(0 <= p <= 1 for p in predictions)

Contributing Code

We welcome contributions! Here’s how:

1. Setup Development Environment

# Fork repository on GitHub
# Clone your fork
git clone https://github.com/YOUR-USERNAME/ai-public-health-code.git

# Add upstream remote
git remote add upstream https://github.com/your-org/ai-public-health-code.git

# Create development branch
git checkout -b feature/my-new-feature

# Install development dependencies
pip install -r requirements-dev.txt

2. Code Style

We follow PEP 8 with these tools:

# Format code
black src/

# Check style
flake8 src/

# Type checking
mypy src/

# All checks
make lint

3. Documentation

def my_function(param1: int, param2: str) -> float:
    """
    Brief description of function.

    Detailed description explaining what the function does,
    edge cases, and any important considerations.

    Args:
        param1: Description of param1
        param2: Description of param2

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is negative

    Example:
        >>> result = my_function(5, "test")
        >>> print(result)
        10.0
    """
    if param1 < 0:
        raise ValueError("param1 must be non-negative")

    # Implementation
    return param1 * len(param2) / 2.0

4. Submit Pull Request

# Commit changes
git add .
git commit -m "Add feature: brief description"

# Push to your fork
git push origin feature/my-new-feature

# Open Pull Request on GitHub
# Fill out PR template
# Wait for review

Troubleshooting

Common Issues

1. Import Error: ModuleNotFoundError

# Solution: Install package in development mode
pip install -e .

2. GPU not detected

# Check CUDA installation
python -c "import torch; print(torch.cuda.is_available())"

# If False, install CUDA-enabled PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

3. Out of Memory (OOM)

# Reduce batch size
model.fit(X_train, y_train, batch_size=16)  # Instead of 32 or 64

# Use gradient accumulation
model.fit(X_train, y_train,
          batch_size=16,
          accumulation_steps=2)  # Effective batch size = 32

# Clear memory
import gc
import torch

gc.collect()
torch.cuda.empty_cache()

4. Slow Training

# Use GPU if available
model.fit(X_train, y_train, device='cuda')

# Use mixed precision
model.fit(X_train, y_train, mixed_precision=True)

# Reduce model complexity
model = MyModel(layers=[64, 32])  # Instead of [256, 128, 64]

Getting Help

  • Documentation: https://ai-public-health-code.readthedocs.io
  • Issues: https://github.com/your-org/ai-public-health-code/issues
  • Discussions: https://github.com/your-org/ai-public-health-code/discussions
  • Email: support@ai-public-health.org

Additional Resources

Tutorials

Video Tutorials: YouTube playlist with step-by-step walkthroughs

Interactive Tutorials: Google Colab notebooks (no installation required)

API Documentation

Full API docs: https://ai-public-health-code.readthedocs.io/api/

Quick Reference:

# Import main modules
from ai_public_health import (
    data,           # Data loading and preprocessing
    models,         # ML models
    evaluation,     # Evaluation metrics
    fairness,       # Fairness auditing
    explainability, # XAI tools
    deployment      # Deployment utilities
)

# See docstrings
help(models.ClinicalRiskPredictor)

Example Datasets

All example datasets documented at: https://ai-public-health-code.readthedocs.io/datasets/

Citation

If you use this code in your research, please cite:

@software{ai_public_health_code,
  title = {AI in Public Health: Code Repository},
  author = {Your Name and Contributors},
  year = {2024},
  url = {https://github.com/your-org/ai-public-health-code},
  version = {1.0.0}
}

Ready to start coding? Open notebooks/chapter01_introduction/01_python_setup.ipynb and follow along!