6 Disease Surveillance and Outbreak Detection

Learning Objectives

This chapter examines AI in disease surveillance and outbreak detection. You will learn to:

Implement anomaly detection algorithms (CDC’s EARS, Facebook Prophet)
Apply spatial-temporal clustering (SaTScan, DBSCAN) for outbreak hotspots
Evaluate internet surveillance (Google Flu Trends failures vs. HealthMap)
Integrate traditional and digital surveillance signals responsibly
Select metrics balancing early detection with alert fatigue
Navigate privacy-utility tradeoffs in digital disease monitoring
Develop judgment about when AI adds value vs. simpler methods

Prerequisites: AI in Context, Just Enough AI to Be Dangerous, The Data Problem.

📋 Chapter Summary (TL;DR)

The Big Picture: AI transforms disease surveillance from John Snow’s weeks-long door-to-door investigation (1854) to BlueDot’s automated 9-day warning before WHO announced COVID-19. But more data ≠ better signal—noise increases faster than useful information, and technology alone can’t replace human expertise.

The Surveillance Pyramid Challenge:

Traditional surveillance captures only lab-confirmed cases (tip of iceberg). AI enables detection at lower levels: - Social media mentions (symptomatic, not yet seeking care) - OTC medication sales (early self-treatment) - Wastewater viral load (all infections including asymptomatic) - Search queries (pre-symptomatic concern)

But each level introduces new biases and interpretation challenges.

Core AI Approaches for Outbreak Detection:

Anomaly Detection (CDC’s EARS, Facebook Prophet)
- Detects when case counts deviate from expected baseline
- Statistical control charts, time series decomposition
- Challenge: Setting thresholds to balance early detection vs. alert fatigue
Spatial-Temporal Clustering (SaTScan, DBSCAN)
- Identifies geographic hotspots and unusual disease clusters
- Space-time scan statistics for outbreak localization
- Critical for: Contact tracing, targeted interventions
Internet/Digital Surveillance (HealthMap, Flu Near You)
- Social media monitoring, search trends, mobility data
- Success: BlueDot detected COVID-19 nine days before WHO
- Failure: Google Flu Trends overestimated flu by 140% (learned spurious correlations)

The Google Flu Trends Lesson (Why Digital Surveillance Fails):

Published in Nature (2008), discontinued (2015). What went wrong? - Overfitted to spurious correlations (“basketball” searches peak during flu season) - Algorithm changes broke the model - Media coverage changed search behavior - Lacked epidemiologist input - Lesson: Big data ≠ good data. Correlation ≠ causation. Domain expertise required.

Evaluation Metrics (Different from Standard ML):

Sensitivity for Early Detection: How quickly do you detect outbreaks? Days/weeks earlier than traditional?
Specificity vs. Alert Fatigue: False alarm rate—every false positive erodes trust
Timeliness: Real-time vs. near-real-time vs. delayed signals
Coverage: Geographic and population representativeness
Trade-off: Maximize early warning while minimizing alert fatigue

Privacy-Utility Trade-offs:

Mobility data enables contact tracing but raises surveillance concerns
COVID-19 apps: Centralized (effective but privacy-invasive) vs. Decentralized (privacy-preserving but less effective)
Social media monitoring captures early signals but lacks demographic representativeness
Critical question: What level of privacy sacrifice is justified for public health gain?

Integration with Traditional Surveillance:

AI signals should complement, not replace traditional surveillance: - Use AI for early warning, traditional methods for confirmation - Triangulate multiple data streams (clinical + digital + environmental) - Maintain human epidemiologist-in-the-loop for interpretation - Ground-truth digital signals with lab confirmation

When AI Adds Value vs. Simpler Methods:

Use AI when: - Multiple heterogeneous data streams need integration - Real-time processing of massive data volumes - Detecting subtle patterns across space and time - Resource-limited settings lacking traditional infrastructure

Stick with traditional methods when: - Small geographic area with good reporting - Well-established surveillance system - Interpretability is paramount - Limited technical capacity for AI maintenance

The Takeaway for Public Health Practitioners: AI augments surveillance but doesn’t solve fundamental challenges—selection bias, reporting delays, changing definitions, privacy constraints. BlueDot succeeded where Google Flu Trends failed because it combined AI with domain expertise. Faster detection means nothing if it generates false alarms that erode trust. The goal isn’t technological sophistication—it’s actionable intelligence for timely public health response.

6.1 Introduction: The Evolution of Surveillance

September 1854, London: John Snow knocks on doors along Broad Street, interviewing residents about their water sources. He painstakingly maps cholera cases by hand. It takes him weeks to identify the contaminated pump—but his work revolutionizes epidemiology.

January 2020, Toronto: BlueDot, an AI surveillance platform, flags unusual pneumonia reports in Wuhan, China. It alerts its clients on January 9, nine days before the WHO’s public announcement. The algorithm analyzed airline ticketing data, predicted spread patterns, and identified at-risk cities—all automated, all in real-time.

The transformation is stunning. But here’s the paradox: we have more surveillance data than ever, yet outbreak detection remains incredibly difficult.

Why?

More data ≠ Better signal: Noise increases faster than useful information
Faster doesn’t always mean better: False alarms erode trust (alert fatigue)
Technology alone isn’t enough: Interpretation still requires human expertise
Equity gaps persist: Sophisticated surveillance exists where it’s least needed

COVID-19 laid this bare. Despite unprecedented surveillance capabilities—genomic sequencing, wastewater monitoring, mobility data, social media signals—we still struggled with: - Delayed outbreak detection in resource-limited settings - Contradictory signals from different surveillance streams - The “denominator problem” (testing bias masking true disease burden) - Privacy backlash against contact tracing apps

Surveillance vs. Monitoring vs. Screening

These terms are often confused:

Surveillance: Ongoing, systematic collection and analysis of health data for public health action - Purpose: Early warning, trend monitoring, program evaluation - Population: Entire communities or populations - Example: Weekly influenza case counts

Monitoring: Tracking specific measures over time, often program outcomes - Purpose: Assess intervention effectiveness - Population: Usually program participants - Example: Vaccination coverage rates

Screening: Identifying disease in asymptomatic individuals - Purpose: Early diagnosis and treatment - Population: Individuals at risk - Example: Mammography for breast cancer

This chapter focuses on surveillance—specifically, how AI can enhance early detection of outbreaks.

6.1.1 The Surveillance Pyramid

Recall from Chapter 3 the surveillance pyramid:

         Confirmed Cases
        /   (Lab-confirmed, reported)
       /
      🏥 Healthcare-Seeking Cases
     /   (Symptomatic, seeking care)
    /
   😷 All Symptomatic Cases
  /   (Including those who don't seek care)
 /
😊 All Infections
   (Including asymptomatic)

Traditional surveillance captures only the tip (confirmed cases). AI enables us to potentially detect signals at lower levels: - Social media mentions of symptoms (symptomatic, not yet seeking care) - Over-the-counter medication sales (early self-treatment) - Wastewater viral load (all infections, including asymptomatic) - Search engine queries (pre-symptomatic concern)

But each level introduces new biases and challenges.

6.1.2 What AI Can (and Cannot) Do for Surveillance

AI excels at: ✅ Processing massive, heterogeneous data streams in real-time
✅ Detecting subtle patterns humans might miss
✅ Automating repetitive monitoring tasks (freeing humans for interpretation)
✅ Integrating multiple data sources with different biases
✅ Providing early warning before traditional surveillance signals appear

AI struggles with: ❌ Novel outbreaks with no historical training data
❌ Explaining why an alert was triggered (black box problem)
❌ Distinguishing true signal from noise without verification
❌ Handling rapidly changing surveillance systems (non-stationarity)
❌ Operating in data-poor environments (rural, low-income settings)

The key insight: AI should augment, not replace traditional surveillance. The most effective systems combine both.

6.2 Traditional Surveillance Systems: The Baseline

Before exploring AI approaches, we must understand the baseline. Traditional surveillance remains the gold standard against which AI systems are judged.

6.2.1 Syndromic Surveillance

The idea: Monitor pre-diagnosis syndromes (fever, cough, rash) rather than confirmed diseases. This provides earlier signals but lower specificity.

Common data sources: - Emergency department chief complaints - Over-the-counter medication sales - School/workplace absenteeism - Ambulance dispatches - Calls to health hotlines (e.g., 811 in Canada, NHS 111 in UK)

Major systems in the US:

6.2.1.1 1. BioSense Platform

The CDC’s BioSense Platform collects syndromic data from ~70% of emergency departments nationwide.

Strengths: - Near real-time data (daily updates) - Standardized data elements - Built-in anomaly detection

Weaknesses: - Healthcare-seeking bias (Chapter 3 concept) - Respiratory syndrome overload during flu season - High false positive rate

6.2.1.2 2. ESSENCE (Electronic Surveillance System for the Early Notification of Community-based Epidemics)

Originally developed by Johns Hopkins, now widely deployed by state and local health departments.

Features: - Customizable syndrome definitions - Multiple aberration detection algorithms - Real-time dashboards

6.2.1.3 3. EARS (Early Aberration Reporting System)

A set of simple statistical algorithms developed by the CDC for rapid outbreak detection.

The EARS Algorithms: - C1: Compares today’s count to average of previous 7 days - C2: Compares today to 2-day moving average - C3: Uses 3-standard-deviation threshold on baseline

Let’s implement EARS C3:

Hide code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

def ears_c3(counts, baseline_days=7, guard_band=2):
    """
    EARS C3 algorithm for outbreak detection
    
    Parameters:
    - counts: array of daily counts
    - baseline_days: number of days to use for baseline (default 7)
    - guard_band: days to exclude before baseline (default 2)
    
    Returns:
    - alerts: boolean array indicating alerts
    - thresholds: upper control limits for each day
    """
    alerts = np.zeros(len(counts), dtype=bool)
    thresholds = np.zeros(len(counts))
    
    # Need at least baseline_days + guard_band days
    start_idx = baseline_days + guard_band
    
    for i in range(start_idx, len(counts)):
        # Baseline period: from (i - baseline_days - guard_band) to (i - guard_band - 1)
        baseline_start = i - baseline_days - guard_band
        baseline_end = i - guard_band
        
        baseline = counts[baseline_start:baseline_end]
        
        # Calculate baseline statistics
        baseline_mean = np.mean(baseline)
        baseline_std = np.std(baseline, ddof=1)
        
        # C3 threshold: mean + 3*std
        threshold = baseline_mean + 3 * baseline_std
        thresholds[i] = threshold
        
        # Alert if today's count exceeds threshold
        if counts[i] > threshold:
            alerts[i] = True
    
    return alerts, thresholds

# Example: Detect outbreak in synthetic syndromic data
np.random.seed(42)
n_days = 100

# Simulate baseline: seasonal pattern + noise
days = np.arange(n_days)
seasonal = 20 + 10 * np.sin(2 * np.pi * days / 30)  # 30-day cycle
noise = np.random.normal(0, 3, n_days)
baseline_counts = seasonal + noise

# Inject outbreak: days 60-75 have elevated counts
outbreak_counts = baseline_counts.copy()
outbreak_counts[60:75] += 15 + np.random.normal(0, 2, 15)

# Run EARS C3
alerts, thresholds = ears_c3(outbreak_counts, baseline_days=7, guard_band=2)

# Visualize
fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(days, outbreak_counts, 'o-', label='Daily Counts', color='steelblue')
ax.plot(days, thresholds, '--', label='EARS C3 Threshold', color='orange', linewidth=2)
ax.fill_between(days, 0, thresholds, alpha=0.2, color='orange')

# Mark alerts
alert_days = days[alerts]
alert_counts = outbreak_counts[alerts]
ax.scatter(alert_days, alert_counts, color='red', s=100, zorder=5, 
           label=f'Alerts (n={alerts.sum()})', marker='X')

# Mark true outbreak period
ax.axvspan(60, 75, alpha=0.2, color='red', label='True Outbreak Period')

ax.set_xlabel('Day')
ax.set_ylabel('Syndromic Counts')
ax.set_title('EARS C3 Outbreak Detection Algorithm')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('ears_c3_example.png', dpi=300)
plt.show()

# Evaluate performance
true_outbreak = np.zeros(n_days, dtype=bool)
true_outbreak[60:75] = True

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(true_outbreak, alerts)
print("Confusion Matrix:")
print(cm)
print("\nClassification Report:")
print(classification_report(true_outbreak, alerts, 
                           target_names=['No Outbreak', 'Outbreak']))

# Time to detection
first_alert = np.where(alerts)[0][0] if alerts.any() else None
outbreak_start_day = 60

if first_alert:
    time_to_detection = first_alert - outbreak_start_day
    print(f"\nTime to Detection: {time_to_detection} days")
    
    if time_to_detection < 0:
        print("⚠️ False alarm before outbreak started")
    elif time_to_detection == 0:
        print("✓ Detected on outbreak start day")
    else:
        print(f"✓ Detected {time_to_detection} days after outbreak start")

The False Positive Problem

EARS and similar algorithms generate many false alarms. This is by design—trading specificity for sensitivity.

Why this matters: - Alert fatigue → Ignoring real outbreaks - Resource waste investigating false signals - Public trust erosion if alerts are publicized

The CDC’s MMWR reports show that only ~5-10% of syndromic surveillance alerts correspond to true outbreaks.

The solution: Layer multiple signals, require verification, adjust thresholds based on context.

6.2.2 Case-Based Surveillance

Notifiable disease reporting remains the cornerstone of public health surveillance.

The process: 1. Healthcare provider diagnoses disease 2. Reports to local health department (legally required) 3. Local → State → National (CDC/ECDC/WHO) 4. Aggregated and published (e.g., CDC’s NNDSS)

Timeliness challenges: - Days to weeks lag between infection and report - Incomplete reporting (estimated 10-50% of cases missed) - Varying definitions across jurisdictions

Electronic Lab Reporting (ELR): Automates step 2 by sending lab results directly to health departments via HL7 messaging.

Impact: - Reduces reporting delays by 4-7 days - Increases completeness - Still suffers from testing bias

6.2.3 Sentinel Surveillance

The idea: Monitor a representative sample of providers/sites intensively, rather than entire population superficially.

Example: FluView and ILINet

The CDC’s Influenza Surveillance System (ILINet) collects data from ~3,000 outpatient providers.

What they report weekly: - Total patient visits - Visits for influenza-like illness (ILI) - ILI percentage = (ILI visits / total visits) × 100

Strengths: - High-quality data (trained reporters) - Consistent definitions - Long time series for comparison

Limitations: - Small sample size - Not all regions equally represented - Healthcare-seeking bias still present

Code example: Visualizing ILINet data

Hide code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# ILINet data is publicly available from CDC
# Download from: https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
# For this example, we'll simulate similar data

# Simulate 5 years of weekly ILI data
np.random.seed(42)
weeks = pd.date_range('2018-01-01', periods=260, freq='W-MON')

# Baseline ILI with seasonal pattern
week_of_year = weeks.isocalendar().week
baseline_ili = 2.0 + 2.5 * np.exp(-((week_of_year - 52) % 52 - 6)**2 / 50)

# Add noise and trend
noise = np.random.normal(0, 0.3, len(weeks))
trend = np.linspace(0, 0.5, len(weeks))  # Slight upward trend

ili_pct = baseline_ili + noise + trend

# Create DataFrame
ili_data = pd.DataFrame({
    'week': weeks,
    'ili_pct': ili_pct,
    'season': weeks.year + (weeks.month >= 10).astype(int)
})

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Time series plot
for season in ili_data['season'].unique():
    season_data = ili_data[ili_data['season'] == season]
    axes[0].plot(season_data['week'], season_data['ili_pct'], 
                 marker='o', label=f'{season-1}/{season}', alpha=0.7)

axes[0].set_xlabel('Week')
axes[0].set_ylabel('ILI Percentage (%)')
axes[0].set_title('Weekly Influenza-Like Illness Percentage (ILINet Style)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Seasonal comparison
ili_data['week_of_season'] = (ili_data['week'].dt.isocalendar().week - 40) % 52

for season in ili_data['season'].unique():
    season_data = ili_data[ili_data['season'] == season]
    season_data = season_data.sort_values('week_of_season')
    axes[1].plot(season_data['week_of_season'], season_data['ili_pct'],
                 marker='o', label=f'{season-1}/{season}', alpha=0.7)

axes[1].set_xlabel('Week of Season (0 = Oct, 26 = Apr)')
axes[1].set_ylabel('ILI Percentage (%)')
axes[1].set_title('ILI Percentage by Week of Season (Aligned)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('ilinet_style_data.png', dpi=300)
plt.show()

# Calculate epidemic threshold (baseline + 2 SD)
historical_baseline = ili_data['ili_pct'].rolling(window=52, min_periods=26).mean()
historical_sd = ili_data['ili_pct'].rolling(window=52, min_periods=26).std()
epidemic_threshold = historical_baseline + 2 * historical_sd

print("ILI Surveillance Metrics:")
print(f"Mean ILI%: {ili_data['ili_pct'].mean():.2f}%")
print(f"Peak ILI%: {ili_data['ili_pct'].max():.2f}%")
print(f"Weeks above epidemic threshold: {(ili_data['ili_pct'] > epidemic_threshold).sum()}")

6.3 AI-Enhanced Early Warning Systems

Traditional surveillance works well for known diseases with established reporting. But what about novel threats or rapid detection before traditional reports arrive?

Enter internet-based surveillance (also called digital epidemiology or infoveillance).

6.3.1 HealthMap: Pioneering Digital Surveillance

HealthMap, launched in 2006 by researchers at Boston Children’s Hospital, was among the first automated disease surveillance systems.

How it works: 1. Data sources: News aggregators, social media, official reports, eyewitness accounts 2. NLP processing: Extract disease mentions, locations, severity indicators 3. Geocoding: Map events to geographic coordinates 4. Classification: Categorize by disease type, outbreak stage 5. Visualization: Display on interactive map

Notable successes:

2009 H1N1 Pandemic: HealthMap detected unusual respiratory illness reports in Mexico before the WHO announcement. The system tracked spread in real-time, providing situational awareness.

2014 Ebola Outbreak: HealthMap identified the first Ebola cases in Guinea in March 2014, nine days before the WHO confirmation.

2019-2020 COVID-19: HealthMap flagged pneumonia clusters in Wuhan on December 31, 2019—simultaneous with ProMED-mail’s human-curated alert.

The HealthMap Advantage

Unlike traditional surveillance that depends on healthcare-seeking and reporting infrastructure, HealthMap taps into informal information networks:

Local news in any language
Social media rumors (verified before display)
Unofficial disease reports
Eyewitness accounts

This is especially valuable in settings with weak surveillance infrastructure.

See Brownstein et al. 2008 (Freifeld et al. 2008; Brownstein et al. 2008) for technical details.

Limitations: - Signal-to-noise ratio: Many rumors don’t pan out - Verification needed: Automated detection ≠ confirmed outbreak - Language barriers: NLP struggles with low-resource languages - Digital divide: Underreports in areas with limited internet access

6.3.2 ProMED-mail: Human + AI Augmentation

ProMED-mail (Program for Monitoring Emerging Diseases) is a human-curated global surveillance system operated by the International Society for Infectious Diseases.

The model: - ~40,000 members worldwide submit outbreak reports - Expert moderators (physicians, epidemiologists) review and verify - Rapid dissemination via email list (60,000+ subscribers) - Now enhanced with AI for initial screening and translation

Historical impact:

SARS 2003: ProMED published the first English-language report of “atypical pneumonia” in Guangdong Province on February 10, 2003—providing early warning to the global community.

COVID-19: ProMED’s December 30, 2019 post about “undiagnosed pneumonia” in Wuhan was among the first public alerts.

The hybrid approach: - AI scans news/social media for potential signals - Human experts verify, contextualize, and comment - Community peer review (members respond with additional information)

Key lesson: AI handles volume, humans provide judgment. Neither alone is sufficient.

6.3.3 BlueDot: Commercial Success in Outbreak Intelligence

BlueDot, founded in 2014 by Dr. Kamran Khan (an infectious disease physician), represents the commercial state-of-the-art in AI surveillance.

Multi-source data integration: - News media (65,000 sources, 65 languages) - Official health reports - Airline ticketing data (global travel patterns) - Animal disease surveillance - Climate and environmental data - Population demographics

The algorithm: 1. Ingest: Real-time data from all sources 2. Filter: ML models identify anomalies and prioritize signals 3. Analyze: Predict disease spread using travel and climate data 4. Alert: Human epidemiologists review and contextualize 5. Report: Clients receive tailored intelligence

COVID-19 early warning:

On January 9, 2020, BlueDot alerted clients about a novel coronavirus outbreak in Wuhan and predicted which cities were at highest risk based on airline travel data.

This was: - 9 days before WHO’s public announcement - Days before ProMED and HealthMap alerts reached mass attention - Accurate predictions: Bangkok, Hong Kong, Tokyo, Taipei, Seoul were indeed early spread destinations

The catch: - Proprietary algorithm: Black box, can’t be independently validated - Expensive: Costs tens of thousands per year (out of reach for most health departments) - Still requires human verification: Automated alerts reviewed by BlueDot’s team

The Black Box Problem

BlueDot’s success raises a critical question: Can we trust outbreak intelligence we can’t verify?

Arguments in favor: - Track record: BlueDot’s alerts have been accurate - Human oversight: Expert team reviews all automated signals - Value proposition: Early warning justifies cost for paying clients

Arguments against: - No independent validation of algorithm performance - Public health decisions based on proprietary, unverifiable models - Equity concerns: Only wealthy entities can afford access - What happens if BlueDot is wrong? Who bears responsibility?

This tension—performance vs. transparency—appears throughout public health AI.

For academic perspective, see Wilson & Brownstein, 2020, Lancet Digital Health.

6.3.4 Building Your Own Web Scraper for Outbreak Signals

You can create a basic outbreak surveillance system using open-source tools:

Hide code

import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import re
from geopy.geocoders import Nominatim
import time

# This is a simplified example - production systems need robust error handling,
# rate limiting, compliance with robots.txt, and proper data validation

def scrape_who_don():
    """
    Scrape WHO Disease Outbreak News (DON)
    URL: https://www.who.int/emergencies/disease-outbreak-news
    """
    url = "https://www.who.int/emergencies/disease-outbreak-news"
    
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        
        # Find article listings (adjust selectors based on current site structure)
        articles = soup.find_all('div', class_='list-view--item')
        
        outbreaks = []
        for article in articles[:10]:  # Limit to 10 most recent
            try:
                title_elem = article.find('h3', class_='heading')
                date_elem = article.find('span', class_='timestamp')
                link_elem = article.find('a')
                
                if title_elem and date_elem and link_elem:
                    outbreaks.append({
                        'date': date_elem.text.strip(),
                        'title': title_elem.text.strip(),
                        'url': 'https://www.who.int' + link_elem['href'],
                        'source': 'WHO DON'
                    })
            except Exception as e:
                continue
        
        return pd.DataFrame(outbreaks)
    
    except Exception as e:
        print(f"Error scraping WHO DON: {e}")
        return pd.DataFrame()

def scrape_promed_via_rss():
    """
    Get ProMED posts via RSS feed
    """
    import feedparser
    
    feed_url = "https://promedmail.org/ajax/rss.php"
    
    try:
        feed = feedparser.parse(feed_url)
        
        posts = []
        for entry in feed.entries[:20]:  # Last 20 posts
            posts.append({
                'date': entry.published if 'published' in entry else 'Unknown',
                'title': entry.title,
                'url': entry.link,
                'summary': entry.summary if 'summary' in entry else '',
                'source': 'ProMED'
            })
        
        return pd.DataFrame(posts)
    
    except Exception as e:
        print(f"Error fetching ProMED RSS: {e}")
        return pd.DataFrame()

def extract_disease_mentions(text, disease_keywords):
    """
    Simple keyword matching for disease extraction
    In production, use NER models like BioBERT
    """
    text_lower = text.lower()
    mentioned_diseases = []
    
    for disease, keywords in disease_keywords.items():
        for keyword in keywords:
            if keyword.lower() in text_lower:
                mentioned_diseases.append(disease)
                break
    
    return list(set(mentioned_diseases))

def geocode_location(location_text):
    """
    Extract location from text and geocode
    """
    geolocator = Nominatim(user_agent="outbreak_surveillance_demo")
    
    try:
        location = geolocator.geocode(location_text, timeout=10)
        if location:
            return {
                'latitude': location.latitude,
                'longitude': location.longitude,
                'location_full': location.address
            }
    except Exception as e:
        pass
    
    return {'latitude': None, 'longitude': None, 'location_full': None}

# Disease keywords (simplified - real systems use ML models)
DISEASE_KEYWORDS = {
    'COVID-19': ['covid', 'coronavirus', 'sars-cov-2', 'pandemic'],
    'Influenza': ['influenza', 'flu', 'h1n1', 'h5n1', 'h3n2'],
    'Ebola': ['ebola', 'ebolavirus', 'hemorrhagic fever'],
    'Dengue': ['dengue', 'dengue fever', 'breakbone fever'],
    'Cholera': ['cholera', 'vibrio cholerae'],
    'Measles': ['measles', 'rubeola'],
    'Malaria': ['malaria', 'plasmodium'],
    'Mpox': ['mpox', 'monkeypox']
}

# Main surveillance pipeline
print("Fetching outbreak reports from multiple sources...")

# Scrape data
who_data = scrape_who_don()
promed_data = scrape_promed_via_rss()

# Combine sources
all_reports = pd.concat([who_data, promed_data], ignore_index=True)

# Extract diseases
all_reports['diseases'] = all_reports.apply(
    lambda row: extract_disease_mentions(
        str(row.get('title', '')) + ' ' + str(row.get('summary', '')),
        DISEASE_KEYWORDS
    ),
    axis=1
)

# Filter to reports with disease mentions
outbreak_reports = all_reports[all_reports['diseases'].apply(len) > 0].copy()

print(f"\nFound {len(outbreak_reports)} outbreak reports")
print("\nRecent Outbreaks:")
print(outbreak_reports[['date', 'title', 'diseases', 'source']].head(10))

# Alert generation logic
def generate_alerts(reports, alert_diseases=['COVID-19', 'Ebola', 'Cholera']):
    """
    Generate alerts for high-priority diseases
    """
    alerts = []
    
    for _, report in reports.iterrows():
        detected_priority = [d for d in report['diseases'] if d in alert_diseases]
        
        if detected_priority:
            alerts.append({
                'alert_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'disease': ', '.join(detected_priority),
                'source': report['source'],
                'title': report['title'],
                'url': report['url'],
                'priority': 'HIGH'
            })
    
    return pd.DataFrame(alerts)

# Generate alerts
alerts = generate_alerts(outbreak_reports)

if len(alerts) > 0:
    print(f"\n🚨 {len(alerts)} HIGH PRIORITY ALERTS:")
    print(alerts[['alert_time', 'disease', 'title']].to_string(index=False))
else:
    print("\n✓ No high-priority alerts at this time")

# Save results
outbreak_reports.to_csv('outbreak_surveillance_feed.csv', index=False)
alerts.to_csv('outbreak_alerts.csv', index=False)

print("\n✓ Results saved to outbreak_surveillance_feed.csv and outbreak_alerts.csv")

Production-Ready Web Scraping

This example is educational. For production surveillance:

Respect robots.txt and terms of service
Implement rate limiting (don’t hammer servers)
Use RSS feeds when available (ProMED, WHO, ECDC all provide them)
Use proper NLP models (BioBERT, SciBERT) for disease/location extraction
Store historical data for trend analysis
Implement verification workflow (don’t auto-publish alerts)
Monitor for source changes (websites update structure frequently)

For a robust open-source solution, see EIOS (WHO’s Epidemic Intelligence from Open Sources platform).

6.4 Social Media Surveillance: Lessons from Google Flu Trends

Social media promised revolutionary disease surveillance. The reality has been… complicated.

6.4.1 The Google Flu Trends Story

2008: The Promise

Google researchers published a landmark paper in Nature (Ginsberg et al. 2009) showing that search query patterns could track influenza activity in near real-time.

The method: - Identify 45 search terms correlated with CDC ILINet data - Aggregate searches by region - Use linear model to predict current ILI levels

The results: - 97% correlation with CDC data - 1-2 weeks ahead of traditional surveillance - Updated daily (vs. weekly CDC reports)

The excitement: “Big data” + Machine learning = Real-time disease tracking!

Media proclaimed: “The end of traditional surveillance!”

2013: The Fall

During the 2012-2013 flu season, Google Flu Trends (GFT) massively overestimated influenza prevalence—predicting almost double the actual CDC-reported levels.

The post-mortem analysis (Lazer et al. 2014) identified multiple failures:

1. Algorithm Dynamics (Overfitting) - GFT used 50 million search terms → Selected 45 best correlates - With so many candidate predictors, spurious correlations were inevitable - Example: Searches for “high school basketball” correlated with flu season (both peak in winter) → Algorithm included it

2. Search Behavior Changes - Media coverage of flu → People searched more → Inflated estimates - Google’s search algorithm updates changed which terms appeared - Auto-complete suggestions biased searches

3. No Mechanism, Only Correlation - GFT had no epidemiological model—purely data-driven - When patterns changed (e.g., H1N1 pandemic), algorithm failed - As Lazer et al. wrote: “Big data hubris”—assumption that big data alone, without theory, is sufficient

4. Closed System, No Transparency - Google didn’t reveal which search terms were used - No independent validation possible - When it failed, couldn’t diagnose why

The Fundamental Lessons

Google Flu Trends teaches us critical principles for public health AI:

Correlation ≠ Causation (especially with big data)
Systems change over time (non-stationarity kills prediction)
Transparency matters (black boxes can’t be debugged)
Theory + Data beats Data alone (epidemiological mechanisms matter)
Validation must be ongoing (performance degrades)
Don’t replace traditional surveillance (use AI as complement)

For detailed analysis, see Lazer et al., 2014 (Lazer et al. 2014) and Butler, 2013. For a reassessment of GFT performance, see Olson et al., 2013 (Olson et al. 2013).

6.4.2 Modern Search-Based Surveillance: ARGO

Learning from GFT’s failure, researchers developed ARGO (AutoRegression with Google search data).

Key improvements: - Combines Google Trends data with CDC ILINet (not replacing it) - Uses time series methods (ARIMA) with epidemiological constraints - Regularly recalibrates as patterns change - Transparent (published algorithm, open validation)

Performance: - ~30% improvement over CDC ILINet alone for nowcasting - Useful for filling reporting gaps (e.g., estimating current week before CDC data arrives) - Robust to algorithm changes (because it adapts)

Code example: Simple nowcasting with search trends

Hide code

import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

# Simulate weekly ILI data + search trends
np.random.seed(42)
weeks = pd.date_range('2020-01-01', periods=150, freq='W')

# True ILI (with 2-week reporting lag)
week_num = np.arange(len(weeks))
seasonal = 2.5 + 2.0 * np.sin(2 * np.pi * week_num / 52)
ili_true = seasonal + np.random.normal(0, 0.3, len(weeks))

# Search trends (leading indicator - correlated but noisy)
search_trends = ili_true + np.random.normal(0, 0.5, len(weeks))
search_trends = np.roll(search_trends, -1)  # Searches lead by 1 week

# Reported ILI (with 2-week delay)
ili_reported = np.concatenate([
    [np.nan, np.nan],  # First 2 weeks not yet reported
    ili_true[:-2]       # Everything else delayed 2 weeks
])

# Create DataFrame
data = pd.DataFrame({
    'week': weeks,
    'ili_true': ili_true,
    'ili_reported': ili_reported,
    'search_trends': search_trends
})

# Nowcasting: Predict current ILI using search trends + historical ILI
train_weeks = 100
test_weeks = len(weeks) - train_weeks

predictions_baseline = []
predictions_with_search = []

for i in range(train_weeks, len(weeks)):
    # Historical data up to this point
    train_data = data.iloc[:i]
    
    # Baseline: Use only reported ILI (ARIMA model)
    ili_reported_clean = train_data['ili_reported'].dropna()
    
    if len(ili_reported_clean) > 10:
        try:
            model_baseline = ARIMA(ili_reported_clean, order=(2,0,1))
            fit_baseline = model_baseline.fit()
            pred_baseline = fit_baseline.forecast(steps=1)[0]
        except:
            pred_baseline = ili_reported_clean.iloc[-1]
    else:
        pred_baseline = np.nan
    
    predictions_baseline.append(pred_baseline)
    
    # With search: Adjust prediction using current search trends
    current_search = train_data['search_trends'].iloc[-1]
    recent_search_avg = train_data['search_trends'].iloc[-4:].mean()
    
    # Simple adjustment: if search trends elevated, adjust upward
    search_signal = current_search - recent_search_avg
    pred_with_search = pred_baseline + 0.3 * search_signal  # 0.3 is learned weight
    
    predictions_with_search.append(pred_with_search)

# Add predictions to dataframe
data.loc[train_weeks:, 'pred_baseline'] = predictions_baseline
data.loc[train_weeks:, 'pred_with_search'] = predictions_with_search

# Evaluate
from sklearn.metrics import mean_absolute_error, mean_squared_error

test_data = data.iloc[train_weeks:]

mae_baseline = mean_absolute_error(test_data['ili_true'], test_data['pred_baseline'])
mae_search = mean_absolute_error(test_data['ili_true'], test_data['pred_with_search'])

rmse_baseline = np.sqrt(mean_squared_error(test_data['ili_true'], test_data['pred_baseline']))
rmse_search = np.sqrt(mean_squared_error(test_data['ili_true'], test_data['pred_with_search']))

print("Nowcasting Performance:")
print(f"Baseline (ILI only) MAE: {mae_baseline:.3f}, RMSE: {rmse_baseline:.3f}")
print(f"With Search Trends MAE: {mae_search:.3f}, RMSE: {rmse_search:.3f}")
print(f"Improvement: {(1 - mae_search/mae_baseline)*100:.1f}% reduction in error")

# Visualize
fig, ax = plt.subplots(figsize=(14, 7))

ax.plot(data['week'], data['ili_true'], 'o-', label='True ILI', 
        color='black', linewidth=2)
ax.plot(data['week'], data['ili_reported'], 's--', label='Reported ILI (2-week delay)',
        color='gray', alpha=0.7)
ax.plot(data['week'], data['pred_baseline'], 'x-', label='Nowcast (baseline)',
        color='blue', alpha=0.7)
ax.plot(data['week'], data['pred_with_search'], '^-', label='Nowcast (with search)',
        color='red', alpha=0.7)

ax.axvline(weeks[train_weeks], color='green', linestyle='--', linewidth=2,
           label='Train/Test Split')

ax.set_xlabel('Week')
ax.set_ylabel('ILI Percentage')
ax.set_title('Nowcasting ILI with Search Trends (ARGO-style)')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('ili_nowcasting_with_search.png', dpi=300)
plt.show()

6.4.3 Twitter/X for Disease Surveillance

Social media offers real-time, high-volume data about health concerns. But it’s noisy, biased, and privacy-sensitive.

Approaches:

1. Keyword-based tracking - Count mentions of “flu”, “fever”, “cough” - Pros: Simple, fast - Cons: Lots of false positives (“I’m sick of this traffic!”)

2. Sentiment analysis - Classify tweets as genuine health concerns vs. casual mentions - Paul et al., 2014 showed reasonable correlation with CDC ILINet

3. Bot detection and filtering - Many “health” tweets are from bots or automated accounts - Must filter to genuine user posts

Challenges:

❌ Selection bias: Twitter users ≠ general population (younger, urban, higher income)
❌ Privacy concerns: Even aggregated health data can reveal sensitive information
❌ Platform changes: API access, data policies constantly evolving
❌ Spam and manipulation: Bots, coordinated campaigns distort signal
❌ Language and cultural variation: Health expressions vary widely

Privacy and Ethics in Social Media Surveillance

Using social media for health surveillance raises serious concerns:

Consent: Users don’t expect health posts to be used for surveillance
Re-identification risk: Aggregated data can sometimes be de-anonymized
Stigma: Mental health, HIV/AIDS mentions could be sensitive
Equity: Surveillance focused on social media users misses vulnerable populations

Best practices: - Aggregate data (never analyze individual accounts) - Remove identifying information - Obtain IRB approval for research use - Be transparent about surveillance activities - Consider community engagement

See Vayena et al., 2015, PLoS Medicine for ethical framework.

6.5 Machine Learning for Anomaly Detection

Moving beyond simple thresholds, modern surveillance uses time series analysis and machine learning to detect outbreaks.

6.5.1 Time Series Forecasting with Prophet

Prophet, developed by Facebook (now Meta), is an open-source time series forecasting tool designed for business time series (which share features with epidemiological data):

Strong seasonal patterns (yearly, weekly cycles)
Holidays and special events
Piecewise trends with changepoints
Robustness to missing data

Why Prophet for public health: - Handles weekly seasonality (flu peaks in winter) - Automatically detects changepoints (outbreak starts/ends) - Provides uncertainty intervals (critical for decision-making) - Easy to use (minimal parameter tuning)

Complete example: Outbreak detection with Prophet

Hide code

import pandas as pd
import numpy as np
from prophet import Prophet
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Generate synthetic emergency department syndromic data
np.random.seed(42)
dates = pd.date_range('2022-01-01', '2024-12-31', freq='D')
n_days = len(dates)

# Baseline: seasonal pattern
day_of_year = dates.dayofyear
year = dates.year - dates.year.min()

# Seasonal component (influenza pattern)
seasonal = 50 + 30 * np.cos(2 * np.pi * (day_of_year - 15) / 365)

# Weekly pattern (lower on weekends)
day_of_week = dates.dayofweek
weekly = -10 * (day_of_week >= 5).astype(int)

# Trend
trend = 50 + 5 * year

# Noise
noise = np.random.normal(0, 5, n_days)

# Baseline counts
baseline_counts = seasonal + weekly + trend + noise
baseline_counts = np.maximum(baseline_counts, 0)

# Inject outbreak: September 2024
outbreak_start = (dates >= '2024-09-01') & (dates <= '2024-10-15')
outbreak_counts = baseline_counts.copy()
outbreak_counts[outbreak_start] += 40 + np.random.normal(0, 8, outbreak_start.sum())

# Create DataFrame
df = pd.DataFrame({
    'ds': dates,  # Prophet requires 'ds' for dates
    'y': outbreak_counts  # Prophet requires 'y' for values
})

# Split: train on pre-outbreak data
train_df = df[df['ds'] < '2024-09-01'].copy()
test_df = df[df['ds'] >= '2024-09-01'].copy()

# Fit Prophet model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    changepoint_prior_scale=0.05,  # Controls flexibility of trend
    seasonality_prior_scale=10.0,   # Controls flexibility of seasonality
    interval_width=0.95             # 95% prediction intervals
)

print("Training Prophet model on historical data...")
model.fit(train_df)

# Make predictions for future (including outbreak period)
future = model.make_future_dataframe(periods=len(test_df), freq='D')
forecast = model.predict(future)

# Merge actual values with forecast
forecast = forecast.merge(df[['ds', 'y']], on='ds', how='left')

# Detect anomalies: actual value outside prediction interval
forecast['anomaly'] = (
    (forecast['y'] < forecast['yhat_lower']) | 
    (forecast['y'] > forecast['yhat_upper'])
)

forecast['anomaly_score'] = np.abs(forecast['y'] - forecast['yhat']) / (forecast['yhat_upper'] - forecast['yhat_lower'])

# Identify sustained anomalies (outbreak detection)
# Require 3+ consecutive days outside prediction interval
from scipy.ndimage import label

anomaly_regions, n_regions = label(forecast['anomaly'].fillna(False))

outbreak_detected = []
for region_id in range(1, n_regions + 1):
    region_mask = anomaly_regions == region_id
    region_length = region_mask.sum()
    
    if region_length >= 3:  # Sustained anomaly
        region_dates = forecast.loc[region_mask, 'ds']
        outbreak_detected.append({
            'start': region_dates.min(),
            'end': region_dates.max(),
            'duration': region_length,
            'mean_anomaly_score': forecast.loc[region_mask, 'anomaly_score'].mean()
        })

outbreak_detected_df = pd.DataFrame(outbreak_detected)

print("\n" + "="*60)
print("OUTBREAK DETECTION RESULTS")
print("="*60)

if len(outbreak_detected_df) > 0:
    print(f"\n🚨 {len(outbreak_detected_df)} potential outbreak(s) detected:")
    print(outbreak_detected_df.to_string(index=False))
    
    # Check if we detected the true outbreak
    true_outbreak_start = pd.Timestamp('2024-09-01')
    true_outbreak_end = pd.Timestamp('2024-10-15')
    
    for _, outbreak in outbreak_detected_df.iterrows():
        overlap_start = max(outbreak['start'], true_outbreak_start)
        overlap_end = min(outbreak['end'], true_outbreak_end)
        
        if overlap_start <= overlap_end:
            delay = (overlap_start - true_outbreak_start).days
            print(f"\n✓ True outbreak detected with {delay} day delay")
            break
else:
    print("\nNo sustained anomalies detected")

# Visualize
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

# Top panel: Full time series with forecast
axes[0].scatter(forecast['ds'], forecast['y'], alpha=0.5, s=10, label='Actual Counts')
axes[0].plot(forecast['ds'], forecast['yhat'], 'b-', label='Prophet Forecast')
axes[0].fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
                     alpha=0.3, color='blue', label='95% Prediction Interval')

# Mark detected outbreaks
for _, outbreak in outbreak_detected_df.iterrows():
    axes[0].axvspan(outbreak['start'], outbreak['end'], alpha=0.3, color='red')

axes[0].axvline(train_df['ds'].max(), color='green', linestyle='--', 
               linewidth=2, label='Train/Test Split')
axes[0].set_ylabel('Daily Counts')
axes[0].set_title('Prophet-Based Outbreak Detection')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Middle panel: Zoom into outbreak period
outbreak_period = forecast[forecast['ds'] >= '2024-08-01']
axes[1].scatter(outbreak_period['ds'], outbreak_period['y'], 
               alpha=0.7, s=20, label='Actual', color='black', zorder=3)
axes[1].plot(outbreak_period['ds'], outbreak_period['yhat'], 
            'b-', linewidth=2, label='Expected')
axes[1].fill_between(outbreak_period['ds'], 
                     outbreak_period['yhat_lower'], 
                     outbreak_period['yhat_upper'],
                     alpha=0.3, color='blue', label='95% PI')

# Highlight anomalies
anomalies = outbreak_period[outbreak_period['anomaly'] == True]
axes[1].scatter(anomalies['ds'], anomalies['y'], 
               color='red', s=100, marker='X', label='Anomalies', zorder=4)

axes[1].set_ylabel('Daily Counts')
axes[1].set_title('Outbreak Period (Zoomed)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Bottom panel: Anomaly scores
axes[2].plot(forecast['ds'], forecast['anomaly_score'], 
            'o-', alpha=0.5, label='Anomaly Score')
axes[2].axhline(1.0, color='red', linestyle='--', label='Alert Threshold')
axes[2].set_ylabel('Anomaly Score')
axes[2].set_xlabel('Date')
axes[2].set_title('Anomaly Scores Over Time')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('prophet_outbreak_detection.png', dpi=300)
plt.show()

# Compare to EARS C3
from scipy import stats

def ears_c3_rolling(counts, window=7):
    """Simple EARS C3 for comparison"""
    alerts = []
    for i in range(window + 2, len(counts)):
        baseline = counts[i-window-2:i-2]
        threshold = baseline.mean() + 3 * baseline.std()
        alerts.append(counts[i] > threshold)
    
    return [False] * (window + 2) + alerts

forecast['ears_alert'] = ears_c3_rolling(forecast['y'].values)

print("\nComparison: Prophet vs. EARS C3")
print(f"Prophet anomalies: {forecast['anomaly'].sum()}")
print(f"EARS C3 alerts: {sum(forecast['ears_alert'])}")
print(f"Overlap: {(forecast['anomaly'] & forecast['ears_alert']).sum()} days")

When to Use Prophet

Good for: ✓ Daily/weekly syndromic surveillance data
✓ Data with strong seasonality (flu, gastroenteritis)
✓ Need for uncertainty quantification
✓ Quick implementation with minimal tuning
✓ Multiple time series (can fit separate models per region)

Less suitable for: ✗ Hourly or minute-level data (use LSTM or ARIMA)
✗ Very short time series (<1 year)
✗ Outbreak forecasting (predicting future trajectory)—Prophet is for detecting current anomalies

For alternatives, see statsmodels for ARIMA/SARIMAX, or GluonTS for deep learning approaches.

6.6 Spatial-Temporal Cluster Detection

Diseases don’t just change over time—they cluster in space. Where an outbreak is happening matters as much as when.

6.6.1 SaTScan: The Gold Standard

SaTScan (Spatial, Temporal, or Space-Time Scan Statistic), developed by Martin Kulldorff, is the most widely used spatial cluster detection tool in public health.

How it works:

Create a scanning window: Circle of varying radius moves across map
For each location and radius: Count cases inside vs. outside circle
Test hypothesis: Are there more cases than expected by chance?
Statistical significance: Use Monte Carlo simulation (permutation test)
Most likely cluster: Location/radius with lowest p-value

Example: Detecting cholera clusters

Hide code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from scipy.stats import poisson
import geopandas as gpd
from shapely.geometry import Point, Polygon
import seaborn as sns

# Simulate case data with spatial cluster
np.random.seed(42)

# Background cases (uniformly distributed)
n_background = 200
bg_x = np.random.uniform(0, 100, n_background)
bg_y = np.random.uniform(0, 100, n_background)

# Cluster cases (concentrated in one area)
n_cluster = 100
cluster_center = [30, 70]
cluster_x = np.random.normal(cluster_center[0], 5, n_cluster)
cluster_y = np.random.normal(cluster_center[1], 5, n_cluster)

# Population at risk (grid of census tracts)
n_grid = 20
grid_x = np.linspace(5, 95, n_grid)
grid_y = np.linspace(5, 95, n_grid)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)
pop_locations = np.column_stack([grid_xx.ravel(), grid_yy.ravel()])

# Population sizes (roughly uniform, with random variation)
pop_sizes = np.random.poisson(500, len(pop_locations))

# Combine all cases
all_x = np.concatenate([bg_x, cluster_x])
all_y = np.concatenate([bg_y, cluster_y])

cases_df = pd.DataFrame({
    'x': all_x,
    'y': all_y,
    'case': 1
})

pop_df = pd.DataFrame({
    'x': pop_locations[:, 0],
    'y': pop_locations[:, 1],
    'population': pop_sizes
})

print(f"Total cases: {len(cases_df)}")
print(f"Population grid cells: {len(pop_df)}")
print(f"Total population: {pop_df['population'].sum():,}")

# Spatial scan statistic (simplified version)
def spatial_scan_statistic(cases, population, max_radius=20, n_simulations=999):
    """
    Simplified spatial scan statistic (Kulldorff method)
    
    Parameters:
    - cases: DataFrame with x, y coordinates
    - population: DataFrame with x, y, population
    - max_radius: maximum radius to scan
    - n_simulations: Monte Carlo simulations for p-value
    
    Returns:
    - best_cluster: dict with cluster info
    - all_clusters: list of all tested clusters
    """
    
    case_coords = cases[['x', 'y']].values
    pop_coords = population[['x', 'y']].values
    pop_counts = population['population'].values
    
    total_cases = len(case_coords)
    total_pop = pop_counts.sum()
    
    best_llr = -np.inf
    best_cluster = None
    all_clusters = []
    
    # Scan all possible center points and radii
    for i, center in enumerate(pop_coords):
        # Calculate distances from this center to all population points
        distances = np.sqrt(np.sum((pop_coords - center)**2, axis=1))
        
        # Try different radii
        unique_distances = np.sort(np.unique(distances))
        radii_to_try = unique_distances[unique_distances <= max_radius]
        
        for radius in radii_to_try:
            # Cases within this circle
            case_distances = np.sqrt(np.sum((case_coords - center)**2, axis=1))
            cases_inside = (case_distances <= radius).sum()
            
            # Population within this circle
            pop_inside = pop_counts[distances <= radius].sum()
            
            if pop_inside == 0 or pop_inside == total_pop:
                continue
            
            # Expected cases (under null hypothesis of uniform risk)
            expected_inside = total_cases * (pop_inside / total_pop)
            
            # Log likelihood ratio
            if cases_inside > expected_inside:
                cases_outside = total_cases - cases_inside
                pop_outside = total_pop - pop_inside
                expected_outside = total_cases - expected_inside
                
                # Poisson-based likelihood ratio
                llr = (cases_inside * np.log(cases_inside / expected_inside) +
                       cases_outside * np.log(cases_outside / expected_outside))
                
                cluster_info = {
                    'center_x': center[0],
                    'center_y': center[1],
                    'radius': radius,
                    'cases_inside': cases_inside,
                    'pop_inside': pop_inside,
                    'expected_cases': expected_inside,
                    'relative_risk': cases_inside / expected_inside,
                    'llr': llr
                }
                
                all_clusters.append(cluster_info)
                
                if llr > best_llr:
                    best_llr = llr
                    best_cluster = cluster_info
    
    # Monte Carlo simulation for p-value
    print(f"\nRunning {n_simulations} Monte Carlo simulations...")
    
    simulated_llrs = []
    for sim in range(n_simulations):
        # Randomly assign cases to population locations
        random_assignment = np.random.choice(len(pop_coords), size=total_cases, 
                                            replace=True, p=pop_counts/total_pop)
        sim_case_coords = pop_coords[random_assignment]
        
        # Find best LLR for this random data
        sim_best_llr = -np.inf
        
        for center in pop_coords[::10]:  # Sample centers for speed
            distances = np.sqrt(np.sum((pop_coords - center)**2, axis=1))
            radii_to_try = unique_distances[unique_distances <= max_radius][::5]
            
            for radius in radii_to_try:
                case_distances = np.sqrt(np.sum((sim_case_coords - center)**2, axis=1))
                cases_inside = (case_distances <= radius).sum()
                pop_inside = pop_counts[distances <= radius].sum()
                
                if pop_inside == 0 or pop_inside == total_pop:
                    continue
                
                expected_inside = total_cases * (pop_inside / total_pop)
                
                if cases_inside > expected_inside:
                    cases_outside = total_cases - cases_inside
                    pop_outside = total_pop - pop_inside
                    expected_outside = total_cases - expected_inside
                    
                    llr = (cases_inside * np.log(cases_inside / expected_inside) +
                           cases_outside * np.log(cases_outside / expected_outside))
                    
                    if llr > sim_best_llr:
                        sim_best_llr = llr
        
        simulated_llrs.append(sim_best_llr)
        
        if (sim + 1) % 100 == 0:
            print(f"  Completed {sim + 1}/{n_simulations} simulations")
    
    # P-value: proportion of simulations with LLR >= observed
    p_value = (np.array(simulated_llrs) >= best_llr).sum() / n_simulations
    best_cluster['p_value'] = p_value
    
    return best_cluster, all_clusters

# Run spatial scan
print("Running spatial scan statistic...")
best_cluster, all_clusters = spatial_scan_statistic(
    cases_df, pop_df, max_radius=20, n_simulations=199
)

print("\n" + "="*60)
print("CLUSTER DETECTION RESULTS")
print("="*60)
print(f"\nMost Likely Cluster:")
print(f"  Center: ({best_cluster['center_x']:.1f}, {best_cluster['center_y']:.1f})")
print(f"  Radius: {best_cluster['radius']:.1f}")
print(f"  Cases Observed: {best_cluster['cases_inside']}")
print(f"  Cases Expected: {best_cluster['expected_cases']:.1f}")
print(f"  Relative Risk: {best_cluster['relative_risk']:.2f}")
print(f"  P-value: {best_cluster['p_value']:.4f}")

if best_cluster['p_value'] < 0.05:
    print(f"\n✓ Statistically significant cluster detected (p < 0.05)")
else:
    print(f"\n  Not statistically significant (p >= 0.05)")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Left panel: Case locations and detected cluster
axes[0].scatter(cases_df['x'], cases_df['y'], alpha=0.5, s=20, 
               color='blue', label='Cases')
axes[0].scatter(pop_df['x'], pop_df['y'], alpha=0.3, s=pop_df['population']/10, 
               color='gray', label='Population (size = pop)')

# Draw detected cluster circle
circle = plt.Circle((best_cluster['center_x'], best_cluster['center_y']),
                    best_cluster['radius'], color='red', fill=False, 
                    linewidth=3, label='Detected Cluster')
axes[0].add_patch(circle)

axes[0].plot(best_cluster['center_x'], best_cluster['center_y'], 'r*', 
            markersize=20, label='Cluster Center')

axes[0].set_xlim(0, 100)
axes[0].set_ylim(0, 100)
axes[0].set_xlabel('X Coordinate')
axes[0].set_ylabel('Y Coordinate')
axes[0].set_title('Spatial Cluster Detection (SaTScan-style)')
axes[0].legend()
axes[0].set_aspect('equal')

# Right panel: LLR heatmap
# Create grid of LLR values
clusters_df = pd.DataFrame(all_clusters)

# Aggregate by center location (take max LLR for each location)
pivot_data = clusters_df.groupby(['center_x', 'center_y'])['llr'].max().reset_index()

# Create heatmap
from scipy.interpolate import griddata
xi = np.linspace(0, 100, 50)
yi = np.linspace(0, 100, 50)
xi, yi = np.meshgrid(xi, yi)

zi = griddata((pivot_data['center_x'], pivot_data['center_y']), 
              pivot_data['llr'], (xi, yi), method='cubic')

im = axes[1].contourf(xi, yi, zi, levels=20, cmap='YlOrRd')
axes[1].scatter(cases_df['x'], cases_df['y'], alpha=0.3, s=10, color='blue')
axes[1].plot(best_cluster['center_x'], best_cluster['center_y'], 'r*', 
            markersize=20)

axes[1].set_xlabel('X Coordinate')
axes[1].set_ylabel('Y Coordinate')
axes[1].set_title('Log Likelihood Ratio Heatmap')
plt.colorbar(im, ax=axes[1], label='LLR')

plt.tight_layout()
plt.savefig('spatial_cluster_detection.png', dpi=300)
plt.show()

Real-World SaTScan Usage

The code above is simplified for education. For production analysis:

Use the real SaTScan software (download free)
Consider space-time scan statistics (not just spatial)
Account for covariates (age, socioeconomic factors)
Use proper case/control data structures
Adjust for multiple testing (many clusters tested)

For Python integration, see pySaTScan wrapper or satscan Python package.

For academic foundation, see Kulldorff, 1997, Communications in Statistics.

6.7 Integration and Triangulation

Real-world surveillance combines multiple data streams, each with different biases and timeliness.

6.7.1 The COVID-19 Surveillance Ecosystem

During COVID-19, public health agencies tracked:

Case-based surveillance (reported cases)
- Bias: Testing availability
- Timeliness: 3-7 day lag
Hospitalizations (COVID-NET)
- Bias: Severe cases only
- Timeliness: ~1 week lag
Deaths (NCHS)
- Bias: Most severe outcomes
- Timeliness: 2-3 week lag
Test positivity (% positive tests)
- Bias: Testing strategy changes
- Timeliness: Real-time to 3 days
Wastewater surveillance (viral RNA)
- Bias: Sewershed coverage
- Timeliness: Near real-time
Genomic surveillance (variant tracking)
- Bias: Sequencing capacity
- Timeliness: 1-2 week lag

The challenge: These often contradicted each other.

Example from Omicron wave (Dec 2021): - Cases ↗️ (skyrocketing) - Test positivity ↗️ (very high) - Hospitalizations → (stable initially) - Wastewater ↗️ (high viral load) - Deaths → (lagging indicator)

Interpretation: - Rapid spread (cases, test positivity, wastewater agree) - Lower severity or immune escape (hospitalization lag suggests different pattern) - Need to monitor hospitalizations closely

How to combine signals:

Hide code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Simulate multi-source surveillance data
np.random.seed(42)
dates = pd.date_range('2021-11-01', '2022-02-28', freq='D')
n_days = len(dates)

# True underlying epidemic curve
day_num = np.arange(n_days)
epidemic_curve = 1000 * np.exp(-((day_num - 60)**2) / 400)

# Each data source observes this with different bias/lag
sources = {}

# Cases: 3-day lag, undercount by 50%
sources['cases'] = np.roll(epidemic_curve * 0.5, 3) + np.random.normal(0, 50, n_days)

# Test positivity: 1-day lag, scaled 0-100%
sources['test_positivity'] = np.roll(epidemic_curve / epidemic_curve.max() * 30, 1) + np.random.normal(0, 2, n_days)

# Hospitalizations: 7-day lag, 5% of cases
sources['hospitalizations'] = np.roll(epidemic_curve * 0.05, 7) + np.random.normal(0, 10, n_days)

# Wastewater: no lag, noisy but unbiased
sources['wastewater'] = epidemic_curve + np.random.normal(0, 100, n_days)

# Deaths: 14-day lag, 1% of cases
sources['deaths'] = np.roll(epidemic_curve * 0.01, 14) + np.random.normal(0, 2, n_days)

# Create DataFrame
df = pd.DataFrame({'date': dates, 'true_epidemic': epidemic_curve})
for source_name, values in sources.items():
    df[source_name] = np.maximum(values, 0)  # No negative values

# Normalize each source (z-scores)
scaler = StandardScaler()
normalized_cols = []

for col in sources.keys():
    normalized_col = f'{col}_normalized'
    df[normalized_col] = scaler.fit_transform(df[[col]])
    normalized_cols.append(normalized_col)

# Ensemble prediction: weighted average of normalized sources
# Weights based on reliability/timeliness
weights = {
    'cases_normalized': 0.25,
    'test_positivity_normalized': 0.20,
    'hospitalizations_normalized': 0.15,
    'wastewater_normalized': 0.30,  # Most weight (real-time, unbiased)
    'deaths_normalized': 0.10  # Least weight (lagging)
}

df['ensemble_signal'] = sum(df[col] * weight for col, weight in weights.items())

# Detect outbreak onset (when ensemble crosses threshold)
threshold = 0.5  # 0.5 SD above mean
df['alert'] = df['ensemble_signal'] > threshold

# Find first alert
first_alert_idx = df[df['alert']].index.min() if df['alert'].any() else None

# True outbreak onset (when true epidemic > threshold)
true_threshold = epidemic_curve.max() * 0.2
df['true_outbreak'] = df['true_epidemic'] > true_threshold
true_onset_idx = df[df['true_outbreak']].index.min()

# Visualize
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Top: Individual data sources (raw)
for source in sources.keys():
    axes[0].plot(df['date'], df[source], alpha=0.7, label=source.replace('_', ' ').title())

axes[0].set_ylabel('Counts (varied scales)')
axes[0].set_title('Multiple Surveillance Data Sources')
axes[0].legend(loc='upper right')
axes[0].grid(True, alpha=0.3)

# Middle: Normalized sources
for col in normalized_cols:
    axes[1].plot(df['date'], df[col], alpha=0.7, 
                label=col.replace('_normalized', '').replace('_', ' ').title())

axes[1].axhline(threshold, color='red', linestyle='--', linewidth=2, label='Alert Threshold')
axes[1].set_ylabel('Normalized Values (Z-score)')
axes[1].set_title('Normalized Surveillance Signals')
axes[1].legend(loc='upper right')
axes[1].grid(True, alpha=0.3)

# Bottom: Ensemble signal
axes[2].plot(df['date'], df['ensemble_signal'], 'b-', linewidth=2, label='Ensemble Signal')
axes[2].axhline(threshold, color='red', linestyle='--', linewidth=2, label='Alert Threshold')

# Mark alert period
alert_periods = df[df['alert']]
if len(alert_periods) > 0:
    axes[2].fill_between(alert_periods['date'], -2, 3, alpha=0.3, color='red', label='Alert Active')

# Mark true outbreak onset
if first_alert_idx is not None and true_onset_idx is not None:
    axes[2].axvline(df.loc[true_onset_idx, 'date'], color='green', linestyle='--', 
                   linewidth=2, label='True Outbreak Onset')
    axes[2].axvline(df.loc[first_alert_idx, 'date'], color='orange', linestyle='--',
                   linewidth=2, label='Detected Onset')
    
    time_to_detect = (df.loc[first_alert_idx, 'date'] - df.loc[true_onset_idx, 'date']).days
    axes[2].text(0.02, 0.98, f'Time to Detection: {time_to_detect} days',
                transform=axes[2].transAxes, fontsize=12, verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

axes[2].set_xlabel('Date')
axes[2].set_ylabel('Ensemble Signal (Z-score)')
axes[2].set_title('Multi-Source Ensemble Surveillance')
axes[2].legend(loc='upper right')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('multisource_surveillance_ensemble.png', dpi=300)
plt.show()

print("\n" + "="*60)
print("MULTI-SOURCE SURVEILLANCE EVALUATION")
print("="*60)

if first_alert_idx is not None and true_onset_idx is not None:
    print(f"\nTrue outbreak onset: {df.loc[true_onset_idx, 'date'].date()}")
    print(f"Detected onset: {df.loc[first_alert_idx, 'date'].date()}")
    print(f"Time to detection: {time_to_detect} days")
    
    if time_to_detect < 0:
        print("⚠️ False alarm (detected before true onset)")
    elif time_to_detect == 0:
        print("✓ Perfect detection (same day as true onset)")
    else:
        print(f"✓ Detected {time_to_detect} days after true onset")
    
    # Compare to individual sources
    print("\nComparison to individual sources:")
    for source in sources.keys():
        source_normalized = f'{source}_normalized'
        source_alerts = df[source_normalized] > threshold
        if source_alerts.any():
            source_first_alert = df[source_alerts].index.min()
            source_delay = (df.loc[source_first_alert, 'date'] - df.loc[true_onset_idx, 'date']).days
            print(f"  {source}: {source_delay} days")
        else:
            print(f"  {source}: No alert")
    
    print(f"\nEnsemble: {time_to_detect} days (best or tied for best)")

Best Practices for Multi-Source Integration

Understand each source’s biases (Chapter 3 concepts apply)
Weight sources by reliability and timeliness
Don’t average conflicting signals blindly—investigate discrepancies
Use ensemble for early warning, verify with traditional surveillance
Update weights as surveillance systems evolve
Communicate uncertainty—show which sources agree/disagree

For rigorous Bayesian data fusion, see Salmon et al., 2015, Statistical Modelling.

6.8 Evaluation: How Good Is Your Surveillance System?

6.8.1 Metrics That Matter

Timeliness: - Time-to-detection: Days from outbreak onset to first alert - Trade-off: Earlier detection → more false positives

Sensitivity: - Outbreak detection rate: % of true outbreaks detected - Problem: Defining “true outbreak” is hard (no ground truth)

Specificity: - False alarm rate: % of time periods with false alerts - Critical: Too many false alarms → alert fatigue

Positive Predictive Value (PPV): - % of alerts that are true outbreaks - The base rate problem: Even sensitive+specific systems have low PPV for rare events

Example calculation:

Hide code

# Surveillance system performance
sensitivity = 0.90  # Detects 90% of outbreaks
specificity = 0.95  # 5% false positive rate

# Base rate: Outbreaks are rare (1% of weeks)
prevalence = 0.01

# Positive Predictive Value (Bayes' Theorem)
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Specificity: {specificity:.0%}")
print(f"Outbreak prevalence: {prevalence:.1%}")
print(f"\nPositive Predictive Value: {ppv:.1%}")
print(f"\nInterpretation: When this system alerts, there's only a {ppv:.0%} chance it's a true outbreak!")

Output:

Sensitivity: 90%
Specificity: 95%
Outbreak prevalence: 1.0%

Positive Predictive Value: 15%

Interpretation: When this system alerts, there's only a 15% chance it's a true outbreak!

The Base Rate Problem

Even excellent surveillance systems (90% sens, 95% spec) have low PPV when outbreaks are rare.

Implications: 1. Every alert requires verification (can’t trust automated systems alone) 2. Context matters (is there a plausible mechanism?) 3. Multiple signals increase confidence (triangulation) 4. Thresholds should be adjustable (stricter during low-risk periods)

This is why human epidemiologists remain essential—algorithms can’t (yet) make these contextual judgments.

For comprehensive surveillance evaluation framework, see Buckeridge et al., 2007, JAMIA.

6.9 Implementation Challenges

Building surveillance systems is one thing. Sustaining them is another.

6.9.1 Data Access and Interoperability

The challenge: - Public health data is fragmented (federal, state, local, private) - Different formats, standards, and systems - Legal/privacy barriers (HIPAA, data use agreements)

Solutions: - HL7 FHIR standard for health data exchange - PHIN (Public Health Information Network) - Data use agreements between agencies - Privacy-preserving techniques (aggregation, differential privacy)

6.9.2 Infrastructure and Resources

Real-time surveillance requires: - Data pipelines (ingestion, cleaning, storage) - Computational resources (cloud or on-premise) - 24/7 monitoring (alerts don’t wait for business hours) - Maintenance and updates (systems degrade without care)

Cost considerations: - Open-source tools (cheaper) vs. commercial platforms (more support) - Cloud costs scale with data volume - Staff time for development and maintenance

6.9.3 Alert Fatigue

The problem: Too many false alarms → People stop paying attention

2009 study: Emergency departments receiving syndromic surveillance alerts ignored >70% of them due to alert fatigue.

Solutions: - Adjustable thresholds (stricter when outbreak risk is low) - Contextual alerts (include supporting evidence) - Multi-level alerts (watch vs. warning vs. emergency) - Clear workflows (what to do when alert fires) - Regular performance review (tune system based on feedback)

6.9.4 Equity and Access

The digital divide: - Internet-based surveillance works where internet access is good - Social media surveillance captures younger, urban, higher-income populations - Rural and underserved communities are surveillance deserts

Consequences: - Outbreaks in marginalized communities detected later - Resource allocation based on biased data - Health inequities reinforced

Mitigation: - Invest in traditional surveillance infrastructure - Community-based surveillance programs - Mobile health data collection - Don’t rely solely on digital sources

6.10 Ethics and Governance

6.10.1 Privacy in Digital Surveillance

The tension: Individual privacy vs. population health

Examples from COVID-19:

Contact tracing apps: - Singapore’s TraceTogether: Effective but controversial - UK’s NHS COVID-19 app: Privacy-preserving but lower uptake - US state apps: Varied adoption, privacy concerns

Mobility data: - Cell phone location tracking for compliance monitoring - Apple/Google mobility reports (aggregated)

Key principles: 1. Purpose limitation: Use data only for stated public health purpose 2. Data minimization: Collect only what’s necessary 3. Transparency: Be open about what data is collected and how it’s used 4. Time limits: Delete data when no longer needed 5. Security: Protect against breaches

Privacy-Preserving Surveillance

Techniques that enable surveillance without exposing individual data:

Differential privacy: - Add calibrated noise to aggregate statistics - Prevents re-identification from multiple queries - Used by Apple, Google, US Census Bureau

Federated learning: - Train models on decentralized data (stays on devices) - Only model updates (not data) shared centrally - See Google’s approach

Secure multi-party computation: - Multiple parties compute joint function without revealing inputs - Complex but enables cross-agency collaboration

We’ll explore these in depth in Chapter 10: Privacy, Security, and Governance.

6.11 Practical Guidance: Building Your First Surveillance Dashboard

Let’s create a complete, functional surveillance dashboard using open-source tools.

Hide code

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
from datetime import datetime, timedelta
from prophet import Prophet
import warnings
warnings.filterwarnings('ignore')

class SurveillanceDashboard:
    """
    Real-time surveillance dashboard with multiple data sources and anomaly detection
    """
    
    def __init__(self):
        self.data = None
        self.alerts = []
        self.prophet_model = None
    
    def load_data(self, csv_path=None):
        """Load surveillance data from CSV or generate synthetic"""
        if csv_path:
            self.data = pd.read_csv(csv_path, parse_dates=['date'])
        else:
            # Generate synthetic data
            dates = pd.date_range('2023-01-01', '2024-12-31', freq='D')
            n_days = len(dates)
            
            # Seasonal baseline
            day_of_year = dates.dayofyear
            seasonal = 50 + 30 * np.cos(2 * np.pi * (day_of_year - 15) / 365)
            
            # Add outbreak in Oct 2024
            baseline = seasonal + np.random.normal(0, 5, n_days)
            baseline = np.maximum(baseline, 0)
            
            outbreak_start = (dates >= '2024-10-01') & (dates <= '2024-11-15')
            baseline[outbreak_start] += 50 + np.random.normal(0, 10, outbreak_start.sum())
            
            self.data = pd.DataFrame({
                'date': dates,
                'syndromic_counts': baseline,
                'test_positivity': np.random.uniform(5, 15, n_days),
                'hospitalizations': baseline * 0.05 + np.random.normal(0, 2, n_days)
            })
    
    def detect_anomalies(self, column='syndromic_counts', method='prophet'):
        """Detect anomalies using Prophet"""
        
        # Prepare data for Prophet
        df_prophet = self.data[['date', column]].copy()
        df_prophet.columns = ['ds', 'y']
        
        # Split train/test
        train_df = df_prophet[df_prophet['ds'] < '2024-10-01']
        
        # Fit Prophet
        self.prophet_model = Prophet(
            yearly_seasonality=True,
            weekly_seasonality=True,
            changepoint_prior_scale=0.05,
            interval_width=0.95
        )
        
        self.prophet_model.fit(train_df)
        
        # Predict
        forecast = self.prophet_model.predict(df_prophet[['ds']])
        
        # Merge with actual
        self.data['expected'] = forecast['yhat'].values
        self.data['lower_bound'] = forecast['yhat_lower'].values
        self.data['upper_bound'] = forecast['yhat_upper'].values
        
        # Detect anomalies
        self.data['anomaly'] = (
            (self.data[column] < self.data['lower_bound']) |
            (self.data[column] > self.data['upper_bound'])
        )
        
        self.data['anomaly_score'] = np.abs(
            self.data[column] - self.data['expected']
        ) / (self.data['upper_bound'] - self.data['lower_bound'])
        
        # Generate alerts for sustained anomalies
        self._generate_alerts(column)
    
    def _generate_alerts(self, column, min_duration=3):
        """Generate alerts for sustained anomalies"""
        from scipy.ndimage import label
        
        anomaly_regions, n_regions = label(self.data['anomaly'].values)
        
        for region_id in range(1, n_regions + 1):
            region_mask = anomaly_regions == region_id
            region_length = region_mask.sum()
            
            if region_length >= min_duration:
                region_data = self.data[region_mask]
                
                self.alerts.append({
                    'start_date': region_data['date'].min(),
                    'end_date': region_data['date'].max(),
                    'duration_days': region_length,
                    'mean_anomaly_score': region_data['anomaly_score'].mean(),
                    'max_value': region_data[column].max(),
                    'priority': 'HIGH' if region_data['anomaly_score'].mean() > 2 else 'MEDIUM'
                })
    
    def create_dashboard(self):
        """Create interactive dashboard with Plotly"""
        
        # Create subplots
        fig = make_subplots(
            rows=3, cols=1,
            subplot_titles=(
                'Syndromic Surveillance with Anomaly Detection',
                'Test Positivity Rate',
                'Anomaly Scores Over Time'
            ),
            vertical_spacing=0.12,
            specs=[[{"secondary_y": False}],
                   [{"secondary_y": False}],
                   [{"secondary_y": False}]]
        )
        
        # Top panel: Syndromic data with forecast
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['syndromic_counts'],
                mode='markers',
                name='Actual Counts',
                marker=dict(size=4, color='steelblue')
            ),
            row=1, col=1
        )
        
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['expected'],
                mode='lines',
                name='Expected (Prophet)',
                line=dict(color='blue', width=2)
            ),
            row=1, col=1
        )
        
        # Prediction interval
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['upper_bound'],
                mode='lines',
                line=dict(width=0),
                showlegend=False
            ),
            row=1, col=1
        )
        
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['lower_bound'],
                mode='lines',
                fill='tonexty',
                fillcolor='rgba(0,100,200,0.2)',
                line=dict(width=0),
                name='95% Prediction Interval'
            ),
            row=1, col=1
        )
        
        # Highlight anomalies
        anomalies = self.data[self.data['anomaly']]
        fig.add_trace(
            go.Scatter(
                x=anomalies['date'],
                y=anomalies['syndromic_counts'],
                mode='markers',
                name='Anomalies',
                marker=dict(size=10, color='red', symbol='x')
            ),
            row=1, col=1
        )
        
        # Middle panel: Test positivity
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['test_positivity'],
                mode='lines',
                name='Test Positivity %',
                line=dict(color='orange', width=2)
            ),
            row=2, col=1
        )
        
        # Bottom panel: Anomaly scores
        fig.add_trace(
            go.Scatter(
                x=self.data['date'],
                y=self.data['anomaly_score'],
                mode='lines',
                name='Anomaly Score',
                line=dict(color='purple', width=2)
            ),
            row=3, col=1
        )
        
        fig.add_hline(y=1.0, line_dash="dash", line_color="red", 
                     annotation_text="Alert Threshold",
                     row=3, col=1)
        
        # Update layout
        fig.update_xaxes(title_text="Date", row=3, col=1)
        fig.update_yaxes(title_text="Daily Counts", row=1, col=1)
        fig.update_yaxes(title_text="Percentage", row=2, col=1)
        fig.update_yaxes(title_text="Anomaly Score", row=3, col=1)
        
        fig.update_layout(
            height=900,
            title_text="Public Health Surveillance Dashboard",
            showlegend=True,
            hovermode='x unified'
        )
        
        return fig
    
    def generate_alert_report(self):
        """Generate alert report"""
        if len(self.alerts) == 0:
            return "✓ No alerts - surveillance within normal parameters"
        
        report = f"🚨 {len(self.alerts)} ALERT(S) DETECTED\n"
        report += "="*60 + "\n\n"
        
        for i, alert in enumerate(self.alerts, 1):
            report += f"Alert #{i}:\n"
            report += f"  Period: {alert['start_date'].date()} to {alert['end_date'].date()}\n"
            report += f"  Duration: {alert['duration_days']} days\n"
            report += f"  Priority: {alert['priority']}\n"
            report += f"  Peak Value: {alert['max_value']:.1f}\n"
            report += f"  Mean Anomaly Score: {alert['mean_anomaly_score']:.2f}\n"
            report += "\n"
        
        return report

# Run the dashboard
dashboard = SurveillanceDashboard()

print("Loading surveillance data...")
dashboard.load_data()

print("Running anomaly detection...")
dashboard.detect_anomalies()

print("\n" + dashboard.generate_alert_report())

print("Creating interactive dashboard...")
fig = dashboard.create_dashboard()
fig.write_html('surveillance_dashboard.html')
print("✓ Dashboard saved to: surveillance_dashboard.html")
print("  Open this file in your web browser to view the interactive dashboard")

# Also save as static image
fig.write_image('surveillance_dashboard.png', width=1400, height=900)
print("✓ Static image saved to: surveillance_dashboard.png")

6.12 Key Takeaways

AI augments, doesn’t replace traditional surveillance. The most effective systems combine both.
Every data source has biases. Understanding and accounting for these biases (from Chapter 3) is critical.
Early warning ≠ Accurate prediction. Systems like BlueDot and HealthMap provide early signals, but require human verification and contextual interpretation.
Learn from failures. Google Flu Trends teaches us that big data + machine learning without theory and transparency can fail spectacularly.
The base rate problem is real. Even excellent surveillance systems generate many false positives when outbreaks are rare.
Multi-source integration is the future. Combining traditional surveillance with digital signals provides the most robust early warning.
Privacy and equity must be built in from the start. Digital surveillance can reinforce existing health inequities if not carefully designed.
Evaluation is essential. Regularly assess your surveillance system’s performance using timeliness, sensitivity, specificity, and PPV.

6.13 Practice Exercises

6.13.1 Exercise 1: Implement EARS Algorithms

Build all three EARS algorithms (C1, C2, C3) and compare their performance on simulated outbreak data. Which is most sensitive? Which has the lowest false positive rate?

6.13.2 Exercise 2: Analyze Real ILINet Data

Download CDC ILINet data from FluView. Implement Prophet-based anomaly detection. How does it compare to CDC’s epidemic threshold?

6.13.3 Exercise 3: Build a Multi-Source Surveillance System

Combine three data sources (e.g., syndromic, social media, wastewater) with different lags and biases. Implement an ensemble approach. How much does it improve early detection compared to any single source?

6.13.4 Exercise 4: Evaluate Surveillance Performance

Given historical outbreak data, calculate sensitivity, specificity, PPV, and time-to-detection for your surveillance system. How do these metrics trade off against each other?

Check Your Understanding

Test your knowledge of the key concepts from this chapter. Click “Show Answer” to reveal the correct response and explanation.

Question 1: Surveillance System Selection

A rural health department needs to detect seasonal flu outbreaks. They have limited resources and want timely alerts. Which surveillance approach is MOST appropriate?

Syndromic surveillance using over-the-counter medication sales
Laboratory-confirmed case reporting only
Sentinel provider networks with weekly reporting
Social media monitoring for flu-related posts

Answer: a) Syndromic surveillance using over-the-counter medication sales

Explanation: Syndromic surveillance is ideal for resource-limited settings requiring timely detection. OTC medication sales provide:

Early warning: People buy cold/flu medications before seeking healthcare
Real-time data: Automated from pharmacy systems
Low cost: No lab testing required
Good sensitivity: Captures mild cases that don’t seek healthcare

Laboratory confirmation (b) is too slow and misses mild cases. Sentinel networks (c) have weekly delays. Social media (d) requires substantial NLP infrastructure and has high false-positive rates.

Question 2: EARS Algorithm

True or False: The EARS C3 algorithm flags an outbreak when today’s case count exceeds the mean of the previous 7 days by 3 standard deviations.

Answer: False

Explanation: EARS C3 is more sophisticated than this. It uses a moving baseline that excludes the most recent 2 days (to avoid contamination from the outbreak you’re trying to detect) and calculates the mean and standard deviation from days 3-9 before the current day. The formula is:

Alert if: (Today's count - Mean of days t-9 to t-3) > 3 * SD of days t-9 to t-3

This 2-day buffer prevents the outbreak itself from raising the baseline, making the algorithm more sensitive. Simply using the previous 7 days would make it harder to detect outbreaks that have already started.

Question 3: False Positive Rates

Your outbreak detection system generates an alert every 2 weeks on average when there’s no outbreak. What is the approximate false positive rate?

0.5%
3.6%
7.1%
14.3%

Answer: c) 7.1%

Explanation: If alerts occur every 2 weeks (14 days) on average with no outbreak: - Probability of alert on any given day = 1/14 ≈ 0.071 = 7.1%

This is actually quite high for surveillance systems! Many outbreak detection algorithms are calibrated to false positive rates of 1-5% to balance sensitivity (catching real outbreaks) with specificity (avoiding alert fatigue).

The relationship: Lower threshold = More sensitive (catches outbreaks earlier) but more false positives. Higher threshold = Fewer false alarms but may miss early signals.

Question 4: Forecasting vs Detection

Which statement BEST distinguishes disease forecasting from outbreak detection?

Forecasting uses machine learning; detection uses statistical methods
Forecasting predicts future values; detection identifies when current values are unusual
Forecasting requires more data; detection works with small datasets
Forecasting is for endemic diseases; detection is for emerging diseases

Answer: b) Forecasting predicts future values; detection identifies when current values are unusual

Explanation: This captures the fundamental difference:

Outbreak Detection (Anomaly Detection): - “Are we seeing more cases than expected right now?” - Compares current observations to historical baseline - Triggers alerts when threshold exceeded - Example: EARS, CUSUM, Farrington

Disease Forecasting (Prediction): - “How many cases will we see next week/month?” - Predicts future values based on current/past data - Provides probabilistic projections - Example: FluSight, COVID-19 forecasting

Both can use ML or statistical methods (a is false). Both need sufficient data (c is false). Both apply to endemic and emerging diseases (d is false).

Question 5: Google Flu Trends Failure

Google Flu Trends dramatically overestimated flu prevalence in 2012-2013. What was the PRIMARY cause?

Insufficient training data
Algorithm drift due to changes in search behavior
Hardware failures in Google’s servers
Competing flu prediction services

Answer: b) Algorithm drift due to changes in search behavior

Explanation: Google Flu Trends failed because search behavior changed in ways unrelated to actual flu prevalence:

Media coverage effect: Sensationalized flu news → more flu searches (even without more flu)
Search recommendation changes: Google changed autocomplete suggestions
Seasonal search patterns: Winter → people search flu symptoms (even for non-flu illnesses)

The algorithm learned correlations (flu searches ↔︎ flu cases) but not causation. When search behavior changed for non-epidemiological reasons, predictions failed.

Lesson: Always validate with ground truth data (CDC surveillance). Correlations break when underlying behavior changes. This is why CDC FluView remains the gold standard, augmented by (not replaced by) digital signals.

Question 6: Time Series Cross-Validation

Why must disease surveillance models use time-aware cross-validation rather than random K-fold cross-validation?

Disease data has too few observations for random splitting
To prevent data leakage from using future information to predict the past
Disease surveillance always requires real-time predictions
Random splitting is computationally more expensive

Answer: b) To prevent data leakage from using future information to predict the past

Explanation: Time-aware (forward-chaining) cross-validation is essential because:

With random K-fold:

Training: [Week 5, 12, 18, 25, 32, 39, 46]
Testing:  [Week 8, 15, 22, 29, 36, 43, 50]

Problem: Using week 46 data to predict week 15 = using the future to predict the past!

With time-aware:

Training: [Weeks 1-30]
Testing:  [Weeks 31-40]
Training: [Weeks 1-40]
Testing:  [Weeks 41-50]

This mimics reality: You only have past data to predict the future.

Disease data often has temporal autocorrelation (this week’s cases predict next week’s), so random splitting inflates performance metrics and creates models that fail in deployment.

6.14 Discussion Questions

Google Flu Trends failed, but ARGO succeeded. What made the difference? What does this teach us about the role of theory vs. data in public health AI?
BlueDot accurately predicted COVID-19 spread patterns, but their algorithm is proprietary. Should public health agencies rely on “black box” commercial systems? What are the trade-offs?
Social media surveillance captures younger, urban, higher-income populations. How would you design a surveillance system that doesn’t reinforce health inequities?
An outbreak detection algorithm has 90% sensitivity and 95% specificity, but only 15% PPV (positive predictive value) due to base rate effects. Should this system be deployed? How would you communicate its limitations to stakeholders?
During COVID-19, cases, hospitalizations, and wastewater surveillance sometimes contradicted each other. How do you decide which signal to trust? Develop a framework for reconciling conflicting surveillance streams.
Contact tracing apps can be effective but raise privacy concerns. Where should we draw the line between individual privacy and population health? Can surveillance be both effective and privacy-preserving?

6.15 Further Resources

6.15.1 Academic Papers

HealthMap: Internet-based disease surveillance (Freifeld et al., 2008)
Google Flu Trends failure analysis (Lazer et al., 2014) - Key case study
ARGO: Combining search data with traditional surveillance (Yang et al., 2015)
Spatial scan statistics (Kulldorff, 1997)
Surveillance evaluation framework (Buckeridge et al., 2007)

6.15.2 Tools and Platforms

SaTScan - Spatial cluster detection
Prophet - Time series forecasting
HealthMap - Real-time disease surveillance
EIOS - WHO epidemic intelligence
R surveillance package - Comprehensive surveillance tools

6.15.3 Books and Guides

Modern Infectious Disease Epidemiology (Giesecke)
Infectious Disease Surveillance (Nsubuga & White)
CDC Surveillance Resource Center

6.16 Next Steps

You now understand how AI enhances disease surveillance for early detection. But detecting an outbreak is only the first step.

Continue to Chapter 7: Epidemic Forecasting to learn: - Predicting outbreak trajectories (where is this going?) - Comparing mechanistic models vs. machine learning - Scenario planning and uncertainty quantification - Why forecasting is even harder than detection

Before Moving On

Make sure you can: - Explain the difference between traditional and AI-enhanced surveillance - Implement basic anomaly detection algorithms - Understand the lessons from Google Flu Trends - Combine multiple surveillance data sources - Evaluate surveillance system performance - Navigate privacy and ethics considerations

If any feel unclear, revisit the relevant sections or work through the practice exercises.

Surveillance is where AI meets the real world of public health. Get it right, and you save lives. Get it wrong, and you waste resources or miss outbreaks entirely.

Next: Chapter 7: Epidemic Forecasting →