AI transforms outbreak detection by analyzing diverse data streams in real-time, using anomaly detection algorithms like CDC’s EARS to flag unusual patterns, spatial-temporal clustering methods like SaTScan to identify geographic hotspots, and integrating signals from wastewater monitoring, social media, and clinical systems. BlueDot’s AI detected COVID-19 nine days before WHO’s announcement by combining internet surveillance with epidemiological expertise, demonstrating how AI augments traditional surveillance when properly validated and integrated.
Learning Objectives
This chapter examines AI in disease surveillance and outbreak detection: traditional surveillance baselines, AI-enhanced early warning systems, digital and wastewater surveillance, spatial-temporal cluster detection, evaluation metrics, and the operational and ethical challenges of putting these systems into practice.
The Big Picture: AI transforms disease surveillance from John Snow’s weeks-long door-to-door investigation (1854) to BlueDot’s automated 9-day warning before WHO announced COVID-19. But more data ≠ better signal, noise increases faster than useful information, and technology alone cannot replace human expertise.
The Surveillance Pyramid Challenge:
Traditional surveillance captures only lab-confirmed cases (tip of iceberg). AI enables detection at lower levels: - Social media mentions (symptomatic, not yet seeking care) - OTC medication sales (early self-treatment) - Wastewater viral load (all infections including asymptomatic) - Search queries (pre-symptomatic concern)
But each level introduces new biases and interpretation challenges.
Core AI Approaches for Outbreak Detection:
Anomaly Detection (CDC’s EARS, Facebook Prophet)
Detects when case counts deviate from expected baseline
Statistical control charts, time series decomposition
Challenge: Setting thresholds to balance early detection vs. alert fatigue
Spatial-Temporal Clustering (SaTScan, DBSCAN)
Identifies geographic hotspots and unusual disease clusters
Space-time scan statistics for outbreak localization
Internet/Digital Surveillance (HealthMap, Flu Near You)
Social media monitoring, search trends, mobility data
Success: BlueDot detected COVID-19 nine days before WHO; BEACON (freely accessible) published 1,000+ outbreak reports in 2025
Failure: Google Flu Trends overestimated flu by 140% (learned spurious correlations)
The Google Flu Trends Lesson (Why Digital Surveillance Fails):
Published in Nature (2008), discontinued (2015). What went wrong? - Overfitted to spurious correlations (“basketball” searches peak during flu season) - Algorithm changes broke the model - Media coverage changed search behavior - Lacked epidemiologist input - Lesson: Big data ≠ good data. Correlation ≠ causation. Domain expertise required.
Evaluation Metrics (Different from Standard ML):
Sensitivity for Early Detection: How quickly do you detect outbreaks? Days/weeks earlier than traditional?
Specificity vs. Alert Fatigue: False alarm rate, every false positive erodes trust
Timeliness: Real-time vs. near-real-time vs. delayed signals
Coverage: Geographic and population representativeness
Trade-off: Maximize early warning while minimizing alert fatigue
Privacy-Utility Trade-offs:
Mobility data enables contact tracing but raises surveillance concerns
COVID-19 apps: Centralized (effective but privacy-invasive) vs. Decentralized (privacy-preserving but less effective)
Social media monitoring captures early signals but lacks demographic representativeness
Critical question: What level of privacy sacrifice is justified for public health gain?
Integration with Traditional Surveillance:
AI signals should complement, not replace traditional surveillance: - Use AI for early warning, traditional methods for confirmation - Triangulate multiple data streams (clinical + digital + environmental) - Maintain human epidemiologist-in-the-loop for interpretation - Ground-truth digital signals with lab confirmation
When AI Adds Value vs. Simpler Methods:
Use AI when: - Multiple heterogeneous data streams need integration - Real-time processing of massive data volumes - Detecting subtle patterns across space and time - Resource-limited settings lacking traditional infrastructure
Stick with traditional methods when: - Small geographic area with good reporting - Well-established surveillance system - Interpretability is paramount - Limited technical capacity for AI maintenance
The Takeaway for Public Health Practitioners: AI augments surveillance but does not solve its fundamental challenges: selection bias, reporting delays, changing case definitions, and privacy constraints. BlueDot succeeded where Google Flu Trends failed because it combined AI with domain expertise. Faster detection means nothing if it generates false alarms that erode trust. The goal is not technological sophistication. It is actionable intelligence for timely public health response.
Introduction: The Evolution of Surveillance
September 1854, London: John Snow knocks on doors along Broad Street, interviewing residents about their water sources. He painstakingly maps cholera cases by hand. It takes him weeks to identify the contaminated pump, but his work revolutionizes epidemiology.
December 2019, Toronto: BlueDot, an AI surveillance platform, flags unusual pneumonia reports in Wuhan, China. It alerts its clients on December 31, nine days before the WHO’s public announcement. The algorithm analyzed airline ticketing data, predicted spread patterns, and identified at-risk cities, all automated, all in real-time.
The transformation is stunning. But this is the paradox: we have more surveillance data than ever, yet outbreak detection remains incredibly difficult.
Why?
More data ≠ Better signal: Noise increases faster than useful information
Faster does not always mean better: False alarms erode trust (alert fatigue)
Technology alone is not enough: Interpretation still requires human expertise
Equity gaps persist: Sophisticated surveillance exists where it is least needed
COVID-19 laid this bare. Despite unprecedented surveillance capabilities, genomic sequencing, wastewater monitoring, mobility data, social media signals, we still struggled with: - Delayed outbreak detection in resource-limited settings - Contradictory signals from different surveillance streams - The “denominator problem” (testing bias masking true disease burden) - Privacy backlash against contact tracing apps
Surveillance vs. Monitoring vs. Screening
These terms are often confused:
Surveillance: Ongoing, systematic collection and analysis of health data for public health action - Purpose: Early warning, trend monitoring, program evaluation - Population: Entire communities or populations - Example: Weekly influenza case counts
Monitoring: Tracking specific measures over time, often program outcomes - Purpose: Assess intervention effectiveness - Population: Usually program participants - Example: Vaccination coverage rates
Screening: Identifying disease in asymptomatic individuals - Purpose: Early diagnosis and treatment - Population: Individuals at risk - Example: Mammography for breast cancer
This chapter focuses on surveillance, specifically, how AI can enhance early detection of outbreaks.
The Surveillance Pyramid (from tip to base):
1. Confirmed cases (lab-confirmed, reported)
2. Healthcare-seeking cases (symptomatic, seeking care)
3. All symptomatic cases (including those who do not seek care)
4. All infections (including asymptomatic)
Traditional surveillance captures only the tip (confirmed cases). AI enables us to potentially detect signals at lower levels: - Social media mentions of symptoms (symptomatic, not yet seeking care) - Over-the-counter medication sales (early self-treatment) - Wastewater viral load (all infections, including asymptomatic) - Search engine queries (pre-symptomatic concern)
But each level introduces new biases and challenges.
What AI Can (and Cannot) Do for Surveillance
AI excels at: - Processing massive, heterogeneous data streams in real-time - Detecting subtle patterns humans might miss - Automating repetitive monitoring tasks (freeing humans for interpretation) - Integrating multiple data sources with different biases - Providing early warning before traditional surveillance signals appear
AI struggles with: - Novel outbreaks with no historical training data - Explaining why an alert was triggered (black box problem) - Distinguishing true signal from noise without verification - Handling rapidly changing surveillance systems (non-stationarity) - Operating in data-poor environments (rural, low-income settings)
The key insight: AI should augment, not replace traditional surveillance. The most effective systems combine both.
Traditional Surveillance Systems: The Baseline
Before exploring AI approaches, we must understand the baseline. Traditional surveillance remains the gold standard against which AI systems are judged.
Syndromic Surveillance
The idea: Monitor pre-diagnosis syndromes (fever, cough, rash) rather than confirmed diseases. This provides earlier signals but lower specificity.
Common data sources: - Emergency department chief complaints - Over-the-counter medication sales - School/workplace absenteeism - Ambulance dispatches - Calls to health hotlines (e.g., 811 in Canada, NHS 111 in UK)
Major systems in the US:
1. BioSense Platform
The CDC’s BioSense Platform collects syndromic data from ~70% of emergency departments nationwide.
Strengths: - Near real-time data (daily updates) - Standardized data elements - Built-in anomaly detection
Weaknesses: - Healthcare-seeking bias (see Data Problem chapter) - Respiratory syndrome overload during flu season - High false positive rate
2. EARS (Early Aberration Reporting System)
A set of simple statistical detection algorithms developed by the CDC for rapid outbreak detection, built into syndromic surveillance platforms such as ESSENCE (Electronic Surveillance System for the Early Notification of Community-based Epidemics).
The EARS Algorithms: - C1: Compares today’s count to the mean + 3 standard deviations of the previous 7 days - C2: Same as C1, but with a 2-day guard band between the baseline and today - C3: Accumulates recent C2 exceedances to catch smaller, sustained increases
Let’s implement a simplified EARS-style detector (a 3-standard-deviation threshold with a guard band, in the spirit of C2/C3):
```python
import numpy as np
import matplotlib.pyplot as plt

def ears_c3(counts, baseline_days=7, guard_band=2):
    """
    Simplified EARS-style detector: flags days whose count exceeds the
    baseline mean + 3 standard deviations, with a guard band between
    the baseline window and the current day.

    Parameters:
    - counts: array of daily counts
    - baseline_days: number of days to use for baseline (default 7)
    - guard_band: days to exclude between baseline and today (default 2)

    Returns:
    - alerts: boolean array indicating alerts
    - thresholds: upper control limits for each day
    """
    alerts = np.zeros(len(counts), dtype=bool)
    thresholds = np.zeros(len(counts))

    # Need at least baseline_days + guard_band days of history
    start_idx = baseline_days + guard_band

    for i in range(start_idx, len(counts)):
        # Baseline period: (i - baseline_days - guard_band) up to (i - guard_band - 1)
        baseline_start = i - baseline_days - guard_band
        baseline_end = i - guard_band
        baseline = counts[baseline_start:baseline_end]

        # Baseline statistics
        baseline_mean = np.mean(baseline)
        baseline_std = np.std(baseline, ddof=1)

        # Threshold: mean + 3*std
        threshold = baseline_mean + 3 * baseline_std
        thresholds[i] = threshold

        # Alert if today's count exceeds the threshold
        if counts[i] > threshold:
            alerts[i] = True

    return alerts, thresholds

# Example: Detect outbreak in synthetic syndromic data
np.random.seed(42)
n_days = 100

# Simulate baseline: seasonal pattern + noise
days = np.arange(n_days)
seasonal = 20 + 10 * np.sin(2 * np.pi * days / 30)  # 30-day cycle
noise = np.random.normal(0, 3, n_days)
baseline_counts = seasonal + noise

# Inject outbreak: days 60-74 have elevated counts
outbreak_counts = baseline_counts.copy()
outbreak_counts[60:75] += 15 + np.random.normal(0, 2, 15)

# Run the detector
alerts, thresholds = ears_c3(outbreak_counts, baseline_days=7, guard_band=2)

# Visualize
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(days, outbreak_counts, 'o-', label='Daily Counts', color='steelblue')
ax.plot(days, thresholds, '--', label='EARS C3 Threshold', color='orange', linewidth=2)
ax.fill_between(days, 0, thresholds, alpha=0.2, color='orange')

# Mark alerts
alert_days = days[alerts]
alert_counts = outbreak_counts[alerts]
ax.scatter(alert_days, alert_counts, color='red', s=100, zorder=5,
           label=f'Alerts (n={alerts.sum()})', marker='X')

# Mark true outbreak period
ax.axvspan(60, 75, alpha=0.2, color='red', label='True Outbreak Period')
ax.set_xlabel('Day')
ax.set_ylabel('Syndromic Counts')
ax.set_title('EARS C3 Outbreak Detection Algorithm')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('ears_c3_example.png', dpi=300)
plt.show()

# Evaluate performance against the known outbreak period
true_outbreak = np.zeros(n_days, dtype=bool)
true_outbreak[60:75] = True

from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(true_outbreak, alerts)
print("Confusion Matrix:")
print(cm)
print("\nClassification Report:")
print(classification_report(true_outbreak, alerts, target_names=['No Outbreak', 'Outbreak']))

# Time to detection
first_alert = np.where(alerts)[0][0] if alerts.any() else None
outbreak_start_day = 60
if first_alert is not None:
    time_to_detection = first_alert - outbreak_start_day
    print(f"\nTime to Detection: {time_to_detection} days")
    if time_to_detection < 0:
        print("⚠️ False alarm before outbreak started")
    elif time_to_detection == 0:
        print("✓ Detected on outbreak start day")
    else:
        print(f"✓ Detected {time_to_detection} days after outbreak start")
```
The False Positive Problem
EARS and similar algorithms generate many false alarms. This is by design, trading specificity for sensitivity.
Why this matters: - Alert fatigue → Ignoring real outbreaks - Resource waste investigating false signals - Public trust erosion if alerts are publicized
The CDC’s MMWR reports show that only ~5-10% of syndromic surveillance alerts correspond to true outbreaks.
The solution: Layer multiple signals, require verification, adjust thresholds based on context.
Case-Based Surveillance
Notifiable disease reporting remains the cornerstone of public health surveillance.
The process: 1. Healthcare provider diagnoses disease 2. Reports to local health department (legally required) 3. Local → State → National (CDC/ECDC/WHO) 4. Aggregated and published (e.g., CDC’s NNDSS)
Timeliness challenges: - Days to weeks lag between infection and report - Incomplete reporting (estimated 10-50% of cases missed) - Varying definitions across jurisdictions
Electronic Lab Reporting (ELR): Automates step 2 by sending lab results directly to health departments via HL7 messaging.
Impact: - Reduces reporting delays by 4-7 days - Increases completeness - Still suffers from testing bias
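To make the mechanics concrete, here is a minimal sketch of what an ELR feed looks like on the wire, parsing a simplified, hypothetical HL7 v2 ORU^R01 lab result with plain string handling. Production systems use dedicated interface engines and validated message profiles rather than hand parsing.

```python
# Simplified, hypothetical HL7 v2 ORU^R01 lab result (pipe-delimited segments).
hl7_message = (
    "MSH|^~\\&|LABSYS|ACME_LAB|ELR|STATE_HD|202403150830||ORU^R01|12345|P|2.5.1\r"
    "PID|1||MRN12345^^^ACME||DOE^JANE||19800101|F\r"
    "OBR|1|||94500-6^SARS-CoV-2 RNA^LN\r"
    "OBX|1|CWE|94500-6^SARS-CoV-2 RNA^LN||260373001^Detected^SCT|||A|||F\r"
)

def parse_elr(message):
    """Extract the fields a health department surveillance system typically needs."""
    result = {}
    for segment in message.strip().split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            result["patient_id"] = fields[3].split("^")[0]   # PID-3: identifier
            result["birth_date"] = fields[7]                 # PID-7: date of birth
        elif fields[0] == "OBX":
            # OBX-3 = test performed (LOINC), OBX-5 = result (SNOMED CT)
            result["test"] = fields[3].split("^")[1]
            result["result"] = fields[5].split("^")[1]
    return result

print(parse_elr(hl7_message))
# {'patient_id': 'MRN12345', 'birth_date': '19800101',
#  'test': 'SARS-CoV-2 RNA', 'result': 'Detected'}
```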
Sentinel Surveillance
The idea: Monitor a representative sample of providers/sites intensively, rather than entire population superficially.
Example: CDC’s ILINet (the US Outpatient Influenza-like Illness Surveillance Network), in which sentinel outpatient providers report weekly: - Total patient visits - Visits for influenza-like illness (ILI) - ILI percentage = (ILI visits / total visits) × 100
Strengths: - High-quality data (trained reporters) - Consistent definitions - Long time series for comparison
Limitations: - Small sample size - Not all regions equally represented - Healthcare-seeking bias still present
Code example: Visualizing ILINet data
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# ILINet data is publicly available from CDC
# Download from: https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
# For this example, we'll simulate similar data

# Simulate 5 years of weekly ILI data
np.random.seed(42)
weeks = pd.date_range('2018-01-01', periods=260, freq='W-MON')

# Baseline ILI with seasonal pattern
week_of_year = weeks.isocalendar().week.astype(int)
baseline_ili = 2.0 + 2.5 * np.exp(-((week_of_year - 52) % 52 - 6)**2 / 50)

# Add noise and trend
noise = np.random.normal(0, 0.3, len(weeks))
trend = np.linspace(0, 0.5, len(weeks))  # Slight upward trend
ili_pct = baseline_ili + noise + trend

# Create DataFrame
ili_data = pd.DataFrame({
    'week': weeks,
    'ili_pct': ili_pct.to_numpy(),
    'season': weeks.year + (weeks.month >= 10).astype(int)
})

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Time series plot
for season in ili_data['season'].unique():
    season_data = ili_data[ili_data['season'] == season]
    axes[0].plot(season_data['week'], season_data['ili_pct'],
                 marker='o', label=f'{season-1}/{season}', alpha=0.7)
axes[0].set_xlabel('Week')
axes[0].set_ylabel('ILI Percentage (%)')
axes[0].set_title('Weekly Influenza-Like Illness Percentage (ILINet Style)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Seasonal comparison (align each season on an October start)
ili_data['week_of_season'] = (ili_data['week'].dt.isocalendar().week.astype(int) - 40) % 52
for season in ili_data['season'].unique():
    season_data = ili_data[ili_data['season'] == season]
    season_data = season_data.sort_values('week_of_season')
    axes[1].plot(season_data['week_of_season'], season_data['ili_pct'],
                 marker='o', label=f'{season-1}/{season}', alpha=0.7)
axes[1].set_xlabel('Week of Season (0 = Oct, 26 = Apr)')
axes[1].set_ylabel('ILI Percentage (%)')
axes[1].set_title('ILI Percentage by Week of Season (Aligned)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('ilinet_style_data.png', dpi=300)
plt.show()

# Calculate epidemic threshold (rolling baseline + 2 SD)
historical_baseline = ili_data['ili_pct'].rolling(window=52, min_periods=26).mean()
historical_sd = ili_data['ili_pct'].rolling(window=52, min_periods=26).std()
epidemic_threshold = historical_baseline + 2 * historical_sd

print("ILI Surveillance Metrics:")
print(f"Mean ILI%: {ili_data['ili_pct'].mean():.2f}%")
print(f"Peak ILI%: {ili_data['ili_pct'].max():.2f}%")
print(f"Weeks above epidemic threshold: {(ili_data['ili_pct'] > epidemic_threshold).sum()}")
```
AI-Enhanced Early Warning Systems
Traditional surveillance works well for known diseases with established reporting. But what about novel threats or rapid detection before traditional reports arrive?
Enter internet-based surveillance (also called digital epidemiology or infoveillance).
HealthMap: Pioneering Digital Surveillance
HealthMap, launched in 2006 by researchers at Boston Children’s Hospital, was among the first automated disease surveillance systems.
How it works: 1. Data sources: News aggregators, social media, official reports, eyewitness accounts 2. NLP processing: Extract disease mentions, locations, severity indicators 3. Geocoding: Map events to geographic coordinates 4. Classification: Categorize by disease type, outbreak stage 5. Visualization: Display on interactive map
Notable successes:
2009 H1N1 Pandemic: HealthMap detected unusual respiratory illness reports in Mexico before the WHO announcement. The system tracked spread in real-time, providing situational awareness.
Limitations: - Signal-to-noise ratio: Many rumors do not pan out - Verification needed: Automated detection ≠ confirmed outbreak - Language barriers: NLP struggles with low-resource languages - Digital divide: Underreports in areas with limited internet access
ProMED-mail: Human + AI Augmentation
ProMED-mail (Program for Monitoring Emerging Diseases) is a human-curated global surveillance system operated by the International Society for Infectious Diseases.
The model: - ~40,000 members worldwide submit outbreak reports - Expert moderators (physicians, epidemiologists) review and verify - Rapid dissemination via email list (60,000+ subscribers) - Now enhanced with AI for initial screening and translation
Historical impact:
SARS 2003: ProMED published the first English-language report of “atypical pneumonia” in Guangdong Province on February 10, 2003, providing early warning to the global community.
COVID-19: ProMED’s December 30, 2019 post about “undiagnosed pneumonia” in Wuhan was among the first public alerts.
The hybrid approach: - AI scans news/social media for potential signals - Human experts verify, contextualize, and comment - Community peer review (members respond with additional information)
Key lesson: AI handles volume; humans provide judgment. Neither alone is sufficient.
BlueDot: Commercial Success in Outbreak Intelligence
BlueDot, founded in 2014 by Dr. Kamran Khan (an infectious disease physician), represents the commercial state-of-the-art in AI surveillance.
Multi-source data integration: - News media (65,000 sources, 65 languages) - Official health reports - Airline ticketing data (global travel patterns) - Animal disease surveillance - Climate and environmental data - Population demographics
The algorithm: 1. Ingest: Real-time data from all sources 2. Filter: ML models identify anomalies and prioritize signals 3. Analyze: Predict disease spread using travel and climate data 4. Alert: Human epidemiologists review and contextualize 5. Report: Clients receive tailored intelligence
COVID-19 early warning:
On December 31, 2019, BlueDot alerted clients about an unusual pneumonia outbreak in Wuhan and predicted which cities were at highest risk based on airline travel data.
This was: - 9 days before WHO’s public announcement - Days before ProMED and HealthMap alerts reached mass attention - Accurate predictions: Bangkok, Hong Kong, Tokyo, Taipei, Seoul were indeed early spread destinations
The catch: - Proprietary algorithm: Black box, cannot be independently validated - Expensive: Costs tens of thousands per year (out of reach for most health departments) - Still requires human verification: Automated alerts reviewed by BlueDot’s team
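To illustrate the travel-data idea in the simplest possible terms, here is a rough sketch of ranking destination cities by their share of outbound travel from an outbreak origin. This is not BlueDot’s model; the passenger volumes are made-up placeholders, and real systems combine commercial ticketing data with many other features.

```python
# Hypothetical monthly outbound passenger volumes from the outbreak city.
monthly_passengers_from_origin = {
    "Bangkok": 120_000,
    "Hong Kong": 95_000,
    "Tokyo": 80_000,
    "Taipei": 70_000,
    "Seoul": 65_000,
    "Sydney": 30_000,
}

total = sum(monthly_passengers_from_origin.values())

# Simplest possible importation-risk proxy: share of outbound travel each city receives
risk_ranking = sorted(
    ((city, volume / total) for city, volume in monthly_passengers_from_origin.items()),
    key=lambda pair: pair[1], reverse=True,
)

for city, share in risk_ranking:
    print(f"{city:10s} relative importation risk: {share:.1%}")
```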
The Black Box Problem
BlueDot’s success raises a critical question: Can we trust outbreak intelligence we cannot verify?
Arguments in favor: - Track record: BlueDot’s alerts have been accurate - Human oversight: Expert team reviews all automated signals - Value proposition: Early warning justifies cost for paying clients
Arguments against: - No independent validation of algorithm performance - Public health decisions based on proprietary, unverifiable models - Equity concerns: Only wealthy entities can afford access - What happens if BlueDot is wrong? Who bears responsibility?
This tension, performance vs. transparency, appears throughout public health AI.
In contrast to BlueDot’s proprietary approach, BEACON (Boston University’s Hariri Institute for Computing) offers free, publicly accessible outbreak intelligence.
System Architecture:
BEACON combines automated AI scanning with human expert verification:
Web scanning: AI agents continuously monitor news sources, sometimes processing 1,000+ signals daily
LLM processing: PandemIQ, a custom language model, drafts outbreak reports, assesses urgency, and generates risk scores
Human verification: Infectious disease experts review, edit, and approve all content before publication
Public access: Reports published freely at beaconbio.org
The PandemIQ Language Model:
Unlike general-purpose LLMs, PandemIQ was purpose-built for outbreak detection:
Base model: Meta’s LLaMA, fine-tuned for epidemiological applications
Training data: 500,000+ PubMed papers, 6,000 medical texts, WHO/CDC guidelines (50+ GB total)
Optimization: Instruction tuning and reinforcement learning from expert feedback
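PandemIQ itself is not publicly released, so as a rough stand-in for its urgency-assessment step, the sketch below runs a headline through an off-the-shelf zero-shot classifier from Hugging Face (facebook/bart-large-mnli). A purpose-built epidemiological model would be far more capable; this only illustrates where an LLM-style component sits in the pipeline.

```python
from transformers import pipeline

# Off-the-shelf zero-shot classifier used as a stand-in for a purpose-built model
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

headline = ("Cluster of undiagnosed pneumonia cases reported among market "
            "workers; several patients hospitalized in critical condition")

labels = ["high urgency outbreak signal", "routine health news", "not health related"]
result = classifier(headline, candidate_labels=labels)

for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```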
Performance (April-December 2025):
1,000+ outbreak reports published
130+ distinct outbreaks covered across multiple countries
Reports include urgency ratings, quality assessments, and risk scores
Computational Reality:
BEACON’s infrastructure illustrates the resource requirements for production AI surveillance:
Training: 32 NVIDIA GPUs
Single GPU server cost: approximately $150,000
Operations: AWS cloud infrastructure
This creates equity concerns: institutions in resource-limited settings cannot replicate this infrastructure. BEACON addresses this by making outputs freely available, even if the underlying compute remains expensive.
Key Differentiators from BlueDot:
| Aspect | BlueDot | BEACON |
|---|---|---|
| Access | Commercial (tens of thousands/year) | Free |
| Algorithm | Proprietary | Plans to open-source PandemIQ |
| Validation | Internal only | Academic publication pathway |
| Human oversight | Yes | Yes |
The Common Thread:
Both BlueDot and BEACON emphasize that AI handles volume while humans provide judgment. Neither system deploys fully autonomous outbreak alerts. The difference is transparency and accessibility.
Building Your Own Web Scraper for Outbreak Signals
You can create a basic outbreak surveillance system using open-source tools:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from geopy.geocoders import Nominatim

# This is a simplified example - production systems need robust error handling,
# rate limiting, compliance with robots.txt, and proper data validation

def scrape_who_don():
    """
    Scrape WHO Disease Outbreak News (DON)
    URL: https://www.who.int/emergencies/disease-outbreak-news
    """
    url = "https://www.who.int/emergencies/disease-outbreak-news"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find article listings (adjust selectors based on current site structure)
        articles = soup.find_all('div', class_='list-view--item')
        outbreaks = []
        for article in articles[:10]:  # Limit to 10 most recent
            try:
                title_elem = article.find('h3', class_='heading')
                date_elem = article.find('span', class_='timestamp')
                link_elem = article.find('a')
                if title_elem and date_elem and link_elem:
                    outbreaks.append({
                        'date': date_elem.text.strip(),
                        'title': title_elem.text.strip(),
                        'url': 'https://www.who.int' + link_elem['href'],
                        'source': 'WHO DON'
                    })
            except Exception:
                continue
        return pd.DataFrame(outbreaks)
    except Exception as e:
        print(f"Error scraping WHO DON: {e}")
        return pd.DataFrame()

def scrape_promed_via_rss():
    """
    Get ProMED posts via RSS feed
    """
    import feedparser
    feed_url = "https://promedmail.org/ajax/rss.php"
    try:
        feed = feedparser.parse(feed_url)
        posts = []
        for entry in feed.entries[:20]:  # Last 20 posts
            posts.append({
                'date': entry.published if 'published' in entry else 'Unknown',
                'title': entry.title,
                'url': entry.link,
                'summary': entry.summary if 'summary' in entry else '',
                'source': 'ProMED'
            })
        return pd.DataFrame(posts)
    except Exception as e:
        print(f"Error fetching ProMED RSS: {e}")
        return pd.DataFrame()

def extract_disease_mentions(text, disease_keywords):
    """
    Simple keyword matching for disease extraction
    In production, use NER models like BioBERT
    """
    text_lower = text.lower()
    mentioned_diseases = []
    for disease, keywords in disease_keywords.items():
        for keyword in keywords:
            if keyword.lower() in text_lower:
                mentioned_diseases.append(disease)
                break
    return list(set(mentioned_diseases))

def geocode_location(location_text):
    """
    Extract location from text and geocode
    """
    geolocator = Nominatim(user_agent="outbreak_surveillance_demo")
    try:
        location = geolocator.geocode(location_text, timeout=10)
        if location:
            return {
                'latitude': location.latitude,
                'longitude': location.longitude,
                'location_full': location.address
            }
    except Exception:
        pass
    return {'latitude': None, 'longitude': None, 'location_full': None}

# Disease keywords (simplified - real systems use ML models)
DISEASE_KEYWORDS = {
    'COVID-19': ['covid', 'coronavirus', 'sars-cov-2', 'pandemic'],
    'Influenza': ['influenza', 'flu', 'h1n1', 'h5n1', 'h3n2'],
    'Ebola': ['ebola', 'ebolavirus', 'hemorrhagic fever'],
    'Dengue': ['dengue', 'dengue fever', 'breakbone fever'],
    'Cholera': ['cholera', 'vibrio cholerae'],
    'Measles': ['measles', 'rubeola'],
    'Malaria': ['malaria', 'plasmodium'],
    'Mpox': ['mpox', 'monkeypox']
}

# Main surveillance pipeline
print("Fetching outbreak reports from multiple sources...")

# Scrape data
who_data = scrape_who_don()
promed_data = scrape_promed_via_rss()

# Combine sources
all_reports = pd.concat([who_data, promed_data], ignore_index=True)

# Extract diseases
all_reports['diseases'] = all_reports.apply(
    lambda row: extract_disease_mentions(
        str(row.get('title', '')) + ' ' + str(row.get('summary', '')),
        DISEASE_KEYWORDS
    ), axis=1)

# Filter to reports with disease mentions
outbreak_reports = all_reports[all_reports['diseases'].apply(len) > 0].copy()

print(f"\nFound {len(outbreak_reports)} outbreak reports")
print("\nRecent Outbreaks:")
print(outbreak_reports[['date', 'title', 'diseases', 'source']].head(10))

# Alert generation logic
def generate_alerts(reports, alert_diseases=['COVID-19', 'Ebola', 'Cholera']):
    """
    Generate alerts for high-priority diseases
    """
    alerts = []
    for _, report in reports.iterrows():
        detected_priority = [d for d in report['diseases'] if d in alert_diseases]
        if detected_priority:
            alerts.append({
                'alert_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'disease': ', '.join(detected_priority),
                'source': report['source'],
                'title': report['title'],
                'url': report['url'],
                'priority': 'HIGH'
            })
    return pd.DataFrame(alerts)

# Generate alerts
alerts = generate_alerts(outbreak_reports)

if len(alerts) > 0:
    print(f"\n🚨 {len(alerts)} HIGH PRIORITY ALERTS:")
    print(alerts[['alert_time', 'disease', 'title']].to_string(index=False))
else:
    print("\n✓ No high-priority alerts at this time")

# Save results
outbreak_reports.to_csv('outbreak_surveillance_feed.csv', index=False)
alerts.to_csv('outbreak_alerts.csv', index=False)
print("\n✓ Results saved to outbreak_surveillance_feed.csv and outbreak_alerts.csv")
```
Production-Ready Web Scraping
This example is educational. For production surveillance:
Respect robots.txt and terms of service
Implement rate limiting (do not hammer servers)
Use RSS feeds when available (ProMED, WHO, ECDC all provide them)
Use proper NLP models (BioBERT, SciBERT) for disease/location extraction
Store historical data for trend analysis
Implement verification workflow (do not auto-publish alerts)
Monitor for source changes (websites update structure frequently)
For a robust open-source solution, see EIOS (WHO’s Epidemic Intelligence from Open Sources platform).
Social Media Surveillance: Lessons from Google Flu Trends
Social media promised revolutionary disease surveillance. The reality has been… complicated.
The Google Flu Trends Story
2008: The Promise
Google researchers published a landmark paper in Nature showing that search query patterns could track influenza activity in near real-time.
The method: - Identify 45 search terms correlated with CDC ILINet data - Aggregate searches by region - Use linear model to predict current ILI levels
The results: - 97% correlation with CDC data - 1-2 weeks ahead of traditional surveillance - Updated daily (vs. weekly CDC reports)
Media proclaimed: “The end of traditional surveillance!”
2013: The Fall
During the 2012-2013 flu season, Google Flu Trends (GFT) massively overestimated influenza prevalence, predicting almost double the actual CDC-reported levels. What went wrong?
1. Algorithm Dynamics (Overfitting) - GFT used 50 million search terms → Selected 45 best correlates - With so many candidate predictors, spurious correlations were inevitable - Example: Searches for “high school basketball” correlated with flu season (both peak in winter) → Algorithm included it
2. Search Behavior Changes - Media coverage of flu → People searched more → Inflated estimates - Google’s search algorithm updates changed which terms appeared - Auto-complete suggestions biased searches
3. No Mechanism, Only Correlation - GFT had no epidemiological model; it was purely data-driven - When patterns changed (e.g., the H1N1 pandemic), the algorithm failed - As Lazer et al. wrote, this was “big data hubris”: the assumption that big data alone, without theory, is sufficient
4. Closed System, No Transparency - Google didn’t reveal which search terms were used - No independent validation possible - When it failed, could not diagnose why
The Fundamental Lessons
Google Flu Trends teaches us critical principles for public health AI:
Correlation ≠ Causation (especially with big data)
Systems change over time (non-stationarity kills prediction)
Transparency matters (black boxes cannot be debugged)
Theory + Data beats Data alone (epidemiological mechanisms matter)
Validation must be ongoing (performance degrades)
Don’t replace traditional surveillance (use AI as complement)
Learning from GFT’s failure, researchers developed ARGO (AutoRegression with Google search data).
Key improvements: - Combines Google Trends data with CDC ILINet (not replacing it) - Uses time series methods (ARIMA) with epidemiological constraints - Regularly recalibrates as patterns change - Transparent (published algorithm, open validation)
Performance: - ~30% improvement over CDC ILINet alone for nowcasting - Useful for filling reporting gaps (e.g., estimating current week before CDC data arrives) - Robust to algorithm changes (because it adapts)
Code example: Simple nowcasting with search trends
```python
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

# Simulate weekly ILI data + search trends
np.random.seed(42)
weeks = pd.date_range('2020-01-01', periods=150, freq='W')

# True ILI (with 2-week reporting lag)
week_num = np.arange(len(weeks))
seasonal = 2.5 + 2.0 * np.sin(2 * np.pi * week_num / 52)
ili_true = seasonal + np.random.normal(0, 0.3, len(weeks))

# Search trends (leading indicator - correlated but noisy)
search_trends = ili_true + np.random.normal(0, 0.5, len(weeks))
search_trends = np.roll(search_trends, -1)  # Searches lead by 1 week

# Reported ILI (with 2-week delay)
ili_reported = np.concatenate([
    [np.nan, np.nan],   # First 2 weeks not yet reported
    ili_true[:-2]       # Everything else delayed 2 weeks
])

# Create DataFrame
data = pd.DataFrame({
    'week': weeks,
    'ili_true': ili_true,
    'ili_reported': ili_reported,
    'search_trends': search_trends
})

# Nowcasting: Predict current ILI using search trends + historical ILI
train_weeks = 100
test_weeks = len(weeks) - train_weeks

predictions_baseline = []
predictions_with_search = []

for i in range(train_weeks, len(weeks)):
    # Historical data up to this point
    train_data = data.iloc[:i]

    # Baseline: Use only reported ILI (ARIMA model)
    ili_reported_clean = train_data['ili_reported'].dropna()
    if len(ili_reported_clean) > 10:
        try:
            model_baseline = ARIMA(ili_reported_clean, order=(2, 0, 1))
            fit_baseline = model_baseline.fit()
            pred_baseline = fit_baseline.forecast(steps=1).iloc[0]
        except Exception:
            pred_baseline = ili_reported_clean.iloc[-1]
    else:
        pred_baseline = np.nan
    predictions_baseline.append(pred_baseline)

    # With search: Adjust prediction using current search trends
    current_search = train_data['search_trends'].iloc[-1]
    recent_search_avg = train_data['search_trends'].iloc[-4:].mean()
    # Simple adjustment: if search trends are elevated, adjust upward
    search_signal = current_search - recent_search_avg
    pred_with_search = pred_baseline + 0.3 * search_signal  # 0.3 is a learned weight
    predictions_with_search.append(pred_with_search)

# Add predictions to dataframe
data.loc[train_weeks:, 'pred_baseline'] = predictions_baseline
data.loc[train_weeks:, 'pred_with_search'] = predictions_with_search

# Evaluate
from sklearn.metrics import mean_absolute_error, mean_squared_error
test_data = data.iloc[train_weeks:]
mae_baseline = mean_absolute_error(test_data['ili_true'], test_data['pred_baseline'])
mae_search = mean_absolute_error(test_data['ili_true'], test_data['pred_with_search'])
rmse_baseline = np.sqrt(mean_squared_error(test_data['ili_true'], test_data['pred_baseline']))
rmse_search = np.sqrt(mean_squared_error(test_data['ili_true'], test_data['pred_with_search']))

print("Nowcasting Performance:")
print(f"Baseline (ILI only)  MAE: {mae_baseline:.3f}, RMSE: {rmse_baseline:.3f}")
print(f"With Search Trends   MAE: {mae_search:.3f}, RMSE: {rmse_search:.3f}")
print(f"Improvement: {(1 - mae_search/mae_baseline)*100:.1f}% reduction in error")

# Visualize
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(data['week'], data['ili_true'], 'o-', label='True ILI', color='black', linewidth=2)
ax.plot(data['week'], data['ili_reported'], 's--', label='Reported ILI (2-week delay)', color='gray', alpha=0.7)
ax.plot(data['week'], data['pred_baseline'], 'x-', label='Nowcast (baseline)', color='blue', alpha=0.7)
ax.plot(data['week'], data['pred_with_search'], '^-', label='Nowcast (with search)', color='red', alpha=0.7)
ax.axvline(weeks[train_weeks], color='green', linestyle='--', linewidth=2, label='Train/Test Split')
ax.set_xlabel('Week')
ax.set_ylabel('ILI Percentage')
ax.set_title('Nowcasting ILI with Search Trends (ARGO-style)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('ili_nowcasting_with_search.png', dpi=300)
plt.show()
```
Twitter/X for Disease Surveillance
Social media offers real-time, high-volume data about health concerns. But it is noisy, biased, and privacy-sensitive.
Approaches:
1. Keyword-based tracking - Count mentions of “flu”, “fever”, “cough” - Pros: Simple, fast - Cons: Lots of false positives (“I’m sick of this traffic!”)
2. Sentiment analysis - Classify tweets as genuine health concerns vs. casual mentions - Paul et al., 2014 showed reasonable correlation with CDC ILINet
3. Bot detection and filtering - Many “health” tweets are from bots or automated accounts - Must filter to genuine user posts
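As a concrete illustration of keyword tracking and its false-positive problem, here is a minimal rule-based sketch; the symptom terms and idiom patterns are illustrative, and real systems replace these rules with trained classifiers plus bot filtering.

```python
import re

SYMPTOM_TERMS = ["flu", "fever", "cough", "sore throat", "chills"]
IDIOM_PATTERNS = [r"\bsick of\b", r"\bfever pitch\b"]  # figurative, non-health uses

def is_health_mention(post: str) -> bool:
    """Crude filter: symptom keyword present and no obvious idiomatic usage."""
    text = post.lower()
    if any(re.search(pattern, text) for pattern in IDIOM_PATTERNS):
        return False
    return any(term in text for term in SYMPTOM_TERMS)

posts = [
    "Day 3 of this fever and cough, staying home from work",
    "I'm so sick of this traffic!",
    "Half my office is out with the flu this week",
    "That concert reached fever pitch",
]

flagged = [p for p in posts if is_health_mention(p)]
print(f"{len(flagged)} of {len(posts)} posts flagged as possible health mentions")
for p in flagged:
    print(" -", p)
```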
Challenges:
❌ Selection bias: Twitter users ≠ general population (younger, urban, higher income) ❌ Privacy concerns: Even aggregated health data can reveal sensitive information ❌ Platform changes: API access, data policies constantly evolving ❌ Spam and manipulation: Bots, coordinated campaigns distort signal ❌ Language and cultural variation: Health expressions vary widely
Privacy and Ethics in Social Media Surveillance
Using social media for health surveillance raises serious concerns:
Consent: Users do not expect health posts to be used for surveillance
Re-identification risk: Aggregated data can sometimes be de-anonymized
Stigma: Mental health, HIV/AIDS mentions could be sensitive
Equity: Surveillance focused on social media users misses vulnerable populations
Best practices: - Aggregate data (never analyze individual accounts) - Remove identifying information - Obtain IRB approval for research use - Be transparent about surveillance activities - Consider community engagement
Unlike clinical surveillance (which depends on people seeking care) or digital surveillance (which depends on internet access), wastewater surveillance captures all infections in a community, symptomatic or not, tested or not.
Why Wastewater Works
When infected individuals use toilets, viral RNA enters the sewage system. Testing wastewater at treatment plants provides:
Population-level signal:One sample represents thousands to millions of people
No healthcare access bias: Captures infections regardless of testing behavior
Asymptomatic detection: Finds infections that clinical surveillance misses
Early warning: Viral shedding often precedes symptom onset by days
Cost efficiency: ~$50-100 per sample vs. thousands of individual tests
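Below is a minimal sketch of how raw measurements might be turned into a comparable trend metric, using simulated values: concentrations are normalized by plant flow and population served, smoothed, and summarized as a percent change. Actual programs such as NWSS define their own normalization and reporting metrics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")

# Simulated measurements: rising viral concentration with lognormal measurement noise
concentration = 1e4 * np.exp(np.linspace(0, 1.5, 60)) * rng.lognormal(0, 0.3, 60)  # gene copies/L
flow = rng.normal(40e6, 4e6, 60)          # liters/day through the treatment plant
population_served = 250_000

ww = pd.DataFrame({"date": dates, "conc_copies_per_L": concentration, "flow_L_per_day": flow})

# Flow- and population-normalized load: copies per person per day
ww["load_per_capita"] = ww["conc_copies_per_L"] * ww["flow_L_per_day"] / population_served

# Smooth (viral RNA measurements are noisy) and compute a 15-day percent change
ww["load_smoothed"] = ww["load_per_capita"].rolling(7, center=True, min_periods=3).mean()
pct_change_15d = 100 * (ww["load_smoothed"].iloc[-1] / ww["load_smoothed"].iloc[-16] - 1)

print(f"Latest per-capita load: {ww['load_smoothed'].iloc[-1]:.2e} copies/person/day")
print(f"15-day change: {pct_change_15d:+.0f}%")
```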
CDC’s National Wastewater Surveillance System (NWSS)
Launched in September 2020, CDC’s NWSS rapidly scaled from 209 sites to over 1,500 sites by December 2022, covering approximately 47% of the U.S. population.
Current capabilities:
Coverage: All 50 states, 7 territories, tribal communities
Pathogens tracked: SARS-CoV-2, influenza A, RSV, mpox
Update frequency: Weekly (data updated every Friday)
Turnaround: Toilet flush to results in 5-7 days
Data sources:
State and local health departments (CDC-funded)
CDC’s national testing contract (Verily Life Sciences)
Moving beyond simple thresholds, modern surveillance uses time series analysis and machine learning to detect outbreaks.
Time Series Forecasting with Prophet
Prophet, developed by Facebook (now Meta), is an open-source time series forecasting tool designed for business time series (which share features with epidemiological data):
Strong seasonal patterns (yearly, weekly cycles)
Holidays and special events
Piecewise trends with changepoints
Robustness to missing data
Why Prophet for public health: - Handles weekly seasonality (flu peaks in winter) - Automatically detects changepoints (outbreak starts/ends) - Provides uncertainty intervals (critical for decision-making) - Easy to use (minimal parameter tuning)
Good for: ✓ Daily/weekly syndromic surveillance data ✓ Data with strong seasonality (flu, gastroenteritis) ✓ Need for uncertainty quantification ✓ Quick implementation with minimal tuning ✓ Multiple time series (can fit separate models per region)
Less suitable for: ✗ Hourly or minute-level data (use LSTM or ARIMA) ✗ Very short time series (<1 year) ✗ Outbreak forecasting (predicting the future trajectory of an ongoing outbreak); here Prophet is used to detect current anomalies
For alternatives, see statsmodels for ARIMA/SARIMAX, or GluonTS for deep learning approaches.
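Here is a minimal sketch of Prophet used in the anomaly-detection mode described above, assuming the prophet package is installed: fit the model to historical syndromic counts, then flag days whose observed values exceed the upper bound of the model’s uncertainty interval.

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Simulated daily ED respiratory visits with yearly seasonality and a weekday effect
rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-01", periods=730, freq="D")
doy = dates.dayofyear.values
counts = (50
          + 20 * np.sin(2 * np.pi * (doy - 20) / 365)   # winter peak
          + 8 * (dates.dayofweek < 5)                   # weekday effect
          + rng.normal(0, 5, len(dates)))
counts[-14:] += 25                                      # inject a recent surge

df = pd.DataFrame({"ds": dates, "y": counts})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True, interval_width=0.95)
model.fit(df)

# In-sample expected values and uncertainty intervals
forecast = model.predict(df[["ds"]])
merged = df.merge(forecast[["ds", "yhat", "yhat_upper"]], on="ds")
merged["anomaly"] = merged["y"] > merged["yhat_upper"]

recent = merged.tail(21)
print(f"Anomalous days in the last 3 weeks: {recent['anomaly'].sum()}")
print(recent.loc[recent["anomaly"], ["ds", "y", "yhat", "yhat_upper"]].round(1).to_string(index=False))
```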
Spatial-Temporal Cluster Detection
Diseases do not just change over time; they also cluster in space. Where an outbreak is happening matters as much as when.
SaTScan: The Gold Standard
SaTScan (Spatial, Temporal, or Space-Time Scan Statistic), developed by Martin Kulldorff, is the most widely used spatial cluster detection tool in public health.
How it works:
Create a scanning window: Circle of varying radius moves across map
For each location and radius: Count cases inside vs. outside circle
Test hypothesis: Are there more cases than expected by chance?
Statistical significance: Use Monte Carlo simulation (permutation test)
Most likely cluster: Location/radius with lowest p-value
Example: Detecting cholera clusters
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulate case data with a spatial cluster
np.random.seed(42)

# Background cases (uniformly distributed)
n_background = 200
bg_x = np.random.uniform(0, 100, n_background)
bg_y = np.random.uniform(0, 100, n_background)

# Cluster cases (concentrated in one area)
n_cluster = 100
cluster_center = [30, 70]
cluster_x = np.random.normal(cluster_center[0], 5, n_cluster)
cluster_y = np.random.normal(cluster_center[1], 5, n_cluster)

# Population at risk (grid of census tracts)
n_grid = 20
grid_x = np.linspace(5, 95, n_grid)
grid_y = np.linspace(5, 95, n_grid)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)
pop_locations = np.column_stack([grid_xx.ravel(), grid_yy.ravel()])

# Population sizes (roughly uniform, with random variation)
pop_sizes = np.random.poisson(500, len(pop_locations))

# Combine all cases
all_x = np.concatenate([bg_x, cluster_x])
all_y = np.concatenate([bg_y, cluster_y])

cases_df = pd.DataFrame({'x': all_x, 'y': all_y, 'case': 1})
pop_df = pd.DataFrame({
    'x': pop_locations[:, 0],
    'y': pop_locations[:, 1],
    'population': pop_sizes
})

print(f"Total cases: {len(cases_df)}")
print(f"Population grid cells: {len(pop_df)}")
print(f"Total population: {pop_df['population'].sum():,}")

# Spatial scan statistic (simplified version)
def spatial_scan_statistic(cases, population, max_radius=20, n_simulations=999):
    """
    Simplified spatial scan statistic (Kulldorff method)

    Parameters:
    - cases: DataFrame with x, y coordinates
    - population: DataFrame with x, y, population
    - max_radius: maximum radius to scan
    - n_simulations: Monte Carlo simulations for p-value

    Returns:
    - best_cluster: dict with cluster info
    - all_clusters: list of all tested clusters
    """
    case_coords = cases[['x', 'y']].values
    pop_coords = population[['x', 'y']].values
    pop_counts = population['population'].values

    total_cases = len(case_coords)
    total_pop = pop_counts.sum()

    best_llr = -np.inf
    best_cluster = None
    all_clusters = []

    # Scan all possible center points and radii
    for i, center in enumerate(pop_coords):
        # Distances from this center to all population points
        distances = np.sqrt(np.sum((pop_coords - center)**2, axis=1))

        # Try different radii
        unique_distances = np.sort(np.unique(distances))
        radii_to_try = unique_distances[unique_distances <= max_radius]

        for radius in radii_to_try:
            # Cases within this circle
            case_distances = np.sqrt(np.sum((case_coords - center)**2, axis=1))
            cases_inside = (case_distances <= radius).sum()

            # Population within this circle
            pop_inside = pop_counts[distances <= radius].sum()
            if pop_inside == 0 or pop_inside == total_pop:
                continue

            # Expected cases (under null hypothesis of uniform risk)
            expected_inside = total_cases * (pop_inside / total_pop)

            # Log likelihood ratio
            if cases_inside > expected_inside:
                cases_outside = total_cases - cases_inside
                pop_outside = total_pop - pop_inside
                expected_outside = total_cases - expected_inside

                # Poisson-based likelihood ratio
                llr = (cases_inside * np.log(cases_inside / expected_inside) +
                       cases_outside * np.log(cases_outside / expected_outside))

                cluster_info = {
                    'center_x': center[0],
                    'center_y': center[1],
                    'radius': radius,
                    'cases_inside': cases_inside,
                    'pop_inside': pop_inside,
                    'expected_cases': expected_inside,
                    'relative_risk': cases_inside / expected_inside,
                    'llr': llr
                }
                all_clusters.append(cluster_info)

                if llr > best_llr:
                    best_llr = llr
                    best_cluster = cluster_info

    # Monte Carlo simulation for p-value
    print(f"\nRunning {n_simulations} Monte Carlo simulations...")
    simulated_llrs = []
    for sim in range(n_simulations):
        # Randomly assign cases to population locations
        random_assignment = np.random.choice(
            len(pop_coords), size=total_cases, replace=True, p=pop_counts / total_pop)
        sim_case_coords = pop_coords[random_assignment]

        # Find best LLR for this random data (coarser scan for speed)
        sim_best_llr = -np.inf
        for center in pop_coords[::10]:  # Sample centers for speed
            distances = np.sqrt(np.sum((pop_coords - center)**2, axis=1))
            # Reuses the distance grid from the last scanned center as a radius set (approximation)
            radii_to_try = unique_distances[unique_distances <= max_radius][::5]
            for radius in radii_to_try:
                case_distances = np.sqrt(np.sum((sim_case_coords - center)**2, axis=1))
                cases_inside = (case_distances <= radius).sum()
                pop_inside = pop_counts[distances <= radius].sum()
                if pop_inside == 0 or pop_inside == total_pop:
                    continue
                expected_inside = total_cases * (pop_inside / total_pop)
                if cases_inside > expected_inside:
                    cases_outside = total_cases - cases_inside
                    expected_outside = total_cases - expected_inside
                    llr = (cases_inside * np.log(cases_inside / expected_inside) +
                           cases_outside * np.log(cases_outside / expected_outside))
                    if llr > sim_best_llr:
                        sim_best_llr = llr
        simulated_llrs.append(sim_best_llr)

        if (sim + 1) % 100 == 0:
            print(f"  Completed {sim + 1}/{n_simulations} simulations")

    # P-value: proportion of simulations with LLR >= observed
    p_value = (np.array(simulated_llrs) >= best_llr).sum() / n_simulations
    best_cluster['p_value'] = p_value

    return best_cluster, all_clusters

# Run spatial scan
print("Running spatial scan statistic...")
best_cluster, all_clusters = spatial_scan_statistic(
    cases_df, pop_df, max_radius=20, n_simulations=199)

print("\n" + "=" * 60)
print("CLUSTER DETECTION RESULTS")
print("=" * 60)
print(f"\nMost Likely Cluster:")
print(f"  Center: ({best_cluster['center_x']:.1f}, {best_cluster['center_y']:.1f})")
print(f"  Radius: {best_cluster['radius']:.1f}")
print(f"  Cases Observed: {best_cluster['cases_inside']}")
print(f"  Cases Expected: {best_cluster['expected_cases']:.1f}")
print(f"  Relative Risk: {best_cluster['relative_risk']:.2f}")
print(f"  P-value: {best_cluster['p_value']:.4f}")

if best_cluster['p_value'] < 0.05:
    print(f"\n✓ Statistically significant cluster detected (p < 0.05)")
else:
    print(f"\n  Not statistically significant (p >= 0.05)")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Left panel: Case locations and detected cluster
axes[0].scatter(cases_df['x'], cases_df['y'], alpha=0.5, s=20, color='blue', label='Cases')
axes[0].scatter(pop_df['x'], pop_df['y'], alpha=0.3, s=pop_df['population'] / 10,
                color='gray', label='Population (size = pop)')

# Draw detected cluster circle
circle = plt.Circle((best_cluster['center_x'], best_cluster['center_y']),
                    best_cluster['radius'], color='red', fill=False,
                    linewidth=3, label='Detected Cluster')
axes[0].add_patch(circle)
axes[0].plot(best_cluster['center_x'], best_cluster['center_y'], 'r*',
             markersize=20, label='Cluster Center')
axes[0].set_xlim(0, 100)
axes[0].set_ylim(0, 100)
axes[0].set_xlabel('X Coordinate')
axes[0].set_ylabel('Y Coordinate')
axes[0].set_title('Spatial Cluster Detection (SaTScan-style)')
axes[0].legend()
axes[0].set_aspect('equal')

# Right panel: LLR heatmap
clusters_df = pd.DataFrame(all_clusters)

# Aggregate by center location (take max LLR for each location)
pivot_data = clusters_df.groupby(['center_x', 'center_y'])['llr'].max().reset_index()

# Interpolate onto a grid for contour plotting
from scipy.interpolate import griddata
xi = np.linspace(0, 100, 50)
yi = np.linspace(0, 100, 50)
xi, yi = np.meshgrid(xi, yi)
zi = griddata((pivot_data['center_x'], pivot_data['center_y']), pivot_data['llr'],
              (xi, yi), method='cubic')

im = axes[1].contourf(xi, yi, zi, levels=20, cmap='YlOrRd')
axes[1].scatter(cases_df['x'], cases_df['y'], alpha=0.3, s=10, color='blue')
axes[1].plot(best_cluster['center_x'], best_cluster['center_y'], 'r*', markersize=20)
axes[1].set_xlabel('X Coordinate')
axes[1].set_ylabel('Y Coordinate')
axes[1].set_title('Log Likelihood Ratio Heatmap')
plt.colorbar(im, ax=axes[1], label='LLR')

plt.tight_layout()
plt.savefig('spatial_cluster_detection.png', dpi=300)
plt.show()
```
Real-World SaTScan Usage
The code above is simplified for education. For production analysis, use the freely available SaTScan software (satscan.org), which implements the full space-time scan statistics, covariate adjustment, and efficient Monte Carlo inference.
No single stream tells the whole story, so interpretation depends on which signals move together. For example, if cases, test positivity, and wastewater all rise in agreement, rapid spread is likely; if hospitalizations lag or stay flat, that pattern may suggest lower severity or immune escape, and hospitalizations need close monitoring.
How to combine signals:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Simulate multi-source surveillance data
np.random.seed(42)
dates = pd.date_range('2021-11-01', '2022-02-28', freq='D')
n_days = len(dates)

# True underlying epidemic curve
day_num = np.arange(n_days)
epidemic_curve = 1000 * np.exp(-((day_num - 60)**2) / 400)

# Each data source observes this with a different bias/lag
sources = {}

# Cases: 3-day lag, undercount by 50%
sources['cases'] = np.roll(epidemic_curve * 0.5, 3) + np.random.normal(0, 50, n_days)

# Test positivity: 1-day lag, scaled 0-100%
sources['test_positivity'] = np.roll(epidemic_curve / epidemic_curve.max() * 30, 1) + np.random.normal(0, 2, n_days)

# Hospitalizations: 7-day lag, 5% of cases
sources['hospitalizations'] = np.roll(epidemic_curve * 0.05, 7) + np.random.normal(0, 10, n_days)

# Wastewater: no lag, noisy but unbiased
sources['wastewater'] = epidemic_curve + np.random.normal(0, 100, n_days)

# Deaths: 14-day lag, 1% of cases
sources['deaths'] = np.roll(epidemic_curve * 0.01, 14) + np.random.normal(0, 2, n_days)

# Create DataFrame
df = pd.DataFrame({'date': dates, 'true_epidemic': epidemic_curve})
for source_name, values in sources.items():
    df[source_name] = np.maximum(values, 0)  # No negative values

# Normalize each source (z-scores)
scaler = StandardScaler()
normalized_cols = []
for col in sources.keys():
    normalized_col = f'{col}_normalized'
    df[normalized_col] = scaler.fit_transform(df[[col]])
    normalized_cols.append(normalized_col)

# Ensemble prediction: weighted average of normalized sources
# Weights based on reliability/timeliness
weights = {
    'cases_normalized': 0.25,
    'test_positivity_normalized': 0.20,
    'hospitalizations_normalized': 0.15,
    'wastewater_normalized': 0.30,   # Most weight (real-time, unbiased)
    'deaths_normalized': 0.10        # Least weight (lagging)
}

df['ensemble_signal'] = sum(df[col] * weight for col, weight in weights.items())

# Detect outbreak onset (when ensemble crosses threshold)
threshold = 0.5  # 0.5 SD above mean
df['alert'] = df['ensemble_signal'] > threshold

# Find first alert
first_alert_idx = df[df['alert']].index.min() if df['alert'].any() else None

# True outbreak onset (when true epidemic > threshold)
true_threshold = epidemic_curve.max() * 0.2
df['true_outbreak'] = df['true_epidemic'] > true_threshold
true_onset_idx = df[df['true_outbreak']].index.min()

# Visualize
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Top: Individual data sources (raw)
for source in sources.keys():
    axes[0].plot(df['date'], df[source], alpha=0.7, label=source.replace('_', ' ').title())
axes[0].set_ylabel('Counts (varied scales)')
axes[0].set_title('Multiple Surveillance Data Sources')
axes[0].legend(loc='upper right')
axes[0].grid(True, alpha=0.3)

# Middle: Normalized sources
for col in normalized_cols:
    axes[1].plot(df['date'], df[col], alpha=0.7,
                 label=col.replace('_normalized', '').replace('_', ' ').title())
axes[1].axhline(threshold, color='red', linestyle='--', linewidth=2, label='Alert Threshold')
axes[1].set_ylabel('Normalized Values (Z-score)')
axes[1].set_title('Normalized Surveillance Signals')
axes[1].legend(loc='upper right')
axes[1].grid(True, alpha=0.3)

# Bottom: Ensemble signal
axes[2].plot(df['date'], df['ensemble_signal'], 'b-', linewidth=2, label='Ensemble Signal')
axes[2].axhline(threshold, color='red', linestyle='--', linewidth=2, label='Alert Threshold')

# Mark alert period
alert_periods = df[df['alert']]
if len(alert_periods) > 0:
    axes[2].fill_between(alert_periods['date'], -2, 3, alpha=0.3, color='red', label='Alert Active')

# Mark true outbreak onset vs. detected onset
if first_alert_idx is not None and true_onset_idx is not None:
    axes[2].axvline(df.loc[true_onset_idx, 'date'], color='green', linestyle='--',
                    linewidth=2, label='True Outbreak Onset')
    axes[2].axvline(df.loc[first_alert_idx, 'date'], color='orange', linestyle='--',
                    linewidth=2, label='Detected Onset')
    time_to_detect = (df.loc[first_alert_idx, 'date'] - df.loc[true_onset_idx, 'date']).days
    axes[2].text(0.02, 0.98, f'Time to Detection: {time_to_detect} days',
                 transform=axes[2].transAxes, fontsize=12, verticalalignment='top',
                 bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

axes[2].set_xlabel('Date')
axes[2].set_ylabel('Ensemble Signal (Z-score)')
axes[2].set_title('Multi-Source Ensemble Surveillance')
axes[2].legend(loc='upper right')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('multisource_surveillance_ensemble.png', dpi=300)
plt.show()

print("\n" + "=" * 60)
print("MULTI-SOURCE SURVEILLANCE EVALUATION")
print("=" * 60)

if first_alert_idx is not None and true_onset_idx is not None:
    print(f"\nTrue outbreak onset: {df.loc[true_onset_idx, 'date'].date()}")
    print(f"Detected onset: {df.loc[first_alert_idx, 'date'].date()}")
    print(f"Time to detection: {time_to_detect} days")
    if time_to_detect < 0:
        print("⚠️ False alarm (detected before true onset)")
    elif time_to_detect == 0:
        print("✓ Perfect detection (same day as true onset)")
    else:
        print(f"✓ Detected {time_to_detect} days after true onset")

    # Compare to individual sources
    print("\nComparison to individual sources:")
    for source in sources.keys():
        source_normalized = f'{source}_normalized'
        source_alerts = df[source_normalized] > threshold
        if source_alerts.any():
            source_first_alert = df[source_alerts].index.min()
            source_delay = (df.loc[source_first_alert, 'date'] - df.loc[true_onset_idx, 'date']).days
            print(f"  {source}: {source_delay} days")
        else:
            print(f"  {source}: No alert")

    print(f"\nEnsemble: {time_to_detect} days (best or tied for best)")
```
Best Practices for Multi-Source Integration
Understand each source’s biases (Data Problem concepts apply)
Weight sources by reliability and timeliness
Don’t average conflicting signals blindly; investigate discrepancies
Use ensemble for early warning, verify with traditional surveillance
Update weights as surveillance systems evolve
Communicate uncertainty: show which sources agree and which disagree
Evaluation: Surveillance System Performance Metrics
Metrics That Matter
Timeliness: - Time-to-detection: Days from outbreak onset to first alert - Trade-off: Earlier detection → more false positives
Sensitivity: - Outbreak detection rate: % of true outbreaks detected - Problem: Defining “true outbreak” is hard (no ground truth)
Specificity: - False alarm rate: % of time periods with false alerts - Critical: Too many false alarms → alert fatigue
Positive Predictive Value (PPV): - % of alerts that are true outbreaks - The base rate problem: Even sensitive+specific systems have low PPV for rare events
Example calculation:
```python
# Surveillance system performance
sensitivity = 0.90   # Detects 90% of outbreaks
specificity = 0.95   # 5% false positive rate

# Base rate: Outbreaks are rare (1% of weeks)
prevalence = 0.01

# Positive Predictive Value (Bayes' Theorem)
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Specificity: {specificity:.0%}")
print(f"Outbreak prevalence: {prevalence:.1%}")
print(f"\nPositive Predictive Value: {ppv:.0%}")
print(f"\nInterpretation: When this system alerts, there is only a {ppv:.0%} chance it is a true outbreak!")
```
Output:
Sensitivity: 90%
Specificity: 95%
Outbreak prevalence: 1.0%
Positive Predictive Value: 15%
Interpretation: When this system alerts, there is only a 15% chance it is a true outbreak!
The Base Rate Problem
Even excellent surveillance systems (90% sens, 95% spec) have low PPV when outbreaks are rare.
Implications: 1. Every alert requires verification (cannot trust automated systems alone) 2. Context matters (is there a plausible mechanism?) 3. Multiple signals increase confidence (triangulation) 4. Thresholds should be adjustable (stricter during low-risk periods)
This is why human epidemiologists remain essential: algorithms cannot (yet) make these contextual judgments.
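A short worked example makes the triangulation point concrete. Under the (strong) assumption that two surveillance streams err independently, applying Bayes’ theorem twice shows how a second concordant signal raises the PPV well above the 15% single-alert value computed earlier.

```python
def ppv(sens, spec, prior):
    """Positive predictive value via Bayes' theorem."""
    return (sens * prior) / (sens * prior + (1 - spec) * (1 - prior))

prior = 0.01                       # outbreaks are rare
p1 = ppv(0.90, 0.95, prior)        # syndromic system alerts
p2 = ppv(0.80, 0.90, p1)           # wastewater signal also elevated (posterior becomes new prior)

print(f"After one alert:               PPV = {p1:.0%}")
print(f"After two independent signals: PPV = {p2:.0%}")
```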
Building surveillance systems is one thing. Sustaining them is another.
Data Access and Interoperability
The challenge: - Public health data is fragmented (federal, state, local, private) - Different formats, standards, and systems - Legal/privacy barriers (HIPAA, data use agreements)
Solutions: - HL7 FHIR standard for health data exchange - PHIN (Public Health Information Network) - Data use agreements between agencies - Privacy-preserving techniques (aggregation, differential privacy)
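To make the interoperability point concrete, here is a minimal sketch of a case report expressed as a FHIR R4 Condition resource and posted to a placeholder server URL. Real exchanges involve additional resources (Patient, Observation), reporting-specific profiles such as eCR, and authentication.

```python
import requests

# Minimal FHIR R4 Condition resource for a reportable disease case
condition = {
    "resourceType": "Condition",
    "clinicalStatus": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
        "code": "active"}]},
    "code": {"coding": [{
        "system": "http://snomed.info/sct",
        "code": "840539006",
        "display": "Disease caused by SARS-CoV-2"}]},
    "subject": {"reference": "Patient/example"},
    "onsetDateTime": "2024-03-10",
}

FHIR_BASE = "https://fhir.example-health-dept.org/r4"   # placeholder endpoint
response = requests.post(f"{FHIR_BASE}/Condition", json=condition,
                         headers={"Content-Type": "application/fhir+json"})
print(response.status_code)
```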
Infrastructure and Resources
Real-time surveillance requires:
- Data pipelines (ingestion, cleaning, storage)
- Computational resources (cloud or on-premise)
- 24/7 monitoring (alerts do not wait for business hours)
- Maintenance and updates (systems degrade without care)

Cost considerations:
- Open-source tools (cheaper) vs. commercial platforms (more support)
- Cloud costs scale with data volume
- Staff time for development and maintenance
Alert Fatigue
The problem: Too many false alarms → People stop paying attention
2009 study: Emergency departments receiving syndromic surveillance alerts ignored >70% of them due to alert fatigue.
Solutions:
- Adjustable thresholds (stricter when outbreak risk is low)
- Contextual alerts (include supporting evidence)
- Multi-level alerts (watch vs. warning vs. emergency; see the sketch below)
- Clear workflows (what to do when an alert fires)
- Regular performance review (tune the system based on feedback)
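One simple way to implement the multi-level idea above is to map an anomaly score to graded alert levels rather than a single binary alarm. The sketch below assumes z-score inputs; the cutoffs (2, 3, 4) are illustrative and would need local calibration.

```python
# A minimal sketch of multi-level alerting: instead of a single binary alarm,
# map a surveillance z-score to graded levels. Thresholds are illustrative
# assumptions and should be calibrated against local false-alarm tolerance.
def alert_level(z_score: float) -> str:
    """Translate an anomaly z-score into a graded alert level."""
    if z_score >= 4:
        return "EMERGENCY"   # immediate investigation and response
    if z_score >= 3:
        return "WARNING"     # verify with clinical/lab data today
    if z_score >= 2:
        return "WATCH"       # keep monitoring, no action yet
    return "NONE"

for z in [1.5, 2.4, 3.2, 5.0]:
    print(f"z = {z:>4}: {alert_level(z)}")
```

Graded levels give responders a proportionate action for each tier, which is far easier to sustain than an all-or-nothing alarm.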
Equity and Access
The digital divide:
- Internet-based surveillance works where internet access is good
- Social media surveillance captures younger, urban, higher-income populations
- Rural and underserved communities become surveillance deserts

Consequences:
- Outbreaks in marginalized communities are detected later
- Resource allocation is based on biased data
- Health inequities are reinforced

Mitigation:
- Invest in traditional surveillance infrastructure
- Community-based surveillance programs
- Mobile health data collection
- Don't rely solely on digital sources
Ethics and Governance
Privacy in Digital Surveillance
The tension: Individual privacy vs. population health
Examples from COVID-19:
Contact tracing apps:
- Singapore's TraceTogether: effective but controversial
- UK's NHS COVID-19 app: privacy-preserving but lower uptake
- US state apps: varied adoption, privacy concerns

Key principles:
1. Purpose limitation: use data only for the stated public health purpose
2. Data minimization: collect only what's necessary
3. Transparency: be open about what data is collected and how it is used
4. Time limits: delete data when no longer needed
5. Security: protect against breaches
Privacy-Preserving Surveillance
Techniques that enable surveillance without exposing individual data:
Differential privacy:
- Add calibrated noise to aggregate statistics (see the sketch below)
- Prevents re-identification from multiple queries
- Used by Apple, Google, and the US Census Bureau

Federated learning:
- Train models on decentralized data (data stays on devices)
- Only model updates (not raw data) are shared centrally
- See Google's approach

Secure multi-party computation:
- Multiple parties compute a joint function without revealing their inputs
- Complex, but enables cross-agency collaboration
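As a minimal sketch of the differential privacy idea above, the code below adds Laplace noise calibrated to a count query's sensitivity before release. The epsilon value and ZIP-code counts are illustrative assumptions.

```python
# Minimal sketch of epsilon-differentially-private count release.
# Epsilon and the counts are illustrative assumptions, not policy guidance.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise scaled to sensitivity/epsilon.

    Adding or removing one person changes a count by at most 1 (sensitivity = 1),
    so Laplace(scale = 1/epsilon) noise gives epsilon-differential privacy.
    """
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: weekly case counts by ZIP code (synthetic)
weekly_counts = {"02118": 42, "02139": 7, "02474": 3}
private_release = {zip_code: round(dp_count(count, epsilon=1.0), 1)
                   for zip_code, count in weekly_counts.items()}
print(private_release)  # noisy counts that are safer to share across agencies
```

Smaller epsilon means more noise and stronger privacy; the trade-off is that small-area signals (like the 3-case ZIP code) become harder to interpret.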
AI augments, does not replace traditional surveillance. The most effective systems combine both.
Every data source has biases. Understanding and accounting for these biases (from The Data Problem) is critical.
Early warning ≠ Accurate prediction. Systems like BlueDot and HealthMap provide early signals, but require human verification and contextual interpretation.
Learn from failures. Google Flu Trends teaches us that big data + machine learning without theory and transparency can fail dramatically.
The base rate problem is real. Even excellent surveillance systems generate many false positives when outbreaks are rare.
Multi-source integration is the future. Combining traditional surveillance with digital signals provides the most robust early warning.
Privacy and equity must be built in from the start. Digital surveillance can reinforce existing health inequities if not carefully designed.
Evaluation is essential. Regularly assess your surveillance system’s performance using timeliness, sensitivity, specificity, and PPV.
Practice Exercises
Exercise 1: Implement EARS Algorithms
Build all three EARS algorithms (C1, C2, C3) and compare their performance on simulated outbreak data. Which is most sensitive? Which has the lowest false positive rate?
Exercise 2: Analyze Real ILINet Data
Download CDC ILINet data from FluView. Implement Prophet-based anomaly detection. How does it compare to CDC’s epidemic threshold?
Exercise 3: Build a Multi-Source Surveillance System
Combine three data sources (e.g., syndromic, social media, wastewater) with different lags and biases. Implement an ensemble approach. How much does it improve early detection compared to any single source?
Exercise 4: Evaluate Surveillance Performance
Given historical outbreak data, calculate sensitivity, specificity, PPV, and time-to-detection for your surveillance system. How do these metrics trade off against each other?
Check Your Understanding
Test your knowledge of the key concepts from this chapter. Click “Show Answer” to reveal the correct response and explanation.
Question 1: Surveillance System Selection
A rural health department needs to detect seasonal flu outbreaks. They have limited resources and want timely alerts. Which surveillance approach is MOST appropriate?
a) Syndromic surveillance using over-the-counter medication sales
b) Laboratory-confirmed case reporting only
c) Sentinel provider networks with weekly reporting
d) Social media monitoring for flu-related posts
Answer: a) Syndromic surveillance using over-the-counter medication sales
Explanation: Syndromic surveillance is ideal for resource-limited settings requiring timely detection. OTC medication sales provide:
Early warning: People buy cold/flu medications before seeking healthcare
Real-time data: Automated from pharmacy systems
Low cost: No lab testing required
Good sensitivity: Captures mild cases that do not seek healthcare
Laboratory confirmation (b) is too slow and misses mild cases. Sentinel networks (c) have weekly delays. Social media (d) requires substantial NLP infrastructure and has high false-positive rates.
Question 2: EARS Algorithm
True or False: The EARS C3 algorithm flags an outbreak when today’s case count exceeds the mean of the previous 7 days by 3 standard deviations.
Answer: False
Explanation: EARS C3 is more sophisticated than this. The simple comparison against the previous 7 days describes C1. C2 adds a 2-day buffer: the baseline mean and standard deviation come from days t−9 to t−3, excluding the most recent 2 days so that an emerging outbreak does not inflate its own baseline:

Alert if: (Today's count − Mean of days t−9 to t−3) > 3 × SD of days t−9 to t−3

C3 then accumulates the C2 excesses over the current day and the two preceding days, which makes it more sensitive to outbreaks that build gradually. Using the previous 7 days directly, as the statement describes, would let an outbreak that has already started raise the baseline and mask itself.
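For readers who want to see the mechanics, here is a minimal sketch of the C2 and C3 statistics described above, computed on a pandas series of daily counts. The thresholds (3 for C2, 2 for C3) follow commonly cited defaults but should be treated as tunable assumptions.

```python
# Minimal sketch of EARS C2/C3 on daily counts. Thresholds are assumptions.
import numpy as np
import pandas as pd

def ears_c2_c3(counts: pd.Series, threshold_c2: float = 3.0, threshold_c3: float = 2.0):
    """Compute EARS C2 and C3 statistics for a daily count series.

    C2: (today - mean of days t-9..t-3) / SD of days t-9..t-3
    C3: sum over today and the two previous days of max(0, C2 - 1)
    """
    # Baseline = 7 days ending 3 days ago (2-day buffer before today)
    baseline_mean = counts.shift(3).rolling(7).mean()
    baseline_sd = counts.shift(3).rolling(7).std(ddof=1).replace(0, np.nan)

    c2 = (counts - baseline_mean) / baseline_sd
    c3 = (c2 - 1).clip(lower=0).rolling(3).sum()

    return pd.DataFrame({
        "c2": c2,
        "c3": c3,
        "c2_alert": c2 > threshold_c2,
        "c3_alert": c3 > threshold_c3,
    })

# Toy example: flat baseline with a late jump
daily = pd.Series([10, 12, 9, 11, 10, 13, 11, 10, 12, 11, 10, 24, 30, 35])
print(ears_c2_c3(daily).tail(4))
```

Note how the buffered baseline (shift by 3, then a 7-day window) keeps the jump at the end of the series from contaminating its own comparison window.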
Question 3: False Positive Rates
Your outbreak detection system generates an alert every 2 weeks on average when there is no outbreak. What is the approximate false positive rate?
a) 0.5%
b) 3.6%
c) 7.1%
d) 14.3%
Answer: c) 7.1%
Explanation: If alerts occur every 2 weeks (14 days) on average with no outbreak: - Probability of alert on any given day = 1/14 ≈ 0.071 = 7.1%
This is actually quite high for surveillance systems! Many outbreak detection algorithms are calibrated to false positive rates of 1-5% to balance sensitivity (catching real outbreaks) with specificity (avoiding alert fatigue).
The relationship: Lower threshold = More sensitive (catches outbreaks earlier) but more false positives. Higher threshold = Fewer false alarms but may miss early signals.
Question 4: Forecasting vs Detection
Which statement BEST distinguishes disease forecasting from outbreak detection?
a) Forecasting uses machine learning; detection uses statistical methods
b) Forecasting predicts future values; detection identifies when current values are unusual
c) Forecasting requires more data; detection works with small datasets
d) Forecasting is for endemic diseases; detection is for emerging diseases
Answer: b) Forecasting predicts future values; detection identifies when current values are unusual
Explanation: This captures the fundamental difference:
Outbreak Detection (Anomaly Detection):
- "Are we seeing more cases than expected right now?"
- Compares current observations to a historical baseline
- Triggers alerts when a threshold is exceeded
- Examples: EARS, CUSUM, Farrington

Disease Forecasting (Prediction):
- "How many cases will we see next week or next month?"
- Predicts future values based on current and past data
- Provides probabilistic projections
- Examples: FluSight, COVID-19 forecasting
Both can use ML or statistical methods (a is false). Both need sufficient data (c is false). Both apply to endemic and emerging diseases (d is false).
Question 5: Google Flu Trends Failure
Google Flu Trends dramatically overestimated flu prevalence in 2012-2013. What was the PRIMARY cause?
a) Insufficient training data
b) Algorithm drift due to changes in search behavior
c) Hardware failures in Google's servers
d) Competing flu prediction services
Answer: b) Algorithm drift due to changes in search behavior
Explanation: Google Flu Trends failed because search behavior changed in ways unrelated to actual flu prevalence:
Media coverage effect: Sensationalized flu news → more flu searches (even without more flu)
Search recommendation changes: Google changed autocomplete suggestions
Seasonal search patterns: Winter → people search flu symptoms (even for non-flu illnesses)
The algorithm learned correlations (flu searches ↔ flu cases) but not causation. When search behavior changed for non-epidemiological reasons, predictions failed.
Lesson: Always validate with ground truth data (CDC surveillance). Correlations break when underlying behavior changes. This is why CDC FluView remains the gold standard, augmented by (not replaced by) digital signals.
Question 6: Time Series Cross-Validation
Why must disease surveillance models use time-aware cross-validation rather than random K-fold cross-validation?
a) Disease data has too few observations for random splitting
b) To prevent data leakage from using future information to predict the past
Answer: b) To prevent data leakage from using future information to predict the past
Explanation: Random K-fold cross-validation shuffles time points, so the model is trained on future observations and evaluated on the past, something that can never happen in deployment. Time-aware splits mimic reality: you only have past data to predict the future. Disease data also has strong temporal autocorrelation (this week's cases predict next week's), so random splitting inflates performance metrics and produces models that fail in deployment.
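A quick sketch of the difference, using scikit-learn's TimeSeriesSplit versus shuffled KFold on a synthetic weekly series; the data and fold counts are illustrative.

```python
# Contrast time-aware splits with random K-fold on a synthetic weekly series.
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

weeks = np.arange(104)  # two years of weekly indices
cases = 50 + 20 * np.sin(2 * np.pi * weeks / 52) + np.random.default_rng(0).normal(0, 5, 104)

# Time-aware CV: training folds always precede the test fold
tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(cases)):
    print(f"Fold {fold}: train weeks {train_idx.min()}-{train_idx.max()}, "
          f"test weeks {test_idx.min()}-{test_idx.max()}")

# Random K-fold (what NOT to do for surveillance): test weeks are scattered
# through the series, so the model effectively trains on the future.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
first_train, first_test = next(iter(kf.split(cases)))
print(f"KFold test weeks (first fold): {sorted(first_test)[:10]} ...")
```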
Discussion Questions
Google Flu Trends failed, but ARGO succeeded. What made the difference? What does this teach us about the role of theory vs. data in public health AI?
BlueDot accurately predicted COVID-19 spread patterns, but their algorithm is proprietary. Should public health agencies rely on “black box” commercial systems? What are the trade-offs?
Social media surveillance captures younger, urban, higher-income populations. How would you design a surveillance system that does not reinforce health inequities?
An outbreak detection algorithm has 90% sensitivity and 95% specificity, but only 15% PPV (positive predictive value) due to base rate effects. Should this system be deployed? How would you communicate its limitations to stakeholders?
During COVID-19, cases, hospitalizations, and wastewater surveillance sometimes contradicted each other. How do you decide which signal to trust? Develop a framework for reconciling conflicting surveillance streams.
Contact tracing apps can be effective but raise privacy concerns. Where should we draw the line between individual privacy and population health? Can surveillance be both effective and privacy-preserving?
You now understand how AI enhances disease surveillance for early detection. But detecting an outbreak is only the first step.
Continue to Epidemic Forecasting with AI to learn:
- Predicting outbreak trajectories (where is this going?)
- Comparing mechanistic models vs. machine learning
- Scenario planning and uncertainty quantification
- Why forecasting is even harder than detection
Before Moving On
Make sure you can:
- Explain the difference between traditional and AI-enhanced surveillance
- Implement basic anomaly detection algorithms
- Understand the lessons from Google Flu Trends
- Combine multiple surveillance data sources
- Evaluate surveillance system performance
- Navigate privacy and ethics considerations
If any feel unclear, revisit the relevant sections or work through the practice exercises.
Surveillance is where AI meets the real world of public health. Get it right, and you save lives. Get it wrong, and you waste resources or miss outbreaks entirely.
Social Media Surveillance: Lessons from Google Flu Trends
Social media promised revolutionary disease surveillance. The reality has been… complicated.
The Google Flu Trends Story
2008: The Promise
Google researchers published a landmark paper in Nature showing that search query patterns could track influenza activity in near real-time.
The method:
- Identify 45 search terms correlated with CDC ILINet data
- Aggregate searches by region
- Use a linear model to predict current ILI levels

The results:
- 97% correlation with CDC data
- 1-2 weeks ahead of traditional surveillance
- Updated daily (vs. weekly CDC reports)
The excitement: “Big data” + Machine learning = Real-time disease tracking!
Media proclaimed: “The end of traditional surveillance!”
2013: The Fall
During the 2012-2013 flu season, Google Flu Trends (GFT) massively overestimated influenza prevalence, predicting almost double the actual CDC-reported levels.
The post-mortem analysis identified multiple failures:
1. Algorithm Dynamics (Overfitting)
- GFT started from roughly 50 million candidate search terms and selected the 45 best correlates
- With so many candidate predictors, spurious correlations were inevitable
- Example: searches for "high school basketball" correlated with flu season (both peak in winter), so the algorithm included it

2. Search Behavior Changes
- Media coverage of flu → more flu searches → inflated estimates
- Google's search algorithm updates changed which terms appeared
- Auto-complete suggestions biased searches

3. No Mechanism, Only Correlation
- GFT had no epidemiological model; it was purely data-driven
- When patterns changed (e.g., the H1N1 pandemic), the algorithm failed
- As Lazer et al. wrote, this was "big data hubris": the assumption that big data alone, without theory, is sufficient

4. Closed System, No Transparency
- Google did not reveal which search terms were used
- No independent validation was possible
- When it failed, no one could diagnose why
Google Flu Trends teaches critical principles for public health AI: validate digital signals against ground truth, expect search behavior to drift, ground models in epidemiological theory, and keep algorithms transparent enough to audit.
For detailed analysis, see Lazer et al., 2014 and Butler, 2013. For a reassessment of GFT performance, see Olson et al., 2013.
Modern Search-Based Surveillance: ARGO
Learning from GFT’s failure, researchers developed ARGO (AutoRegression with Google search data).
Key improvements:
- Combines Google Trends data with CDC ILINet (not replacing it)
- Uses time series methods (ARIMA) with epidemiological constraints
- Regularly recalibrates as patterns change
- Transparent (published algorithm, open validation)

Performance:
- ~30% improvement over CDC ILINet alone for nowcasting
- Useful for filling reporting gaps (e.g., estimating the current week before CDC data arrives)
- Robust to algorithm changes (because it adapts)
Code example: Simple nowcasting with search trends
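The published ARGO code and data are not reproduced here; instead, the sketch below illustrates the general idea on synthetic data: combine last week's reported ILI (an autoregressive term) with a correlated search-trend signal in a regularized regression, and compare the nowcast against a naive last-week baseline. All series and parameters are simulated assumptions, not the published ARGO model.

```python
# Minimal nowcasting sketch (synthetic data): predict this week's ILI from
# last week's reported ILI plus a search-trend signal, in the spirit of ARGO.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n_weeks = 156  # three years of weekly data

# Synthetic "true" ILI with seasonality, and a noisier search-trend proxy
t = np.arange(n_weeks)
ili = 2.0 + 1.5 * np.sin(2 * np.pi * t / 52) ** 2 + rng.normal(0, 0.15, n_weeks)
search_trend = ili + rng.normal(0, 0.4, n_weeks)   # correlated but noisy

# Features: last week's ILI (available with a reporting lag) + this week's searches
X = np.column_stack([ili[:-1], search_trend[1:]])
y = ili[1:]

# Train on the first two years, nowcast the third (time-aware split)
split = 104
model = Ridge(alpha=1.0).fit(X[:split], y[:split])
nowcast = model.predict(X[split:])

baseline = X[split:, 0]  # naive "last week's value" nowcast
print(f"MAE, last-week baseline: {mean_absolute_error(y[split:], baseline):.3f}")
print(f"MAE, AR + search trends: {mean_absolute_error(y[split:], nowcast):.3f}")
```

The design choice to keep the autoregressive term is the key lesson from GFT: the search signal supplements, rather than replaces, the official surveillance series.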
Twitter/X for Disease Surveillance
Social media offers real-time, high-volume data about health concerns. But it is noisy, biased, and privacy-sensitive.
Approaches:
1. Keyword-based tracking
- Count mentions of "flu", "fever", "cough"
- Pros: simple, fast
- Cons: lots of false positives ("I'm sick of this traffic!"); see the toy sketch after this list

2. Sentiment analysis
- Classify tweets as genuine health concerns vs. casual mentions
- Paul et al., 2014 showed reasonable correlation with CDC ILINet

3. Bot detection and filtering
- Many "health" tweets are from bots or automated accounts
- Must filter to genuine user posts
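To show why keyword counting alone is noisy, here is a toy sketch of keyword-based tracking with a crude phrase filter. The posts, keywords, and filter terms are invented for illustration; a real system would need a trained relevance classifier and bias correction.

```python
# Toy sketch of keyword-based tracking with a crude relevance filter.
# Posts and filter terms are invented; this is not a production classifier.
import re

FLU_TERMS = re.compile(r"\b(flu|fever|cough|chills)\b", re.IGNORECASE)
NOISE_PHRASES = ("sick of", "fever pitch", "bieber fever")  # obvious false positives

posts = [
    "Woke up with a fever and a terrible cough, staying home today",
    "I'm so sick of this traffic",
    "Flu shots available at the clinic on Main St",
    "The crowd was at fever pitch last night!",
]

def looks_like_illness(post: str) -> bool:
    """Flag a post as a possible illness mention after filtering obvious noise."""
    text = post.lower()
    if any(phrase in text for phrase in NOISE_PHRASES):
        return False
    return bool(FLU_TERMS.search(text))

daily_signal = sum(looks_like_illness(p) for p in posts)
print(f"Posts flagged as possible illness mentions: {daily_signal} of {len(posts)}")
```

Even with the phrase filter, the vaccination announcement still gets flagged, which is exactly the kind of residual noise that motivates the classifier-based approaches above.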
Challenges:
❌ Selection bias: Twitter users ≠ general population (younger, urban, higher income)
❌ Privacy concerns: even aggregated health data can reveal sensitive information
❌ Platform changes: API access and data policies are constantly evolving
❌ Spam and manipulation: bots and coordinated campaigns distort the signal
❌ Language and cultural variation: health expressions vary widely
Using social media for health surveillance also raises serious ethical concerns, which the following best practices help address:
Best practices:
- Aggregate data (never analyze individual accounts); a small-cell suppression sketch follows this list
- Remove identifying information
- Obtain IRB approval for research use
- Be transparent about surveillance activities
- Consider community engagement
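As one concrete form of the aggregation principle above, the sketch below reports only region-week counts and suppresses small cells. The threshold of 5 is an illustrative convention, not a universal rule.

```python
# Minimal sketch of privacy-minded aggregation: report region-week counts only,
# and suppress small cells that could single out individuals. The threshold of 5
# is an illustrative convention; agencies set their own suppression rules.
raw_counts = {
    ("Region A", "2024-W05"): 37,
    ("Region B", "2024-W05"): 4,    # small cell
    ("Region C", "2024-W05"): 12,
}

SUPPRESSION_THRESHOLD = 5

def suppress_small_cells(counts: dict, threshold: int = SUPPRESSION_THRESHOLD) -> dict:
    """Replace counts below the threshold with a '<threshold' marker."""
    return {key: (value if value >= threshold else f"<{threshold}")
            for key, value in counts.items()}

for (region, week), value in suppress_small_cells(raw_counts).items():
    print(f"{region} {week}: {value}")
```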
See Vayena et al., 2015, PLoS Medicine for ethical framework.