Appendix A: Glossary

A comprehensive reference of artificial intelligence, machine learning, public health, and related technical terms used throughout this book.


A

Accuracy The proportion of correct predictions (both true positives and true negatives) among all predictions. Accuracy = (TP + TN) / (TP + TN + FP + FN). While intuitive, accuracy can be misleading with imbalanced datasets.

Active Learning A machine learning approach where the algorithm iteratively selects the most informative examples for human labeling, optimizing learning efficiency with minimal labeled data. Particularly useful in medical settings where expert labeling is expensive.

Adversarial Examples Inputs deliberately designed to cause an AI model to make mistakes. In healthcare, adversarial attacks could manipulate medical images or patient data to cause misdiagnosis.

AI Ethics Committee An organizational body responsible for reviewing AI projects for ethical considerations, fairness, transparency, and compliance with institutional values and regulations.

AI Winter Historical periods (1974-1980, 1987-1993) when funding and interest in AI research declined dramatically due to unmet expectations. Serves as a reminder to maintain realistic expectations.

Algorithm A step-by-step procedure or set of rules for solving a problem or accomplishing a task. In AI, algorithms process data to make predictions or decisions.

Algorithmic Bias Systematic and repeatable errors in AI systems that create unfair outcomes, such as privileging one group over others. Can arise from biased training data, flawed algorithms, or inappropriate use.

Algorithmic Fairness The principle that AI systems should make decisions that do not discriminate against individuals or groups based on protected characteristics (race, gender, age, etc.). Multiple mathematical definitions exist.

Anomaly Detection Identifying rare items, events, or observations that differ significantly from the majority of data. Used in public health for outbreak detection and fraud identification.

Artificial General Intelligence (AGI) Hypothetical AI with human-level intelligence across all domains. Current AI systems are “narrow AI” - specialized for specific tasks. AGI remains a distant goal.

Artificial Intelligence (AI) The science of creating machines capable of performing tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.

Attention Mechanism Neural network component that allows models to focus on relevant parts of input data, similar to human attention. Critical for transformer architectures and models like BERT and GPT.

AUC-ROC (Area Under the Receiver Operating Characteristic Curve) A performance metric for binary classification that measures the model’s ability to distinguish between classes across all classification thresholds. AUC ranges from 0 to 1, with 0.5 representing random chance and 1.0 representing perfect classification.
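
As an illustration, AUC-ROC can be computed from predicted probabilities with scikit-learn; the labels and scores below are invented for the sketch.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels (1 = condition present) and model-predicted probabilities
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.35, 0.62, 0.80, 0.20, 0.55, 0.45, 0.90]

# Summarizes discrimination across all thresholds: 0.5 is chance, 1.0 is perfect ranking
print(roc_auc_score(y_true, y_prob))
```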

Augmented Intelligence AI systems designed to enhance rather than replace human intelligence and decision-making. Emphasizes human-AI collaboration over automation.

Autoencoder A type of neural network trained to compress data into a lower-dimensional representation and then reconstruct the original data. Used for dimensionality reduction, anomaly detection, and denoising.

Automated Machine Learning (AutoML) Tools and methods that automate the process of applying machine learning, including feature engineering, model selection, and hyperparameter tuning. Makes ML more accessible to non-experts.

Autonomous AI AI systems that make and execute decisions without human intervention. In healthcare, fully autonomous systems raise ethical and liability concerns.


B

Backpropagation Algorithm for training neural networks by computing gradients of the loss function with respect to network weights, then updating weights to minimize loss. Enables deep learning.

Baseline Model A simple model used as a reference point for evaluating more complex models. Examples include logistic regression, decision trees, or simply predicting the majority class.

Batch Normalization Technique that normalizes inputs to each layer of a neural network, accelerating training and improving stability. Helps address internal covariate shift.

Batch Size The number of training examples used in one iteration of model training. Larger batches provide more stable gradients but require more memory.

Bayesian Methods Statistical approaches that incorporate prior beliefs and update them based on new evidence. Bayesian deep learning provides uncertainty quantification in predictions.

Benchmark Dataset A standard dataset used to compare performance of different algorithms. Examples include ImageNet (images), MIMIC-III (clinical data), and COVID-19 datasets.

Bias (Statistical) Systematic error in predictions that causes them to consistently differ from true values in one direction. Distinct from algorithmic bias, though related.

Bias-Variance Tradeoff Fundamental concept in machine learning: simple models have high bias (underfitting) but low variance, while complex models have low bias but high variance (overfitting). Goal is to balance both.

Big Data Extremely large datasets that are difficult to process using traditional methods. Characterized by volume, velocity, variety, veracity, and value (the “5 Vs”).

Binary Classification Machine learning task of categorizing data into two classes (e.g., disease/no disease, high risk/low risk).

Biomarker Measurable biological indicator of health status or disease. AI can identify novel biomarkers from complex data.

Black Box Model AI models whose internal decision-making processes are opaque and difficult to interpret. Examples include deep neural networks. Contrasts with interpretable models like decision trees.

Boosting Ensemble method that combines weak learners sequentially, with each new model focusing on examples misclassified by previous models. XGBoost and AdaBoost are popular implementations.

Bootstrap Aggregating (Bagging) Ensemble method that trains multiple models on random subsamples of data (with replacement) and averages their predictions. Random Forests use bagging.

Brier Score Metric for evaluating probabilistic predictions. Measures mean squared difference between predicted probabilities and actual outcomes. Lower is better.
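
A minimal sketch of the Brier score with scikit-learn, using invented outcomes and probabilities:

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 1, 1, 0, 1]            # hypothetical outcomes
y_prob = [0.1, 0.8, 0.6, 0.3, 0.9]  # hypothetical predicted probabilities

# Mean squared difference between predicted probability and outcome; lower is better
print(brier_score_loss(y_true, y_prob))
```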


C

Calibration The degree to which predicted probabilities match observed frequencies. A well-calibrated model that predicts 70% risk should see events occur 70% of the time.

CARE Principles Framework for indigenous data governance: Collective benefit, Authority to control, Responsibility, Ethics. Complements FAIR principles for equitable data use.

Case-Control Study Observational study comparing individuals with a condition (cases) to those without (controls) to identify risk factors. AI can analyze case-control data but cannot establish causality alone.

Catastrophic Forgetting When a neural network trained on new tasks completely forgets previously learned information. Challenge for continual learning in deployed AI systems.

Categorical Variable Variable that can take on one of a limited set of values (categories). Examples: blood type, disease diagnosis, treatment type. Requires encoding for most ML algorithms.

Causal Inference Methods for determining cause-and-effect relationships from data. Important for understanding whether interventions will work. AI can aid causal inference but does not establish causality on its own.

CE Marking European certification indicating that a medical device (including AI) conforms to EU regulations and can be sold in the European Economic Area.

Class Imbalance When one class significantly outnumbers others in training data. Common in healthcare (rare diseases, adverse events). Requires special handling to avoid biased models.

Classification Machine learning task of predicting which category an input belongs to. Binary classification has two classes; multiclass has more than two.

Clinical Decision Support System (CDSS) Software that provides clinicians with patient-specific assessments or recommendations to aid decision-making. AI-powered CDSS uses machine learning models.

Clinical Trial Prospective research study that evaluates medical interventions in human subjects. Gold standard for establishing efficacy. AI clinical trials evaluate AI tools’ clinical impact.

Clinical Validation Process of demonstrating that an AI system performs as intended in real clinical settings, with real patients, measuring clinical outcomes (not just algorithm metrics).

Clustering Unsupervised learning task of grouping similar data points together. Used in public health for disease subtyping, patient stratification, and pattern discovery.

Cohort Study Observational study following a group of people over time to examine outcomes. Prospective cohorts track forward from exposure; retrospective track backward from outcome.

Computer Vision AI field focused on enabling computers to interpret and understand visual information from images and videos. Applications include medical imaging analysis.

Concept Drift When the statistical properties of the target variable change over time, causing model performance to degrade. Common in healthcare due to evolving practices and populations.

Confounding Variable A variable that influences both the independent and dependent variables, potentially creating spurious associations. AI models can be misled by confounders.

Confusion Matrix Table showing true positives, true negatives, false positives, and false negatives. Foundation for computing sensitivity, specificity, precision, and other metrics.
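
The sketch below, using scikit-learn and invented labels, shows how several metrics in this glossary derive from the four confusion-matrix cells:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical true labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                  # recall / true positive rate
specificity = tn / (tn + fp)                  # true negative rate
precision   = tp / (tp + fp)                  # positive predictive value
accuracy    = (tp + tn) / (tp + tn + fp + fn)
```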

Continuous Learning AI systems that update their models over time as new data becomes available. Raises regulatory challenges as the “approved” model keeps changing.

Convolutional Neural Network (CNN) Deep learning architecture specialized for processing grid-like data (images). Uses convolution operations to detect features at multiple scales. Standard for medical imaging.

Correlation Statistical relationship between two variables. Correlation does not imply causation. AI models detect correlations but don’t inherently understand causality.

COVID-19 Coronavirus disease 2019, caused by SARS-CoV-2 virus. Global pandemic accelerated AI adoption in public health for forecasting, diagnosis, and drug discovery.

Cross-Validation Technique for assessing model performance by training on different subsets of data and testing on held-out portions. K-fold cross-validation divides data into k parts.
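
A brief scikit-learn sketch of 5-fold cross-validation on synthetic data (the classifier and dataset are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each fold is held out once while the model trains on the remaining four
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())
```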

Curse of Dimensionality Phenomenon in which many machine learning algorithms struggle with high-dimensional data. As dimensions increase, data becomes sparse and distance metrics become less meaningful.


D

Data Augmentation Artificially increasing training data by applying transformations (rotation, flipping, noise addition) to existing examples. Improves model generalization and robustness.

Data Colonialism Extraction of data from marginalized populations or low-income countries for benefit of wealthy nations/corporations, without fair benefit sharing or local control.

Data Drift When the distribution of input features changes over time, potentially degrading model performance. Related to but distinct from concept drift.

Data Governance Policies, procedures, and standards for managing data availability, usability, integrity, and security. Critical for responsible AI in healthcare.

Data Leakage When information from outside the training dataset is used to create the model, leading to overly optimistic performance estimates. Common pitfall in healthcare ML.

Data Poisoning Adversarial attack where training data is deliberately corrupted to compromise model performance. Security concern for AI systems.

Data Science Interdisciplinary field combining statistics, computer science, and domain expertise to extract knowledge from data. Broader than machine learning alone.

Data Sovereignty The principle that data is subject to the laws and governance structures of the nation or region where it is collected. Important for international health collaborations.

Dataset Shift When training and deployment data come from different distributions. Causes performance degradation. Types include covariate shift, prior probability shift, and concept shift.

Decision Boundary The surface in feature space that separates different classes. Linear models have linear boundaries; complex models can have arbitrarily complex boundaries.

Decision Support (vs. Autonomous Decision) AI that provides recommendations to humans (decision support) versus AI that makes and executes decisions independently (autonomous). Most healthcare AI is decision support.

Decision Tree Interpretable machine learning model that makes predictions by learning a hierarchy of if-then rules. Easy to understand but prone to overfitting.

Deep Learning Subset of machine learning using neural networks with many layers (deep architectures). Excels at learning hierarchical representations from raw data.

De-identification Process of removing personally identifiable information from data to protect privacy. Includes removing names, dates, addresses, etc. Not perfect - re-identification is possible.

Demographic Parity Fairness criterion requiring that predictions be independent of protected attributes. A model satisfies demographic parity if positive prediction rates are equal across groups.

Deployment Process of integrating a trained AI model into production systems where it will be used for real-world decision-making. Deployment introduces new challenges beyond model development.

Differential Privacy Mathematical framework for sharing aggregate information about a dataset while provably limiting information about individuals. Strongest privacy guarantee but reduces data utility.

Digital Divide Gap between those with access to digital technologies (internet, computers, smartphones) and those without. Creates health inequities in AI deployment.

Dimensionality Reduction Techniques for reducing the number of features while preserving important information. Examples include PCA, t-SNE, UMAP. Helps with visualization and computation.

Discrimination In fairness, unfair treatment of individuals based on protected characteristics. In machine learning, the ability to distinguish between classes (a desirable property).

Disease Surveillance Ongoing systematic collection, analysis, and interpretation of health data for planning, implementing, and evaluating public health interventions.

Disparate Impact When a practice has disproportionately negative effects on a protected group, regardless of intent. Legal concept applied to algorithmic fairness.

Dropout Regularization technique for neural networks that randomly deactivates neurons during training to prevent overfitting and improve generalization.


E

Edge Computing Processing data on local devices (edge) rather than centralized servers or cloud. Enables offline AI, reduced latency, and improved privacy.

Electronic Health Record (EHR) Digital version of patient medical history maintained by healthcare providers. Rich data source for AI but has quality and interoperability challenges.

Embedding Dense vector representation of data in a continuous space where similar items are close together. Used to represent words, images, or other complex data.

Emergent Properties Capabilities that arise in AI systems (particularly large models) that were not explicitly programmed or anticipated. Can be beneficial or concerning.

Ensemble Methods Machine learning techniques that combine multiple models to improve performance. Random Forests, boosting, and bagging are ensemble methods.

Epidemic Curve (Epi Curve) Graph showing the number of disease cases over time. AI can help generate and forecast epidemic curves for outbreak response.

Epidemiology Study of how diseases and health conditions are distributed in populations and what factors influence this distribution. Foundation for public health.

Epoch One complete pass through the entire training dataset during neural network training. Models typically train for multiple epochs.

Equal Opportunity Fairness criterion requiring equal true positive rates across groups. A model satisfies equal opportunity if it has equal sensitivity for all demographic groups.

Equalized Odds Fairness criterion requiring equal true positive rates AND equal false positive rates across groups. Stricter than equal opportunity.

Ethical AI AI systems designed and deployed in accordance with ethical principles including fairness, transparency, accountability, privacy, and beneficence.

Evaluation Metrics Quantitative measures of model performance. Choice depends on task and context. Common metrics: accuracy, AUC-ROC, F1 score, sensitivity, specificity.

Explainable AI (XAI) Methods and techniques for making AI decisions interpretable and understandable to humans. Includes LIME, SHAP, attention visualizations, and inherently interpretable models.

External Validation Testing a model on data from different sources than training data (different hospitals, geographic regions, time periods). Critical for assessing generalizability.


F

F1 Score Harmonic mean of precision and recall. Balances both metrics, useful when you care about both false positives and false negatives. F1 = 2 × (Precision × Recall) / (Precision + Recall).
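
For example, with scikit-learn (labels invented), the built-in metric matches the harmonic-mean formula above:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

p, r = precision_score(y_true, y_pred), recall_score(y_true, y_pred)
print(f1_score(y_true, y_pred), 2 * p * r / (p + r))  # both give the same value
```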

Fairlearn Python library from Microsoft for assessing and improving fairness in machine learning models. Provides fairness metrics and mitigation algorithms.

FAIR Principles Guidelines for scientific data management: Findable, Accessible, Interoperable, Reusable. Promotes data sharing while respecting privacy and sovereignty.

False Discovery Rate (FDR) Expected proportion of false positives among all positive predictions. Used in multiple hypothesis testing and relevant for AI screening tools.

False Negative (FN) Instance where the model incorrectly predicts the negative class when the true class is positive. In medicine, missing a disease that’s present.

False Positive (FP) Instance where the model incorrectly predicts the positive class when the true class is negative. In medicine, falsely indicating disease when absent.

Feature Individual measurable property or characteristic used as input to a machine learning model. Examples: age, blood pressure, lab values. Also called attributes or variables.

Feature Engineering Process of creating new features or transforming existing features to improve model performance. Combines domain knowledge with data processing.

Feature Importance Measure of how much each feature contributes to a model’s predictions. Helps with interpretation and feature selection.

Feature Selection Process of choosing a subset of relevant features for modeling. Reduces overfitting, improves interpretability, and decreases computational cost.

Federated Learning Machine learning approach where models are trained across multiple decentralized devices or servers holding local data, without exchanging the data itself. Preserves privacy.

Few-Shot Learning Machine learning paradigm where models learn from very few examples (typically 1-10). Valuable when labeled data is scarce, as in rare diseases.

Fine-Tuning Taking a pre-trained model and continuing training on a new, typically smaller dataset for a related task. Key technique in transfer learning.

FHIR (Fast Healthcare Interoperability Resources) Standard for exchanging healthcare information electronically. Facilitates AI development by improving data access and interoperability.

510(k) Clearance FDA pathway for medical devices (including AI) that are substantially equivalent to legally marketed devices. Most AI medical devices use this pathway.


G

Generalization A model’s ability to perform well on new, unseen data. Good generalization means the model has learned underlying patterns rather than memorizing training data.

Generative Adversarial Network (GAN) Architecture where two neural networks compete: a generator creates synthetic data and a discriminator tries to distinguish real from synthetic. Used for data augmentation, image synthesis.

Generative AI AI systems that create new content (text, images, code) rather than just analyzing existing data. Examples: GPT, DALL-E, Stable Diffusion. Emerging applications in healthcare.

Global Health Study, research, and practice emphasizing health improvement and equity for all people worldwide, particularly in low- and middle-income countries.

Gradient Vector of partial derivatives indicating the direction and rate of fastest increase of a function. Used in optimization to update model parameters during training.

Gradient Boosting Ensemble method that builds models sequentially, with each new model correcting errors of the previous ensemble. XGBoost and LightGBM are popular implementations.

Gradient Descent Optimization algorithm that iteratively adjusts model parameters in the direction that reduces the loss function. Foundation of neural network training.

Ground Truth The actual, correct answer or label for data. In medical AI, typically established by expert annotation, gold-standard diagnostic tests, or patient outcomes.


H

Hallucination When AI models (particularly large language models) generate plausible-sounding but factually incorrect or nonsensical information. Significant concern for medical applications.

Health Equity Absence of avoidable or remediable differences in health among population groups defined socially, economically, demographically, or geographically.

Health Information Exchange (HIE) Electronic sharing of health information across organizations. Enables comprehensive data for AI but raises privacy and interoperability challenges.

Heuristic Simple, efficient rule or method for problem-solving and decision-making. Not guaranteed to be optimal but often sufficient. AI can learn heuristics from data.

Hidden Layer Intermediate layer in a neural network between input and output layers. Deep networks have many hidden layers, enabling learning of complex patterns.

HIPAA (Health Insurance Portability and Accountability Act) US federal law establishing privacy and security standards for protected health information. Governs use of health data, including for AI development.

Human-in-the-Loop AI systems that incorporate human judgment in their workflow. Humans may label data, review predictions, or make final decisions based on AI recommendations.

Hyperparameter Configuration setting for a machine learning algorithm that is set before training begins (not learned from data). Examples: learning rate, number of trees, maximum depth.

Hyperparameter Tuning Process of finding optimal hyperparameter values. Methods include grid search, random search, and Bayesian optimization.
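
A minimal grid-search sketch with scikit-learn; the parameter grid and dataset are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Try every combination of these hyperparameter values with 5-fold cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```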


I

Imbalanced Data Dataset where classes are not represented equally. Common in healthcare (rare diseases, adverse events). Requires special techniques like resampling or weighted loss.

Imputation Process of replacing missing values with substituted values. Methods range from simple (mean, median) to complex (multiple imputation, ML-based).
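
As a simple illustration, scikit-learn's SimpleImputer replaces missing values with a column statistic (the small array of lab values is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical lab values with missing entries (np.nan)
X = np.array([[7.2, np.nan], [6.8, 140.0], [np.nan, 135.0], [7.0, 138.0]])

# Median imputation; multiple imputation or model-based methods are more sophisticated
print(SimpleImputer(strategy="median").fit_transform(X))
```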

Incidence Number of new cases of a disease occurring in a population during a specified time period. Incidence rate = new cases / population at risk.

Inference Process of using a trained model to make predictions on new data. Distinguished from training (learning from data).

Informed Consent Ethical and legal requirement that patients understand and agree to use of their data or participation in research. Challenges arise with secondary use of data for AI.

Institutional Review Board (IRB) Committee that reviews research involving human subjects to ensure ethical standards. Oversees AI research using patient data.

Integrated Care Coordinated healthcare services across providers, settings, and time. AI can support care coordination and identify gaps.

Interpretability The degree to which humans can understand the cause of an AI decision. Some models (decision trees) are inherently interpretable; others (deep neural networks) require explanation methods.

Interoperability Ability of different systems and organizations to exchange and use information. Critical for healthcare AI to access comprehensive patient data.

Intervention Study Research where investigators assign treatments/interventions to study participants. Includes randomized controlled trials. Necessary to prove AI improves outcomes.

IoT (Internet of Things) Network of physical devices embedded with sensors and connectivity. Healthcare IoT includes wearables, remote monitors, smart devices. Data source for AI.


J

Jupyter Notebook Open-source web application for creating documents with live code, equations, visualizations, and text. Popular tool for data science and ML development.


K

K-Fold Cross-Validation Technique where data is divided into k subsets; model is trained k times, each time using k-1 subsets for training and 1 for validation. Provides robust performance estimate.

Keras High-level neural networks API running on TensorFlow. Designed for ease of use and rapid prototyping. Integrated into TensorFlow 2.0.

Kernel In machine learning, a function that computes similarity between data points. In CNNs, a filter that slides over input data to detect features.

Knowledge Graph Structured representation of knowledge as entities and relationships. Used in healthcare for representing medical concepts, drug interactions, disease relationships.


L

Label The correct answer or outcome for a supervised learning example. In classification, the class; in regression, the target value. Also called ground truth.

Labeled Data Data that has been tagged with the correct answer (label). Required for supervised learning. In healthcare, often requires expensive expert annotation.

Large Language Model (LLM) AI model with billions of parameters trained on massive text corpora. Examples: GPT-4, PaLM, Claude. Emerging applications in medical documentation, education, research.

Latent Variable Hidden variable not directly observed but inferred from other variables. Autoencoders learn latent representations; causal models may include latent confounders.

Learning Curve Plot showing model performance as a function of training data size or training time. Helps diagnose overfitting/underfitting and determine if more data would help.

Learning Rate Hyperparameter controlling how much model parameters are adjusted during training. Too high causes instability; too low causes slow convergence.

LightGBM Gradient boosting framework by Microsoft that uses tree-based algorithms. Faster and more memory-efficient than traditional gradient boosting for large datasets.

LIME (Local Interpretable Model-agnostic Explanations) Method for explaining individual predictions by approximating the complex model locally with an interpretable model (linear model or decision tree).

Linear Regression Statistical model that predicts continuous outcomes as a linear combination of input features. Interpretable baseline for regression tasks.

Logistic Regression Statistical model for binary classification that predicts probability of an outcome using a logistic function. Despite “regression” in name, it’s a classification method.
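
A short scikit-learn sketch on synthetic data, showing that the model outputs class probabilities via the logistic function:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:3]))  # probability of each class for the first 3 rows
print(model.coef_)                 # one coefficient per feature (log-odds scale)
```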

Loss Function Mathematical function quantifying the difference between model predictions and true values. Training aims to minimize loss. Also called cost function or objective function.

Low- and Middle-Income Countries (LMICs) Countries with GNI per capita below high-income threshold (~$13,000). 80% of world population lives in LMICs. Face unique challenges for AI deployment.


M

Machine Learning (ML) Subset of AI focused on algorithms that improve automatically through experience. Includes supervised, unsupervised, and reinforcement learning.

Medical Device Instrument, apparatus, software, or material intended for medical purposes. AI-based software can be regulated as medical device depending on intended use.

Meta-Analysis Statistical method for combining results from multiple studies to increase power and arrive at overall estimate. AI can assist with systematic reviews and meta-analyses.

Metric Learning Machine learning paradigm focused on learning distance or similarity metrics. Useful for patient similarity, case retrieval, and anomaly detection.

MIMIC (Medical Information Mart for Intensive Care) Freely available critical care database containing de-identified data on 40,000+ ICU patients. Widely used for healthcare AI research.

Missing at Random (MAR) Missing data mechanism where missingness depends on observed variables but not on the missing values themselves. Allows unbiased analysis with appropriate methods.

Missing Completely at Random (MCAR) Missing data mechanism where missingness is unrelated to any variables. Simplest case - analysis on complete cases is unbiased.

Missing Not at Random (MNAR) Missing data mechanism where missingness depends on the unobserved values themselves. Most challenging case - may cause bias regardless of analysis method.

MLOps (Machine Learning Operations) Practices for deploying, monitoring, and maintaining ML models in production. Includes version control, automated testing, monitoring, and continuous integration/deployment.

Model Card Documentation describing an ML model’s intended use, training data, performance, limitations, and ethical considerations. Promotes transparency and responsible use.

Model Drift Degradation in model performance over time due to changes in data distribution, relationships, or operational environment.

Mortality Death rate in a population. Crude mortality rate is deaths per population per time period. Age-standardized rates account for population age structure.

Multi-Task Learning Training a single model to perform multiple related tasks simultaneously. Can improve performance and efficiency by sharing learned representations.

Multiclass Classification Classification with more than two classes. Examples: disease subtype classification, triage prioritization (low/medium/high/critical).

Multilayer Perceptron (MLP) Feedforward neural network with multiple layers. Classic neural network architecture, though now often superseded by more specialized architectures.

Multimodal Learning Machine learning using multiple types of data (text, images, structured data, time series). Healthcare is naturally multimodal - EHRs contain diverse data types.


N

Natural Language Processing (NLP) AI field focused on understanding, interpreting, and generating human language. Applications include medical record analysis, literature mining, chatbots.

Negative Predictive Value (NPV) Probability that an individual with a negative test result truly does not have the condition. NPV = TN / (TN + FN).

Neural Architecture Search (NAS) Automated method for designing neural network architectures. Uses ML to find optimal architectures rather than manual engineering.

Neural Network Machine learning model inspired by biological neural networks. Consists of interconnected nodes (neurons) organized in layers. Foundation of deep learning.

Normalization Scaling features to a common range (often 0-1) or distribution (often mean 0, standard deviation 1). Improves training stability and convergence.

Null Hypothesis Statistical hypothesis that there is no relationship between variables or no difference between groups. Statistical tests aim to reject or fail to reject null hypothesis.

Number Needed to Treat (NNT) Number of patients who need to receive a treatment for one additional patient to benefit. Lower NNT indicates more effective intervention.


O

Odds Ratio (OR) Measure of association between exposure and outcome. OR > 1 indicates increased odds; OR < 1 indicates decreased odds. Common in logistic regression and epidemiology.

One-Hot Encoding Representing categorical variables as binary vectors. Each category becomes a feature with value 1 if present, 0 otherwise. Required for most ML algorithms.
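
For instance, with pandas (the blood-type column is invented):

```python
import pandas as pd

df = pd.DataFrame({"blood_type": ["A", "B", "O", "AB", "O"]})

# Each category becomes its own 0/1 column
print(pd.get_dummies(df, columns=["blood_type"]))
```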

Optimization Process of finding parameter values that minimize (or maximize) an objective function. Training ML models is an optimization problem.

Ordinal Variable Categorical variable with natural ordering (mild/moderate/severe; low/medium/high). Requires appropriate encoding that preserves order.

Outcome Measure Specific result or effect used to evaluate intervention or prediction success. Examples: mortality, readmission, quality of life.

Outlier Data point that differs significantly from other observations. May indicate error, novel phenomena, or important rare event. Requires careful handling.

Outbreak Detection Identifying occurrence of disease cases exceeding expected levels. AI methods can detect outbreaks earlier than traditional surveillance.

Overfitting When a model learns training data too well, including noise and random fluctuations, causing poor generalization to new data. Addressed through regularization, cross-validation, more data.


P

P-Value Probability of observing results at least as extreme as those observed, assuming null hypothesis is true. p < 0.05 traditionally considered statistically significant.

Pandemic Epidemic occurring worldwide or over very wide area, crossing international boundaries, affecting large number of people. COVID-19 was declared a pandemic in March 2020.

Parameter Model component learned from training data. Neural network parameters include weights and biases. Distinguished from hyperparameters (set before training).

Personalized Medicine Medical approach tailoring treatment to individual patient characteristics. AI enables personalization by analyzing complex patient data to optimize interventions.

Positive Predictive Value (PPV) Probability that an individual with a positive test result truly has the condition. PPV = TP / (TP + FP). Also called precision.

Post-Market Surveillance Ongoing monitoring of medical device performance and safety after regulatory approval and commercial distribution. Critical for AI systems that may drift.

Precision In classification, the proportion of positive predictions that are correct. Precision = TP / (TP + FP). High precision means few false alarms.

Precision Medicine See Personalized Medicine. Focus on understanding molecular and genetic basis of disease for targeted interventions.

Predetermined Change Control Plan (PCCP) FDA regulatory framework allowing pre-specification of anticipated model modifications, enabling updates without new submissions for each change.

Predictive Analytics Using historical data, statistical algorithms, and ML to forecast future outcomes. Applications include readmission risk, disease progression, outbreak prediction.

Prevalence Proportion of a population having a disease or condition at a specific time. Prevalence affects predictive values of diagnostic tests.

Principal Component Analysis (PCA) Dimensionality reduction technique that transforms correlated features into smaller set of uncorrelated components (principal components) that capture maximum variance.
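
A brief scikit-learn sketch reducing synthetic 20-dimensional data to 2 components:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=200, n_features=20, random_state=0)

# Standardize first so no single feature dominates the variance
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print(X_2d.shape)  # (200, 2)
```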

Privacy-Preserving Machine Learning Techniques enabling ML while protecting individual privacy. Includes differential privacy, federated learning, homomorphic encryption, secure multi-party computation.

Protected Health Information (PHI) Individually identifiable health information covered by HIPAA privacy regulations. Includes 18 specific identifiers that must be removed for de-identification.

Proxy Variable Variable used as substitute for another that cannot be directly measured. Can introduce bias if proxy is imperfect or reflects historical discrimination.

Public Health Science and art of preventing disease, prolonging life, and promoting health through organized efforts of society. Population-level focus distinguishing it from clinical medicine.

PyTorch Open-source deep learning framework developed by Facebook. Known for flexibility, dynamic computation graphs, and strong research community adoption.


Q

Quantization Reducing precision of model parameters (e.g., from 32-bit to 8-bit). Decreases model size and speeds up inference with minimal accuracy loss. Critical for edge deployment.

Quasi-Experimental Study Study evaluating interventions without random assignment. Uses methods like difference-in-differences, regression discontinuity to approximate experimental conditions.


R

Random Forest Ensemble method that builds multiple decision trees on random subsets of data and features, then averages their predictions. Robust and relatively interpretable.
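
A minimal sketch with scikit-learn, also showing the feature importances mentioned elsewhere in this glossary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_)  # relative contribution of each feature
```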

Randomized Controlled Trial (RCT) Experimental study where participants are randomly assigned to intervention or control groups. Gold standard for establishing causality. AI-RCTs test clinical impact.

Recall See Sensitivity. Proportion of actual positives correctly identified. Recall = TP / (TP + FN).

Receiver Operating Characteristic (ROC) Curve Plot of true positive rate (sensitivity) vs false positive rate (1-specificity) across classification thresholds. AUC-ROC summarizes overall performance.

Recurrent Neural Network (RNN) Neural network architecture for sequential data (time series, text). Maintains internal state (memory) to process sequences. Variants include LSTM and GRU.

Regression Machine learning task of predicting continuous numerical outcomes. Examples: blood pressure, length of stay, survival time.

Regularization Techniques for preventing overfitting by adding constraints or penalties. Examples: L1/L2 regularization, dropout, early stopping.

Reinforcement Learning (RL) ML paradigm where agent learns to make decisions by interacting with environment and receiving rewards/penalties. Applications in treatment optimization, resource allocation.

Relative Risk (RR) Ratio of probability of event in exposed group to probability in unexposed group. RR > 1 indicates increased risk; RR < 1 indicates decreased risk.

Reproducibility Ability to obtain consistent results using same data and analysis methods. Critical for scientific validity. Challenges include randomness, software versions, data access.

Residual Network (ResNet) Deep neural network architecture using skip connections that allow gradients to flow directly through network. Enables training of very deep networks (100+ layers).

Risk Stratification Process of categorizing individuals into risk groups (low/medium/high) based on likelihood of adverse outcome. Guides targeted interventions.

Robustness Model’s ability to maintain performance despite variations in input data, noise, or adversarial perturbations. Critical for reliable AI deployment.


S

Sampling Bias Systematic error due to non-random sampling, causing sample to not represent target population. Common in healthcare databases (sicker patients at tertiary centers).

Scikit-Learn Popular Python library for machine learning providing simple interface for classification, regression, clustering, dimensionality reduction, and preprocessing.

Screening Testing asymptomatic individuals to detect disease early. AI-based screening tools must balance sensitivity (catching cases) with specificity (avoiding false alarms).

Selection Bias Distortion of statistical analysis due to non-random selection of individuals, groups, or data. Can cause AI models to learn inappropriate patterns.

Sensitivity Proportion of actual positives correctly identified by model. Sensitivity = TP / (TP + FN). Also called true positive rate or recall. Critical for diagnostic tools.

Sensitivity Analysis Testing how model outputs change with variations in inputs or assumptions. Assesses robustness and identifies influential factors.

Sequence-to-Sequence Model Neural architecture mapping input sequences to output sequences. Used in machine translation, text summarization, medical report generation.

SHAP (SHapley Additive exPlanations) Method for explaining individual predictions based on game theory. Assigns each feature an importance value for a particular prediction. Model-agnostic.

Sigmoid Function S-shaped activation function mapping any input to (0,1) range. Used in logistic regression and neural network output layers for binary classification.

Social Determinants of Health (SDOH) Non-medical factors affecting health outcomes: economic stability, education, healthcare access, neighborhood environment, social context. AI can incorporate SDOH for holistic predictions.

Specificity Proportion of actual negatives correctly identified by model. Specificity = TN / (TN + FP). Also called true negative rate. Important for avoiding false alarms.

Standardization Scaling features to have mean 0 and standard deviation 1. Alternative to normalization; assumes roughly normal distributions.
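
A sketch with scikit-learn contrasting standardization with min-max normalization (the blood-pressure values are toy data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[120.0], [140.0], [160.0], [180.0]])  # hypothetical systolic BP values

print(StandardScaler().fit_transform(X).ravel())  # mean 0, standard deviation 1
print(MinMaxScaler().fit_transform(X).ravel())    # scaled to the [0, 1] range
```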

Stochastic Gradient Descent (SGD) Optimization algorithm that updates parameters using gradients computed on random mini-batches rather than entire dataset. Enables training on large datasets.

Stratification Dividing population into subgroups (strata) based on characteristics. Used in sampling, analysis, and ensuring balanced train/test splits.

Supervised Learning Machine learning using labeled data where algorithm learns to map inputs to outputs. Includes classification and regression. Most common ML paradigm in healthcare.

Surveillance Bias Bias occurring when detection of disease is related to exposure. More screening in certain groups leads to more detected cases, confounding exposure-disease relationship.

Synthetic Data Artificially generated data mimicking real data’s statistical properties. Used for privacy protection, augmentation, testing. Quality varies - may not capture all real data complexity.


T

Target Leakage Form of data leakage where features that wouldn’t be available at prediction time are included in model. Example: using discharge diagnosis to predict hospital admission outcome.

TensorFlow Open-source deep learning framework developed by Google. Production-focused with comprehensive ecosystem including TensorFlow Lite (mobile), TensorFlow Serving (deployment).

Test Set Data held out from model training, used only for final performance evaluation. Must be representative and truly held out to provide unbiased performance estimate.

Threshold Decision boundary for converting predicted probabilities to class predictions. Default 0.5 for binary classification, but should be optimized for application’s sensitivity/specificity requirements.
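
A sketch of applying a non-default threshold to predicted probabilities (the probabilities and cutoff are illustrative):

```python
import numpy as np

y_prob = np.array([0.15, 0.42, 0.55, 0.72, 0.30])  # hypothetical predicted risks

# Lowering the threshold below 0.5 trades more false positives for higher sensitivity
threshold = 0.4
y_pred = (y_prob >= threshold).astype(int)
print(y_pred)  # [0 1 1 1 0]
```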

Time Series Data points indexed in time order. Examples: vital signs monitors, disease surveillance counts, patient trajectories. Requires specialized ML methods.

Time Series Cross-Validation Cross-validation variant respecting temporal order - training always precedes testing. Prevents temporal leakage. Essential for time-ordered healthcare data.
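
Scikit-learn's TimeSeriesSplit illustrates the idea: each training window precedes its test window.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 time-ordered observations

for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)  # test indices always come later
```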

Tokenization Splitting text into smaller units (tokens) for processing. Tokens can be words, subwords, or characters. First step in NLP pipelines.

Training Set Data used to train ML model by adjusting parameters to minimize loss. Typically 60-80% of available data, with remainder split between validation and test sets.

Transfer Learning Reusing model trained on one task as starting point for related task. Enables effective learning with less data. Example: ImageNet pre-training for medical images.

Transformer Neural architecture based on attention mechanisms, without recurrence or convolution. Foundation of modern NLP (BERT, GPT). Increasingly used for healthcare tasks.

Transparency Openness about AI system’s functioning, limitations, and decision-making processes. Legal and ethical requirement in many jurisdictions.

True Negative (TN) Instance where model correctly predicts negative class when true class is negative. In medicine, correctly identifying absence of disease.

True Positive (TP) Instance where model correctly predicts positive class when true class is positive. In medicine, correctly identifying presence of disease.

Type I Error False positive in hypothesis testing - rejecting null hypothesis when it’s true. Controlled by significance level (α).

Type II Error False negative in hypothesis testing - failing to reject null hypothesis when it’s false. Related to statistical power (1-β).


U

Uncertainty Quantification Estimating confidence or reliability of predictions. Bayesian methods and ensemble approaches provide principled uncertainty estimates. Critical for clinical decision support.

Underfitting When model is too simple to capture underlying patterns in data, performing poorly on both training and test data. Addressed by increasing model complexity or adding features.

Unique Device Identification (UDI) System for marking and identifying medical devices through distribution and use. Required by FDA. Enables tracking and post-market surveillance of AI medical devices.

Unsupervised Learning Machine learning using unlabeled data where algorithm finds hidden patterns or structure. Includes clustering, dimensionality reduction, anomaly detection.

Upsampling Increasing representation of minority class in training data to address class imbalance. Techniques include random oversampling, SMOTE. Alternative to downsampling majority class.
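
A minimal sketch of random oversampling with scikit-learn's resample utility (SMOTE, from the separate imbalanced-learn package, generates synthetic examples instead):

```python
import numpy as np
from sklearn.utils import resample

X_minority = np.array([[1.0], [2.0], [3.0]])  # hypothetical minority-class rows

# Sample with replacement until the minority class matches the majority-class size
X_upsampled = resample(X_minority, replace=True, n_samples=10, random_state=0)
print(X_upsampled.shape)  # (10, 1)
```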


V

Validation Set Data used during model development to tune hyperparameters and make development decisions. Distinct from test set, which is only used for final evaluation.

Vanishing Gradient Problem in training deep networks where gradients become extremely small, preventing effective weight updates in early layers. Addressed by ReLU activation, batch normalization, skip connections.

Variable Importance See Feature Importance. Quantifies contribution of each input variable to model predictions.

Variance In machine learning, model’s sensitivity to fluctuations in training data. High variance models change dramatically with different training data (overfitting).

Version Control System for tracking changes to code over time. Git is standard. Essential for reproducible ML research and production systems.

Vision Transformer (ViT) Transformer architecture adapted for image analysis. Treats image patches as sequence tokens. Alternative to CNNs for medical imaging.


W

Weak Supervision Learning from imperfect or noisy labels. Includes programmatically generated labels, crowdsourced annotations, or distant supervision. Reduces need for expensive expert labeling.

Weight Learnable parameter in neural networks determining strength of connection between neurons. Optimized during training to minimize loss function.

WHO (World Health Organization) United Nations agency for international public health. Publishes ethical guidelines for health AI, coordinates global health responses.

Workflow Integration Incorporating AI tools into existing clinical or public health workflows. Poor integration is common cause of AI deployment failures.


X

XAI See Explainable AI.

XGBoost (eXtreme Gradient Boosting) Optimized gradient boosting library. Highly efficient and accurate, particularly for tabular data. Winner of many ML competitions. Popular for healthcare applications.


Y

YOLO (You Only Look Once) Real-time object detection algorithm. Applications include identifying anatomical structures in medical images, detecting protective equipment use, analyzing crowd density for social distancing.


Z

Zero-Shot Learning Machine learning paradigm where model makes predictions for classes not seen during training. Achieved through transfer learning or by learning from descriptions. Useful for rare diseases.

Z-Score Number of standard deviations a value is from the mean. Used for outlier detection and feature normalization. Z = (x - μ) / σ.
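
For example, with NumPy (temperature values invented):

```python
import numpy as np

x = np.array([98.6, 99.1, 101.2, 98.4, 100.0])  # hypothetical temperature readings

z = (x - x.mean()) / x.std()  # Z = (x - mu) / sigma
print(z)                      # values far from 0 are candidate outliers
```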


Additional Resources

Online Glossaries:
- AI Index Glossary (Stanford HAI)
- Google Machine Learning Glossary
- Microsoft AI Glossary

Medical AI Terminology:
- FDA Software as a Medical Device Guidance
- WHO AI Ethics Guidance

Public Health Terms:
- CDC Glossary of Epidemiology Terms
- WHO Health Topics