Genetics, Genomics, Rare Diseases
Computational approaches to understanding genetic variation and rare disease mechanisms
Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases
Surveyed pipelines across 12 academic centers, finding consensus in variant calling and QC but divergence in prioritization and data integration.
Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements
Pioneering pairwise mutual information thresholding to form biologically meaningful relevance networks.
Linking gene expression data with patient survival times using partial least squares
Demonstrates that partial least squares can model censored survival data to identify gene expression signatures correlated with patient outcomes.
Genetic Diagnosis on Patients with Previously Undiagnosed Disease
The Undiagnosed Diseases Network applied exome/genome sequencing to 382 patients and achieved a 35% diagnostic yield (132/382), with genomic diagnoses prompting changes in therapy (21%), diagnostic testing (37%), and genetic counseling (36%), and defining 31 new syndromes.
Polygenic risk scores for autoimmune related diseases are significantly different in cancer exceptional responders
Chen et al. compared PRSs across autoimmune diseases in exceptional responders versus typical cancer patients and found significantly elevated scores for type 1 diabetes, hypothyroidism, and psoriasis.
Simulation of undiagnosed patients with novel genetic conditions
A pipeline simulates realistic patients with novel genetic conditions and shows that common gene prioritization methods underperform on these cases.
Comprehensive Analysis of Tissue-wide Gene Expression and Phenotype Data Reveals Tissues Affected in Rare Genetic Disorders
Feiglin et al. combined multi‑tissue expression profiles with phenotype associations, revealing distinct patterns of tissue tropism in rare genetic diseases.
Genetic Misdiagnoses and the Potential for Health Disparities
Manrai et al. analyzed public exome data and clinical testing records, finding that variants once deemed pathogenic were recategorized as benign—predominantly in individuals of African ancestry—underscoring the necessity of sequencing diverse populations and using ancestry‑matched controls for variant interpretation.
Population-Based Penetrance of Deleterious Clinical Variants
Forrest et al. evaluated 37,780 pathogenic or loss-of-function variants in 72,434 individuals from the BioMe and UK Biobanks, demonstrating that penetrance is generally low and variable by gene, age, and ancestry—highlighting the importance of population-based penetrance estimates for accurate risk stratification.
An International Effort towards Standards for Best Practices in Clinical Genome Sequencing (CLARITY Challenge)
Brownstein et al. convened multiple teams to analyze standardized genome sequencing cases, revealing convergence in bioinformatic pipelines but notable variability in medical interpretation and clinical reporting, thereby identifying areas requiring further standardization.
Creation and Implications of a Phenome‑Genome Network
Butte & Kohane built a network linking UMLS‑annotated phenotypic and environmental concepts from GEO with genes showing differential expression, clustering data sets by phenotype and uncovering novel gene associations such as aging regulators—paving the way toward a Human Phenome Project.
A Gene Expression Profile of Stem Cell Pluripotentiality and Differentiation Is Conserved across Diverse Solid and Hematopoietic Cancers
Palmer et al. defined a 189‑gene stem‑cell gene set (SCGS) using an unbiased filter on Affymetrix data, showing that SCGS activity orders human and murine samples by plasticity and correlates with tumor grade in multiple cancers—offering a quantitative measure of cancer stem‑like transcriptional activity.
Analysis of Gene Expression in a Developmental Context Emphasizes Distinct Biological Leitmotifs in Human Cancers
Naxerova et al. mapped gene expression from 32 tumor types onto developmental timelines derived from ten embryonic processes, identifying three classes of cancers with distinct developmental signatures.
Gene regulation and DNA damage in the ageing human brain
Lu et al. used transcriptional profiling of human frontal cortex samples from individuals aged 26–106 years to identify genes whose expression declines after age 40. They demonstrate that oxidative DNA damage selectively accumulates in promoters of age‑downregulated genes.
Using electronic health records to drive discovery in disease genomics
Kohane proposes leveraging the codified and narrative data in EHRs to cost‑effectively accelerate genomic research at population scale, reproducing and extending GWAS findings across diverse populations, while highlighting regulatory and consent challenges to broad EDGR adoption.
Conserved mechanisms across development and tumorigenesis revealed by a mouse development perspective of human cancers
Kohane et al. projected human medulloblastoma gene‑expression profiles onto a mouse cerebellar development timeline (P1–P60), finding that tumors most closely mirror early postnatal stages, with metastatic MBs aligning to a narrow developmental window, thereby demonstrating that developmental context can illuminate tumor biology and model generation.
Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements
Butte & Kohane introduce a method computing pairwise mutual information across all genes in an expression dataset, applying a threshold to construct “Relevance Networks” of 22 gene clusters from 2,467 genes in 79 samples, and demonstrate the biological significance of each network for functional genomics analysis.
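As a rough illustration of the relevance-network idea summarized above (pairwise mutual information followed by a threshold), here is a minimal Python sketch. The toy expression values, the equal-width binning, the bin count, and the threshold are all assumptions chosen for illustration; they are not the settings or data used by Butte & Kohane.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi

def relevance_edges(expression, threshold, n_bins=3):
    """Return (gene1, gene2, MI) for every gene pair whose MI meets the threshold."""
    binned = {g: discretize(v, n_bins) for g, v in expression.items()}
    edges = []
    for g1, g2 in combinations(sorted(binned), 2):
        mi = mutual_information(binned[g1], binned[g2])
        if mi >= threshold:
            edges.append((g1, g2, mi))
    return edges

# Hypothetical toy data: three "genes" measured in six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 4.2, 6.1, 8.3, 9.9, 12.2],  # tracks geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # no consistent relation to geneA
}
edges = relevance_edges(expression, threshold=1.0)  # threshold is an assumption
for g1, g2, mi in edges:
    print(f"{g1} -- {g2}: MI = {mi:.3f} bits")
```

In this toy case only the geneA/geneB pair survives the threshold. In practice the threshold would need to be chosen against a null distribution (for example, from permuted data) rather than fixed by hand.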
Artificial Intelligence/Machine Learning
Pioneering AI applications in healthcare and biomedical research
Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases
SHEPHERD, a knowledge-graph–grounded few‑shot learning framework that accurately diagnoses patients with rare genetic diseases across multiple cohorts (Undiagnosed Diseases Network, MyGene2, and Deciphering Developmental Disorders).
Artificial Intelligence in Medicine
Offers a broad view of the future of medical AI research and announces the launch of the NEJM AI journal.
Medical Artificial Intelligence and Human Values
Framework for incorporating human values and ethical considerations into AI clinical decision-support systems.
Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions
Introduces the Alignment Compliance Index and demonstrates variable alignment effectiveness across three LLMs in medical triage.
The potential of Generative Pre-trained Transformer 4 (GPT-4) to analyse medical notes in three different languages: a retrospective model-evaluation study
Across eight university hospitals in the USA, Colombia, Singapore, and Italy, GPT-4 demonstrated robust multilingual clinical text understanding.
Artificial intelligence in healthcare
Seminal review of AI applications and implications for healthcare, covering technical advances and implementation challenges.
A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data
Presents a unified inferential framework for measures like NRI and IDI with censored survival data.
Machine Learning in Medicine
Rajkomar, Dean & Kohane review how ML algorithms can process vast healthcare data to support prognosis, diagnosis, treatment, and clinician workflows, discuss integration challenges including data quality and clinical workflow fit, and envision a future where ML meaningfully augments medical practice.
Heterogeneity of continuous glucose monitoring features and their clinical associations in a type 2 diabetes population
Analysed CGM and electronic health record data from 6,533 individuals with type 2 diabetes. Clustering revealed four distinct feature patterns with heterogeneous associations to clinical covariates, underscoring the potential of CGM‑derived metrics to inform precision diabetes management.
Adversarial attacks on medical machine learning
Finlayson et al. outline how small, carefully designed perturbations (“adversarial examples”) can subvert state‑of‑the‑art medical deep‑learning classifiers across multiple clinical domains, warn of healthcare‑specific incentives for such attacks, and call for research into defenses to safeguard clinical deployments.
Framing the challenges of artificial intelligence in medicine
Yu & Kohane discuss key hurdles for safe AI integration in clinical settings—data quality, algorithm reliability, workflow compatibility, and patient trust—and emphasize that addressing these challenges is critical for realizing AI’s potential in medicine.
Big Data and Machine Learning in Health Care
Beam & Kohane highlight how ML techniques can scale in performance as healthcare data sets grow in size.
Longitudinal histories as predictors of future diagnoses of domestic abuse
Developed Bayesian models using routine diagnostic codes to predict domestic abuse diagnoses 10–30 months in advance, highlighting the potential for early identification and intervention.
Biases in electronic health record data due to processes within the healthcare system: retrospective observational study
Analyzing 669 452 patients across two Boston hospitals, the authors show that ordering patterns (time of day, day of week, frequency) predict three‑year survival more accurately than actual test results in 68% of tests, underscoring the need to model healthcare processes in EHR research.
Bayesian approach to discovering pathogenic SNPs in conserved protein domains
Bayesian algorithm that integrates evolutionary and biochemical features to predict pathogenic nsSNPs in conserved domains, achieving 90% specificity when tested on OMIM and dbSNP variants.
Fuzzy logic controller for weaning neonates from mechanical ventilation
Developed a fuzzy logic controller using heart rate, respiratory rate, tidal volume, and oxygen saturation trends to adjust SIMV settings for newborns.
Temporal reasoning in medical expert systems
Presents methods for temporal abstraction, constraint propagation, and diagnostic evaluation, establishing frameworks for handling time‑dependent medical data in clinical decision support and influencing subsequent research in biomedical temporal reasoning.
Clinical Informatics
Electronic health records, clinical decision support, and healthcare information systems
Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study
LLMs outperformed structured-data models in evaluating first-time seizures.
SMART on FHIR: a standards‑based, interoperable apps platform for electronic health records
Mandel et al. describe the creation and industry prototyping of SMART on FHIR, demonstrating feasibility across multiple EHR vendors.
Postsurgical prescriptions for opioid naive patients and association with overdose and misuse: retrospective cohort study
Brat et al. show that each refill is associated with a 44.0% increase in misuse and each additional week of use with a 19.9% increase in hazard.
Enabling phenotypic big data with PheNorm
Yu et al. introduce PheNorm, which uses anchor features and a mixture model with denoising self‑regression to generate phenotype predictions from unlabeled EHR data.
Development of an Algorithm to Identify Patients with Physician‑Documented Insomnia
Kartoun et al. combine structured ICD‑9 codes with unstructured clinical note mentions, achieving an AUROC of 0.83 versus 0.55 for billing codes alone, and identify 36,810 insomnia patients—fewer than 17% of whom had insomnia billing codes.
Finding the missing link for big biomedical data
Explores the technical and social challenges of integrating de‑identified datasets across institutions, advocating for federated data linkage frameworks.
Medicine's uncomfortable relationship with math: calculating positive predictive value
Manrai et al. highlight pervasive challenges in applying statistical reasoning to clinical decision‑making and call for enhanced quantitative training in medicine.
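For readers who want the arithmetic behind positive predictive value, the following is a minimal Python sketch using Bayes' rule. The prevalence, sensitivity, and specificity below are illustrative assumptions, not figures taken from the summary above.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence                    # diseased and test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # healthy but test positive
    return true_pos / (true_pos + false_pos)

# Assumed inputs: a rare disease (1 in 1,000), a perfectly sensitive test,
# and a 5% false positive rate (specificity 0.95).
ppv = positive_predictive_value(prevalence=0.001, sensitivity=1.0, specificity=0.95)
print(f"PPV = {ppv:.1%}")
```

Under these assumptions the PPV is roughly 2%, far lower than intuition tends to suggest, because false positives among the many healthy people outnumber true positives among the few who are ill.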
Autism
Computational approaches to understanding autism spectrum disorders
Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis
Hierarchical clustering of longitudinal ICD‑9 codes in ASD patients revealed four subgroups (seizure, multisystem, psychiatric, unspecific), indicating heterogeneous comorbidity patterns with etiologic and therapeutic implications.
Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders
Developed a 55‑gene blood expression signature that classified male ASD cases with AUC 0.70 and found early evidence of immunological dysregulation in a subgroup.
Gene expression analysis in Fmr1KO mice identifies an immunological signature in brain tissue and mGluR5-related signaling in primary neuronal cultures
Finds an immunological pathway signature in Fmr1KO embryonic brain tissue contrasting with synaptic signatures in primary neuronal cultures.
Association of Sex With Recurrence of Autism Spectrum Disorder Among Siblings
In 3.17 million children from 1.58 million families, recurrence rates were 12.9% in male and 4.2% in female siblings if the older sibling was male, and 16.7% and 7.6% respectively if the older sibling was female.
Integrative analysis of genetic data sets reveals a shared innate immune component in autism spectrum disorder and its co-morbidities
Three-tier transcriptomic meta-analysis identifies Toll‑like receptor and chemokine signaling as common to ASD and its co‑morbid diseases.
Finding a new balance between a genetics-first or phenotype-first approach to the study of disease
Argues for integrating genetics-first and phenotype-first strategies with high-throughput phenotyping to enhance disease understanding.
Therapeutics
Drug discovery, repurposing, and therapeutic intervention research
The Tell-Tale Heart: Population-Based Surveillance Reveals an Association of Rofecoxib and Celecoxib with Myocardial Infarction
Population-level surveillance at two Boston hospitals found an 18.5% rise in MI hospitalizations concurrent with COX-2 inhibitor prescriptions. The hospitals were unaware of the association during the exposure period.
Drug target-gene signatures that predict teratogenicity are enriched for developmentally related genes
Identifies developmentally enriched target‑gene signatures that predict teratogenic risk with 79% accuracy.
Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks
Joined expression profiles of 7,245 genes in 60 cancer cell lines with sensitivity data for 5,084 drugs to build “relevance networks,” identifying candidate single‑gene determinants of chemotherapeutic response.
Rapid Identification of Myocardial Infarction Risk Associated With Diabetes Medications Using Electronic Medical Records
EHR analysis of thousands of patients led to "black-boxing" of a hypoglycemic agent.
Policy
Healthcare policy, regulation, and ethical frameworks for medical AI
Medicine. Reestablishing the researcher-patient compact
Provides an early blueprint for leveraging modern information technology to restore bidirectional communication between researchers and participants while preserving privacy and autonomy.
Multidimensional results reporting to participants in genomic studies: getting it right
Introduces communicability as a research variable to structure ethics-based reporting and align participant preferences with results disclosure frameworks.
Medical Artificial Intelligence and Human Values
In this review, Yu et al. examine how and where human values and ethics do (and do not) inform AI programs.
To do no harm – and the most good – with AI in health care
Goldberg et al. discuss ethical and practical frameworks to guide safe, equitable, and transparent AI deployment in medicine.
The AI Revolution in Medicine: GPT-4 and Beyond
Early book on how LLMs may transform healthcare practice and research, including policy implications and regulatory considerations.
Designing a public square for research computing
Proposes principles to boost adoption of research computing.
No small change for the health information economy
Mandl and Kohane discuss how monolithic EHRs can be replaced by a thriving app economy.
Understanding Covid Vaccine Efficacy over Time - Bridging a Gap Between Public Health and Health Care
Discusses how linking vaccination dates with clinical data enables near–real‑time monitoring of vaccine efficacy to inform both public health policy and clinical care.
Perspective: The Human Values Project
Motivates an international Human Values Project by exploring AI‑driven medical triage decisions and proposing studies to capture descriptive and normative clinical decision dynamics.