HCLS Foundation Model Map

Foundation models are moving fast across biology, chemistry, and clinical medicine. This is my map of prominent models across healthcare and life sciences, auto-generated from my research notes.

Biology

| Model | Category | Modality | Description | Tags |
| --- | --- | --- | --- | --- |
| AlphaGenome | Genomics | DNA | DeepMind regulatory-genomics model for variant-effect prediction across output types including gene expression, splicing, chromatin, and contact maps. | private |
| DNABERT-S | Genomics | DNA | Genome foundation model that learns species-aware DNA embeddings. | |
| DNAGPT | Genomics | DNA | Generalized pre-trained model for a range of DNA sequence analysis tasks. | generative |
| Enformer | Genomics | DNA | Long-range prediction of regulatory activity and gene expression from DNA sequence. | |
| Evo 2 | Genomics | DNA | DNA model with 7B- and 40B-parameter versions that processes up to 1M base pairs at single-nucleotide resolution; backbone of the Mayo Clinic EVEE variant-pathogenicity work. | generative, interpretability |
| GENA-LM | Genomics | DNA | Open-source family of foundational DNA language models designed for long sequences. | |
| GPN-MSA | Genomics | DNA | Alignment-based DNA language model for genome-wide variant-effect prediction. | |
| HyenaDNA | Genomics | DNA | Long-context DNA sequence model based on the Hyena architecture. | |
| Nucleotide Transformer | Genomics | DNA | DNA foundation models trained on 3,202 human genomes plus genomes from many other species. | multispecies |
| ATOM-1 | Transcriptomics | RNA | RNA foundation model trained on chemical mapping data to learn RNA structure and function. | private |
| CellPLM | Transcriptomics | scRNA | Cell language model pre-trained beyond single cells by incorporating spatially resolved transcriptomics. | |
| ERNIE-RNA | Transcriptomics | RNA | RNA language model with structure-enhanced representations. | |
| GeneCompass | Transcriptomics | scRNA | Knowledge-informed, cross-species single-cell foundation model for gene regulation. | multispecies |
| Geneformer | Transcriptomics | RNA | Transformer pre-trained on about 30M human single-cell transcriptomes; widely used for drug-target and cell-state inference. | |
| Orthrus | Transcriptomics | RNA | Evolutionary and functional RNA foundation model. | |
| RiNALMo | Transcriptomics | RNA | General-purpose RNA language model that generalizes to structure prediction tasks. | |
| RNA-FM | Transcriptomics | RNA | Interpretable RNA foundation model trained on unannotated ncRNA sequences for structure and function prediction. | |
| RNABERT | Transcriptomics | RNA | Masked-LM model that produces informative RNA base embeddings for structural alignment and clustering. | |
| scBERT | Transcriptomics | RNA | BERT-style single-cell RNA-seq foundation model for cell type annotation. | |
| scFoundation | Transcriptomics | scRNA | Large-scale foundation model for single-cell transcriptomics. | |
| scGPT | Transcriptomics | RNA | Single-cell foundation model encoding gene and cell relationships across large atlases. | |
| scMulan | Transcriptomics | scRNA | Multitask generative pre-trained language model for single-cell analysis. | generative |
| scPRINT | Transcriptomics | scRNA | Single-cell RNA foundation model pre-trained on 50M cells for robust gene network prediction. | |
| UNI-RNA | Transcriptomics | RNA | Universal pre-trained RNA representation model. | private |
| Universal Cell Embeddings | Transcriptomics | scRNA | Single-cell foundation model producing universal cell representations across species. | multispecies |
| UTR-LM | Transcriptomics | mRNA-UTR | Language model for decoding the 5' untranslated regions (UTRs) of mRNA. | |
| xTrimoGene | Transcriptomics | scRNA | Efficient, scalable representation learner for single-cell RNA-seq data. | private |
| AlphaFold 3 | Protein | protein | Predicts atomic structures of complexes containing proteins, DNA, RNA, and ligands. | multimodal, private |
| Ankh | Protein | protein | Optimized protein language model for general-purpose protein modeling. | |
| BioEmu-1 | Protein | protein-conformation | Scalable emulation of protein equilibrium ensembles via generative deep learning. | generative |
| CaLM | Protein | codon | Codon-level language embeddings for protein engineering. | |
| CARP | Protein | protein | Convolutional protein sequence model that is competitive with transformers. | |
| ESM-3 | Protein | protein | Multimodal protein language model over sequence, structure, and function. | multimodal, generative |
| Evolla | Protein | protein + text | Decodes the molecular language of proteins for question answering and annotation. | multimodal |
| GearNet | Protein | protein-structure | Geometric structure pretraining for protein representation learning. | |
| HelixFold-Single | Protein | protein-structure | MSA-free protein structure prediction using a protein language model. | |
| OntoProtein | Protein | protein | Protein pretraining with Gene Ontology embeddings. | |
| OpenFold3 | Protein | protein | Open-source effort toward reproducible biomolecular co-folding models. | |
| PrimateAI-3D | Protein | protein | Variant-effect prediction model that integrates 3D protein structure. | private |
| ProGen2 | Protein | protein | Family of autoregressive protein language models, up to 6.4B parameters, for protein design and fitness prediction. | generative |
| ProteinBERT | Protein | protein | Universal deep-learning model of protein sequence and function, pretrained with GO annotations. | |
| ProtGPT2 | Protein | protein | Deep unsupervised generative language model for protein design. | generative |
| ProTrek | Protein | protein + structure + text | Tri-modal contrastive learning across protein sequence, structure, and text. | multimodal |
| ProtST | Protein | protein + text | Multimodal learning over protein sequences and biomedical text. | multimodal |
| ProtTrans | Protein | protein | Family of protein language models (T5, BERT, ALBERT, XLNet, ELECTRA) trained via self-supervision at HPC scale. | |
| RoseTTAFold All-Atom | Protein | protein | Open structure-prediction and design model from the Baker Lab that extends RoseTTAFold to broader biomolecular assemblies, including proteins, nucleic acids, small molecules, and covalent modifications. | |
| SaProt | Protein | protein + structure | Protein language model with a structure-aware vocabulary. | multimodal |
| UniRep | Protein | protein | Unified rational protein engineering with sequence-based deep representation learning. | |
| xTrimoPGLM | Protein | protein | Unified 100B-scale pre-trained transformer for protein language modeling. | private |
| AbLang2 | Antibody and biologics | antibody-sequence | Antibody-specific language model for infilling and humanization. | |
| AntiBERTy | Antibody and biologics | antibody-sequence | BERT-style antibody language model trained on the Observed Antibody Space (OAS). | |
| IgLM | Antibody and biologics | antibody-sequence | Generative antibody language model for heavy- and light-chain design. | generative |
| Cell2Sentence | Multi-omics and systems | scRNA + text | Teaches LLMs the language of biology by rendering single-cell expression profiles as sentences. | multimodal |
| Nicheformer | Multi-omics and systems | scRNA + spatial | Single-cell and spatial omics foundation model for modeling tissue context. | multimodal |

Chemistry

| Model | Category | Modality | Description | Tags |
| --- | --- | --- | --- | --- |
| ChemGPT | Small-molecule | SMILES | Large-scale generative model for chemistry over SMILES. | generative |
| MegaMolBART | Small-molecule | SMILES | BART-style SMILES model, part of NVIDIA's BioNeMo umbrella. | generative, private |
| MoLFormer | Small-molecule | SMILES | Large-scale chemical language model over SMILES, originally explored with AstraZeneca. | |
| MolMIM | Small-molecule | SMILES | Controlled molecule generation with property guidance. | generative, private |

Clinical

| Model | Category | Modality | Description | Tags |
| --- | --- | --- | --- | --- |
| BiomedCLIP | Medical imaging | vision-language | Biomedical vision-language model pretrained on 15M image-text pairs from PubMed Central. | |
| CXR Foundation | Medical imaging | radiograph | Chest X-ray foundation model from Google. | private |
| Med-Gemini | Medical imaging | multimodal | Multimodal medical foundation model for radiology report generation and visual question answering. | multimodal, private |
| Med-Gemini-2D | Medical imaging | vision-language | 2D medical imaging variant of Med-Gemini. | private |
| RadFM | Medical imaging | radiology | Radiology foundation model for general medical imaging tasks. | private |
| CONCH | Digital pathology | vision-language | Vision-language pathology model from the Mahmood Lab. | private |
| H-optimus-0 | Digital pathology | WSI | Histology foundation model for downstream pathology tasks. | |
| Prov-GigaPath | Digital pathology | WSI | Gigapixel pathology foundation model with hierarchical tile-to-slide representations. | private |
| UNI | Digital pathology | WSI | General-purpose pathology foundation model from the Mahmood Lab. | private |
| Virchow | Digital pathology | WSI | Transformer trained on roughly 1.5M whole-slide images; produces pan-cancer features. | private |
| AMIE | Clinical language and patient | medical-text | Google Research diagnostic-dialogue research system for medical reasoning and conversation. | clinical-reasoning, private |
| CLMBR | Clinical language and patient | EHR-codes | Clinical Language Model-Based Representations, from Stanford's Shah Lab. | EHR, longitudinal, private |
| GatorTron | Clinical language and patient | EHR-text | Clinical language model developed at UF Health as a backbone for downstream clinical NLP. | EHR, clinical-NLP |
| Med-PaLM | Clinical language and patient | medical-text | Google's medical large language model for question answering and clinical reasoning. | clinical-reasoning, private |
| MedLM | Clinical language and patient | medical-text | Google's productized medical language model offering, closer to medical question answering than to EHR-only pretraining. | private |
| NYUTron | Clinical language and patient | EHR-text | Clinical language model trained on NYU Langone clinical notes. | EHR, clinical-NLP, private |
| Truveta Language Model | Clinical language and patient | EHR-text | Trained on the largest linked EHR corpus in the United States; supports clinical reasoning over longitudinal records. | EHR, longitudinal, clinical-reasoning, private |
| Endo-FM | Surgical video | video | Endoscopy foundation model with video pre-training for downstream endoscopic tasks. | |

Emerging

| Model | Category | Modality | Description | Tags |
| --- | --- | --- | --- | --- |
| BioT5+ | Emerging | molecules + protein + text | Generalized biological understanding with IUPAC integration and multi-task tuning. | multimodal |
| LaBraM | Emerging | EEG | Large-scale pretraining for EEG signals; an early representative of brain-waveform foundation models. | clinical-waveforms |
| METAGENE-1 | Emerging | DNA-metagenomic | Foundation model for metagenomic sequences, oriented toward microbiome and pathogen-surveillance applications. | |