Find open-source science resources

Deep probabilistic framework for single-cell and spatial omics analysis, integrating scVI, scANVI, totalVI and other VAE-based models for batch correction, cell annotation, multi-omics integration, and RNA velocity (scverse/NumFOCUS, Nature Methods 2018/2024)

Active · 1.7K stars · 4 days ago

BSD-3-Clause

Dorado

Oxford Nanopore's official deep-learning basecaller for nanopore sequencing, converting raw electrical signals into DNA/RNA sequences with integrated modified-base (methylation) detection and efficient CPU/GPU inference; foundational tool for long-read genomics, epigenetics, and real-time sequencing analysis (nanoporetech, 846+ stars, actively maintained)

Active · 847 stars · 1 week ago

C++

CellRank

Probabilistic framework for inferring cell fate decisions and trajectory dynamics from multi-view single-cell data using Markov chains and machine learning, integrating RNA velocity, pseudotime, and metabolic labeling to predict differentiation paths and terminal states (scverse/Theis Lab, 449+ stars, BSD 3-Clause)

Active · 454 stars · 2 weeks ago

BSD-3-Clause

GENERator (bioRxiv 2026)

Long-context generative genomic foundation model using 6-mer tokenization for DNA sequence modeling and generation, with v2 model families for prokaryote and eukaryote genomes and pretrained weights available on HuggingFace (GenerTeam, 460+ stars, MIT License, 2025-2026)

Active · 460 stars · 3 weeks ago

cellxgene (Chan Zuckerberg Initiative)

Interactive explorer for single-cell transcriptomics data enabling visualization of UMAP/t-SNE embeddings, differential expression analysis, and cross-dataset comparison through a fast web-based interface; widely adopted for exploring atlas-scale single-cell datasets and integrating with AI/ML analysis workflows (773+ stars, MIT License)

Active · 781 stars · 3 weeks ago

JavaScript

OmicVerse

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging Bindea, scanpy and squidpy ecosystems (Nature Communications 2024)

Active · 1.1K stars · 3 weeks ago

State (Arc Institute, bioRxiv 2025)

Machine learning model predicting cellular perturbation response across diverse contexts with State Transition (ST) and State Embedding (SE) variants, featuring CLI tooling, PyPI distribution, and Virtual Cell Challenge integration (575+ stars)

Active · 609 stars · 3 weeks ago

mLLMCelltype

Multi-LLM consensus framework for automated cell type annotation in single-cell transcriptomics, integrating predictions from 10+ large language models with iterative discussion and uncertainty quantification to reduce single-model biases, achieving up to 95% accuracy without reference datasets; available as CRAN R package and PyPI Python package with Scanpy/Seurat integration (2025)

Active · 649 stars · 4 weeks ago

AlphaGenome

Google DeepMind's unified DNA sequence foundation model predicting molecular consequences of genetic variants from single-base resolution up to 1 megabase context, jointly outputting thousands of regulatory tracks (RNA expression, splicing, chromatin accessibility, TF binding, contact maps) for human and mouse genomes via a Python client and non-commercial API (2025)

Active · 1.9K stars · 1 month ago

Casanovo

Transformer encoder-decoder for de novo peptide sequencing from tandem mass spectrometry, translating MS/MS spectra directly to peptide sequences without reference databases, enabling identification of novel peptides for immunopeptidomics, antibody repertoires, and metaproteomes (Noble Lab UW, Nature Communications 2024)

Active · 191 stars · 1 month ago

Evo 2

Arc Institute's 40B-parameter genome foundation model trained on 9 trillion nucleotides from all domains of life, supporting 1M base pair context for generalist DNA/RNA/protein prediction and design (Nature 2026)

Active · 4K stars · 1 month ago

GENERanno (bioRxiv 2025)

Genomic foundation model for metagenomic and genome annotation, featuring an 8k base-pair context and 500M parameters trained on 386B base pairs of eukaryotic DNA; provides expert models and a unified CLI for prokaryotic/eukaryotic coding-sequence annotation with strong performance on Genomic Benchmarks, Nucleotide Transformer tasks, and custom Gener tasks (GenerTeam, 314+ stars, MIT License)

Active · 314 stars · 1 month ago

ChatSpatial

MCP server enabling spatial transcriptomics analysis via natural language, integrating 60+ methods including SpaGCN, Cell2location, LIANA+, CellRank for Visium, Xenium, MERFISH platforms

Active · 40 stars · 1 month ago

gReLU (Genentech, 2024)

Python library to train, interpret, and apply deep learning models to DNA sequences, providing a unified framework for regulatory genomics with support for CNN and transformer architectures, variant effect prediction, and attribution analysis (325+ stars)

Active · 331 stars · 1 month ago

Enformer

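Google DeepMind's transformer model for gene expression prediction from DNA sequence, integrating long-range regulatory interactions up to 100 kb away (Nature Methods 2021)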

Active · 15K stars · 1 month ago

Carbon (Hugging Face, 2026)

Family of causal genomic foundation models trained on 1T tokens (~6T DNA base pairs) from the Carbon Pretraining Corpus, combining eukaryote genes, mRNA transcripts, and prokaryote genomes with a hybrid text/6-mer tokenizer; Carbon-3B matches or beats Evo2-7B on zero-shot DNA evaluations including sequence recovery, variant effect prediction, and perturbations (Apache 2.0, 201+ stars)

Active · 199 stars · 1 month ago

GPN-Star (Song Lab, UC Berkeley, bioRxiv 2025)

Phylogeny-aware genomic language model trained on whole-genome alignments across multiple evolutionary timescales, predicting functional constraints and variant effects for human, mouse, chicken, fly, worm, and Arabidopsis genomes (344+ stars, MIT License)

Active · 344 stars · 1 month ago

Helical

Unified framework for state-of-the-art pre-trained bio foundation models across genomics and transcriptomics, providing standardized interfaces and pipelines for DNA, RNA, and single-cell models including Evo 2, Geneformer, scGPT, and UCE with streamlined inference, benchmarking, and fine-tuning workflows (213+ stars, 2024-2025)

Active · 219 stars · 1 month ago

AGPL-3.0

RNAPro (NVIDIA, 2026)

State-of-the-art RNA 3D folding model developed with Stanford Das Lab and Kaggle competition winners, featuring a 488M-parameter AF3-like architecture with MSA and template-based modeling, enabling structure-driven drug discovery and RNA therapeutics design (NVIDIA-Digital-Bio, Apache 2.0)

Active · 86 stars · 1 month ago

Tahoe-x1

Single-cell foundation model family scaling to 3B parameters, pretrained on 266M cell profiles including perturbation data and released with training, embedding, and downstream benchmarking workflows for disease-relevant single-cell tasks (Apache 2.0, 2025)

Active · 158 stars · 1 month ago

BioReason (NeurIPS 2025)

First architecture deeply integrating a DNA foundation model with an LLM for multimodal biological reasoning, achieving 98% accuracy on KEGG disease pathway prediction and 15%+ average gains on variant effect prediction with interpretable step-by-step reasoning traces (bowang-lab, 390+ stars)

Active · 398 stars · 1 month ago

LucaOne

Generalized biological foundation model with unified nucleic acid and protein language, integrating DNA/RNA/protein sequences (Nature Machine Intelligence 2025)

Active · 365 stars · 2 months ago

CellTypist

Automated cell type annotation tool for single-cell transcriptomics using logistic regression classifiers trained with stochastic gradient descent on reference atlases, enabling standardized classification across datasets (Wellcome Sanger Institute, Science 2022)

Active · 495 stars · 2 months ago

scPRINT (Nature Communications 2025)

Large transformer-based single-cell foundation model pretrained on 50 million cells for robust gene network inference, expression denoising, cell embedding, and zero-shot label prediction, leveraging ESM2 protein embeddings and bidirectional transformer architecture (Cantini Lab, 148+ stars, GPL-3.0)

Active · 150 stars · 2 months ago

CellWhisperer (Nature Biotechnology 2025)

Multimodal AI bridging transcriptomics data and natural language, enabling intuitive chat-based exploration and analysis of single-cell RNA-seq datasets through conversational interaction without coding; fine-tuned Mistral 7B LLaVA model emulating biologist-bioinformatician discussions (207+ stars, GPL-3.0)

Active · 212 stars · 2 months ago

RiNALMo (Nature Communications 2025)

General-purpose RNA language model with 650M parameters pretrained on 36M non-coding RNA sequences, achieving strong generalization on downstream tasks including secondary structure prediction, splice-site prediction, mean ribosome loading prediction, and ncRNA classification (lbcb-sci, 165+ stars, Apache-2.0)

Active · 169 stars · 2 months ago

scGPT

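Generative pretrained transformer foundation model for single-cell multi-omics, pretrained on over 33 million cells and fine-tunable for cell type annotation, batch integration, and perturbation prediction (bowang-lab, Nature Methods 2024)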

Active · 1.6K stars · 2 months ago

Stack

Arc Institute's single-cell foundation model enabling in-context learning at inference time via a novel tabular attention architecture, trained on 150M uniformly-preprocessed cells for generalizing biological effects and generating unseen cell profiles in novel contexts (2025)

Active · 139 stars · 3 months ago

scDFM (ICLR 2026)

Distributional flow matching model for robust single-cell perturbation prediction, modeling the full distribution of perturbed cellular expression profiles conditioned on control states via PAD-Transformer and multi-kernel MMD regularization; reduces MSE by 19.6% over the strongest baseline in combinatorial settings (Westlake University, 41+ stars, MIT License)

Active · 41 stars · 3 months ago

Caduceus (ICML 2024)

Bi-directional DNA language model based on the Mamba state space architecture, enabling efficient long-range genomic sequence modeling with linear-time complexity and built-in reverse-complement equivariance; achieves strong performance on chromatin accessibility, enhancer, and promoter prediction benchmarks (Cornell, Princeton & CMU, 500+ stars)

Active · 243 stars · 4 months ago

DNA Claude Analysis

Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Active · 52 stars · 4 months ago

AIDO.ModelGenerator

GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)

Active · 118 stars · 5 months ago

Nucleotide Transformer

Foundation models for genomics and transcriptomics pretrained on 3,000+ human genomes and 850+ diverse species, enabling chromatin accessibility prediction, splice site detection, and promoter classification across multiple model scales (InstaDeep, NVIDIA & TUM, Nature Methods 2023)

Active · 901 stars · 5 months ago

DNABERT

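Pre-trained BERT model for DNA sequence analysis using k-mer tokenization, fine-tunable for promoter, splice site, and transcription factor binding site prediction (Bioinformatics 2021)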

Idle · 759 stars · 6 months ago

DNABERT-2 (ICLR 2024)

Efficient foundation model and benchmark for multi-species genome understanding with context-aware nucleotide representations, improving upon DNABERT for diverse genomic task transfer learning (UIUC MAGICS Lab, 484+ stars)

Idle · 501 stars · 6 months ago

Shell

gRNAde

Generative AI framework for inverse design of 3D RNA structure and function using geometric deep learning, learning design rules from 3D structures to capture complex tertiary interactions (pseudoknots, non-canonical base pairs) with expert-level accuracy for designing functional RNAs including aptamers and ribozymes (bioRxiv 2025)

Idle · 309 stars · 7 months ago

Nicheformer

Foundation model jointly trained on single-cell and spatial transcriptomics data, enabling unified representation learning across cellular and tissue spatial contexts for cell type prediction, spatial domain inference, and cross-modal integration (theislab, bioRxiv 2024, 164+ stars)

Idle · 165 stars · 8 months ago

BSD-3-Clause

scFoundation

100M-parameter foundation model pretrained on 50M+ human single-cell transcriptomes covering ~20,000 genes, achieving SOTA on gene expression enhancement, drug response and perturbation prediction (Nature Methods 2024)

Idle · 418 stars · 8 months ago

Cell2Sentence

Teaching Large Language Models the Language of Biology through single-cell transcriptomics (ICML 2024)

Idle · 872 stars · 8 months ago

CodonFM (NVIDIA)

Family of codon-resolution language models trained on 130 million protein-coding sequences from over 20,000 species, enabling cross-species gene expression prediction and codon-level functional genomics (2025)

Idle · 81 stars · 8 months ago

NuFold (Nature Communications 2025)

End-to-end deep learning approach for RNA tertiary structure prediction with a flexible nucleobase center representation, achieving ~7 Å C1' RMSD across test RNAs and predicting ~545,000 structures covering 2,200+ RNA families (Kihara Lab, Purdue University, 50+ stars)

Idle · 52 stars · 8 months ago

OpenCRISPR

First open-source AI-generated gene editing systems developed with protein language models, enabling programmable CRISPR-Cas nucleases for synthetic biology and therapeutic genome editing (Profluent, 2024)

Idle · 1.2K stars · 10 months ago

scTranslator (Nature Biomedical Engineering 2025)

Pre-trained large generative model translating single-cell transcriptomes to proteomes in an alignment-free manner, generating absent protein abundance data for CITE-seq, spatial CITE-seq, REAP-seq, and NEAT-seq across tissues and diseases; offers three model variants pretrained on 2M human cells, 160K PBMCs, or 18K bulk samples (Tencent AI Lab Healthcare, 96+ stars)

Idle · 96 stars · 11 months ago

RhoFold+

End-to-end RNA 3D structure prediction using RNA language model pretrained on 23.7M sequences, outperforming existing methods and human expert groups on RNA-Puzzles and CASP15 (Nature Methods 2024)

Idle · 238 stars · 1 year ago

RNA-FM (Nature Methods 2024)

RNA foundation model trained on millions of RNA sequences for generalist RNA sequence understanding, enabling downstream structure prediction, function annotation, and representation learning for non-coding RNAs (ml4bio, 372+ stars)

Idle · 386 stars · 1 year ago

HyenaDNA

Long-range genomic foundation model using subquadratic Hyena operators instead of Transformer attention, enabling context lengths up to 1 million nucleotides for chromosome-scale DNA sequence modeling and downstream genomics tasks (Stanford Hazy Research, NeurIPS 2023, 784+ stars, Apache 2.0)

Idle · 791 stars · 1 year ago

Assembly

GEARS

Geometric deep learning model predicting transcriptional outcomes of novel single- and multi-gene perturbations using gene–gene knowledge graphs, achieving 40% higher precision than prior methods on combinatorial perturbation prediction (Stanford, Nature Biotechnology 2024)

Idle · 379 stars · 1 year ago

Geneformer

Single-cell transformer foundation model pretrained on 104M human transcriptomes via masked gene prediction, enabling transfer learning for cell type classification, gene network analysis, and in silico perturbation with limited labeled data (Nature 2023, V2 2024)

Idle · 0 stars · 1 year ago

GenePT

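Gene and cell embeddings for single-cell analysis built from GPT-3.5 text embeddings of NCBI gene descriptions, offered as a simple alternative to foundation models pretrained on expression data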

Stale · 321 stars · 2 years ago

AlphaMissense

Google DeepMind's AlphaFold-derived classifier for proteome-wide missense variant effect prediction, providing pathogenicity scores for all ~71M possible human missense variants and classifying 89% with 90% precision; pre-computed predictions are integrated into Ensembl VEP and UCSC Genome Browser to support clinical variant interpretation (Science 2023)

Archived · 633 stars · 2 years ago