State (Arc Institute, bioRxiv 2025)

Genomics & Bioinformatics
Actively maintained, 587 stars, updated 3 weeks ago
Python
NOASSERTION

Machine learning model predicting cellular perturbation response across diverse contexts with State Transition (ST) and State Embedding (SE) variants, featuring CLI tooling, PyPI distribution, and Virtual Cell Challenge integration (575+ stars)

README

Predicting cellular responses to perturbation across diverse contexts with State

Train State transition models or pretrain State embedding models. See the State paper. See the Google Colab to train STATE for the Virtual Cell Challenge.

Associated repositories

  • Model evaluation framework: cell-eval
  • Dataloaders and preprocessing: cell-load

Getting started

  • Train an ST model for genetic perturbation prediction using the Replogle-Nadig dataset: Colab
  • Perform inference using an ST model trained on…

Source attribution

  • Awesome AI for Science: github.com/arcinstitute/state
  • GitHub: github.com/arcinstitute/state

Related resources

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging Bindea, scanpy and squidpy ecosystems (Nature Communications 2024)

1K stars, 1 day ago
Python
GPL-3.0

First architecture deeply integrating a DNA foundation model with an LLM for multimodal biological reasoning, achieving 98% accuracy on KEGG disease pathway prediction and 15%+ average gains on variant effect prediction with interpretable step-by-step reasoning traces (bowang-lab, 390+ stars)

390 stars, 2 months ago
Jupyter Notebook
Apache-2.0

Unified framework for state-of-the-art pre-trained bio foundation models across genomics and transcriptomics, providing standardized interfaces and pipelines for DNA, RNA, and single-cell models including Evo 2, Geneformer, scGPT, and UCE with streamlined inference, benchmarking, and fine-tuning workflows (213+ stars, 2024-2025)

215 stars, 4 weeks ago
Python
AGPL-3.0

Transformer encoder-decoder for de novo peptide sequencing from tandem mass spectrometry, translating MS/MS spectra directly to peptide sequences without reference databases, enabling identification of novel peptides for immunopeptidomics, antibody repertoires, and metaproteomes (Noble Lab UW, Nature Communications 2024)

187 stars, 4 days ago
Python
Apache-2.0

Arc Institute's 40B-parameter genome foundation model trained on 9 trillion nucleotides from all domains of life, supporting 1M base pair context for generalist DNA/RNA/protein prediction and design (Nature 2026)

RNA foundation model trained on millions of RNA sequences for generalist RNA sequence understanding, enabling downstream structure prediction, function annotation, and representation learning for non-coding RNAs (ml4bio, 372+ stars)