Caduceus (ICML 2024)

github.com/kuleshov-group/caduceus

Bi-directional DNA language model based on the Mamba state space architecture, enabling efficient long-range genomic sequence modeling with linear-time complexity and built-in reverse-complement equivariance; achieves strong performance on chromatin accessibility, enhancer, and promoter prediction benchmarks (Stanford & UC Berkeley, 500+ stars)

Sourced from

  • Awesome AI for Science: github.com/kuleshov-group/caduceus

Related resources

Deep learning-based variant caller

Active · 3.7K stars · 2 months ago
Python
BSD-3-Clause

Google DeepMind's unified DNA sequence foundation model predicting molecular consequences of genetic variants from single-base resolution up to 1 megabase context, jointly outputting thousands of regulatory tracks (RNA expression, splicing, chromatin accessibility, TF binding, contact maps) for human and mouse genomes via a Python client and non-commercial API (2025)

Active · 1.9K stars · 1 month ago
Python
Apache-2.0

Single-cell analysis with transformers

Active · 1.6K stars · 1 month ago
Jupyter Notebook
MIT

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging the Bindea, scanpy, and squidpy ecosystems (Nature Communications 2024)

Active · 1K stars · 2 weeks ago
Python
GPL-3.0

Teaching Large Language Models the Language of Biology through single-cell transcriptomics (ICML 2024)

Idle · 862 stars · 7 months ago
Jupyter Notebook
Apache-2.0

Interactive explorer for single-cell transcriptomics data enabling visualization of UMAP/t-SNE embeddings, differential expression analysis, and cross-dataset comparison through a fast web-based interface; widely adopted for exploring atlas-scale single-cell datasets and integrating with AI/ML analysis workflows (773+ stars, MIT License)

Active · 773 stars · 1 week ago
JavaScript
MIT