DNABERT-2 (ICLR 2024)

github.com/magics-lab/dnabert_2
Active · 488 stars · updated 5 months ago
Shell
Apache-2.0

Efficient foundation model and benchmark for multi-species genome understanding. It learns context-aware nucleotide representations and improves upon DNABERT for transfer learning across diverse genomic tasks (UIUC MAGICS Lab, 484+ stars)

Sourced from

  • GitHub: github.com/magics-lab/dnabert_2
  • Awesome AI for Science: github.com/magics-lab/dnabert_2

Related resources

Deep learning-based variant caller

Active · 3.7K stars · 2 months ago
Python
BSD-3-Clause

Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Active · 44 stars · 3 months ago
Python
MIT

Foundation models for genomics and transcriptomics pretrained on 3,000+ human genomes and 850+ diverse species, enabling chromatin accessibility prediction, splice site detection, and promoter classification across multiple model scales (InstaDeep, NVIDIA & TUM, Nature Methods 2023)

Active · 884 stars · 3 months ago
Jupyter Notebook
NOASSERTION

First architecture deeply integrating a DNA foundation model with an LLM for multimodal biological reasoning, achieving 98% accuracy on KEGG disease pathway prediction and 15%+ average gains on variant effect prediction with interpretable step-by-step reasoning traces (bowang-lab, 390+ stars)

Active · 390 stars · 2 months ago
Jupyter Notebook
Apache-2.0

General-purpose RNA language model with 650M parameters pretrained on 36M non-coding RNA sequences, achieving strong generalization on structure prediction tasks including secondary structure prediction, splice-site prediction, mean ribosome loading, and ncRNA classification (lbcb-sci, 165+ stars, Apache-2.0)

Active · 165 stars · 1 month ago
Python
Apache-2.0

GPN trained on Arabidopsis thaliana and 7 other Brassicales. See https://github.com/songlab-cal/gpn for more details.

Idle · 320 stars · 1 year ago
Python