arcinstitute/evo2_20b
https://huggingface.co/arcinstitute/evo2_20b
Evo 2 is a state-of-the-art DNA language model trained autoregressively on trillions of DNA tokens.
Sourced from
- HuggingFace — arcinstitute/evo2_20b
Related resources
DOEJGI/GenomeOcean-4B
by DOEJGI
This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with 4096 tokens. It supports a maximum sequence length of 10240 tokens (~50 kbp).
pankajpandey-dev/Carbon-3B-GGUF
by pankajpandey-dev
GGUF quantizations of HuggingFaceBio/Carbon-3B, a generative DNA foundation model, for efficient inference with llama.cpp.
A PyTorch port of AlphaGenome, the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1M bp.
songlab/tokenizer-dna-mlm
by songlab
evo-design/evo-2-7b-8k-microviridae
by evo-design
Evo 2 is a state-of-the-art DNA language model for long-context modeling and design. Evo 2 models DNA sequences at single-nucleotide resolution with a context length of up to 1 million base pairs. It uses the StripedHyena 2 architecture and was trained using Savanna.
Deep learning-based variant caller