arcinstitute/evo2_20b

https://huggingface.co/arcinstitute/evo2_20b
Active · by arcinstitute · 10611 · updated 3 months ago

Evo 2 is a state-of-the-art DNA language model trained autoregressively on trillions of DNA tokens.

Sourced from

  • HuggingFace: arcinstitute/evo2_20b

Related resources

This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with 4096 tokens. It supports a maximum sequence length of 10,240 tokens (~50 kbp).

Idle · 4.4K · 1 year ago
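
The card above quotes a context limit in tokens alongside a rough base-pair equivalent. As a minimal sketch of the arithmetic those two figures imply, the snippet below derives an average bases-per-token ratio and uses it to estimate how many tokens a sequence of a given length might need. The function names are hypothetical, and the ratio is only an average implied by the card's rounded numbers: real BPE compression varies from sequence to sequence.

```python
import math

# Figures quoted in the card: 10240 tokens is described as roughly 50 kbp.
MAX_TOKENS = 10_240
APPROX_BASE_PAIRS = 50_000  # approximate, per the card's "~"


def implied_bp_per_token(base_pairs: int, tokens: int) -> float:
    """Average number of base pairs covered by one token (hypothetical helper)."""
    return base_pairs / tokens


def approx_tokens_needed(sequence_bp: int, bp_per_token: float) -> int:
    """Rough token count for a sequence, rounded up (an estimate, not a guarantee)."""
    return math.ceil(sequence_bp / bp_per_token)


ratio = implied_bp_per_token(APPROX_BASE_PAIRS, MAX_TOKENS)
print(f"implied average: {ratio:.2f} bp per token")

# Estimate whether a 20 kbp sequence would fit in the stated context window.
needed = approx_tokens_needed(20_000, ratio)
print(f"20 kbp needs about {needed} tokens; fits: {needed <= MAX_TOKENS}")
```

An estimate like this is only useful for quick sizing; tokenizing the actual sequence is the reliable way to check whether it fits.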

GGUF quantizations of HuggingFaceBio/Carbon-3B — a generative DNA foundation model — for efficient inference with llama.cpp.

Active · 620 · 1 week ago

A PyTorch port of AlphaGenome, the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1M bp.

Active · 43 · 3 months ago
Stale · 0 · 2 years ago

Evo 2 is a state-of-the-art DNA language model for long-context modeling and design. Evo 2 models DNA sequences at single-nucleotide resolution at up to 1 million base pair context length using the StripedHyena 2 architecture, and was pretrained using Savanna.

Idle · 0 · 8 months ago

Deep learning-based variant caller

Active · 3.7K · 2 months ago
Python
BSD-3-Clause