DOEJGI/GenomeOcean-4B

The Mistral-DNA-v1-138M-bacteria Large Language Model (LLM) is a pretrained generative DNA text model with 17.31M parameters x 8 experts = 138.5M parameters. It is derived from the Mistral-7B-v0.1 model, which was simplified for DNA by reducing the number of layers and the hidden size.

Idle · 16 · 1 year ago

GenerTeam/GENERator-v2-eukaryote-1.2b-base

by GenerTeam

Important Notice: If you are using GENERator for sequence generation, please ensure that the length of each input sequence is a multiple of 6. This can be achieved by either (1) padding the sequence on the left with 'A' (left padding) or (2) truncating the sequence from the left (left truncation).
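The two options in the notice can be sketched in plain Python. This is a minimal illustration, not part of the GENERator code base: the helper names `left_pad_to_multiple` and `left_truncate_to_multiple` are hypothetical, and the default of 6 simply mirrors the multiple stated in the notice.

```python
def left_pad_to_multiple(seq: str, k: int = 6, pad_char: str = "A") -> str:
    """Prepend pad_char until len(seq) is a multiple of k (hypothetical helper)."""
    remainder = len(seq) % k
    if remainder == 0:
        return seq
    return pad_char * (k - remainder) + seq


def left_truncate_to_multiple(seq: str, k: int = 6) -> str:
    """Drop leading bases until len(seq) is a multiple of k (hypothetical helper)."""
    remainder = len(seq) % k
    return seq[remainder:]


if __name__ == "__main__":
    example = "ACGTACG"  # 7 bases, not a multiple of 6
    print(left_pad_to_multiple(example))
    print(left_truncate_to_multiple(example))
```

Padding keeps every original base at the cost of adding artificial context, while truncation discards up to five leading bases; which trade-off is preferable likely depends on the downstream task.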

Active · 3.5K · 1 month ago

DaisyChainAI/daisychain-genomics

by DaisyChainAI

Active · 0 · 3 weeks ago

zhangtaolab/plant-dnabert-6mer

by zhangtaolab

The plant DNA large language models (LLMs) contain a series of foundation models based on different model architectures, which are pre-trained on various plant reference genomes. All the models have a comparable model size between 90 MB and 150 MB; a BPE tokenizer is used for tokenization and 8000…

Idle · 4 · 1 year ago

Hengchang-Liu/D3LM-from-nt

by Hengchang-Liu

This repository contains the model presented in D3LM: A Discrete DNA Diffusion Language Model for Bidirectional DNA Understanding and Generation.

Active · 32 · 4 months ago

InstaDeepAI/nucleotide-transformer-v2-100m-multi-species

by InstaDeepAI

fill-mask

The Nucleotide Transformers are a collection of foundational language models that were pre-trained on DNA sequences from whole genomes. Compared to other approaches, our models do not only integrate information from single reference genomes, but also leverage DNA sequences from over 3,200 diverse human…

Idle · 17.9K · 10 months ago