Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
303 of 5,923 resources
Showing 151-200
Raziel1234/OSTLM
by Raziel1234
A Neural Machine Translation (NMT) model based on a custom Transformer (Encoder-Decoder) architecture, trained from scratch. This model is designed to translate English sentences into Hebrew using multilingual encoding and specialized layer configurations.
google/alphagenome-all-folds
by google
While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical…
ChemDFM-v2.0 is the latest non-thinking model of ChemDFM, the pioneering open-source dialogue foundation model for chemistry and molecular science.
Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.
PII Detection Model | 44M Parameters | Open Source
microsoft/MediPhi-Instruct
by microsoft
The MediPhi Model Collection comprises 7 small language models of 3.8B parameters from the base model Phi-3.5-mini-instruct specialized in the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e.
Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.
Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. This model version was continually pretrained on ~14 million cancer transcriptomes…
plant-llms/PlantBiMoE
by plant-llms
PlantBiMoE is a DNA language model trained on the genomes of 42 representative plant species. More specifically, PlantBiMoE combines the BiMamba and SparseMoE architectures with a masked language modeling objective to leverage readily available genotype data from 42 different plant species to…
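The masked language modeling objective mentioned for PlantBiMoE can be illustrated with a minimal sketch. It assumes single-nucleotide tokens, a hypothetical `[MASK]` token, and a BERT-style masking rate; the rate, the token, and the function name `mask_sequence` are illustrative assumptions, not PlantBiMoE's documented settings.

```python
import random

MASK = "[MASK]"  # hypothetical mask token (assumption)

def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Return (masked_tokens, labels); labels hold the original base at masked positions, None elsewhere."""
    rng = random.Random(seed)
    masked, labels = [], []
    for base in seq:
        if rng.random() < mask_rate:
            masked.append(MASK)
            labels.append(base)   # the model is trained to recover this base
        else:
            masked.append(base)
            labels.append(None)   # position is ignored by the loss
    return masked, labels

masked, labels = mask_sequence("ACGTACGTTAGC", mask_rate=0.3, seed=42)
print(masked)
print(labels)
```

During training, the loss would be computed only at positions where `labels` is not `None`.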
prov-gigatime/GigaTIME
by prov-gigatime
This is a merge of pre-trained language models created using mergekit.
mace-foundations/mace-mh-1
by mace-foundations
MACE-MH-1 is a foundation machine-learning interatomic potential (MLIP) that bridges molecular, surface, and materials chemistry through cross-domain learning.
ZJU-AI4H/Hulu-Med-4B
by ZJU-AI4H
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
Large Language and Vision Assistant for bioMedicine (i.e., "LLaVA-Med") is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only to facilitate reproducibility of the…
ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on (currently) around 2 million pairs of SELFIES strings (Krenn et al. 2020) of valid molecules taken from COCONUTDB (Sorokina et al. 2021) and ChemBL34 (Zdrazil et al. 2023).
vandijklab/C2S-Scale-Gemma-2-27B
by vandijklab
GitHub homepage: Cell2Sentence GitHub
tahoebio/Tahoe-x1
by tahoebio
Tahoe-x1 is a family of perturbation-trained single-cell foundation models with up to 3 billion parameters, developed by Tahoe Therapeutics. Pretrained on 266 million single-cell transcriptomic profiles including the Tahoe-100M perturbation compendium, Tahoe-x1 achieves state-of-the-art performance…
elonlit/GeneJEPA
by elonlit
GeneJEPA is a Joint-Embedding Predictive Architecture (JEPA) trained for self-supervised representation learning on scRNA-seq. It uses a Perceiver-style encoder to handle sparse, high-dimensional gene count vectors and a Fourier-feature tokenizer for numerical tokenization.
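The listing does not detail GeneJEPA's Fourier-feature tokenizer. A common form of Fourier-feature encoding maps a scalar x to sin(2πfx) and cos(2πfx) at several frequencies f, giving a smooth, bounded embedding of a numeric value. The sketch below assumes log-spaced frequencies and a log1p transform of raw counts; both choices, and the function name `fourier_features`, are hypothetical.

```python
import math

def fourier_features(value, num_frequencies=4, max_frequency=8.0):
    """Encode a scalar as [sin(2*pi*f*x), cos(2*pi*f*x)] for log-spaced frequencies f in [1, max_frequency]."""
    if num_frequencies == 1:
        freqs = [1.0]
    else:
        freqs = [max_frequency ** (k / (num_frequencies - 1)) for k in range(num_frequencies)]
    features = []
    for f in freqs:
        angle = 2.0 * math.pi * f * value
        features.extend([math.sin(angle), math.cos(angle)])
    return features

# Assumed preprocessing: log1p-transform a raw gene count before encoding.
count = 12
vec = fourier_features(math.log1p(count))
print(len(vec))  # 8
```

Each (sin, cos) pair lies on the unit circle, so the encoding stays bounded no matter how large the count is.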
biomni/Biomni-R0-32B-Preview
by biomni
This repo contains the weights of Biomni-R0-32B-Preview, a research preview of the series of biomedical AI agents trained by the Biomni team.
InstaDeepAI/instanovoplus-v1.1.0
by InstaDeepAI
InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.
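In multinomial diffusion, the forward process corrupts each discrete token toward a uniform distribution over K classes, with marginal q(x_t | x_0) = Cat(x_t; ᾱ_t·onehot(x_0) + (1 - ᾱ_t)/K). The sketch below applies that marginal to a peptide string. It assumes a vocabulary of the 20 standard amino acids (a real model's vocabulary, including modified residues, may differ) and takes ᾱ_t as a given number rather than deriving it from a noise schedule. It illustrates the general technique, not InstaNovoPlus's implementation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed vocabulary)

def marginal_probs(x0_index, alpha_bar, num_classes):
    """q(x_t | x_0) = alpha_bar * onehot(x_0) + (1 - alpha_bar) / K."""
    uniform = (1.0 - alpha_bar) / num_classes
    return [alpha_bar * (1.0 if k == x0_index else 0.0) + uniform for k in range(num_classes)]

def noise_peptide(peptide, alpha_bar, seed=0):
    """Sample a noised residue independently at each position from q(x_t | x_0)."""
    rng = random.Random(seed)
    K = len(AMINO_ACIDS)
    noised = []
    for residue in peptide:
        probs = marginal_probs(AMINO_ACIDS.index(residue), alpha_bar, K)
        idx = rng.choices(range(K), weights=probs, k=1)[0]
        noised.append(AMINO_ACIDS[idx])
    return "".join(noised)

print(noise_peptide("PEPTIDE", alpha_bar=1.0))  # PEPTIDE (no corruption)
print(noise_peptide("PEPTIDE", alpha_bar=0.5))  # partially randomized
```

The reverse (denoising) model, conditioned on the spectrum, would learn to undo this corruption step by step.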
This is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It is trained on 2.7M unique and valid molecules taken from COCONUTDB and ChemBL34, with 7.3M total generated masked examples.
nvidia/AMPLIFY_350M
by nvidia
Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.
nvidia/AMPLIFY_120M
by nvidia
Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.
lingshu-medical-mllm/Lingshu-7B
by lingshu-medical-mllm
Website | 7B Model | 32B Model | MedEvalKit | Technical Report | Lingshu MCP
evo-design/evo-2-7b-8k-microviridae
by evo-design
Evo 2 is a state-of-the-art DNA language model for long-context modeling and design. Evo 2 models DNA sequences at single-nucleotide resolution with a context length of up to 1 million base pairs, using the StripedHyena 2 architecture, and was trained using Savanna.
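Modeling at single-nucleotide resolution means one token per base, so a 1-million-base-pair context corresponds to roughly 1 million tokens. The sketch below assumes a simple byte-level scheme in which each character maps to its ASCII code. It illustrates the idea of per-nucleotide tokens and is not presented as Evo 2's exact tokenizer.

```python
def tokenize(seq):
    """Map each nucleotide character to one integer token (its ASCII byte value)."""
    return list(seq.upper().encode("ascii"))

def detokenize(tokens):
    """Invert tokenize(): turn byte values back into a sequence string."""
    return bytes(tokens).decode("ascii")

tokens = tokenize("acgtN")
print(tokens)  # [65, 67, 71, 84, 78]
```

Because there is no k-mer or BPE merging, a single-base edit changes exactly one token, which is convenient for variant-effect scoring.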
Neeto-1.0-8b is an openly released biomedical large language model (LLM) created by BYOL Academy to assist learners and practitioners with medical exam study, literature understanding, and structured clinical reasoning.
ByteDance-Seed/bamboo_mixer
by ByteDance-Seed
This repository contains the official model of the paper A Unified Predictive and Generative Solution for Liquid Electrolyte Formulation.
sagawa/ReactionT5v2-forward
by sagawa
This is a ReactionT5 model pre-trained to predict the products of reactions. You can use the demo here.
This repo contains the biomedicine MLLM developed from Qwen2.5-VL-3B-Instruct in our paper: On Domain-Adaptive Post-Training for Multimodal Large Language Models. The corresponding training dataset is in biomed-visual-instructions.
Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature
ameya98/JAMUN
by ameya98
JAMUN is a novel approach for generating conformational ensembles of protein structures, presented in the paper JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles.
darkknight25/deepseek-16b-medical-GPT
by darkknight25
darkknight25/deepseek-16b-medical-GPT is a fine-tuned version of deepseek-ai/deepseek-l6b-moe-chat, optimized for medical question answering, reasoning, and clinical summarization using QLoRA and open-access healthcare datasets.
Using llama.cpp release b5868 for quantization.
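llama.cpp supports many GGUF quantization formats. The sketch below shows only the general idea of block-wise symmetric 8-bit quantization, loosely modeled on the Q8_0 scheme (blocks of 32 weights sharing one scale, scale = max|x| / 127). It is a simplified illustration, not llama.cpp's code: the real format stores the scale in half precision, and the k-quant and imatrix-based formats are more elaborate.

```python
BLOCK_SIZE = 32  # weights per block, each block sharing one scale

def quantize_block(values):
    """Symmetric int8 quantization: one float scale per block, q = round(x / scale)."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid division by zero for all-zero blocks
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    return [scale * qi for qi in q]

def quantize(weights):
    return [quantize_block(weights[i:i + BLOCK_SIZE]) for i in range(0, len(weights), BLOCK_SIZE)]

weights = [((i * 37) % 101 - 50) / 100.0 for i in range(64)]  # deterministic toy weights in [-0.5, 0.5]
blocks = quantize(weights)
restored = [x for scale, q in blocks for x in dequantize_block(scale, q)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
max_scale = max(s for s, _ in blocks)
print(len(blocks), max_err <= max_scale / 2 + 1e-12)  # 2 True
```

Per-block scales keep one outlier weight from degrading precision across the whole tensor; the rounding error in each block is at most half that block's scale.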
mradermacher/Qwen-3-32B-Medical-Reasoning-i1-GGUF
by mradermacher
For a convenient overview and download list, visit our model page for this model.
mradermacher/Dans-PersonalityEngine-V1.3.0-24b-i1-GGUF
by mradermacher
For a convenient overview and download list, visit our model page for this model.
makiyeah/CMRCLIP
by makiyeah
A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.
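CLIP-style image-report contrastive models are commonly trained with a symmetric InfoNCE loss: within a batch, each image embedding should score highest against its own report and vice versa. The sketch below assumes the Vision Transformer and text encoder have already produced fixed-length vectors. The temperature of 0.07 follows the original CLIP paper's initialization and is an assumption here, not CMRCLIP's documented setting.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th report, and vice versa."""
    n = len(image_embs)
    logits = [[cosine(img, txt) / temperature for txt in text_embs] for img in image_embs]
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

images = [[1.0, 0.0], [0.0, 1.0]]
reports_matched = [[0.9, 0.1], [0.1, 0.9]]
reports_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(clip_loss(images, reports_matched) < clip_loss(images, reports_swapped))  # True
```

Averaging both directions keeps either encoder from ignoring the other modality.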