Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
145 of 5,893 resources
Showing 1–50
HuggingFaceBio/Carbon-3B
by HuggingFaceBio
Technical Report 🧬
gemma4-12b-bioinfo is a fine-tuned Gemma 4 12B instruction model for bioinformatics, genomics, and computational biology question answering.
biohub/esm3-sm-open-v1
by biohub
esm3-sm-open-v1 is trained on 2.78 billion natural proteins. With synthetic data augmentation, this led to 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations, totaling 771 billion tokens.
Edoardo-BS/HuBERT-ECG-SFT-CardioLearning-large
by Edoardo-BS
Original code at https://github.com/Edoar-do/HuBERT-ECG
This model card provides an overview of the intended use of the ESMC SAE models and examples of how to access them, but it does not have a specific model or model weights. To access each SAE model collection, use the links below:
biohub/ESMFold2
by biohub
ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…
biohub/ESMFold2-Fast
by biohub
ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…
biohub/ESMC-6B
by biohub
ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…
biohub/ESMC-600M
by biohub
ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…
FINAL-Bench/Darwin-218B-Delphi
by FINAL-Bench
VIDRAFT FINAL-Bench: chemistry-specialized 218B MoE, served via the DELPHI 5-Phase inference cascade.
HealthJudge is a domain-adapted helpfulness evaluator for health-related Community Notes. It is designed to judge whether a note provides helpful context for a potentially misleading social-media post, following the Community Notes helpfulness criteria.
ProtGPT3-MSA is a multiple-sequence, homolog-conditioned autoregressive protein language model. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein sequence generation.
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
biohub/ESMC-300M
by biohub
ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…
biohub/esmc-600m-2024-12
by biohub
This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.
biohub/esmc-300m-2024-12
by biohub
This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.
ctheodoris/Geneformer
by ctheodoris
Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.
ScientaLab/eva-rna
by ScientaLab
Manhph2211/D-BETA
by Manhph2211
Hamdan003/inventmol-r1
by Hamdan003
Target-Conditioned Molecular Ideation Model for Drug Discovery Research
Junhauwong/Surge-Cognition-4x8B
by Junhauwong
BGI-HangzhouAI/Genos-m
by BGI-HangzhouAI
Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.
Qwen3-8B-syco_med-gated-attention-FT is a plug-and-play gated attention weight released for AI safety research.
Apertus-8B-MeditronFO is an 8B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-8B-Instruct on the Fully Open Meditron Corpus.
Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.
Gemma 4 E2B fine-tuned on 225K drug–target pairs for novel small-molecule generation.
macwiatrak/bacformer-large-masked-MAG
by macwiatrak
2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…
mradermacher/zerank-2-GGUF
by mradermacher
For a convenient overview and download list, visit our model page for this model.
ConvergeBio/virtual-cell-patient
by ConvergeBio
A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.
MedPsy-4B is a state-of-the-art, text-only medical and healthcare language model purpose-built for edge deployment. Built on top of Qwen3-4B-Thinking-2507 and post-trained with a multi-stage pipeline (supervised fine-tuning + reinforcement learning) on curated medical data, it surpasses models…
zeroentropy/zerank-2-reranker
by zeroentropy
In search engines, rerankers are crucial for improving the accuracy of your retrieval system.
Duchifat-2.3-Instruct is a state-of-the-art, instruction-tuned Large Language Model developed by TopAI. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned, and generated in the LLM era.
Fine-tuned version of google/gemma-4-E4B-it across three professional domains — Medical, Legal, and Finance — using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on Kaggle T4 GPU.
This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. This model was specifically designed for rapid web deployment without sacrificing clinical accuracy.
docling-project/MarkushGrapher-2
by docling-project
MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.
Verdugie/STEM-Oracle-27B
by Verdugie
or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.
zeroentropy/zembed-1-embedding
by zeroentropy
In retrieval systems, embedding models determine the quality of your search.
zeroentropy/zerank-1-small-reranker
by zeroentropy
In search engines, rerankers are crucial for improving the accuracy of your retrieval system.