Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
303 of 5,923 resources
Showing 51-100
Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.
vadimbelsky/qwen3.5-medical-ft
by vadimbelsky
LoRA fine-tune of Qwen3.5-9B on synthetic clinical triage Q&A pairs generated from PubMed Central open-access papers. The model is specialized for emergency-medicine decision-making: triaging patients, applying clinical decision rules, and generating protocol-grounded triage recommendations.
Base model: google/gemma-4-26b-it. Architecture: MoE, 26B total / ~4B active parameters (1 shared expert + 8 routed from a pool of 128 per MoE layer, 30 MoE layers). Method: activation-directed expert surgery, 128 → 64 experts per layer (50% reduction). Quantization: Q4_K_M (~9.7 GB on disk). Tags: …
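The listing does not say how the surviving experts are chosen. A minimal sketch of one plausible reading of "activation-directed" selection, assuming each layer's experts are ranked by how often the router picked them on some calibration data and the most-used 64 of 128 are kept. The function name and the frequency criterion are hypothetical, not the model author's procedure.

```python
from collections import Counter


def select_experts_to_keep(routed_ids: list[int], num_experts: int = 128, keep: int = 64) -> list[int]:
    """Return the indices of the `keep` most frequently routed experts in one layer.

    Hypothetical criterion: `routed_ids` lists the expert index chosen by the
    router for every (token, top-k slot) seen on calibration data. Experts that
    were never routed to get a count of 0; ties are broken by lower index.
    """
    if not 0 < keep <= num_experts:
        raise ValueError("keep must be in 1..num_experts")
    counts = Counter(routed_ids)
    # Sort by descending activation count, then by expert index for determinism.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    # Return survivors in their original order so expert weights can be sliced in place.
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy layer with 4 experts, keeping 2: experts 2 and 0 are routed to most often.
    print(select_experts_to_keep([0, 2, 2, 3, 2, 0], num_experts=4, keep=2))  # [0, 2]
```

Slicing the router and expert weights down to the returned indices, and any repair or re-quantization afterward, is not shown.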
Gemma 4 E2B fine-tuned on 225K drug-target pairs for novel small-molecule generation.
macwiatrak/bacformer-large-masked-MAG
by macwiatrak
2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…
Important notice: if you are using GENERator for sequence generation, please ensure that the length of each input sequence is a multiple of 6. This can be achieved either by (1) padding the sequence on the left with 'A' (left padding) or (2) truncating the sequence from the left (left truncation).
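The two options in the notice can be written as small standalone helpers. This is a minimal sketch, assuming the input is a plain DNA string; `left_pad` and `left_truncate` are hypothetical names, not part of GENERator's code.

```python
def left_pad(seq: str, multiple: int = 6, pad_char: str = "A") -> str:
    """Prepend pad_char until len(seq) is a multiple of `multiple` (left padding)."""
    remainder = len(seq) % multiple
    if remainder == 0:
        return seq
    return pad_char * (multiple - remainder) + seq


def left_truncate(seq: str, multiple: int = 6) -> str:
    """Drop bases from the left until len(seq) is a multiple of `multiple` (left truncation)."""
    return seq[len(seq) % multiple:]


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # length 10, not a multiple of 6
    print(left_pad(seq))       # AAACGTACGTAC (length 12)
    print(left_truncate(seq))  # ACGTAC (length 6)
```

Left padding keeps every original base at the cost of adding synthetic 'A's; left truncation keeps only real sequence but discards up to five bases from the start.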
CuspAI/kUPS-mattersim-jax
by CuspAI
This repository hosts JAX exports of MatterSim v1.0.0 for use with kUPS, a JAX-native molecular-simulation toolkit. Each artefact is a self-contained .zip containing the serialized JAX computation graph, the original model parameters, and the minimal metadata needed to run inference.
smgjch/Meow-Omni-1
by smgjch
Meow-Omni 1 is the world's first Multimodal Large Language Model (MLLM) specifically engineered for Computational Ethology. It natively co-embeds four distinct modalities (text, video, audio, and biological time series) to decode the latent intentions of non-verbal species.
mradermacher/zerank-2-GGUF
by mradermacher
For a convenient overview and download list, visit our model page for this model.
Heath-AFM-Lab/afMLevel-background-unet
by Heath-AFM-Lab
This U-Net model predicts tilt, z-scanner drift, and other large-scale imaging artifacts present in Atomic Force Microscopy (AFM) height maps. It outputs a background image, the same size and scale as the raw AFM image, which can be subtracted (via the accompanying afMLevel code) to produce a…
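The subtraction step itself is a per-pixel difference between two same-sized images. A minimal sketch in plain Python, assuming the raw height map and the predicted background are already loaded as equally sized 2-D grids; `subtract_background` is a hypothetical helper, and afMLevel's own API and the U-Net inference step are not shown.

```python
Grid = list[list[float]]


def subtract_background(raw: Grid, background: Grid) -> Grid:
    """Return raw - background, pixel by pixel (hypothetical helper, not afMLevel's API)."""
    if len(raw) != len(background) or any(
        len(r_row) != len(b_row) for r_row, b_row in zip(raw, background)
    ):
        raise ValueError("raw and background must have the same shape")
    return [
        [height - bg for height, bg in zip(r_row, b_row)]
        for r_row, b_row in zip(raw, background)
    ]


if __name__ == "__main__":
    # A 2x3 scan: a flat 1.0 nm surface sitting on a tilt that rises 0.5 nm per column.
    tilt = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
    raw = [[1.0, 1.5, 2.0], [1.0, 1.5, 2.0]]
    print(subtract_background(raw, tilt))  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```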
Heath-AFM-Lab/afMLevel-mask-unet
by Heath-AFM-Lab
This U-Net model masks features in Atomic Force Microscopy (AFM) height maps. It outputs a probability mask image the same size as the raw AFM image; the accompanying Python package, afMLevel, then applies a threshold (typically 0.5) to produce a binary mask.
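Thresholding turns each pixel's probability into a feature/background decision. A minimal sketch, assuming the probability mask is available as a 2-D grid; `binarize_mask` is a hypothetical name, and whether afMLevel treats a value exactly at the threshold as a feature (`>=` versus `>`) is an assumption here.

```python
Grid = list[list[float]]


def binarize_mask(prob: Grid, threshold: float = 0.5) -> list[list[bool]]:
    """Mark a pixel as feature (True) when its probability is at or above the threshold.

    Using >= rather than > at the threshold is an assumption, not afMLevel's documented rule.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a probability")
    return [[p >= threshold for p in row] for row in prob]


if __name__ == "__main__":
    probs = [[0.1, 0.7], [0.5, 0.49]]
    print(binarize_mask(probs))  # [[False, True], [True, False]]
```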
ConvergeBio/virtual-cell-patient
by ConvergeBio
A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.
SandboxAQ/aqcat25-ev2
by SandboxAQ
MedPsy-4B is a state-of-the-art, text-only medical and healthcare language model purpose-built for edge deployment. Built on top of Qwen3-4B-Thinking-2507 and post-trained with a multi-stage pipeline (supervised fine-tuning + reinforcement learning) on curated medical data, it surpasses models…
openadmet/pxr-chemeleon-baseline
by openadmet
Warning: This is a baseline model trained on publicly available data. While we've done our best to curate the data, the model performance is quite poor. Proceed with caution.
InstaDeepAI/instanovo-phospho-v1.0.0
by InstaDeepAI
InstaNovo-P is a specialized transformer-based model for de novo peptide sequencing from phosphoproteomics mass spectrometry data. This model is specifically trained and optimized for identifying phosphorylated peptides and their modification sites.
InstaDeepAI/instanovo-v1.0.0
by InstaDeepAI
InstaNovo: De novo Peptide Sequencing Model
InstaDeepAI/instanovo-v1.1.0
by InstaDeepAI
InstaNovo: De novo Peptide Sequencing Model
zeroentropy/zerank-2-reranker
by zeroentropy
In search engines, rerankers are crucial for improving the accuracy of your retrieval system.
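In a typical two-stage setup, a fast retriever returns candidate documents and a reranker scores each (query, document) pair, after which the candidates are reordered by that score. A minimal sketch of the reordering step; `token_overlap_score` is a hypothetical toy stand-in for a reranker model's relevance score, is not how zerank-2 computes relevance, and no zerank-2 API is shown.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: List[str], score: Scorer, top_k: Optional[int] = None) -> List[str]:
    """Reorder first-stage candidates by a (query, document) relevance score, best first."""
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def token_overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker model: fraction of query tokens found in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)


if __name__ == "__main__":
    docs = [
        "weather forecast for today",
        "protein folding",
        "protein structure prediction with deep learning",
    ]
    print(rerank("protein structure prediction", docs, token_overlap_score, top_k=2))
    # ['protein structure prediction with deep learning', 'protein folding']
```

Passing the scorer as a function keeps the reordering logic independent of whichever model supplies the scores.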
arcinstitute/Stack-Large
by arcinstitute
Stack is a large-scale encoder-decoder foundation model for single-cell biology. It introduces a novel tabular attention architecture that enables both intra- and inter-cellular information flow, using chunks of the cell-by-gene matrix as the basic input unit.
jackxinning/Leanly_AI
by jackxinning
harshitsiwach/qwen3.5-0.8b-peptide-steroid
by harshitsiwach
This model is a fine-tuned version of Qwen 3.5 0.8B on a specialized dataset covering biochemistry, peptides, and steroids. It is optimized for providing detailed information on compound mechanisms, dosage (including gender-specific considerations), cycle planning, and physiological effects.
LexBwmn/ACE-V1
by LexBwmn
ACE-V1.1: Brain Tumor Detection. Caution: MEDICAL RESEARCH USE ONLY. ACE-V1.1 is NOT a cleared medical device. It must not be used for primary diagnosis or clinical decision-making. All outputs must be verified by a qualified professional.
ByteDance-Seed/byteff2
by ByteDance-Seed
This repository contains the model used for the paper "Bridging Quantum Mechanics to Organic Liquid Properties via a Universal Force Field".
A domain-optimized reasoning model built on DeepSeek-R1-Distill-Qwen-32B, refined through a multi-stage pipeline of GPTQ quantization-aware training and QLoRA fine-tuning. It achieves 84% on MedQA, within 4 points of GPT-4o, in a ~20 GB package that fits on a single L40 or L40S GPU.
AIRI-Institute/moderngena-base
by AIRI-Institute
ModernGENA base: ModernGENA is a DNA foundation model based on ModernBERT (a modernized BERT-style encoder architecture) adapted for genomic sequence modeling. ModernGENA base is the 377M-parameter version introduced in the paper Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for…
hussenmi/scimilarity_expanded_model
by hussenmi
An extended version of SCimilarity, a metric-learning model for single-cell RNA-seq that maps cells to a unified 128-dimensional embedding space. The original model and method are described in:
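Once cells live in a shared embedding space, a question like "which reference cells resemble this one?" becomes a nearest-neighbor search. A minimal brute-force sketch, assuming cosine similarity as the metric (a common choice for such embeddings; whether SCimilarity uses exactly this metric is an assumption) and toy 3-D vectors in place of real 128-D embeddings. Computing the embeddings with the actual model is not shown, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_cells(query: Sequence[float], reference: List[Sequence[float]], k: int = 5) -> List[int]:
    """Indices of the k reference cells most similar to the query (exhaustive search)."""
    order = sorted(
        range(len(reference)),
        key=lambda i: cosine_similarity(query, reference[i]),
        reverse=True,
    )
    return order[:k]


if __name__ == "__main__":
    # 3-D toy vectors standing in for 128-D cell embeddings.
    reference = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]
    print(nearest_cells([1.0, 0.0, 0.0], reference, k=2))  # [2, 1]
```

An exhaustive scan is fine for a sketch; large reference atlases would normally use an approximate nearest-neighbor index instead.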
Duchifat-2.3-Instruct is a state-of-the-art, instruction-tuned Large Language Model developed by TopAI. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned, and generated in the LLM era.
A fine-tuned version of google/gemma-4-E4B-it covering three professional domains (medical, legal, and finance), trained with QLoRA (4-bit NF4) and Optuna-tuned hyperparameters on a Kaggle T4 GPU.
learning-unit/L1-16B-A3B
by learning-unit
L1 (Learning Unit 1) is the first language model from Lunit and Lunit Consortium, purpose-built for the medical domain. Derived from Gravity-16B-A3B-Base, L1 is designed for clinical reasoning and decision support.
alegendaryfish/CodonTranslator
by alegendaryfish
CodonTranslator is a protein-conditioned codon sequence generation model trained on the representative-only data_v3 release.
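Because generation is conditioned on a protein, any codon sequence produced should translate back to that protein under the genetic code. A minimal sketch of that consistency check using the standard genetic code (NCBI translation table 1); `translate` and `encodes_protein` are hypothetical helpers, and this is a validation step, not CodonTranslator's generation procedure.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(cds: str) -> str:
    """Translate a DNA (or RNA) coding sequence codon by codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Raises KeyError on non-ACGT characters such as N.
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def encodes_protein(cds: str, protein: str) -> bool:
    """True if cds translates to `protein`, ignoring trailing stop codons."""
    return translate(cds).rstrip("*") == protein


if __name__ == "__main__":
    print(translate("ATGGCTTAA"))              # MA*
    print(encodes_protein("ATGGCTTAA", "MA"))  # True: GCT encodes alanine
    print(encodes_protein("ATGTGG", "MA"))     # False: TGG encodes tryptophan
```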
epicmajorman/Gemma4-Biomedical-E4B-gguf
by epicmajorman
A specialized biomedical AI assistant created by Major Grant, built on Google's Gemma 4 E4B foundation with OpenMed training data. Distributed in GGUF format for efficient local inference.
This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. It was specifically designed for rapid web deployment without sacrificing clinical accuracy.
Prior-Labs/tabpfn_2_6
by Prior-Labs
TabPFN-2.6 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a single forward pass. Inference code can be found at https://github.com/PriorLabs/tabPFN.
Prior-Labs/tabpfn_2_5
by Prior-Labs
TabPFN-2.5 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a single forward pass. Inference code can be found at https://github.com/PriorLabs/tabPFN.
docling-project/MarkushGrapher-2
by docling-project
MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.
Verdugie/STEM-Oracle-27B
by Verdugie
or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.
Xaira-Therapeutics/X-Cell
by Xaira-Therapeutics
A diffusion language model for genome-scale perturbation prediction across diverse cellular contexts.