Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

303 of 5,923 resources

Showing 51–100

Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.

Active · 397 · 3 weeks ago
Python

LoRA fine-tune of Qwen3.5-9B on synthetic clinical triage Q&A pairs generated from PubMed Central open-access papers. The model is specialized for emergency-medicine decision-making: triaging patients, applying clinical decision rules, and generating protocol-grounded triage recommendations.

Active · 0 · 3 weeks ago
Python

Base model: google/gemma-4-26b-it. Architecture: MoE, 26B total / ≈4B active parameters (1 shared expert + 8 routed from a pool of 128 per MoE layer, 30 MoE layers). Method: Activation-directed expert surgery, 128 → 64 experts per layer (50% reduction). Quantization: Q4_K_M (≈9.7 GB on disk). Tags:…

Active · 90 · 3 weeks ago

Base model: google/gemma-4-26b-it. Architecture: MoE, 26B total / ≈4B active parameters (1 shared expert + 8 routed from a pool of 128 per MoE layer, 30 MoE layers). Method: Activation-directed expert surgery, 128 → 64 experts per layer (50% reduction). Quantization: Q4_K_M (≈9.7 GB on disk). Tags:…

Active · 306 · 3 weeks ago
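
The expert counts quoted in the two entries above can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are illustrative, and it counts the routed expert pool only, assuming the single shared expert per layer is left untouched.

```python
# Figures taken from the entry text: 30 MoE layers, routed pool cut from 128 to 64.
MOE_LAYERS = 30
EXPERTS_BEFORE = 128
EXPERTS_AFTER = 64


def routed_expert_count(experts_per_layer: int, layers: int = MOE_LAYERS) -> int:
    """Total routed experts across all MoE layers (hypothetical helper)."""
    return experts_per_layer * layers


before = routed_expert_count(EXPERTS_BEFORE)
after = routed_expert_count(EXPERTS_AFTER)
# Fraction of routed experts removed by the pruning step.
reduction = 1 - after / before
```

Under these assumptions the pruning removes half of the routed experts in total, which matches the 50% reduction stated per layer.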

Gemma 4 E2B fine-tuned on 225K drug–target pairs for novel small-molecule generation.

Active · 25 · 3 weeks ago
Python

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active · 8K · 4 weeks ago
Python

Important notice: If you are using GENERator for sequence generation, ensure that the length of each input sequence is a multiple of 6. This can be achieved by either (1) padding the sequence on the left with 'A' (left padding) or (2) truncating the sequence from the left (left truncation).

Active · 4.9K · 4 weeks ago
Python
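
The two options in the GENERator notice above (left padding with 'A', or left truncation, so that the length is a multiple of 6) could be sketched as follows. The function names are hypothetical and are not part of the GENERator codebase; this only illustrates the length adjustment, not tokenization or generation.

```python
def left_pad_to_multiple(seq: str, k: int = 6, pad_char: str = "A") -> str:
    """Left-pad seq with pad_char until its length is a multiple of k."""
    remainder = len(seq) % k
    if remainder == 0:
        return seq
    return pad_char * (k - remainder) + seq


def left_truncate_to_multiple(seq: str, k: int = 6) -> str:
    """Drop characters from the left until the length is a multiple of k."""
    remainder = len(seq) % k
    return seq[remainder:]


padded = left_pad_to_multiple("ACGTACG")        # 7 bases, padded up to 12
truncated = left_truncate_to_multiple("ACGTACG")  # 7 bases, cut down to 6
```

Padding keeps every original base at the cost of adding artificial ones; truncation keeps the sequence authentic but discards up to five leading bases.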

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active · 547 · 4 weeks ago
Python

This repository hosts JAX exports of MatterSim v1.0.0 for use with kUPS, a JAX-native molecular-simulation toolkit. Each artefact is a self-contained .zip containing the serialized JAX computation graph, the original model parameters, and the minimal metadata needed to run inference.

Active · 0 · 1 month ago

Meow-Omni 1 is the world's first Multimodal Large Language Model (MLLM) specifically engineered for Computational Ethology. It natively co-embeds four distinct modalities (Text, Video, Audio, and Biological Time-Series) to decode the latent intentions of non-verbal species.

Active · 252 · 1 month ago

For a convenient overview and download list, visit our model page for this model.

Active · 703 · 1 month ago
Python

This U‑Net model predicts tilt, z scanner drift, and other large‑scale imaging artifacts present in Atomic Force Microscopy (AFM) height maps. It outputs a background image, the same size and scale as the raw AFM image, which can be subtracted (via the accompanying afMLevel code) to produce a…

Active · 0 · 1 month ago

This U‑Net model masks features in Atomic Force Microscopy (AFM) height maps. It outputs a probability mask image the same size as the raw AFM image; the accompanying Python package, afMLevel, then applies a threshold (typically 0.5) to produce a binary mask.

Active · 0 · 1 month ago
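
The post-processing described in the two AFM entries above (subtracting a predicted background, and thresholding a probability mask at 0.5) might look roughly like this. It is a sketch only: the function names are hypothetical and do not reflect the afMLevel API, plain nested lists stand in for image arrays, and treating values exactly equal to the threshold as foreground is an assumption.

```python
def subtract_background(height_map, background):
    """Return height_map minus background, element by element (same shape assumed)."""
    return [
        [h - b for h, b in zip(h_row, b_row)]
        for h_row, b_row in zip(height_map, background)
    ]


def binarize_mask(prob_mask, threshold=0.5):
    """Return 1 where the probability is >= threshold, else 0 (assumed convention)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_mask]


leveled = subtract_background([[5.0, 6.0]], [[1.0, 2.5]])
binary = binarize_mask([[0.1, 0.5], [0.9, 0.49]])
```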

A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.

Active · 76 · 1 month ago
Python
Active · 0 · 1 month ago

MedPsy-4B is a state-of-the-art, text-only medical and healthcare language model purpose-built for edge deployment. Built on top of Qwen3-4B-Thinking-2507 and post-trained with a multi-stage pipeline (supervised fine-tuning + reinforcement learning) on curated medical data, it surpasses models…

Active · 1.2K · 1 month ago
Python

Warning: This is a baseline model trained on publicly available data. While we've done our best to curate the data, the model performance is quite poor. Proceed with caution.

Active · 24 · 1 month ago

InstaNovo-P is a specialized transformer-based model for de novo peptide sequencing from phosphoproteomics mass spectrometry data. This model is specifically trained and optimized for identifying phosphorylated peptides and their modification sites.

Active · 7 · 1 month ago

InstaNovo: De novo Peptide Sequencing Model

Active · 11 · 1 month ago

InstaNovo: De novo Peptide Sequencing Model

Active · 16 · 1 month ago

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

Active · 30K · 1 month ago
Python

Stack is a large-scale encoder-decoder foundation model for single-cell biology. It introduces a novel tabular attention architecture that enables both intra- and inter-cellular information flow, setting cell-by-gene matrix chunks as the basic input data unit.

Active · 0 · 1 month ago
Active · 6.2K · 1 month ago

PickyBinders/tea

by PickyBinders


Active · 142.7K · 1 month ago

This model is a fine-tuned version of Qwen 3.5 0.8B on a specialized dataset covering biochemistry, peptides, and steroids. It is optimized for providing detailed information on compound mechanisms, dosage (including gender-specific considerations), cycle planning, and physiological effects.

Active · 648 · 1 month ago

FitCareer_AI Introduction

Active · 0 · 1 month ago

ACE-V1.1: Brain Tumor Detection. Caution: MEDICAL RESEARCH USE ONLY. ACE-V1.1 is NOT a cleared medical device. It must not be used for primary diagnosis or clinical decision-making. All outputs must be verified by a qualified professional.

Active · 0 · 1 month ago

This repository contains the model used for the paper Bridging Quantum Mechanics to Organic Liquid Properties via a Universal Force Field.

Active · 0 · 1 month ago

A domain-optimized reasoning model built on DeepSeek-R1-Distill-Qwen-32B, refined through a multi-stage pipeline of GPTQ quantization-aware training and QLoRA fine-tuning. Achieves 84% on MedQA, within 4 points of GPT-4o, in a ~20GB package that fits on a single L40/L40s GPU.

Active · 347 · 1 month ago

ModernGENA is a DNA foundation model based on ModernBERT (a modernized BERT-style encoder architecture) adapted for genomic sequence modeling. ModernGENA base is the 377M-parameter version introduced in the paper Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for…

Active · 443 · 1 month ago

An extended version of SCimilarity, a metric-learning model for single-cell RNA-seq that maps cells to a unified 128-dimensional embedding space. The original model and method are described in:

Active · 0 · 1 month ago

Duchifat-2.3-Instruct is a state-of-the-art, instruction-tuned Large Language Model developed by TopAI. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned, and generated in the LLM era.

Active · 154 · 1 month ago
Python


Active · 3.9K · 1 month ago
Python
Active · 437K · 2 months ago
Python

Fine-tuned version of google/gemma-4-E4B-it across three professional domains (Medical, Legal, and Finance) using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on a Kaggle T4 GPU.

Active · 1K · 2 months ago
Python

L1 (Learning Unit 1) is the first language model from Lunit and Lunit Consortium, purpose-built for the medical domain. Derived from Gravity-16B-A3B-Base, L1 is designed for clinical reasoning and decision support.

Active · 249 · 2 months ago
Python

CodonTranslator is a protein-conditioned codon sequence generation model trained on the representative-only data_v3 release.

Active · 0 · 2 months ago


Active · 318 · 2 months ago

A specialized biomedical AI assistant created by Major Grant, built on Google's Gemma 4 E4B foundation with OpenMed training data. GGUF format for efficient local inference.

Active · 208 · 2 months ago

This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. This model was specifically designed for rapid web deployment without sacrificing clinical accuracy.

Active · 4 · 2 months ago
Python

TabPFN-2.6 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a forward pass. Inference code can be found at https://github.com/PriorLabs/tabPFN.

Active · 4K · 2 months ago

TabPFN-2.5 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a forward pass. Inference code can be found at https://github.com/PriorLabs/tabPFN.

Active · 22.1K · 2 months ago

MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.

Active · 224 · 2 months ago
Python

or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.

Active · 142 · 2 months ago
Python


Active · 8 · 2 months ago


Active · 6 · 2 months ago

A diffusion language model for genome-scale perturbation prediction across diverse cellular contexts.

Active · 0 · 2 months ago