Find open-source science resources

Technical Report 🧬

Active · 7.1K · 1 day ago

yashm/gemma4-12b-bioinfo-GGUF

by yashm

This repository contains GGUF files for gemma4-12b-bioinfo, a fine-tuned Gemma 4 12B model for bioinformatics and computational biology.

Active · 537 · 2 days ago

yashm/gemma4-12b-bioinfo

by yashm

gemma4-12b-bioinfo is a fine-tuned Gemma 4 12B instruction model for bioinformatics, genomics, and computational biology question answering.

Active · 120 · 2 days ago

pfnet/GenerRNA

by pfnet

GenerRNA is a generative pre-trained language model for de novo RNA sequence design. It is a Transformer (decoder-only, GPT-style) model that learns the "language" of RNA from millions of natural sequences and can generate novel, realistic RNA sequences without any structural input, functional…

Active · 0 · 4 days ago

InstaDeepAI/winnow-helaqc-model

by InstaDeepAI

Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository hosts a calibrator trained on the HeLa Single Shot dataset as referenced in our paper: De novo peptide sequencing rescoring and FDR estimation with Winnow.

Active · 15 · 5 days ago

InstaDeepAI/winnow-general-model

by InstaDeepAI

Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows. This repository hosts a pretrained, general-purpose calibrator that maps raw InstaNovo model confidences and complementary features (mass error, retention time, beam features, fragment…

Active · 24 · 5 days ago
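
The two Winnow entries above describe confidence recalibration followed by FDR control. As a minimal sketch of how calibrated confidences can drive an FDR estimate, assuming each score is a calibrated probability that the prediction is correct, the expected error rate among accepted predictions is the mean of `1 - confidence`. This is a generic illustration, not Winnow's API and not necessarily its exact method; the function names `estimate_fdr` and `threshold_for_fdr` are hypothetical.

```python
from typing import List, Optional, Tuple


def estimate_fdr(confidences: List[float], threshold: float) -> Tuple[int, float]:
    """Return (accepted count, estimated FDR) for scores >= threshold.

    Assumption: every confidence is a calibrated probability of correctness,
    so the expected number of errors is the sum of (1 - confidence).
    """
    accepted = [c for c in confidences if c >= threshold]
    if not accepted:
        return 0, 0.0
    expected_errors = sum(1.0 - c for c in accepted)
    return len(accepted), expected_errors / len(accepted)


def threshold_for_fdr(confidences: List[float], target_fdr: float) -> Optional[float]:
    """Return the lowest cutoff whose estimated FDR is <= target_fdr, or None."""
    ranked = sorted(confidences, reverse=True)
    best = None
    expected_errors = 0.0
    for i, c in enumerate(ranked, start=1):
        expected_errors += 1.0 - c
        # Only consider a cutoff once every prediction tied at this score is counted.
        group_end = i == len(ranked) or ranked[i] < c
        if group_end and expected_errors / i <= target_fdr:
            best = c
    return best


if __name__ == "__main__":
    scores = [0.99, 0.95, 0.90, 0.60, 0.30]  # made-up calibrated confidences
    cutoff = threshold_for_fdr(scores, target_fdr=0.05)
    print(cutoff, estimate_fdr(scores, cutoff))
```

Because the scores are visited in descending order, the running estimate never decreases, so the last cutoff that satisfies the target is also the most permissive one.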

biohub/esm3-sm-open-v1

by biohub

esm3-sm-open-v1 is trained on 2.78 billion natural proteins. With synthetic data augmentation, this led to 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations, totaling 771 billion tokens.

Active · 3.5K · 5 days ago

Edoardo-BS/HuBERT-ECG-SFT-CardioLearning-large

by Edoardo-BS

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 51 · 5 days ago

Edoardo-BS/hubert-ecg-large

by Edoardo-BS

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 146 · 6 days ago

Edoardo-BS/hubert-ecg-base

by Edoardo-BS

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 2.2K · 6 days ago

Edoardo-BS/hubert-ecg-small

by Edoardo-BS

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 550 · 6 days ago

biohub/ESMC-SAE-Overview

by biohub

feature-extraction

This model card provides an overview of the intended use of the ESMC SAE models and examples of how to access them, but it does not have a specific model or model weights. To access each SAE model collection, use the links below:

Active · 0 · 6 days ago

biohub/ESMFold2

by biohub

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active · 96.1K · 6 days ago

biohub/ESMFold2-Fast

by biohub

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active · 24.2K · 6 days ago

biohub/ESMC-6B

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 614.4K · 6 days ago

biohub/ESMC-600M

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 3.5K · 6 days ago

lastmass/Qwen3.5-Medical-GSPO

by lastmass

A Chinese medical reasoning model fine-tuned from Qwen3.5-4B using a two-stage training pipeline: Supervised Fine-Tuning (SFT) for format alignment, followed by Group Sequence Policy Optimization (GSPO) with an LLM-as-Judge reward function.

Active · 3.7K · 1 week ago

PhysicsWallahAI/Aryabhata-2.0

by PhysicsWallahAI

Aryabhata 2 is a reasoning-focused language model developed by PhysicsWallah for competitive STEM examinations (JEE, NEET). It is obtained by post-training GPT-OSS-20B via reinforcement learning on a curated curriculum of Physics, Chemistry, Mathematics, and General Reasoning questions — achieving…

Active · 292 · 1 week ago

FINAL-Bench/Darwin-218B-Delphi

by FINAL-Bench

VIDRAFT FINAL-Bench: chemistry-specialized 218B MoE, served via the DELPHI 5-Phase inference cascade.

Active · 28 · 1 week ago

Eculid/HealthJudge

by Eculid

HealthJudge is a domain-adapted helpfulness evaluator for health-related Community Notes. It is designed to judge whether a note provides helpful context for a potentially misleading social-media post, following the Community Notes helpfulness criteria.

Active · 31 · 1 week ago

AI4PD/ProtGPT3-MSA

by AI4PD

ProtGPT3-MSA is a multiple-sequence, homolog-conditioned autoregressive protein language model. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein sequence generation.

Active · 175 · 1 week ago

RiverZ/reclip

by RiverZ

This repository hosts release artifacts for ReCLIP:

Active · 0 · 1 week ago

pankajpandey-dev/Carbon-3B-GGUF

by pankajpandey-dev

GGUF quantizations of HuggingFaceBio/Carbon-3B — a generative DNA foundation model — for efficient inference with llama.cpp.

Active · 620 · 1 week ago

ZJU-AI4H/Hulu-Med-Flash-Preview-27B

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 879 · 1 week ago

ZJU-AI4H/Hulu-Med-30A3

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 124 · 1 week ago

ZJU-AI4H/Hulu-Med-235A22

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 65 · 1 week ago

biohub/ESMC-300M

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 2.8K · 1 week ago

Prior-Labs/tabpfn_3

by Prior-Labs

tabular-classification

TabPFN-3 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a forward pass. Inference code can be found at https://github.com/PriorLabs/TabPFN. More details can be found in the Model Report.

Active · 17K · 1 week ago

biohub/esmc-600m-2024-12

by biohub

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active · 2.5K · 2 weeks ago

biohub/esmc-300m-2024-12

by biohub

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active · 6.2K · 2 weeks ago

ctheodoris/Geneformer

by ctheodoris

Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.

Active · 8.9K · 2 weeks ago

google/medasr

by google

automatic-speech-recognition

Active · 16.4K · 2 weeks ago

ScientaLab/eva-rna

by ScientaLab

feature-extraction

Active · 73 · 2 weeks ago

aasatorres/esm2-sae-topk-16384-k512

by aasatorres

Sparse Autoencoder (SAE) trained on residue-level embeddings from ESM-2 (650M, layer 33) for interpretability research on protein language models.

Active · 18 · 2 weeks ago
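
For readers unfamiliar with the sparse autoencoder entry above, the following is a minimal sketch of a top-k SAE forward pass in plain Python. It assumes the common formulation (linear encoder, keep the k largest pre-activations, ReLU, linear decoder); the repository name suggests 16384 latents with k = 512, but that reading is an assumption, and the toy dimensions, weights, and the function name `topk_sae_forward` are hypothetical and unrelated to the released checkpoint.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def topk_sae_forward(
    x: Vector,
    w_enc: Matrix,  # shape [n_latents][d_model]
    b_enc: Vector,  # shape [n_latents]
    w_dec: Matrix,  # shape [n_latents][d_model]
    b_dec: Vector,  # shape [d_model]
    k: int,
) -> Tuple[Vector, Vector]:
    """Return (sparse latents, reconstruction) for one embedding vector."""
    # Encoder: one pre-activation per latent feature.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]

    # Keep the k largest pre-activations (ReLU applied), zero out the rest.
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    latents = [max(p, 0.0) if i in keep else 0.0 for i, p in enumerate(pre)]

    # Decoder: weighted sum of decoder rows plus bias.
    recon = [
        sum(z * row[j] for z, row in zip(latents, w_dec)) + b_dec[j]
        for j in range(len(b_dec))
    ]
    return latents, recon


if __name__ == "__main__":
    # Toy example: 2-dimensional "embedding", 4 latents, k = 2, tied weights.
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    z, x_hat = topk_sae_forward([1.0, 2.0], weights, [0.0] * 4, weights, [0.0, 0.0], k=2)
    print(z, x_hat)
```

In an interpretability workflow, the nonzero entries of the latent vector are the features inspected for each residue; the reconstruction is only needed to measure how much information the sparse code preserves.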

DISCO-Design/DISCO

by DISCO-Design

other

DISCO (DIffusion for Sequence-structure CO-design) is a multimodal generative model that simultaneously co-designs protein sequences and 3D structures, conditioned on and co-folded with arbitrary biomolecules — including small-molecule ligands, DNA, and RNA.

Active · 17 · 2 weeks ago

Keylab/COMO

by Keylab

COMO (Closed-loop Optical Molecule recOgnition) is a deep learning framework for Optical Chemical Structure Recognition (OCSR). It recognizes chemical structure diagrams from images and predicts SMILES strings with atom-level 2D coordinates and bond matrices.

Active · 0 · 2 weeks ago

birder-project/vit_reg4_so150m_p14_ls_dino-v2-bio

by birder-project

vit_reg4_so150m_p14_ls_dino-v2-bio is a Bio-DINO image encoder for natural photographs of living organisms. It uses a SoViT-150M/14 Vision Transformer with 4 register tokens and 133.6M backbone parameters, trained with a DINOv2-style self-supervised objective on approximately 31 million curated images…

Active · 3.9K · 2 weeks ago

birder-project/vit_reg1_s14_ls_dino-v2-dist-bio

by birder-project

vit_reg1_s14_ls_dino-v2-dist-bio is a compact Bio-DINO image encoder distilled from the larger Bio-DINO SoViT-150M/14 model. It keeps the same natural-photography biodiversity scope as the teacher model, but uses a much smaller ViT-S/14-style student with 21.7M backbone parameters and 384-dimensional…

Active · 522 · 2 weeks ago

Manhph2211/D-BETA

by Manhph2211

feature-extraction

Active · 82 · 2 weeks ago

birder-project/dino_v2_vit_reg4_so150m_p14_ls_bio

by birder-project

This repository contains the full Bio-DINO DINOv2 training weights for a SoViT-150M/14 Vision Transformer trained on natural photographs of living organisms. It is the companion release to the Birder backbone checkpoints.

Active · 132 · 2 weeks ago

Hamdan003/inventmol-r1

by Hamdan003

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

Active · 0 · 2 weeks ago

Junhauwong/Surge-Cognition-4x8B

by Junhauwong

Active · 32 · 2 weeks ago

BGI-HangzhouAI/Genos-m

by BGI-HangzhouAI

Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.

Active · 31 · 2 weeks ago

sichengwang04/Qwen3-8B-syco_med-gated-attention-FT

by sichengwang04

Qwen3-8B-syco_med-gated-attention-FT is a plug-and-play gated attention weight released for AI safety research.

Active · 0 · 3 weeks ago

EPFLiGHT/Apertus-8B-MeditronFO

by EPFLiGHT

Apertus-8B-MeditronFO is an 8B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-8B-Instruct on the Fully Open Meditron Corpus.

Active · 381 · 3 weeks ago

EPFLiGHT/Apertus-70B-MeditronFO

by EPFLiGHT

Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.

Active · 397 · 3 weeks ago

JThomas-CoE/coe-gemma4-biology-mmlu_pro-14b-a4b-q4

by JThomas-CoE

Base model: google/gemma-4-26b-it
Architecture: MoE, 26B total / ≈4B active parameters (1 shared expert + 8 routed from a pool of 128 per MoE layer, 30 MoE layers)
Method: Activation-directed expert surgery, 128 → 64 experts per layer (50% reduction)
Quantization: Q4_K_M (≈9.7 GB on disk)
Tags:…

Active · 90 · 3 weeks ago

JThomas-CoE/coe-gemma4-math-mmlu_pro-14b-a4b-q4

by JThomas-CoE