Find open-source science resources

Technical Report 🧬

Active · 7.1K · 1 day ago

yashm/gemma4-12b-bioinfo

by yashm

gemma4-12b-bioinfo is a fine-tuned Gemma 4 12B instruction model for bioinformatics, genomics, and computational biology question answering.

Active · 120 · 2 days ago

esm3-sm-open-v1 is trained on 2.78 billion natural proteins. With synthetic data augmentation, this led to 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations, totaling 771 billion tokens.

Active · 3.5K · 5 days ago

Edoardo-BS/HuBERT-ECG-SFT-CardioLearning-large

by Edoardo-BS

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 51 · 5 days ago

Edoardo-BS/hubert-ecg-large

by Edoardo-BS

image-feature-extraction

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 146 · 6 days ago

Edoardo-BS/hubert-ecg-base

by Edoardo-BS

image-feature-extraction

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 2.2K · 6 days ago

Edoardo-BS/hubert-ecg-small

by Edoardo-BS

image-feature-extraction

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active · 550 · 6 days ago

biohub/ESMC-SAE-Overview

by biohub

This model card provides an overview of the intended use of the ESMC SAE models and examples of how to access them, but it does not have a specific model or model weights. To access each SAE model collection, use the links below:

Active · 0 · 6 days ago

biohub/ESMFold2

by biohub

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active · 96.1K · 6 days ago

biohub/ESMFold2-Fast

by biohub

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active · 24.2K · 6 days ago

biohub/ESMC-6B

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 614.4K · 6 days ago

biohub/ESMC-600M

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 3.5K · 6 days ago

FINAL-Bench/Darwin-218B-Delphi

by FINAL-Bench

VIDRAFT FINAL-Bench: chemistry-specialized 218B MoE, served via the DELPHI 5-Phase inference cascade.

Active · 28 · 1 week ago

Eculid/HealthJudge

by Eculid

HealthJudge is a domain-adapted helpfulness evaluator for health-related Community Notes. It is designed to judge whether a note provides helpful context for a potentially misleading social-media post, following the Community Notes helpfulness criteria.

Active · 31 · 1 week ago

AI4PD/ProtGPT3-MSA

by AI4PD

ProtGPT3-MSA is a multiple-sequence, homolog-conditioned autoregressive protein language model. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein sequence generation.

Active · 175 · 1 week ago

ZJU-AI4H/Hulu-Med-Flash-Preview-27B

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 879 · 1 week ago

ZJU-AI4H/Hulu-Med-30A3

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 124 · 1 week ago

ZJU-AI4H/Hulu-Med-235A22

by ZJU-AI4H

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active · 65 · 1 week ago

biohub/ESMC-300M

by biohub

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active · 2.8K · 1 week ago

biohub/esmc-600m-2024-12

by biohub

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active · 2.5K · 2 weeks ago

biohub/esmc-300m-2024-12

by biohub

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active · 6.2K · 2 weeks ago

ctheodoris/Geneformer

by ctheodoris

Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single-cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.

Active · 8.9K · 2 weeks ago

automatic-speech-recognition

google/medasr

by google

Active · 16.4K · 2 weeks ago

ScientaLab/eva-rna

by ScientaLab

Active · 73 · 2 weeks ago

Manhph2211/D-BETA

by Manhph2211

Active · 82 · 2 weeks ago

Hamdan003/inventmol-r1

by Hamdan003

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

Active · 0 · 2 weeks ago

Junhauwong/Surge-Cognition-4x8B

by Junhauwong

Active · 32 · 2 weeks ago

BGI-HangzhouAI/Genos-m

by BGI-HangzhouAI

Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.

Active · 31 · 2 weeks ago

sichengwang04/Qwen3-8B-syco_med-gated-attention-FT

by sichengwang04

Qwen3-8B-syco_med-gated-attention-FT is a plug-and-play gated attention weight released for AI safety research.

Active · 0 · 3 weeks ago

EPFLiGHT/Apertus-8B-MeditronFO

by EPFLiGHT

Apertus-8B-MeditronFO is an 8B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-8B-Instruct on the Fully Open Meditron Corpus.

Active · 381 · 3 weeks ago

EPFLiGHT/Apertus-70B-MeditronFO

by EPFLiGHT

Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.

Active · 397 · 3 weeks ago

dlyog/gemma-cure

by dlyog

Gemma 4 E2B fine-tuned on 225K drug–target pairs for novel small-molecule generation.

Active · 25 · 3 weeks ago

macwiatrak/bacformer-large-masked-MAG

by macwiatrak

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active · 8K · 3 weeks ago

macwiatrak/bacformer-large-masked-complete-genomes

by macwiatrak

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active · 547 · 3 weeks ago

mradermacher/zerank-2-GGUF

by mradermacher

For a convenient overview and download list, visit our model page for this model.

Active · 703 · 4 weeks ago

oriel9p/protsent-esm2-150M

by oriel9p

sentence-similarity

Active · 0 · 1 month ago

oriel9p/protsent-esm2-35M

by oriel9p

sentence-similarity

Active · 0 · 1 month ago

ConvergeBio/virtual-cell-patient

by ConvergeBio

A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.

Active · 76 · 1 month ago

qvac/MedPsy-4B

by qvac

MedPsy-4B is a state-of-the-art, text-only medical and healthcare language model purpose-built for edge deployment. Built on top of Qwen3-4B-Thinking-2507 and post-trained with a multi-stage pipeline (supervised fine-tuning + reinforcement learning) on curated medical data, it surpasses models…

Active · 1.4K · 1 month ago

zeroentropy/zerank-2-reranker

by zeroentropy

text-ranking

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

Active · 28K · 1 month ago

razielAI/Duchifat-2.3-Instruct

by razielAI

Duchifat-2.3-Instruct is a state-of-the-art, instruction-tuned Large Language Model developed by TopAI. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned, and generated in the LLM era.

Active · 154 · 1 month ago

Jackrong/Qwopus3.5-27B-v3.5-GGUF

by Jackrong

Active · 3.9K · 1 month ago

google/medgemma-1.5-4b-it

by google

Active · 419.6K · 1 month ago

Acryl-aLLM/ALLM.H-Bv4-Gemma4-31B-BF16

by Acryl-aLLM

Active · 13 · 1 month ago

rajveer43/gemma-4-E4B-medical-legal-finance-qa

by rajveer43

Fine-tuned version of google/gemma-4-E4B-it across three professional domains — Medical, Legal, and Finance — using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on Kaggle T4 GPU.

Active · 1K · 1 month ago

rarfileexe/XPathology-CNN_2.0_advance

by rarfileexe

image-classification

This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. This model was specifically designed for rapid web deployment without sacrificing clinical accuracy.

Active · 4 · 2 months ago

docling-project/MarkushGrapher-2

by docling-project

image-to-text

MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.

Active · 254 · 2 months ago

Verdugie/STEM-Oracle-27B

by Verdugie

or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.

Active · 142 · 2 months ago