Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

145 of 5,893 resources

Showing 150

Technical Report 🧬

Active7.1K1 day ago
Python

gemma4-12b-bioinfo is a fine-tuned Gemma 4 12B instruction model for bioinformatics, genomics, and computational biology question answering.

Active1202 days ago
Python

esm3-sm-open-v1 is trained on 2.78 billion natural proteins. With synthetic data augmentation, this led to 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations, totaling 771 billion tokens.

Active3.5K5 days ago
Python

Original code at (https://github.com/Edoar-do/HuBERT-ECG)

Active515 days ago
Python

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active1466 days ago
Python

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active2.2K6 days ago
Python

Original code at https://github.com/Edoar-do/HuBERT-ECG

Active5506 days ago
Python

This model card provides an overview of the intended use of the ESMC SAE models and examples of how to access them, but it does not have a specific model or model weights. To access each SAE model collection, use the links below:

Active06 days ago
Python

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active96.1K6 days ago
Python

ESMFold2 is a state-of-the-art model for protein structure prediction and design that defines a new frontier for speed and accuracy. The model predicts high-resolution, all-atom 3D protein structures directly from amino acid sequences, with optional multiple sequence alignment (MSA) input for…

Active24.2K6 days ago
Python

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active614.4K6 days ago
Python

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active3.5K6 days ago
Python

> VIDRAFT FINAL-Bench — chemistry-specialized 218B MoE, served via the DELPHI 5-Phase inference cascade.

Active281 week ago
Python

HealthJudge is a domain-adapted helpfulness evaluator for health-related Community Notes. It is designed to judge whether a note provides helpful context for a potentially misleading social-media post, following the Community Notes helpfulness criteria.

Active311 week ago
Python

ProtGPT3-MSA is a multiple-sequence, homolog-conditioned autoregressive protein language model. It is part of the ProtGPT3 family, an open-source suite of promptable and aligned protein language models for protein sequence generation.

Active1751 week ago
Python

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active8791 week ago
Python

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active1241 week ago
Python

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Active651 week ago
Python

ESMC is a state-of-the-art protein language model that has learned the rules of protein biology from training on billions of protein sequences. ESMC provides representations of proteins enabling novel AI applications from therapeutic protein engineering to unlocking basic insights into protein…

Active2.8K1 week ago
Python

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active2.5K2 weeks ago
Python

This set of model weights was released with the GitHub-compatible esm package format. The models here are kept for backwards compatibility, but we recommend you use the HuggingFace-compatible model weights at biohub/ESMC-6B (or biohub/ESMC-300M / biohub/ESMC-600M) instead.

Active6.2K2 weeks ago
Python

# Geneformer Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.

Active8.9K2 weeks ago
Python
Active732 weeks ago
Python
Active822 weeks ago
Python

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

Active02 weeks ago
Python

Genos-m is a foundation model for human-associated microbial genomes. It is trained to model microbial DNA sequences at single-nucleotide resolution and supports ultra-long genomic contexts up to one million tokens.

Active312 weeks ago
Python

Qwen3-8B-syco_med-gated-attention-FT is a plug-and-play gated attention weight released for AI safety research.

Active03 weeks ago
Python

Apertus-8B-MeditronFO is a 8B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-8B-Instruct on the Fully Open Meditron Corpus.

Active3813 weeks ago
Python

Apertus-70B-MeditronFO is a 70B-parameter medical specialist LLM, produced by supervised fine-tuning of Apertus-70B-Instruct on the Fully Open Meditron Corpus.

Active3973 weeks ago
Python

Gemma 4 E2B fine-tuned on 225K drug–target pairs for novel small-molecule generation.

Active253 weeks ago
Python

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active8K3 weeks ago
Python

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

Active5473 weeks ago
Python

For a convenient overview and download list, visit our model page for this model.

Active7034 weeks ago
Python

A patient-level disease classification model trained on single-cell RNA-seq data. Given a matrix of gene expression profiles (one row per cell), the model produces a disease-category prediction for the patient.

Active761 month ago
Python

MedPsy-4B is a state-of-the-art, text-only medical and healthcare language model purpose-built for edge deployment. Built on top of Qwen3-4B-Thinking-2507 and post-trained with a multi-stage pipeline (supervised fine-tuning + reinforcement learning) on curated medical data, it surpasses models…

Active1.4K1 month ago
Python

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

Active28K1 month ago
Python

Duchifat-2.3-Instruct is a state-of-the-art, instruction-tuned Large Language Model developed by TopAI. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned, and generated in the LLM era.

Active1541 month ago
Python

!image

Active3.9K1 month ago
Python

Fine-tuned version of google/gemma-4-E4B-it across three professional domains — Medical, Legal, and Finance — using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on Kaggle T4 GPU.

Active1K1 month ago
Python

## Model Description This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. This model was specifically designed for rapid web deployment without sacrificing clinical accuracy.

Active42 months ago
Python

MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.

Active2542 months ago
Python

# or·a·cle /ˈôrəkəl/ — a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer — you don't ask it how it knows, you ask and it answers.

Active1422 months ago
Python

In retrieval systems, embedding models determine the quality of your search.

Active272.3K2 months ago
Python

In search enginers, rerankers are crucial for improving the accuracy of your retrieval system.

Active22.9K3 months ago
Python