Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

448 of 5,923 resources

Showing 201-250

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

Active · 22.9K · 3 months ago
Python

Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)

Active · 604 · 3 months ago
Python
MIT

Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)

Active · 44 · 3 months ago
Python
NOASSERTION

Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Active · 44 · 3 months ago
Python
MIT

Open-source platform for building, extending, and experimenting with scientific agents, providing modular agent construction tools and standardized evaluation pipelines for accelerating autonomous scientific discovery research (748+ stars, MIT License)

Active · 748 · 3 months ago
Python
MIT

For a convenient overview and download list, visit our model page for this model.

Active · 61 · 3 months ago
Python

Active · 8 · 3 months ago
Python

Deep learning library for Chemistry based on Tensorflow

Active · 6.8K · 3 months ago
Python
MIT

Language: Multilingual

Active · 1.4K · 3 months ago
Python

Deep learning library for solving PDEs

Active · 4.2K · 3 months ago
Python
LGPL-2.1

This ontology integrates cell type markers for cells in the Cell Ontology from various sources along with details of marker context (anatomical context, assay), confidence (where available) and provenance. [from repository]

Active · 1 · 3 months ago
Python

Sahal Shaji Mullappilly, Mohammed Irfan K, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Muhammad Anwer, and Hisham Cholakkal

Active · 372 · 3 months ago
Python

First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)

Active · 151 · 3 months ago
Python
MIT

GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)

Active · 115 · 3 months ago
Python
NOASSERTION

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 31% better perplexity than standard knowledge distillation at 3.8x compression.

Active · 70 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 54% better perplexity than standard knowledge distillation at 9.4x compression.

Active · 5 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.

Active · 14 · 3 months ago
Python

FASTQ and SAM quality control using Python.

Active · 109 · 3 months ago
Python
MIT

Active · 6.5K · 3 months ago
Python

A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.

Active · 199 · 3 months ago
Python
BSD-3-Clause

A Python script that converts positional information from a SAM dataset into interval format with a 0-based start and a 1-based end. The CIGAR string of each SAM record is used to compute the end coordinate.

Active · 37 · 3 months ago
Python
MIT
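
The SAM-to-interval conversion described in the entry above can be sketched roughly as follows. This is a minimal illustration and not the listed script's actual code: the function names `reference_span` and `sam_to_interval` are hypothetical, and the sketch assumes the usual SAM conventions that POS is 1-based and that only the CIGAR operations M, D, N, = and X consume reference bases.

```python
import re
from typing import Tuple

# Whole-string validator and per-operation tokenizer for CIGAR strings.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
REF_CONSUMING_OPS = frozenset("MDN=X")


def reference_span(cigar: str) -> int:
    """Return how many reference bases an alignment covers (hypothetical helper)."""
    if not _CIGAR_FULL.fullmatch(cigar):
        # Also rejects "*", which SAM uses when no CIGAR is available.
        raise ValueError(f"unsupported or missing CIGAR string: {cigar!r}")
    return sum(
        int(length)
        for length, op in _CIGAR_TOKEN.findall(cigar)
        if op in REF_CONSUMING_OPS
    )


def sam_to_interval(pos: int, cigar: str) -> Tuple[int, int]:
    """Convert a 1-based SAM POS plus CIGAR into (0-based start, 1-based end)."""
    start = pos - 1  # shift the 1-based SAM position to a 0-based start
    end = start + reference_span(cigar)  # equals the 1-based inclusive end
    return start, end


if __name__ == "__main__":
    # Soft clips (S) and insertions (I) do not move the reference coordinate.
    print(sam_to_interval(100, "5S10M2I3D20M"))
```

A 0-based start paired with a 1-based end is the half-open convention used by BED-style interval files, so the pair can be written out without further adjustment.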

Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)

Active · 383 · 3 months ago
Python
NOASSERTION

Deep learning-based object detection and segmentation for star-convex shapes, widely adopted for cell and nucleus segmentation in fluorescence and electron microscopy via a compact neural network architecture with non-maximum suppression and shape-based post-processing (Nature Methods 2020, 1.2K+ stars)

Active · 1.2K · 3 months ago
Python
BSD-3-Clause

Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)

Active · 84 · 4 months ago
Python

Physics-informed neural networks

Active · 5.9K · 4 months ago
Python
MIT

Azure Semantic Kernel multi-agent PPT generation reference

Active · 49 · 4 months ago
Python
MIT

From Inquiry to Decision: Building Trustworthy Medical AI

Active · 20 · 4 months ago
Python

Active · 45 · 4 months ago
Python

A package for working with nuclear magnetic resonance (NMR) data including functions for reading common binary file formats and processing NMR data.

Active · 265 · 4 months ago
Python
BSD-3-Clause

File parser/converter for QM, MD and plane-wave DFT programs.

Active · 163 · 4 months ago
Python
LGPL-3.0

A Neural Machine Translation (NMT) model based on a custom Transformer (Encoder-Decoder) architecture, trained from scratch. This model is designed to translate English sentences into Hebrew using multilingual encoding and specialized layer configurations.

Active · 18 · 4 months ago
Python

A package for creating fast and accurate interatomic potentials.

Active · 358 · 4 months ago
Python
MIT

Evaluating multimodal autonomous agents in realistic scientific workflows across real scientific software environments (KAlgebra, Celestia, Grass GIS, Lean 4, etc.) with VM-based evaluation infrastructure and agent trajectories

Active · 129 · 4 months ago
Python
MIT

Apache 2.0 single-cell foundation model family scaling to 3B parameters, pretrained on 266M cell profiles including perturbation data and released with training, embedding, and downstream benchmarking workflows for disease-relevant single-cell tasks (2025)

Active · 156 · 4 months ago
Python
Apache-2.0

Foundation model for joint segmentation, detection, and recognition of biomedical objects across nine imaging modalities, with v2 introducing BoltzFormer architecture for end-to-end 3D inference (Microsoft, Nature Methods 2025)

Active · 668 · 4 months ago
Python
Apache-2.0

Open-source toolkit and benchmark for learning-based theorem proving in Lean, providing programmatic Lean interaction, a 98K+ theorem dataset extracted from 217 Lean projects, and ReProver—the first retrieval-augmented LLM-based theorem prover for Lean—with reproducible training pipelines underpinning much subsequent Lean prover research (Caltech & NVIDIA, NeurIPS 2023 Outstanding Paper, Datasets & Benchmarks)

Active · 803 · 4 months ago
Python
MIT

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Active · 257 · 4 months ago
Python
Other

Discrete diffusion framework for generative protein sequence design over evolutionary-scale databases, supporting unconditional generation, evolutionary-guided conditional design, motif scaffolding, and intrinsically disordered region generation through order-agnostic autoregressive diffusion, enabling sequence-only protein design without structural priors (Microsoft Research, Nature Communications 2024)

Active · 670 · 4 months ago
Python
MIT

ICML 2025 drug discovery generalist using masked discrete diffusion and fragment-based generation with molecular context guidance (NVIDIA)

Active · 180 · 4 months ago
Python

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Active · 7.8K · 5 months ago
Python

PII Detection Model | 44M Parameters | Open Source

Active · 27K · 5 months ago
Python

DeepMind's Olympiad-level geometry theorem prover combining a neural language model with a symbolic deduction engine; AlphaGeometry2 solves 84% of IMO geometry problems (42/50) at gold-medalist level (Nature 2024)

Active · 4.8K · 5 months ago
Python
Apache-2.0

A library for building, manipulating, analyzing, and automatically designing molecules, including a genetic algorithm.

Active · 284 · 5 months ago
Python
MIT

Fast, modular, and accurate de novo design of protein binders based on the Protenix foundation model, achieving 17-82% nanomolar hit rates across diverse targets with 2-6× improvement over prior methods like AlphaProteo and RFdiffusion (229+ stars, Apache 2.0)

Active · 229 · 5 months ago
Python
Apache-2.0

Shanghai AI Lab's deep learning-based global weather forecasting model pushing skillful forecasts beyond 10 days lead, with open-source inference code and pretrained ONNX model weights (arXiv 2023)

Active · 169 · 5 months ago
Python

ECMWF's unified framework and command-line tool to run AI-based weather forecasting models (GraphCast, Aurora, Pangu, NeuralGCM, FourCastNet) with operational ECMWF data infrastructure, enabling standardized inference and benchmarking across state-of-the-art meteorological AI systems (ECMWF, 576+ stars)

Active · 579 · 5 months ago
Python
Apache-2.0

Trainable, memory-efficient PyTorch reproduction and retraining of AlphaFold2 providing new insights into its learning dynamics and out-of-distribution generalization; widely used as the open-source AlphaFold2 backbone underpinning many downstream protein structure prediction and design pipelines (Columbia AlQuraishi Lab & OpenFold Consortium, Nature Methods 2024)

Active · 3.4K · 5 months ago
Python
Apache-2.0

The MediPhi Model Collection comprises 7 small language models of 3.8B parameters from the base model Phi-3.5-mini-instruct specialized in the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e.

Active · 2K · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 32 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. This model version was continually pretrained on ~14 million cancer transcriptomes…

Active · 16 · 5 months ago
Python