Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

29 of 5,923 resources

Neural network-based cryo-EM heterogeneous reconstruction, modeling continuous 3D structure distributions from single-particle images, with CryoDRGN-ET extending to in-cell cryo-electron tomography (MIT CSAIL, Nature Methods 2021/2024)

Active3781 week ago
Python
GPL-3.0

First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)

Active4K2 weeks ago
Python
MIT

AlphaFold 3 inference pipeline for unified biomolecular structure prediction of proteins, nucleic acids, small molecules, ions, and post-translational modifications (Google DeepMind, Nature 2024)

Active8.1K3 weeks ago
Python
NOASSERTION

Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)

Active1693 weeks ago
Python
MIT

General multimodal protein design framework enabling DNA-encoding of chemistry for programmable enzyme design and diverse protein generation through diffusion-based generative modeling (190+ stars, Apache 2.0, 2026)

Active1901 month ago
Python
Apache-2.0

Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)

Active3231 month ago
Python
MIT

Learning the language of protein-protein interactions

Active1501 month ago
Python
MIT

Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)

Active1.9K2 months ago
Python
Apache-2.0

Accessible protein design platform via Google Colab integrating AlphaFold2, RoseTTAFold, and ProteinMPNN for de novo hallucination, fixed backbone design, and binder design (Sergey Ovchinnikov, 2022+)

Active9132 months ago
Python
NOASSERTION

Baidu's open-source reproduction of AlphaFold3 in PaddlePaddle, providing pretrained weights and inference pipelines for unified biomolecular structure prediction across proteins, nucleic acids, ligands, ions, and post-translational modifications within the PaddleHelix biocomputing platform (Baidu, bioRxiv 2024)

Active1.1K2 months ago
Python
NOASSERTION

Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability

Active1782 months ago
Python
BSD-2-Clause

Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)

Active6043 months ago
Python
MIT

Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)

Active443 months ago
Python
NOASSERTION

Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)

Active844 months ago
Python

Discrete diffusion framework for generative protein sequence design over evolutionary-scale databases, supporting unconditional generation, evolutionary-guided conditional design, motif scaffolding, and intrinsically disordered region generation through order-agnostic autoregressive diffusion, enabling sequence-only protein design without structural priors (Microsoft Research, Nature Communications 2024)

Active6704 months ago
Python
MIT

ICML 2025 drug discovery generalist using masked discrete diffusion and fragment-based generation with molecular context guidance (NVIDIA)

Active1804 months ago
Python

Fast, modular, and accurate de novo design of protein binders based on the Protenix foundation model, achieving 17-82% nanomolar hit rates across diverse targets with 2-6× improvement over prior methods like AlphaProteo and RFdiffusion (229+ stars, Apache 2.0)

Active2295 months ago
Python
Apache-2.0

Trainable, memory-efficient PyTorch reproduction and retraining of AlphaFold2 providing new insights into its learning dynamics and out-of-distribution generalization; widely used as the open-source AlphaFold2 backbone underpinning many downstream protein structure prediction and design pipelines (Columbia AlQuraishi Lab & OpenFold Consortium, Nature Methods 2024)

Active3.4K5 months ago
Python
Apache-2.0

Discovering interpretable features in protein language models via sparse autoencoders, enabling mechanistic understanding of PLM representations for protein engineering and design (288+ stars, MIT License)

Idle2927 months ago
Python
MIT

Dynamic Protein Data Bank integrating dynamic behaviors and physical properties into protein structures via a new dataset and SE(3) model extension, enabling richer understanding of protein conformational landscapes (Fudan University, 784+ stars)

Idle78310 months ago
Python

Family of diffusion protein language models demonstrating versatile generative and predictive capabilities for protein sequences and structures, including multimodal co-generation, conditional folding, inverse folding, motif scaffolding, and representation learning, with open pretrained weights and training scripts (327+ stars, ICML 2024, ICLR 2025, ICML 2025 Spotlight)

Idle33510 months ago
Python
Apache-2.0

In silico directed evolution framework using few-shot active learning to optimize protein activities, enabling rapid protein engineering with minimal experimental data (352+ stars, 2023)

Idle3601 year ago
Python
NOASSERTION

Universal 3D molecular pretraining framework with 209M conformations, scaling to 1.1B parameters (Uni-Mol2) on 800M conformations for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)

Idle1.1K1 year ago
Python
MIT

Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)

Archived3731 year ago
Python
Apache-2.0

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle1921 year ago
Python
Apache-2.0

General-purpose deep learning backbone for molecular modeling

Stale2.5K2 years ago
Python
MIT

Generative model for programmable protein design using diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample diverse all-atom structures; supports conditional generation via composable conditioners for substructure, symmetry, shape, and neural-network predictions; validated crystallographically (Generate Biomedicines, Nature 2023)

Stale8192 years ago
Python
Apache-2.0

Protein structure prediction from ESM models

Archived4.1K2 years ago
Python
MIT

3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR2023)

Stale3412 years ago
Python