Find open-source science resources

Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)

Active · 345 stars · 3 weeks ago

ColabFold (2025 Updates)

Accessible implementation of AlphaFold/ESMFold with AF3 JSON export and database updates

Active · 2.8K stars · 1 month ago

MegaFold

Cross-platform system optimizations for accelerating AlphaFold3 training with 1.73x speedup and 1.23x memory reduction

Active · 71 stars · 1 month ago

mint

Learning the language of protein-protein interactions

Active · 150 stars · 1 month ago

ModelAngelo

Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)

Active · 171 stars · 1 month ago

Graphormer

General-purpose deep learning backbone for molecular modeling

Active · 2.5K stars · 1 month ago

BioEmu

Microsoft's generative model for sampling protein equilibrium conformations 100,000× faster than MD simulations, predicting domain motions, local unfolding and cryptic binding pockets on a single GPU (Science 2025)

Active · 836 stars · 1 month ago

SwitchCraft

Programmatic framework for designing state-switching proteins via backpropagation through compositional design constraints parameterized by structure prediction models; enables de novo design of allosteric regulators and fluorescent biosensors for arbitrary small-molecule analytes (79+ stars, MIT License, ICML 2026)

Active · 79 stars · 1 month ago

Boltz

First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)

Active · 4.1K stars · 1 month ago

IgGM

Generative foundation model for functional antibody and nanobody design, supporting de novo generation, affinity maturation, inverse design, structure prediction, and humanization (Tencent AI4S, ICLR 2025)

Active · 201 stars · 1 month ago

BindCraft

Simple and accurate de novo protein binder design pipeline using AlphaFold2 backpropagation, MPNN, and PyRosetta for automated binder discovery (bioRxiv 2024)

Active · 1.1K stars · 2 months ago

AlphaFlow

AlphaFold fine-tuned with flow matching for generating protein conformational ensembles, covering both experimental PDB states and molecular dynamics ensembles at physiological temperatures; includes ESMFlow variant (MIT, 526+ stars, 2024)

Active · 529 stars · 4 months ago

SaProt

Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)

Active · 613 stars · 4 months ago

ProstT5 (NAR Genomics and Bioinformatics 2024)

Bilingual protein language model translating between protein sequence and structure, finetuned from ProtT5-XL on 17M AlphaFoldDB structures using Foldseek's 3Di structural alphabet, enabling sequence-to-structure prediction, structure-to-sequence inverse folding, and unified protein representation learning (RostLab, 310+ stars)

Active · 317 stars · 4 months ago

SimpleFold (Apple, arXiv 2025)

Flow-matching protein folding model using only general-purpose transformer layers, with model sizes from 100M to 3B parameters, trained on 8.6M+ distilled structures; challenges the reliance on complex domain-specific architectures and supports PyTorch and MLX backends (985+ stars, MIT License)

Active · 986 stars · 5 months ago

EvoDiff

Discrete diffusion framework for generative protein sequence design over evolutionary-scale databases, supporting unconditional generation, evolutionary-guided conditional design, motif scaffolding, and intrinsically disordered region generation through order-agnostic autoregressive diffusion, enabling sequence-only protein design without structural priors (Microsoft Research, Nature Communications 2024)

Idle · 675 stars · 6 months ago

DynamicBind (NeurIPS 2024)

Deep equivariant generative model predicting ligand-specific protein-ligand complex structures with dynamic receptor conformational flexibility, enabling accurate docking for flexible protein targets

Idle · 299 stars · 7 months ago

InterPLM (Nature Methods 2025)

Discovering interpretable features in protein language models via sparse autoencoders, enabling mechanistic understanding of PLM representations for protein engineering and design (288+ stars, MIT License)

Idle · 298 stars · 8 months ago

Uni-Mol

Universal 3D molecular pretraining framework trained on 209M conformations, with Uni-Mol2 scaling to 1.1B parameters on 800M conformations, for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)

Idle · 1.1K stars · 1 year ago

ProtTrans

State-of-the-art pretrained language models for proteins trained on thousands of GPUs and Google TPUs using Transformer architectures, enabling protein property prediction, feature extraction, and transfer learning across diverse downstream tasks (1.3K+ stars, MIT, 2020-2026)

Idle · 1.3K stars · 1 year ago

DiffDock

Diffusion-based molecular docking achieving SOTA blind docking performance, treating ligand pose prediction as generative diffusion over SE(3), with DiffDock-L update for improved generalization (MIT CSAIL, ICLR 2023)

Idle · 1.5K stars · 1 year ago

LigandMPNN

Extension of ProteinMPNN for protein sequence design in the context of small-molecule ligands, metal ions, and nucleic acids, enabling binding site engineering and co-factor redesign (Baker Lab)

Idle · 588 stars · 1 year ago

Mol-Instructions

Large-scale biomolecular instruction dataset for chemistry/biology LLMs (ICLR 2024)

Idle · 294 stars · 1 year ago

ChemBERTa

Chemical language model: BERT-style transformer pretrained on SMILES strings for molecular property prediction

Idle · 500 stars · 1 year ago

ProteinMPNN

Deep learning-based protein sequence design (inverse folding) from backbone structures, achieving 52.4% sequence recovery vs 32.9% for Rosetta; a core tool in modern protein design pipelines (Baker Lab, Science 2022)

Idle · 1.8K stars · 1 year ago

ESMFold

Protein structure prediction from ESM models

Archived · 4.2K stars · 2 years ago