Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
Filters
Health
Domain(1)
Language(1)
License
Source
Type
29 of 5,923 resources
Neural network-based cryo-EM heterogeneous reconstruction, modeling continuous 3D structure distributions from single-particle images, with CryoDRGN-ET extending to in-cell cryo-electron tomography (MIT CSAIL, Nature Methods 2021/2024)
First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)
AlphaFold 3 inference pipeline for unified biomolecular structure prediction of proteins, nucleic acids, small molecules, ions, and post-translational modifications (Google DeepMind, Nature 2024)
Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)
General multimodal protein design framework enabling DNA-encoding of chemistry for programmable enzyme design and diverse protein generation through diffusion-based generative modeling (190+ stars, Apache 2.0, 2026)
Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)
Learning the language of protein-protein interactions
Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)
Accessible protein design platform via Google Colab integrating AlphaFold2, RoseTTAFold, and ProteinMPNN for de novo hallucination, fixed backbone design, and binder design (Sergey Ovchinnikov, 2022+)
Baidu's open-source reproduction of AlphaFold3 in PaddlePaddle, providing pretrained weights and inference pipelines for unified biomolecular structure prediction across proteins, nucleic acids, ligands, ions, and post-translational modifications within the PaddleHelix biocomputing platform (Baidu, bioRxiv 2024)
Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability
Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)
Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)
Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)
Discrete diffusion framework for generative protein sequence design over evolutionary-scale databases, supporting unconditional generation, evolutionary-guided conditional design, motif scaffolding, and intrinsically disordered region generation through order-agnostic autoregressive diffusion, enabling sequence-only protein design without structural priors (Microsoft Research, Nature Communications 2024)
ICML 2025 drug discovery generalist using masked discrete diffusion and fragment-based generation with molecular context guidance (NVIDIA)
Fast, modular, and accurate de novo design of protein binders based on the Protenix foundation model, achieving 17-82% nanomolar hit rates across diverse targets with 2-6× improvement over prior methods like AlphaProteo and RFdiffusion (229+ stars, Apache 2.0)
Trainable, memory-efficient PyTorch reproduction and retraining of AlphaFold2 providing new insights into its learning dynamics and out-of-distribution generalization; widely used as the open-source AlphaFold2 backbone underpinning many downstream protein structure prediction and design pipelines (Columbia AlQuraishi Lab & OpenFold Consortium, Nature Methods 2024)
Discovering interpretable features in protein language models via sparse autoencoders, enabling mechanistic understanding of PLM representations for protein engineering and design (288+ stars, MIT License)
Dynamic Protein Data Bank integrating dynamic behaviors and physical properties into protein structures via a new dataset and SE(3) model extension, enabling richer understanding of protein conformational landscapes (Fudan University, 784+ stars)
Family of diffusion protein language models demonstrating versatile generative and predictive capabilities for protein sequences and structures, including multimodal co-generation, conditional folding, inverse folding, motif scaffolding, and representation learning, with open pretrained weights and training scripts (327+ stars, ICML 2024, ICLR 2025, ICML 2025 Spotlight)
In silico directed evolution framework using few-shot active learning to optimize protein activities, enabling rapid protein engineering with minimal experimental data (352+ stars, 2023)
Universal 3D molecular pretraining framework with 209M conformations, scaling to 1.1B parameters (Uni-Mol2) on 800M conformations for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)
Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)
Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)
General-purpose deep learning backbone for molecular modeling
Generative model for programmable protein design using diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample diverse all-atom structures; supports conditional generation via composable conditioners for substructure, symmetry, shape, and neural-network predictions; validated crystallographically (Generate Biomedicines, Nature 2023)
Protein structure prediction from ESM models
3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR2023)