Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

65 of 5,893 resources

Microsoft's generative model for sampling protein equilibrium conformations 100,000× faster than MD simulations, predicting domain motions, local unfolding, and cryptic binding pockets on a single GPU (Science 2025)

Extension of ProteinMPNN for protein sequence design in the context of small-molecule ligands, metal ions, and nucleic acids, enabling binding site engineering and co-factor redesign (Baker Lab)

Simple and accurate de novo protein binder design pipeline using AlphaFold2 backpropagation, MPNN, and PyRosetta for automated binder discovery (bioRxiv 2024)

Fast, all-atom SE(3)-equivariant diffusion model for protein design achieving state-of-the-art performance on unconditional generation, motif scaffolding, and binder design while retaining the computational efficiency of equivariant architectures (bioRxiv 2026)

Latest RFdiffusion for protein structure design with 10× speedup and atom-level precision (December 2025)

Structure-based de novo antibody design pipeline built on RFdiffusion for computational generation of target-specific antibodies (RosettaCommons, 2025)

Generative foundation model for functional antibody and nanobody design, supporting de novo generation, affinity maturation, inverse design, structure prediction, and humanization (Tencent AI4S, ICLR 2025)

LLM-based molecular optimization tool

Large-scale biomolecular instruction dataset for chemistry/biology LLMs (ICLR 2024)

Powerful and flexible machine learning platform for drug discovery, providing comprehensive tools for molecular property prediction, generative models, knowledge graph reasoning, and reaction prediction with PyTorch backend (1.5K+ stars)

Cheminformatics toolkit

Discovering interpretable features in protein language models via sparse autoencoders, enabling mechanistic understanding of PLM representations for protein engineering and design (288+ stars, MIT License)

AI-assisted mutation nomination approach optimizing protein function by integrating structural and evolutionary constraints into protein inverse folding models, compatible with ProteinMPNN, LigandMPNN, ESM-IF1, and SaProt (Chinese Academy of Sciences, 359+ stars)

Structure prediction and design of proteins with noncanonical amino acids, enabling AI-powered modeling of synthetic biology constructs and expanded genetic code systems (133+ stars, 2025)

Large-scale flow-based protein backbone generator utilizing hierarchical fold class labels for conditioning with a tailored scalable transformer architecture, enabling controllable de novo protein design (264+ stars)