Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
Filters
Health
Domain
Language(1)
License
Source
Type(1)
269 of 5,923 resources
Showing 101–150
Benchmark evaluating AI agents on 75 curated Kaggle-style ML engineering competitions with reproducible Docker-based grading harness, human baselines, and end-to-end task lifecycle, used as a primary benchmark for autonomous ML research agents (e.g., InternAgent #1 at 36.44%)
ChemML is a machine learning and informatics program suite for the analysis, mining, and modeling of chemical and materials data. (based on Tensorflow)
Manipulation and analysis of geometric objects.
FutureHouse's end-to-end scientific discovery multi-agent system orchestrating literature search (Crow/Falcon) and data analysis (Finch) agents, first AI-generated drug discovery identifying ripasudil as novel dry AMD therapeutic (2025)
Benchmark evaluating AI agents' ability to replicate 20 ICML 2024 Spotlight/Oral papers from scratch, with 8,316 gradable tasks and author-co-developed rubrics
Pretrained time series foundation model for zero-shot forecasting across diverse scientific and real-world domains; tokenizes continuous time series into discrete bins to train transformer language models on large-scale corpora, achieving strong zero-shot generalization and competitive performance with task-specific supervised models on climate, energy, and health benchmarks (5.3K+ stars, Apache 2.0, 2024-2026)
Access to Biological Web Services from Python.
Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization, using machine learning for automated neuron detection and activity inference in two-photon and one-photon calcium imaging data (723+ stars, actively maintained)
Learning the language of protein-protein interactions
Fully autonomous medical image segmentation research system that generates complete manuscripts end-to-end from datasets with zero human intervention, beating strongest baselines on 24 of 31 datasets and achieving T1-T2 tier manuscript quality in double-blind evaluations (USTC & Shanghai AI Lab, 2026)
First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports
Scalable agentic training environment for code-centric reasoning in biomedical data science
Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)
Programmatic data labeling and weak supervision
Parameter/topology editor and molecular simulator with visualization capability.
Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
Accessible protein design platform via Google Colab integrating AlphaFold2, RoseTTAFold, and ProteinMPNN for de novo hallucination, fixed backbone design, and binder design (Sergey Ovchinnikov, 2022+)
LLM agent framework for Earth Observation with 104 specialized tools across 5 functional kits
Tool to build force field input files for molecular simulation.
GFF and GTF file manipulation and interconversion.
Baidu's open-source reproduction of AlphaFold3 in PaddlePaddle, providing pretrained weights and inference pipelines for unified biomolecular structure prediction across proteins, nucleic acids, ligands, ions, and post-translational modifications within the PaddleHelix biocomputing platform (Baidu, bioRxiv 2024)
Google DeepMind's diffusion-based ensemble weather forecasting model at 0.25° resolution, outperforming ECMWF ENS on 97.2% of targets up to 15 days ahead, with open-source code and weights (Nature 2024)
A library and command-line tool for building and analyzing complex homogeneous microkinetic models from quantum chemistry calculations, with support for quasi-harmonic thermochemistry, quantum tunnelling corrections, molecular symmetries and more.
Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability
End-to-end semi-automated scientific discovery system that designs, iterates, and analyzes code-based experiments via LLM-as-a-mutator over scientific articles and code examples; auto-creates, runs, and debugs experiment code in containers and writes meta-analysis reports (339+ stars, Apache 2.0)
Free-text promptable universal 3D medical image segmentation foundation model enabling zero-shot segmentation of diverse anatomical structures and pathologies via natural language prompts across CT, MRI, and other volumetric imaging modalities (DKFZ, 195+ stars, Apache 2.0)
Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based in the 3D structure.
Deep learning-based variant caller
Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)
A Python package for protein dynamics analysis
Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)
Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)
Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.
Open-source platform for building, extending, and experimenting with scientific agents, providing modular agent construction tools and standardized evaluation pipelines for accelerating autonomous scientific discovery research (748+ stars, MIT License)
Deep learning library for Chemistry based on Tensorflow
Deep learning library for solving PDEs
First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)
GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)
FASTQ and SAM quality control using Python.
A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.
A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.
Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)
Deep learning-based object detection and segmentation for star-convex shapes, widely adopted for cell and nucleus segmentation in fluorescence and electron microscopy via a compact neural network architecture with non-maximum suppression and shape-based post-processing (Nature Methods 2020, 1.2K+ stars)
Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)
Physics-informed neural networks
Azure Semantic Kernel multi-agent PPT generation reference
A package for working with nuclear magnetic resonance (NMR) data including functions for reading common binary file formats and processing NMR data.
File parser/converter for QM, MD and plane-wave DFT programs.