Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

269 of 5,923 resources

Showing 101–150

Benchmark that evaluates AI agents on 75 curated Kaggle-style ML engineering competitions, with a reproducible Docker-based grading harness, human baselines, and an end-to-end task lifecycle. Used as a primary benchmark for autonomous ML research agents (e.g., InternAgent, ranked #1 at 36.44%)

Active · 1.5K stars · 1 month ago
Python
NOASSERTION

ChemML is a machine learning and informatics program suite for the analysis, mining, and modeling of chemical and materials data (based on TensorFlow).

Active · 176 stars · 1 month ago
Python
BSD-3-Clause

Manipulation and analysis of geometric objects.

Active · 4.4K stars · 1 month ago
Python
BSD-3-Clause

FutureHouse's end-to-end multi-agent system for scientific discovery, orchestrating literature search agents (Crow/Falcon) and a data analysis agent (Finch); credited with the first AI-generated drug discovery, identifying ripasudil as a novel dry AMD therapeutic (2025)

Active · 441 stars · 1 month ago
Python
Apache-2.0

Benchmark evaluating AI agents' ability to replicate 20 ICML 2024 Spotlight/Oral papers from scratch, with 8,316 gradable tasks and author-co-developed rubrics

Active · 1.2K stars · 1 month ago
Python
MIT

Pretrained time series foundation model for zero-shot forecasting across diverse scientific and real-world domains; tokenizes continuous time series into discrete bins to train transformer language models on large-scale corpora, achieving strong zero-shot generalization and competitive performance with task-specific supervised models on climate, energy, and health benchmarks (5.3K+ stars, Apache 2.0, 2024-2026)

Active · 5.4K stars · 1 month ago
Python
Apache-2.0

Access to Biological Web Services from Python.

Active · 337 stars · 1 month ago
Python
NOASSERTION

A Python-based workflow manager.

Active · 590 stars · 1 month ago
Python
Apache-2.0

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization, using machine learning for automated neuron detection and activity inference in two-photon and one-photon calcium imaging data (723+ stars, actively maintained)

Active · 723 stars · 1 month ago
Python
GPL-2.0

Learning the language of protein-protein interactions

Active · 150 stars · 1 month ago
Python
MIT

Fully autonomous medical image segmentation research system that generates complete manuscripts end-to-end from datasets with zero human intervention, beating strongest baselines on 24 of 31 datasets and achieving T1-T2 tier manuscript quality in double-blind evaluations (USTC & Shanghai AI Lab, 2026)

Active · 350 stars · 2 months ago
Python
Apache-2.0

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

Active · 4.2K stars · 2 months ago
Python
MIT

Scalable agentic training environment for code-centric reasoning in biomedical data science

Active · 114 stars · 2 months ago
Python

Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)

Active · 1.9K stars · 2 months ago
Python
Apache-2.0

Programmatic data labeling and weak supervision

Active · 6K stars · 2 months ago
Python
Apache-2.0

Parameter/topology editor and molecular simulator with visualization capability.

Active · 452 stars · 2 months ago
Python

Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)

Active · 399 stars · 2 months ago
Python
MIT

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

Active · 266 stars · 2 months ago
Python
BSD-3-Clause

Accessible protein design platform via Google Colab integrating AlphaFold2, RoseTTAFold, and ProteinMPNN for de novo hallucination, fixed backbone design, and binder design (Sergey Ovchinnikov, 2022+)

Active · 913 stars · 2 months ago
Python
NOASSERTION

LLM agent framework for Earth Observation with 104 specialized tools across 5 functional kits

Active · 152 stars · 2 months ago
Python
MIT

Tool to build force field input files for molecular simulation.

Active · 201 stars · 2 months ago
Python
MIT

GFF and GTF file manipulation and interconversion.

Active · 319 stars · 2 months ago
Python
MIT

Baidu's open-source reproduction of AlphaFold3 in PaddlePaddle, providing pretrained weights and inference pipelines for unified biomolecular structure prediction across proteins, nucleic acids, ligands, ions, and post-translational modifications within the PaddleHelix biocomputing platform (Baidu, bioRxiv 2024)

Active · 1.1K stars · 2 months ago
Python
NOASSERTION

Google DeepMind's diffusion-based ensemble weather forecasting model at 0.25° resolution, outperforming ECMWF ENS on 97.2% of targets up to 15 days ahead, with open-source code and weights (Nature 2024)

Active · 6.7K stars · 2 months ago
Python
Apache-2.0

A library and command-line tool for building and analyzing complex homogeneous microkinetic models from quantum chemistry calculations, with support for quasi-harmonic thermochemistry, quantum tunnelling corrections, molecular symmetries and more.

Active · 64 stars · 2 months ago
Python
MIT

Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability

Active · 178 stars · 2 months ago
Python
BSD-2-Clause

End-to-end semi-automated scientific discovery system that designs, iterates, and analyzes code-based experiments via LLM-as-a-mutator over scientific articles and code examples; auto-creates, runs, and debugs experiment code in containers and writes meta-analysis reports (339+ stars, Apache 2.0)

Active · 339 stars · 2 months ago
Python
Apache-2.0

Free-text promptable universal 3D medical image segmentation foundation model enabling zero-shot segmentation of diverse anatomical structures and pathologies via natural language prompts across CT, MRI, and other volumetric imaging modalities (DKFZ, 195+ stars, Apache 2.0)

Active · 197 stars · 2 months ago
Python
Apache-2.0

Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure.

Active · 361 stars · 2 months ago
Python
LGPL-2.1

Deep learning-based variant caller

Active · 3.7K stars · 2 months ago
Python
BSD-3-Clause

Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)

Active · 6.4K stars · 2 months ago
Python
Apache-2.0

A Python package for protein dynamics analysis

Active · 546 stars · 3 months ago
Python
NOASSERTION

Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)

Active · 604 stars · 3 months ago
Python
MIT

Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)

Active · 44 stars · 3 months ago
Python
NOASSERTION

Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Active · 44 stars · 3 months ago
Python
MIT

Open-source platform for building, extending, and experimenting with scientific agents, providing modular agent construction tools and standardized evaluation pipelines for accelerating autonomous scientific discovery research (748+ stars, MIT License)

Active · 748 stars · 3 months ago
Python
MIT

Deep learning library for chemistry based on TensorFlow

Active · 6.8K stars · 3 months ago
Python
MIT

Deep learning library for solving PDEs

Active · 4.2K stars · 3 months ago
Python
LGPL-2.1

First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)

Active · 151 stars · 3 months ago
Python
MIT

GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)

Active · 115 stars · 3 months ago
Python
NOASSERTION

FASTQ and SAM quality control using Python.

Active · 109 stars · 3 months ago
Python
MIT

A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.

Active · 199 stars · 3 months ago
Python
BSD-3-Clause

A Python script that converts positional information from a SAM dataset into interval format with a 0-based start and a 1-based end. The CIGAR string of each SAM record is used to compute the end coordinate.

Active · 37 stars · 3 months ago
Python
MIT
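
The coordinate conversion described in the entry above can be sketched as follows. The function names are hypothetical and this is not the listed script's source; the sketch assumes standard SAM columns (RNAME in column 3, 1-based POS in column 4, CIGAR in column 6) and counts only the CIGAR operations that consume reference bases (M, D, N, =, X).

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_line_to_interval(line):
    """Return (chrom, start, end) with 0-based start and 1-based end,
    or None for an unmapped record."""
    fields = line.rstrip("\n").split("\t")
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if rname == "*" or cigar == "*" or pos == 0:
        return None  # unmapped reads carry no usable coordinates
    start = pos - 1  # SAM POS is 1-based; interval start is 0-based
    end = start + reference_length(cigar)
    return rname, start, end


record = "r1\t0\tchr1\t100\t60\t5S10M2I3D20M\t*\t0\t0\tACGT\tIIII"
interval = sam_line_to_interval(record)
```

In the sample record, soft clips (S) and insertions (I) do not move along the reference, so only the 10M, 3D, and 20M elements contribute to the end coordinate.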

Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)

Active · 383 stars · 3 months ago
Python
NOASSERTION

Deep learning-based object detection and segmentation for star-convex shapes, widely adopted for cell and nucleus segmentation in fluorescence and electron microscopy via a compact neural network architecture with non-maximum suppression and shape-based post-processing (Nature Methods 2020, 1.2K+ stars)

Active · 1.2K stars · 3 months ago
Python
BSD-3-Clause

Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)

Active · 84 stars · 4 months ago
Python

Physics-informed neural networks

Active · 5.9K stars · 4 months ago
Python
MIT

Azure Semantic Kernel multi-agent PPT generation reference

Active · 49 stars · 4 months ago
Python
MIT

A package for working with nuclear magnetic resonance (NMR) data including functions for reading common binary file formats and processing NMR data.

Active · 265 stars · 4 months ago
Python
BSD-3-Clause

File parser/converter for QM, MD and plane-wave DFT programs.

Active · 163 stars · 4 months ago
Python
LGPL-3.0