Find open-source science resources

Python library to train, interpret, and apply deep learning models to DNA sequences, providing a unified framework for regulatory genomics with support for CNN and transformer architectures, variant effect prediction, and attribution analysis (325+ stars)

QuPath

Open-source bioimage analysis platform for digital pathology and research, featuring AI-powered cell detection, tissue classification, and whole-slide image analysis with extensible scripting and plugin architecture (1.3K+ stars, actively maintained)

PlantCV

Agricultural AI

Weak Supervision & Auto-Labeling

Open-source image analysis toolkit for high-throughput plant phenotyping, extracting morphological, color, and texture traits from RGB, hyperspectral, and thermal imagery with modular Python workflows for crop improvement, stress detection, and plant biology research (Donald Danforth Plant Science Center, 795+ stars, MPL-2.0)

Cleanlab

Standard data-centric AI package for data quality and machine learning, automatically detecting label errors, outliers, and dataset issues to improve scientific dataset reliability and model performance (11K+ stars, MIT License)

MedRAX (ICML 2025)

First versatile medical reasoning agent for chest X-ray interpretation, dynamically integrating state-of-the-art CXR analysis tools and multimodal LLMs into a unified framework; introduces ChestAgentBench with 2,500 complex medical queries across 7 categories (bowang-lab, 1.1K+ stars)

ekokrati

Ecology

Domain-Specific Research Agents

ekokrati computes habitat connectivity metrics (PC, IIC, EC(PC), dPC and its decomposition into intra-patch, flux and connector components) for habitat patch networks. Users upload polygon data as GeoPackage or shapefile, set species-specific dispersal parameters, and receive patch importance scores and landscape-level indices. Designed for conservation planners, landscape ecologists and environmental consultants. No installation required.
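As a rough illustration of what the PC and dPC scores measure, the sketch below uses a formulation common in the landscape-connectivity literature: PC sums, over all patch pairs, the product of patch areas and the maximum-product path probability, divided by the squared landscape area, and dPC is the percentage drop in PC when one patch is removed. The negative-exponential dispersal kernel, the function names, and the parameters are assumptions for illustration and are not taken from ekokrati itself.

```python
import math


def pc_index(areas, distances, median_dispersal, landscape_area):
    """Probability of Connectivity for a small patch network (illustrative sketch).

    Assumes a negative-exponential kernel where two patches separated by the
    median dispersal distance are directly connected with probability 0.5.
    """
    n = len(areas)
    k = math.log(2) / median_dispersal
    p = [[1.0 if i == j else math.exp(-k * distances[i][j]) for j in range(n)]
         for i in range(n)]
    # Floyd-Warshall over the max-product semiring: the best path between two
    # patches may pass through stepping-stone patches.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = p[i][m] * p[m][j]
                if via > p[i][j]:
                    p[i][j] = via
    total = sum(areas[i] * areas[j] * p[i][j] for i in range(n) for j in range(n))
    return total / landscape_area ** 2


def dpc(areas, distances, median_dispersal, landscape_area, patch):
    """Percentage loss in PC when `patch` is removed from the network."""
    full = pc_index(areas, distances, median_dispersal, landscape_area)
    keep = [i for i in range(len(areas)) if i != patch]
    sub_areas = [areas[i] for i in keep]
    sub_dist = [[distances[i][j] for j in keep] for i in keep]
    reduced = pc_index(sub_areas, sub_dist, median_dispersal, landscape_area)
    return 100.0 * (full - reduced) / full
```

The cubic-time path search is fine for a few hundred patches; larger networks would need a sparse graph and a shortest-path routine on negative log probabilities.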

Proprietary

Get Physics Done (PSI)

Domain-Specific Research Agents

First open-source agentic AI physicist turning research questions into structured workflows with rigorous verification and multi-step analytical work for long-horizon physics projects; integrates with Claude Code, Codex, Gemini CLI, and OpenCode (804+ stars, Apache 2.0, 2026)

Foam-Agent (NeurIPS 2025)

Domain-Specific Research Agents

End-to-end composable multi-agent framework for automating OpenFOAM-based CFD simulations from natural language prompts, managing meshing, case setup, execution, error correction, and post-processing; achieves 100% success rate on 110 FoamBench tasks with Claude Opus 4.6 through Architect-Input Writer-Runner-Reviewer agent collaboration with RAG-enhanced generation and MCP tool integration (RPI CSML, 242+ stars, MIT License)

BioAgents

AI scientist framework for autonomous deep research in biological sciences, combining literature analysis agents with data scientist agents to enable iterative scientific discovery through user feedback integration; achieves state-of-the-art performance on BixBench benchmark (48.78% open-answer, 64.39% multiple-choice) outperforming Kepler and GPT-5 (bio-xyz, arXiv 2601.12542, 160+ stars, 2025-2026)

Verbex

Scientific Text Processing & NLP

Verbex is a private, on-device Voice-to-ELN iOS app for scientists. It helps researchers capture experiment notes by voice as work happens, organize those notes into scientific sections, and prepare clean, reviewable, ELN-ready scientific records.

Proprietary

scispacy (AllenAI)

Neuroscience & Behavioral Analysis

Full spaCy pipeline and models for scientific/biomedical documents, enabling named entity recognition, abbreviation resolution, and UMLS linking for scientific literature mining (1.9K+ stars, Apache 2.0)

SpikeInterface

Unified Python framework for extracellular electrophysiology, standardizing interfaces to 10+ ML-based spike sorting algorithms including Kilosort for reproducible neural spike sorting workflows (792+ stars, actively maintained)

UniBiomed (Nature Communications 2026)

Universal foundation model for grounded biomedical image interpretation, enabling comprehensive visual understanding, reasoning, and grounding across diverse biomedical imaging modalities with strong zero-shot generalization (55+ stars, Apache 2.0, 2025-2026)

BioImage.IO

Community-driven model zoo and deployment infrastructure for AI-powered bioimage analysis, enabling standardized sharing, validation, and cross-platform execution of deep learning models across Fiji, Ilastik, napari, and other scientific imaging tools (EPFL, EMBL, and global collaborators, actively maintained)

Qiskit

Specialized Frameworks

Open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives, enabling quantum algorithm development for quantum chemistry, materials science, and optimization research (IBM, 7.4K+ stars, Apache 2.0)

MITObim

Sequencing

MITObim - mitochondrial baiting and iterative mapping

Autonomous Research Systems (2023-2025 Breakthroughs)

The AI Scientist (SakanaAI)

Autonomous Research Systems (2023-2025 Breakthroughs)

First fully autonomous open-ended scientific discovery system with official implementation: hypothesis→experiment→writing→review simulation (13.8K+ stars, 2024)

AutoScientists (Harvard MIMS, arXiv 2026)

Machine Learning for Physics

Decentralized self-organizing teams of AI agents for long-running computational scientific experimentation; agents critique each other's proposals before spending compute and share successes/failures to avoid redundant exploration, achieving +8.33% on BioML-Bench, 1.9× faster nanoGPT optimization, and +12.5% on ProteinGym ACE2-Spike (425+ stars, 2026)

DiffPhysDrone (Nature Machine Intelligence 2025)

First real quadrotor robot trained end-to-end with differentiable physics for vision-based agile flight, bridging simulation-based learning and real-world deployment with physics-informed neural network controllers (558+ stars)

MetaProteomeAnalyzer-Cloud

Proteins

The MetaProteomeAnalyzer Cloud (MPA Cloud) is an intuitive, open-source tool for metaproteomics data analysis and interpretation, designed to analyse comprehensive metaproteomics data from tandem mass spectrometry experiments through a web interface.

Java

GPL-3.0

influential

Computational biology

A comprehensive R package for identifying and ranking influential nodes in biological and other complex networks. The package implements the Integrated Value of Influence (IVI), Experimental data-based Integrative Ranking (ExIR), SIRIR, and numerous network centrality measures, enabling network topology analysis, influential node detection, feature prioritization, and candidate biomarker discovery. It also provides functions for network reconstruction, centrality assessment, visualization, and analysis of relationships between centrality measures.

GPL-3.0

openbabel

Chemistry

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

GPL-2.0

RepEnrich

Sequence analysis

RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data.

Other

NOVOPlasty

Sequence assembly

NOVOPlasty is a de novo organelle assembler and heteroplasmy/variance caller for short circular genomes.

Perl

Other

Porechop

Adapter trimmer for Oxford Nanopore reads.

GPL-3.0

screen_assembly

Sequence assembly

Pipeline that screens bacterial assemblies (contigs, CDS, or proteins) for the presence of genes of interest (GOIs). Queries can be DNA or protein sequences, and the database to search can be DNA contigs or protein sequences in FASTA format. Generates CSV files and plots that describe which genes are present and how variable their sequences are.

maeparser

Structural biology

maeparser is a parser for Schrödinger Maestro files.

C++

LongTR

Comparative genomics

Tandem repeat genotyping from long reads; a modified version of HipSTR.

GPL-2.0

ProteinWorkshop

Biology & Medicine

Unified benchmarking framework for protein representation learning, providing standardized interfaces for pre-training and diverse downstream tasks including structure prediction, fitness prediction, and property prediction across multiple protein datasets and model architectures (ICLR 2024, 273+ stars, MIT License)

Pepkio Serial Dilution Planner

Plans geometric serial dilution series for molecular biology and biochemistry workflows, rounding transfer volumes to declared pipette ranges and optional 96- or 384-well plate layouts. A browser calculator supports interactive protocol design; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured step tables and shareable run identifiers.
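The arithmetic behind a geometric serial dilution can be sketched in a few lines. The example below is a local illustration only: the function name, its parameters, and the rounding rule are assumptions, and it does not call or reproduce the Pepkio Tools API.

```python
def plan_serial_dilution(stock_conc, fold, n_steps, tube_volume_ul, pipette_step_ul=0.5):
    """Plan a geometric dilution series (hypothetical helper, not the Pepkio client).

    Every tube holds `tube_volume_ul` after mixing. The ideal transfer volume,
    tube_volume / fold, is rounded to the pipette increment, and the reported
    concentrations use the rounded volume rather than the nominal fold.
    """
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    ideal_transfer = tube_volume_ul / fold
    transfer = round(ideal_transfer / pipette_step_ul) * pipette_step_ul
    if transfer <= 0:
        raise ValueError("transfer volume rounds to zero; use a finer pipette step")
    diluent = tube_volume_ul - transfer
    realised_fold = tube_volume_ul / transfer
    rows = []
    conc = stock_conc
    for step in range(1, n_steps + 1):
        conc /= realised_fold
        rows.append({
            "step": step,
            "transfer_ul": transfer,
            "diluent_ul": diluent,
            "concentration": conc,
        })
    return rows


if __name__ == "__main__":
    for row in plan_serial_dilution(1000.0, fold=10, n_steps=3, tube_volume_ul=200.0):
        print(row)
```

Reporting the realised fold matters when the nominal fold does not divide evenly into the pipette increment, for example a 3-fold series in 100 µL tubes.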

RNAPro (NVIDIA, 2026)

State-of-the-art RNA 3D folding model developed with Stanford Das Lab and Kaggle competition winners, featuring a 488M-parameter AF3-like architecture with MSA and template-based modeling, enabling structure-driven drug discovery and RNA therapeutics design (NVIDIA-Digital-Bio, Apache 2.0)

Pepkio Cell Seeding Density Calculator

Derives cells per well and suspension pipette volumes for standard 6-, 12-, 24-, 48-, 96-, and 384-well plates from a hemocytometer stock count, trypan blue viability, and target seeding confluency, with QC flags for low viability and impractical transfers. A browser calculator supports interactive planning with cell-line presets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured plate tables and shareable run identifiers.
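For orientation, the core hemocytometer arithmetic looks roughly like the sketch below. It assumes a standard chamber in which each large square covers 0.1 µL (hence the factor of 10^4), takes the target as cells per well rather than confluency, and uses illustrative QC thresholds; the function names and defaults are hypothetical and are not the Pepkio interface.

```python
def viable_cells_per_ml(live_count, dead_count, squares_counted, dilution_factor=2.0):
    """Convert a trypan blue hemocytometer count to (viable cells/mL, viability).

    dilution_factor=2.0 corresponds to a 1:1 mix of suspension and trypan blue.
    """
    total = live_count + dead_count
    if total == 0 or squares_counted <= 0:
        raise ValueError("need a non-empty count over at least one square")
    viability = live_count / total
    live_per_ml = live_count / squares_counted * dilution_factor * 1e4
    return live_per_ml, viability


def seeding_plan(target_cells_per_well, live_per_ml, viability,
                 min_viability=0.8, min_pipette_ul=1.0):
    """Suspension volume per well plus simple QC flags (thresholds are assumptions)."""
    volume_ul = target_cells_per_well / live_per_ml * 1000.0
    flags = []
    if viability < min_viability:
        flags.append("low viability")
    if volume_ul < min_pipette_ul:
        flags.append("transfer volume below pipette range")
    return {"volume_ul": volume_ul, "flags": flags}
```

Converting a target confluency into cells per well additionally needs the growth area of the plate and a cell-line-specific density at confluence, which are left out here.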

Pepkio Standard Curve Calculator

Processes 96-well plate absorbance data through blank subtraction, regression fitting, and dilution correction to report sample concentrations with QC flags for BCA, Bradford, and ELISA workflows. A browser calculator supports interactive grid entry with CSV and PDF export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits plate layout and absorbance values and returns model comparison, per-sample concentrations, and shareable run identifiers.
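The three processing steps named above (blank subtraction, regression fitting, dilution correction) can be illustrated with a plain linear fit. This is a minimal sketch under the assumption of a linear standard curve; real BCA and ELISA data are often fitted with quadratic or four-parameter logistic models, and the function names here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_standard_curve(standard_concs, standard_abs, blank_abs):
    """Subtract the blank from every standard, then fit absorbance against concentration."""
    corrected = [a - blank_abs for a in standard_abs]
    return fit_line(standard_concs, corrected)


def sample_concentration(sample_abs, blank_abs, slope, intercept, dilution_factor=1.0):
    """Invert the curve for one sample and scale back by its dilution factor."""
    in_well = (sample_abs - blank_abs - intercept) / slope
    return in_well * dilution_factor
```

A sample whose blank-corrected absorbance falls outside the range of the standards should be flagged rather than extrapolated.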

Pepkio Tm Annealing Temperature Calculator

Probes and primers

Estimates PCR primer melting temperatures and polymerase-specific annealing temperatures from sequence and buffer inputs, with per-pair QC for hairpins, dimers, and Tm balance. A browser calculator supports interactive single-pair and batch entry (up to 200 pairs) with method comparison and export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic for the API client is hosted remotely; sequences are transmitted for programmatic runs while the web interface performs calculations in the browser.
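To give a feel for the kind of calculation involved, the sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough estimate suited only to short oligonucleotides that ignores buffer composition. It is a deliberately simpler stand-in for the salt-corrected methods a dedicated calculator would offer; the annealing offset and Tm-gap threshold are illustrative assumptions, and none of the names come from Pepkio.

```python
def wallace_tm(primer):
    """Wallace-rule melting temperature in degrees Celsius."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def pair_report(forward, reverse, max_tm_gap=5.0, anneal_offset=5.0):
    """Per-pair summary: both Tm values, a rule-of-thumb annealing temperature,
    and whether the two Tm values are within `max_tm_gap` of each other."""
    tm_f = wallace_tm(forward)
    tm_r = wallace_tm(reverse)
    return {
        "tm_forward": tm_f,
        "tm_reverse": tm_r,
        "annealing": min(tm_f, tm_r) - anneal_offset,
        "balanced": abs(tm_f - tm_r) <= max_tm_gap,
    }
```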

Mol Biology Tools

Mol Biology Tools is a free browser-based collection of molecular biology utilities for routine sequence analysis, primer design, Sanger sequencing primer planning, cloning setup, and wet-lab calculations. The site includes tools for PCR primer design, Sanger primer design and primer walking, primer binding checks, restriction site analysis, reverse complement generation, ORF and protein translation, codon optimization, ligation calculations, molarity calculations, dilution calculations, and multi-solute solution recipe preparation. The tools run in the browser and are intended for quick experimental planning without requiring logins or server-side sequence upload.

JavaScript

Proprietary

Pepkio PCR Master Mix Calculator

Plans PCR and qPCR master-mix reagent volumes from stock and final concentrations, reaction counts, and pipetting overage, with consolidated totals when several assays are prepared together. A browser calculator supports interactive recipe entry with printable bench sheets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured volume tables, dilution warnings, and shareable run identifiers.
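The underlying volume arithmetic is simple enough to sketch locally. The example assumes each component's stock and final concentrations share a unit, that water fills the remaining volume, and that template is added per tube rather than to the mix; the function name and data layout are hypothetical and unrelated to the Pepkio client.

```python
def master_mix(components, reaction_volume_ul, n_reactions, overage=0.10, template_ul=0.0):
    """Per-reaction and total master-mix volumes.

    components maps a reagent name to (stock_conc, final_conc); the two values
    must use the same unit for a given reagent (e.g. both in µM, or both in X).
    """
    per_reaction = {
        name: final / stock * reaction_volume_ul
        for name, (stock, final) in components.items()
    }
    water = reaction_volume_ul - template_ul - sum(per_reaction.values())
    if water < 0:
        raise ValueError("reagent volumes exceed the reaction volume; use more concentrated stocks")
    per_reaction["water"] = water
    scale = n_reactions * (1.0 + overage)  # overage covers pipetting loss
    totals = {name: volume * scale for name, volume in per_reaction.items()}
    return per_reaction, totals


if __name__ == "__main__":
    recipe = {"2x mix": (2.0, 1.0), "fwd primer": (10.0, 0.5), "rev primer": (10.0, 0.5)}
    per_rxn, totals = master_mix(recipe, reaction_volume_ul=20.0, n_reactions=10, template_ul=2.0)
    print(per_rxn)
    print(totals)
```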

Pepkio Molarity Solution Calculator

Computes laboratory solution preparation parameters—powder mass to weigh, stock and diluent volumes for single dilutions, and multi-step serial concentration tables—with correction for hydrated salts and supplier purity. A browser calculator supports interactive prep planning with saved recipes and shareable links; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured protocol steps and shareable run identifiers.
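The two core relations here are mass = molarity × volume × molecular weight ÷ purity, and C1·V1 = C2·V2 for a single dilution. The sketch below applies them, treating a hydrated salt by adding the mass of its waters of hydration to the anhydrous molecular weight. Function names and parameters are illustrative assumptions, not the Pepkio interface.

```python
WATER_MW = 18.015  # g/mol


def mass_to_weigh_g(molarity_m, volume_l, anhydrous_mw, waters_of_hydration=0, purity=1.0):
    """Grams of powder needed for a solution of the given molarity and volume.

    purity is a fraction (0.99 for a 99 % pure reagent).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be a fraction in (0, 1]")
    formula_weight = anhydrous_mw + waters_of_hydration * WATER_MW
    return molarity_m * volume_l * formula_weight / purity


def dilution_volumes(stock_conc, final_conc, final_volume):
    """Return (stock volume, diluent volume) from C1 * V1 = C2 * V2."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    stock_volume = final_conc / stock_conc * final_volume
    return stock_volume, final_volume - stock_volume
```

For example, `mass_to_weigh_g(0.1, 1.0, 95.211, waters_of_hydration=6)` gives the mass of magnesium chloride hexahydrate for one litre of a 0.1 M solution, using 95.211 g/mol for the anhydrous salt.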

PHYLIP

Phylogenetics

Comprehensive set of programs for phylogenetic analysis, available for PC and Mac, with source code that compiles easily on UNIX.

BSD-3-Clause

AntAngelMed (2026)

Domain-Specific Models

Molecular interactions, pathways and networks

103B-parameter open-source medical language model with 1/32 Mixture-of-Experts architecture, achieving HealthBench-leading performance among open-source models with only 6.1B active parameters; jointly developed by Ant Group and Zhejiang Province Health Information Center (MIT License)

GRNsight

Web application and service for visualizing small- to medium-scale models of gene regulatory networks. It automatically lays out either an unweighted or weighted network graph based on an Excel input spreadsheet containing an adjacency matrix where regulators are named in the columns and target genes in the rows. It is best suited for visualizing networks of fewer than 35 nodes and 70 edges and has general applicability.
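The matrix orientation described above (regulators in columns, targets in rows) is easy to get backwards, so a small sketch may help. It converts such a matrix into a regulator-to-target edge list and checks the recommended size limits. The function names and the in-memory list-of-lists input are assumptions for illustration; GRNsight itself reads the matrix from an Excel workbook.

```python
def adjacency_to_edges(regulators, targets, matrix):
    """Convert a targets-by-regulators matrix into (regulator, target, weight) edges.

    matrix[r][c] is the effect of regulators[c] on targets[r]; zero means no edge.
    """
    edges = []
    for target, row in zip(targets, matrix):
        for regulator, weight in zip(regulators, row):
            if weight != 0:
                edges.append((regulator, target, weight))
    return edges


def within_recommended_size(regulators, targets, edges, max_nodes=35, max_edges=70):
    """True when the network has fewer than 35 nodes and fewer than 70 edges."""
    nodes = set(regulators) | set(targets)
    return len(nodes) < max_nodes and len(edges) < max_edges


if __name__ == "__main__":
    genes = ["A", "B"]
    weights = [
        [0, 1.0],    # row for target A: regulated by B
        [-0.5, 0],   # row for target B: repressed by A
    ]
    print(adjacency_to_edges(genes, genes, weights))
```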

JavaScript

BSD-3-Clause

Proteina (NVIDIA, ICLR 2025 Oral)

Protein & Drug Discovery

Large-scale flow-based protein backbone generator utilizing hierarchical fold class labels for conditioning with a tailored scalable transformer architecture, enabling controllable de novo protein design (264+ stars)

scPRINT (Nature Communications 2025)

Large transformer-based single-cell foundation model pretrained on 50 million cells for robust gene network inference, expression denoising, cell embedding, and zero-shot label prediction, leveraging ESM2 protein embeddings and bidirectional transformer architecture (Cantini Lab, 148+ stars, GPL-3.0)

CellWhisperer (Nature Biotechnology 2025)