Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

5,882 resources indexed


Python library to train, interpret, and apply deep learning models to DNA sequences, providing a unified framework for regulatory genomics with support for CNN and transformer architectures, variant effect prediction, and attribution analysis (325+ stars)

Open-source bioimage analysis platform for digital pathology and research, featuring AI-powered cell detection, tissue classification, and whole-slide image analysis with extensible scripting and plugin architecture (1.3K+ stars, actively maintained)

Open-source image analysis toolkit for high-throughput plant phenotyping, extracting morphological, color, and texture traits from RGB, hyperspectral, and thermal imagery with modular Python workflows for crop improvement, stress detection, and plant biology research (Donald Danforth Plant Science Center, 795+ stars, MPL-2.0)

Standard data-centric AI package for data quality and machine learning, automatically detecting label errors, outliers, and dataset issues to improve scientific dataset reliability and model performance (11K+ stars, MIT License)

First versatile medical reasoning agent for chest X-ray interpretation, dynamically integrating state-of-the-art CXR analysis tools and multimodal LLMs into a unified framework; introduces ChestAgentBench with 2,500 complex medical queries across 7 categories (bowang-lab, 1.1K+ stars)

ekokrati computes habitat connectivity metrics (PC, IIC, EC(PC), dPC and its decomposition into intra-patch, flux and connector components) for habitat patch networks. Users upload polygon data as GeoPackage or shapefile, set species-specific dispersal parameters, and receive patch importance scores and landscape-level indices. Designed for conservation planners, landscape ecologists and environmental consultants. No installation required.
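As a rough illustration of the PC (probability of connectivity) index and dPC named in this entry, the sketch below assumes the commonly published definition, PC = sum over patch pairs of a_i * a_j * p*_ij, divided by the squared landscape area, where p*_ij is the maximum product probability over all paths. It is not ekokrati's code: the function names, the negative-exponential dispersal kernel, and the parameter `k` are assumptions made for the example.

```python
import math


def max_product_paths(p):
    """Best path probability between every pair (Floyd-Warshall on products)."""
    n = len(p)
    best = [row[:] for row in p]
    for i in range(n):
        best[i][i] = 1.0  # a patch is always connected to itself
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                via = best[i][mid] * best[mid][j]
                if via > best[i][j]:
                    best[i][j] = via
    return best


def pc_index(areas, distances, k, landscape_area):
    """PC under an assumed kernel p_ij = exp(-k * d_ij)."""
    n = len(areas)
    direct = [[math.exp(-k * distances[i][j]) for j in range(n)] for i in range(n)]
    best = max_product_paths(direct)
    total = sum(areas[i] * areas[j] * best[i][j] for i in range(n) for j in range(n))
    return total / landscape_area ** 2


def dpc(areas, distances, k, landscape_area, patch):
    """Percentage drop in PC when one patch is removed."""
    full = pc_index(areas, distances, k, landscape_area)
    keep = [i for i in range(len(areas)) if i != patch]
    reduced = pc_index(
        [areas[i] for i in keep],
        [[distances[i][j] for j in keep] for i in keep],
        k,
        landscape_area,
    )
    return 100.0 * (full - reduced) / full
```

The decomposition of dPC into intra-patch, flux and connector components is not shown here.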

First open-source agentic AI physicist turning research questions into structured workflows with rigorous verification and multi-step analytical work for long-horizon physics projects; integrates with Claude Code, Codex, Gemini CLI, and OpenCode (804+ stars, Apache 2.0, 2026)

End-to-end composable multi-agent framework for automating OpenFOAM-based CFD simulations from natural language prompts, managing meshing, case setup, execution, error correction, and post-processing; achieves 100% success rate on 110 FoamBench tasks with Claude Opus 4.6 through Architect-Input Writer-Runner-Reviewer agent collaboration with RAG-enhanced generation and MCP tool integration (RPI CSML, 242+ stars, MIT License)

AI scientist framework for autonomous deep research in biological sciences, combining literature analysis agents with data scientist agents to enable iterative scientific discovery through user feedback integration; achieves state-of-the-art performance on BixBench benchmark (48.78% open-answer, 64.39% multiple-choice) outperforming Kepler and GPT-5 (bio-xyz, arXiv 2601.12542, 160+ stars, 2025-2026)

Verbex is a private, on-device Voice-to-ELN iOS app for scientists. It helps researchers capture experiment notes by voice as work happens, organize those notes into scientific sections, and prepare clean, reviewable, ELN-ready scientific records.

Full spaCy pipeline and models for scientific/biomedical documents, enabling named entity recognition, abbreviation resolution, and UMLS linking for scientific literature mining (1.9K+ stars, Apache 2.0)

Unified Python framework for extracellular electrophysiology, standardizing interfaces to 10+ ML-based spike sorting algorithms including Kilosort for reproducible neural spike sorting workflows (792+ stars, actively maintained)

Universal foundation model for grounded biomedical image interpretation, enabling comprehensive visual understanding, reasoning, and grounding across diverse biomedical imaging modalities with strong zero-shot generalization (55+ stars, Apache 2.0, 2025-2026)

Community-driven model zoo and deployment infrastructure for AI-powered bioimage analysis, enabling standardized sharing, validation, and cross-platform execution of deep learning models across Fiji, Ilastik, napari, and other scientific imaging tools (EPFL, EMBL, and global collaborators, actively maintained)

Open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives, enabling quantum algorithm development for quantum chemistry, materials science, and optimization research (IBM, 7.4K+ stars, Apache 2.0)

MITObim - mitochondrial baiting and iterative mapping

First fully autonomous open-ended scientific discovery system with official implementation: hypothesis→experiment→writing→review simulation (13.8K+ stars, 2024)

Decentralized self-organizing teams of AI agents for long-running computational scientific experimentation; agents critique each other's proposals before spending compute and share successes/failures to avoid redundant exploration, achieving +8.33% on BioML-Bench, 1.9× faster nanoGPT optimization, and +12.5% on ProteinGym ACE2-Spike (425+ stars, 2026)

First real quadrotor robot trained end-to-end with differentiable physics for vision-based agile flight, bridging simulation-based learning and real-world deployment with physics-informed neural network controllers (558+ stars)

The MetaProteomeAnalyzer Cloud (MPA Cloud) is an intuitive, open-source tool for metaproteomics data analysis and interpretation, designed to analyse comprehensive metaproteomics data from tandem mass spectrometry experiments through a web interface.

A comprehensive R package for identifying and ranking influential nodes in biological and other complex networks. The package implements the Integrated Value of Influence (IVI), Experimental data-based Integrative Ranking (ExIR), SIRIR, and numerous network centrality measures, enabling network topology analysis, influential node detection, feature prioritization, and candidate biomarker discovery. It also provides functions for network reconstruction, centrality assessment, visualization, and analysis of relationships between centrality measures.

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data.

NOVOPlasty - The organelle assembler and heteroplasmy caller. NOVOPlasty is a de novo assembler and heteroplasmy/variance caller for short circular genomes.

Adapter trimmer for Oxford Nanopore reads.

Screen a bacterial assembly (contigs/CDS or proteins) for nucleotide or protein sequences. Pipeline that screens for presence of genes of interest (GOI) in bacterial assemblies. Generates multiple CSVs and plots that describe which genes are present and how variable their sequence is. Can use DNA or protein query sequences (GOIs) and DNA contigs/fastas or protein fastas as database (db) to search in.

maeparser is a parser for Schrödinger Maestro files.

Tandem repeat genotyping tool for long reads, based on a modified version of HipSTR.

Unified benchmarking framework for protein representation learning, providing standardized interfaces for pre-training and diverse downstream tasks including structure prediction, fitness prediction, and property prediction across multiple protein datasets and model architectures (ICLR 2024, 273+ stars, MIT License)

Plans geometric serial dilution series for molecular biology and biochemistry workflows, rounding transfer volumes to declared pipette ranges and optional 96- or 384-well plate layouts. A browser calculator supports interactive protocol design; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured step tables and shareable run identifiers.
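The arithmetic behind a geometric serial dilution can be sketched locally as follows. This is a minimal illustration, not the Pepkio client or API: the function name, the parameters, and the default pipette range are hypothetical, and it only flags out-of-range transfers rather than rounding them.

```python
def plan_serial_dilution(stock_conc, factor, steps, mix_volume_ul,
                         pipette_range_ul=(2.0, 1000.0)):
    """Return one row per dilution step.

    mix_volume_ul is the volume in each tube right after mixing,
    before any liquid is carried over to the next tube.
    """
    if factor <= 1:
        raise ValueError("dilution factor must be greater than 1")
    transfer = mix_volume_ul / factor      # volume carried from the previous tube
    diluent = mix_volume_ul - transfer     # volume of diluent already in the tube
    low, high = pipette_range_ul
    plan = []
    conc = stock_conc
    for step in range(1, steps + 1):
        conc = conc / factor
        plan.append({
            "step": step,
            "transfer_ul": transfer,
            "diluent_ul": diluent,
            "concentration": conc,
            "transfer_in_pipette_range": low <= transfer <= high,
        })
    return plan
```

For example, `plan_serial_dilution(100.0, 10, 3, 1000.0)` describes three 1:10 steps, each carrying 100 µL into 900 µL of diluent.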

State-of-the-art RNA 3D folding model developed with Stanford Das Lab and Kaggle competition winners, featuring a 488M-parameter AF3-like architecture with MSA and template-based modeling, enabling structure-driven drug discovery and RNA therapeutics design (NVIDIA-Digital-Bio, Apache 2.0)

Derives cells per well and suspension pipette volumes for standard 6-, 12-, 24-, 48-, 96-, and 384-well plates from a hemocytometer stock count, trypan blue viability, and target seeding confluency, with QC flags for low viability and impractical transfers. A browser calculator supports interactive planning with cell-line presets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured plate tables and shareable run identifiers.
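A minimal local sketch of the hemocytometer arithmetic is shown below. It assumes the standard chamber factor of 10^4 per large square (0.1 µL each) and takes the target cell number per well as a direct input rather than deriving it from confluency. All names are hypothetical, and this is not the Pepkio library.

```python
def viable_cells_per_ml(live_count, squares_counted, dilution_factor):
    """Viable cells/mL from a count of live cells over large squares."""
    mean_per_square = live_count / squares_counted
    return mean_per_square * dilution_factor * 1e4


def viability_fraction(live_count, dead_count):
    """Fraction of cells that excluded trypan blue."""
    return live_count / (live_count + dead_count)


def suspension_volume_ul(target_cells_per_well, viable_conc_per_ml):
    """Volume of cell suspension (in microlitres) to pipette into one well."""
    return target_cells_per_well / viable_conc_per_ml * 1000.0
```

With 180 live and 20 dead cells over 4 squares at a 1:2 trypan blue dilution, this gives 9 × 10^5 viable cells/mL at 90% viability, so 9,000 cells per well needs about 10 µL.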

Processes 96-well plate absorbance data through blank subtraction, regression fitting, and dilution correction to report sample concentrations with QC flags for BCA, Bradford, and ELISA workflows. A browser calculator supports interactive grid entry with CSV and PDF export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits plate layout and absorbance values and returns model comparison, per-sample concentrations, and shareable run identifiers.
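The blank subtraction, regression, and dilution correction steps can be illustrated with a plain least-squares line. This sketch assumes a linear standard curve only and uses hypothetical function names. The hosted tool compares several models, which is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sample_concentration(absorbance, blank, slope, intercept, dilution_factor=1.0):
    """Blank-subtract, invert the standard curve, then undo the sample dilution."""
    corrected = absorbance - blank
    return (corrected - intercept) / slope * dilution_factor


# Example: four standards (concentration units are up to the assay) and one blank.
blank = 0.05
standards = [0.0, 250.0, 500.0, 1000.0]
raw_absorbance = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standards, [a - blank for a in raw_absorbance])
```

A sample read at 0.45 after a 1:5 dilution would then be passed as `sample_concentration(0.45, blank, slope, intercept, 5.0)`.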

Estimates PCR primer melting temperatures and polymerase-specific annealing temperatures from sequence and buffer inputs, with per-pair QC for hairpins, dimers, and Tm balance. A browser calculator supports interactive single-pair and batch entry (up to 200 pairs) with method comparison and export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic for the API client is hosted remotely; sequences are transmitted for programmatic runs while the web interface performs calculations in the browser.
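For orientation, the simplest melting-temperature estimate is the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is only a rough guide for short oligonucleotides. The sketch below implements that rule with a hypothetical function name. It is a stand-in for illustration and is not the method, buffer correction, or polymerase-specific logic used by the tool described above.

```python
def wallace_tm(sequence):
    """Wallace-rule Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = sequence.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_difference(forward, reverse):
    """Absolute Tm gap between the two primers of a pair (a simple balance check)."""
    return abs(wallace_tm(forward) - wallace_tm(reverse))
```

Nearest-neighbor thermodynamic models with salt correction are generally preferred for real primer design.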

Mol Biology Tools is a free browser-based collection of molecular biology utilities for routine sequence analysis, primer design, Sanger sequencing primer planning, cloning setup, and wet-lab calculations. The site includes tools for PCR primer design, Sanger primer design and primer walking, primer binding checks, restriction site analysis, reverse complement generation, ORF and protein translation, codon optimization, ligation calculations, molarity calculations, dilution calculations, and multi-solute solution recipe preparation. The tools run in the browser and are intended for quick experimental planning without requiring logins or server-side sequence upload.

Plans PCR and qPCR master-mix reagent volumes from stock and final concentrations, reaction counts, and pipetting overage, with consolidated totals when several assays are prepared together. A browser calculator supports interactive recipe entry with printable bench sheets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured volume tables, dilution warnings, and shareable run identifiers.
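The underlying master-mix arithmetic follows C1·V1 = C2·V2 per reagent, scaled by the number of reactions plus overage. A minimal local sketch, with hypothetical names and no connection to the Pepkio API, might look like this:

```python
def master_mix(reaction_volume_ul, reagents, n_reactions, overage=0.1):
    """Total volume per component for a batch of identical reactions.

    reagents maps name -> (stock_concentration, final_concentration),
    with both values in the same unit for a given reagent.
    """
    per_reaction = {
        name: reaction_volume_ul * final / stock
        for name, (stock, final) in reagents.items()
    }
    used = sum(per_reaction.values())
    if used > reaction_volume_ul:
        raise ValueError("reagent volumes exceed the reaction volume")
    per_reaction["water"] = reaction_volume_ul - used  # fill to volume
    scale = n_reactions * (1.0 + overage)              # pipetting overage
    return {name: volume * scale for name, volume in per_reaction.items()}
```

For ten 20 µL reactions with 10% overage, a 10x buffer used at 1x comes to 22 µL in total. Template volume is treated as part of the water here, which a real plan would list separately.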

Computes laboratory solution preparation parameters—powder mass to weigh, stock and diluent volumes for single dilutions, and multi-step serial concentration tables—with correction for hydrated salts and supplier purity. A browser calculator supports interactive prep planning with saved recipes and shareable links; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured protocol steps and shareable run identifiers.
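The core calculations here are mass = molarity × volume × molar mass, divided by fractional purity, and C1·V1 = C2·V2 for a single dilution. The sketch below is an illustration with hypothetical names, not the Pepkio client. For a hydrated salt, it assumes the caller passes the molar mass of the hydrate actually being weighed.

```python
def mass_to_weigh_g(molarity_mol_per_l, volume_l, molar_mass_g_per_mol, purity=1.0):
    """Grams of powder needed, corrected for supplier purity (0 < purity <= 1)."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be a fraction between 0 and 1")
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol / purity


def single_dilution(stock_conc, final_conc, final_volume):
    """Return (stock volume, diluent volume) in the unit of final_volume."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume
```

For example, 0.5 L of 1 M NaCl (58.44 g/mol) at full purity requires 29.22 g.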

Comprehensive set of programs for phylogenetic analyses; available for PC and Mac; source code available for easy compiling in UNIX.

103B-parameter open-source medical language model with 1/32 Mixture-of-Experts architecture, achieving HealthBench-leading performance among open-source models with only 6.1B active parameters; jointly developed by Ant Group and Zhejiang Province Health Information Center (MIT License)

Web application and service for visualizing small- to medium-scale models of gene regulatory networks. It automatically lays out either an unweighted or weighted network graph based on an Excel input spreadsheet containing an adjacency matrix where regulators are named in the columns and target genes in the rows. It is best-suited for visualizing networks of fewer than 35 nodes and 70 edges and has general applicability.
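The input layout described above (regulators in columns, target genes in rows) maps directly onto a directed edge list. The following sketch uses plain Python lists in place of the Excel spreadsheet. The function names are hypothetical, and the size check simply encodes the 35-node and 70-edge guidance quoted in the description.

```python
def edges_from_adjacency(regulators, targets, matrix):
    """Convert an adjacency matrix to (regulator, target, weight) edges.

    matrix[r][c] is the weight of the edge from regulators[c] to targets[r];
    zero entries mean no edge.
    """
    edges = []
    for r, target in enumerate(targets):
        for c, regulator in enumerate(regulators):
            weight = matrix[r][c]
            if weight != 0:
                edges.append((regulator, target, weight))
    return edges


def within_recommended_size(regulators, targets, edges, max_nodes=35, max_edges=70):
    """True when the network has fewer nodes and edges than the stated limits."""
    nodes = set(regulators) | set(targets)
    return len(nodes) < max_nodes and len(edges) < max_edges
```

In an unweighted network every nonzero entry would be 1, and the same conversion applies.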

Large-scale flow-based protein backbone generator utilizing hierarchical fold class labels for conditioning with a tailored scalable transformer architecture, enabling controllable de novo protein design (264+ stars)

Large transformer-based single-cell foundation model pretrained on 50 million cells for robust gene network inference, expression denoising, cell embedding, and zero-shot label prediction, leveraging ESM2 protein embeddings and bidirectional transformer architecture (Cantini Lab, 148+ stars, GPL-3.0)

Multimodal AI bridging transcriptomics data and natural language, enabling intuitive chat-based exploration and analysis of single-cell RNA-seq datasets through conversational interaction without coding; fine-tuned Mistral 7B LLaVA model emulating biologist-bioinformatician discussions (207+ stars, GPL-3.0)