Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 3,151-3,200

The tool performs a DICOM quality check covering the number of files per sequence, corrupted files, and the directory hierarchy. It also merges dynamic series that were stored separately, filters or selects series of interest using lists of series descriptions, and identifies diffusion sequences by their b-values. It applies the desired changes to the dataset and generates a report listing the selected sequences, corrupted files, missing files and merged files. Status: Deployed

VIP is a web portal for medical imaging applications. It allows users to access scientific applications as a service (directly through the web browser with no installation required), as well as distributed computing resources in a transparent manner. It exploits the resources available in the biomed virtual organization of the EGI e-infrastructure to offer an open service to researchers worldwide.

PyTorch-based embedding instance segmentation algorithm optimized for accurate, efficient, and portable cell and nucleus segmentation across fluorescence and brightfield microscopy images, achieving state-of-the-art speed and accuracy with lightweight model sizes suitable for edge deployment (224+ stars, Apache 2.0)

Deep Graph Library for scalable deep learning on graphs, powering molecular modeling, materials discovery, protein interaction networks, and scientific knowledge graph learning across PyTorch, TensorFlow, and MXNet backends (14K+ stars)

Membrane Protein-Lipid Interaction Database. A large-scale experimentally validated dataset of 80685 residue-level lipid contact annotations across 4712 membrane proteins derived from PDB crystal and cryo-EM structures. Provides pre-computed binary contact labels, continuous distance values, sequence-identity-based cluster assignments, and ready-made train-validation-test splits for machine learning.

Advanced paper search agent powered by large language models, autonomously invoking search tools, reading papers, and selecting references to deliver comprehensive and accurate results for complex scholarly queries (1.5K+ stars, Apache 2.0, 2024)

Python library to train, interpret, and apply deep learning models to DNA sequences, providing a unified framework for regulatory genomics with support for CNN and transformer architectures, variant effect prediction, and attribution analysis (325+ stars)

Open-source bioimage analysis platform for digital pathology and research, featuring AI-powered cell detection, tissue classification, and whole-slide image analysis with extensible scripting and plugin architecture (1.3K+ stars, actively maintained)

Open-source image analysis toolkit for high-throughput plant phenotyping, extracting morphological, color, and texture traits from RGB, hyperspectral, and thermal imagery with modular Python workflows for crop improvement, stress detection, and plant biology research (Donald Danforth Plant Science Center, 795+ stars, MPL-2.0)

Standard data-centric AI package for data quality and machine learning, automatically detecting label errors, outliers, and dataset issues to improve scientific dataset reliability and model performance (11K+ stars, MIT License)

First versatile medical reasoning agent for chest X-ray interpretation, dynamically integrating state-of-the-art CXR analysis tools and multimodal LLMs into a unified framework; introduces ChestAgentBench with 2,500 complex medical queries across 7 categories (bowang-lab, 1.1K+ stars)

ekokrati computes habitat connectivity metrics (PC, IIC, EC(PC), dPC and its decomposition into intra-patch, flux and connector components) for habitat patch networks. Users upload polygon data as GeoPackage or shapefile, set species-specific dispersal parameters, and receive patch importance scores and landscape-level indices. Designed for conservation planners, landscape ecologists and environmental consultants. No installation required.
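As an illustration of the kind of metric listed in this entry, here is a minimal sketch of the probability of connectivity (PC) index and the patch importance dPC. It is not ekokrati's implementation. The function names are hypothetical. The negative-exponential dispersal kernel, calibrated so that the dispersal probability is 0.5 at the median dispersal distance, is an assumption commonly used with this index, and the source does not say which kernel the tool applies.

```python
import math


def probability_of_connectivity(areas, distances, median_dispersal, landscape_area):
    """PC = sum_i sum_j a_i * a_j * p*_ij / A_L**2 (hypothetical helper).

    areas: patch areas; distances: square matrix of inter-patch distances.
    Assumption: direct dispersal probability p_ij = exp(-k * d_ij) with
    p = 0.5 at the median dispersal distance.
    """
    n = len(areas)
    k = math.log(2) / median_dispersal
    p = [[1.0 if i == j else math.exp(-k * distances[i][j]) for j in range(n)]
         for i in range(n)]
    # p*_ij is the maximum product probability over all paths, found with a
    # Floyd-Warshall style pass that maximises products instead of summing.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = p[i][m] * p[m][j]
                if via > p[i][j]:
                    p[i][j] = via
    total = sum(areas[i] * areas[j] * p[i][j] for i in range(n) for j in range(n))
    return total / landscape_area ** 2


def delta_pc(areas, distances, median_dispersal, landscape_area, patch):
    """dPC for one patch: percentage drop in PC when that patch is removed."""
    full = probability_of_connectivity(areas, distances, median_dispersal, landscape_area)
    keep = [i for i in range(len(areas)) if i != patch]
    reduced = probability_of_connectivity(
        [areas[i] for i in keep],
        [[distances[i][j] for j in keep] for i in keep],
        median_dispersal,
        landscape_area,
    )
    return 100.0 * (full - reduced) / full
```

For example, `probability_of_connectivity([1.0, 1.0], [[0, 500], [500, 0]], 500, 10.0)` describes two unit-area patches separated by exactly the median dispersal distance in a landscape of area 10.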

First open-source agentic AI physicist turning research questions into structured workflows with rigorous verification and multi-step analytical work for long-horizon physics projects; integrates with Claude Code, Codex, Gemini CLI, and OpenCode (804+ stars, Apache 2.0, 2026)

End-to-end composable multi-agent framework for automating OpenFOAM-based CFD simulations from natural language prompts, managing meshing, case setup, execution, error correction, and post-processing; achieves 100% success rate on 110 FoamBench tasks with Claude Opus 4.6 through Architect-Input Writer-Runner-Reviewer agent collaboration with RAG-enhanced generation and MCP tool integration (RPI CSML, 242+ stars, MIT License)

AI scientist framework for autonomous deep research in biological sciences, combining literature analysis agents with data scientist agents to enable iterative scientific discovery through user feedback integration; achieves state-of-the-art performance on BixBench benchmark (48.78% open-answer, 64.39% multiple-choice) outperforming Kepler and GPT-5 (bio-xyz, arXiv 2601.12542, 160+ stars, 2025-2026)

Verbex is a private, on-device Voice-to-ELN iOS app for scientists. It helps researchers capture experiment notes by voice as work happens, organize those notes into scientific sections, and prepare clean, reviewable, ELN-ready scientific records.

Full spaCy pipeline and models for scientific/biomedical documents, enabling named entity recognition, abbreviation resolution, and UMLS linking for scientific literature mining (1.9K+ stars, Apache 2.0)

Unified Python framework for extracellular electrophysiology, standardizing interfaces to 10+ ML-based spike sorting algorithms including Kilosort for reproducible neural spike sorting workflows (792+ stars, actively maintained)

Universal foundation model for grounded biomedical image interpretation, enabling comprehensive visual understanding, reasoning, and grounding across diverse biomedical imaging modalities with strong zero-shot generalization (55+ stars, Apache 2.0, 2025-2026)

Community-driven model zoo and deployment infrastructure for AI-powered bioimage analysis, enabling standardized sharing, validation, and cross-platform execution of deep learning models across Fiji, Ilastik, napari, and other scientific imaging tools (EPFL, EMBL, and global collaborators, actively maintained)

Open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives, enabling quantum algorithm development for quantum chemistry, materials science, and optimization research (IBM, 7.4K+ stars, Apache 2.0)

Decentralized self-organizing teams of AI agents for long-running computational scientific experimentation; agents critique each other's proposals before spending compute and share successes/failures to avoid redundant exploration, achieving +8.33% on BioML-Bench, 1.9× faster nanoGPT optimization, and +12.5% on ProteinGym ACE2-Spike (425+ stars, 2026)

First real quadrotor robot trained end-to-end with differentiable physics for vision-based agile flight, bridging simulation-based learning and real-world deployment with physics-informed neural network controllers (558+ stars)

The MetaProteomeAnalyzer Cloud (MPA Cloud) is an intuitive, open-source tool for metaproteomics data analysis and interpretation, designed to analyse comprehensive metaproteomics data from tandem mass spectrometry experiments through a web interface.

A comprehensive R package for identifying and ranking influential nodes in biological and other complex networks. The package implements the Integrated Value of Influence (IVI), Experimental data-based Integrative Ranking (ExIR), SIRIR, and numerous network centrality measures, enabling network topology analysis, influential node detection, feature prioritization, and candidate biomarker discovery. It also provides functions for network reconstruction, centrality assessment, visualization, and analysis of relationships between centrality measures.

Open Babel is a chemical toolbox designed to speak the many languages of chemical data.

RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data.

NOVOPlasty - The organelle assembler and heteroplasmy caller. NOVOPlasty is a de novo assembler and heteroplasmy/variance caller for short circular genomes.

Adapter trimmer for Oxford Nanopore reads.

Screen a bacterial assembly (contigs/CDS or proteins) for nucleotide or protein sequences. Pipeline that screens for presence of genes of interest (GOI) in bacterial assemblies. Generates multiple CSVs and plots that describe which genes are present and how variable their sequence is. Can use DNA or protein query sequences (GOIs) and DNA contigs/fastas or protein fastas as database (db) to search in.

maeparser is a parser for Schrodinger Maestro files.

Tandem repeat genotyper for long reads, based on a modified version of HipSTR.

Plans geometric serial dilution series for molecular biology and biochemistry workflows, rounding transfer volumes to declared pipette ranges and optional 96- or 384-well plate layouts. A browser calculator supports interactive protocol design; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured step tables and shareable run identifiers.
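The arithmetic behind a geometric dilution series can be sketched locally as follows. This is an illustrative sketch, not the Pepkio Tools API or its client. `plan_serial_dilution`, `DilutionStep`, and the `increment_ul` rounding parameter are hypothetical names, and rounding to a fixed pipette increment is an assumed simplification of "rounding to declared pipette ranges".

```python
from dataclasses import dataclass


@dataclass
class DilutionStep:
    step: int
    transfer_ul: float    # volume carried over from the previous tube
    diluent_ul: float     # diluent added to reach the mix volume
    concentration: float  # resulting concentration, in the stock's units


def plan_serial_dilution(stock_conc, factor, n_steps, mix_volume_ul, increment_ul=0.5):
    """Plan a geometric series where each tube is diluted `factor`-fold.

    Hypothetical helper: the transfer volume is rounded to the nearest
    pipette increment and the achieved factor is recomputed from the
    rounded volume, so the reported concentrations reflect what is pipetted.
    """
    if factor <= 1:
        raise ValueError("dilution factor must be greater than 1")
    transfer = round(mix_volume_ul / factor / increment_ul) * increment_ul
    if transfer <= 0 or transfer >= mix_volume_ul:
        raise ValueError("rounded transfer volume is not usable")
    diluent = mix_volume_ul - transfer
    achieved_factor = mix_volume_ul / transfer
    steps = []
    conc = stock_conc
    for i in range(1, n_steps + 1):
        conc = conc / achieved_factor
        steps.append(DilutionStep(i, transfer, diluent, conc))
    return steps
```

Calling `plan_serial_dilution(100.0, 2, 3, 200.0)` plans three two-fold steps at a 200 µL mix volume. With a factor that does not divide the mix volume evenly, the achieved factor differs slightly from the requested one, which is why it is recomputed after rounding.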

State-of-the-art RNA 3D folding model developed with Stanford Das Lab and Kaggle competition winners, featuring a 488M-parameter AF3-like architecture with MSA and template-based modeling, enabling structure-driven drug discovery and RNA therapeutics design (NVIDIA-Digital-Bio, Apache 2.0)

Processes 96-well plate absorbance data through blank subtraction, regression fitting, and dilution correction to report sample concentrations with QC flags for BCA, Bradford, and ELISA workflows. A browser calculator supports interactive grid entry with CSV and PDF export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits plate layout and absorbance values and returns model comparison, per-sample concentrations, and shareable run identifiers.
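The processing chain named in this entry (blank subtraction, regression fitting, dilution correction) can be illustrated with a straight-line standard curve. This is a sketch under stated assumptions, not the tool's method. The function names are hypothetical, and a linear fit is assumed for simplicity even though assays such as ELISA are often fitted with non-linear models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sample_concentration(absorbance, blank, slope, intercept, dilution_factor=1.0):
    """Convert a raw well absorbance to a concentration (hypothetical helper).

    1. Subtract the blank absorbance.
    2. Invert the standard curve fitted on blank-corrected standards.
    3. Multiply by the dilution factor applied to the sample before plating.
    """
    corrected = absorbance - blank
    return (corrected - intercept) / slope * dilution_factor


# Blank-corrected standards: known concentrations vs. measured absorbance.
standard_conc = [0.0, 250.0, 500.0, 1000.0]
standard_abs = [0.0, 0.25, 0.5, 1.0]
slope, intercept = fit_line(standard_conc, standard_abs)
```

A sample read at 0.45 with a blank of 0.05 and a ten-fold pre-dilution would then be evaluated as `sample_concentration(0.45, 0.05, slope, intercept, dilution_factor=10.0)`.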

Estimates PCR primer melting temperatures and polymerase-specific annealing temperatures from sequence and buffer inputs, with per-pair QC for hairpins, dimers, and Tm balance. A browser calculator supports interactive single-pair and batch entry (up to 200 pairs) with method comparison and export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic for the API client is hosted remotely; sequences are transmitted for programmatic runs while the web interface performs calculations in the browser.
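For orientation, the simplest textbook melting-temperature estimate is the Wallace rule, sketched below. It is not stated to be one of this tool's methods. It ignores the buffer inputs and polymerase-specific adjustments mentioned in the entry and is only a rough guide for short oligonucleotides. The function names and the balance check are hypothetical.

```python
def wallace_tm(sequence):
    """Wallace rule: Tm (deg C) = 2 * (A + T) + 4 * (G + C).

    Rough estimate for short primers only; no salt or concentration terms.
    """
    seq = sequence.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_difference(forward, reverse):
    """Absolute Tm gap between the two primers of a pair (hypothetical QC value)."""
    return abs(wallace_tm(forward) - wallace_tm(reverse))
```

A pair whose `tm_difference` is large would be a candidate for a Tm-balance warning. The acceptable threshold is a design choice that the entry does not specify.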

Mol Biology Tools is a free browser-based collection of molecular biology utilities for routine sequence analysis, primer design, Sanger sequencing primer planning, cloning setup, and wet-lab calculations. The site includes tools for PCR primer design, Sanger primer design and primer walking, primer binding checks, restriction site analysis, reverse complement generation, ORF and protein translation, codon optimization, ligation calculations, molarity calculations, dilution calculations, and multi-solute solution recipe preparation. The tools run in the browser and are intended for quick experimental planning without requiring logins or server-side sequence upload.

Plans PCR and qPCR master-mix reagent volumes from stock and final concentrations, reaction counts, and pipetting overage, with consolidated totals when several assays are prepared together. A browser calculator supports interactive recipe entry with printable bench sheets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured volume tables, dilution warnings, and shareable run identifiers.
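The underlying calculation is the C1V1 = C2V2 relation scaled by the reaction count and overage. The sketch below assumes each component is described by a (stock, final) concentration pair in matching units. `master_mix` and its parameters are hypothetical names, not part of the Pepkio client.

```python
def master_mix(components, reaction_volume_ul, n_reactions, overage=0.1, template_ul=0.0):
    """Return total volume (µL) per component for a shared master mix.

    components: mapping of name -> (stock_conc, final_conc), same units per pair.
    overage: fractional excess to cover pipetting loss (0.1 means 10%).
    template_ul: per-reaction volume reserved for template added separately.
    """
    scale = n_reactions * (1 + overage)
    totals = {}
    used_per_reaction = template_ul
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds stock")
        per_reaction = final / stock * reaction_volume_ul  # C1V1 = C2V2
        totals[name] = per_reaction * scale
        used_per_reaction += per_reaction
    if used_per_reaction > reaction_volume_ul:
        raise ValueError("components exceed the reaction volume")
    # Water makes up the remainder of each reaction.
    totals["water"] = (reaction_volume_ul - used_per_reaction) * scale
    return totals
```

For instance, `master_mix({"buffer": (10.0, 1.0), "dNTPs": (10.0, 0.2)}, 20.0, 10)` plans ten 20 µL reactions with 10% overage, using a 10x buffer at 1x and 10 mM dNTPs at 0.2 mM.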

Computes laboratory solution preparation parameters—powder mass to weigh, stock and diluent volumes for single dilutions, and multi-step serial concentration tables—with correction for hydrated salts and supplier purity. A browser calculator supports interactive prep planning with saved recipes and shareable links; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured protocol steps and shareable run identifiers.
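The two core calculations here can be written out in a few lines. This is a sketch under assumptions, not the tool's code: the function names are hypothetical, the hydrate correction is modelled by adding the mass of the waters of crystallisation to the anhydrous molar mass, and purity is treated as a mass fraction.

```python
WATER_MW = 18.015  # g/mol


def powder_mass_g(molarity, volume_l, anhydrous_mw, waters=0, purity=1.0):
    """Mass of powder to weigh for a target molarity and volume.

    waters: number of waters of crystallisation in the supplied salt.
    purity: supplier purity as a mass fraction (0.99 means 99%).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be a fraction in (0, 1]")
    formula_weight = anhydrous_mw + waters * WATER_MW
    return molarity * volume_l * formula_weight / purity


def dilution_volumes(stock_conc, final_conc, final_volume):
    """Single dilution via C1V1 = C2V2: returns (stock volume, diluent volume)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume
```

As a usage example, `powder_mass_g(0.1, 0.5, 95.211, waters=6, purity=0.99)` gives the mass of magnesium chloride hexahydrate needed for 500 mL of a 0.1 M solution, and `dilution_volumes(10.0, 1.0, 100.0)` splits a ten-fold dilution into stock and diluent volumes.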

Comprehensive set of programs for phylogenetic analyses; available for PC and Mac; source code available for easy compiling in UNIX.

103B-parameter open-source medical language model with 1/32 Mixture-of-Experts architecture, achieving HealthBench-leading performance among open-source models with only 6.1B active parameters; jointly developed by Ant Group and Zhejiang Province Health Information Center (MIT License)

Web application and service for visualizing small- to medium-scale models of gene regulatory networks. It automatically lays out either an unweighted or weighted network graph based on an Excel input spreadsheet containing an adjacency matrix where regulators are named in the columns and target genes in the rows. It is best-suited for visualizing networks of fewer than 35 nodes and 70 edges and has general applicability.
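The input convention described in this entry (regulators in columns, target genes in rows) can be made concrete with a small conversion to an edge list. This is an illustrative sketch with hypothetical names and gene labels. It does not read Excel files or reproduce the application's layout logic.

```python
def edges_from_adjacency(regulators, targets, matrix):
    """Convert an adjacency matrix to (regulator, target, weight) edges.

    matrix[r][c] is the weight of the edge from regulators[c] (column)
    to targets[r] (row); zero means no edge. In an unweighted network
    the non-zero entries are simply 1.
    """
    edges = []
    for r, target in enumerate(targets):
        for c, regulator in enumerate(regulators):
            weight = matrix[r][c]
            if weight != 0:
                edges.append((regulator, target, weight))
    return edges


# Hypothetical three-gene example: columns are regulators, rows are targets.
genes = ["GENE_A", "GENE_B", "GENE_C"]
matrix = [
    [0, 0.8, 0],     # GENE_A is activated by GENE_B
    [0, 0, -0.5],    # GENE_B is repressed by GENE_C
    [1.2, 0, 0],     # GENE_C is activated by GENE_A
]
edges = edges_from_adjacency(genes, genes, matrix)
```

Reading the matrix in this orientation matters: transposing it would reverse the direction of every regulatory edge.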

Large-scale flow-based protein backbone generator utilizing hierarchical fold class labels for conditioning with a tailored scalable transformer architecture, enabling controllable de novo protein design (264+ stars)

Large transformer-based single-cell foundation model pretrained on 50 million cells for robust gene network inference, expression denoising, cell embedding, and zero-shot label prediction, leveraging ESM2 protein embeddings and bidirectional transformer architecture (Cantini Lab, 148+ stars, GPL-3.0)

Multimodal AI bridging transcriptomics data and natural language, enabling intuitive chat-based exploration and analysis of single-cell RNA-seq datasets through conversational interaction without coding; fine-tuned Mistral 7B LLaVA model emulating biologist-bioinformatician discussions (207+ stars, GPL-3.0)

FlavoTyper is a bioinformatics tool that performs in silico serotyping of Flavobacterium psychrophilum genome assemblies.

MONAI Label is an intelligent open source image labeling and learning tool that enables users to create annotated datasets and build AI annotation models for clinical evaluation. MONAI Label enables application developers to build labeling apps in a serverless way, where custom labeling apps are exposed as a service through the MONAI Label Server.

Controllable foundation model for general and specialized biomolecular structure prediction across proteins, nucleic acids, and complexes, featuring a public web server for interactive prediction workflows (IntelliGen AI, 223+ stars, Apache 2.0, 2025)

PyTorch-native atomistic simulation engine for the machine-learned interatomic potential (MLIP) era, enabling batched molecular dynamics and structural relaxation with automatic GPU memory management; supports MACE, Fairchem, SevenNet, ORB, MatterSim and other popular MLIPs with up to 100x speedup over ASE (Radical AI, AI for Science 2026, 468+ stars, MIT License)

The ProteinsPlus web server aims to support life scientists in working with protein structures. Protein structures are the key to understanding protein function. They are an important resource in many biotechnological application areas from pharmaceutical research to biocatalysis. ProteinsPlus focuses on protein-ligand interactions. The server provides support for the initial steps of dealing with protein structures, namely structure search, quality assessment, and preprocessing. JAMDA enables users to perform an on-the-fly molecular docking of up to five molecules. The poses can then be visualized in 2D (PoseView, PoseEdit). Furthermore, advanced options, such as protein pocket detection (DoGSite), prediction of water molecule positions (WarPP), protein structure ensemble generation (SIENA), prediction of metal coordination (METALizer), the analysis of solvent channels in protein crystals (LifeSoaks), or the categorization of protein-protein-interfaces (HyPPI) are supported.