Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active 329
Idle 126
Stale 99
Archived 5
(None) 156

Domain

Software 96
SingleCell 27
Protein & Drug Discovery 26
GeneExpression 22
ImmunoOncology 19
DataImport 15
Genomics & Bioinformatics 14
Autonomous Research Systems (2023-2025 Breakthroughs) 13
Visualization 12
Infrastructure 10
Machine Learning 10
Sequencing 10
(None) 22

Language

R 390
Python 198
Jupyter Notebook 36
C++ 10
C 8
JavaScript 8
TypeScript 7
Go 6
HTML 6
Shell 6
Julia 3
Nextflow 3
(None) 18

License (1)

MIT 715
GPL-3.0 656
Artistic-2.0 554
CC-BY-4.0 262
GPL-2.0 254
GPL-2.0+ 245
Apache-2.0 228
NOASSERTION 167
CC0-1.0 114
GPL-3.0+ 99
CC-BY-3.0 79
BSD-3-Clause 78
(None) 2427

Source

github 563
bioconductor 386
awesome-ai-for-science 170
bio.tools 53
awesome-bioinformatics 43
awesome-python-chemistry 35
bioregistry 21
awesome-cheminformatics 12
awesome-scientific-python 2

Type

Software tool 694
Database 21

715 of 6,366 resources

Showing 151–200

ScholarAIO

Interactive Research Environments

Agent-agnostic research infrastructure providing AI agents with a structured scientific workspace for deep PDF parsing, hybrid semantic/keyword literature search, citation-graph analysis, topic discovery, and academic writing workflows; natively integrates with Claude Code, Codex, Cursor, Cline, and AgentSkills.io (530+ stars, MIT License, 2026)

Active · ★530 · 1 month ago

fgsea

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it feasible to run more permutations and obtain finer-grained p-values, which in turn permits accurate standard approaches to multiple hypothesis correction.

Active · ★445 · 1 month ago

sfi

MassSpectrometry

Data analysis for Single File Injections (SFIs) mode LC-MS analysis. In SFIs mode, pooled samples are initially injected to serve as reference peaks for subsequent analyses. Repeated injections of individual samples are then performed at fixed time intervals using isocratic elution. This package provides functions to analyze data from SFIs mode, including peak picking and peak reassignment.

Active · ★1 · 1 month ago

Bedtools2

GFF BED File Utilities

A Swiss Army knife for genome arithmetic.

Active · ★1K · 1 month ago

SpaceTrooper

SpaceTrooper performs quality control analysis of image-based spatial data using data-driven GLM models, providing exploration plots, QC metrics computation, and outlier detection. It implements a GLM strategy for detecting low-quality cells in imaging-based spatial data (transcriptomics and proteomics). It also implements several plots for visualizing imaging-based polygons through the ggplot2 package.

Active · ★11 · 1 month ago

PhiFlow

Neural Operators & Model Discovery

Differentiable PDE solving framework for machine learning with built-in fluid simulation, supporting PyTorch/JAX/TensorFlow backends and enabling neural network training within physical simulations (TUM, MIT License)

Active · ★1.9K · 1 month ago

(Poly)merase

A Go library and command line utility for engineering organisms.

Active · ★731 · 1 month ago

Pepkio Knowledge Explorer: Single-Cell Long-Read RNA Sequencing

A static web application presents an interactive knowledge graph of single-cell long-read RNA sequencing literature synthesized from seven source papers. Users navigate mind-tree, network graph, guided learning-path, and Sankey views linking platforms, protocols, methods, and software. A benchmark tab provides 34 question-answer pairs with category and difficulty filters, exportable as JSON or CSV for LLM and agent evaluation.

Active · ★2 · 1 month ago

Phylo-Movies

Phylo-Movies is an open-source React and Flask web application, also available as a desktop app, for inspecting ordered phylogenetic tree series. It computes and visualizes subtree-prune-and-regraft transition frames between consecutive trees, helping users see which taxa or subtrees move across sliding-window analyses, bootstrap replicates, and curated tree-series comparisons. The viewer includes timeline playback, tree comparison, MSA context, coloring, analytics, image export, and recording tools.

Active · ★1 · 1 month ago

rhinotypeR

"rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies.

Active · ★4 · 1 month ago

ChromBPNet

Transcription factors and regulatory sites

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet).

Active · ★234 · 1 month ago · Jupyter Notebook

Paper2Poster

Poster Generation

Multi-agent system with Parser-Planner-Painter architecture converting `paper.pdf` to editable `poster.pptx`; reported to outperform GPT-4o while using 87% fewer tokens

Active · ★3.8K · 1 month ago

ScienceClaw

Autonomous Research Systems (2023-2025 Breakthroughs)

Self-evolving AI research colleague built on OpenClaw with 285+ runtime-adaptive skills across 28+ disciplines, persistent cross-session research memory, and zero-hallucination citation protocols; agent autonomously writes new SKILL.md files based on research patterns without redeployment (828+ stars, MIT License, 2026)

Active · ★864 · 1 month ago

IGV js

Genome Browsers / Gene Diagrams

JavaScript-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.

Active · ★743 · 1 month ago

crisprDesign

Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbation): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-targeting nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.

Active · ★31 · 1 month ago

dbbuilder

Galaxy Tool Shed repositories maintained and developed by the GalaxyP community

Active · ★36 · 1 month ago

Microsoft Biodiversity

Ecological Modeling

Microsoft AI for Good Lab's open-source biodiversity research hub providing AI models, edge devices, and tools for wildlife monitoring and conservation, including MegaDetector (camera trap animal detection), SPARROW (species recognition), PytorchWildlife (conservation AI toolkit), and bioacoustics analysis pipelines (1K+ stars)

Active · ★1K · 1 month ago

MoleCode

LLM for Chemistry

LLM-native molecular language that represents molecules as explicit graph-based code, enabling LLMs to operate and reason on chemistry directly with 5× lower token cost and ~76-80% accuracy on novel molecules vs ~20% for SMILES; supports small molecules, polymers, and Markush structures with lossless RDKit interconversion and Claude Code/Codex agent skills (AtomFlow, arXiv:2605.16480, 281+ stars, MIT License, 2026)

Active · ★281 · 1 month ago

AutoResearchClaw

Autonomous Research Systems (2023-2025 Breakthroughs)

Fully autonomous research from idea to paper with multi-agent debate, citation verification, and OpenClaw integration (11K+ stars, 2026)

Active · ★13.5K · 1 month ago

MolecularGraph.jl

General Purpose

A graph-based molecule modeling and chemoinformatics analysis toolkit fully implemented in Julia

Active · ★224 · 1 month ago

Foam-Agent (NeurIPS 2025)

Domain-Specific Research Agents

End-to-end composable multi-agent framework for automating OpenFOAM-based CFD simulations from natural language prompts, managing meshing, case setup, execution, error correction, and post-processing; achieves 100% success rate on 110 FoamBench tasks with Claude Opus 4.6 through Architect-Input Writer-Runner-Reviewer agent collaboration with RAG-enhanced generation and MCP tool integration (RPI CSML, 242+ stars, MIT License)

Active · ★282 · 1 month ago

Lean Copilot

Domain-Specific Research Agents

LLMs as copilots for theorem proving in Lean 4, exposing native tactics (`suggest_tactics`, `search_proof`, `select_premises`) that embed language model inference and premise retrieval directly inside the Lean proof environment, supporting local CTranslate2/CUDA inference as well as remote model APIs for interactive and automated proof search (Caltech & NVIDIA, NeurIPS 2024, 1.2K+ stars)

Active · ★1.3K · 1 month ago

plyxp

The package provides `rlang` data masks for the SummarizedExperiment class. This enables the evaluation of unquoted expressions in different contexts of the SummarizedExperiment object, with optional access to other contexts. The goal for `plyxp` is for evaluation to feel like working with a data.frame object without ever needing to unwind to a rectangular data.frame.

Active · ★9 · 1 month ago

SwitchCraft

Protein & Drug Discovery

Programmatic framework for designing state-switching proteins via backpropagation through compositional design constraints parameterized by structure prediction models; enables de novo design of allosteric regulators and fluorescent biosensors for arbitrary small-molecule analytes (79+ stars, MIT License, ICML 2026)

Active · ★79 · 1 month ago

exponax

Neural Operators & Model Discovery

Efficient differentiable n-dimensional PDE solvers built on JAX and Equinox, shipping 46+ built-in equations with Fourier spectral methods, exponential time differencing, and full auto-differentiation for physics-based deep learning workflows (MIT, 200+ stars, 2024)

Active · ★210 · 1 month ago · Jupyter Notebook

PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data, such as sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features, such as gene age estimation and core gene identification.

Active · ★38 · 1 month ago

hifiasm

Long-read Assembly

A haplotype-resolved assembler for accurate HiFi reads.

Active · ★789 · 1 month ago

psichomics

Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.

Active · ★37 · 1 month ago

ImageArray

ImageArray provides a framework for on-disk and in-memory image arrays, specifically for pyramidal images stored in HDF5, Zarr and life sciences image file formats (OME Bio-Formats).

Active · ★6 · 1 month ago

TenDNA DNA Format Converter

Tool for converting raw DNA data files between 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA formats.

Active · ★2 · 1 month ago

ChemInformant

Database Wrappers

High-throughput PubChem client for batch queries with caching, validation, rate-limit-aware retries, and a simple CLI.

Active · ★50 · 1 month ago

Boltz

Protein & Drug Discovery

First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)

Active · ★4.1K · 1 month ago

Allegro

Materials Discovery

Highly scalable equivariant deep learning interatomic potentials enabling million-atom molecular dynamics simulations with ab initio accuracy, building on E(3)-equivariant architectures for large-scale atomistic modeling (mir-group, MIT License, 480+ stars)

Active · ★492 · 2 months ago

gVenn

Tools to compute and visualize overlaps between gene sets or genomic regions. Venn diagrams with proportional areas are provided, while UpSet plots are recommended for larger numbers of sets. The package supports GRanges and GRangesList inputs, and integrates with analysis workflows for ChIP-seq, ATAC-seq, and other genomic interval data. It generates clean, interpretable, and publication-ready figures.

Active · ★2 · 2 months ago

IgGM

Protein & Drug Discovery

Generative foundation model for functional antibody and nanobody design, supporting de novo generation, affinity maturation, inverse design, structure prediction, and humanization (Tencent AI4S, ICLR 2025)

Active · ★201 · 2 months ago

AION (arXiv 2025)

Astronomy & Astrophysics

Polymathic AI's large omnimodal foundation model for astronomical surveys, seamlessly integrating 39 distinct data modalities including imaging, spectra, photometry, and catalog entries for similarity search, property prediction, and generative modeling across legacy surveys (MIT)

Active · ★135 · 2 months ago · Jupyter Notebook

notameStats

BiomedicalInformatics

Provides univariate and multivariate statistics for feature prioritization in untargeted LC-MS metabolomics research.

Active · ★0 · 2 months ago

NanoResearch

Autonomous Research Systems (2023-2025 Breakthroughs)

End-to-end autonomous AI research engine that turns an idea into a complete LaTeX paper by dispatching real computational experiments to local GPUs or SLURM clusters, collecting actual results, generating figures/tables, and writing a data-grounded manuscript rather than LLM hallucinations (OpenRaiser, 1.5K+ stars, MIT License, 2026)

Active · ★1.5K · 2 months ago

assorthead

Vendors an assortment of useful header-only C++ libraries. Bioconductor packages can use these libraries in their own C++ code by LinkingTo this package without introducing any additional dependencies. The use of a central repository avoids duplicate vendoring of libraries across multiple R packages, and enables better coordination of version updates across cohorts of interdependent C++ libraries.

Active · ★1 · 2 months ago

Merlin (Stanford MIMI, Nature 2026)

Medical AI & Clinical Applications

3D vision-language model for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining, enabling multimodal medical understanding and radiology report generation (447+ stars, MIT License, 2026)

Active · ★447 · 2 months ago

OER Schema

An RDF vocabulary for OER content on the web.

Active · ★22 · 2 months ago

CellTypist

Genomics & Bioinformatics

Automated cell type annotation tool for single-cell transcriptomics using gradient boosting and logistic regression with reference atlases, enabling standardized classification across datasets (Wellcome Sanger Institute, Nature Biotechnology 2022)

Active · ★495 · 2 months ago

drugbank-downloader

Database Wrappers

Automate downloading, opening, and parsing DrugBank.

Active · ★67 · 2 months ago

SRAgent

Domain-Specific Research Agents

LLM agents for working with the SRA (Sequence Read Archive) and associated bioinformatics databases, enabling natural language querying of high-throughput sequencing data and metadata across genomic repositories (Arc Institute, 169+ stars, 2024-2026)

Active · ★176 · 2 months ago

iscream

BED files store ranged genomic data that can be queried even when the files are compressed. iscream can query data from BED files and return them in multiple formats: parsed records or their summary statistics as data frames or GenomicRanges objects, and matrices as matrix, GenomicRanges, or SummarizedExperiment objects. iscream also provides specialized support for importing methylation data.

Active · ★0 · 2 months ago

Bactopia

A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes.

Active · ★518 · 2 months ago

GONetView

Ontology and terminology

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

Active · ★0 · 2 months ago

NistChemPy

Database Wrappers

A package for accessing data from the NIST webbook...

Active · ★58 · 2 months ago

BiocPkgTools

Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.

Active · ★22 · 2 months ago

tripr

TRIP is a software framework that provides analytics services on antigen receptor (B cell receptor immunoglobulin, BcR IG | T cell receptor, TR) gene sequence data. It is a web application written in R Shiny. It takes as input the output files of the IMGT/HighV-Quest tool. Users can choose to analyze the data from each input sample separately or the combined data files from all samples, and visualize the results accordingly.

Active · ★3 · 2 months ago

