Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

527 of 5,893 resources

Showing 150

A static web application presents an interactive knowledge graph of single-cell long-read RNA sequencing literature synthesized from seven source papers. Users navigate mind-tree, network graph, guided learning-path, and Sankey views linking platforms, protocols, methods, and software. A benchmark tab provides 34 question-answer pairs with category and difficulty filters, exportable as JSON or CSV for LLM and agent evaluation.

Active · 0 stars · 16 hours ago
JavaScript
MIT

Comprehensive collection of 125+ ready-to-use scientific skill modules for Claude AI across bioinformatics, cheminformatics, clinical research, ML, and materials science

Active · 27.8K stars · 22 hours ago
Python
MIT

Open-source biomedical AI platform integrating multimodal foundation models (BioMedGPT, PharmolixFM, LangCell) with agentic workflows and 45+ Claude Code skills for drug discovery, protein engineering, and single-cell omics analysis (PharMolix & Tsinghua AIR, 1K+ stars, 2023-2026)

Active · 1.1K stars · 1 day ago
Python
MIT

Computes weighed laboratory buffer recipes from target pH, concentration, and volume, accounting for separate preparation and working temperatures when pKa shifts with temperature. Supports calculator mode from dry reagents and stock dilution mode, returning acid and base masses, ionic strength estimates, optional NaCl adjustment, gravimetric and titration routes, and stepwise protocols. A browser calculator supports interactive recipe entry with shareable links; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured recipe tables, compatibility warnings, and shareable run identifiers.

Active · 1 star · 3 days ago
Python
MIT
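
The arithmetic behind a buffer recipe like the one described above can be sketched with the Henderson-Hasselbalch equation and a linear temperature correction to pKa. This is an illustrative sketch only, not the Pepkio Tools API or its algorithm: the function name `buffer_recipe`, its parameters, and the Tris constants (pKa of about 8.07 at 25 °C, a shift of about -0.028 per °C, and molar masses of 157.60 and 121.14 g/mol for Tris-HCl and Tris base) are assumptions chosen for the example, and ionic strength effects are ignored.

```python
def buffer_recipe(target_ph, conc_molar, volume_l, pka_25, dpka_dt,
                  temp_c, mw_acid, mw_base):
    """Return the working pKa and grams of acid and base form (hypothetical helper)."""
    # Shift pKa linearly from its 25 °C reference value.
    pka = pka_25 + dpka_dt * (temp_c - 25.0)
    # Henderson-Hasselbalch: pH = pKa + log10([base] / [acid]).
    ratio = 10 ** (target_ph - pka)
    total_mol = conc_molar * volume_l
    mol_acid = total_mol / (1.0 + ratio)
    mol_base = total_mol - mol_acid
    return {"pKa": pka,
            "acid_g": mol_acid * mw_acid,
            "base_g": mol_base * mw_base}


# Assumed Tris constants, for illustration only.
TRIS = dict(pka_25=8.07, dpka_dt=-0.028, mw_acid=157.60, mw_base=121.14)

# 1 L of 50 mM buffer at pH 8.07, used at 25 °C and at 4 °C.
recipe = buffer_recipe(target_ph=8.07, conc_molar=0.05, volume_l=1.0,
                       temp_c=25.0, **TRIS)
cold = buffer_recipe(target_ph=8.07, conc_molar=0.05, volume_l=1.0,
                     temp_c=4.0, **TRIS)
print(recipe)
print(cold)
```

Because the assumed pKa rises as the temperature drops, the 4 °C recipe needs more of the acid form to reach the same pH, which is why a calculator of this kind asks for the working temperature separately.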

Galaxy workflow for the BlockClust pipeline.

Active · 123 stars · 4 days ago
R
MIT

Open-source deep learning toolbox for bioimage analysis providing a unified, configuration-driven framework for 2D/3D semantic segmentation, instance segmentation, classification, denoising, super-resolution, and self-supervised learning; integrates state-of-the-art architectures including U-Net, Vision Transformers, and ConvNeXt, designed for microscopy and biomedical imaging researchers without extensive coding expertise (MIT License, actively maintained)

Active · 201 stars · 6 days ago
Jupyter Notebook
MIT

pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differential Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investigation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionalities for plotting the results.

Active · 4 stars · 1 week ago
R
MIT

Segment Anything Model for microscopy: interactive and automatic segmentation of light, electron, and fluorescence microscopy images in 2D and 3D, with domain-specific fine-tuning workflows for scientific imaging (1.5K+ stars)

Active · 685 stars · 1 week ago
Jupyter Notebook
MIT

High-level open-source geospatial AI package for satellite/aerial imagery analysis, model training, inference, interactive visualization, and QGIS integration, bridging PyTorch/Transformers with remote sensing workflows (MIT, 2026)

Active · 3.1K stars · 1 week ago
Python
MIT

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, the presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. PhyloProfile enriches regular phylogenetic profiles with further data, such as sequence or structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers further analysis features, such as gene age estimation and core gene identification.

Active · 38 stars · 1 week ago
R
MIT
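
As a rough illustration of the presence/absence idea in the entry above, the sketch below builds 0/1 profiles from ortholog sets and picks out core genes. It is not PhyloProfile's R interface: the function names, the `min_fraction` parameter, and the toy species and gene names are all hypothetical.

```python
def build_profiles(orthologs, species):
    """Map each gene to a 0/1 presence vector ordered like `species`."""
    return {gene: [1 if sp in found else 0 for sp in species]
            for gene, found in orthologs.items()}


def core_genes(profiles, min_fraction=1.0):
    """Return genes present in at least `min_fraction` of the species."""
    return sorted(gene for gene, vec in profiles.items()
                  if sum(vec) / len(vec) >= min_fraction)


# Toy data: which species carry an ortholog of each gene (made up).
species = ["human", "mouse", "zebrafish", "yeast"]
orthologs = {
    "geneA": {"human", "mouse", "zebrafish", "yeast"},
    "geneB": {"human", "mouse"},
    "geneC": {"human", "mouse", "zebrafish"},
}

profiles = build_profiles(orthologs, species)
print(profiles["geneB"])
print(core_genes(profiles))
print(core_genes(profiles, min_fraction=0.75))
```

Lowering `min_fraction` relaxes the definition of "core" from genes found in every species to genes found in most of them.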

A haplotype-resolved assembler for accurate HiFi reads.

Active · 779 stars · 1 week ago
C++
MIT

Interactive explorer for single-cell transcriptomics data enabling visualization of UMAP/t-SNE embeddings, differential expression analysis, and cross-dataset comparison through a fast web-based interface; widely adopted for exploring atlas-scale single-cell datasets and integrating with AI/ML analysis workflows (773+ stars, MIT License)

Active · 773 stars · 1 week ago
JavaScript
MIT

Tool for converting raw DNA data files between 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA formats.

Active · 1 star · 1 week ago
PHP
MIT

Active · 124 stars · 1 week ago
Ruby
MIT

Spatial transcriptomic technologies have helped to resolve the connection between gene expression and the 2D orientation of tissues relative to each other. However, the limited single-cell resolution makes it difficult to highlight the most important molecular interactions in these tissues. SpaceMarkers, an R/Bioconductor package, helps find molecular interactions by identifying genes associated with latent space interactions in spatial transcriptomics.

Active · 8 stars · 1 week ago
R
MIT

Active · 13.5K stars · 1 week ago
Ruby
MIT

Highly scalable equivariant deep learning interatomic potentials enabling million-atom molecular dynamics simulations with ab initio accuracy, building on E(3)-equivariant architectures for large-scale atomistic modeling (mir-group, MIT License, 480+ stars)

Active · 482 stars · 1 week ago
Python
MIT

Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science with 140+ ready-to-use skills and 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Antigravity, and the open Agent Skills standard (K-Dense-AI, 26K+ stars, 2025)

Active · 26.5K stars · 1 week ago
Python
MIT

Biological simulation tools

Active · 15 stars · 1 week ago
Python
MIT

Provides univariate and multivariate statistics for feature prioritization in untargeted LC-MS metabolomics research.

Active · 0 stars · 2 weeks ago
R
MIT

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expression.

Active · 28 stars · 2 weeks ago
R
MIT
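
The correlation step mentioned in the entry above can be illustrated with a small, self-contained sketch. It is a simplification and not Epiregulon's implementation or API: the function names, the 0.8 threshold, the peak names, and the toy accessibility and expression vectors (one value per cell group) are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def link_peaks_to_genes(accessibility, expression, threshold=0.8):
    """Keep (peak, gene, r) triples whose correlation reaches `threshold`."""
    links = []
    for peak, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                links.append((peak, gene, round(r, 3)))
    return links


# Toy values across five cell groups (made up).
accessibility = {"peak1": [0, 1, 2, 3, 4], "peak2": [4, 3, 2, 1, 0]}
expression = {"GATA1": [0, 2, 4, 6, 8], "SPI1": [1, 1, 2, 1, 1]}

links = link_peaks_to_genes(accessibility, expression)
print(links)
```

In practice a tool would restrict the comparison to peaks near each gene rather than testing every pair, which this sketch leaves out for brevity.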

Self-evolving AI research colleague built on OpenClaw with 285+ runtime-adaptive skills across 28+ disciplines, persistent cross-session research memory, and zero-hallucination citation protocols; agent autonomously writes new SKILL.md files based on research patterns without redeployment (828+ stars, MIT License, 2026)

Active · 829 stars · 2 weeks ago
TypeScript
MIT

The Zarr specification defines a format for chunked, compressed, N-dimensional arrays. Its design allows efficient access to subsets of the stored array, and supports both local and cloud storage systems. Rarr aims to implement this specification in R with minimal reliance on external tools or libraries.

Active · 52 stars · 2 weeks ago
R
MIT
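
Why chunking makes subset access efficient can be shown with a short sketch: a reader only needs the chunks whose index ranges overlap the requested region. This is a generic illustration of a regular chunk grid, not Rarr's or any Zarr library's API, and the function name and argument layout are hypothetical.

```python
from itertools import product


def chunks_for_selection(selection, chunk_shape):
    """List the chunk grid indices that overlap a rectangular selection.

    `selection` holds one (start, stop) pair per dimension, stop exclusive.
    """
    ranges = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size          # chunk containing the first element
        last = (stop - 1) // size      # chunk containing the last element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Rows 150..249 and columns 0..49 of an array stored in 100 x 100 chunks.
needed = chunks_for_selection([(150, 250), (0, 50)], chunk_shape=(100, 100))
print(needed)
```

Only two chunks are touched here, regardless of how large the full array is.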

Vendors an assortment of useful header-only C++ libraries. Bioconductor packages can use these libraries in their own C++ code by LinkingTo this package without introducing any additional dependencies. The use of a central repository avoids duplicate vendoring of libraries across multiple R packages, and enables better coordination of version updates across cohorts of interdependent C++ libraries.

Active · 1 star · 2 weeks ago
R
MIT

Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.

Active · 1.2K stars · 2 weeks ago
Jupyter Notebook
MIT

Automated cell type annotation tool for single-cell transcriptomics using gradient boosting and logistic regression with reference atlases, enabling standardized classification across datasets (Wellcome Sanger Institute, Nature Biotechnology 2022)

Active · 486 stars · 2 weeks ago
Python
MIT

Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)

Active · 450 stars · 2 weeks ago
Python
MIT

Automate downloading, opening, and parsing DrugBank.

Active · 65 stars · 2 weeks ago
Python
MIT

Deep learning-based bioacoustic monitoring framework for automated bird species identification from audio recordings, supporting 6,000+ species globally with real-time analysis, batch processing, and API deployment; foundational tool in biodiversity research, conservation biology, and ecological acoustic monitoring (Cornell Lab of Ornithology, 1.5K+ stars, MIT License)

Active · 1.6K stars · 2 weeks ago
Python
MIT

LLM agents for working with the SRA (Sequence Read Archive) and associated bioinformatics databases, enabling natural language querying of high-throughput sequencing data and metadata across genomic repositories (Arc Institute, 169+ stars, 2024-2026)

Active · 170 stars · 2 weeks ago
Python
MIT

BED files store ranged genomic data that can be queried even when the files are compressed. iscream can query data from BED files and return them in multiple formats: parsed records or their summary statistics as data frames or GenomicRanges objects, and matrices as matrix, GenomicRanges, or SummarizedExperiment objects. iscream also provides specialized support for importing methylation data.

Active · 0 stars · 2 weeks ago
R
MIT

R package for interactive visualization and browsing of NGS data. It contains a browser with both transcript and genomic coordinate views. QC and general metaplots are also included, among them differential translation plots and gene expression plots. The package is still under development.

Active · 6 stars · 2 weeks ago
R
MIT

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

Active · 0 stars · 2 weeks ago
TypeScript
MIT

A package for accessing data from the NIST webbook...

Active · 56 stars · 2 weeks ago
Python
MIT

E(3)-equivariant neural network interatomic potentials achieving DFT accuracy with up to 1000× less training data than invariant models, foundational architecture behind MACE and Allegro (Harvard, MIT, Nature Communications 2022)

Active · 914 stars · 2 weeks ago
Python
MIT

StatescopeR is an R wrapper around Statescope, a computational framework designed to discover cell states from cell type-specific gene expression profiles inferred from bulk RNA profiles.

Active · 0 stars · 3 weeks ago
R
MIT

Translate differential transcript usage results into discrete splice events.

Active · 1 star · 3 weeks ago
R
MIT

The Simplified Upper Level Ontology (SULO) is an ontology with a minimal set of classes and relations to guide the development of a personal health knowledge graph. [from homepage]

Active · 16 stars · 3 weeks ago
Python
MIT

DeeDeeExperiment is an S4 class extending the SingleCellExperiment class, designed to integrate and manage omics analysis results. It introduces two dedicated slots to store Differential Expression Analysis (DEA) results and Functional Enrichment Analysis (FEA) results, providing a structured approach for downstream analysis.

Active · 0 stars · 3 weeks ago
R
MIT

Utilities for working with CSV/Tab-delimited files.

Active · 6.4K stars · 3 weeks ago
Python
MIT

Fit a latent embedding multivariate regression (LEMUR) model to multi-condition single-cell data. The model provides a parametric description of single-cell data measured with treatment vs. control or more complex experimental designs. The parametric model is used to (1) align conditions, (2) predict log fold changes between conditions for all cells, and (3) identify cell neighborhoods with consistent log fold changes. For those neighborhoods, a pseudobulked differential expression test is conducted to assess which genes are significantly changed.

Active · 101 stars · 3 weeks ago
R
MIT

Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection. Additional wrappers are provided for easy construction of end-to-end workflows involving Bioconductor objects like SingleCellExperiments.

Active · 8 stars · 3 weeks ago
R
MIT

Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)

Active · 169 stars · 3 weeks ago
Python
MIT

Provides an R interface for various subsampling algorithms implemented in Python packages. Currently, interfaces to the geosketch and scSampler Python packages are implemented. It also provides diagnostic plots to evaluate the subsampling.

Active · 3 stars · 3 weeks ago
R
MIT

This package estimates chronological and gestational DNA methylation (DNAm) age, as well as biological age, using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock, and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock, and Lee's clocks. Biological DNAm clocks: Levine's clock and the Telomere Length clock.

Active · 53 stars · 3 weeks ago
R
MIT

SEMPLR computes transcription factor binding affinity scores for genomic positions and genetic variants. Scores are computed from SNP Effect Matrices (SEMs) produced by SEMpl. 223 pre-computed SEMs are included with the package or custom sets can be provided. Enrichment can be tested among sets of genomic positions to determine if transcription factor binding events occur more often than expected. Comparing binding affinity scores between alleles can reveal differences in transcription factor binding with genetic variation. This package also includes several visualization functions to view scores both on the motif and variant/position level.

Active · 1 star · 3 weeks ago
R
MIT
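
The allele comparison described in the entry above can be sketched as follows, assuming an additive position-by-nucleotide matrix in which a sequence score is the sum of one entry per position. This is a simplification and not SEMPLR's R API: the function names and the three-position toy matrix are hypothetical, and its values are invented rather than taken from any real SEM.

```python
def score_sequence(matrix, seq):
    """Sum the matrix entry for each base of `seq`, one row per position."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(row[base] for row, base in zip(matrix, seq))


def allele_effect(matrix, ref_seq, alt_seq):
    """Score difference (alt minus ref) for two alleles of the same site."""
    return score_sequence(matrix, alt_seq) - score_sequence(matrix, ref_seq)


# Toy matrix with invented values; 0.0 marks the preferred base per position.
toy_matrix = [
    {"A": 0.0, "C": -1.0, "G": -2.0, "T": -0.5},
    {"A": -1.5, "C": 0.0, "G": -0.5, "T": -2.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -0.5},
]

ref_score = score_sequence(toy_matrix, "ACG")
effect = allele_effect(toy_matrix, "ACG", "ATG")
print(ref_score, effect)
```

A negative effect means the alternate allele scores lower than the reference under this toy matrix, which is the kind of difference the package reports between alleles.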

First fully customizable open-source multiagent framework that uses dynamic workflows to automate the complete research lifecycle, from idea conception to LaTeX papers

Active · 560 stars · 3 weeks ago
Python
MIT

A comprehensive toolkit that bridges popular Python-based immune repertoire analysis tools and Hugging Face protein language models into the R environment. Provides unified interfaces for TCR distance calculations (tcrdist3), sequence generation probability (OLGA), selection inference (soNNia), clustering (clusTCR), protein embeddings (ESM-2), and metaclone discovery (metaclonotypist). Fully compatible with the scRepertoire and immApex ecosystem for single-cell immune repertoire analysis.

Active · 2 stars · 3 weeks ago
R
MIT

Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.

Active · 443 stars · 3 weeks ago
Cython
MIT

Deep learning with spiking neural networks in Python, providing gradient-based training of SNNs via PyTorch autodifferentiation for brain-inspired computing and neuromorphic research, with online learning capabilities and extensive tutorials (1.9K+ stars, actively maintained)

Active · 2K stars · 4 weeks ago
Python
MIT