Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

273 of 5,923 resources

Showing 150

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools.

Active39413 hours ago
Python
MIT

Physics-Informed Neural networks for Advanced modeling in PyTorch

Active7581 day ago
Python
MIT

Nallo is a bioinformatics analysis pipeline for long-reads from both PacBio and (targeted) ONT-data, focused on rare-disease. The pipeline detects a wide range of genetic variants, performs genome assembly, and reports CpG methylation. It also enables annotation and ranking of variants based on their predicted functional consequences.

Active672 days ago
Nextflow
MIT

A static web application presents an interactive knowledge graph of single-cell long-read RNA sequencing literature synthesized from seven source papers. Users navigate mind-tree, network graph, guided learning-path, and Sankey views linking platforms, protocols, methods, and software. A benchmark tab provides 34 question-answer pairs with category and difficulty filters, exportable as JSON or CSV for LLM and agent evaluation.

Active03 days ago
JavaScript
MIT

Comprehensive collection of 125+ ready-to-use scientific skill modules for Claude AI across bioinformatics, cheminformatics, clinical research, ML, and materials science

Active27.8K3 days ago
Python
MIT

Open-source biomedical AI platform integrating multimodal foundation models (BioMedGPT, PharmolixFM, LangCell) with agentic workflows and 45+ Claude Code skills for drug discovery, protein engineering, and single-cell omics analysis (PharMolix & Tsinghua AIR, 1K+ stars, 2023-2026)

Active1.1K4 days ago
Python
MIT

Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets.

Active2.1K4 days ago
C
MIT

Computes weighed laboratory buffer recipes from target pH, concentration, and volume, accounting for separate preparation and working temperatures when pKa shifts with temperature. Supports calculator mode from dry reagents and stock dilution mode, returning acid and base masses, ionic strength estimates, optional NaCl adjustment, gravimetric and titration routes, and stepwise protocols. A browser calculator supports interactive recipe entry with shareable links; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured recipe tables, compatibility warnings, and shareable run identifiers.

Active16 days ago
Python
MIT

Galaxy workflow for BlockClust pipeline.

Active1236 days ago
R
MIT

Open-source deep learning toolbox for bioimage analysis providing a unified, configuration-driven framework for 2D/3D semantic segmentation, instance segmentation, classification, denoising, super-resolution, and self-supervised learning; integrates state-of-the-art architectures including U-Net, Vision Transformers, and ConvNeXt, designed for microscopy and biomedical imaging researchers without extensive coding expertise (MIT License, actively maintained)

Active2011 week ago
Jupyter Notebook
MIT

pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differencital Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investiation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionalities for plotting the results.

Active41 week ago
R
MIT

A graph-based molecule modeling and chemoinformatics analysis toolkit fully implemented in Julia

Active2241 week ago
Julia
MIT

Segment Anything Model for microscopy: interactive and automatic segmentation of light, electron, and fluorescence microscopy images in 2D and 3D, with domain-specific fine-tuning workflows for scientific imaging (1.5K+ stars)

Active6851 week ago
Jupyter Notebook
MIT

High-level open-source geospatial AI package for satellite/aerial imagery analysis, model training, inference, interactive visualization, and QGIS integration, bridging PyTorch/Transformers with remote sensing workflows (MIT, 2026)

Active3.1K1 week ago
Python
MIT

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Active381 week ago
R
MIT

A haplotype-resolved assembler for accurate Hifi reads.

Active7791 week ago
C++
MIT

Interactive explorer for single-cell transcriptomics data enabling visualization of UMAP/t-SNE embeddings, differential expression analysis, and cross-dataset comparison through a fast web-based interface; widely adopted for exploring atlas-scale single-cell datasets and integrating with AI/ML analysis workflows (773+ stars, MIT License)

Active7732 weeks ago
JavaScript
MIT

Tool for converting raw DNA data files between 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA formats.

Active12 weeks ago
PHP
MIT
Active1242 weeks ago
Ruby
MIT

Spatial transcriptomic technologies have helped to resolve the connection between gene expression and the 2D orientation of tissues relative to each other. However, the limited single-cell resolution makes it difficult to highlight the most important molecular interactions in these tissues. SpaceMarkers, R/Bioconductor software, can help to find molecular interactions, by identifying genes associated with latent space interactions in spatial transcriptomics.

Active82 weeks ago
R
MIT

First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)

Active4K2 weeks ago
Python
MIT
Active13.5K2 weeks ago
Ruby
MIT

Highly scalable equivariant deep learning interatomic potentials enabling million-atom molecular dynamics simulations with ab initio accuracy, building on E(3)-equivariant architectures for large-scale atomistic modeling (mir-group, MIT License, 480+ stars)

Active4822 weeks ago
Python
MIT

Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science with 140+ ready-to-use skills and 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Antigravity, and the open Agent Skills standard (K-Dense-AI, 26K+ stars, 2025)

Active26.5K2 weeks ago
Python
MIT

Biological simulation tools

Active152 weeks ago
Python
MIT

Provides univariate and multivariate statistics for feature prioritization in untargeted LC-MS metabolomics research.

Active02 weeks ago
R
MIT

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expressions.

Active282 weeks ago
R
MIT

Self-evolving AI research colleague built on OpenClaw with 285+ runtime-adaptive skills across 28+ disciplines, persistent cross-session research memory, and zero-hallucination citation protocols; agent autonomously writes new SKILL.md files based on research patterns without redeployment (828+ stars, MIT License, 2026)

Active8292 weeks ago
TypeScript
MIT

The Zarr specification defines a format for chunked, compressed, N-dimensional arrays. It's design allows efficient access to subsets of the stored array, and supports both local and cloud storage systems. Rarr aims to implement this specification in R with minimal reliance on an external tools or libraries.

Active522 weeks ago
R
MIT

Vendors an assortment of useful header-only C++ libraries. Bioconductor packages can use these libraries in their own C++ code by LinkingTo this package without introducing any additional dependencies. The use of a central repository avoids duplicate vendoring of libraries across multiple R packages, and enables better coordination of version updates across cohorts of interdependent C++ libraries.

Active12 weeks ago
R
MIT

Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.

Active1.2K3 weeks ago
Jupyter Notebook
MIT

Automated cell type annotation tool for single-cell transcriptomics using gradient boosting and logistic regression with reference atlases, enabling standardized classification across datasets (Wellcome Sanger Institute, Nature Biotechnology 2022)

Active4863 weeks ago
Python
MIT

Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)

Active4503 weeks ago
Python
MIT

Automate downloading, opening, and parsing DrugBank.

Active653 weeks ago
Python
MIT

Deep learning-based bioacoustic monitoring framework for automated bird species identification from audio recordings, supporting 6,000+ species globally with real-time analysis, batch processing, and API deployment; foundational tool in biodiversity research, conservation biology, and ecological acoustic monitoring (Cornell Lab of Ornithology, 1.5K+ stars, MIT License)

Active1.6K3 weeks ago
Python
MIT

LLM agents for working with the SRA (Sequence Read Archive) and associated bioinformatics databases, enabling natural language querying of high-throughput sequencing data and metadata across genomic repositories (Arc Institute, 169+ stars, 2024-2026)

Active1703 weeks ago
Python
MIT

BED files store ranged genomic data that can be queried even when the files are compressed. iscream can query data from BED files and return them in muliple formats: parsed records or their summary statistics as data frames or GenomicRanges objects, and matrices as matrix, GenomicRanges, or SummarizedExperiment objects. iscream also provides specialized support for importing methylation data.

Active03 weeks ago
R
MIT

R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots. The package is still under development.

Active63 weeks ago
R
MIT

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

Active03 weeks ago
TypeScript
MIT

A package for accessing data from the NIST webbook...

Active563 weeks ago
Python
MIT

E(3)-equivariant neural network interatomic potentials achieving DFT accuracy with up to 1000× less training data than invariant models, foundational architecture behind MACE and Allegro (Harvard, MIT, Nature Communications 2022)

Active9143 weeks ago
Python
MIT

StatescopeR is an R wrapper around Statescope, a computational framework designed to discover cell states from cell type-specific gene expression profiles inferred from bulk RNA profiles.

Active03 weeks ago
R
MIT

Translate differential transcript usage results into discrete splice events.

Active13 weeks ago
R
MIT

The Simplified Upper Level Ontology (SULO) is ontology with a minimal set of classes and relations to guide the development of a personal health knowledge graph. [from homepage]

Active163 weeks ago
Python
MIT

DeeDeeExperiment is an S4 class extending the SingleCellExperiment class, designed to integrate and manage omics analysis results. It introduces two dedicated slots to store Differential Expression Analysis (DEA) results and Functional Enrichment Analysis (FEA) results, providing a structured approach for downstream analysis.

Active03 weeks ago
R
MIT

Utilities for working with CSV/Tab-delimited files.

Active6.4K3 weeks ago
Python
MIT

Fit a latent embedding multivariate regression (LEMUR) model to multi-condition single-cell data. The model provides a parametric description of single-cell data measured with treatment vs. control or more complex experimental designs. The parametric model is used to (1) align conditions, (2) predict log fold changes between conditions for all cells, and (3) identify cell neighborhoods with consistent log fold changes. For those neighborhoods, a pseudobulked differential expression test is conducted to assess which genes are significantly changed.

Active1013 weeks ago
R
MIT

Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection. Additional wrappers are provided for easy construction of end-to-end workflows involving Bioconductor objects like SingleCellExperiments.

Active83 weeks ago
R
MIT

Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)

Active1693 weeks ago
Python
MIT

Provides an R interface for various subsampling algorithms implemented in python packages. Currently, interfaces to the geosketch and scSampler python packages are implemented. In addition it also provides diagnostic plots to evaluate the subsampling.

Active33 weeks ago
R
MIT