Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active 147
Idle 73
Stale 48
Archived 4
(None) 1

Domain

Software 32
Protein & Drug Discovery 15
SingleCell 9
GeneExpression 8
Genomics & Bioinformatics 6
Machine Learning 6
Autonomous Research Systems (2023-2025 Breakthroughs) 5
Climate Modeling 5
CRISPR 5
DNAMethylation 5
Command Line Utilities 4
Force Fields 4
(None) 8

Language

R 125
Python 85
Jupyter Notebook 23
C 6
Go 4
C++ 3
TypeScript 3
HTML 2
JavaScript 2
Julia 2
Ruby 2
CSS 1
(None) 7

License(1)

MIT 273
GPL-3.0 175
Artistic-2.0 139
Apache-2.0 92
NOASSERTION 82
GPL-2.0+ 38
BSD-3-Clause 37
GPL-3.0+ 35
GPL-2.0 33
CC-BY-4.0 30
CC0-1.0 18
Other 12
(None) 123

Source(1)

bioconductor 386
github 273
awesome-ai-for-science 75
awesome-bioinformatics 25
bio.tools 25
awesome-python-chemistry 20
bioregistry 12
awesome-cheminformatics 7
awesome-scientific-python 1

Type

Software tool 264
Database 9

273 of 5,923 resources

Showing 51–100

methylclock

This package estimates chronological and gestational DNA methylation (DNAm) age, as well as biological age, using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock, and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock, and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.

Active · ★53 · 3 weeks ago

SEMPLR

MotifAnnotation

SEMPLR computes transcription factor binding affinity scores for genomic positions and genetic variants. Scores are computed from SNP Effect Matrices (SEMs) produced by SEMpl. The package includes 223 pre-computed SEMs, and custom sets can also be provided. Enrichment can be tested among sets of genomic positions to determine whether transcription factor binding events occur more often than expected. Comparing binding affinity scores between alleles can reveal differences in transcription factor binding with genetic variation. The package also includes several visualization functions to view scores at both the motif and variant/position levels.

Active · ★1 · 3 weeks ago

freephdlabor

Autonomous Research Systems (2023-2025 Breakthroughs)

First fully customizable open-source multiagent framework automating the complete research lifecycle, from idea conception to LaTeX papers, with dynamic workflows

Active · ★560 · 4 weeks ago

immLynx

A comprehensive toolkit that bridges popular Python-based immune repertoire analysis tools and Hugging Face protein language models into the R environment. Provides unified interfaces for TCR distance calculations (tcrdist3), sequence generation probability (OLGA), selection inference (soNNia), clustering (clusTCR), protein embeddings (ESM-2), and metaclone discovery (metaclonotypist). Fully compatible with the scRepertoire and immApex ecosystem for single-cell immune repertoire analysis.

Active · ★2 · 4 weeks ago

cyvcf2

Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.

Active · ★443 · 4 weeks ago

snntorch

Neuroscience & Behavioral Analysis

Deep learning with spiking neural networks in Python, providing gradient-based training of SNNs via PyTorch autodifferentiation for brain-inspired computing and neuromorphic research, with online learning capabilities and extensive tutorials (1.9K+ stars, actively maintained)

Active · ★2K · 1 month ago

Fourier Neural Operator

Neural Operators & Model Discovery

Learning operators in Fourier space

Active · ★3.7K · 1 month ago

edm.fibo

Active · ★569 · 1 month ago

mLLMCelltype

Genomics & Bioinformatics

Multi-LLM consensus framework for automated cell type annotation in single-cell transcriptomics, integrating predictions from 10+ large language models with iterative discussion and uncertainty quantification to reduce single-model biases, achieving up to 95% accuracy without reference datasets; available as CRAN R package and PyPI Python package with Scanpy/Seurat integration (2025)

Active · ★641 · 1 month ago

DiffEqFlux.jl

Neural Differential Equations

Neural differential equations in Julia

Active · ★920 · 1 month ago

mosaic

Protein & Drug Discovery

Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)

Active · ★323 · 1 month ago

SlideDeck AI

Slides & Presentation Generation

Co-create PowerPoint presentations with Generative AI from documents or topics

Active · ★358 · 1 month ago

DelayedMatrixStats

A port of the 'matrixStats' API for use with DelayedMatrix objects from the 'DelayedArray' package. High-performing functions operating on rows and columns of DelayedMatrix objects, e.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). Functions are optimized per data type and for subsetted calculations so that both memory usage and processing time are minimized.

Active · ★15 · 1 month ago

plyxp

The package provides `rlang` data masks for the SummarizedExperiment class. This enables the evaluation of unquoted expressions in different contexts of the SummarizedExperiment object, with optional access to other contexts. The goal of `plyxp` is for evaluation to feel like working with a data.frame object without ever needing to unwind to a rectangular data.frame.

Active · ★9 · 1 month ago

immApex

A set of tools for machine and deep learning in R from amino acid and nucleotide sequences, focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Active · ★14 · 1 month ago

Persistent IDentifiers for Semantic Artifacts

An Apache-based persistent URL (PURL) service

Active · ★5 · 1 month ago

Awesome AI Scientist Papers

📋 Paper Collections & Repositories

Autonomous AI scientist research

Active · ★155 · 1 month ago

zellkonverter

Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.

Active · ★209 · 1 month ago

LipidTrend

"LipidTrend" is an R package that implements a permutation-based statistical test to identify significant differences in lipidomic features between groups. The test incorporates Gaussian kernel smoothing of region statistics to improve stability and accuracy, particularly when dealing with small sample sizes. This package also includes two plotting functions for visualizing significant tendencies in 1D and 2D feature data, respectively.

Active · ★0 · 1 month ago

AIDE (WecoAI, arXiv 2025)

Autonomous Research Systems (2023-2025 Breakthroughs)

LLM-driven machine learning engineering agent using agentic tree search to autonomously draft, debug and benchmark ML code; wins 4× more medals than the best linear agent on OpenAI's MLE-Bench (75 Kaggle competitions) (1.3K+ stars, MIT License)

Active · ★1.3K · 1 month ago

mosdepth

BAM File Utilities

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.

Active · ★856 · 1 month ago

csvtk

Command Line Utilities

Another cross-platform, efficient, practical and pretty CSV/TSV toolkit.

Active · ★1.2K · 1 month ago

igvShiny

This package is a wrapper of Integrative Genomics Viewer (IGV). It comprises an htmlwidget version of IGV. It can be used as a module in Shiny apps.

Active · ★38 · 1 month ago

pathogensurveillance

Pathogensurveillance is a population genomics pipeline for pathogen identification, variant detection, and biosurveillance. The pipeline accepts paths to raw reads for one or more organisms and creates reports in the form of an interactive HTML document. Significant features include the ability to analyze unidentified eukaryotic and prokaryotic samples, creation of reports for multiple user-defined groupings of samples, automated discovery and downloading of reference assemblies from NCBI RefSeq, and rapid initial identification based on k-mer sketches followed by a more robust multi gene phylogeny and SNP-based phylogeny.

Active · ★60 · 1 month ago

scGPT

Genomics & Bioinformatics

Single-cell analysis with transformers

Active · ★1.6K · 1 month ago

Jupyter Notebook

HiContacts

HiContacts provides a collection of tools to analyse and visualize Hi-C datasets imported in R by HiCExperiment.

Active · ★16 · 1 month ago

HiCExperiment

R generic interface to Hi-C contact matrices in `.(m)cool`, `.hic` or HiC-Pro derived formats, as well as other Hi-C processed file formats. Contact matrices can be partially parsed using a random access method, allowing a memory-efficient representation of Hi-C data in R. The `HiCExperiment` class stores the Hi-C contacts parsed from local contact matrix files. `HiCExperiment` instances can be further investigated in R using the `HiContacts` analysis package.

Active · ★13 · 1 month ago

dominatR

dominatR is an R package for quantifying and visualizing feature dominance in datasets. dominatR applies concepts drawn from physics, such as center of mass and Shannon's entropy, to effectively visualize features (e.g. genes) that are present within a specific context or condition. The package works with data frames, matrices, and SummarizedExperiment objects and can perform common genomic normalization methods. Its key aspect is the generation of plots that highlight context-relevant feature dominance.

Active · ★3 · 1 month ago

mutscan

GeneticVariability

Provides functionality for processing and statistical analysis of multiplexed assays of variant effect (MAVE) and similar data. The package contains functions covering the full workflow from raw FASTQ files to publication-ready visualizations. A broad range of library designs can be processed with a single, unified interface.

Active · ★14 · 1 month ago

ExploreModelMatrix

ExperimentalDesign

Given a sample data table and a design formula, ExploreModelMatrix generates an interactive application for exploration of the resulting design matrix. This can be helpful for interpreting model coefficients and constructing appropriate contrasts in (generalized) linear models. Static visualizations can also be generated.

Active · ★38 · 1 month ago

postNet

A tool that enables in silico identification, integration, and modeling of mRNA features that influence post-transcriptional regulation of gene expression at a transcriptome-wide scale.

Active · ★0 · 1 month ago

ClonalSim

ClonalSim generates realistic mutational profiles of tumor samples with hierarchical clonal structure. It simulates founder, shared, and private mutations with biologically realistic noise models including intra-tumor heterogeneity (Beta distribution) and technical sequencing noise (negative binomial depth variation, binomial read sampling, base errors). The package is designed for benchmarking variant callers, testing clonal deconvolution algorithms, and teaching tumor heterogeneity concepts.

Active · ★1 · 1 month ago

HilbertCurve

A Hilbert curve is a type of space-filling curve that folds a one-dimensional axis into a two-dimensional space while still preserving locality. This package aims to provide an easy and flexible way to visualize data through Hilbert curves.

Active · ★44 · 1 month ago
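
The locality-preserving fold mentioned in the HilbertCurve description can be illustrated with a minimal sketch. This is not code from the HilbertCurve package (an R package); it is the standard iterative index-to-coordinate conversion for a Hilbert curve, written in Python, and the function name `hilbert_d2xy` is hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map a 1-D index d (0 <= d < n*n) to (x, y) on an n-by-n Hilbert curve.

    Assumes n is a power of two. Hypothetical helper for illustration only.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        # Shift into the quadrant selected by (rx, ry).
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Order-1 curve on a 2x2 grid.
print([hilbert_d2xy(2, d) for d in range(4)])
# → [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive indices always land in neighbouring grid cells, which is why positions that are close on the one-dimensional axis (for example, along a chromosome) stay close in the two-dimensional picture.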

PaperBench (OpenAI, 2025)

Evaluation & Benchmarking

Benchmark evaluating AI agents' ability to replicate 20 ICML 2024 Spotlight/Oral papers from scratch, with 8,316 gradable tasks and author-co-developed rubrics

Active · ★1.2K · 1 month ago

ORFik

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data such as Ribo-Seq, RNA-Seq, TCP-Seq, and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table, and GenomicRanges. The package supports reassigning transcript starts using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes, and much more.

Active · ★38 · 1 month ago

mint

Protein & Drug Discovery

Learning the language of protein-protein interactions

Active · ★150 · 1 month ago

LRDE

Provides hurdle negative binomial models for differential expression analysis with long-read RNA-Seq data.

Active · ★0 · 2 months ago

DeepAnalyze

Data Analysis & Visualization

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

Active · ★4.2K · 2 months ago

scPassport

Stamps Seurat, SingleCellExperiment, and SummarizedExperiment objects with a persistent metadata passport. For Seurat objects the passport is stored in the misc slot; for SingleCellExperiment and SummarizedExperiment objects it is stored in the metadata slot. Tracks animal info, experiment details, lineage (parent/child relationships), RDS registry numbers, processing logs, and custom fields. Includes an interactive Shiny gadget to fill and update the passport, and a read mode to print the full passport to console. The passport persists inside the RDS file with no external files needed.

Active · ★3 · 2 months ago

OPSIN

Open Parser for Systematic IUPAC nomenclature

Active · ★216 · 2 months ago

TileDBArray

DataRepresentation

Implements a DelayedArray backend for reading and writing dense or sparse arrays in the TileDB format. The resulting TileDBArrays are compatible with all Bioconductor pipelines that can accept DelayedArray instances.

Active · ★11 · 2 months ago

Claude Prism

Scientific Writing & Collaboration

Offline-first scientific writing workspace powered by Claude, integrating LaTeX, Python, and 100+ scientific skills with local execution, Zotero integration, and privacy-focused design (2026)

Active · ★1.5K · 2 months ago

MIRA (NeurIPS 2025)

Medical AI & Clinical Applications

Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)

Active · ★399 · 2 months ago

RFGeneRank

Transcriptomics

Tools to harmonize bulk RNA-seq matrices, optionally apply batch correction, and train cross-validated classification models using ranger, glmnet, or xgboost. Supports leakage-safe feature selection, permutation importance, SHAP-based interpretability, and calibration methods (Platt or isotonic). Provides stability metrics across folds, embeddings (PCA/UMAP), ROC visualization, SHAP dependence plots, and tidy ranked-gene tables for downstream analysis.

Active · ★0 · 2 months ago

Ibex

Implementation of the Ibex algorithm for single-cell embedding based on BCR sequences. The package includes a standalone function to encode BCR sequence information by amino acid properties or sequence order using a TensorFlow-based autoencoder. In addition, the package interacts with SingleCellExperiment or Seurat data objects.

Active · ★27 · 2 months ago

HiCool

HiCool provides an R interface to process and normalize Hi-C paired-end fastq reads into .(m)cool files. .(m)cool is a compact, indexed HDF5 file format specifically tailored for efficiently storing HiC-based data. On top of processing fastq reads, HiCool provides a convenient reporting function to generate shareable reports summarizing Hi-C experiments and including quality controls.

Active · ★2 · 2 months ago

crisprScore

Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, RuleSet3, DeepHF, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF and enPAM+GB are not available on Windows machines.

Active · ★27 · 2 months ago

crisprBwa

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bwa. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Currently not supported on Windows machines.

Active · ★1 · 2 months ago

ComplexHeatmap

Complex heatmaps are an efficient way to visualize associations between different sources of data and to reveal potential patterns. The ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Active · ★1.5K · 2 months ago

Earth-Agent

Climate Modeling

LLM agent framework for Earth Observation with 104 specialized tools across 5 functional kits

Active · ★152 · 2 months ago

