Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active: 748
Idle: 370
Stale: 316
Archived: 13
(None): 4476

Domain

Software: 422
ImmunoOncology: 251
Microarray: 138
Infrastructure: 123
GeneExpression: 117
Sequencing: 85
SingleCell: 72
Protein & Drug Discovery: 66
text-generation: 63
Visualization: 61
Annotation: 51
Genetics: 51
(None): 2332

Language

R: 2426
Python: 448
Jupyter Notebook: 52
HTML: 30
C: 21
Makefile: 19
JavaScript: 16
C++: 15
Java: 10
Shell: 9
Web Ontology Language: 7
Perl: 6
(None): 2815

License

GPL-3.0: 620
Artistic-2.0: 550
MIT: 549
CC-BY-4.0: 268
GPL-2.0: 252
GPL-2.0+: 243
CC0-1.0: 120
Apache-2.0: 107
GPL-3.0+: 101
CC-BY-3.0: 83
NOASSERTION: 82
Other: 61
(None): 2441

Source

bioconductor: 2418
bioregistry: 2418
github: 1150
awesome-ai-for-science: 418
huggingface: 303
awesome-bioinformatics: 126
bio.tools: 116
awesome-python-chemistry: 87
awesome-cheminformatics: 45
awesome-scientific-python: 18

Type

Software tool: 3202
Database: 2418
AI model: 303

5,923 resources indexed

Showing 151–200

ctheodoris/Geneformer

by ctheodoris

Geneformer is a foundational transformer model pretrained on a large-scale corpus of human single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.

Active · ↓3.2K · 2 weeks ago

MNE

MEG and EEG.

Active · ★3.4K · 2 weeks ago

UBERON Issue Tracker

An issue on the UBERON GitHub Issue tracker

Active · ★155 · 2 weeks ago

google/medasr

by google

automatic-speech-recognition

Active · ↓16.6K · 2 weeks ago

ScientaLab/eva-rna

by ScientaLab

feature-extraction

Active · ↓73 · 2 weeks ago

EUCAIM ETL toolset

Data identity and mapping

Modular toolchain for an extensible and customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). Its design aims to minimize manual data preparation efforts and facilitate customization and integration with other components, such as data quality assurance tools. Containerized, currently supports input datasets in CSV, JSON, XLSX.

Active · ★0 · 2 weeks ago

GenomicScores

Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.

Active · ★9 · 2 weeks ago

MicrobiomeProfiler

This is an R/shiny package to perform functional enrichment analysis for microbiome data. The package is based on clusterProfiler. MicrobiomeProfiler supports KEGG enrichment analysis, COG enrichment analysis, Microbe-Disease association enrichment analysis, and Metabo-Pathway analysis.

Active · ★42 · 2 weeks ago

GSVA

FunctionalGenomics

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Active · ★244 · 2 weeks ago

ScienceClaw

Autonomous Research Systems (2023-2025 Breakthroughs)

Self-evolving AI research colleague built on OpenClaw with 285+ runtime-adaptive skills across 28+ disciplines, persistent cross-session research memory, and zero-hallucination citation protocols; agent autonomously writes new SKILL.md files based on research patterns without redeployment (828+ stars, MIT License, 2026)

Active · ★829 · 2 weeks ago

aasatorres/esm2-sae-topk-16384-k512

by aasatorres

Sparse Autoencoder (SAE) trained on residue-level embeddings from ESM-2 (650M, layer 33) for interpretability research on protein language models.

Active · ↓18 · 2 weeks ago

CellMentor

Implements supervised cell type-aware non-negative matrix factorization (NMF) for dimensional reduction in single-cell RNA sequencing analysis. The package provides methods for incorporating cell type information into the dimensionality reduction process, enabling improved visualization and downstream analysis of single-cell data while preserving biological structure. CellMentor employs a unique loss function that simultaneously minimizes variation within known cell populations while maximizing distinctions between different cell types, enabling effective transfer of learned patterns from labeled reference datasets to new unlabeled data.

Active · ★19 · 2 weeks ago

FAIR Cookbook

Active · ★148 · 2 weeks ago

Rarr

The Zarr specification defines a format for chunked, compressed, N-dimensional arrays. Its design allows efficient access to subsets of the stored array, and supports both local and cloud storage systems. Rarr aims to implement this specification in R with minimal reliance on external tools or libraries.

Active · ★52 · 2 weeks ago

OmicVerse

Genomics & Bioinformatics

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging Bindea, Bindea, scanpy and squidpy ecosystems (Nature Communications 2024)

Active · ★1K · 2 weeks ago

DISCO-Design/DISCO

by DISCO-Design

DISCO (DIffusion for Sequence-structure CO-design) is a multimodal generative model that simultaneously co-designs protein sequences and 3D structures, conditioned on and co-folded with arbitrary biomolecules — including small-molecule ligands, DNA, and RNA.

Active · ↓6 · 2 weeks ago

assorthead

Vendors an assortment of useful header-only C++ libraries. Bioconductor packages can use these libraries in their own C++ code by LinkingTo this package without introducing any additional dependencies. The use of a central repository avoids duplicate vendoring of libraries across multiple R packages, and enables better coordination of version updates across cohorts of interdependent C++ libraries.

Active · ★1 · 2 weeks ago

DOCKSTRING

Automates and standardizes ligand preparation for AutoDock Vina.

Active · ★185 · 2 weeks ago

Keylab/COMO

by Keylab

COMO (Closed-loop Optical Molecule recOgnition) is a deep learning framework for Optical Chemical Structure Recognition (OCSR). It recognizes chemical structure diagrams from images and predicts SMILES strings with atom-level 2D coordinates and bond matrices.

Active · ↓0 · 2 weeks ago

Galaxy Training Network

Identifiers in the GTN correspond to training materials in various formats (markdown, slides, video). Users can apply learned concepts directly within the framework via Galaxy workflows.

Active · ★365 · 2 weeks ago

Nengo

Simulation of large-scale brain models

Active · ★929 · 2 weeks ago

Chromatograms

The Chromatograms package defines an efficient infrastructure for storing and handling chromatographic mass spectrometry data. It provides different implementations of *backends* to store and represent the data. Such backends can be optimized for small memory footprint or fast data access/processing. A lazy evaluation queue and chunk-wise processing capabilities ensure efficient analysis of even very large data sets.

Active · ★2 · 2 weeks ago

graphein

Machine Learning

Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.

Active · ★1.2K · 3 weeks ago

Jupyter Notebook

ClustIRR

ClustIRR analyzes repertoires of B- and T-cell receptors. It starts by identifying communities of immune receptors with similar specificities, based on the sequences of their complementarity-determining regions (CDRs). Next, it employs a Bayesian probabilistic model to quantify differential community occupancy (DCO) between repertoires, allowing the identification of expanding or contracting communities in response to, e.g., infection or cancer treatment.

Active · ★5 · 3 weeks ago

cellmig

High-throughput cell imaging facilitates the analysis of cell migration across many wells treated under different biological conditions. These workflows generate considerable technical noise and biological variability, and therefore technical and biological replicates are necessary, leading to large, hierarchically structured datasets, i.e., cells are nested within technical replicates that are nested within biological replicates. Current statistical analyses of such data usually ignore the hierarchical structure of the data and fail to explicitly quantify uncertainty arising from technical or biological variability. To address this gap, we present cellmig, an R package implementing Bayesian hierarchical models for migration analysis. cellmig quantifies condition-specific velocity changes (e.g., drug effects) while modeling nested data structures and technical artifacts. It further enables synthetic data generation for experimental design optimization.

Active · ★1 · 3 weeks ago

PennyLane

Specialized Frameworks

Cross-platform library for differentiable programming of quantum computers with automatic differentiation, enabling hybrid quantum-classical machine learning for quantum chemistry, quantum physics, and NISQ algorithm research (Xanadu, 3k+ stars)

Active · ★3.2K · 3 weeks ago

NVIDIA PhysicsNeMo

Physics-Informed Neural Networks

Open-source framework for building physics-ML models at scale (renamed from Modulus, 2025)

Active · ★2.8K · 3 weeks ago

CellTypist

Genomics & Bioinformatics

Automated cell type annotation tool for single-cell transcriptomics using gradient boosting and logistic regression with reference atlases, enabling standardized classification across datasets (Wellcome Sanger Institute, Nature Biotechnology 2022)

Active · ★486 · 3 weeks ago

AnVILAz

The AnVIL is a cloud computing resource developed in part by the National Human Genome Research Institute. The AnVILAz package supports end-users and developers using the AnVIL platform in the Azure cloud. The package provides a programmatic interface to AnVIL resources, including workspaces, notebooks, tables, and workflows. The package also provides utilities for managing resources, including copying files to and from Azure Blob Storage, and creating shared access signatures (SAS) for secure access to Azure resources.

Active · ★0 · 3 weeks ago

AnVILGCP

The package provides a set of functions to interact with the Google Cloud Platform (GCP) services on the AnVIL platform. The package is designed to use the API calls from the AnVIL package. It coordinates AnVIL workspace functionality with native GCP tools.

Active · ★0 · 3 weeks ago

birder-project/vit_reg4_so150m_p14_ls_dino-v2-bio

by birder-project

image-feature-extraction

vit_reg4_so150m_p14_ls_dino-v2-bio is a Bio-DINO image encoder for natural photographs of living organisms. It uses a SoViT-150M/14 Vision Transformer with 4 register tokens and 133.6M backbone parameters, trained with a DINOv2-style self-supervised objective on approximately 31 million curated images…

Active · ↓3.9K · 3 weeks ago

birder-project/vit_reg1_s14_ls_dino-v2-dist-bio

by birder-project

image-feature-extraction

vit_reg1_s14_ls_dino-v2-dist-bio is a compact Bio-DINO image encoder distilled from the larger Bio-DINO SoViT-150M/14 model. It keeps the same natural-photography biodiversity scope as the teacher model, but uses a much smaller ViT-S/14-style student with 21.7M backbone parameters and 384-dimensional…

Active · ↓522 · 3 weeks ago

GraphExperiment

DataRepresentation

GraphExperiment provides users and developers with an S4 class that extends `SingleCellExperiment` by offering infrastructure to store and retrieve networks (`igraph` objects) representing how assay features and/or observations are associated with each other. The class was designed to store networks inferred from high-dimensional quantitative data, with feature-feature networks including gene coexpression networks (GCNs), gene regulatory networks (GRNs), and co-abundance networks (from proteomics and metabolomics), and observation-observation networks including cell-cell distances, species-species relationships, and sample-sample similarities.

Active · ★1 · 3 weeks ago

WeatherBench2

Climate Modeling

Next-generation benchmark for data-driven global weather models with standardized evaluation framework and curated datasets for ML forecasting (Google Research, 2024)

Active · ★614 · 3 weeks ago

Chemical Entity Materials and Reactions Ontological Framework

A data model for managing information about chemical entities, ranging from atoms through molecules to complex mixtures.

Active · ★23 · 3 weeks ago

NIF Standard Ontology: Neurolex

Active · ★59 · 3 weeks ago

Manhph2211/D-BETA

by Manhph2211

feature-extraction

Active · ↓82 · 3 weeks ago

BioSchemas

Bioschemas aims to improve the Findability on the Web of life sciences resources such as datasets, software, and training materials. It does this by encouraging people in the life sciences to use Schema.org markup in their websites so that they are indexable by search engines and other services. Bioschemas encourages the consistent use of markup to ease the consumption of the contained markup across many sites. This structured information then makes it easier to discover, collate, and analyse distributed resources. [from BioSchemas.org]

Active · ★63 · 3 weeks ago

3Dmol.js

An object-oriented, webGL based JavaScript library for online molecular visualization.

Active · ★973 · 3 weeks ago

Jupyter Notebook

MsBackendMetaboLights

MetaboLights is one of the main public repositories for storage of metabolomics experiments, which includes analysis results as well as raw data. The MsBackendMetaboLights package provides functionality to retrieve and represent mass spectrometry (MS) data from MetaboLights. Data files are downloaded and cached locally, avoiding repeated downloads. MS data from metabolomics experiments can thus be directly and seamlessly integrated into R-based analysis workflows with the Spectra and MsBackendMetaboLights packages.

Active · ★2 · 3 weeks ago

Casanovo

Genomics & Bioinformatics

Transformer encoder-decoder for de novo peptide sequencing from tandem mass spectrometry, translating MS/MS spectra directly to peptide sequences without reference databases, enabling identification of novel peptides for immunopeptidomics, antibody repertoires, and metaproteomes (Noble Lab UW, Nature Communications 2024)

Active · ★187 · 3 weeks ago

PyLabRobot

Lab Automation & Robotics

Interactive and hardware-agnostic SDK for laboratory automation, enabling programmatic control of liquid handlers, plate readers, and other lab instruments across multiple vendors; foundational infrastructure for self-driving laboratories and AI-driven experimental execution (447+ stars)

Active · ★450 · 3 weeks ago

Newton

Specialized Frameworks

GPU-accelerated differentiable physics simulation engine built on NVIDIA Warp, supporting rigid/soft body, cloth, and gradient-based optimization for scientific ML, initiated by Disney Research, DeepMind, and NVIDIA (Linux Foundation, Apache 2.0, 2025)

Active · ★5K · 3 weeks ago

AlphaFold3

Protein & Drug Discovery

AlphaFold 3 inference pipeline for unified biomolecular structure prediction of proteins, nucleic acids, small molecules, ions, and post-translational modifications (Google DeepMind, Nature 2024)

Active · ★8.1K · 3 weeks ago

Basic Register of Thesauri, Ontologies & Classifications

The Basic Register of Thesauri, Ontologies & Classifications (BARTOC) is a database of Knowledge Organization Systems and KOS related registries. The main goal of BARTOC is to list as many Knowledge Organization Systems as possible at one place in order to achieve greater visibility, highlight their features, make them searchable and comparable, and foster knowledge sharing. BARTOC includes any kind of KOS from any subject area, in any language, any publication format, and any form of accessibility. BARTOC’s search interface is available in 20 European languages and provides two search options: Basic Search by keywords, and Advanced Search by taxonomy terms. A circle of editors has gathered around BARTOC from all across Europe and BARTOC has been approved by the International Society for Knowledge Organization (ISKO).

Active · ★27 · 3 weeks ago

COTAN

Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.

Active · ★17 · 3 weeks ago

drugbank-downloader

Database Wrappers

Automate downloading, opening, and parsing DrugBank.

Active · ★65 · 3 weeks ago

BirdNET-Analyzer

Ecological Modeling

Deep learning-based bioacoustic monitoring framework for automated bird species identification from audio recordings, supporting 6,000+ species globally with real-time analysis, batch processing, and API deployment; foundational tool in biodiversity research, conservation biology, and ecological acoustic monitoring (Cornell Lab of Ornithology, 1.5K+ stars, MIT License)

Active · ★1.6K · 3 weeks ago

SeqFu

Sequence Processing

Sequence manipulation toolkit for FASTA/FASTQ files written in Nim.

Active · ★127 · 3 weeks ago

brick

Active · ★375 · 3 weeks ago

