Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active (61)
Stale (14)
Idle (13)
Archived (4)
(None) (15)

Domain

Autonomous Research Systems (2023-2025 Breakthroughs) (9)
Genomics & Bioinformatics (8)
Protein & Drug Discovery (8)
Medical AI & Clinical Applications (6)
Climate Modeling (4)
Domain-Specific Research Agents (3)
Machine Learning (3)
Scientific Literature RAG & Analysis (3)
Simulations (3)
Specialized Frameworks (3)
Data quality management (2)
General Science Models (2)
(None) (12)

Language

Python (69)
R (10)
Jupyter Notebook (6)
Shell (4)
JavaScript (2)
C# (1)
C++ (1)
Common Workflow Language (1)
Go (1)
HTML (1)
Rich Text Format (1)
Rust (1)
(None) (9)

License (1)

GPL-3.0 (620)
Artistic-2.0 (550)
MIT (549)
CC-BY-4.0 (268)
GPL-2.0 (252)
GPL-2.0+ (243)
CC0-1.0 (120)
Apache-2.0 (107)
GPL-3.0+ (101)
CC-BY-3.0 (83)
NOASSERTION (82)
Other (61)
(None) (2441)

Source

github (92)
awesome-ai-for-science (61)
bioregistry (14)
bioconductor (10)
awesome-bioinformatics (7)
bio.tools (7)
awesome-cheminformatics (4)
awesome-python-chemistry (4)

Type

Software tool (93)
Database (14)

107 of 5,923 resources

Showing 51–100

OpenEvolve

Autonomous Research Systems (2023-2025 Breakthroughs)

Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)

Active · ★6.4K · 2 months ago

JAX-CFD

Specialized Frameworks

Computational fluid dynamics in JAX, enabling differentiable Navier-Stokes simulations with automatic differentiation for ML-accelerated CFD research, supporting turbulence modeling, convection-diffusion, and complex boundary conditions on CPUs and GPUs (Google Research, 947+ stars)

Active · ★948 · 3 months ago

Jupyter Notebook

GS1 Web Vocabulary

The initial focus of the GS1 Web Vocabulary is consumer-facing properties for clothing, shoes, food, beverage/tobacco and properties common to all products. [from homepage]

Active · ★50 · 4 months ago

Tahoe-x1

Genomics & Bioinformatics

Apache 2.0 single-cell foundation model family scaling to 3B parameters, pretrained on 266M cell profiles including perturbation data and released with training, embedding, and downstream benchmarking workflows for disease-relevant single-cell tasks (2025)

Active · ★156 · 4 months ago

BiomedParse

Medical AI & Clinical Applications

Foundation model for joint segmentation, detection, and recognition of biomedical objects across nine imaging modalities, with v2 introducing BoltzFormer architecture for end-to-end 3D inference (Microsoft, Nature Methods 2025)

Active · ★668 · 4 months ago

AlphaGeometry

Domain-Specific Research Agents

DeepMind's Olympiad-level geometry theorem prover combining a neural language model with a symbolic deduction engine; AlphaGeometry2 solves 84% of IMO geometry problems (42/50) at gold-medalist level (Nature 2024)

Active · ★4.8K · 5 months ago

Common Workflow Language

Workflow Managers

A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.

Active · ★1.5K · 5 months ago

Common Workflow Language

DNABERT-2 (ICLR 2024)

Genomics & Bioinformatics

Efficient foundation model and benchmark for multi-species genome understanding with context-aware nucleotide representations, improving upon DNABERT for diverse genomic task transfer learning (UIUC MAGICS Lab, 484+ stars)

Active · ★488 · 5 months ago

PXDesign (ByteDance, 2025)

Protein & Drug Discovery

Fast, modular, and accurate de novo design of protein binders based on the Protenix foundation model, achieving 17-82% nanomolar hit rates across diverse targets with 2-6× improvement over prior methods like AlphaProteo and RFdiffusion (229+ stars, Apache 2.0)

Active · ★229 · 5 months ago

ai-models (ECMWF)

Climate Modeling

ECMWF's unified framework and command-line tool to run AI-based weather forecasting models (GraphCast, Aurora, Pangu, NeuralGCM, FourCastNet) with operational ECMWF data infrastructure, enabling standardized inference and benchmarking across state-of-the-art meteorological AI systems (ECMWF, 576+ stars)

Active · ★579 · 5 months ago

OpenFold

Protein & Drug Discovery

Trainable, memory-efficient PyTorch reproduction and retraining of AlphaFold2 providing new insights into its learning dynamics and out-of-distribution generalization; widely used as the open-source AlphaFold2 backbone underpinning many downstream protein structure prediction and design pipelines (Columbia AlQuraishi Lab & OpenFold Consortium, Nature Methods 2024)

Active · ★3.4K · 5 months ago

Cell2Sentence

Genomics & Bioinformatics

Teaching Large Language Models the Language of Biology through single-cell transcriptomics (ICML 2024)

Idle · ★862 · 7 months ago

Jupyter Notebook

cctk

A library for computational chemistry (DFT) for input file generation, data extraction, method screening and analysis.

Idle · ★22 · 7 months ago

MedSegX

Medical AI & Clinical Applications

Generalist foundation model and database for open-world medical image segmentation, enabling universal segmentation of diverse anatomical structures and pathologies with zero-shot generalization to unseen tasks and modalities (Nature Biomedical Engineering 2025)

Idle · ★86 · 8 months ago

Curie

Autonomous Research Systems (2023-2025 Breakthroughs)

Automated and rigorous experiments using AI agents for scientific discovery

Idle · ★360 · 8 months ago

OpenScholar

Scientific Literature RAG & Analysis

Retrieval-augmented LM synthesizing scientific literature from 45M papers with human-expert-level citation accuracy, outperforming GPT-4o by 5% on ScholarQABench (Nature 2026, UW & Ai2)

Idle · ★1.5K · 10 months ago

DPLM (ByteDance, ICML 2024 / ICLR 2025)

Protein & Drug Discovery

Family of diffusion protein language models demonstrating versatile generative and predictive capabilities for protein sequences and structures, including multimodal co-generation, conditional folding, inverse folding, motif scaffolding, and representation learning, with open pretrained weights and training scripts (327+ stars, ICML 2024, ICLR 2025, ICML 2025 Spotlight)

Idle · ★335 · 10 months ago

ChemMCP

LLM for Chemistry

Extensible chemistry toolkit for MCP-enabled AI assistants, exposing molecule analysis, property prediction, and reaction synthesis tools through unified Python/MCP interfaces for chemistry agents and research workflows (Apache 2.0, 2025)

Idle · ★65 · 1 year ago

REINVENT

Protein & Drug Discovery

Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)

Archived · ★373 · 1 year ago

MedSAM

Medical AI & Clinical Applications

Universal medical image segmentation foundation model trained on 1.57M image-mask pairs across 10 imaging modalities and 30+ cancer types (Nature Communications 2024)

Idle · ★4.3K · 1 year ago

Jupyter Notebook

paper-reviewer

Scientific Literature RAG & Analysis

Generate comprehensive reviews from arXiv papers and convert to blog posts

Idle · ★836 · 1 year ago

okn.sd

Idle · ★3 · 1 year ago

Bio-DB-HTS

Sequence analysis

Git repo for Bio::DB::HTS module on CPAN, providing Perl links into HTSlib

Idle · ★26 · 1 year ago

Genie 2

Protein & Drug Discovery

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle · ★192 · 1 year ago

smoove

Structural variant callers

Structural variant calling and genotyping with existing tools, but, smoothly.

Idle · ★264 · 1 year ago

Description of a Project

DOAP is a project to create an XML/RDF vocabulary to describe software projects, and in particular open source projects.

Stale · ★285 · 2 years ago

AutoViz

Data Analysis & Visualization

Automated data visualization with minimal code

Stale · ★1.9K · 2 years ago

Awesome AI-based Protein Design

Bioinformatics on GitHub

A collection of research papers for AI-based protein design.

Stale · ★306 · 2 years ago

Chroma

Protein & Drug Discovery

Generative model for programmable protein design using diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample diverse all-atom structures; supports conditional generation via composable conditioners for substructure, symmetry, shape, and neural-network predictions; validated crystallographically (Generate Biomedicines, Nature 2023)

Stale · ★819 · 2 years ago

Adverse Outcome Pathway Ontology

The AOPO provides classes and relationships for the semantic representation of the Adverse Outcome Pathway framework.

Stale · ★13 · 2 years ago

Rich Text Format

AlphaMissense

Genomics & Bioinformatics

Google DeepMind's AlphaFold-derived classifier for proteome-wide missense variant effect prediction, providing pathogenicity scores for all ~71M possible human missense variants and classifying 89% with 90% precision; pre-computed predictions are integrated into Ensembl VEP and UCSC Genome Browser to support clinical variant interpretation (Science 2023)

Archived · ★633 · 2 years ago

cellbaseR

This R package makes use of the exhaustive RESTful web service API implemented for the CellBase database. It enables researchers to query and obtain a wealth of biological information from a single database, saving a lot of time. Researchers can also easily make queries about different biological topics and link the results together, since all the information is integrated.

Stale · ★2 · 2 years ago

FunSearch (DeepMind, Nature 2023)

Autonomous Research Systems (2023-2025 Breakthroughs)

First system to make novel, verifiable scientific discoveries by pairing LLMs with evolutionary search, solving open problems in combinatorics (cap set problem) and discovering improved heuristics for online bin packing

Stale · ★1.1K · 2 years ago

Jupyter Notebook

idsa

Archived · ★76 · 2 years ago

DGL-LifeSci

Machine Learning

DGL-LifeSci is a [DGL](https://www.dgl.ai/)-based package for various applications in life science with graph neural networks.

Stale · ★803 · 2 years ago

GNET2

Cluster genes into functional groups with an E-M process. Iteratively performs TF assignment and gene assignment until the gene assignments no longer change or the maximum number of iterations is reached.

Stale · ★2 · 3 years ago

sevenbridges

R client and utilities for Seven Bridges platform API, from Cancer Genomics Cloud to other Seven Bridges supported platforms.

Stale · ★37 · 4 years ago

pileup.js

Genome Browsers / Gene Diagrams

JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers.

Stale · ★281 · 4 years ago

BioJS

Genome Browsers / Gene Diagrams

BioJS is a library of over a hundred JavaScript components that let you visualize and process data using current web technologies.

Stale · ★506 · 4 years ago

3D e-Chem Virtual Machine

Virtual Machine

Virtual machine with all software and sample data to run 3D-e-Chem Knime workflows

Stale · ★17 · 7 years ago

Histopathology Ontology

An ontology of histopathological morphologies used by pathologists to classify/categorise animal lesions observed histologically during regulatory toxicology studies. The ontology was developed using real data from over 6000 regulatory toxicology studies donated by 13 companies spanning nine species. The original structure of the histopathology ontology was designed ab initio when the [INHAND](http://www.goreni.org/) manuscripts were not available. However, the ontology has been repeatedly reviewed and updated to align with the subsequently published INHAND manuscripts. During this process cross references to INHAND lesion identifiers were added to the ontology. [from GitHub]

Stale · ★9 · 8 years ago

Selventa Chemicals

Selventa legacy chemical namespace used with the Biological Expression Language

Archived · ★0 · 8 years ago

European Nucleotide Archive

geosparql

Gender, Sex, and Sexual Orientation Ontology

Information Resource Registry

The information resource registry is a listing of data sources present in the NCATS Data Translator system. Each information resource has an identifier, a short description, and a URL to more information about that resource.

MetaboLights Compound

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]

AMARETTO

StatisticalMethod

Integrating the growing number of available multi-omics cancer datasets remains one of the main challenges in improving our understanding of cancer. A central difficulty is using multi-omics data to identify novel cancer driver genes. We have developed an algorithm, called AMARETTO, that integrates copy number, DNA methylation and gene expression data to identify a set of driver genes by analyzing cancer samples and connects them to clusters of co-expressed genes, which we define as modules. We applied AMARETTO in a pancancer setting to identify cancer driver genes and their modules on multiple cancer sites. AMARETTO captures modules enriched in angiogenesis, cell cycle and EMT, and modules that accurately predict survival and molecular subtypes. This allows AMARETTO to identify novel cancer driver genes directing canonical cancer pathways.
