Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active (1393)
Idle (649)
Stale (548)
Archived (19)
(None) (3614)

Domain

Software (422)
ImmunoOncology (251)
Microarray (138)
Infrastructure (123)
text-generation (122)
GeneExpression (117)
Sequencing (85)
Protein & Drug Discovery (72)
SingleCell (72)
Visualization (61)
Genetics (52)
Annotation (51)
(None) (2371)

Language

R (2432)
Python (845)
Jupyter Notebook (89)
HTML (49)
Makefile (35)
C++ (34)
C (29)
JavaScript (29)
Java (24)
Shell (24)
TypeScript (15)
Perl (9)
(None) (2529)

License

MIT (697)
GPL-3.0 (655)
Artistic-2.0 (550)
CC-BY-4.0 (261)
GPL-2.0 (254)
GPL-2.0+ (245)
Apache-2.0 (220)
NOASSERTION (160)
CC0-1.0 (114)
GPL-3.0+ (98)
CC-BY-3.0 (79)
BSD-3-Clause (76)
(None) (2335)

Source

bioregistry (2419)
bioconductor (2418)
github (2168)
awesome-ai-for-science (471)
huggingface (459)
bio.tools (209)
awesome-bioinformatics (126)
awesome-python-chemistry (87)
awesome-cheminformatics (45)
awesome-scientific-python (18)
2

Type

Software tool (3345)
Database (2419)
AI model (459)

6,223 resources indexed

Showing 1–50

alimotahharynia/DrugGen-2

by alimotahharynia

text-generation

DrugGen 2: A disease-aware language model for enhancing drug discovery. DrugGen-2 is a disease-aware language model specialized for generating drug-like SMILES structures based on both disease pathways and protein sequence.

Active · ↓182 · 1 day ago

figtracer

Plain-text, git-tracked electronic lab notebook (ELN) for reproducible bioinformatics — threads your R & Python figures into living lab notes with full provenance. Built for single-cell / CyTOF / flow cytometry; works with Obsidian, Quarto & Jupyter.

Active · ★0 · 1 day ago

FAIR Cookbook

Active · ★149 · 1 day ago

NVIDIA Earth-2

Climate Modeling

World's first fully open, accelerated weather AI software stack with Medium Range forecasting and Nowcasting models using generative AI (January 2026)

Active · ★1K · 1 day ago

napari

Medical AI & Clinical Applications

Fast, interactive, multi-dimensional image viewer for Python, foundational platform for scientific imaging AI with a rich plugin ecosystem integrating deep learning segmentation, object tracking, and microscopy analysis workflows (2.6K+ stars)

Active · ★2.7K · 1 day ago

nvMolKit (NVIDIA BioNeMo, 2025)

Protein & Drug Discovery

High-performance, GPU-accelerated library for key computational chemistry tasks including molecular similarity, conformer generation, and geometry relaxation, designed to accelerate drug-discovery and molecular-modeling workflows (264+ stars, Apache 2.0)

Active · ★264 · 1 day ago

OpenDDE (Aureka Research, 2026)

Protein & Drug Discovery

Open-source, all-atom biomolecular foundation model that turns co-folding into a scalable engine for structure prediction, design, and optimization across proteins, nucleic acids, and small molecules in drug discovery; ranked first on PXMeter-AB, FoldBench-AB, and 2026ARK-AB antibody-antigen benchmarks (263+ stars, Apache 2.0)

Active · ★271 · 2 days ago

UniParser/MolParser-Mobile

by UniParser

Active · ↓51 · 2 days ago

Jackrong/Qwopus3.5-27B-v3.5-GGUF

by Jackrong

image-text-to-text

Active · ↓2.8K · 2 days ago

OpenScience (Synthetic Sciences)

Interactive Research Environments

Open-source AI workbench for scientific research that automates the full research loop — literature review, hypothesis generation, code writing, experiment execution, database querying, and report writing — with 290+ skills, specialized research agents, and a browser-based workspace (1453+ stars, Apache 2.0, 2026)

Active · ★1.8K · 2 days ago

Scientific Agent Skills

Research Workbench & Plugins

Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science with 140+ ready-to-use skills and 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Antigravity, and the open Agent Skills standard (K-Dense-AI, 26K+ stars, 2025)

Active · ★30.6K · 2 days ago

reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B

by reaperdoesntknow

text-generation

A 1.7B-parameter causal language model distilled from Qwen3-30B-A3B on 6,122 STEM chain-of-thought samples using discrepancy-informed knowledge distillation. The training objective emphasizes proof structure, detects reasoning pivot tokens through token-level divergence dynamics, smooths…

Active · ↓1.4K · 2 days ago

WeatherBench2

Climate Modeling

Next-generation benchmark for data-driven global weather models with standardized evaluation framework and curated datasets for ML forecasting (Google Research, 2024)

Active · ★623 · 3 days ago

Gene Ontology Rules

GO Rules are a way of documenting the set of filters and reports that should apply to GAF annotation data. Some rules are expressed as SPARQL on a triplestore, some are code in the GAF parsing software, ontobio.

Active · ★49 · 3 days ago

King3Djbl/nexus-medical-GGUF

by King3Djbl

text-generation

NEXUS domain specialist for medical Q&A and clinical reasoning (lightweight & uncensored).

Active · ↓3.1K · 3 days ago

nilearn

Neuroscience & Behavioral Analysis

Machine learning and statistical learning for neuroimaging in Python, providing easy-to-use tools for fMRI and MRI analysis including decoding, connectivity estimation, and parcellation with seamless scikit-learn integration (INRIA Parietal team, 1.4K+ stars)

Active · ★1.4K · 3 days ago

fableforge-ai/NEXUS-Medical

by fableforge-ai

text-generation

NEXUS domain specialist for medical Q&A and clinical reasoning (lightweight & uncensored).

Active · ↓1.5K · 3 days ago

MACE

Materials Discovery

Machine learning interatomic potentials

Active · ★1.3K · 3 days ago

overreact

A library and command-line tool for building and analyzing complex homogeneous microkinetic models from quantum chemistry calculations, with support for quasi-harmonic thermochemistry, quantum tunnelling corrections, molecular symmetries and more.

Active · ★64 · 3 days ago

MNE

MEG and EEG.

Active · ★3.5K · 4 days ago

MassSpecWavelet

Peak detection in mass spectrometry data is one of the important preprocessing steps. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction method, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.

Active · ★11 · 4 days ago

Pyscf

A quantum chemistry package written in Python.

Active · ★77 · 4 days ago

BioNeMo Framework

Domain-Specific Models

NVIDIA's open-source platform for building and adapting biological AI models at scale, bundling ESM-2, Geneformer, MolMIM and DNA embedding models with recipes for single-GPU to multi-node training (2025)

Active · ★818 · 4 days ago

SeqVarTools

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Active · ★3 · 4 days ago

cantera

A collection of object-oriented software tools for problems involving chemical kinetics, thermodynamics, and transport processes.

Active · ★820 · 4 days ago

CellRank

Genomics & Bioinformatics

Probabilistic framework for inferring cell fate decisions and trajectory dynamics from multi-view single-cell data using Markov chains and machine learning, integrating RNA velocity, pseudotime, and metabolic labeling to predict differentiation paths and terminal states (scverse/Theis Lab, 449+ stars, BSD 3-Clause)

Active · ★454 · 4 days ago

Tesseract Core (Pasteur Labs, SciPy 2025 / JOSS)

Scientific Machine Learning Frameworks

Universal components for differentiable scientific computing, packaging heterogeneous scientific tools into self-contained, portable, gradient-propagating components with auto-generated schemas, CLI/REST API/Python SDK interfaces, and reproducible deployment across local, cloud, and HPC environments (105+ stars, Apache 2.0)

Active · ★105 · 4 days ago

genesisml/decaf

by genesisml

Distilling Boltz: Flow Maps for Fast All-Atom Cofolding

Active · ↓0 · 4 days ago

seqlib

Molecular genetics

seqlib is a type-safe Rust library for working with DNA and RNA sequences.

Active · ★0 · 4 days ago

NeuroAI (Meta FAIR)

Neuroscience & Behavioral Analysis

Modular Python suite for Neuro-AI research across all modalities, providing efficient data loaders (NeuralSet), curated datasets (NeuralFetch), scalable training (NeuralTrain), and unified benchmarking (NeuralBench) for building and evaluating neuroscience foundation models (Meta FAIR, 270+ stars, MIT License, 2026)

Active · ★272 · 5 days ago

OpenProteo

OpenProteo is the open-source Rust stack for proteomics raw-file access. It reads Thermo, Bruker, and Waters acquisitions through a single API (via the sibling OpenTFRaw, OpenTimsTDF, and OpenWRaw readers), converts them to PSI-MS mzML 1.1.0 with a canonical writer, and provides a zero-copy read_arrow() API (enabled by default) that loads directly into Polars or Pandas via PyArrow. No vendor SDKs, no Windows-only DLLs, no binary blobs in the release pipeline. Includes a one-shot vendor2mzml CLI.

Active · ★4 · 5 days ago

OpenWRaw

OpenWRaw is a standalone, cross-platform reader for Waters MassLynx .raw acquisition directories, implemented in pure Rust with no dependency on vendor DLLs. Python bindings built on PyO3 expose functions, scans, and ion-mobility data as native Python objects from Waters QTof and SYNAPT instrument families, ready to be assembled into a Pandas or Polars DataFrame.

Active · ★3 · 5 days ago

OpenTimsTDF

OpenTimsTDF is a standalone, cross-platform reader for Bruker timsTOF .tdf and .tdf_bin acquisition files, implemented in pure Rust with no dependency on vendor SDKs. Python bindings built on PyO3 expose frame, scan, and peak data as native Python objects, providing ion-mobility-aware access that can be assembled into a Pandas or Polars DataFrame.

Active · ★2 · 5 days ago

OpenTFRaw

OpenTFRaw is a standalone, cross-platform reader for Thermo Fisher Scientific .raw mass-spectrometry files, implemented in pure Rust with no dependency on vendor DLLs or .NET. Python bindings built on PyO3 return NumPy arrays for spectral data, straightforward to load into Pandas or Polars. Covers format versions 8 through 66 (LCQ Classic through Orbitrap Astral and modern TSQ instruments), supporting both centroid and profile spectra.

Active · ★10 · 5 days ago

Zaynoid/Qwen3.5-80B-A10B-medical-reap-v2

by Zaynoid

image-text-to-text

Active · ↓139 · 5 days ago

OpenEvolve

Autonomous Research Systems (2023-2025 Breakthroughs)

Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)

Active · ★6.7K · 6 days ago

open-science

Interactive Research Environments

Local-first, open-source AI workbench for scientists — an open alternative to Claude Science (by ai4s-research, maintainers of this list; TypeScript, MIT, 2026)

Active · ★85 · 6 days ago

AgentSociety

Social Science Research & Simulation

Modern LLM-native agent simulation platform for social science research and experimental design, providing a flexible framework for creating and managing intelligent agents in simulated environments (Tsinghua FIB Lab, 984+ stars, 2025)

Active · ★1.1K · 6 days ago

Claude Prism

Scientific Writing & Collaboration

Offline-first scientific writing workspace powered by Claude, integrating LaTeX, Python, and 100+ scientific skills with local execution, Zotero integration, and privacy-focused design (2026)

Active · ★1.7K · 1 week ago

Rarr

The Zarr specification defines a format for chunked, compressed, N-dimensional arrays. Its design allows efficient access to subsets of the stored array, and supports both local and cloud storage systems. Rarr aims to implement this specification in R with minimal reliance on external tools or libraries.

Active · ★54 · 1 week ago

Prior-Labs/tabpfn_3

by Prior-Labs

tabular-classification

TabPFN-3 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a forward pass. Inference code can be found at https://github.com/PriorLabs/TabPFN. More details can be found in the Model Report.

Active · ↓11K · 1 week ago

ARA (Agent-Native Research Artifact)

Autonomous Research Systems (2023-2025 Breakthroughs)

Research ecosystem for rigorous and trustworthy AI scientists — a protocol and skill bundle that makes autonomous research verifiable, crystallized, and observable through structured, machine-executable research artifacts and five agent skills for research management, compilation, verification, visualization, and publication (ARA-Labs, 447+ stars, MIT License, 2026)

Active · ★450 · 1 week ago

gevaertlab/diffusiongemma-radiology-vqa

by gevaertlab

image-text-to-text

This repository contains LoRA finetunes of DiffusionGemma (image-conditioned discrete-diffusion LLM) for radiology visual question answering, each paired with an autoregressive Gemma-4 finetune as a controlled baseline. It corresponds to the paper Discrete Diffusion Language Models for Interactive…

Active · ↓0 · 1 week ago

HPO Explorer

Genotype and phenotype

Interactive browser for the complete Human Phenotype Ontology (~19,800 terms), with a graph-based term explorer and a clinical profile analyzer for phenotype similarity, differential diagnosis, and gene prioritization.

Active · ★0 · 1 week ago

NVIDIA PhysicsNeMo

Physics-Informed Neural Networks

Open-source framework for building physics-ML models at scale (renamed from Modulus, 2025)

Active · ★3K · 1 week ago

markeR

markeR is an R package that provides a modular and extensible framework for the systematic evaluation of gene sets as phenotypic markers using transcriptomic data. The package is designed to support both quantitative analyses and visual exploration of gene set behaviour across experimental and clinical phenotypes. It implements multiple methods, including score-based and enrichment approaches, and also allows the exploration of expression behaviour of individual genes. In addition, users can assess the similarity of their own gene sets against established collections (e.g., those from MSigDB), facilitating biological interpretation.

Active · ★10 · 1 week ago

Newton

Specialized Frameworks

GPU-accelerated differentiable physics simulation engine built on NVIDIA Warp, supporting rigid/soft body, cloth, and gradient-based optimization for scientific ML, initiated by Disney Research, DeepMind, and NVIDIA (Linux Foundation, Apache 2.0, 2025)

Active · ★5.2K · 1 week ago

mosaic

Protein & Drug Discovery

Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)

Active · ★345 · 1 week ago

Awesome LLM Scientific Discovery

📋 Paper Collections & Repositories

LLM papers for scientific discovery

Active · ★397 · 1 week ago

TRIDENT (2025)

Computational Pathology & Digital Pathology

Toolkit for large-scale whole-slide image processing supporting 22+ patch encoders (UNI, CONCH, Virchow, H-Optimus-0, etc.), slide encoders (TITAN, GigaPath, PRISM, CHIEF, Madeleine, Feather), tissue segmentation, and multi-GPU inference with end-to-end pipeline and smart resume for standardized deployment of computational pathology foundation models (Mahmood Lab, Harvard Medical School, 553+ stars)

Active · ★600 · 1 week ago
