Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active 1457
Idle 684
Stale 570
Archived 19
(None) 3628

Domain

Software 422
ImmunoOncology 251
text-generation 151
Microarray 138
Infrastructure 123
GeneExpression 117
Sequencing 87
Protein & Drug Discovery 72
SingleCell 72
Visualization 61
Genetics 52
Annotation 51
(None) 2389

Language

R 2436
Python 918
Jupyter Notebook 90
HTML 50
C++ 38
Makefile 35
C 30
Shell 29
JavaScript 27
Java 24
TypeScript 22
Perl 11
(None) 2566

License

MIT 713
GPL-3.0 653
Artistic-2.0 543
CC-BY-4.0 262
GPL-2.0 255
GPL-2.0+ 242
Apache-2.0 228
NOASSERTION 169
CC0-1.0 114
GPL-3.0+ 101
CC-BY-3.0 79
BSD-3-Clause 78
(None) 2434

Source

bioregistry 2419
bioconductor 2418
github 2209
huggingface 539
awesome-ai-for-science 490
bio.tools 245
awesome-bioinformatics 126
awesome-python-chemistry 87
awesome-cheminformatics 45
awesome-scientific-python 18
2

Type

Software tool 3400
Database 2419
AI model 539

6,358 resources indexed

Showing 51–100

doppelgangR

The main function is doppelgangR(), which takes as minimal input a list of ExpressionSet objects and searches all list pairs for duplicated samples. The search is based on the genomic data (exprs(eset)), phenotype/clinical data (pData(eset)), and "smoking guns": supposedly unique identifiers found in pData(eset).

Active · ★5 · 6 days ago
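
The pairwise search described in the doppelgangR entry can be illustrated with a minimal Python sketch. This is not the package's R API or its actual detection method: the function names, the toy data, and the fixed correlation threshold are all assumptions made for illustration, and only the expression-data part of the search is covered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def find_suspected_duplicates(datasets, threshold=0.99):
    """Compare every sample in one dataset with every sample in another.

    datasets: dict of dataset name -> dict of sample id -> expression values,
    with genes in the same order everywhere (an assumption of this sketch).
    The fixed threshold is a hypothetical stand-in for a real outlier test.
    """
    hits = []
    # Visit each unordered pair of datasets exactly once.
    for (name_a, a), (name_b, b) in combinations(datasets.items(), 2):
        for id_a, profile_a in a.items():
            for id_b, profile_b in b.items():
                r = pearson(profile_a, profile_b)
                if r >= threshold:
                    hits.append((name_a, id_a, name_b, id_b, round(r, 4)))
    return hits


# Hypothetical toy data: two studies, two samples each, four genes.
datasets = {
    "study1": {"s1": [1.0, 2.0, 3.0, 4.0], "s2": [4.0, 1.0, 3.0, 2.0]},
    "study2": {"t1": [1.1, 2.0, 3.1, 3.9], "t2": [2.0, 4.0, 1.0, 3.0]},
}
hits = find_suspected_duplicates(datasets)
```

With this toy input, only the pair s1/t1 is flagged, because t1 is a slightly perturbed copy of s1.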

genzeonplatform/healthcare-brain-procedure-surgery-ner

by genzeonplatform

token-classification

Healthcare Brain Procedure Surgery NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of surgical procedures, diagnostic tests, interventions, and procedural details from unstructured clinical text.

Active · ↓23 · 6 days ago

recursionpharma/nesso

by recursionpharma

Nesso-1 is a fast, structure-based protein–ligand binding-affinity model. Given a protein sequence and a ligand (SMILES / CCD code / SDF), it predicts a binding affinity scalar along with a binder/non-binder score.

Active · ↓0 · 6 days ago

MONAI

Medical AI & Clinical Applications

NVIDIA and King's College London's open-source AI toolkit for healthcare imaging, providing foundational frameworks for medical image annotation (MONAI Label), training (MONAI Core), and deployment (MONAI Deploy) across radiology, pathology, and endoscopy (8K+ stars, Apache 2.0)

Active · ★8.4K · 6 days ago

genzeonplatform/healthcare-brain-vitals-ner

by genzeonplatform

token-classification

Healthcare Brain Vitals NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of vital signs, body measurements, and physiological parameters from clinical text.

Active · ↓24 · 6 days ago

genzeonplatform/healthcare-brain-laboratory-ner

by genzeonplatform

token-classification

Healthcare Brain Laboratory NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of laboratory test results, values, units, reference ranges, and abnormality flags from unstructured clinical text.

Active · ↓23 · 6 days ago

genzeonplatform/healthcare-brain-diagnosis-icd-ner

by genzeonplatform

token-classification

Healthcare Brain Diagnosis ICD NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of diagnoses, conditions, and support for ICD-10/SNOMED code mapping from unstructured clinical text.

Active · ↓33 · 6 days ago

genzeonplatform/healthcare-brain-medication-ner

by genzeonplatform

token-classification

Healthcare Brain Medication NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of medication names, dosages, routes, frequencies, and administration details from unstructured clinical text.

Active · ↓49 · 6 days ago

genzeonplatform/healthcare-brain-clinical-findings-ner

by genzeonplatform

token-classification

Healthcare Brain Clinical Findings NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of clinical findings, diseases, conditions, anatomical locations, and clinical modifiers from unstructured clinical text.

Active · ↓0 · 6 days ago

genzeonplatform/healthcare-brain-ner

by genzeonplatform

token-classification

Healthcare Brain NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated detection and de-identification of Protected Health Information (PHI) and Personally Identifiable Information (PII) in unstructured clinical text.

Active · ↓31 · 6 days ago

micymike/vilyalabs-med

by micymike

This model is a fine-tuned version of LiquidAI/LFM2.5-230M optimized for medical chat and consultation.

Active · ↓45 · 6 days ago

JCVI

Sequence assembly

JCVI is a versatile toolkit for comparative genomics analysis. It is a collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

Active · ★924 · 6 days ago

Principia

Autonomous Research Systems (2023-2025 Breakthroughs)

Principle-first scientific idea discovery framework that extracts reusable principles from public literature and private research materials, composes them into traceable Idea Cards with prior-art comparisons, and exports validation-ready research packs; emphasizes inspectable scientific objects, risk disclosure, and falsification paths (ICML 2026, 411+ stars, MIT License)

Active · ★414 · 6 days ago

JAX

Machine Learning

High-performance ML research

Active · ★36K · 6 days ago

affyPLM

A package that extends and improves the functionality of the base affy package. Routines that make heavy use of compiled code for speed. Central focus is on implementation of methods for fitting probe-level models and tools using these models. PLM based quality assessment tools.

Active · ★0 · 1 week ago

JAX-MD

Machine Learning for Physics

Molecular dynamics in JAX

Active · ★1.4K · 1 week ago

Jupyter Notebook

genzeonplatform/cliniguard-laboratory-ner

by genzeonplatform

token-classification

CliniGuard Laboratory NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of laboratory test results, values, units, reference ranges, and abnormality flags from unstructured clinical text.

Active · ↓0 · 1 week ago

genzeonplatform/cliniguard-diagnosis-icd-ner

by genzeonplatform

token-classification

CliniGuard Diagnosis ICD NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of diagnoses, conditions, and support for ICD-10/SNOMED code mapping from unstructured clinical text.

Active · ↓17 · 1 week ago

genzeonplatform/cliniguard-medication-ner

by genzeonplatform

token-classification

CliniGuard Medication NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of medication names, dosages, routes, frequencies, and administration details from unstructured clinical text.

Active · ↓17 · 1 week ago

genzeonplatform/cliniguard-clinical-findings-ner

by genzeonplatform

token-classification

CliniGuard Clinical Findings NER is a transformer-based clinical Named Entity Recognition model developed by Genzeon Platforms for automated extraction of clinical findings, diseases, conditions, anatomical locations, and clinical modifiers from unstructured clinical text.

Active · ↓0 · 1 week ago

Academic Research Skills (ARS)

Interactive Research Environments

Comprehensive Claude Code skill suite covering the full academic pipeline from deep research and paper writing to multi-perspective peer review, revision, and finalization; features multi-agent teams, PRISMA systematic review, style calibration, claim-level citation audits, integrity gates, and human-in-the-loop safeguards (38K+ stars, CC BY-NC 4.0, 2026)

Active · ★38.4K · 1 week ago

Jupyter AI (JupyterLab Extension)

Interactive Research Environments

Official Jupyter extension with `%%ai` magic commands and sidebar chat assistant, connecting multiple model providers and local inference

Active · ★4.3K · 1 week ago

TabFM (Google Research, 2026)

General Science Models

Scikit-learn compatible tabular foundation model for zero-shot classification and regression on mixed-type tabular datasets via in-context learning; applicable to diverse scientific datasets (1.8K+ stars, Apache 2.0)

Active · ★1.9K · 1 week ago

introvoyz041/DrugGen-2

by introvoyz041

text-generation

DrugGen 2: A disease-aware language model for enhancing drug discovery. DrugGen-2 is a disease-aware language model specialized for generating drug-like SMILES structures based on both disease pathways and protein sequence.

Active · ↓46 · 1 week ago

Gene Expression Analysis Resource

The gEAR portal is a website for visualization and analysis of multi-omic data both in public and private domains.

Active · ★22 · 1 week ago

Jupyter Notebook

PaperQA2

Scientific Literature RAG & Analysis

High-accuracy RAG for scientific PDFs with citation support, agentic RAG, and contradiction detection

Active · ★8.9K · 1 week ago

Microbial Ecophysiological Trait and Phenotype Ontology

METPO (Microbial Ecophysiological Trait and Phenotype Ontology) provides standardized terms for describing microbial phenotypes, growth characteristics, and culture conditions. It includes classes for growth media, temperature tolerances, pH tolerances, and relationships like "grows in" and "does not grow in".

Active · ★1 · 1 week ago

redun

Workflow Managers

A Python-based workflow manager.

Active · ★595 · 1 week ago

ATHENA-R1 (Harvard MIMS)

Domain-Specific Research Agents

Reinforcement-learning-trained AI agent for treatment reasoning over a universe of 212 biomedical tools, performing multi-step evidence gathering and spawning parallel reasoning branches to reach evidence-grounded clinical decisions (55+ stars, MIT License, 2026)

Active · ★56 · 1 week ago

PinPath

PinPath enables flexible visualization of (omics) data on pathway diagrams, allowing users to pinpoint where the relevant changes occur. It supports pathway diagrams from WikiPathways and KEGG, as well as custom GPML and KGML files. Data can be displayed on both native pathway layouts and network representations.

Active · ★8 · 1 week ago

Best of Atomistic Machine Learning

Materials Discovery

Curated list of atomistic ML projects for materials science

Active · ★702 · 1 week ago

BioSimulators

Ecological Modeling

Biological simulation tools

Active · ★15 · 1 week ago

Inflexa

Inflexa is an open-source, agentic orchestration platform for computational biology and translational medicine. It is designed to assist researchers in analyzing multi-omics, cheminformatics, and imaging data by reading published literature, designing multi-step analysis plans, and executing experiments with full reproducibility.

Active · ★8 · 1 week ago

moleculekit

Molecular Visualization

A molecule manipulation library.

Active · ★237 · 1 week ago

onnx-community/OpenMed-NER-PharmaDetect-SuperClinical-434M-ONNX

by onnx-community

token-classification

This is an ONNX version of OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M. It was automatically converted and uploaded using this Hugging Face Space.

Active · ↓21 · 1 week ago

LiangYan3612/NucleusDiff

by LiangYan3612

NucleusDiff

Active · ↓0 · 1 week ago

Software Package Data Exchange License

Active · ★678 · 1 week ago

ResearchStudio (Microsoft)

Autonomous Research Systems (2023-2025 Breakthroughs)

AI co-author covering the entire research lifecycle — from an under-specified research direction to a published paper; includes ResearchStudio-Idea for evidence-grounded research ideation and ResearchStudio-Reel for turning finished papers into posters, narrated videos, blogs, and interactive reels; runs as skills on Claude Code and Codex (1.2K+ stars, MIT License, 2026)

Active · ★1.3K · 1 week ago

blockclust

Cheminformatics

Galaxy workflow for BlockClust pipeline.

Active · ★123 · 1 week ago

Zaynoid/QB-Q3-4B-v1-SFT

by Zaynoid

text-generation

Active · ↓47 · 1 week ago

sxkdz/RetroAgent

by sxkdz

text-generation

RetroAgent is a 4B-parameter LLM agent for multi-step retrosynthesis planning. It decomposes a target molecule into commercially available building blocks by searching over an AND-OR graph of molecules and reactions, driven entirely by tool calls.

Active · ↓41 · 1 week ago
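
The AND-OR graph mentioned in the RetroAgent entry can be sketched in a few lines of Python. This is a plain depth-first search over a hand-written reaction table, not RetroAgent's tool-calling agent; the function name `solve`, the molecule labels, and the reaction table are hypothetical. A molecule is an OR node (any one reaction that makes it is enough) and a reaction is an AND node (every reactant must be obtainable).

```python
def solve(target, reactions, purchasable, _visiting=None):
    """Return a route to `target` as a list of (product, reactants) steps, or None.

    reactions: dict of product -> list of reactant tuples (one tuple per reaction).
    purchasable: set of molecules that need no synthesis.
    """
    if _visiting is None:
        _visiting = set()
    if target in purchasable:
        return []  # leaf: a building block, nothing to synthesize
    if target in _visiting:
        return None  # cycle guard: do not revisit a molecule on the current path
    _visiting.add(target)
    try:
        for reactants in reactions.get(target, []):  # OR: try each reaction in turn
            route = []
            for reactant in reactants:  # AND: every reactant must be solved
                sub_route = solve(reactant, reactions, purchasable, _visiting)
                if sub_route is None:
                    break  # this reaction fails, move on to the next one
                route.extend(sub_route)
            else:
                return route + [(target, reactants)]
        return None
    finally:
        _visiting.discard(target)


# Hypothetical toy problem: T can be made from (A, X) or from (B, C).
reactions = {
    "T": [("A", "X"), ("B", "C")],
    "B": [("D", "E")],
    "X": [],  # no known reaction produces X
}
purchasable = {"A", "C", "D", "E"}
route = solve("T", reactions, purchasable)
```

Here the first reaction for T is rejected because X can be neither bought nor made, so the search falls back to the second reaction and returns the steps that make B and then T.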

AIDE (WecoAI, arXiv 2025)

Autonomous Research Systems (2023-2025 Breakthroughs)

LLM-driven machine learning engineering agent using agentic tree search to autonomously draft, debug and benchmark ML code; wins 4× more medals than the best linear agent on OpenAI's MLE-Bench (75 Kaggle competitions) (1.3K+ stars, MIT License)

Active · ★1.4K · 1 week ago

GSVA

FunctionalGenomics

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Active · ★247 · 1 week ago
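
The change of coordinates described in the GSVA entry, from a gene by sample matrix to a gene-set by sample matrix, can be illustrated with a simplified Python sketch. It scores each gene set as the mean of per-gene z-scores, which is a deliberately simpler statistic than the one GSVA computes; the function name, gene names, and gene sets are hypothetical.

```python
from statistics import mean, pstdev


def gene_set_scores(expr, gene_sets):
    """Collapse a gene-by-sample matrix into a gene-set-by-sample matrix.

    expr: dict of gene -> list of expression values, one per sample.
    gene_sets: dict of set name -> list of gene names.
    Scoring is a plain mean of per-gene z-scores, NOT the GSVA statistic.
    """
    # Standardize each gene across samples so that genes are comparable.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]  # skip genes absent from the matrix
        if not present:
            continue
        # One score per sample: average the member genes' z-scores.
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores


# Hypothetical toy data: 4 genes x 3 samples, grouped into 2 gene sets.
expr = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [2.0, 4.0, 6.0],
    "G3": [5.0, 5.0, 5.0],
    "G4": [9.0, 6.0, 3.0],
}
gene_sets = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4"]}
scores = gene_set_scores(expr, gene_sets)
```

The result has one row per gene set and one column per sample, so downstream methods such as clustering can operate on pathways rather than individual genes.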

BiaPy

Medical AI & Clinical Applications

Open-source deep learning toolbox for bioimage analysis providing a unified, configuration-driven framework for 2D/3D semantic segmentation, instance segmentation, classification, denoising, super-resolution, and self-supervised learning; integrates state-of-the-art architectures including U-Net, Vision Transformers, and ConvNeXt, designed for microscopy and biomedical imaging researchers without extensive coding expertise (MIT License, actively maintained)

Active · ★203 · 1 week ago

Jupyter Notebook

Genozip

A compressor of common genomic file formats (BAM, CRAM, FASTQ, VCF etc).

Active · ★185 · 1 week ago

Delly

Structural variant callers

Structural variant discovery by integrated paired-end and split-read analysis.

Active · ★527 · 1 week ago

Platform Material Digital Core Ontology

Active · ★41 · 1 week ago

Jupyter Notebook

PaddleOCR 3.0 (2024/2025)

High-Performance Document Processing

Advanced OCR with PP-StructureV3 document parsing, 13% accuracy improvement, supports 80+ languages

Active · ★85.8K · 1 week ago

Agentomics

Autonomous ML experimentation for biomedical data.

Active · ★26 · 1 week ago

micro-sam

Medical AI & Clinical Applications

Segment Anything Model for microscopy: interactive and automatic segmentation of light, electron, and fluorescence microscopy images in 2D and 3D, with domain-specific fine-tuning workflows for scientific imaging (1.5K+ stars)

Active · ★700 · 1 week ago

Jupyter Notebook
