Find open-source science resources

This package contains utility functions used throughout the gDR platform to fit data, manipulate data, and convert and validate data structures. It also provides the default constants for the gDR platform. Many of the functions are used by the gDRcore package.

Active · 2 stars · 1 week ago

Artistic-2.0

scConform

Medical AI & Clinical Applications

Builds prediction intervals for cell type annotation using conformal inference and conformal risk control. It provides two main methods. The first gives prediction intervals with coverage guarantees based on standard conformal inference. The second gives hierarchical prediction intervals that are consistent with the cell ontology.
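
To illustrate the first method, here is a minimal Python sketch of standard split conformal inference for classification. It is not scConform's API (scConform is an R package); the function names, the nonconformity score (one minus the probability of the true label), and the toy cell-type labels are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute a score threshold from a held-out calibration set.

    cal_probs:  list of dicts mapping label -> predicted probability
    cal_labels: list of true labels, one per calibration cell
    alpha:      target miscoverage rate (0.1 aims for 90% coverage)
    """
    # Nonconformity score: one minus the probability of the true label.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: fall back to including every label.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every label whose nonconformity score is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration data: three cells with known annotations.
cal_probs = [
    {"T cell": 0.9, "B cell": 0.1},
    {"T cell": 0.4, "B cell": 0.6},
    {"T cell": 0.3, "B cell": 0.7},
]
cal_labels = ["T cell", "B cell", "T cell"]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
new_cell = {"T cell": 0.7, "B cell": 0.25, "NK cell": 0.05}
print(prediction_set(new_cell, threshold))
```

The coverage guarantee of this construction relies on the calibration cells and the new cells being exchangeable; the hierarchical, ontology-aware method described above is not covered by this sketch.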

Active · 7 stars · 1 week ago

Artistic-2.0

MIRA (NeurIPS 2025)

Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)

Active · 408 stars · 1 week ago

Interactive Research Environments

Claude Scholar

Semi-automated research assistant for academic research and software development, supporting Claude Code, Codex CLI, Kimi Code CLI, and OpenCode across ideation, coding, experiments, writing, and publication (Galaxy-Dawn, 4.5K+ stars, MIT License, 2026)

Active · 4.6K stars · 1 week ago

epiregulon

SingleCell

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved by integrating scATAC-seq and scRNA-seq data and incorporating public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expression.
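
The correlation-based linking step can be sketched in Python as follows. This is an illustration under stated assumptions, not epiregulon's implementation (epiregulon is an R package): the use of Pearson correlation, the `min_corr` cutoff, and the peak and gene names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (norm_x * norm_y)


def link_elements_to_genes(accessibility, expression, min_corr=0.5):
    """Keep (element, gene, r) triples whose correlation reaches min_corr.

    accessibility: dict mapping regulatory element -> per-cell accessibility
    expression:    dict mapping gene -> per-cell expression
    Both profiles must be measured over the same cells, in the same order.
    """
    links = []
    for element, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_corr:
                links.append((element, gene, r))
    return links


# Hypothetical toy data: four cells, two peaks, one gene.
accessibility = {"peak1": [0, 1, 2, 3], "peak2": [3, 2, 1, 0]}
expression = {"GENE_A": [0, 2, 4, 6]}

for element, gene, r in link_elements_to_genes(accessibility, expression):
    print(f"{element} -> {gene} (r = {r:.2f})")
```

A real analysis would typically restrict candidate pairs to elements within some genomic distance of each gene rather than testing every pair.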

Active · 28 stars · 1 week ago

Galaxy Training Network

Identifiers in the GTN correspond to training materials in various formats (markdown, slides, video). Users can apply the concepts they learn directly within the framework via Galaxy workflows.

Active · 366 stars · 1 week ago

HTML

graphein

Machine Learning

Provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks.

Active · 1.2K stars · 1 week ago

Jupyter Notebook

ESM3

Protein & Drug Discovery

98B-parameter frontier generative model jointly reasoning over protein sequence, structure, and function, trained on 2.78 billion proteins; generated a novel fluorescent protein (esmGFP) with only 58% sequence identity to known GFPs (EvolutionaryScale, 2024)

Active · 2.8K stars · 1 week ago

Jupyter Notebook

HuBMAPR

'HuBMAP' provides an open, global bio-molecular atlas of the human body at the cellular level. The `datasets()`, `samples()`, `donors()`, `publications()`, and `collections()` functions retrieve the information for each of these entity types. `*_details()` functions are available for individual entries of each entity type. `*_derived()` functions are available for retrieving derived datasets or samples for individual entries of each entity type. Data files can be accessed using `bulk_data_transfer()`.

Active · 3 stars · 1 week ago

Artistic-2.0

ParmEd

Simulations

Parameter/topology editor and molecular simulator with visualization capability.

Active · 452 stars · 1 week ago

ClustIRR

Clustering

ClustIRR analyzes repertoires of B- and T-cell receptors. It starts by identifying communities of immune receptors with similar specificities, based on the sequences of their complementarity-determining regions (CDRs). Next, it employs a Bayesian probabilistic model to quantify differential community occupancy (DCO) between repertoires, allowing the identification of communities that expand or contract in response to, for example, infection or cancer treatment.

Active · 5 stars · 1 week ago

igvShiny

This package is a wrapper for the Integrative Genomics Viewer (IGV). It provides an htmlwidget version of IGV that can be used as a module in Shiny apps.

Active · 38 stars · 1 week ago

Chemprop

Machine Learning

Directed message passing neural networks for property prediction of molecules and reactions with uncertainty and interpretation.

Active · 2.4K stars · 1 week ago

Ontology for Biomarkers of Clinical Interest

The Ontology for Biomarkers of Clinical Interest (OBCI) formally defines biomarkers for diseases, phenotypes, and effects.

Active · 1 star · 1 week ago

efo

Active · 64 stars · 1 week ago

A vocabulary for the catalysis disciplines

Genomics & Bioinformatics

Voc4Cat is a [SKOS](https://www.w3.org/TR/2009/REC-skos-reference-20090818/) vocabulary for the catalysis disciplines. The vocabulary was created in the [NFDI4Cat](http://www.nfdi4cat.org/) initiative. The first collection of terms was published in June 2023 with a focus on photocatalysis. Our goal is to continuously extend the vocabulary to other areas of catalysis and related disciplines such as chemical engineering and materials science.

Active · 17 stars · 1 week ago

Just

CC0-1.0

OmicVerse

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging Bindea, scanpy and squidpy ecosystems (Nature Communications 2024)

Active · 1.1K stars · 1 week ago

MicrobiomeProfiler

Microbiome

This is an R/shiny package for functional enrichment analysis of microbiome data. It is based on clusterProfiler. MicrobiomeProfiler supports KEGG enrichment analysis, COG enrichment analysis, Microbe-Disease association enrichment analysis, and Metabo-Pathway analysis.

Active · 42 stars · 1 week ago

GPL-2.0

NIF Standard Ontology: Neurolex

Active · 61 stars · 1 week ago

Data Analysis & Visualization

DeepAnalyze

First agentic LLM for autonomous data science with end-to-end pipeline from data to analyst-grade reports

Active · 4.3K stars · 1 week ago

Chemical Entity Materials and Reactions Ontological Framework

A data model for managing information about chemical entities, ranging from atoms through molecules to complex mixtures.

Active · 23 stars · 1 week ago

CC0-1.0

scToppR

Pathways

scToppR provides an easy-to-use API wrapper for the ToppGene web platform, used for gene ontology and functional enrichment research. The package also integrates visualization tools, making it a convenient tool directly connecting ToppGene to code-based workflows in R. The tool can also easily save results into different formats.

Active · 7 stars · 1 week ago

Medical AI & Clinical Applications

nnU-Net

Self-configuring deep learning framework for semantic segmentation of biomedical images requiring no manual hyperparameter tuning; automatically adapts preprocessing, network topology, and training parameters to achieve state-of-the-art results across 120+ international competitions and benchmarks out-of-the-box (DKFZ, Nature Methods 2021, 8.3k+ stars)

Active · 8.6K stars · 1 week ago

Battery Interface Ontology

Active · 56 stars · 1 week ago

pride

Active · 3 stars · 1 week ago

Data identity and mapping

EUCAIM ETL toolset

Modular toolchain for an extensible and customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). Its design aims to minimize manual data preparation effort and to facilitate customization and integration with other components, such as data quality assurance tools. It is containerized and currently supports input datasets in CSV, JSON, and XLSX formats.

Active · 1 star · 1 week ago

Chai-1

Protein & Drug Discovery

Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)

Active · 2K stars · 1 week ago

Moonlight2R

DNAMethylation

Understanding cancer mechanisms requires identifying the genes that play a role in the development of the pathology and characterizing that role (notably as oncogenes or tumor suppressors). We present an updated version of the R/Bioconductor package MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool CScape-somatic. Oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps elucidate the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type. This may, for instance, help answer the question of whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates whether there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as these may affect the expression of the genes.

Active · 5 stars · 1 week ago

Common Core Ontologies

The Common Core Ontologies (CCO) comprise twelve ontologies that are designed to represent and integrate taxonomies of generic classes and relations across all domains of interest. CCO is a mid-level extension of Basic Formal Ontology (BFO), an upper-level ontology framework widely used to structure and integrate ontologies in the biomedical domain (Arp et al., 2015). BFO aims to represent the most generic categories of entity and the most generic types of relations that hold between them, by defining a small number of classes and relations. CCO extends BFO in the sense that every class in CCO is asserted to be a subclass of some class in BFO, and that CCO adopts the generic relations defined in BFO (e.g., has_part) (Smith and Grenon, 2004). Accordingly, CCO classes and relations are heavily constrained by the BFO framework, from which they inherit many of their basic semantic relationships.

Active · 347 stars · 1 week ago

Autonomous Research Systems (2023-2025 Breakthroughs)

BSD-3-Clause

Arbor

Generalist autonomous research agent that grows a hypothesis tree to optimize any measurable task, beating Claude Code and Codex by 2.5× on the same compute budget across BrowseComp, Terminal-Bench 2.0, math reasoning, and MLE-Bench Lite; supports native CLI, keyless Claude Code/Codex integration, and an MCP tool server (RUC-NLPIR, 866+ stars, Apache 2.0, 2026)

Active · 870 stars · 1 week ago

Data Labeling & Annotation

Label Studio

Multi-type data labeling and annotation tool

Active · 27.7K stars · 1 week ago

TypeScript

Babel

Hand-curated Snakemake pipelines to combine identifier cross-references from multiple sources across dozens of biomedical types, including anatomical entities, diseases and phenotypes, genes and proteins, and many others.

Active · 14 stars · 1 week ago

Genomics & Bioinformatics

State (Arc Institute, bioRxiv 2025)

Machine learning model predicting cellular perturbation response across diverse contexts with State Transition (ST) and State Embedding (SE) variants, featuring CLI tooling, PyPI distribution, and Virtual Cell Challenge integration (575+ stars)

Active · 609 stars · 1 week ago

Gene Ontology Issue Tracker

An issue on the Gene Ontology GitHub issue tracker

Active · 244 stars · 1 week ago

Makefile

HTMD

Simulations

High-Throughput Molecular Dynamics: Programming Environment for Molecular Discovery.

Active · 274 stars · 1 week ago

Rich Text Format

DataCite Ontology

An ontology that enables the metadata properties of the DataCite Metadata Schema Specification (i.e., a list of metadata properties for the accurate and consistent identification of a resource for citation and retrieval purposes) to be described in RDF.

Active · 4 stars · 1 week ago

XSLT

Document Components Ontology

An ontology that provides a structured vocabulary of document components, both structural (e.g., block, inline, paragraph, section, chapter) and rhetorical (e.g., introduction, discussion, acknowledgements, reference list, figure, appendix).

Active · 16 stars · 1 week ago

Citation Typing Ontology

An ontology that enables characterization of the nature or type of citations, both factually and rhetorically.

Active · 16 stars · 1 week ago

PlantCV

Agricultural AI

Open-source image analysis toolkit for high-throughput plant phenotyping, extracting morphological, color, and texture traits from RGB, hyperspectral, and thermal imagery with modular Python workflows for crop improvement, stress detection, and plant biology research (Donald Danforth Plant Science Center, 795+ stars, MPL-2.0)

Active · 803 stars · 1 week ago

Medical AI & Clinical Applications

MPL-2.0

QuPath

Open-source bioimage analysis platform for digital pathology and research, featuring AI-powered cell detection, tissue classification, and whole-slide image analysis with extensible scripting and plugin architecture (1.3K+ stars, actively maintained)

Active · 1.4K stars · 1 week ago

Java

cfDNAPro

Visualization

cfDNA fragments carry features that are important for building ML models that classify cancer samples, such as fragment size and fragment end motifs. Analyzing and visualizing fragment size metrics and other biological features in a curated, standardized, scalable, well-documented, and reproducible way can be time-intensive. This package aims to resolve these problems and simplify the process. It offers two sets of functions, one for cfDNA feature characterization and one for visualization.

Active · 43 stars · 1 week ago

Domain-Specific Research Agents

ClawBio

First bioinformatics-native AI agent skill library enabling local-first, reproducible genomic and population-genetics research workflows built on OpenClaw (871+ stars, MIT License, 2026)

Active · 1K stars · 1 week ago

High-Performance Document Processing

OpenDataLoader PDF (OpenDataLoader, 2025)

Open-source PDF parser for AI-ready data, converting PDFs into Markdown/JSON/HTML/Tagged PDF with layout analysis and reading-order detection; ranks #1 overall on extraction benchmarks with deterministic bounding boxes and hybrid AI mode (26K+ stars, Apache 2.0)

Active · 26.3K stars · 1 week ago

Java

Remote Sensing & Geospatial AI

TerraMind (IBM & ESA, 2025)

First any-to-any generative foundation model for Earth Observation, enabling unified multimodal understanding and generation across diverse satellite sensors and geospatial tasks through a single architecture (258+ stars)

Active · 281 stars · 1 week ago

Jupyter Notebook

extraChIPs

ChIPSeq

This package builds on existing tools and adds some simple but useful capabilities for working with ChIP-Seq data. The focus is on detecting differential binding windows/regions. One set of functions focusses on set operations that retain mcols for GRanges objects, whilst another group of functions aids visualisation of results. Coercion to tibble objects is also implemented.

Active · 7 stars · 1 week ago

Slides & Presentation Generation

PPTAgent

Beyond text-to-slides generation with PPTEval multi-dimensional evaluation (EMNLP 2025)

Active · 4.7K stars · 1 week ago

matbench-discovery

Force Fields

A benchmark for ML-guided high-throughput materials discovery.

Active · 237 stars · 1 week ago

Indigo

General Purpose

Universal molecular toolkit for molecular fingerprinting, substructure search, and molecular visualization, written in C++ with Java, C#, and Python wrappers.

Active · 398 stars · 1 week ago

C++

Autonomous Research Systems (2023-2025 Breakthroughs)

ARIS (Auto-Research-In-Sleep)

Lightweight Markdown-only skills for autonomous ML research with cross-model review loops, idea discovery, and experiment automation; no framework lock-in, works with Claude Code, Codex, OpenClaw, or any LLM agent (12.8K+ stars, MIT License, 2026)

Active · 12.8K stars · 1 week ago