Find open-source science resources

A Workflow Management System geared towards scientific workflows.

Active · 1.1K · 1 week ago

Scala

miaViz

Microbiome

The miaViz package implements functions to visualize TreeSummarizedExperiment objects, especially in the context of microbiome analysis. Part of the mia family of R/Bioconductor packages.

Active · 12 · 1 week ago

Artistic-2.0

MsBackendMetaboLights

Infrastructure

Simulation-Based Inference

MetaboLights is one of the main public repositories for metabolomics experiments, storing both analysis results and raw data. The MsBackendMetaboLights package provides functionality to retrieve and represent mass spectrometry (MS) data from MetaboLights. Data files are downloaded and cached locally to avoid repeated downloads. MS data from metabolomics experiments can thus be integrated directly into R-based analysis workflows with the Spectra and MsBackendMetaboLights packages.

Active · 2 · 1 week ago

Artistic-2.0

sbi

Python package for simulation-based inference enabling likelihood-free Bayesian parameter estimation from scientific simulators, with flexible interfaces for neural posterior estimation, sequential methods, and MCMC/variational backends (Mackelab, 825+ stars)

Active · 847 · 1 week ago

propka

General Chemistry

Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure.

Active · 365 · 1 week ago

LGPL-2.1

GAIn (Genomic Annotation Infrastructure)

Genomics

GAIn is a platform for annotating genetic variants, genomic positions, and regions with reproducible, declarative pipelines using curated Genomic Resource Repositories.

Active · 0 · 1 week ago

atomate2

Simulations

atomate2 is a library of computational materials science workflows.

Active · 329 · 1 week ago

MSstatsResponse

Proteomics

Domain-Specific Research Agents

Tools for detecting drug-protein interactions and estimating IC50 values from chemoproteomics data. Implements semi-parametric isotonic regression, bootstrapping, and curve fitting to evaluate compound effects on protein abundance.

Active · 1 · 1 week ago

Artistic-2.0

Biomni

General-purpose biomedical AI agent integrating LLM reasoning with retrieval-augmented planning and code-based execution to autonomously execute diverse biomedical research tasks and generate testable hypotheses (Stanford SNAP, bioRxiv 2025)

Active · 3.4K · 1 week ago

Genomics & Bioinformatics

Dorado

Oxford Nanopore's official deep-learning basecaller for nanopore sequencing, converting raw electrical signals into DNA/RNA sequences with integrated modified-base (methylation) detection and efficient CPU/GPU inference; foundational tool for long-read genomics, epigenetics, and real-time sequencing analysis (nanoporetech, 846+ stars, actively maintained)

Active · 847 · 1 week ago

C++

Autonomous Research Systems (2023-2025 Breakthroughs)

EvoScientist

Self-evolving AI scientist with 6 specialized sub-agents (plan/research/code/debug/analyze/write) and persistent memory, #1 on DeepResearch Bench II and AstaBench, supporting multi-provider LLMs and multi-channel deployment (Apache 2.0, 2026)

Active · 4.2K · 1 week ago

Neuroscience & Behavioral Analysis

DeepLabCut

Markerless pose estimation of user-defined features with deep learning for all animals including humans, enabling quantitative behavioral analysis in neuroscience and ethology (Nature Neuroscience 2018, 5.6K+ stars)

Active · 5.7K · 1 week ago

LGPL-3.0

ChemML

Machine Learning

ChemML is a machine learning and informatics program suite for the analysis, mining, and modeling of chemical and materials data (based on TensorFlow).

Active · 178 · 1 week ago

TranscriptoScope

Transcriptomics

Local Windows-friendly R Shiny application for RNA-seq differential expression using DESeq2, normalized-expression testing, over-representation analysis, fgsea-ranked pathway analysis, and WGCNA coexpression-network analysis. It supports input validation, additive and interaction designs, built-in human, fruit-fly, and yeast annotations, publication-quality plots, and reproducibility bundles containing results, settings, and executable R and R Markdown rerun code.

Active · 1 · 1 week ago

bcftools

Variant Calling

samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants.

Active · 880 · 1 week ago

NeuralForceField

Force Fields

Neural Network Force Field based on PyTorch.

Active · 293 · 1 week ago

Jupyter Notebook

AlpsNMR

Software

Reads Bruker NMR data directories, both zipped and unzipped. It provides automated and efficient signal processing for untargeted NMR metabolomics. It can interpolate the samples, detect outliers, exclude regions, normalize, detect peaks, align the spectra, integrate peaks, manage metadata, and visualize the spectra. After spectra processing, it can apply multivariate analysis to the extracted data. Efficient plotting of 1-D data is also available, as is basic reading of 1D ACD/Labs exported JDX samples.

Active · 17 · 1 week ago

GeoAI

Climate Modeling

High-level open-source geospatial AI package for satellite/aerial imagery analysis, model training, inference, interactive visualization, and QGIS integration, bridging PyTorch/Transformers with remote sensing workflows (MIT, 2026)

Active · 3.2K · 1 week ago

Physics-Informed Neural Networks

NeuralPDE.jl

Physics-informed neural networks in Julia

Active · 1.2K · 2 weeks ago

Julia

FlowVision

Cytometry

FlowVision is offline flow cytometry analysis software for Windows and macOS. It supports FCS 2.0, 3.0, 3.1 and 3.2 file formats, polygon/rectangle/ellipse/quadrant gating with auto-fit (snap to cluster), spillover compensation, biexponential and hyperlog scales, MFI statistics (median, geometric mean, CV%), multi-file batch analysis with per-file gate overrides, and hierarchical gating. Spectral unmixing supports linear, NNLS, and Poisson-weighted least squares algorithms, with autofluorescence extraction and spillover spreading matrix. UMAP dimensionality reduction with reproducible seed and landmark mode for high-parameter panels. Imports FlowJo .wsp (compensation matrix) and exports gates to FlowJo .wsp and Gating-ML 2.0 (ISAC open standard) for interoperability with FlowJo, R/flowWorkspace/CytoML, and FCS Express.

Active · 0 · 2 weeks ago

iofcore

Active · 136 · 2 weeks ago

Ruby

HIBAG

Genetics

Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

Active · 31 · 2 weeks ago

GPL-3.0
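
The attribute bagging idea in the HIBAG description can be illustrated with a small sketch. The Python below is not HIBAG's implementation or API. It is a toy ensemble, assuming a nearest-centroid base learner and invented genotype-like data, in which each member is trained on a bootstrap sample of the rows and a random subset of the attributes, and predictions are combined by majority vote. All function names are hypothetical.

```python
import random
from collections import Counter


def train_attribute_bagging(X, y, n_classifiers=25, n_attributes=2, seed=0):
    """Train an ensemble; each member sees a bootstrap sample and a random attribute subset."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    ensemble = []
    for _ in range(n_classifiers):
        # Bootstrap aggregating: resample the training rows with replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random variable selection: keep only a random subset of the attributes.
        attrs = rng.sample(range(n_features), n_attributes)
        # Base learner (an assumption for this sketch): per-class centroids.
        sums, counts = {}, Counter()
        for i in rows:
            label = y[i]
            counts[label] += 1
            acc = sums.setdefault(label, [0.0] * n_attributes)
            for k, a in enumerate(attrs):
                acc[k] += X[i][a]
        centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        ensemble.append((attrs, centroids))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the members, each using only its own attribute subset."""
    votes = Counter()
    for attrs, centroids in ensemble:
        point = [x[a] for a in attrs]
        nearest = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        votes[nearest] += 1
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: rows are samples, columns are SNP-like dosages.
X = [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0],
     [2, 2, 1, 2], [2, 1, 2, 2], [1, 2, 2, 2]]
y = ["A", "A", "A", "B", "B", "B"]

ensemble = train_attribute_bagging(X, y)
print(predict(ensemble, [0, 0, 0, 0]))
```

Because every member votes from a different view of the data, a single noisy attribute or an unlucky resample affects only part of the ensemble, which is the stability benefit the description refers to.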

Quantities, Units, Dimensions, and Types Ontology

Ontologies that aim to provide semantic specifications for units of measure, quantity kind, dimensions and data types.

Active · 158 · 2 weeks ago

HTML

litert-community/MedGemma-1.5-4B-IT

by litert-community

Active · 468 · 2 weeks ago

dron

Active · 10 · 2 weeks ago

Makefile

CC-BY-4.0

SeqAn

Package suites

The modern C++ library for sequence analysis.

Active · 458 · 2 weeks ago

C++

SPALN

Mapping

Genome mapping and spliced alignment of cDNA or amino acid sequences

Active · 114 · 2 weeks ago

C++

GPL-2.0

TeachOpenCADD

Courses

A teaching platform for computer-aided drug design (CADD) using open source packages and data.

Active · 1K · 2 weeks ago

Jupyter Notebook

CC-BY-4.0

VIVO Ontology

An ontology about scholarship

Active · 16 · 2 weeks ago

Unlicense

BiocCheck

Infrastructure

Medical AI & Clinical Applications

BiocCheck guides maintainers through Bioconductor best practices. It runs Bioconductor-specific package checks by searching through package code, examples, and vignettes. Maintainers are required to address all errors, warnings, and most notes produced.

Active · 10 · 2 weeks ago

VoxTell (MIC-DKFZ, 2025)

Free-text promptable universal 3D medical image segmentation foundation model enabling zero-shot segmentation of diverse anatomical structures and pathologies via natural language prompts across CT, MRI, and other volumetric imaging modalities (DKFZ, 195+ stars, Apache 2.0)

Active · 242 · 2 weeks ago

mlinslab/neurovfm-llm

by mlinslab

image-text-to-text

Active · 25 · 2 weeks ago

mlinslab/neurovfm-encoder

by mlinslab

image-feature-extraction

Active · 646 · 2 weeks ago

edm.fibo

Active · 613 · 2 weeks ago

Shell

Neuroscience & Behavioral Analysis

braindecode

Deep learning software to decode EEG, ECG or MEG signals, providing standardized neural network models, preprocessing pipelines, and evaluation workflows for brain-computer interfaces and cognitive neuroscience research (1.2K+ stars, BSD 3-Clause, actively maintained)

Active · 1.3K · 2 weeks ago

High-Performance Document Processing

MinerU (2024/2025)

State-of-the-art multimodal document parser with 1.2B parameters, described as outperforming GPT-4o; converts PDFs to LLM-ready Markdown/JSON

Active · 74.3K · 2 weeks ago

HarshBhanushali7705/medgemma-27b-text-it-GPTQ-4bit

by HarshBhanushali7705

This repository contains a 4-bit GPTQ quantized version of google/medgemma-27b-text-it, optimized for high-throughput inference using vLLM and the Marlin kernel.

Active · 346 · 2 weeks ago

alimotahharynia/DrugGen-2

by alimotahharynia

DrugGen 2: A disease-aware language model for enhancing drug discovery. DrugGen-2 is a disease-aware language model specialized for generating drug-like SMILES structures based on both disease pathways and protein sequence.

Active · 680 · 2 weeks ago

CladeTeam/CENO-P-1B

by CladeTeam

CENO-P-1B is the multi-species alignment (MSA) post-trained variant of the 1B CENO DNA foundation model, for variant effect prediction (VEP). It carries intraencodingpattern in its config and ships the MSA scoring path (modelingcenop.py), which consumes a per-token seq_idx to score packed MSA…

Active · 456 · 2 weeks ago

CladeTeam/CENO-1B-131k

by CladeTeam

CENO-1B-131k is the long-context (131k) checkpoint of the 1B CENO DNA foundation model — a causal language model over genomic sequence built on a Nemotron-H Mamba / Attention / Mixture-of-Experts hybrid backbone (no MSA inputs).

Active · 452 · 2 weeks ago

CladeTeam/CENO-80M-1m

by CladeTeam

CENO-80M-1m is the long-context (1M) checkpoint of the 80M CENO DNA foundation model — a causal language model over genomic sequence built on a Nemotron-H Mamba / Attention / Mixture-of-Experts hybrid backbone (no MSA inputs).

Active · 394 · 2 weeks ago

figtracer

Workflows

Plain-text, git-tracked electronic lab notebook (ELN) for reproducible bioinformatics — threads your R & Python figures into living lab notes with full provenance. Built for single-cell / CyTOF / flow cytometry; works with Obsidian, Quarto & Jupyter.

Active · 0 · 2 weeks ago

Interactive Research Environments

K-Dense BYOK

Free, open-source desktop AI research assistant that runs locally and turns natural-language requests into real data analysis, literature search, figure generation, and manuscript review; ships with 149 scientific skills, 326 workflow templates, and 229 databases across genomics, proteomics, drug discovery, and materials science, plus a living lab notebook, 60+ scientific file previews, and LaTeX editing (K-Dense-AI, 908+ stars, MIT License, 2026)

Active · 908 · 2 weeks ago

TypeScript

SpaceMarkers

SingleCell

Spatial transcriptomic technologies have helped to resolve the connection between gene expression and the 2D orientation of tissues relative to each other. However, their limited single-cell resolution makes it difficult to highlight the most important molecular interactions in these tissues. SpaceMarkers, an R/Bioconductor package, helps find molecular interactions by identifying genes associated with latent space interactions in spatial transcriptomics.

Active · 8 · 2 weeks ago

FAIR Cookbook

Active · 149 · 2 weeks ago

JavaScript

NVIDIA Earth-2

Climate Modeling

World's first fully open, accelerated weather AI software stack with Medium Range forecasting and Nowcasting models using generative AI (January 2026)

Active · 1K · 2 weeks ago

hypeR

GeneSetEnrichment

Medical AI & Clinical Applications

An R Package for Geneset Enrichment Workflows.

Active · 79 · 2 weeks ago

GPL-3.0

napari

Fast, interactive, multi-dimensional image viewer for Python, foundational platform for scientific imaging AI with a rich plugin ecosystem integrating deep learning segmentation, object tracking, and microscopy analysis workflows (2.6K+ stars)

Active · 2.7K · 2 weeks ago

nvMolKit (NVIDIA BioNeMo, 2025)

Protein & Drug Discovery

High-performance, GPU-accelerated library for key computational chemistry tasks including molecular similarity, conformer generation, and geometry relaxation, designed to accelerate drug-discovery and molecular-modeling workflows (264+ stars, Apache 2.0)

Active · 264 · 2 weeks ago

Cuda

Dama12/yolov10x-onion-disease-detection

by Dama12

object-detection