Find open-source science resources

Open source PEM (Proton Exchange Membrane) fuel cell simulation tool.

Active · 229 · 1 month ago

Rust-Bio

Package suites

Rust implementations of algorithms and data structures useful for bioinformatics.

Active · 1.8K · 1 month ago

Rust

segment-geospatial

Climate Modeling

Python package for segmenting geospatial data with the Segment Anything Model (SAM), enabling zero-shot object segmentation in satellite and aerial imagery for remote sensing and Earth observation (MIT, 4k+ stars)

Active · 4K · 1 month ago

ColabFold (2025 Updates)

AlphaFold/ESMFold accessible implementation with AF3 JSON export, database updates

Active · 2.8K · 1 month ago

Jupyter Notebook

Slides & Presentation Generation

SlideDeck AI

Co-create PowerPoint presentations with Generative AI from documents or topics

Active · 360 · 1 month ago

Remote Sensing & Geospatial AI

TorchGeo

PyTorch domain library for geospatial deep learning providing standardized datasets, samplers, transforms, and pre-trained models for remote sensing, land cover mapping, and environmental monitoring (Microsoft, 4K+ stars)

Active · 4.1K · 1 month ago

DCAT-AP conversion to LinkML Schema

schema

Database

The DCAT-AP conversion to a LinkML Schema is the intended point of truth for the DCAT-AP+ schema, but could be used alternatively as a LinkML representation of DCAT-AP for other Projects. It is a port of DCAT-AP to the LinkML world that is as faithful to the original as possible. This Persistent Identifier does not only provide the SHACL Shape, but could also be used as described [here](https://github.com/perma-id/w3id.org/tree/cecbc2e5f40d928f05ed5306d24fc60db0e7bb21/nfdi-de/dcat-ap-plus). DCAT-AP+ is a [LinkML](https://linkml.io/)-based extension of the [DCAT Application Profile 3.0](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) that adds a provenance layer for describing how a dataset was generated and what it is about, using the [Starting Point Terms of PROV-O](https://www.w3.org/TR/prov-o/#description-starting-point-terms), the [QUDT ontology](https://www.qudt.org/), and [Dublin Core Terms](http://purl.org/dc/terms/).

Active · 11 · 1 month ago

Hail

Data Analysis

Scalable genomic analysis.

Active · 1.1K · 1 month ago

DenoIST

DenoIST identifies and removes contamination in image-based spatial transcriptomics data, using a transposed Poisson mixture model with local neighbourhood offsets to infer which gene counts are likely due to neighbourhood contamination rather than endogenous expression.

Active · 9 · 1 month ago

PyTorch Geometric

Specialized Frameworks

Graph neural network library for PyTorch enabling molecular modeling, materials discovery, protein interaction networks, and scientific knowledge graph learning (23.7k+ stars)

Active · 23.9K · 1 month ago

linguist

Database

Active · 13.6K · 1 month ago

Ruby

MeLSI

MeLSI (Metric Learning for Statistical Inference) is a novel machine learning method for microbiome data analysis that learns optimal distance metrics to improve statistical power in detecting group differences. Unlike traditional distance metrics (Bray-Curtis, Euclidean, Jaccard), MeLSI adapts to the specific characteristics of your dataset to maximize separation between groups. The method uses an ensemble of weak learners to identify which microbial features drive group differences, providing both improved statistical power and biological interpretability through feature importance weights.

Active · 1 · 1 month ago

scifer

Preprocessing

Have you ever index-sorted cells into a 96- or 384-well plate and then sequenced them by Sanger sequencing? If so, you have probably struggled either to check the electropherogram of each sequenced cell manually or to identify which cell was sorted where after sequencing the plate. scifer was developed to solve this problem by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, 'fasta' files, and electropherograms for visual inspection, and can generate reports.

Active · 7 · 1 month ago

sparrow

GeneSetEnrichment

Provides a unified interface to a variety of GSEA techniques from different bioconductor packages. Results are harmonized into a single object and can be interrogated uniformly for quick exploration and interpretation of results. Interactive exploration of GSEA results is enabled through a shiny app provided by a sparrow.shiny sibling package.

Active · 23 · 1 month ago

High-Performance Document Processing

MinerU-Diffusion (OpenDataLab, ECCV 2026)

Diffusion-based document OCR framework replacing autoregressive decoding with block-level parallel diffusion decoding, enabling high-accuracy text recognition in scientific PDFs (613+ stars, MIT License)

Active · 613 · 1 month ago

Social Science Research & Simulation

EDSL

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs (460+ stars, 2024)

Active · 468 · 1 month ago

compareMS2

Phylogeny

compareMS2 is a tool for comparing sets of (tandem) mass spectra for clustering samples, molecular phylogenetics, identification of biological species or tissues, and quality control. compareMS2 currently consumes Mascot Generic Format, or MGF, and produces output in a variety of common image and distance matrix formats.

Active · 4 · 1 month ago

JavaScript

pairedGSEA

DifferentialExpression

pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differential Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investigation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionality for plotting the results.

Active · 4 · 1 month ago

TorchSim

Specialized Frameworks

PyTorch-native atomistic simulation engine for the machine-learned interatomic potential (MLIP) era, enabling batched molecular dynamics and structural relaxation with automatic GPU memory management; supports MACE, Fairchem, SevenNet, ORB, MatterSim and other popular MLIPs with up to 100x speedup over ASE (Radical AI, AI for Science 2026, 468+ stars, MIT License)

Active · 469 · 1 month ago

MatterSim

Materials Discovery

Deep learning atomistic model across elements, temperatures, and pressures

Active · 570 · 1 month ago

CrcBiomeScreen

A reproducible, benchmarked machine learning framework for microbiome-based colorectal cancer (CRC) screening, developed by systematically evaluating normalization strategies, taxonomic resolutions, and class imbalance handling. This R package allows users to apply the full pipeline or selectively run specific components depending on their analytical needs. It establishes a scalable foundation for developing interpretable microbiome-based screening tools to support early CRC detection. This approach could be easily implemented in a national screening programme to improve early detection rates for this disease.

Active · 0 · 1 month ago

GEOquery

Microarray

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.

Active · 115 · 1 month ago

MegaFold

Cross-platform system optimizations for accelerating AlphaFold3 training with 1.73x speedup and 1.23x memory reduction

Active · 71 · 1 month ago

pymatviz

General Chemistry

A toolkit for visualizations in materials informatics.

Active · 318 · 1 month ago

Autonomous Research Systems (2023-2025 Breakthroughs)

RD-Agent (Microsoft)

Open-source LLM-powered R&D agent framework automating data-driven AI solution building through automated research, development, and evolution; achieves top open-source performance on MLE-Bench with dual Researcher-Developer agents and supports research copilot, data mining, Kaggle, and quant R&D workflows (13.6K+ stars, MIT License, 2025-2026)

Active · 13.6K · 1 month ago

mint

Learning the language of protein-protein interactions

Active · 150 · 1 month ago

AQME

General Chemistry

Ensemble of automated QM workflows that can be run through Jupyter notebooks, the command line, and YAML files.

Active · 127 · 1 month ago

Genomics & Bioinformatics

GENERanno (bioRxiv 2025)

Genomic foundation model for metagenomic and genome annotation, featuring an 8k base-pair context and 500M parameters trained on 386B base pairs of eukaryotic DNA; provides expert models and a unified CLI for prokaryotic/eukaryotic coding-sequence annotation with strong performance on Genomic Benchmarks, Nucleotide Transformer tasks, and custom Gener tasks (GenerTeam, 314+ stars, MIT License)

Active · 314 · 1 month ago

ModelAngelo

Automatic atomic model building program for cryo-EM maps using deep learning, enabling rapid de novo protein structure determination from electron density with high accuracy (3DEM/EMBL, 169+ stars)

Active · 171 · 1 month ago

Pepkio Bio Unit Converter

Molecular biology

Performs laboratory unit conversions across molarity, OD600 cell density, C₁V₁ dilution, and related dimensional pairs from mass, volume, molecular weight, and organism-specific OD factors. A browser calculator combines four modes in one tabbed workspace with compound MW lookup, species-aware OD uncertainty ranges, cross-tab chaining, and shareable links; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted use. Calculator arithmetic for the API client is hosted remotely; the client transmits conversion inputs and returns structured results and shareable run identifiers.

Active · 1 · 1 month ago
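
Two of the relations named in the entry above, molarity from mass and the C₁V₁ = C₂V₂ dilution rule, can be sketched in plain Python. This is an illustrative, standard-library sketch only: the function names are hypothetical, and it neither calls nor reproduces the Pepkio Tools API.

```python
def molarity(mass_g: float, mw_g_per_mol: float, volume_l: float) -> float:
    """Molar concentration (mol/L) from mass, molecular weight, and volume."""
    if mw_g_per_mol <= 0 or volume_l <= 0:
        raise ValueError("molecular weight and volume must be positive")
    return mass_g / (mw_g_per_mol * volume_l)


def stock_volume(c_stock: float, c_final: float, v_final: float) -> float:
    """Volume of stock needed for a dilution, from C1*V1 = C2*V2.

    Concentrations must share one unit; the result has the unit of v_final.
    """
    if c_stock <= 0:
        raise ValueError("stock concentration must be positive")
    if c_final > c_stock:
        raise ValueError("cannot dilute to a higher concentration")
    return c_final * v_final / c_stock


# Hypothetical usage: 58.44 g of a compound with MW 58.44 g/mol in 1 L,
# then a 10x stock diluted to 1x in a 100 mL final volume.
conc = molarity(58.44, 58.44, 1.0)
v1 = stock_volume(10.0, 1.0, 100.0)
diluent = 100.0 - v1
```

The diluent volume is simply the final volume minus the stock volume; OD600 conversions are left out because they depend on organism-specific factors that the entry does not list.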

cTRAP

DifferentialExpression

Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.

Active · 8 · 1 month ago

Pepkio Sequence Property Calculator

Molecular biology

Calculates sequence-derived molecular properties and related laboratory planning outputs from FASTA and assay setup inputs. The tool supports sequence analysis for DNA, RNA, and protein entries, plus dilution and ligation calculation modes through one API-backed workflow. Programmatic use is available through a Python library and command-line interface that submit run payloads and return structured result objects.

Active · 1 · 1 month ago

Pepkio RCF RPM Rotor Converter

Molecular biology

Translates between centrifuge RPM and relative centrifugal force using rotor geometry, reporting g-force or speed at rmin, ravg, and rmax. Convert mode handles rpm_to_rcf and rcf_to_rpm with rotor presets or manual radii in mm; transfer mode maps a source RPM on one rotor to an equivalent target RPM at matched rmax RCF; batch mode processes multiple spin steps from CSV or row arrays. A browser calculator and a Python library with command-line interface submit the same parameters to the Pepkio Tools API and return structured results with optional methods text and safety warnings.

Active · 1 · 1 month ago
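
The RPM/RCF relation behind the convert and transfer modes described above follows from circular motion: RCF = r·ω²/g, with ω = 2π·RPM/60. The sketch below is a minimal illustration under that textbook formula, with radii in mm as in the entry; the function names are hypothetical and this is not the Pepkio client or its API.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rpm_to_rcf(rpm: float, radius_mm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given radius."""
    omega = 2.0 * math.pi * rpm / 60.0          # angular speed, rad/s
    return (radius_mm / 1000.0) * omega ** 2 / STANDARD_GRAVITY


def rcf_to_rpm(rcf: float, radius_mm: float) -> float:
    """Rotor speed (RPM) needed to reach a given RCF at a given radius."""
    if radius_mm <= 0:
        raise ValueError("radius must be positive")
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_mm / 1000.0))
    return omega * 60.0 / (2.0 * math.pi)


def transfer_rpm(source_rpm: float, source_rmax_mm: float,
                 target_rmax_mm: float) -> float:
    """Target-rotor RPM that matches the source rotor's RCF at rmax."""
    return rcf_to_rpm(rpm_to_rcf(source_rpm, source_rmax_mm), target_rmax_mm)


# Hypothetical usage: 10,000 RPM on a rotor with rmax = 100 mm,
# transferred to a rotor with rmax = 400 mm.
rcf_at_rmax = rpm_to_rcf(10_000, 100.0)
equivalent_rpm = transfer_rpm(10_000, 100.0, 400.0)
```

Because RCF scales with radius and with the square of speed, quadrupling rmax halves the RPM needed for the same force at rmax.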

Pepkio Dose-Response Curve Fitter

Pharmacology

Performs batch four-parameter and five-parameter logistic regression on multi-compound concentration–response screens to estimate IC50, EC50, pIC50, Hill slope, and related potency metrics with per-compound QC grades. A browser calculator supports CSV upload, curve review, and figure export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits concentration–response data and returns structured fit results and shareable run identifiers.

Active · 2 · 1 month ago
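
For readers unfamiliar with the four-parameter logistic (4PL) model mentioned above, the sketch below evaluates one common parameterization for an inhibition curve and derives pIC50 from IC50. It is an assumption-laden illustration, not the tool's fitting code: the function names are hypothetical, the Hill slope is assumed positive, and no regression is performed here.

```python
import math


def four_pl(conc: float, bottom: float, top: float,
            ic50: float, hill: float) -> float:
    """Response at `conc` under a 4PL inhibition model.

    The response falls from `top` (conc = 0) toward `bottom` (high conc)
    and sits halfway between them at conc == ic50. Assumes hill > 0.
    """
    if conc < 0 or ic50 <= 0:
        raise ValueError("conc must be non-negative and ic50 positive")
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def pic50(ic50_molar: float) -> float:
    """pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("ic50 must be positive")
    return -math.log10(ic50_molar)


# Hypothetical compound: IC50 = 1 µM, Hill slope 1, response from 100 to 0.
half_response = four_pl(1e-6, bottom=0.0, top=100.0, ic50=1e-6, hill=1.0)
potency = pic50(1e-6)
```

A five-parameter variant adds an asymmetry exponent to the denominator; fitting either model to data would typically use a nonlinear least-squares routine, which is omitted here.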

epiregulon.extra

GeneRegulation

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expression.

Active · 0 · 1 month ago

Genomics & Bioinformatics

ChatSpatial

MCP server enabling spatial transcriptomics analysis via natural language, integrating 60+ methods including SpaGCN, Cell2location, LIANA+, CellRank for Visium, Xenium, MERFISH platforms

Active · 40 · 1 month ago

Genomics & Bioinformatics

gReLU (Genentech, 2024)

Python library to train, interpret, and apply deep learning models to DNA sequences, providing a unified framework for regulatory genomics with support for CNN and transformer architectures, variant effect prediction, and attribution analysis (325+ stars)

Active · 331 · 1 month ago

Graphormer

General-purpose deep learning backbone for molecular modeling

Active · 2.5K · 1 month ago

plotgardener

Visualization

Coordinate-based genomic visualization package for R. It grants users the ability to programmatically produce complex, multi-paneled figures. Tailored for genomics, plotgardener allows users to visualize large complex genomic datasets and provides exquisite control over how plots are placed and arranged on a page.

Active · 358 · 1 month ago

ReactomeGSA

GeneSetEnrichment

The ReactomeGSA package uses Reactome's online analysis service to perform a multi-omics gene set analysis. The main advantage of this package is that the retrieved results can be visualized using Reactome's powerful web application. Since Reactome's analysis service also uses R to perform the actual gene set analysis, you will get similar results when using the same packages (such as limma and edgeR) locally. Therefore, if you only require a gene set analysis, other packages are better suited.

Active · 33 · 1 month ago

BioEmu

Microsoft's generative model for sampling protein equilibrium conformations 100,000× faster than MD simulations, predicting domain motions, local unfolding and cryptic binding pockets on a single GPU (Science 2025)

Active · 836 · 1 month ago

Ibex

Implementation of the Ibex algorithm for single-cell embedding based on BCR sequences. The package includes a standalone function to encode BCR sequence information by amino acid properties or sequence order using a TensorFlow-based autoencoder. In addition, the package interacts with SingleCellExperiment or Seurat data objects.

Active · 27 · 1 month ago

Interactive Research Environments

ScholarAIO

Agent-agnostic research infrastructure providing AI agents with a structured scientific workspace for deep PDF parsing, hybrid semantic/keyword literature search, citation-graph analysis, topic discovery, and academic writing workflows; natively integrates with Claude Code, Codex, Cursor, Cline, and AgentSkills.io (530+ stars, MIT License, 2026)

Active · 530 · 1 month ago

Earth-Copilot

Climate Modeling

Microsoft's AI-powered geospatial Earth science application for natural-language exploration, visualization, and analysis of 130+ satellite collections, with STAC integration, multi-agent backend, MCP server, and deployable React/FastAPI stack (MIT, 2025)

Active · 169 · 1 month ago

fgsea

GeneExpression

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain more fine-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used.

Active · 445 · 1 month ago

sfi

MassSpectrometry

Data analysis for Single File Injections (SFIs) mode LC-MS analysis. In SFIs mode, pooled samples are initially injected to serve as reference peaks for subsequent analyses. Repeated injections of individual samples are then performed at fixed time intervals using isocratic elution. This package provides functions to analyze data from SFIs mode, including peak picking and peak reassignment.

Active · 1 · 1 month ago

MatterGen

Materials Discovery

Diffusion-based generative model for inorganic materials design, steering generation by chemistry, symmetry, bulk modulus, band gap, or magnetic properties, 2× more likely to produce stable novel structures than prior methods, experimentally validated with synthesized TaCr₂O₆ (Microsoft, Nature 2025)

Active · 1.7K · 1 month ago

Bedtools2

GFF BED File Utilities

A Swiss Army knife for genome arithmetic.

Active · 1K · 1 month ago

SpaceTrooper

SpaceTrooper performs quality control analysis of imaging-based spatial data using data-driven GLM models, providing exploration plots, QC metric computation, and outlier detection. It implements a GLM strategy for detecting low-quality cells in imaging-based spatial data (transcriptomics and proteomics). It also implements several plots for visualizing imaging-based polygons through the ggplot2 package.

Active · 11 · 1 month ago

Scientific Machine Learning Frameworks

SciMLBenchmarks.jl

Scientific machine learning benchmarks & differential equation solvers

Active · 344 · 1 month ago

MATLAB