Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

855 of 5,893 resources

Showing 51-100

Molecular dynamics in JAX

Active · 1.4K stars · 1 week ago
Jupyter Notebook
Apache-2.0

Segment Anything Model for microscopy: interactive and automatic segmentation of light, electron, and fluorescence microscopy images in 2D and 3D, with domain-specific fine-tuning workflows for scientific imaging (1.5K+ stars)

Active · 685 stars · 1 week ago
Jupyter Notebook
MIT

Deep learning software to decode EEG, ECG or MEG signals, providing standardized neural network models, preprocessing pipelines, and evaluation workflows for brain-computer interfaces and cognitive neuroscience research (1.2K+ stars, BSD 3-Clause, actively maintained)

Active · 1.2K stars · 1 week ago
Python
BSD-3-Clause

The package is a part of the gDR suite. It helps to prepare raw drug response data for downstream processing. It mainly contains helper functions for importing/loading/validating dose-response data provided in different file formats.

Active · 3 stars · 1 week ago
R

xgt is a command-line tool for programmatic access to the GTDB REST API. It provides four subcommands: search (genome queries with pagination), genome (cards, metadata, taxonomic history), taxon (lineage and genome set retrieval), and diff (per-rank taxonomic comparison between any two GTDB releases). All subcommands support batch input, JSON/CSV/TSV output, file splitting, and automatic retry. Implemented in Rust as a self-contained binary with no runtime dependencies.

Active · 30 stars · 1 week ago
Rust
Apache-2.0

Self-evolving AI scientist with 6 specialized sub-agents (plan/research/code/debug/analyze/write) and persistent memory, #1 on DeepResearch Bench II and AstaBench, supporting multi-provider LLMs and multi-channel deployment (Apache 2.0, 2026)

Active · 3.3K stars · 1 week ago
Python
Apache-2.0

atomate2 is a library of computational materials science workflows.

Active · 315 stars · 1 week ago
Python
NOASSERTION

High-level open-source geospatial AI package for satellite/aerial imagery analysis, model training, inference, interactive visualization, and QGIS integration, bridging PyTorch/Transformers with remote sensing workflows (MIT, 2026)

Active · 3.1K stars · 1 week ago
Python
MIT

General-purpose biomedical AI agent integrating LLM reasoning with retrieval-augmented planning and code-based execution to autonomously execute diverse biomedical research tasks and generate testable hypotheses (Stanford SNAP, bioRxiv 2025)

Active · 3.2K stars · 1 week ago
Python
Apache-2.0

SimBu can be used to simulate bulk RNA-seq datasets with known cell type fractions. You can either use your own single-cell study for the simulation or the sfaira database. Different pre-defined simulation scenarios exist, as do options to run custom simulations. Additionally, expression values can be adapted by adding an mRNA bias, which produces more biologically relevant simulations.

Active · 19 stars · 1 week ago
R
GPL-3.0

Markerless pose estimation of user-defined features with deep learning for all animals including humans, enabling quantitative behavioral analysis in neuroscience and ethology (Nature Neuroscience 2018, 5.6K+ stars)

Active · 5.7K stars · 1 week ago
Python
LGPL-3.0

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data, such as sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features, such as gene age estimation and core gene identification.

Active · 38 stars · 1 week ago
R
MIT

The miaViz package implements functions to visualize TreeSummarizedExperiment objects, especially in the context of microbiome analysis. Part of the mia family of R/Bioconductor packages.

Active · 12 stars · 1 week ago
R
Artistic-2.0

Simulations of spiking neural networks.

Active · 1.2K stars · 1 week ago
Python
NOASSERTION

SOTA multimodal document parsing with 1.2B parameters outperforming GPT-4o, converts PDFs to LLM-ready Markdown/JSON

Active · 65.9K stars · 1 week ago
Python
NOASSERTION

A haplotype-resolved assembler for accurate HiFi reads.

Active · 779 stars · 1 week ago
C++
MIT

Interactive explorer for single-cell transcriptomics data enabling visualization of UMAP/t-SNE embeddings, differential expression analysis, and cross-dataset comparison through a fast web-based interface; widely adopted for exploring atlas-scale single-cell datasets and integrating with AI/ML analysis workflows (773+ stars, MIT License)

Active · 773 stars · 2 weeks ago
JavaScript
MIT

Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation.

Active · 151 stars · 2 weeks ago
R
Other

Extends beachmat to initialize tatami matrices from TileDB-backed arrays. This allows C++ code in downstream packages to directly call the TileDB C/C++ library to access array data, without the need for block processing via DelayedArray. Developers only need to import this package to automatically extend the capabilities of beachmat::initializeCpp to TileDBArray instances.

Active · 0 stars · 2 weeks ago
R
GPL-3.0

World's first fully open, accelerated weather AI software stack with Medium Range forecasting and Nowcasting models using generative AI (January 2026)

Active · 959 stars · 2 weeks ago
Python
Apache-2.0

A teaching platform for computer-aided drug design (CADD) using open source packages and data.

Active · 1K stars · 2 weeks ago
Jupyter Notebook
CC-BY-4.0

Fast, interactive, multi-dimensional image viewer for Python, foundational platform for scientific imaging AI with a rich plugin ecosystem integrating deep learning segmentation, object tracking, and microscopy analysis workflows (2.6K+ stars)

Active · 2.7K stars · 2 weeks ago
Python
BSD-3-Clause

Tool for converting raw DNA data files between 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA formats.

Active · 1 star · 2 weeks ago
PHP
MIT

Spatial transcriptomic technologies have helped to resolve the connection between gene expression and the 2D orientation of tissues relative to each other. However, the limited single-cell resolution makes it difficult to highlight the most important molecular interactions in these tissues. SpaceMarkers, an R/Bioconductor package, helps find molecular interactions by identifying genes associated with latent space interactions in spatial transcriptomics.

Active · 8 stars · 2 weeks ago
R
MIT

Geneset Ordinal Association Test Enrichment Analysis (GOATEA) provides a 'Shiny' interface with interactive visualizations and utility functions for performing and exploring automated gene set enrichment analysis using the 'GOAT' package. 'GOATEA' is designed to support large-scale and user-friendly enrichment workflows across multiple gene lists and comparisons, with flexible plotting and output options. Visualizations pre-enrichment include interactive 'Volcano' and 'UpSet' (overlap) plots. Visualizations post-enrichment include interactive geneset dotplot, geneset treeplot, gene-effectsize heatmap, gene-geneset heatmap and 'STRING' database of protein-protein-interactions network graph. 'GOAT' reference: Frank Koopmans (2024) <doi:10.1038/s42003-024-06454-5>.

Active · 2 stars · 2 weeks ago
R
Apache-2.0+

Toolkit for large-scale whole-slide image processing supporting 22+ patch encoders (UNI, CONCH, Virchow, H-Optimus-0, etc.), slide encoders (TITAN, GigaPath, PRISM, CHIEF, Madeleine, Feather), tissue segmentation, and multi-GPU inference with end-to-end pipeline and smart resume for standardized deployment of computational pathology foundation models (Mahmood Lab, Harvard Medical School, 553+ stars)

Active · 567 stars · 2 weeks ago
Python
NOASSERTION

Python package for simulation-based inference enabling likelihood-free Bayesian parameter estimation from scientific simulators, with flexible interfaces for neural posterior estimation, sequential methods, and MCMC/variational backends (Mackelab, 825+ stars)

Active · 828 stars · 2 weeks ago
Python
Apache-2.0

A Flexible Model For Record Linkage

Active · 1 star · 2 weeks ago
C++
GPL-3.0-or-later

First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)

Active · 4K stars · 2 weeks ago
Python
MIT

Vision foundation model for the tree of life, pretrained on diverse biological imagery across taxa for zero-shot species identification, trait extraction, and biodiversity research (Ohio State University Imageomics Institute)

Active · 259 stars · 2 weeks ago
Python
NOASSERTION

Highly scalable equivariant deep learning interatomic potentials enabling million-atom molecular dynamics simulations with ab initio accuracy, building on E(3)-equivariant architectures for large-scale atomistic modeling (mir-group, MIT License, 480+ stars)

Active · 482 stars · 2 weeks ago
Python
MIT

Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science with 140+ ready-to-use skills and 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Antigravity, and the open Agent Skills standard (K-Dense-AI, 26K+ stars, 2025)

Active · 26.5K stars · 2 weeks ago
Python
MIT

Biological simulation tools

Active · 15 stars · 2 weeks ago
Python
MIT

FlowVision is offline flow cytometry analysis software for Windows and macOS. It supports FCS 2.0, 3.0, 3.1 and 3.2 file formats, polygon/rectangle/ellipse/quadrant gating with auto-fit (snap to cluster), spillover compensation, biexponential and hyperlog scales, MFI statistics (median, geometric mean, CV%), multi-file batch analysis with per-file gate overrides, and hierarchical gating. Spectral unmixing supports linear, NNLS, and Poisson-weighted least squares algorithms, with autofluorescence extraction and spillover spreading matrix. UMAP dimensionality reduction with reproducible seed and landmark mode for high-parameter panels. Imports FlowJo .wsp (compensation matrix) and exports gates to FlowJo .wsp and Gating-ML 2.0 (ISAC open standard) for interoperability with FlowJo, R/flowWorkspace/CytoML, and FCS Express.

Active · 0 stars · 2 weeks ago
JavaScript
Proprietary

MS-based metabolomics data processing and compound annotation pipeline.

Active · 15 stars · 2 weeks ago
R
GPL-2.0+

A multitude of tools for comparative genomics, focused on large-scale analyses of biological data. SynExtend includes tools for working with syntenic data, clustering massive network structures, and estimating functional relationships among genes.

Active · 1 star · 2 weeks ago
R
GPL-3.0

197 bioinformatics and life science skills for Claude Code and AI agents, achieving 92.0% accuracy on BixBench. Covers RNA-seq, single-cell analysis, drug discovery, proteomics, and more. Powers OmicsHorizon (195+ stars, 2026)

Active · 195 stars · 2 weeks ago
Python
NOASSERTION

This package is part of the gDR suite. It reexports functions from other packages in the gDR suite that contain critical processing functions and utilities. The vignette walks through the full processing pipeline for drug response analyses that the gDR suite offers.

Active · 2 stars · 2 weeks ago
R
Artistic-2.0

A small language for defining pipeline stages and linking them together to make pipelines.

Active · 242 stars · 2 weeks ago
Groovy
NOASSERTION

A quantum chemistry package written in Python.

Active · 77 stars · 2 weeks ago
Python
Apache-2.0

With the dedicated fortify method implemented for the flowSet, ncdfFlowSet and GatingSet classes, both raw and gated flow cytometry data can be plotted directly with ggplot. The ggcyto wrapper and some custom layers also make it easy to add gates and population statistics to the plot.

Active · 65 stars · 2 weeks ago
R
Other

98B-parameter frontier generative model jointly reasoning over protein sequence, structure, and function, trained on 2.78 billion proteins; generated a novel fluorescent protein (esmGFP) with only 58% sequence identity to known GFPs (EvolutionaryScale, 2024)

Active · 2.4K stars · 2 weeks ago
Jupyter Notebook
NOASSERTION

Generalist deep learning algorithm for cell and nucleus segmentation across diverse image types, with human-in-the-loop training (2.0) and one-click image restoration (3.0), 70K+ training objects (Nature Methods 2021/2022/2025)

Active · 2.2K stars · 2 weeks ago
Python
BSD-3-Clause

The modern C++ library for sequence analysis.

Active · 454 stars · 2 weeks ago
C++
NOASSERTION

Structural variant discovery by integrated paired-end and split-read analysis.

Active · 521 stars · 2 weeks ago
C++
BSD-3-Clause

Provides univariate and multivariate statistics for feature prioritization in untargeted LC-MS metabolomics research.

Active · 0 stars · 2 weeks ago
R
MIT

Parallel Computing and Scientific Machine Learning: MIT 18.337J/6.338J course materials (1.9k+ stars)

Active · 2K stars · 2 weeks ago
HTML

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expression.

Active · 28 stars · 2 weeks ago
R
MIT

Machine learning and statistical learning for neuroimaging in Python, providing easy-to-use tools for fMRI and MRI analysis including decoding, connectivity estimation, and parcellation with seamless scikit-learn integration (INRIA Parietal team, 1.4K+ stars)

Active · 1.4K stars · 2 weeks ago
Python
BSD-3-Clause

NVIDIA's open-source platform for building and adapting biological AI models at scale, bundling ESM-2, Geneformer, MolMIM and DNA embedding models with recipes for single-GPU to multi-node training (2025)

Active · 753 stars · 2 weeks ago
Jupyter Notebook