Find open-source science resources

Arc Institute's 40B-parameter genome foundation model trained on 9 trillion nucleotides from all domains of life, supporting 1M base pair context for generalist DNA/RNA/protein prediction and design (Nature 2026)

HyenaDNA

Long-range genomic foundation model using subquadratic Hyena operators instead of Transformer attention, enabling context lengths up to 1 million nucleotides for chromosome-scale DNA sequence modeling and downstream genomics tasks (Stanford Hazy Research, NeurIPS 2023, 784+ stars, Apache 2.0)

Caduceus (ICML 2024)

Bi-directional DNA language model based on the Mamba state space architecture, enabling efficient long-range genomic sequence modeling with linear-time complexity and built-in reverse-complement equivariance; achieves strong performance on chromatin accessibility, enhancer, and promoter prediction benchmarks (Cornell, Princeton & CMU, 500+ stars)

CodonFM (NVIDIA)

Family of codon-resolution language models trained on 130 million protein-coding sequences from over 20,000 species, enabling cross-species gene expression prediction and codon-level functional genomics (2025)

LucaOne

Generalized biological foundation model with unified nucleic acid and protein language, integrating DNA/RNA/protein sequences (Nature Machine Intelligence 2025)

Nicheformer

Foundation model jointly trained on single-cell and spatial transcriptomics data, enabling unified representation learning across cellular and tissue spatial contexts for cell type prediction, spatial domain inference, and cross-modal integration (theislab, bioRxiv 2024, 164+ stars)

scFoundation

100M-parameter foundation model pretrained on 50M+ human single-cell transcriptomes covering ~20,000 genes, achieving SOTA on gene expression enhancement, drug response and perturbation prediction (Nature Methods 2024)

Stack

Arc Institute's single-cell foundation model enabling in-context learning at inference time via a novel tabular attention architecture, trained on 150M uniformly-preprocessed cells for generalizing biological effects and generating unseen cell profiles in novel contexts (2025)

GEARS

Geometric deep learning model predicting transcriptional outcomes of novel single- and multi-gene perturbations using gene–gene knowledge graphs, with 40% higher precision than prior methods on combinatorial perturbation prediction (Stanford, Nature Biotechnology 2024)

ChatSpatial

MCP server enabling spatial transcriptomics analysis via natural language, integrating 60+ methods, including SpaGCN, Cell2location, LIANA+, and CellRank, for the Visium, Xenium, and MERFISH platforms

Enformer

Gene expression prediction

DNABERT

DNA sequence analysis

OpenCRISPR

First open-source AI-generated gene editing systems developed with protein language models, enabling programmable CRISPR-Cas nucleases for synthetic biology and therapeutic genome editing (Profluent, 2024)

SLEAP

Neuroscience & Behavioral Analysis

Deep learning-based multi-animal pose tracking and behavior classification, enabling automated quantification of social interactions and collective behavior across species (Nature Methods 2022, 2.2K+ stars)

CEBRA (Nature 2023)

Neuroscience & Behavioral Analysis

Learnable latent embeddings for joint behavioral and neural analysis, enabling consistent and interpretable mapping of neural activity to behavior across modalities, species, and experiments (EPFL & Harvard, 1K+ stars)

TRIBE v2

Neuroscience & Behavioral Analysis

Meta FAIR's foundation model of vision, audition, and language for in-silico neuroscience, predicting fMRI brain responses to naturalistic multimodal stimuli (video, audio, text) through unified Transformer architecture mapped to the cortical surface (2026)

Prov-GigaPath (Nature 2024)

Computational Pathology & Digital Pathology

Whole-slide pathology foundation model trained on 1.3 billion image tiles from 171K slides using a LongNet-based architecture to encode gigapixel-scale WSIs for cancer subtyping and biomarker prediction (Microsoft Research & Providence, 601+ stars)

CONCH (Nature Medicine 2024)

Computational Pathology & Digital Pathology

Vision-language pathology foundation model using contrastive learning on histopathology image-text pairs, enabling zero-shot classification, slide-level retrieval, and multimodal reasoning across diverse cancer types (Mahmood Lab, 494+ stars)

PathChat (Nature 2024)

Multimodal generative AI assistant for computational pathology enabling interactive visual-language conversations over histopathology images for diagnostic reasoning, case discussion, and education, built on a Mistral-7B backbone with domain-specific fine-tuning (Mahmood Lab, Harvard Medical School, 1.2K+ stars)

MedSAM2

Segment Anything in 3D medical images and videos, extending SAM2 to volumetric and temporal medical imaging with state-of-the-art zero-shot segmentation performance across CT, MRI, and surgical video (arXiv 2025)

MedAgents

Multi-disciplinary collaboration framework for zero-shot medical reasoning using role-playing LLM agents (ACL 2024)

MedRAG

Systematic medical RAG toolkit for question answering over PubMed, StatPearls, textbooks, and Wikipedia, supporting multiple retrievers, domain LLMs, and follow-up-query workflows for benchmarked clinical/biomedical QA (ACL Findings 2024)

NVIDIA Biomedical AI-Q Research Agent

Deployable biomedical deep-research agent blueprint combining on-prem multimodal RAG, report generation, human-in-the-loop editing, and virtual screening with MolMIM and DiffDock for drug discovery workflows (2025)

nnU-Net

Self-configuring deep learning framework for semantic segmentation of biomedical images requiring no manual hyperparameter tuning; automatically adapts preprocessing, network topology, and training parameters to achieve state-of-the-art results across 120+ international competitions and benchmarks out-of-the-box (DKFZ, Nature Methods 2021, 8.3K+ stars)

TotalSegmentator

Robust deep learning-based segmentation of >100 anatomical structures in CT and MR images, built on nnU-Net and widely adopted in clinical radiology and surgical planning workflows (2.6K+ stars)

LLM4Chemistry

LLM for Chemistry

Curated paper list about LLMs for chemistry covering fine-tuning, reasoning, multi-modal models, agents, and benchmarks (COLING 2025)

GNoME

DeepMind's graph neural network for materials exploration, which discovered 2.2M new crystal structures (380K of them the most stable), equivalent to 800 years of traditional research, with a 520K+ materials dataset open-sourced (Nature 2023)

FAIRChem (OMat24)

Meta's comprehensive ML ecosystem for materials and chemistry, with 118M+ DFT calculations and EquiformerV2 models achieving top Matbench Discovery performance

JARVIS

NIST's open-source platform for data-driven atomistic materials design, integrating DFT datasets (JARVIS-DFT), machine learning property prediction (JARVIS-ML), and a comprehensive leaderboard for benchmarking materials AI methods across the periodic table (384+ stars)

SchNetPack

PyTorch toolkit for deep neural networks in atomistic simulations, implementing SchNet, DimeNet++, PaiNN, and GemNet for molecular dynamics and quantum chemistry (900+ stars)

pymatgen

Python Materials Genomics: robust materials analysis library defining classes for structures and molecules with support for many electronic structure codes; foundational toolkit powering the Materials Project (Berkeley Lab, 1.8K+ stars)

MatterGen

Diffusion-based generative model for inorganic materials design that steers generation by chemistry, symmetry, bulk modulus, band gap, or magnetic properties; 2× more likely to produce stable novel structures than prior methods, and experimentally validated with synthesized TaCr₂O₆ (Microsoft, Nature 2025)

MatterSim

Deep learning atomistic model across elements, temperatures, and pressures

ORB

Universal machine learning interatomic potential for atomistic simulation of materials, molecules, and biomolecules across the periodic table, with open-source pretrained models and inference tools (Orbital Materials, 2024-2025)

SevenNet (JCTC 2024)

Graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations, enabling large-scale atomistic modeling with machine learning potentials (MDIL-SNU, MIT License)

Crystal Graph CNNs

Crystal property prediction

MatBench

Materials informatics benchmark

AiZynthFinder

Chemical Synthesis

AstraZeneca's industrial-grade retrosynthetic planning tool using MCTS to recursively decompose molecules into purchasable precursors, with multi-step route scoring and support for custom one-step models (v4.0, 2024)

SyntheMol (Stanford, Nature Machine Intelligence 2024)

Chemical Synthesis

Generative AI system for antibiotic discovery that searches billions of synthesizable molecules by combining molecular building blocks through real chemical reactions, experimentally validating novel compounds active against drug-resistant bacteria

AlphaQubit

Machine Learning for Physics

Google DeepMind and Google Quantum AI's transformer-based neural-network decoder for quantum error correction, trained on real Sycamore quantum processor data to outperform tensor-network and correlated matching decoders at code distances 3 and 5, demonstrating ML's role in enabling fault-tolerant quantum computing (Nature 2024)

NetKet

Machine Learning for Physics

Machine learning toolkit for many-body quantum systems, implementing neural quantum states, variational Monte Carlo, and tensor network algorithms to solve ground-state and dynamical problems in condensed matter physics and quantum chemistry (EPFL & collaborators, SoftwareX 2019 and SciPost Physics Codebases 2022, 670+ stars)

EquiformerV2

Improved equivariant Transformer for 3D atomic graphs (ICLR 2024)

AstroCLIP

Astronomy & Astrophysics

Cross-modal self-supervised foundation model for galaxies by Polymathic AI, jointly embedding multi-band galaxy imaging and optical spectra into a shared latent space to enable zero/few-shot redshift estimation, galaxy property prediction, morphology classification, and cross-modal similarity search (MNRAS Letters 2024)

AION (arXiv 2025)

Astronomy & Astrophysics

Polymathic AI's large omnimodal foundation model for astronomical surveys, seamlessly integrating 39 distinct data modalities including imaging, spectra, photometry, and catalog entries for similarity search, property prediction, and generative modeling across legacy surveys (MIT)

DeepSphere

Astronomy & Astrophysics

Spherical CNNs for astronomy

Aurora

Microsoft's foundation model for the Earth system supporting weather, air pollution, and ocean wave forecasting at multiple resolutions, trained on 1M+ hours of diverse atmospheric data (Nature 2025)

Earth-Copilot

Microsoft's AI-powered geospatial Earth science application for natural-language exploration, visualization, and analysis of 130+ satellite collections, with STAC integration, multi-agent backend, MCP server, and deployable React/FastAPI stack (MIT, 2025)

NeuralGCM

Google Research's hybrid ML/physics atmospheric model combining learned dynamics with physical constraints, outperforming traditional models on 2-15 day forecasts and 40-year climate simulation, developed with ECMWF (Nature 2024)

FuXi (npj Climate and Atmospheric Science 2023)

Fudan University's cascade machine learning forecasting system for 15-day global weather prediction, employing a 3D Earth-specific transformer with hard-constraint techniques to achieve state-of-the-art accuracy against traditional NWP and AI baselines

WeatherGFT