Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 1,201–1,250

PyTorch toolkit for deep neural networks in atomistic simulations, implementing SchNet, DimeNet++, PaiNN, and GemNet for molecular dynamics and quantum chemistry (900+ stars)

Python Materials Genomics: robust materials analysis library defining classes for structures and molecules with support for many electronic structure codes; foundational toolkit powering the Materials Project (Berkeley Lab, 1.8K+ stars)

Diffusion-based generative model for inorganic materials design that steers generation by chemistry, symmetry, bulk modulus, band gap, or magnetic properties; reported as 2× more likely to produce stable novel structures than prior methods and experimentally validated with synthesized TaCr₂O₆ (Microsoft, Nature 2025)

Deep learning atomistic model across elements, temperatures, and pressures

Universal machine learning interatomic potential for atomistic simulation of materials, molecules, and biomolecules across the periodic table, with open-source pretrained models and inference tools (Orbital Materials, 2024-2025)

Graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations, enabling large-scale atomistic modeling with machine learning potentials (MDIL-SNU, MIT License)

Materials informatics benchmark

AstraZeneca's industrial-grade retrosynthetic planning tool using MCTS to recursively decompose molecules into purchasable precursors, with multi-step route scoring and support for custom one-step models (v4.0, 2024)

Generative AI system for antibiotic discovery that searches billions of synthesizable molecules by combining molecular building blocks through real chemical reactions, experimentally validating novel compounds active against drug-resistant bacteria

Google DeepMind and Google Quantum AI's transformer-based neural-network decoder for quantum error correction, trained on real Sycamore quantum processor data to outperform tensor-network and correlated matching decoders at code distances 3 and 5, demonstrating ML's role in enabling fault-tolerant quantum computing (Nature 2024)

Machine learning toolkit for many-body quantum systems, implementing neural quantum states, variational Monte Carlo, and tensor network algorithms to solve ground-state and dynamical problems in condensed matter physics and quantum chemistry (EPFL & collaborators, Nature Physics 2019/2022+, 670+ stars)

Improved equivariant Transformer for 3D atomic graphs (ICLR 2024)

Cross-modal self-supervised foundation model for galaxies by Polymathic AI, jointly embedding multi-band galaxy imaging and optical spectra into a shared latent space to enable zero/few-shot redshift estimation, galaxy property prediction, morphology classification, and cross-modal similarity search (MNRAS Letters 2024)

Polymathic AI's large omnimodal foundation model for astronomical surveys, seamlessly integrating 39 distinct data modalities including imaging, spectra, photometry, and catalog entries for similarity search, property prediction, and generative modeling across legacy surveys (MIT)

Spherical CNNs for astronomy

Microsoft's foundation model for the Earth system supporting weather, air pollution, and ocean wave forecasting at multiple resolutions, trained on 1M+ hours of diverse atmospheric data (Nature 2025)

Microsoft's AI-powered geospatial Earth science application for natural-language exploration, visualization, and analysis of 130+ satellite collections, with STAC integration, multi-agent backend, MCP server, and deployable React/FastAPI stack (MIT, 2025)

Google Research's hybrid ML/physics atmospheric model combining learned dynamics with physical constraints, outperforming traditional models on 2-15 day forecasts and 40-year climate simulation, developed with ECMWF (Nature 2024)

Fudan University's cascade machine learning forecasting system for 15-day global weather prediction, employing a 3D Earth-specific transformer with hard-constraint techniques to achieve state-of-the-art accuracy against traditional NWP and AI baselines

Physics-AI hybrid modeling for fine-grained weather forecasting (NeurIPS 2024)

Python package for segmenting geospatial data with the Segment Anything Model (SAM), enabling zero-shot object segmentation in satellite and aerial imagery for remote sensing and Earth observation (MIT, 4K+ stars)

Curated list of large weather models for AI Earth science

Python toolkit for fine-tuning geospatial foundation models

PyTorch domain library for geospatial deep learning providing standardized datasets, samplers, transforms, and pre-trained models for remote sensing, land cover mapping, and environmental monitoring (Microsoft, 4K+ stars)

Allen Institute for AI's global geospatial foundation model for satellite imagery analysis, enabling large-scale mapping of buildings, wind turbines, trees, and land cover from Sentinel-2 data with open-source weights and inference tools (2024)

Semantic-enhanced multi-modal remote sensing foundation model for Earth observation (Nature Machine Intelligence 2025), enabling universal interpretation across diverse satellite imagery modalities with open-source weights and benchmarks

University of Cambridge's foundation model for time-series satellite imagery, enabling efficient extraction of temporal patterns from Earth observation for land classification, canopy height prediction, and other remote sensing tasks

First any-to-any generative foundation model for Earth Observation, enabling unified multimodal understanding and generation across diverse satellite sensors and geospatial tasks through a single architecture (258+ stars)

Agricultural machine learning platform

Ecological modeling and conservation AI

Microsoft AI for Good Lab's open-source biodiversity research hub providing AI models, edge devices, and tools for wildlife monitoring and conservation, including MegaDetector (camera trap animal detection), SPARROW (species recognition), PytorchWildlife (conservation AI toolkit), and bioacoustics analysis pipelines (1K+ stars)

Curated collection of 23,000+ agent skills for empirical research across 8 social science disciplines, enabling reproducible social science research with AI agents (Stanford REAP & CoPaper.AI, 1.1K+ stars, 2026)

Large language model for science

Open-source scientific multimodal foundation model built on a 235B MoE LLM and 6B vision encoder, continually pretrained on 5T tokens including 2.5T scientific-domain tokens, with strong results across chemistry, materials, life science, and earth science benchmarks (2025)

Open language model for mathematics (7B/34B) trained on Proof-Pile-2, outperforming Minerva at equal scale on the MATH benchmark and capable of tool use and formal theorem proving in Lean without further finetuning (EleutherAI, ICLR 2024)

IBM's open foundation model family for materials and chemistry, covering SMILES, SELFIES, molecular graphs, 3D atom positions, and electron density grids, with a unified toolkit for representation learning and downstream prediction/generation (Apache 2.0, 2024-2025)

15TB collection of 16 large-scale numerical simulation datasets spanning fluid dynamics, MHD, astrophysics, biological systems, and acoustic scattering, with unified PyTorch dataloaders and benchmarks for training foundation models on physical sciences (Polymathic AI, NeurIPS 2024)

Acausal modeling framework for automatically parallelized scientific machine learning (1.5K+ stars)

Scientific machine learning benchmarks & differential equation solvers

Unified interface for local, global, gradient-based and derivative-free optimization (800+ stars)

SDK & library for AI-driven scientific computing applications

Molecular dynamics analysis

Euclidean neural networks library for E(3)-equivariant deep learning (equivariant to 3D rotations, translations, and reflections); a foundational library for building geometry-aware neural networks in molecular dynamics, materials science, and physics

Probabilistic programming

High-performance molecular simulation toolkit

End-to-end molecular dynamics engine built on PyTorch, enabling differentiable simulations with neural network potentials and GPU acceleration for machine learning-accelerated molecular dynamics (MIT License, 707+ stars)

Graph neural network library for PyTorch enabling molecular modeling, materials discovery, protein interaction networks, and scientific knowledge graph learning (23.7K+ stars)