Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 1,251–1,300

Fudan University's cascade machine learning forecasting system for 15-day global weather prediction, employing a 3D Earth-specific transformer with hard-constraint techniques to achieve state-of-the-art accuracy compared with traditional NWP and AI baselines

Physics-AI hybrid modeling for fine-grained weather forecasting (NeurIPS'24)

Python package for segmenting geospatial data with the Segment Anything Model (SAM), enabling zero-shot object segmentation in satellite and aerial imagery for remote sensing and Earth observation (MIT, 4k+ stars)

Curated list of large weather models for AI Earth science

Python toolkit for fine-tuning geospatial foundation models

PyTorch domain library for geospatial deep learning providing standardized datasets, samplers, transforms, and pre-trained models for remote sensing, land cover mapping, and environmental monitoring (Microsoft, 4K+ stars)

Allen Institute for AI's global geospatial foundation model for satellite imagery analysis, enabling large-scale mapping of buildings, wind turbines, trees, and land cover from Sentinel-2 data with open-source weights and inference tools (2024)

Semantic-enhanced multi-modal remote sensing foundation model for Earth observation (Nature Machine Intelligence 2025), enabling universal interpretation across diverse satellite imagery modalities with open-source weights and benchmarks

University of Cambridge's foundation model for time-series satellite imagery, enabling efficient extraction of temporal patterns from Earth observation for land classification, canopy height prediction, and other remote sensing tasks

First any-to-any generative foundation model for Earth Observation, enabling unified multimodal understanding and generation across diverse satellite sensors and geospatial tasks through a single architecture (258+ stars)

Agricultural machine learning platform

Ecological modeling and conservation AI

Microsoft AI for Good Lab's open-source biodiversity research hub providing AI models, edge devices, and tools for wildlife monitoring and conservation, including MegaDetector (camera trap animal detection), SPARROW (species recognition), PytorchWildlife (conservation AI toolkit), and bioacoustics analysis pipelines (1K+ stars)

Curated collection of 23,000+ agent skills for empirical research across 8 social science disciplines, enabling reproducible social science research with AI agents (Stanford REAP & CoPaper.AI, 1.1K+ stars, 2026)

Large language model for science

Open-source scientific multimodal foundation model built on a 235B MoE LLM and 6B vision encoder, continually pretrained on 5T tokens including 2.5T scientific-domain tokens, with strong results across chemistry, materials, life science, and earth science benchmarks (2025)

IBM's open foundation model family for materials and chemistry, covering SMILES, SELFIES, molecular graphs, 3D atom positions, and electron density grids, with a unified toolkit for representation learning and downstream prediction/generation (Apache 2.0, 2024-2025)

15TB collection of 16 large-scale numerical simulation datasets spanning fluid dynamics, MHD, astrophysics, biological systems, and acoustic scattering, with unified PyTorch dataloaders and benchmarks for training foundation models on physical sciences (Polymathic AI, NeurIPS 2024)

Acausal modeling framework for automatically parallelized scientific machine learning (1.5k+ stars)

Scientific machine learning benchmarks & differential equation solvers

Unified interface for local, global, gradient-based and derivative-free optimization (800+ stars)

SDK & library for AI-driven scientific computing applications

Molecular dynamics analysis

Euclidean neural networks for E(3)-equivariant deep learning on 3D point data (equivariant to rotations, translations, and reflections); a foundational library for building geometry-aware neural networks in molecular dynamics, materials science, and physics

Probabilistic programming

High-performance molecular simulation toolkit

End-to-end molecular dynamics engine built on PyTorch, enabling differentiable simulations with neural network potentials and GPU acceleration for machine learning-accelerated molecular dynamics (MIT License, 707+ stars)

Graph neural network library for PyTorch enabling molecular modeling, materials discovery, protein interaction networks, and scientific knowledge graph learning (23.7k+ stars)

200+ AI for Science papers with Chinese interpretations

Ensemble of automated QM workflows that can be run through Jupyter notebooks, the command line, and YAML files.

A Python program to compute quasi-harmonic thermochemical data from Gaussian frequency calculations.

Calculate mass, elemental composition, and mass distribution spectrum of a molecule given by its chemical formula, relative element weights, or sequence.

A toolkit for visualizations in materials informatics.

A library for processing, analyzing and modeling spectroscopic data.

Library of descriptors to aid in the data-mining of materials properties, created by the Lawrence Berkeley National Laboratory.

Aims to provide useful high-level interfaces that make ML for materials science as easy as possible.

Library for fast calculations of **mo**lecula**r** **fe**at**u**re**s** from 3D structures for machine learning with a focus on steric descriptors.

Ensemble of automated machine learning protocols that can be run sequentially through a single command line. The program works for regression and classification problems.

Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation.

Library with several compositional and structural material descriptors, along with a few pre-trained neural network models of material properties.

A benchmarking platform for molecular generation models.

Makes alchemical free energy calculations easier by leveraging the full power and flexibility of the PyData stack.

A tool and library for creating quantum chemistry input files.