Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
Fudan University's cascade machine learning forecasting system for 15-day global weather prediction, employing a 3D Earth-specific transformer with hard-constraint techniques to achieve state-of-the-art accuracy against traditional NWP and AI baselines
Physics-AI hybrid modeling for fine-grained weather forecasting (NeurIPS'24)
Python package for segmenting geospatial data with the Segment Anything Model (SAM), enabling zero-shot object segmentation in satellite and aerial imagery for remote sensing and Earth observation (MIT, 4k+ stars)
Curated list of large weather models for AI Earth science
Python toolkit for fine-tuning geospatial foundation models
PyTorch domain library for geospatial deep learning providing standardized datasets, samplers, transforms, and pre-trained models for remote sensing, land cover mapping, and environmental monitoring (Microsoft, 4K+ stars)
Allen Institute for AI's global geospatial foundation model for satellite imagery analysis, enabling large-scale mapping of buildings, wind turbines, trees, and land cover from Sentinel-2 data with open-source weights and inference tools (2024)
Semantic-enhanced multi-modal remote sensing foundation model for Earth observation (Nature Machine Intelligence 2025), enabling universal interpretation across diverse satellite imagery modalities with open-source weights and benchmarks
University of Cambridge's foundation model for time-series satellite imagery, enabling efficient extraction of temporal patterns from Earth observation for land classification, canopy height prediction, and other remote sensing tasks
First any-to-any generative foundation model for Earth Observation, enabling unified multimodal understanding and generation across diverse satellite sensors and geospatial tasks through a single architecture (258+ stars)
Agricultural machine learning platform
Ecological modeling and conservation AI
Microsoft AI for Good Lab's open-source biodiversity research hub providing AI models, edge devices, and tools for wildlife monitoring and conservation, including MegaDetector (camera trap animal detection), SPARROW (species recognition), PytorchWildlife (conservation AI toolkit), and bioacoustics analysis pipelines (1K+ stars)
Curated collection of 23,000+ agent skills for empirical research across 8 social science disciplines, enabling reproducible social science research with AI agents (Stanford REAP & CoPaper.AI, 1.1K+ stars, 2026)
Large language model for science
Open-source scientific multimodal foundation model built on a 235B MoE LLM and 6B vision encoder, continually pretrained on 5T tokens including 2.5T scientific-domain tokens, with strong results across chemistry, materials, life science, and earth science benchmarks (2025)
Mathematical reasoning
IBM's open foundation model family for materials and chemistry, covering SMILES, SELFIES, molecular graphs, 3D atom positions, and electron density grids, with a unified toolkit for representation learning and downstream prediction/generation (Apache 2.0, 2024-2025)
15TB collection of 16 large-scale numerical simulation datasets spanning fluid dynamics, MHD, astrophysics, biological systems, and acoustic scattering, with unified PyTorch dataloaders and benchmarks for training foundation models on physical sciences (Polymathic AI, NeurIPS 2024)
Acausal modeling framework for automatically parallelized scientific machine learning (1.5k+ stars)
Scientific machine learning benchmarks & differential equation solvers
Unified interface for local, global, gradient-based and derivative-free optimization (800+ stars)
SDK & library for AI-driven scientific computing applications
Machine learning in Julia
Molecular dynamics analysis
Euclidean neural networks for E(3)-equivariant deep learning: a foundational library for building geometry-aware neural networks in molecular dynamics, materials science, and physics
Probabilistic programming
High-performance molecular simulation toolkit
End-to-end molecular dynamics engine built on PyTorch, enabling differentiable simulations with neural network potentials and GPU acceleration for machine learning-accelerated molecular dynamics (MIT License, 707+ stars)
Graph neural network library for PyTorch enabling molecular modeling, materials discovery, protein interaction networks, and scientific knowledge graph learning (23.7k+ stars)
LLM for scientific research papers
PINN research collection
LLM agents across scientific domains
200+ AI for Science papers with Chinese interpretations
Physics-informed ML and SciML
Biomedical AI agents
Ensemble of automated QM workflows that can be run through Jupyter notebooks, the command line, and YAML files.
A Python program to compute quasi-harmonic thermochemical data from Gaussian frequency calculations.
Calculate mass, elemental composition, and mass distribution spectrum of a molecule given by its chemical formula, relative element weights, or sequence.
A toolkit for visualizations in materials informatics.
A library for processing, analyzing and modeling spectroscopic data.
Library of descriptors to aid in data mining of materials properties, created by Lawrence Berkeley National Laboratory.
Aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
Library for fast calculations of **mo**lecula**r** **fe**at**u**re**s** from 3D structures for machine learning with a focus on steric descriptors.
Ensemble of automated machine learning protocols that can be run sequentially from a single command line. Supports both regression and classification problems.
Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation.
Library with several compositional and structural material descriptors, along with a few pre-trained neural network models of material properties.
A benchmarking platform for molecular generation models.
Makes alchemical free energy calculations easier by leveraging the full power and flexibility of the PyData stack.
A tool and library for creating quantum chemistry input files.