Find open-source science resources

Transform arXiv research papers into engaging presentations and YouTube-ready videos

Figure & Illustration Generation

PaperBanana

Automated academic illustration generation for AI scientists, converting research papers into publication-ready figures using VLMs and diffusion models with iterative refinement (PKU & Google Research, 6.2K+ stars, 2026)

Automated Code Generation

Paper2Code

Automated code generation from machine learning research papers into runnable implementations (4.5K+ stars, 2025)

PDF-Extract-Kit (2024)

Comprehensive toolkit for high-quality PDF content extraction with layout detection, formula recognition, and OCR

Nougat (Meta AI)

Neural optical understanding for academic documents, transforms scientific PDFs to Markdown with mathematical formula support

olmOCR (AllenAI)

Toolkit for linearizing academic PDFs into LLM-ready text with high accuracy and structure preservation, optimized for scientific literature extraction

Unstructured

Production-grade ETL for transforming complex documents into structured formats, with open-source API

Marker

High-accuracy PDF→Markdown/JSON/HTML conversion, specialized for tables/formulas/code blocks with benchmark scripts

S2ORC doc2json (AllenAI)

Large-scale PDF/LaTeX/JATS parsing to standardized JSON for millions of papers

GROBID

Machine learning software for extracting structured metadata from scholarly documents

Science-Parse / SPv2 (AllenAI)

Parse scientific papers to structured fields (title/author/sections/references)

Figure & Table Extraction

PDFFigures2

Extract figures, tables, captions, and section titles from scholarly PDFs

TableBank

Large-scale table detection and recognition dataset with pre-trained models

Interactive Research Environments

Notebook Intelligence (NBI)

AI coding assistant for JupyterLab with agent mode, supporting arbitrary LLM providers (2025+)

AutoR

Human-centered research OS with terminal-first harness and local browser Studio, turning research work into reproducible artifact-backed runs through a 9-stage workflow with human approval gates, resume/rollback controls, and venue-aware manuscript packaging (1K+ stars, 2026)

Literature Management Plugins

llm-for-zotero

Research agent system deeply integrated with Zotero supporting Agent Mode, skills, multi-model backends (OpenAI-compatible, Claude Code, WebChat, Codex), and MinerU PDF parsing for literature Q&A, summarization, figure inspection, and source comparison (1.3K+ stars, 2026)

PapersGPT for Zotero

Multi-PDF conversation, retrieval, and citation in Zotero with commercial/local models (Ollama), MCP support

Scientific Writing & Collaboration

Obsidian Smart Connections

AI-powered note linking and research graph navigation

Knowledge Graph Construction

GraphGen

Knowledge graph-guided synthetic data generation for LLM fine-tuning, achieving strong performance on scientific QA (GPQA-Diamond) and math reasoning (AIME)

KoPA

Structure-aware prefix adaptation for integrating LLMs with knowledge graphs (ACM MM 2024)

Autonomous Research Systems (2023-2025 Breakthroughs)

DeepScientist

Reported as the first system to progressively surpass human SOTA on three frontier AI tasks (improvements of 183.7%, 1.9%, and 7.9%), through month-long autonomous discovery using 20,000+ GPU hours

Kosmos

Extended-autonomy AI scientist with 200 parallel agent rollouts, 42K lines of code executed and 1.5K papers analyzed per run, with 79.4% of report statements judged accurate by independent scientists and 7 reported scientific discoveries (Edison Scientific)

AlphaResearch

Autonomous algorithm discovery combining evolutionary search with peer-review reward models, achieving best-known performance on circle packing problems

AutoResearchClaw

Fully autonomous research from idea to paper with multi-agent debate, citation verification, and OpenClaw integration (11K+ stars, 2026)

AI-Researcher

Autonomous pipeline from literature review→hypothesis→algorithm implementation→publication-level writing with Scientist-Bench evaluation

autoresearch

Andrej Karpathy's autonomous LLM research framework: AI agent runs overnight experiments on a real training setup, auto-editing code→5min training→evaluation in a loop, ~100 experiments per night on a single GPU
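
The edit→train→evaluate loop described for autoresearch can be pictured with a toy sketch. Everything here is an illustrative assumption rather than the project's code: `propose_edit` stands in for the agent's code edit, `train_and_evaluate` stands in for the short fixed-budget training run (a step budget replaces the 5-minute wall clock), and the keep-if-better rule is one plausible acceptance policy.

```python
import random


def train_and_evaluate(config, step_budget=200):
    """Toy stand-in for a fixed-budget training run.

    Runs gradient descent on the loss (w - 3)^2 and returns the final loss.
    """
    w = 0.0
    for _ in range(step_budget):
        w -= config["lr"] * 2.0 * (w - 3.0)
    return (w - 3.0) ** 2


def propose_edit(config, rng):
    """Hypothetical stand-in for an agent edit: halve or double the learning rate."""
    edited = dict(config)
    edited["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    return edited


def run_experiments(n_experiments=20, seed=0):
    """Repeat edit -> train -> evaluate, keeping an edit only if the loss improves."""
    rng = random.Random(seed)
    best_config = {"lr": 0.001}
    best_loss = train_and_evaluate(best_config)
    history = [best_loss]  # best loss seen after each experiment
    for _ in range(n_experiments):
        candidate = propose_edit(best_config, rng)
        loss = train_and_evaluate(candidate)
        if loss < best_loss:  # assumed acceptance rule: keep only improvements
            best_config, best_loss = candidate, loss
        history.append(best_loss)
    return best_config, best_loss, history


if __name__ == "__main__":
    config, loss, history = run_experiments()
    print(config, loss, len(history) - 1, "experiments")
```

Because rejected edits are discarded, the recorded best loss never gets worse from one experiment to the next, which is the property that makes an unattended overnight loop safe to leave running.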

UniScientist

Universal scientific research intelligence covering 50+ disciplines, repositioning LLMs as cross-disciplinary generators with human experts as verifiers; 30B model outperforms Claude Opus and GPT on 5 research benchmarks

PantheonOS (Stanford, 2025)

Evolvable and privacy-preserving multi-agent framework automating, scaling, and accelerating data sciences with a particular focus on end-to-end single-cell biology analyses; features agentic code evolution, multi-agent team orchestration, distributed architecture, and a community marketplace with 1,000+ curated agents and skills (428+ stars)

Evaluation & Benchmarking

ScienceAgentBench (ICLR 2025)

102 executable tasks from 44 peer-reviewed papers across 4 disciplines with containerized evaluation

SciCode

Research coding benchmark curated by scientists with 338 subproblems across 16 subdomains (physics, math, materials, biology, chemistry), evaluating LLMs on realistic scientific programming tasks with gold-standard solutions (NeurIPS 2024)

Academic Review & Evaluation

LLM-Peer-Review

Web application for LLM-assisted manuscript review and annotation

Lean Copilot

LLMs as copilots for theorem proving in Lean 4, exposing native tactics (`suggest_tactics`, `search_proof`, `select_premises`) that embed language model inference and premise retrieval directly inside the Lean proof environment, supporting local CTranslate2/CUDA inference as well as remote model APIs for interactive and automated proof search (Caltech & NVIDIA, NeurIPS 2024, 1.2K+ stars)

BioDiscoveryAgent

AI agent for biological discovery and research automation

STAgent

Multimodal LLM-based AI agent enabling deep research in spatial transcriptomics, automating analysis and interpretation of spatial gene expression data (Harvard LiuLab, bioRxiv 2025)

MOOSE

Large Language Models for automated open-domain scientific hypotheses discovery (ACL 2024, ICML Best Poster)

SciAgents

Bioinspired multi-agent intelligent graph reasoning system that autonomously traverses ontological knowledge graphs to generate, critique, and refine novel research hypotheses, demonstrated on bio-inspired materials discovery with cross-disciplinary connection mining (MIT LAMM Group, 2024)

Neural Differential Equations

torchdyn

Neural differential equations in PyTorch

PySINDy

Sparse identification of nonlinear dynamics
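
As a rough illustration of what sparse identification of nonlinear dynamics means, the sketch below recovers dx/dt = -2x from samples using sequentially thresholded least squares over a small candidate library [1, x, x^2]. It is a pure-Python sketch of the idea, not PySINDy's API: the function names are hypothetical, the derivative data is assumed to be known exactly (in practice it is usually estimated numerically), and the threshold of 0.1 is an arbitrary choice.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol


def least_squares(columns, target):
    """Least-squares fit via the normal equations (fine for a tiny library)."""
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(a * y for a, y in zip(ci, target)) for ci in columns]
    return solve_linear(gram, rhs)


def stlsq(library, target, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(library)
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(max_iter):
        if not active:
            break
        fit = least_squares([library[i] for i in active], target)
        coeffs = [0.0] * n_terms
        kept = []
        for i, c in zip(active, fit):
            if abs(c) >= threshold:
                coeffs[i] = c
                kept.append(i)
        if kept == active:  # support unchanged, so the fit has converged
            break
        active = kept
    return coeffs


def identify_decay():
    """Sample x(t) = exp(-2t) and identify its governing equation."""
    ts = [0.04 * i for i in range(51)]
    xs = [math.exp(-2.0 * t) for t in ts]
    dxdt = [-2.0 * x for x in xs]  # assumed exact derivative data
    library = [[1.0] * len(xs), xs, [x * x for x in xs]]  # candidate terms 1, x, x^2
    return stlsq(library, dxdt)


if __name__ == "__main__":
    print(identify_decay())  # coefficients for [1, x, x^2]
```

Only the coefficient on x survives thresholding, which is the sparse model the method is after.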

PSRN

Parallel symbolic regression network evaluating millions of expressions on GPU with automated subtree reuse, Nature Computational Science cover article (MIT, 2026)

Poseidon

Efficient foundation models for PDEs with pretrained transformer-based neural operators and downstream task fine-tuning pipelines, HuggingFace integration for models and datasets (ETH Zurich CAMLab, arXiv 2024)

GAOT (NeurIPS 2025)

Geometry Aware Operator Transformer serving as an efficient and accurate neural surrogate for PDEs on arbitrary domains, combining geometric priors with transformer architectures for scientific computing (ETH Zurich CAMLab, 92+ stars)

PhiFlow

Differentiable PDE solving framework for machine learning with built-in fluid simulation, supporting PyTorch/JAX/TensorFlow backends and enabling neural network training within physical simulations (TUM, MIT License)

exponax

Efficient differentiable n-dimensional PDE solvers built on JAX and Equinox, shipping 46+ built-in equations with Fourier spectral methods, exponential time differencing, and full auto-differentiation for physics-based deep learning workflows (MIT License, 200+ stars, 2024)
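
To make the combination of Fourier spectral methods and exponential time differencing concrete, here is a small sketch of a first-order ETD step for the forced heat equation u_t = nu * u_xx + f(x) on a periodic domain [0, 2*pi). It is written in plain Python with a naive DFT rather than JAX, it does not use exponax's API, and all function names, the forcing cos(2x), and the parameter values are assumptions chosen so the result can be checked against a closed-form solution.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform (adequate for a small demo grid)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse of dft, including the 1/n normalization."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
        for j in range(n)
    ]


def etd1_step(u_hat, f_hat, wavenumbers, nu, dt):
    """One ETD1 step per Fourier mode: u <- e^{L dt} u + (e^{L dt} - 1) / L * f."""
    stepped = []
    for uk, fk, k in zip(u_hat, f_hat, wavenumbers):
        lin = -nu * k * k  # Fourier symbol of nu * d^2/dx^2
        decay = math.exp(lin * dt)
        phi = dt if lin == 0.0 else (decay - 1.0) / lin  # limit as L -> 0 is dt
        stepped.append(decay * uk + phi * fk)
    return stepped


def solve_forced_heat(n_points=16, nu=0.5, dt=0.1, n_steps=10):
    """Integrate from u(x, 0) = sin(x) with time-independent forcing cos(2x)."""
    xs = [2.0 * math.pi * j / n_points for j in range(n_points)]
    wavenumbers = [k if k < n_points // 2 else k - n_points for k in range(n_points)]
    u_hat = dft([math.sin(x) for x in xs])
    f_hat = dft([math.cos(2.0 * x) for x in xs])
    for _ in range(n_steps):
        u_hat = etd1_step(u_hat, f_hat, wavenumbers, nu, dt)
    return xs, [c.real for c in idft(u_hat)]


def exact_solution(x, t, nu):
    """Closed-form solution of the same problem, mode by mode."""
    decaying = math.exp(-nu * t) * math.sin(x)
    forced = (1.0 - math.exp(-4.0 * nu * t)) / (4.0 * nu) * math.cos(2.0 * x)
    return decaying + forced


if __name__ == "__main__":
    xs, u = solve_forced_heat()
    print(max(abs(ui - exact_solution(x, 1.0, 0.5)) for x, ui in zip(xs, u)))
```

The linear term is integrated exactly by the exponential factor, so the stiff high-wavenumber modes place no restriction on the step size; with a forcing that does not change in time, the first-order scheme reproduces the closed-form solution to rounding error.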

AlphaFold

Protein structure prediction

AlphaProteo

Deep learning system for de novo design of high-affinity protein binders, achieving strong binding across diverse target classes including challenging intracellular proteins with significantly higher success rates than traditional wet-lab screening methods (Google DeepMind, Nature 2024)

ColabFold (2025 Updates)

AlphaFold/ESMFold accessible implementation with AF3 JSON export, database updates

OpenFold3

Fully open-source (Apache 2.0) biomolecular structure prediction reproducing AlphaFold3, free for academic and commercial use (Columbia AlQuraishi Lab & OpenFold Consortium, 2025)

Protenix

Trainable PyTorch reproduction of AlphaFold 3

RoseTTAFold-All-Atom

All-atom biomolecular structure prediction for protein-nucleic acid-small molecule-metal ion complexes, enabling accurate modeling of covalent modifications and assemblies beyond proteins (Baker Lab, Science 2024)

Proteina-Complexa