Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 1,101–1,150

Transform arXiv research papers into engaging presentations and YouTube-ready videos

Automated academic illustration generation for AI scientists, converting research papers into publication-ready figures using VLMs and diffusion models with iterative refinement (PKU & Google Research, 6.2K+ stars, 2026)

Automated code generation from machine learning research papers into runnable implementations (4.5K+ stars, 2025)

Comprehensive toolkit for high-quality PDF content extraction with layout detection, formula recognition, and OCR

Neural optical understanding for academic documents: transforms scientific PDFs into Markdown with support for mathematical formulas

Toolkit for linearizing academic PDFs into LLM-ready text with high accuracy and structure preservation, optimized for scientific literature extraction

Production-grade ETL for transforming complex documents into structured formats, with an open-source API

High-accuracy PDF→Markdown/JSON/HTML conversion specialized for tables, formulas, and code blocks, with benchmark scripts

Large-scale PDF/LaTeX/JATS parsing to standardized JSON for millions of papers

Machine learning software for extracting structured metadata from scholarly documents

Parse scientific papers into structured fields (title, authors, sections, references)

Extract figures, tables, captions, and section titles from scholarly PDFs

Large-scale table detection and recognition dataset with pre-trained models

AI coding assistant for JupyterLab with agent mode, supporting arbitrary LLM providers (2025+)

Human-centered research OS with terminal-first harness and local browser Studio, turning research work into reproducible artifact-backed runs through a 9-stage workflow with human approval gates, resume/rollback controls, and venue-aware manuscript packaging (1K+ stars, 2026)

Research agent system deeply integrated with Zotero supporting Agent Mode, skills, multi-model backends (OpenAI-compatible, Claude Code, WebChat, Codex), and MinerU PDF parsing for literature Q&A, summarization, figure inspection, and source comparison (1.3K+ stars, 2026)

Multi-PDF conversation, retrieval, and citation in Zotero with commercial or local (Ollama) models and MCP support

AI-powered note linking and research graph navigation

Knowledge graph-guided synthetic data generation for LLM fine-tuning, achieving strong performance on scientific QA (GPQA-Diamond) and math reasoning (AIME)

Structure-aware prefix adaptation for integrating LLMs with knowledge graphs (ACM MM 2024)

Reported as the first system to progressively surpass human SOTA on frontier AI tasks (183.7%, 1.9%, and 7.9% improvements), running month-long autonomous discovery with 20,000+ GPU hours

Extended-autonomy AI scientist running 200 parallel agent rollouts, executing 42K lines of code and analyzing 1.5K papers per run, reporting 79.4% accuracy and 7 scientific discoveries (Edison Scientific)

Autonomous algorithm discovery combining evolutionary search with peer-review reward models, achieving best-known performance on circle packing problems

Fully autonomous research from idea to paper with multi-agent debate, citation verification, and OpenClaw integration (11K+ stars, 2026)

Autonomous pipeline from literature review→hypothesis→algorithm implementation→publication-level writing with Scientist-Bench evaluation

Andrej Karpathy's autonomous LLM research framework: an AI agent runs overnight experiments on a real training setup, looping through code edits→5-minute training runs→evaluation for ~100 experiments per night on a single GPU

Universal scientific research intelligence covering 50+ disciplines, repositioning LLMs as cross-disciplinary generators with human experts as verifiers; its 30B model reportedly outperforms Claude Opus and GPT models on 5 research benchmarks

Evolvable and privacy-preserving multi-agent framework automating, scaling, and accelerating data science with a particular focus on end-to-end single-cell biology analyses; features agentic code evolution, multi-agent team orchestration, distributed architecture, and a community marketplace with 1,000+ curated agents and skills (428+ stars)

102 executable tasks from 44 peer-reviewed papers across 4 disciplines with containerized evaluation

Research coding benchmark curated by scientists with 338 subproblems across 16 subdomains (physics, math, materials, biology, chemistry), evaluating LLMs on realistic scientific programming tasks with gold-standard solutions (NeurIPS 2024)

Web application for LLM-assisted manuscript review and annotation

LLMs as copilots for theorem proving in Lean 4, exposing native tactics (`suggest_tactics`, `search_proof`, `select_premises`) that embed language model inference and premise retrieval directly inside the Lean proof environment, supporting local CTranslate2/CUDA inference as well as remote model APIs for interactive and automated proof search (Caltech & NVIDIA, NeurIPS 2024, 1.2K+ stars)

AI agent for biological discovery and research automation

Multimodal LLM-based AI agent enabling deep research in spatial transcriptomics, automating analysis and interpretation of spatial gene expression data (Harvard LiuLab, bioRxiv 2025)

Large Language Models for automated open-domain scientific hypotheses discovery (ACL 2024, ICML Best Poster)

Bioinspired multi-agent intelligent graph reasoning system that autonomously traverses ontological knowledge graphs to generate, critique, and refine novel research hypotheses, demonstrated on bio-inspired materials discovery with cross-disciplinary connection mining (MIT Lamm Group, 2024)

Neural differential equations in PyTorch

Sparse identification of nonlinear dynamics
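SINDy regresses time derivatives onto a library of candidate functions and keeps only a few active terms. The sketch below is a minimal NumPy version of sequentially thresholded least squares (STLSQ), the classic SINDy optimizer; it is not the listed package's API, and the hand-built polynomial library, the `stlsq` name, and the toy harmonic oscillator with exact derivatives are assumptions for illustration.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each equation using only its surviving library terms
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Assumed toy system: harmonic oscillator dx/dt = y, dy/dt = -x, sampled at random states
rng = np.random.default_rng(0)
x, y = rng.uniform(-2, 2, size=(2, 200))
names = ["1", "x", "y", "x^2", "xy", "y^2"]
theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])  # candidate library
dxdt = np.column_stack([y, -x])  # exact derivatives (no noise, no differentiation step)

xi = stlsq(theta, dxdt)
for k, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{xi[i, k]:+.2f} {names[i]}" for i in range(len(names)) if xi[i, k] != 0]
    print(lhs, "=", " ".join(terms))
```

On this noise-free data the loop should keep only the `y` term for dx/dt and the `x` term for dy/dt; real data would also need a derivative estimate (for example finite differences) and a tuned threshold.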

Parallel symbolic regression network that evaluates millions of expressions on GPUs with automated subtree reuse; Nature Computational Science cover article (MIT, 2026)
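Subtree reuse means that when many candidate expressions share a subexpression, its value is computed once and reused. The following is a minimal CPU sketch of that idea using a cache keyed on hashable expression trees; it is not the listed tool's GPU implementation, and the tuple-based tree encoding, the `OPS` table, and `evaluate_batch` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical operator table: name -> vectorized NumPy function
OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply, "sin": np.sin, "cos": np.cos}

def evaluate_batch(expressions, data):
    """Evaluate expression trees on `data`, computing each distinct subtree once."""
    cache = {}

    def ev(node):
        if node in cache:  # shared subtree already evaluated
            return cache[node]
        if isinstance(node, str):  # variable leaf, e.g. "x"
            value = data[node]
        elif isinstance(node, (int, float)):  # constant leaf
            value = np.full_like(next(iter(data.values())), float(node))
        else:  # internal node encoded as (op, child, ...)
            op, *children = node
            value = OPS[op](*(ev(c) for c in children))
        cache[node] = value
        return value

    results = [ev(e) for e in expressions]
    return results, len(cache)

x = np.linspace(0.0, 1.0, 5)
shared = ("sin", ("mul", 2.0, "x"))  # sin(2x), used by every candidate below
exprs = [("add", shared, "x"), ("mul", shared, shared), ("sub", shared, 1.0)]
values, distinct = evaluate_batch(exprs, {"x": x})
print(distinct, "distinct subtrees evaluated")
```

Evaluating each tree independently would compute `sin(2x)` four times here; with the cache it is computed once, and the saving grows as a population of candidates shares more structure.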

Efficient foundation models for PDEs with pretrained transformer-based neural operators and downstream task fine-tuning pipelines, HuggingFace integration for models and datasets (ETH Zurich CAMLab, arXiv 2024)

Geometry Aware Operator Transformer serving as an efficient and accurate neural surrogate for PDEs on arbitrary domains, combining geometric priors with transformer architectures for scientific computing (ETH Zurich CAMLab, 92+ stars)

Differentiable PDE solving framework for machine learning with built-in fluid simulation, supporting PyTorch/JAX/TensorFlow backends and enabling neural network training within physical simulations (TUM, MIT License)

Efficient differentiable n-dimensional PDE solvers built on JAX and Equinox, shipping 46+ built-in equations with Fourier spectral methods, exponential time differencing, and full auto-differentiation for physics-based deep learning workflows (MIT, 200+ stars, 2024)
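Fourier spectral methods diagonalize constant-coefficient linear operators, and exponential time differencing (ETD) integrates that linear part exactly while stepping the nonlinear part explicitly. Below is a minimal NumPy sketch of first-order ETD (ETD1) for a 1D periodic problem u_t = ν u_xx + N(u); it is not the listed library's JAX/Equinox API, and the `etd1_solve` name and the heat-equation-with-uniform-source test case are assumptions.

```python
import numpy as np

def etd1_solve(u0, nu, nonlinear, length, dt, steps):
    """Integrate u_t = nu * u_xx + N(u) on a periodic domain with first-order ETD."""
    n = u0.size
    k = 2 * np.pi * np.fft.rfftfreq(n, d=length / n)  # angular wavenumbers
    lin = -nu * k**2  # Fourier symbol of nu * d^2/dx^2
    exp_lin = np.exp(lin * dt)
    # phi = (exp(L dt) - 1) / L, using the L -> 0 limit (dt) for the mean mode
    phi = np.where(lin == 0, dt, np.expm1(lin * dt) / np.where(lin == 0, 1.0, lin))
    u_hat = np.fft.rfft(u0)
    for _ in range(steps):
        n_hat = np.fft.rfft(nonlinear(np.fft.irfft(u_hat, n=n)))
        u_hat = exp_lin * u_hat + phi * n_hat  # ETD1 update in Fourier space
    return np.fft.irfft(u_hat, n=n)

n, length, nu, dt, steps = 64, 2 * np.pi, 0.1, 0.01, 100
x = np.arange(n) * length / n
u0 = np.sin(x) + 0.5 * np.sin(3 * x)
t = dt * steps

# N(u) = 1 is an assumed uniform source, so the exact solution is known in closed form
u = etd1_solve(u0, nu, lambda v: np.ones_like(v), length, dt, steps)
exact = np.exp(-nu * t) * np.sin(x) + 0.5 * np.exp(-9 * nu * t) * np.sin(3 * x) + t
print("max error:", np.max(np.abs(u - exact)))
```

Because the linear part is handled by the exponential factor, ETD stays stable at step sizes where an explicit Euler step on the stiff diffusion term would blow up; for a constant source like this one the scheme is exact up to round-off, while genuinely nonlinear terms would show its first-order error.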

Protein structure prediction

Deep learning system for de novo design of high-affinity protein binders, achieving strong binding across diverse target classes including challenging intracellular proteins with significantly higher success rates than traditional wet-lab screening methods (Google DeepMind, Nature 2024)

Accessible AlphaFold/ESMFold implementation with AF3 JSON export and database updates

Fully open-source (Apache 2.0) biomolecular structure prediction reproducing AlphaFold3, free for academic and commercial use (Columbia AlQuraishi Lab & OpenFold Consortium, 2025)

Trainable PyTorch reproduction of AlphaFold 3

All-atom biomolecular structure prediction for complexes of proteins, nucleic acids, small molecules, and metal ions, enabling accurate modeling of covalent modifications and assemblies beyond proteins (Baker Lab, Science 2024)

Flow-based generative model for atomistic protein binder design with test-time optimization, SOTA on binder benchmarks (ICLR 2026 Oral, NVIDIA)