Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

10 of 5,893 resources

Advanced OCR with PP-StructureV3 document parsing; reports a 13% accuracy improvement and supports 80+ languages

Active · 81.3K · 5 days ago
Python
Apache-2.0

State-of-the-art multimodal document parsing model with 1.2B parameters, described as outperforming GPT-4o; converts PDFs to LLM-ready Markdown/JSON

Active · 65.9K · 1 week ago
Python
NOASSERTION

Comprehensive toolkit for high-quality PDF content extraction with layout detection, formula recognition, and OCR

Neural optical understanding for academic documents, transforms scientific PDFs to Markdown with mathematical formula support

Toolkit for linearizing academic PDFs into LLM-ready text with high accuracy and structure preservation, optimized for scientific literature extraction

Production-grade ETL for transforming complex documents into structured formats, with open-source API

High-accuracy PDF→Markdown/JSON/HTML conversion, specialized for tables, formulas, and code blocks; includes benchmark scripts

Large-scale PDF/LaTeX/JATS parsing to standardized JSON for millions of papers

Machine learning software for extracting structured metadata from scholarly documents

Parses scientific papers into structured fields (title, authors, sections, references)