ProteinWorkshop

github.com/a-r-j/proteinworkshop

Unified benchmarking framework for protein representation learning, providing standardized interfaces for pre-training and diverse downstream tasks including structure prediction, fitness prediction, and property prediction across multiple protein datasets and model architectures (ICLR 2024, 273+ stars, MIT License)

Sourced from

  • Awesome AI for Science: github.com/a-r-j/proteinworkshop

Related resources

Therapeutics Data Commons: 66 AI-ready datasets across 22 drug discovery tasks with 29 leaderboards, covering target identification, molecular generation, ADMET prediction, and clinical trial outcomes (Harvard MIMS, NeurIPS 2021/2024)

Idle · 1.3K · 11 months ago
Jupyter Notebook
MIT

Large-scale benchmark suite for protein fitness prediction and design, aggregating 200+ deep mutational scanning assays and clinical variant datasets across diverse protein families and taxa, with standardized zero-shot and supervised leaderboards for variant effect prediction, mutation effect prediction, and protein language model evaluation (OATML & Marks Lab, NeurIPS 2023 Spotlight, Datasets & Benchmarks)

Active · 430 · 2 months ago
HTML
MIT

Comprehensive collection of Chinese medical datasets for AI research

Idle · 280 · 1 year ago

Curated open dataset collection of 602M+ observational and perturbational single-cell profiles for accelerating virtual cell model creation, integrating Tahoe-100M and scBaseCount data with Google Cloud Marketplace distribution (Arc Institute, 2025-2026)