Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

304 of 5,819 resources

Showing 151–200

Fine-tuned version of google/gemma-4-E4B-it across three professional domains (Medical, Legal, and Finance) using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on a Kaggle T4 GPU.

Active · 1K · 1 month ago
Python

Scalable agentic training environment for code-centric reasoning in biomedical data science

Active · 114 · 1 month ago
Python

Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)

Active · 1.9K · 2 months ago
Python
Apache-2.0

The Context and Measurement Ontology (COMO) contains ontological terms to describe the context for various types of experimental data and measurements. It is useful in its current state for several different environmental microbiology projects. This ontology is used in multiple CORAL (Contextual Ontology-based Repository Analysis Library) deployments.

Active · 8 · 2 months ago
Python
AGPL-3.0

Programmatic data labeling and weak supervision

Active · 6K · 2 months ago
Python
Apache-2.0

Parameter/topology editor and molecular simulator with visualization capability.

Active · 452 · 2 months ago
Python

Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)

Active · 399 · 2 months ago
Python
MIT

Accessible protein design platform via Google Colab integrating AlphaFold2, RoseTTAFold, and ProteinMPNN for de novo hallucination, fixed backbone design, and binder design (Sergey Ovchinnikov, 2022+)

Active · 913 · 2 months ago
Python
NOASSERTION

This is a lightweight, high-performance image classification model built to diagnose histopathological scans of lung and colon tissues. This model was specifically designed for rapid web deployment without sacrificing clinical accuracy.

Active · 4 · 2 months ago
Python

LLM agent framework for Earth Observation with 104 specialized tools across 5 functional kits

Active · 152 · 2 months ago
Python
MIT

Tool to build force field input files for molecular simulation.

Active · 201 · 2 months ago
Python
MIT

GFF and GTF file manipulation and interconversion.

Active · 319 · 2 months ago
Python
MIT

Google DeepMind's diffusion-based ensemble weather forecasting model at 0.25° resolution, outperforming ECMWF ENS on 97.2% of targets up to 15 days ahead, with open-source code and weights (Nature 2024)

Active · 6.7K · 2 months ago
Python
Apache-2.0

A library and command-line tool for building and analyzing complex homogeneous microkinetic models from quantum chemistry calculations, with support for quasi-harmonic thermochemistry, quantum tunnelling corrections, molecular symmetries and more.

Active · 64 · 2 months ago
Python
MIT

Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability

Active · 178 · 2 months ago
Python
BSD-2-Clause

An EMMO-based domain ontology for atomistic and electronic modelling.

Active · 1 · 2 months ago
Python
CC-BY-4.0

End-to-end semi-automated scientific discovery system that designs, iterates, and analyzes code-based experiments via LLM-as-a-mutator over scientific articles and code examples; auto-creates, runs, and debugs experiment code in containers and writes meta-analysis reports (339+ stars, Apache 2.0)

Active · 339 · 2 months ago
Python
Apache-2.0

Free-text promptable universal 3D medical image segmentation foundation model enabling zero-shot segmentation of diverse anatomical structures and pathologies via natural language prompts across CT, MRI, and other volumetric imaging modalities (DKFZ, 195+ stars, Apache 2.0)

Active · 197 · 2 months ago
Python
Apache-2.0

MarkushGrapher-2 is an end-to-end multimodal model for recognizing chemical structures from patent document images. It jointly encodes vision, text, and layout information to convert Markush structure images into machine-readable CXSMILES representations.

Active · 254 · 2 months ago
Python

or·a·cle /ˈôrəkəl/: a source of wise counsel; one who provides authoritative knowledge. From Latin ōrāculum, meaning divine announcement. In computer science, an oracle is a black box that always returns the correct answer: you don't ask it how it knows, you ask and it answers.

Active · 142 · 2 months ago
Python

Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure.

Active · 361 · 2 months ago
Python
LGPL-2.1

Deep learning-based variant caller

Active · 3.7K · 2 months ago
Python
BSD-3-Clause

Open-source implementation of AlphaEvolve's evolutionary coding agent paradigm, enabling LLMs to autonomously discover and optimize algorithms through iterative evolution, matching the approach behind DeepMind's breakthrough matrix multiplication discovery (6.2K+ stars, 2025)

Active · 6.4K · 2 months ago
Python
Apache-2.0

The Graphic Descriptor Ontology (GDO) is intended for use in describing graphics that represent the form of objects. It uses the language of visual communication, illustration, and technical drawing. The GDO is rooted in the Basic Formal Ontology (BFO) and uses several classes from the Information Entity Ontology of the Common Core Ontologies as a mid-level ontology. [from https://gdo.endlessforms.info/about]

Active · 0 · 2 months ago
Python
CC-BY-4.0

In retrieval systems, embedding models determine the quality of your search.

Active · 272.3K · 2 months ago
Python

A Python package for protein dynamics analysis

Active · 546 · 3 months ago
Python
NOASSERTION

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

Active · 22.9K · 3 months ago
Python

Structure-aware protein language model using 3D structural vocabulary (Foldseek) for joint sequence-structure pretraining, achieving SOTA on protein engineering and fitness prediction benchmarks (ICML 2024, Westlake University & Repl)

Active · 604 · 3 months ago
Python
MIT

Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)

Active · 44 · 3 months ago
Python
NOASSERTION

Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Active · 44 · 3 months ago
Python
MIT

For a convenient overview and download list, visit our model page for this model.

Active · 61 · 3 months ago
Python

Active · 8 · 3 months ago
Python

Deep learning library for Chemistry based on Tensorflow

Active · 6.8K · 3 months ago
Python
MIT

Active · 1.4K · 3 months ago
Python

This ontology integrates cell type markers for cells in the Cell Ontology from various sources along with details of marker context (anatomical context, assay), confidence (where available) and provenance. [from repository]

Active · 1 · 3 months ago
Python

Sahal Shaji Mullappilly, Mohammed Irfan K, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Muhammad Anwer, and Hisham Cholakkal

Active · 372 · 3 months ago
Python

First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)

Active · 151 · 3 months ago
Python
MIT

GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)

Active · 115 · 3 months ago
Python
NOASSERTION

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 31% better perplexity than standard knowledge distillation at 3.8x compression.

Active · 73 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 54% better perplexity than standard knowledge distillation at 9.4x compression.

Active · 5 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.

Active · 14 · 3 months ago
Python

FASTQ and SAM quality control using Python.

Active · 109 · 3 months ago
Python
MIT

Active · 6.5K · 3 months ago
Python

A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.

Active · 199 · 3 months ago
Python
BSD-3-Clause

A Python script that converts positional information from a SAM dataset into interval format with a 0-based start and a 1-based end. The CIGAR string of each SAM record is used to compute the end coordinate.

Active · 37 · 3 months ago
Python
MIT
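
The SAM-to-interval conversion described in the entry above can be illustrated with a minimal sketch. This is not the listed script's code: the function name `sam_to_interval` and the example record are hypothetical. It assumes only what the SAM specification states, that POS is 1-based and that the CIGAR operations M, D, N, = and X consume reference bases.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_to_interval(pos, cigar):
    """Return (start, end) with a 0-based start and a 1-based end.

    Hypothetical helper for illustration; `pos` is the 1-based SAM POS field.
    """
    start = pos - 1  # convert the 1-based SAM position to a 0-based start
    # Sum the lengths of the operations that advance along the reference.
    ref_len = sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )
    return start, start + ref_len


# Soft clips (S) and insertions (I) do not advance the reference coordinate,
# so only 20M + 10M + 3D + 15M = 48 bases count toward the end.
print(sam_to_interval(100, "5S20M2I10M3D15M"))  # (99, 147)
```

Because the end is the 0-based start plus the reference length, it equals the 1-based coordinate of the last aligned base, which matches the half-open interval convention used by formats such as BED.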

Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)

Active · 383 · 3 months ago
Python
NOASSERTION

Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)

Active · 84 · 3 months ago
Python

Azure Semantic Kernel multi-agent PPT generation reference

Active · 49 · 3 months ago
Python
MIT

From Inquiry to Decision: Building Trustworthy Medical AI

Active · 20 · 4 months ago
Python

Active · 45 · 4 months ago
Python