Find open-source science resources

Cross-domain directory aggregating tools, AI models, datasets, and research resources from bio.tools, Bioconductor, HuggingFace, curated GitHub awesome-lists, and more.

156 of 5,684 resources

Showing 150

Gemma 4 E2B fine-tuned on 225K drug–target pairs for novel small-molecule generation.

23 · 1 week ago
Python

Scientific equation discovery and symbolic regression using LLMs, combining code generation with evolutionary search (ICLR 2025 Oral)

249 · 9 months ago
Python
MIT

Machine learning model predicting cellular perturbation response across diverse contexts with State Transition (ST) and State Embedding (SE) variants, featuring CLI tooling, PyPI distribution, and Virtual Cell Challenge integration (575+ stars)

587 · 3 weeks ago
Python
NOASSERTION

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

353 · 1 week ago
Python

The Simplified Upper Level Ontology (SULO) is an ontology with a minimal set of classes and relations to guide the development of a personal health knowledge graph. [from homepage]

16 · 6 days ago
Python
MIT

StatescopeR is an R wrapper around Statescope, a computational framework designed to discover cell states from cell type-specific gene expression profiles inferred from bulk RNA profiles.

0 · 6 days ago
Python
NOASSERTION

Composite-objective protein design framework integrating Boltz, AlphaFold2, OpenFold3, ProteinMPNN, and ESM via JAX-based gradient optimization over continuous relaxed sequence space for multi-property binder design (319+ stars, MIT License, 2025)

323 · 2 weeks ago
Python
MIT

Abstract:

136.6K · 2 years ago
Python

Automated and rigorous experiments using AI agents for scientific discovery

360 · 8 months ago
Python
Apache-2.0

Multimodal deep learning framework integrating peptide-MHC protein sequence, structure, and biochemical properties to predict class-I immunogenicity for infectious disease epitopes and cancer neoepitopes with cancer-wildtype contrastive learning, enabling personalized vaccine design (Krishnaswamy Lab, Yale University)

44 · 2 months ago
Python
NOASSERTION

- 2025-05-15: We identified a bug in the Bacformer Large code on HuggingFace which resulted in a significant drop in the quality of the output embeddings. This is now fixed, but if you downloaded or cached the model before this date, re-download and use the latest model revision before running…

213 · 1 week ago
Python

Directed message passing neural networks for property prediction of molecules and reactions with uncertainty and interpretation.

2.4K · 1 month ago
Python
NOASSERTION

GPU-accelerated differentiable physics simulation engine built on NVIDIA Warp, supporting rigid/soft body, cloth, and gradient-based optimization for scientific ML, initiated by Disney Research, DeepMind, and NVIDIA (Linux Foundation, Apache 2.0, 2025)

5K · 3 days ago
Python
Apache-2.0

Incremental knowledge graph construction using LLMs with entity extraction and Neo4j visualization

947 · 3 weeks ago
Python
Apache-2.0

Medical time series foundation model pretrained on 454B time points from heterogeneous clinical corpora spanning ICU physiological signals and hospital EHR, with continuous-time rotary positional encoding, frequency-specialized Mixture-of-Experts, and neural ODE extrapolation for zero-shot forecasting across irregular and multimodal temporal health data (Microsoft, 399+ stars, MIT License)

399 · 1 month ago
Python
MIT
3.8K · 1 year ago
Python

In search engines, rerankers are crucial for improving the accuracy of your retrieval system.

150.1K · 2 weeks ago
Python

In retrieval systems, embedding models determine the quality of your search.

356.2K · 2 months ago
Python

AI for chemical reaction prediction and synthesis planning

424 · 4 years ago
Python
NOASSERTION

E(3)-equivariant neural network interatomic potentials achieving DFT accuracy with up to 1000× less training data than invariant models, foundational architecture behind MACE and Allegro (Harvard, MIT, Nature Communications 2022)

914 · 5 days ago
Python
MIT

Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)

Archived · 373 · 1 year ago
Python
Apache-2.0

2.1K · 1 year ago
Python

Shanghai AI Lab's deep learning-based global weather forecasting model pushing skillful forecasts beyond 10 days lead, with open-source inference code and pretrained ONNX model weights (arXiv 2023)

169 · 5 months ago
Python

The Reagent Ontology (ReO) adheres to OBO Foundry principles (obofoundry.org) to model the domain of biomedical research reagents, considered broadly to include materials applied “chemically” in scientific techniques to facilitate generation of data and research materials. ReO is a modular ontology that re-uses existing ontologies to facilitate cross-domain interoperability. It consists of reagents and their properties, linking diverse biological and experimental entities to which they are related. ReO supports community use cases by providing a flexible, extensible, and deeply integrated framework that can be adapted and extended with more specific modeling to meet application needs.

0 · 6 years ago
Python
NOASSERTION

Unified Python framework for bulk, single-cell, and spatial RNA-seq multi-omics analysis with deep learning deconvolution (VAE) and graph neural networks, bridging Bindea, scanpy, and squidpy ecosystems (Nature Communications 2024)

1K · 22 hours ago
Python
GPL-3.0

Open-source framework for building physics-ML models at scale (renamed from Modulus, 2025)

2.8K · 3 days ago
Python
Apache-2.0
63 · 6 days ago
Python

Computation pipeline library for Python, widely used in science and bioinformatics.

175 · 4 years ago
Python
MIT

For a convenient overview and download list, visit our model page for this model.

441 · 1 week ago
Python

This model is a fine-tuned version of DeBERTa on the PubMED Dataset.

48.7K · 2 years ago
Python

ECMWF's unified framework and command-line tool to run AI-based weather forecasting models (GraphCast, Aurora, Pangu, NeuralGCM, FourCastNet) with operational ECMWF data infrastructure, enabling standardized inference and benchmarking across state-of-the-art meteorological AI systems (ECMWF, 576+ stars)

579 · 5 months ago
Python
Apache-2.0

Deep learning for chemistry and materials science remains a novel field with lots of potential. However, transfer-learning-based methods, popular in areas such as NLP and computer vision, have not yet been effectively developed in computational chemistry and machine learning.

254.2K · 5 years ago
Python

ChemGPT 19M: ChemGPT is based on the GPT-Neo model and was introduced in the paper Neural Scaling of Deep Chemical Models.

6.8K · 3 years ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

654 · 8 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 54% better perplexity than standard knowledge distillation at 9.4x compression.

5 · 3 months ago
Python

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

634 · 1 year ago
Python
467 · 10 months ago
Python
1.9K · 10 months ago
Python

For a convenient overview and download list, visit our model page for this model.

60 · 2 months ago
Python

Description: Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

15 · 5 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

25 · 11 months ago
Python

Chinese-language documentation

77 · 8 months ago
Python

For a convenient overview and download list, visit our model page for this model.

3.6K · 10 months ago
Python

Fine-tuned version of google/gemma-4-E4B-it across three professional domains (Medical, Legal, and Finance) using QLoRA (4-bit NF4) with Optuna-tuned hyperparameters, trained on Kaggle T4 GPU.

1K · 1 month ago
Python

GPN trained on Arabidopsis thaliana and 7 other Brassicales. See https://github.com/songlab-cal/gpn for more details.

479 · 1 year ago
Python
484 · 10 months ago
Python