Find open-source science resources

Idle · 722 · 11 months ago

zhihan1996/DNA_bert_6

by zhihan1996

Idle · 5.7K · 11 months ago

Qualification Ontology

Database

An ontology of qualifications, distinctions, and certifications that uses the Phenotype And Trait Ontology term quality (PATO:0000001) as a root term.

Idle · 1 · 11 months ago

andrewdalpino/ESM2-150M-Protein-Molecular-Function

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 24 · 12 months ago

andrewdalpino/ESM2-150M-Protein-Cellular-Component

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 12 · 12 months ago

andrewdalpino/ESM2-150M-Protein-Biological-Process

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 7 · 12 months ago

EVOLVEpro

In silico directed evolution framework using few-shot active learning to optimize protein activities, enabling rapid protein engineering with minimal experimental data (352+ stars, 2023)

Idle · 360 · 1 year ago

NOASSERTION

ChemMCP

LLM for Chemistry

Extensible chemistry toolkit for MCP-enabled AI assistants, exposing molecule analysis, property prediction, and reaction synthesis tools through unified Python/MCP interfaces for chemistry agents and research workflows (Apache 2.0, 2025)

Idle · 65 · 1 year ago

andrewdalpino/ESM2-35M-Protein-Molecular-Function

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 7 · 1 year ago

andrewdalpino/ESM2-35M-Protein-Cellular-Component

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 20 · 1 year ago

andrewdalpino/ESM2-35M-Protein-Biological-Process

by andrewdalpino

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 6 · 1 year ago

Sisigoks/FloraSense

by Sisigoks

image-classification

FloraSense is a fine-tuned Vision Transformer (ViT) model designed for accurate classification of plant species and flora-related imagery. It builds on top of the powerful google/vit-base-patch16-224 base model and is fine-tuned on the PlanterGARDENEDITION dataset curated by Sisigoks, which…

Idle · 248 · 1 year ago

Uni-Mol

Universal 3D molecular pretraining framework with 209M conformations, scaling to 1.1B parameters (Uni-Mol2) on 800M conformations for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)

Idle · 1.1K · 1 year ago

REINVENT

Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)

Archived · 373 · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.3.0-24b

by PocketDoc

Idle · 165 · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.2.0-24b

by PocketDoc

Idle · 56 · 1 year ago

CatKit

Simulations

General purpose tools for high-throughput catalysis.

Idle · 104 · 1 year ago

GPL-3.0

unsloth/medgemma-27b-text-it-GGUF

by unsloth

image-text-to-text

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Idle · 9.5K · 1 year ago

Autonomous Research Systems (2023-2025 Breakthroughs)

POPPER

Automated hypothesis testing with agentic sequential falsifications

Idle · 274 · 1 year ago

Slides & Presentation Generation

PaperToSlides

AI-powered tool that automatically converts academic papers (PDF) into presentation slides

Idle · 13 · 1 year ago

ibm-research/GP-MoLFormer-Uniq

by ibm-research

GP-MoLFormer is a class of models pretrained on SMILES string representations of 0.65-1.1B molecules from ZINC and PubChem. This repository is for the model pretrained on all the unique molecules from both datasets.

Idle · 1.5K · 1 year ago

ProteinWorkshop

Biology & Medicine

Unified benchmarking framework for protein representation learning, providing standardized interfaces for pre-training and diverse downstream tasks including structure prediction, fitness prediction, and property prediction across multiple protein datasets and model architectures (ICLR 2024, 273+ stars, MIT License)

Idle · 274 · 1 year ago

QIAIUNCC/EYE-Llama_gqa

by QIAIUNCC

EYE-Llama_gqa is a large language model specifically designed for ophthalmic question answering (QA). It is built upon the Llama 2 architecture and fine-tuned on the EYE-lit and EYE-QA+ datasets.

Idle · 106 · 1 year ago

prov-gigapath/prov-gigapath

by prov-gigapath

image-feature-extraction

Idle · 60.4K · 1 year ago

medicalai/ClinicalBERT

by medicalai

This model card describes the ClinicalBERT model, which was trained on a large multicenter corpus of 1.2B words covering diverse diseases that we constructed. We then used a large-scale corpus of EHRs from over 3 million patient records to fine-tune the base language model.

Idle · 21.6K · 1 year ago

Neural Differential Equations

torchdiffeq

PyTorch implementation of neural ODEs

Idle · 6.4K · 1 year ago

prithivMLmods/Indian-Western-Food-34

by prithivMLmods

image-classification

Idle · 27 · 1 year ago

PurvaTijare/PPTStab

by PurvaTijare

tabular-regression

PPTStab: Prediction and Designing of thermostable proteins with a desired melting temperature

Idle · 0 · 1 year ago

mradermacher/Dans-PersonalityEngine-V1.2.0-24b-i1-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · 685 · 1 year ago

Scientific Literature RAG & Analysis

paper-reviewer

Generate comprehensive reviews from arXiv papers and convert to blog posts

Idle · 836 · 1 year ago

AI2BMD

Specialized Frameworks

Microsoft's AI-powered ab initio biomolecular dynamics simulation achieving quantum-mechanical accuracy for proteins with 10,000+ atoms, orders of magnitude faster than DFT using protein fragmentation and ML force fields (Nature 2024)

Idle · 575 · 1 year ago

Machine Learning for Physics

Equiformer

Equivariant graph attention Transformer (ICLR 2023)

Idle · 282 · 1 year ago

songlab/gpn-brassicales

by songlab

GPN trained on Arabidopsis thaliana and 7 other Brassicales. See https://github.com/songlab-cal/gpn for more details.

Idle · 320 · 1 year ago

aaditya/Llama3-OpenBioLLM-70B

by aaditya

Idle · 1.4K · 1 year ago

FremyCompany/BioLORD-2023

by FremyCompany

sentence-similarity

This model was trained using BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.

Idle · 440.1K · 1 year ago

QMsolve

Simulations

A module for solving and visualizing the Schrödinger equation.

Idle · 1.2K · 1 year ago

BSD-3-Clause

Henrychur/MMedS-Llama-3-8B

by Henrychur

Idle · 948 · 1 year ago

Genomics & Bioinformatics

Geneformer

Single-cell transformer foundation model pretrained on 104M human transcriptomes via masked gene prediction, enabling transfer learning for cell type classification, gene network analysis, and in silico perturbation with limited labeled data (Nature 2023, V2 2024)

Idle · 0 · 1 year ago

mradermacher/Palmyra-Med-70B-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · 381 · 1 year ago

Chart-to-Code & Reproducibility

ChartAssistant / ChartAst (ACL 2024)

Universal chart comprehension and reasoning model

Idle · 135 · 1 year ago

NOASSERTION

RnaChipIntegrator

Computational biology

Utility that performs integrated analyses of 'gene' data (a set of genes or other genomic features) with 'peak' data (a set of regions, for example ChIP peaks) to identify the genes nearest to each peak, and vice versa.

Idle · 5 · 1 year ago

Artistic-2.0

bcbio-nextgen

Pipelines

Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.

Idle · 1K · 1 year ago

gbyuvd/synthaccess-chemselfies

by gbyuvd

ChemFIE-SA is a BERT-like sequence classifier for predicting synthesis accessibility given a SELFIES string of a compound, fine-tuned from gbyuvd/chemselfies-base-bertmlm on DeepSA's expanded dataset from Wang et al. 2023.

Idle · 7 · 1 year ago

gbyuvd/drugtargetpred-chemselfies

by gbyuvd

This model is a BERT-like sequence classifier for 221 human protein drug targets, fine-tuned from gbyuvd/chemselfies-base-bertmlm on a dataset derived from ChEMBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded…

Idle · 8 · 1 year ago

ChIP-seq analysis notes from Tommy Tang

ChIP-Seq

Resources on ChIP-seq data which include papers, methods, links to software, and analysis.

Idle · 850 · 1 year ago

RaphaelMourad/Mistral-DNA-v1-138M-bacteria

by RaphaelMourad

The Mistral-DNA-v1-138M-bacteria Large Language Model (LLM) is a pretrained generative DNA text model with 17.31M parameters x 8 experts = 138.5M parameters. It is derived from the Mistral-7B-v0.1 model, which was simplified for DNA: the number of layers and the hidden size were reduced.

Idle · 10 · 1 year ago

smof

Sequence Processing

UNIX-style FASTA manipulation tools.

Idle · 17 · 1 year ago

sagawa/ReactionT5v1-forward

by sagawa

This is a ReactionT5 pre-trained to predict the products of reactions.

Idle · 47 · 1 year ago

BioGPT

Domain-Specific Models

Biomedical text generation

Idle · 4.5K · 1 year ago

Genie 2

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle · 192 · 1 year ago