Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active: 68
Idle: 52
Stale: 25

Domain

text-generation: 40
fill-mask: 26
image-text-to-text: 16
text-classification: 10
feature-extraction: 9
image-feature-extraction: 4
question-answering: 4
sentence-similarity: 4
image-classification: 3
token-classification: 3
text-ranking: 2
translation: 2
(None): 15

Language (1)

Python: 145
C: 1
(None): 142

License

(None): 145

Source (1)

github: 225
huggingface: 145
awesome-ai-for-science: 134
bio.tools: 39
awesome-python-chemistry: 30
bioregistry: 28
awesome-bioinformatics: 16
awesome-cheminformatics: 11
awesome-scientific-python: 6

Type

AI model: 145

145 of 5,893 resources

Showing 101–145

andrewdalpino/ESM2-150M-Protein-Cellular-Component

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · ↓12 · 12 months ago

andrewdalpino/ESM2-150M-Protein-Biological-Process

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · ↓7 · 12 months ago

andrewdalpino/ESM2-35M-Protein-Molecular-Function

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · ↓7 · 1 year ago

andrewdalpino/ESM2-35M-Protein-Cellular-Component

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · ↓20 · 1 year ago

andrewdalpino/ESM2-35M-Protein-Biological-Process

by andrewdalpino

text-classification

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · ↓6 · 1 year ago
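The five ESM2 entries above say the models "predict the GO subgraph" for a protein sequence. One common way to turn per-term multi-label scores into a subgraph that is consistent with the ontology is to keep every term above a threshold and then add all of its ancestors (the GO "true path rule"). The sketch below shows only that post-processing idea on a toy hierarchy; the function name, the 0.5 threshold, and the term names are hypothetical, and this is not necessarily how the andrewdalpino models build their output.

```python
# Hypothetical post-processing step: turn per-term scores into a GO subgraph.
def go_subgraph(
    scores: dict[str, float],
    parents: dict[str, list[str]],
    threshold: float = 0.5,
) -> set[str]:
    """Keep terms scoring at or above `threshold`, then add all their ancestors.

    `scores` maps term IDs to predicted probabilities; `parents` maps each
    term to its direct parents in the ontology.
    """
    selected: set[str] = set()
    # Start from the terms the classifier is confident about.
    stack = [term for term, p in scores.items() if p >= threshold]
    # Walk upward so every selected term's ancestors are also selected.
    while stack:
        term = stack.pop()
        if term in selected:
            continue
        selected.add(term)
        stack.extend(parents.get(term, []))
    return selected


# Toy hierarchy with made-up term names (not real GO IDs).
toy_parents = {"leaf": ["mid"], "mid": ["root"], "other": ["root"]}
toy_scores = {"leaf": 0.9, "other": 0.2}
print(sorted(go_subgraph(toy_scores, toy_parents)))  # ['leaf', 'mid', 'root']
```

Propagating to ancestors guarantees that a predicted specific term never appears without the more general terms it implies.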

Sisigoks/FloraSense

by Sisigoks

image-classification

FloraSense is a fine-tuned Vision Transformer (ViT) model designed for accurate classification of plant species and flora-related imagery. It builds on top of the powerful google/vit-base-patch16-224 base model and is fine-tuned on the PlanterGARDENEDITION dataset curated by Sisigoks, which…

Idle · ↓248 · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.3.0-24b

by PocketDoc

text-generation

Idle · ↓165 · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.2.0-24b

by PocketDoc

text-generation

Idle · ↓56 · 1 year ago

unsloth/medgemma-27b-text-it-GGUF

by unsloth

image-text-to-text

Quantized with Unsloth Dynamic 2.0, which Unsloth reports achieves higher accuracy than other leading quantizations.

Idle · ↓9.5K · 1 year ago

ibm-research/GP-MoLFormer-Uniq

by ibm-research

text-generation

GP-MoLFormer is a class of models pretrained on SMILES string representations of 0.65-1.1B molecules from ZINC and PubChem. This repository is for the model pretrained on all the unique molecules from both datasets.

Idle · ↓1.5K · 1 year ago

QIAIUNCC/EYE-Llama_gqa

by QIAIUNCC

text-generation

EYE-Llama_gqa is a large language model specifically designed for ophthalmic question answering (QA). It is built upon the Llama 2 architecture and fine-tuned on the EYE-lit and EYE-QA+ datasets.

Idle · ↓106 · 1 year ago

prov-gigapath/prov-gigapath

by prov-gigapath

image-feature-extraction

Idle · ↓60.4K · 1 year ago

medicalai/ClinicalBERT

by medicalai

This model card describes the ClinicalBERT model, which was trained on a large multicenter dataset: a corpus of 1.2B words covering diverse diseases, constructed by the authors. A large-scale corpus of EHRs from over 3 million patient records was then used to fine-tune the base language model.

Idle · ↓21.6K · 1 year ago

prithivMLmods/Indian-Western-Food-34

by prithivMLmods

image-classification

Idle · ↓27 · 1 year ago

PurvaTijare/PPTStab

by PurvaTijare

tabular-regression

PPTStab: Prediction and Designing of thermostable proteins with a desired melting temperature

Idle · ↓0 · 1 year ago

mradermacher/Dans-PersonalityEngine-V1.2.0-24b-i1-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · ↓685 · 1 year ago
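The mradermacher GGUF entries mention concatenating multi-part files. When a large GGUF file has been split byte-wise into parts, joining them is plain ordered concatenation (what `cat part1 part2 > model.gguf` does on Linux or macOS). Below is a minimal Python sketch, assuming the parts are simple byte splits passed in the correct order; the helper name and the `*.partNofM` file names are hypothetical. As far as we know, shards produced by llama.cpp's `gguf-split` tool are a different format and should not be joined this way.

```python
import shutil
from pathlib import Path


def join_gguf_parts(parts: list[Path], output: Path) -> None:
    """Concatenate byte-split GGUF parts, in the given order, into one file.

    Equivalent in spirit to `cat part1 part2 > output`; the caller is
    responsible for passing the parts in the correct order.
    """
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream in 16 MiB chunks so multi-gigabyte parts never sit in memory.
                shutil.copyfileobj(src, out, 16 * 1024 * 1024)


# Hypothetical file names following an assumed "*.partNofM" convention:
# join_gguf_parts(
#     [Path("model.i1-Q6_K.gguf.part1of2"), Path("model.i1-Q6_K.gguf.part2of2")],
#     Path("model.i1-Q6_K.gguf"),
# )
```

Passing the parts explicitly, rather than globbing and sorting, avoids silent mis-ordering when part numbers reach two digits.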

songlab/gpn-brassicales

by songlab

GPN trained on Arabidopsis thaliana and 7 other Brassicales. See https://github.com/songlab-cal/gpn for more details.

Idle · ↓320 · 1 year ago

aaditya/Llama3-OpenBioLLM-70B

by aaditya

Idle · ↓1.4K · 1 year ago

FremyCompany/BioLORD-2023

by FremyCompany

sentence-similarity

This model was trained using BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.

Idle · ↓440.1K · 1 year ago
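Models tagged sentence-similarity, such as BioLORD-2023, map each sentence or concept name to a vector, and pairs are then usually scored by the cosine similarity of those vectors. The sketch below computes that score in plain Python on toy 3-dimensional vectors; the numbers are made up and are not real BioLORD embeddings, which you would obtain from the model itself (for example through the sentence-transformers library, not shown here).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real (much longer) concept embeddings.
heart_attack = [0.9, 0.1, 0.3]
myocardial_infarction = [0.8, 0.2, 0.3]
fracture = [0.1, 0.9, 0.0]
print(cosine_similarity(heart_attack, myocardial_infarction) > cosine_similarity(heart_attack, fracture))  # True
```

Cosine similarity ignores vector length, so only the direction of each embedding matters when ranking related concepts.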

Henrychur/MMedS-Llama-3-8B

by Henrychur

text-generation

MMedS-Llama3 · GitHub repo · arXiv paper

Idle · ↓948 · 1 year ago

mradermacher/Palmyra-Med-70B-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · ↓381 · 1 year ago

gbyuvd/synthaccess-chemselfies

by gbyuvd

text-classification

ChemFIE-SA is a BERT-like sequence classifier for predicting synthesis accessibility given a SELFIES string of a compound, fine-tuned from gbyuvd/chemselfies-base-bertmlm on DeepSA's expanded dataset from Wang et al. 2023.

Idle · ↓7 · 1 year ago

gbyuvd/drugtargetpred-chemselfies

by gbyuvd

text-classification

This model is a BERT-like sequence classifier for 221 human protein drug targets, fine-tuned from gbyuvd/chemselfies-base-bertmlm on a dataset derived from ChEMBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded…

Idle · ↓8 · 1 year ago

RaphaelMourad/Mistral-DNA-v1-138M-bacteria

by RaphaelMourad

text-generation

The Mistral-DNA-v1-138M-bacteria Large Language Model (LLM) is a pretrained generative DNA text model with 17.31M parameters × 8 experts = 138.5M parameters. It is derived from the Mistral-7B-v0.1 model, which was simplified for DNA: the number of layers and the hidden size were reduced.

Idle · ↓10 · 1 year ago

sagawa/ReactionT5v1-forward

by sagawa

This is a ReactionT5 model pre-trained to predict the products of reactions.

Idle · ↓47 · 1 year ago

johnsnowlabs/JSL-MedLlama-3-8B-v2.0

by johnsnowlabs

text-generation

JSL-MedLlama-3-8B-v2.0

Stale · ↓603 · 2 years ago

Lolimipsu/so_vits_yuuka_voice_model

by Lolimipsu

question-answering

Stale · ↓0 · 2 years ago

blaze999/Medical-NER

by blaze999

token-classification

This model is a version of DeBERTa fine-tuned on the PubMed dataset.

Stale · ↓45.1K · 2 years ago

BioMistral/BioMistral-7B

by BioMistral

text-generation

Abstract:

Stale · ↓102.4K · 2 years ago

BioMistral/BioMistral-7B-GGUF

by BioMistral

text-generation

Abstract:

Stale · ↓875 · 2 years ago

knowledgator/SMILES2IUPAC-canonical-base

by knowledgator

text-generation

SMILES2IUPAC-canonical-base was designed to accurately translate SMILES strings into IUPAC names.

Stale · ↓5.1K · 2 years ago

TachyHealth/Thealth_Mixtral-8x7B

by TachyHealth

text-generation

Stale · ↓0 · 2 years ago

AmelieSchreiber/esm_interact

by AmelieSchreiber

This model was finetuned on concatenated pairs of interacting proteins in much the same way as PepMLM. It is meant to generate interaction partners for proteins using the masked language modeling capabilities of ESM-2. The model is not well tested, so use with caution.

Stale · ↓4 · 2 years ago

Rostlab/ProstT5

by Rostlab

ProstT5 is a protein language model (pLM) that can translate between protein sequence and structure.

Stale · ↓7.8K · 2 years ago
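ProstT5 translates in both directions, so each input has to say which way it goes. The sketch below prepares inputs following the conventions of the Rostlab/ProstT5 model card as we recall them: amino-acid sequences are uppercase and get a `<AA2fold>` prefix, 3Di structure strings are lowercase and get a `<fold2AA>` prefix, residues are space-separated, and the rare amino acids U, Z, O, B are mapped to X. The helper name is hypothetical; please check the prefixes and preprocessing against the current model card before relying on this.

```python
import re


def prepare_prostt5_inputs(sequences: list[str]) -> list[str]:
    """Format raw sequences per the ProstT5 card conventions (as we recall them).

    Uppercase strings are treated as amino-acid sequences (sequence -> structure,
    <AA2fold>); lowercase strings as 3Di structure strings (structure -> sequence,
    <fold2AA>).
    """
    prepared = []
    for seq in sequences:
        # Map rare/ambiguous amino acids U, Z, O, B to X, then space-separate tokens.
        spaced = " ".join(re.sub(r"[UZOB]", "X", seq))
        # Choose the translation direction from the case of the input.
        prefix = "<AA2fold>" if seq.isupper() else "<fold2AA>"
        prepared.append(f"{prefix} {spaced}")
    return prepared


print(prepare_prostt5_inputs(["PRTEINO", "strct"]))
# ['<AA2fold> P R T E I N X', '<fold2AA> s t r c t']
```

The prepared strings would then be passed to the model's tokenizer; that step is omitted here.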

Galahad3x/QAModelForPatho

by Galahad3x

question-answering

Question Answering Model for the PathoTHREAT Project

Stale · ↓4 · 2 years ago

alabnii/jmedroberta-base-sentencepiece-vocab50000

by alabnii

This is a Japanese RoBERTa base model pre-trained on academic articles in medical sciences collected by Japan Science and Technology Agency (JST).

Stale · ↓146 · 2 years ago

yasinelh/retinal_vessel_U-Net

by yasinelh

image-segmentation

I present a demo showcasing retinal vessel segmentation using the U-Net model, which is a well-known and widely used model in medical image segmentation. The model was trained on the DRIVE dataset, and the training process was conducted on Google Colab.

Stale · ↓0 · 2 years ago

cambridgeltl/SapBERT-from-PubMedBERT-fulltext

by cambridgeltl

feature-extraction

Datasets: UMLS

Stale · ↓1.8M · 2 years ago

Dr-BERT/DrBERT-4GB-CP-PubMedBERT

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to treat specific domains more effectively.

Stale · ↓52 · 3 years ago

Dr-BERT/DrBERT-4GB-CP-CamemBERT

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to treat specific domains more effectively.

Stale · ↓0 · 3 years ago

Dr-BERT/DrBERT-4GB

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to treat specific domains more effectively.

Stale · ↓196 · 3 years ago

Dr-BERT/DrBERT-7GB

by Dr-BERT

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to treat specific domains more effectively.

Stale · ↓1.4K · 3 years ago

UEG/interface

by UEG

text-classification

Stale · ↓0 · 3 years ago

tinnerofkors/kors

by tinnerofkors

text-classification

Stale · ↓0 · 3 years ago

K8778/universe

by K8778

Stale · ↓0 · 3 years ago

SamKenX-Hub-Community/SamKenXAI-engine-compiting

by SamKenX-Hub-Community

depth-estimation

This model card aims to be a base template for new models. It has been generated using this raw template.

Stale · ↓0 · 3 years ago

ncfrey/ChemGPT-1.2B

by ncfrey

text-generation

ChemGPT 1.2B: ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

Stale · ↓3.3K · 3 years ago

ncfrey/ChemGPT-19M

by ncfrey

text-generation

ChemGPT 19M: ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

Stale · ↓2.3K · 3 years ago

ncfrey/ChemGPT-4.7M

by ncfrey

text-generation

ChemGPT 4.7M: ChemGPT is based on the GPT-Neo model and was introduced in the paper "Neural Scaling of Deep Chemical Models".

Stale · ↓3K · 3 years ago

seyonec/ChemBERTa-zinc-base-v1

by seyonec

Deep learning for chemistry and materials science remains a young field with a lot of potential. However, transfer-learning-based methods, popular in areas such as NLP and computer vision, have not yet been effectively developed for computational chemistry and machine learning.

Stale · ↓281.7K · 5 years ago

Submit a resource bio.tools Awesome Bioinformatics