Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

303 of 5,923 resources

Showing 201-250

This repository contains pre-trained models from RadImageNet, a large-scale radiologic image dataset designed to facilitate transfer learning for medical imaging applications.

Idle · 0 · 11 months ago

Welcome to IBM's series of large foundation models for sustainable materials. Our models span a variety of representations and modalities, including SMILES, SELFIES, 3D atom positions, 3D density grids, molecular graphs, and other formats.

Idle · 193 · 11 months ago
Python
Idle · 2.1K · 11 months ago
Python
Idle · 728 · 11 months ago
Python
Idle · 722 · 11 months ago
Python
Idle · 5.7K · 11 months ago
Python
Idle · 0 · 11 months ago

[Image: ether0 logo]

Idle · 256 · 11 months ago

Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…

Idle · 20 · 11 months ago

Nature Language Model (NatureLM) is a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including…

Idle · 261 · 11 months ago

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 24 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 12 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 7 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 7 · 1 year ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 20 · 1 year ago
Python

This model was created as part of joint research by the HUMADEX research group (https://www.linkedin.com/company/101563689/) and has received funding from the European Union Horizon Europe Research and Innovation Program project SMILE (grant number 101080923) and Marie Skłodowska-Curie Actions…

Idle · 301 · 1 year ago

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 6 · 1 year ago
Python

CineMA is a foundation model for cine cardiac magnetic resonance (CMR) imaging based on a masked autoencoder. CineMA has been pre-trained on UK Biobank data and fine-tuned on multiple clinically relevant tasks such as ventricle and myocardium segmentation, ejection fraction (EF) regression,…

Idle · 0 · 1 year ago

FloraSense is a fine-tuned Vision Transformer (ViT) model designed for accurate classification of plant species and flora-related imagery. It builds on top of the powerful google/vit-base-patch16-224 base model and is fine-tuned on the PlanterGARDENEDITION dataset curated by Sisigoks, which…

Idle · 248 · 1 year ago
Python

Using llama.cpp release b5466 for quantization.

Idle · 10.9K · 1 year ago

Dans-PersonalityEngine-V1.3.0-24b

Idle · 165 · 1 year ago
Python

Dans-PersonalityEngine-V1.2.0-24b

Idle · 56 · 1 year ago
Python

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Idle · 9.5K · 1 year ago
Python

GP-MoLFormer is a class of models pretrained on SMILES string representations of 0.65-1.1B molecules from ZINC and PubChem. This repository is for the model pretrained on all the unique molecules from both datasets.

Idle · 1.5K · 1 year ago
Python

Model Repo: xformai/qwen-0.6b-mentalhealth-support; Base Model: Qwen/Qwen-0.5B; Task: Empathetic Conversational AI for mental health & emotional support; Fine-Tuned By: XformAI

Idle · 7 · 1 year ago

EYE-Llama_gqa is a large language model specifically designed for ophthalmic question-answering (QA). It is built upon the Llama 2 architecture and fine-tuned on the EYE-lit and EYE-QA+ datasets.

Idle · 106 · 1 year ago
Python

This project focuses on curating and modeling bioactivity data of small molecules targeting immune receptors. Using datasets from ImmtorLig_DB, we applied machine learning techniques to predict interactions between small molecules and immune receptors or cytokines, aiding drug discovery…

Idle · 0 · 1 year ago

This model card describes the ClinicalBERT model, which was trained on a large multicenter corpus of 1.2B words covering diverse diseases, constructed by the authors. The base language model was then fine-tuned on a large-scale corpus of EHRs from over 3 million patient records.

Idle · 21.6K · 1 year ago
Python

Protein solubility is a critical factor in both pharmaceutical research and production processes, as it can significantly impact the quality and function of a protein. This is an example of fine-tuning ibm/biomed.omics.bl.sm-ted-458m for protein solubility prediction (binary classification) based…

Idle · 118 · 1 year ago

Idle · 27 · 1 year ago
Python

PPTStab: Prediction and Designing of thermostable proteins with a desired melting temperature

Idle · 0 · 1 year ago
Python

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · 685 · 1 year ago
Python

This is SapBERT (Self-alignment pretraining for BERT) built on a Korean language model. Korean and English terms were aligned using KOSTOM, a Korean-English medical terminology dictionary. References: SapBERT, Original Code

Idle · 19 · 1 year ago

This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with 4096 tokens. It supports a maximum sequence length of 10240 tokens (~50kbp).

Idle · 3.3K · 1 year ago
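
The GenomeOcean-4B context figures above (10240 tokens, roughly 50 kbp) imply an average of about 5 base pairs per BPE token. The sketch below only works through that arithmetic; the function names are hypothetical, and the ratio is an average inferred from the description, so real token counts will vary with the sequence being tokenized.

```python
import math

MAX_TOKENS = 10240          # maximum sequence length stated in the description
APPROX_CONTEXT_BP = 50_000  # "~50kbp" from the description (approximate)


def bp_per_token(context_bp: int = APPROX_CONTEXT_BP,
                 max_tokens: int = MAX_TOKENS) -> float:
    """Average base pairs covered by one token, implied by the stated figures."""
    return context_bp / max_tokens


def estimate_tokens(sequence_length_bp: int) -> int:
    """Rough token estimate for a DNA sequence (assumption: the average ratio holds)."""
    return math.ceil(sequence_length_bp / bp_per_token())


def probably_fits(sequence_length_bp: int) -> bool:
    """Heuristic check that a sequence is likely within the context window."""
    return estimate_tokens(sequence_length_bp) <= MAX_TOKENS


if __name__ == "__main__":
    print(bp_per_token())          # 4.8828125
    print(estimate_tokens(20_000))
    print(probably_fits(60_000))
```

Because BPE merges depend on sequence content, an estimate like this is best used only to decide whether a sequence needs chunking before running the model's own tokenizer.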
Idle · 7 · 1 year ago

GPN trained on Arabidopsis thaliana and 7 other Brassicales. See https://github.com/songlab-cal/gpn for more details.

Idle · 320 · 1 year ago
Python

Idle · 1.4K · 1 year ago
Python

BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning.

Idle · 832.1K · 1 year ago

FremyCompany/BioLORD-2023: This model was trained using BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.

Idle · 440.1K · 1 year ago
Python

HuatuoGPT-o1-7B

Idle · 503 · 1 year ago

MMedS-Llama3

Idle · 948 · 1 year ago
Python

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery. This is an example of fine-tuning ibm/biomed.omics.bl.sm-ted-400 for this task: predicting binding affinities using pKd, the negative logarithm of the dissociation constant, which reflects the…

Idle · 27.6K · 1 year ago
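
The pKd value mentioned above is the negative base-10 logarithm of the dissociation constant Kd. The snippet below is a minimal sketch of that conversion, assuming Kd is expressed in mol/L; the function names are hypothetical and are not part of the listed model's code.

```python
import math


def pkd_from_kd(kd_molar: float) -> float:
    """Return pKd = -log10(Kd), with Kd given in mol/L (assumed unit)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration")
    return -math.log10(kd_molar)


def kd_from_pkd(pkd: float) -> float:
    """Invert the transform: Kd = 10 ** (-pKd), in mol/L."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # A 1 nM binder (Kd = 1e-9 mol/L) has a pKd of about 9;
    # a 1 uM binder (Kd = 1e-6 mol/L) has a pKd of about 6.
    for kd in (1e-9, 1e-6):
        print(f"Kd = {kd:.0e} M -> pKd = {pkd_from_kd(kd):.2f}")
```

A higher pKd corresponds to a smaller Kd, that is, tighter binding, which is why regression targets for affinity are often stated on this log scale.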

T-cell receptor (TCR) binding to immunogenic peptides (epitopes) presented by major histocompatibility complex (MHC) molecules is a critical mechanism in the adaptive immune system, essential for antigen recognition and triggering immune responses.

Idle · 73 · 1 year ago

Drugs must satisfy stringent criteria for both efficacy and safety. This model predicts the likelihood of FDA approval for small-molecule drugs, represented using SMILES (Simplified Molecular Input Line Entry System) strings.

Idle · 44 · 1 year ago

Drugs must satisfy stringent criteria for both efficacy and safety. This model predicts the likelihood of failure in clinical toxicity trials for small-molecule drugs, represented using SMILES (Simplified Molecular Input Line Entry System) strings.

Idle · 45 · 1 year ago

Drugs targeting the central nervous system must meet stringent criteria for both efficacy and safety, including their ability to penetrate the blood-brain barrier (BBB). This model predicts the likelihood of small-molecule drugs crossing the BBB, a critical factor in CNS drug development.

Idle · 49 · 1 year ago

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery. Traditionally, binding affinities are measured through high-throughput screening experiments, which, while accurate, are resource-intensive and limited in their scalability to evaluate large sets…

Idle · 31 · 1 year ago

The ibm/biomed.omics.bl.sm.ma-ted-458m model is a biomedical foundation model trained on over 2 billion biological samples across multiple modalities, including proteins, small molecules, and single-cell gene data. Designed for robust performance, it achieves state-of-the-art results over a variety…

Idle · 1.6K · 1 year ago