Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

145 of 5,893 resources

Showing 51–100

For a convenient overview and download list, visit our model page for this model.

Active · 61 · 3 months ago
Python


Active · 8 · 3 months ago
Python

Language: Multilingual

Active · 1.4K · 3 months ago
Python

Sahal Shaji Mullappilly, Mohammed Irfan K, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Muhammad Anwer, and Hisham Cholakkal

Active · 372 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing. It achieves 31% better perplexity than standard knowledge distillation at 3.8x compression.

Active · 73 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing. It achieves 54% better perplexity than standard knowledge distillation at 9.4x compression.

Active · 5 · 3 months ago
Python

A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation, a method that combines uncertainty-aware position weighting with calibration-aware label smoothing. It achieves 87% better perplexity than standard knowledge distillation at 20x compression.

Active · 14 · 3 months ago
Python
Active · 6.5K · 3 months ago
Python

From Inquiry to Decision: Building Trustworthy Medical AI

Active · 20 · 4 months ago
Python
Active · 45 · 4 months ago
Python

A Neural Machine Translation (NMT) model based on a custom Transformer (Encoder-Decoder) architecture, trained from scratch. This model is designed to translate English sentences into Hebrew using multilingual encoding and specialized layer configurations.

Active · 27 · 4 months ago
Python

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Active · 7.8K · 4 months ago
Python

PII Detection Model | 44M Parameters | Open Source

Active · 27K · 4 months ago
Python

The MediPhi Model Collection comprises seven small language models of 3.8B parameters each, built from the base model Phi-3.5-mini-instruct and specialized for the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e.…

Active · 2K · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 32 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. This model version was continually pretrained on ~14 million cancer transcriptomes…

Active · 16 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 31 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 17 · 5 months ago
Python

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Idle · 19.9K · 6 months ago
Python

Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only to facilitate reproducibility of the…

Idle · 21.4K · 6 months ago
Python

ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on roughly 2 million pairs (so far) of SELFIES strings (Krenn et al. 2020) for valid molecules taken from COCONUTDB (Sorokina et al. 2021) and ChEMBL34 (Zdrazil et al. 2023).

Idle · 114 · 7 months ago
Python

GitHub homepage: Cell2Sentence GitHub

Idle · 960 · 7 months ago
Python
Idle · 445.6K · 7 months ago
Python

This is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It is trained on 2.7M unique, valid molecules taken from COCONUTDB and ChEMBL34, from which 7.3M masked training examples were generated in total.

Idle · 6 · 8 months ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

Idle · 34 · 8 months ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

Idle · 583 · 8 months ago
Python

Website · 7B Model · 32B Model · MedEvalKit · Technical Report · Lingshu MCP

Idle · 4.1K · 8 months ago
Python

Chinese-language documentation

Idle · 77 · 9 months ago
Python

Neeto-1.0-8b is an openly released biomedical large language model (LLM) created by BYOL Academy to assist learners and practitioners with medical exam study, literature understanding, and structured clinical reasoning.

Idle · 7.7K · 9 months ago
Python

This is a ReactionT5 model pre-trained to predict the products of chemical reactions. A demo is available.

Idle · 2K · 9 months ago
Python

This repo contains the biomedicine MLLM developed from Qwen2.5-VL-3B-Instruct in our paper: On Domain-Adaptive Post-Training for Multimodal Large Language Models. The corresponding training dataset is in biomed-visual-instructions.

Idle · 152 · 9 months ago
Python

Specialized model for chemical entity recognition: identifies chemical compounds and substances in biomedical literature.

Idle · 71 · 10 months ago
Python

darkknight25/deepseek-16b-medical-GPT is a fine-tuned version of deepseek-ai/deepseek-l6b-moe-chat, optimized for medical question answering, reasoning, and clinical summarization using QLoRA and open-access healthcare datasets.

Idle · 0 · 10 months ago
Python

For a convenient overview and download list, visit our model page for this model.

Idle · 3.6K · 11 months ago
Python

For a convenient overview and download list, visit our model page for this model.

Idle · 428 · 11 months ago
Python

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Idle · 7.7K · 11 months ago
Python
Idle · 494.9K · 11 months ago
Python
Idle · 9.8K · 11 months ago
Python

Welcome to IBM's series of large foundation models for sustainable materials. Our models span a variety of representations and modalities, including SMILES, SELFIES, 3D atom positions, 3D density grids, molecular graphs, and other formats.

Idle · 190 · 11 months ago
Python
Idle · 2.3K · 11 months ago
Python
Idle · 738 · 11 months ago
Python
Idle · 729 · 11 months ago
Python
Idle · 6.2K · 11 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 23 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 12 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 7 · 12 months ago
Python

An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM2 Transformer architecture, pre-trained on UniRef50, and fine-tuned on the AmiGO dataset, this model predicts the GO subgraph for a particular protein sequence -…

Idle · 5 · 1 year ago
Python