Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

303 of 5,923 resources

Showing 151–200

Active · 45 · 4 months ago
Python

A Neural Machine Translation (NMT) model based on a custom Transformer (Encoder-Decoder) architecture, trained from scratch. This model is designed to translate English sentences into Hebrew using multilingual encoding and specialized layer configurations.

Active · 18 · 4 months ago
Python
Active · 0 · 4 months ago

While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical…

Active · 73 · 5 months ago

ChemDFM-v2.0 is the latest non-thinking model of ChemDFM, the pioneering open-sourced dialogue foundation model for Chemistry and molecule science.

Active · 964 · 5 months ago

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Active · 7.8K · 5 months ago
Python

PII Detection Model | 44M Parameters | Open Source

Active · 27K · 5 months ago
Python

The MediPhi Model Collection comprises 7 small language models of 3.8B parameters from the base model Phi-3.5-mini-instruct specialized in the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e.

Active · 2K · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 32 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. This model version was continually pretrained on ~14 million cancer transcriptomes…

Active · 16 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Active · 31 · 5 months ago
Python

Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology.

Idle · 17 · 6 months ago
Python

PlantBiMoE is a DNA language model trained on the genomes of 42 representative plant species. More specifically, PlantBiMoE uses the BiMamba and SparseMoE architecture with a masked language modeling objective to leverage highly available genotype data from 42 different plant species to…

Idle · 10 · 6 months ago
Idle · 284 · 6 months ago

This is a merge of pre-trained language models created using mergekit.

Idle · 41 · 6 months ago

MACE-MH-1 is a foundation machine-learning interatomic potential (MLIP) that bridges molecular, surface, and materials chemistry through cross-domain learning:

Idle · 0 · 6 months ago

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Idle · 18.6K · 6 months ago
Python

Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only to facilitate reproducibility of the…

Idle · 21.4K · 6 months ago
Python

ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on (for now) around 2 million pairs of valid molecules' SELFIES (Krenn et al. 2020) taken from COCONUTDB (Sorokina et al. 2021) and ChemBL34 (Zdrazil et al. 2023).

Idle · 117 · 7 months ago
Python

GitHub homepage: Cell2Sentence GitHub

Idle · 960 · 7 months ago
Python
Idle · 226.2K · 7 months ago
Python

Tahoe-x1 is a family of perturbation-trained single-cell foundation models with up to 3 billion parameters, developed by Tahoe Therapeutics. Pretrained on 266 million single-cell transcriptomic profiles including the Tahoe-100M perturbation compendium, Tahoe-x1 achieves state-of-the-art performance…

Idle · 40 · 7 months ago

GeneJEPA is a Joint-Embedding Predictive Architecture (JEPA) trained for self-supervised representation learning on scRNA-seq. It uses a Perceiver-style encoder to handle sparse, high-dimensional gene count vectors and a Fourier-feature tokenizer for numerical tokenization.

Idle · 0 · 7 months ago

This repo contains the weights of Biomni-R0-32B-Preview, a research preview of the series of biomedical AI agents trained by the Biomni team.

Idle · 382 · 8 months ago

InstaNovoPlus is a diffusion-based model for de novo peptide sequencing from mass spectrometry data. This model leverages multinomial diffusion for accurate, database-free peptide identification for large-scale proteomics experiments.

Idle · 5 · 8 months ago

This model is a lightweight model pre-trained on SELFIES (Self-Referencing Embedded Strings) representations of molecules. It is trained on 2.7M unique and valid molecules taken from COCONUTDB and ChemBL34, with 7.3M total generated masked examples.

Idle · 5 · 8 months ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

Idle · 34 · 8 months ago
Python

Note: This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized model. For instructions on how to install TransformerEngine, please refer to the official documentation.

Idle · 583 · 8 months ago
Python

Website    🤖 7B Model    🤖 32B Model    MedEvalKit    Technical Report    Lingshu MCP

Idle · 4.1K · 8 months ago
Python

Evo 2 is a state-of-the-art DNA language model for long-context modeling and design. Evo 2 models DNA sequences at single-nucleotide resolution at up to 1 million base pair context length with the StripedHyena 2 architecture, using Savanna.

Idle · 0 · 9 months ago

Chinese-language documentation

Idle · 77 · 9 months ago
Python

Neeto-1.0-8b is an openly released biomedical large language model (LLM) created by BYOL Academy to assist learners and practitioners with medical exam study, literature understanding, and structured clinical reasoning.

Idle · 7.7K · 9 months ago
Python
Idle · 51 · 9 months ago
Python

This repository contains the official model of the paper A Unified Predictive and Generative Solution for Liquid Electrolyte Formulation.

Idle · 0 · 9 months ago

This is a ReactionT5 model pre-trained to predict the products of reactions. You can use the demo here.

Idle · 2K · 9 months ago
Python

This repo contains the biomedicine MLLM developed from Qwen2.5-VL-3B-Instruct in our paper: On Domain-Adaptive Post-Training for Multimodal Large Language Models. The corresponding training dataset is in biomed-visual-instructions.

Idle · 157 · 9 months ago
Python

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Idle · 71 · 10 months ago
Python

JAMUN is a novel approach for generating conformational ensembles of protein structures, presented in the paper JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles.

Idle · 0 · 10 months ago

darkknight25/deepseek-16b-medical-GPT is a fine-tuned version of deepseek-ai/deepseek-l6b-moe-chat, optimized for medical question answering, reasoning, and clinical summarization using QLoRA and open-access healthcare datasets.

Idle · 0 · 11 months ago
Python

Using llama.cpp release b5868 for quantization.

Idle · 4.1K · 11 months ago

For a convenient overview and download list, visit our model page for this model.

Idle · 3.6K · 11 months ago
Python

For a convenient overview and download list, visit our model page for this model.

Idle · 428 · 11 months ago
Python

Segment Anything in 3D Medical Images and Videos

Idle · 103 · 11 months ago

A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.

Idle · 14 · 11 months ago

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Idle · 7.7K · 11 months ago
Python
Idle · 496K · 11 months ago
Python
Idle · 9.8K · 11 months ago
Python