Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
Filters
Health
Domain
Language
License
Source
Type
5,923 resources indexed
Showing 601–650
Open-source platform for building, extending, and experimenting with scientific agents, providing modular agent construction tools and standardized evaluation pipelines for accelerating autonomous scientific discovery research (748+ stars, MIT License)
StanfordShahLab/motor-t-base
by StanfordShahLabStanfordShahLab/clmbr-t-base
by StanfordShahLabmradermacher/Prototype-Virus-1B-GGUF
by mradermacherFor a convenient overview and download list, visit our model page for this model.
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Mass spectrometry (MS) data backend supporting import and export of MS/MS spectra data from Mascot Generic Format (mgf) files. Objects defined in this package are supposed to be used with the Spectra Bioconductor package. This package thus adds mgf file support to the Spectra package.
Mass spectrometry (MS) data backend supporting import and handling of MS/MS spectra from NIST MSP Format (msp) files. Import of data from files with different MSP *flavours* is supported. Objects from this package add support for MSP files to Bioconductor's Spectra package. This package is thus not supposed to be used without the Spectra package that provides a complete infrastructure for MS data handling.
UmbrellaInc/Prototype-Virus-1B
by UmbrellaInc!image/png
Deep learning library for Chemistry based on Tensorflow
thelamapi/next-ocr
by thelamapi![Language: Multilingual]()
CondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
ScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-large_intestine-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-heart-stereoscope
by scvi-toolsStereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-heart-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-heart-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-heart-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-fat-stereoscope
by scvi-toolsStereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-fat-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-fat-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-fat-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-eye-stereoscope
by scvi-toolsStereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-eye-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-eye-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-eye-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
Stereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-bone_marrow-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-bone_marrow-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-bone_marrow-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-blood-stereoscope
by scvi-toolsStereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-blood-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-blood-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-blood-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
scvi-tools/tabula-sapiens-bladder-stereoscope
by scvi-toolsStereoscope is a variational inference model for single-cell RNA-seq data that can learn a cell-type specific rate of gene expression. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in Stereoscope.
scvi-tools/tabula-sapiens-bladder-condscvi
by scvi-toolsCondSCVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space. The predictions of the model are meant to be afterward used for deconvolution of a second spatial transcriptomics dataset in DestVI.
scvi-tools/tabula-sapiens-bladder-scanvi
by scvi-toolsScANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. In addition, to scVI, ScANVI is a semi-supervised model that can leverage labeled data to learn a cell-type classifier in the latent space…
scvi-tools/tabula-sapiens-bladder-scvi
by scvi-toolsScVI is a variational inference model for single-cell RNA-seq data that can learn an underlying latent space, integrate technical batches and impute dropouts. The learned low-dimensional latent representation of the data can be used for visualization and clustering.
Deep learning library for solving PDEs
Genome mapping and spliced alignment of cDNA or amino acid sequences
arcinstitute/evo2_20b
by arcinstituteEvo 2 is a state-of-the-art DNA language model trained autoregressively on trillions of DNA tokens.
With the DataPLANT biology ontology (DPBO), DataPLANT provides an intermediate ontology that acts as a broker and bridge between the individual researcher/domain experts and main ontology providers. DPBO enables easy and agile collection of missing vocabulary as well as relationships between terms for (meta)data annotation using DataPLANT’s Swate tool.
This ontology integrates cell type markers for cells in the Cell Ontology from various sources along with details of marker context (anatomical context, assay), confidence (where available) and provenance. [from repository]
Sahal Shaji Mullappilly\, Mohammed Irfan K\, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Muhammad Anwer, and Hisham Cholakkal
First benchmark evaluating LLMs' ability to rediscover scientific laws through interactive experimentation across 324 tasks in 12 physics domains, featuring memorization-resistant metaphysical shifts of canonical laws (HKUST)
a Bayesian normalization procedure derived from first principles. Sanity estimates expression values and associated error bars directly from raw unique molecular identifier (UMI) counts without any tunable parameters.
GenBio AI's software stack for the AI-Driven Digital Organism, supporting adaptation and finetuning of multiscale biological foundation models across DNA, RNA, protein, structure, and single-cell tasks with reproducible CLIs and pretrained model zoo (2025)
A compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation---a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 31% better perplexity than standard knowledge distillation at 3.8x compression.
littleworth/protgpt2-distilled-small
by littleworthA compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation---a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 54% better perplexity than standard knowledge distillation at 9.4x compression.
littleworth/protgpt2-distilled-tiny
by littleworthA compact protein language model distilled from ProtGPT2 using complementary-regularizer distillation---a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.