Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

422 of 5,923 resources

Showing 150

This package contains core functions to process and analyze drug response data. The package provides tools for normalizing, averaging, and calculation of gDR metrics data. All core functions are wrapped into the pipeline function allowing analyzing the data in a straightforward way.

Active21 day ago
R
Artistic-2.0

Functions helpful for LIBD deconvolution project. Includes tools for marker finding with mean ratio, expression plotting, and plotting deconvolution results. Working to include DLPFC datasets.

Active102 days ago
R
Artistic-2.0

The package is a part of the gDR suite. It helps to prepare raw drug response data for downstream processing. It mainly contains helper functions for importing/loading/validating dose-response data provided in different file formats.

Active31 week ago
R

SimBu can be used to simulate bulk RNA-seq datasets with known cell type fractions. You can either use your own single-cell study for the simulation or the sfaira database. Different pre-defined simulation scenarios exist, as are options to run custom simulations. Additionally, expression values can be adapted by adding an mRNA bias, which produces more biologically relevant simulations.

Active191 week ago
R
GPL-3.0

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Active381 week ago
R
MIT

Package is a part of the gDR suite. It reexports functions from other packages in the gDR suite that contain critical processing functions and utilities. The vignette walks through the full processing pipeline for drug response analyses that the gDR suite offers.

Active22 weeks ago
R
Artistic-2.0

Implements supervised cell type-aware non-negative matrix factorization (NMF) for dimensional reduction in single-cell RNA sequencing analysis. The package provides methods for incorporating cell type information into the dimensionality reduction process, enabling improved visualization and downstream analysis of single-cell data while preserving biological structure. CellMentor employs a unique loss function that simultaneously minimizes variation within known cell populations while maximizing distinctions between different cell types, enabling effective transfer of learned patterns from labeled reference datasets to new unlabeled data.

Active192 weeks ago
R
Apache-2.0+

The AnVIL is a cloud computing resource developed in part by the National Human Genome Research Institute. The AnVILAz package supports end-users and developers using the AnVIL platform in the Azure cloud. The package provides a programmatic interface to AnVIL resources, including workspaces, notebooks, tables, and workflows. The package also provides utilities for managing resources, including copying files to and from Azure Blob Storage, and creating shared access signatures (SAS) for secure access to Azure resources.

Active03 weeks ago
R
Artistic-2.0

The package provides a set of functions to interact with the Google Cloud Platform (GCP) services on the AnVIL platform. The package is designed to use the API calls from the AnVIL package. It coordinates AnVIL workspace functionality with native GCP tools.

Active03 weeks ago
R
Artistic-2.0

A package that allows interactive exploration of AnnotationHub and ExperimentHub resources. It uses DT / DataTable to display resources for multiple organisms. It provides template code for reproducibility and for downloading resources via the indicated Hub package.

Active03 weeks ago
R
Artistic-2.0

R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots. The package is still under development.

Active63 weeks ago
R
MIT

SpatialArtifacts provides a data-driven two-step workflow to identify, classify, and handle spatial artifacts in spatial transcriptomics data. The package combines median absolute deviation (MAD)-based outlier detection with morphological image processing (fill, outline, and star patterns) to detect edge and interior artifacts. It supports multiple platforms including 10x Genomics Visium (standard and HD), allowing for consistent quality control across different spatial resolutions.

Active43 weeks ago
R

iSEEtree is an extension of iSEE for the TreeSummarizedExperiment data container. It provides interactive panel designs to explore hierarchical datasets, such as the microbiome and cell lines.

Active33 weeks ago
R
Artistic-2.0

This package contains utility functions used throughout the gDR platform to fit data, manipulate data, and convert and validate data structures. This package also has the necessary default constants for gDR platform. Many of the functions are utilized by the gDRcore package.

Active23 weeks ago
R
Artistic-2.0

DeeDeeExperiment is an S4 class extending the SingleCellExperiment class, designed to integrate and manage omics analysis results. It introduces two dedicated slots to store Differential Expression Analysis (DEA) results and Functional Enrichment Analysis (FEA) results, providing a structured approach for downstream analysis.

Active03 weeks ago
R
MIT

Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.

Active104 weeks ago
R
Apache-2.0+

lefser is the R implementation of the popular microbiome biomarker discovery too, LEfSe. It uses the Kruskal-Wallis test, Wilcoxon-Rank Sum test, and Linear Discriminant Analysis to find biomarkers from two-level classes (and optional sub-classes).

Active664 weeks ago
R
Artistic-2.0

A comprehensive toolkit that bridges popular Python-based immune repertoire analysis tools and Hugging Face protein language models into the R environment. Provides unified interfaces for TCR distance calculations (tcrdist3), sequence generation probability (OLGA), selection inference (soNNia), clustering (clusTCR), protein embeddings (ESM-2), metaclone discovery (metaclonotypist). Fully compatible with the scRepertoire and immApex ecosystem for single-cell immune repertoire analysis.

Active24 weeks ago
R
MIT

The scECODA R package provides a complete workflow for the analysis and visualization of compositional data, primarily focusing on cell type proportions derived from single-cell data. It implements specialized methods, such as the Centered Log-Ratio (CLR) transformation, to properly analyze proportional data while avoiding the biases introduced by the compositional constraint. The package encapsulates data management, transformation, and analysis into a single SummarizedExperiment object, offering downstream tools for dimensionality reduction via PCA, calculating critical metrics like the Adjusted Rand Index (ARI) and Modularity to quantify sample grouping quality, and generating high-quality visualizations like heatmaps and scatter plots.

Active81 month ago
R
GPL-3.0

'HuBMAP' provides an open, global bio-molecular atlas of the human body at the cellular level. The `datasets()`, `samples()`, `donors()`, `publications()`, and `collections()` functions retrieves the information for each of these entity types. `*_details()` are available for individual entries of each entity type. `*_derived()` are available for retrieving derived datasets or samples for individual entries of each entity type. Data files can be accessed using `bulk_data_transfer()`.

Active31 month ago
R
Artistic-2.0

A much faster analytical implementation of chromVAR, with additional features, used to infer TF activity from (bulk or single-cell) ATAC-seq data and motif annotations (or binding probabilities). The package also includes the CVnorm normalization method based on the chromVAR logic.

Active11 month ago
R
GPL-3.0+

A set of tools to for machine and deep learning in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Active141 month ago
R
MIT

Differential expression analysis is commonly used to study diverse biological datasets. The reproducibility-optimized test statistic (ROTS) (Elo et al., 2008, <doi:10.1109/tcbb.2007.1078>) uses a modified t-statistic to prioritise features that differ between two or more groups. However, the ROTS Bioconductor implementation (Suomi et al., 2017, <doi:10.1371/journal.pcbi.1005562>) did not accommodate technical or biological covariates. LimROTS (Anwar et al., 2025, <doi:10.1093/bioinformatics/btaf570>) addressed this limitation by combining a reproducibility-optimized test statistic with the limma empirical Bayes approach (Ritchie et al., 2015, <doi:10.1093/nar/gkv007>). This enables the analysis of more complex experimental designs and the incorporation of covariates.

Active41 month ago
R
GPL-2.0+

"LipidTrend" is an R package that implements a permutation-based statistical test to identify significant differences in lipidomic features between groups. The test incorporates Gaussian kernel smoothing of region statistics to improve stability and accuracy, particularly when dealing with small sample sizes. This package also includes two plotting functions for visualizing significant tendencies in 1D and 2D feature data, respectively.

Active01 month ago
R
MIT

High-performance interval overlap and join operations for 'IRanges' and 'GenomicRanges'. The package provides deterministic multithreaded overlap computation, reusable subject indexes for repeated queries, and join helpers that keep range metadata in a consistent output grammar.

Active21 month ago
R
Artistic-2.0

This package is a wrapper of Integrative Genomics Viewer (IGV). It comprises an htmlwidget version of IGV. It can be used as a module in Shiny apps.

Active381 month ago
R
MIT

Provides a structured S4 approach to importing data files from the 10X pipelines. It mainly supports Single Cell Multiome ATAC + Gene Expression data among other data types. The main Bioconductor data representations used are SingleCellExperiment and RaggedExperiment.

Active11 month ago
R

A client for BEDbase. bedbaser provides access to the API at api.bedbase.org. It also includes convenience functions to import BED files into GRanges objects and BEDsets into GRangesLists.

Active31 month ago
R
Artistic-2.0

Leverage the existing open access TCGA data on Terra with well-established Bioconductor infrastructure. Make use of the Terra data model without learning its complexities. With a few functions, you can copy / download and generate a MultiAssayExperiment from the TCGA example workspaces provided by Terra.

Active01 month ago
R
Artistic-2.0

Parse GFF and GTF files using C++ classes. The package also provides utilities to read and write GFF3 files. The GFF (General Feature Format) format is a tab-delimited file format for describing genes and other features of DNA, RNA, and protein sequences. GFF files are often used to describe the features of genomes.

Active01 month ago
R
Artistic-2.0

Provides generic functions for interacting with the AnVIL ecosystem. Packages that use either GCP or Azure in AnVIL are built on top of AnVILBase. Extension packages will provide methods for interacting with other cloud providers.

Active01 month ago
R
Artistic-2.0

This package provides an interactive Shiny dashboard for Bioconductor package maintainers. It visualizes various package statuses, metadata, and development metrics, offering insights into package health and activity. This tool aims to support maintainers of multiple packages by filtering packages via maintainer email.

Active01 month ago
R
Artistic-2.0

Estimate networks from the precision matrix of compositional microbial abundance data.

Active2301 month ago
R
GPL-3.0+

The European Genome-phenome Archive (EGA) provides long-term storage and controlled sharing of personally identifiable genetic data. The Rega package offers a streamlined and extensible R interface to the EGA API, facilitating the programmatic upload of metadata. GEO-like Excel submission template is provided as a default method of organizing submission metadata.

Active01 month ago
R
Artistic-2.0

The ASURI (Analysis of SUrvival and patients RIsk prediction based on gene signatures) package discovers marker genes that are related to risk prediction capabilities and to a clinical variable of interest. It uses two main steps, including subsampling glmnet and unicox. The package implements robust functions to discover survival markers related to a clinical phenotype and to predict a risk score, allowing to study the patient's risk based on the gene signatures. Several plots are provided to visualise the relevance of the genes, the risk score, and patient stratification, as well as a robust version of the Kaplan-Meier curves.

Active01 month ago
R
LGPL-3.0

Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames.

Active71 month ago
R
Artistic-2.0

Takes as input an incomplete perturbation profile and differential gene expression in log odds and infers unobserved perturbations and augments observed ones. The inference is done by iteratively inferring a network from the perturbations and inferring perturbations from the network. The network inference is done by Nested Effects Models.

Active21 month ago
R
GPL-3.0

This package provides functions for handling and analyzing immune receptor repertoire data, such as produced by the CellRanger V(D)J pipeline. This includes reading the data into R, merging it with paired single-cell data, quantifying clonotype abundances, calculating diversity metrics, and producing common plots. It implements the E-M Algorithm for clonotype assignment, along with other methods, which makes use of ambiguous cells for improved quantification.

Active131 month ago
R
Artistic-2.0

ClonalSim generates realistic mutational profiles of tumor samples with hierarchical clonal structure. It simulates founder, shared, and private mutations with biologically realistic noise models including intra-tumor heterogeneity (Beta distribution) and technical sequencing noise (negative binomial depth variation, binomial read sampling, base errors). The package is designed for benchmarking variant callers, testing clonal deconvolution algorithms, and teaching tumor heterogeneity concepts.

Active11 month ago
R
MIT

Hilbert curve is a type of space-filling curves that fold one dimensional axis into a two dimensional space, but with still preserves the locality. This package aims to provide an easy and flexible way to visualize data through Hilbert curve.

Active441 month ago
R
MIT

This package provides functions to browse the harmonized metadata for large omics databases. This package also supports data navigation if the metadata incorporates ontology.

Active21 month ago
R
Artistic-2.0

Provides hurdle negative binomial models for differential expression analysis with long-read RNA-Seq data.

Active02 months ago
R
MIT

Lightweight Expression displaYer (plotter / viewer) of SummarizedExperiment object in R. This package provides a quick and easy Shiny-based GUI to empower a user to use a SummarizedExperiment object to view

Active02 months ago
R
CC0-1.0

Implementation of the Ibex algorithm for single-cell embedding based on BCR sequences. The package includes a standalone function to encode BCR sequence information by amino acid properties or sequence order using tensorflow-based autoencoder. In addition, the package interacts with SingleCellExperiment or Seurat data objects.

Active272 months ago
R
MIT

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Active1.5K2 months ago
R
MIT

RNA abundance and cell size parameters could improve RNA-seq deconvolution algorithms to more accurately estimate cell type proportions given the different cell type transcription activity levels. A Total RNA Expression Gene (TREG) can facilitate estimating total RNA content using single molecule fluorescent in situ hybridization (smFISH). We developed a data-driven approach using a measure of expression invariance to find candidate TREGs in postmortem human brain single nucleus RNA-seq. This R package implements the method for identifying candidate TREGs from snRNA-seq data.

Active52 months ago
R
Artistic-2.0

The qsvaR package contains functions for removing the effect of degration in rna-seq data from postmortem brain tissue. The package is equipped to help users generate principal components associated with degradation. The components can be used in differential expression analysis to remove the effects of degradation.

Active02 months ago
R
Artistic-2.0

Builds prediction interval for cell type annotation using conformal inference and conformal risk control. It provides two main methods. The first one gives prediction intervals with coverage guarantees based on standard conformal inference. The second one instead gives hierarchical prediction intervals that are consistent with the cell ontology.

Active72 months ago
R
Artistic-2.0

cytofQC is a package for initial cleaning of CyTOF data. It uses a semi-supervised approach for labeling cells with their most likely data type (bead, doublet, debris, dead) and the probability that they belong to each label type. This package does not remove data from the dataset, but provides labels and information to aid the data user in cleaning their data. Our algorithm is able to distinguish between doublets and large cells.

Active22 months ago
R
Artistic-2.0

This package implements algorithms and data structures for performing gene expression signature (GES) searches, and subsequently interpreting the results functionally with specialized enrichment methods.

Active232 months ago
R
Artistic-2.0