Find open-source science resources

The Zarr specification defines a format for chunked, compressed, N-dimensional arrays. Its design allows efficient access to subsets of the stored array, and supports both local and cloud storage systems. Rarr aims to implement this specification in R with minimal reliance on external tools or libraries.

Active · 54 · 1 week ago
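To illustrate why a chunked layout allows efficient access to subsets, here is a minimal sketch in Python (not Rarr's code and not the Zarr API). It assumes a regular chunk grid and computes which chunks a rectangular region overlaps; only those chunks would need to be read and decompressed. The function name `chunks_for_slice` and the example shapes are hypothetical.

```python
from itertools import product


def chunks_for_slice(starts, stops, chunk_shape):
    """Return grid indices of every chunk overlapping the half-open region [starts, stops).

    Assumes a regular chunk grid anchored at the array origin (an illustrative
    assumption, not a statement about any particular Zarr implementation).
    """
    ranges = []
    for start, stop, size in zip(starts, stops, chunk_shape):
        if stop <= start:
            return []  # empty selection touches no chunks
        first = start // size        # chunk holding the first selected element
        last = (stop - 1) // size    # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Rows 5..24 and columns 0..9 of an array stored in 10 x 10 chunks.
touched = chunks_for_slice(starts=(5, 0), stops=(25, 10), chunk_shape=(10, 10))
print(touched)
```

In this example only three chunks are touched, regardless of how large the full array is.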

epiregulon

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expression.

Active · 28 · 1 week ago
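The idea of linking a regulatory element to a target gene by correlation can be sketched as follows. This is a toy Python illustration, not epiregulon's implementation: it assumes a plain Pearson correlation across cells, and the vectors `accessibility` and `expression` are made-up values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical per-cell values for one peak and one candidate target gene.
accessibility = [0.0, 1.0, 2.0, 3.0]
expression = [1.0, 3.0, 5.0, 7.0]
r = pearson(accessibility, expression)
print(r)
```

A high correlation across cells would support a link between the element and the gene; any threshold for calling a link is a separate choice not shown here.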

igvShiny

This package is a wrapper for the Integrative Genomics Viewer (IGV). It comprises an htmlwidget version of IGV and can be used as a module in Shiny apps.

Active · 38 · 1 week ago

scToppR

Pathways

scToppR provides an easy-to-use API wrapper for the ToppGene web platform, used for gene ontology and functional enrichment research. The package also integrates visualization tools, making it a convenient tool directly connecting ToppGene to code-based workflows in R. The tool can also easily save results into different formats.

Active · 7 · 1 week ago

GlycoDash

Biochemistry

GlycoDash is an R Shiny dashboard for processing glycomics data obtained from LaCyTools, SweetSuite and Skyline.

Active · 2 · 2 weeks ago

RiboCrypt

R package for interactive visualization and browsing of NGS data. It contains a browser for both transcript and genomic coordinate views. In addition, QC and general metaplots are included, among them differential translation plots and gene expression plots. The package is still under development.

Active · 6 · 2 weeks ago

ORFik

ImmunoOncology

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package can reassign transcript start sites using CAGE-Seq data, automatically shift Ribo-Seq reads, find Open Reading Frames across whole genomes, and much more.

Active · 38 · 2 weeks ago

anndataR

Bring the power and flexibility of AnnData to the R ecosystem, allowing you to effortlessly manipulate and analyse your single-cell data. This package lets you work with backed h5ad and zarr files, directly access various slots (e.g. X, obs, var), or convert the data into SingleCellExperiment and Seurat objects.

Active · 188 · 2 weeks ago

STADyUM

StatisticalMethod

STADyUM is a package with functionality for analyzing nascent RNA read counts to infer transcription rates. This includes utilities for processing experimental nascent RNA read counts as well as for simulating PRO-seq data. Rates such as initiation, pause release and landing pad occupancy are estimated from either synthetic or experimental data. There are also options for varying pause sites and including steric hindrance of initiation in the model.

Active · 1 · 2 weeks ago

DenoIST

DenoIST identifies and removes contamination in Image-based Spatial Transcriptomics data, using a transposed Poisson mixture model with local neighbourhood offsets to infer genes that are likely to be due to neighbourhood contamination rather than endogenous expression.

Active · 9 · 3 weeks ago

MeLSI

MeLSI (Metric Learning for Statistical Inference) is a novel machine learning method for microbiome data analysis that learns optimal distance metrics to improve statistical power in detecting group differences. Unlike traditional distance metrics (Bray-Curtis, Euclidean, Jaccard), MeLSI adapts to the specific characteristics of your dataset to maximize separation between groups. The method uses an ensemble of weak learners to identify which microbial features drive group differences, providing both improved statistical power and biological interpretability through feature importance weights.

Active · 1 · 3 weeks ago

scifer

Preprocessing

Have you ever index-sorted cells into a 96- or 384-well plate and then sequenced them using Sanger sequencing? If so, you probably struggled either to manually check the electropherogram of each sequenced cell or to identify which cell was sorted where after sequencing the plate. scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, 'fasta' files, and electropherograms for visual inspection, and generate reports.

Active · 7 · 3 weeks ago

sparrow

GeneSetEnrichment

Provides a unified interface to a variety of GSEA techniques from different Bioconductor packages. Results are harmonized into a single object and can be interrogated uniformly for quick exploration and interpretation of results. Interactive exploration of GSEA results is enabled through a Shiny app provided by the sibling package sparrow.shiny.

Active · 23 · 3 weeks ago

CrcBiomeScreen

A reproducible machine learning framework for microbiome-based colorectal cancer (CRC) screening, developed and benchmarked by systematically evaluating normalization strategies, taxonomic resolutions, and class imbalance handling. This R package allows users to apply the full pipeline or selectively run specific components depending on their analytical needs. It establishes a scalable foundation for developing interpretable microbiome-based screening tools to support early CRC detection. This approach could be easily implemented in a national screening programme to improve early detection rates for this disease.

Active · 0 · 3 weeks ago

GEOquery

Microarray

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply Bioconductor tools to these data. GEOquery is the bridge between GEO and Bioconductor.

Active · 115 · 3 weeks ago

cTRAP

DifferentialExpression

Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.

Active · 8 · 3 weeks ago

epiregulon.extra

GeneRegulation

Active · 0 · 3 weeks ago

plotgardener

Visualization

Coordinate-based genomic visualization package for R. It grants users the ability to programmatically produce complex, multi-paneled figures. Tailored for genomics, plotgardener allows users to visualize large complex genomic datasets and provides exquisite control over how plots are placed and arranged on a page.

Active · 358 · 4 weeks ago

ReactomeGSA

GeneSetEnrichment

The ReactomeGSA package uses Reactome's online analysis service to perform a multi-omics gene set analysis. The main advantage of this package is that the retrieved results can be visualized using Reactome's powerful web application. Since Reactome's analysis service also uses R to perform the actual gene set analysis, you will get similar results when using the same packages (such as limma and edgeR) locally. Therefore, if you only require a gene set analysis, other packages may be better suited.

Active · 33 · 4 weeks ago

fgsea

GeneExpression

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it feasible to run more permutations and obtain more fine-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used.

Active · 445 · 1 month ago

sfi

MassSpectrometry

Data analysis for Single File Injections (SFIs) mode LC-MS analysis. In SFIs mode, pooled samples are initially injected to serve as reference peaks for subsequent analyses. Repeated injections of individual samples are then performed at fixed time intervals using isocratic elution. This package provides functions to analyze data from SFIs mode, including peak picking and peak reassignment.

Active · 1 · 1 month ago

SpaceTrooper

SpaceTrooper performs Quality Control analysis of image-based spatial data using data-driven GLM models, providing exploration plots, QC metrics computation, and outlier detection. It implements a GLM strategy for the detection of low-quality cells in imaging-based spatial data (Transcriptomics and Proteomics). It additionally implements several plots for the visualization of imaging-based polygons through the ggplot2 package.

Active · 11 · 1 month ago

rhinotypeR

Sequencing

"rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies.

Active · 4 · 1 month ago

crisprDesign

CRISPR

Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-targeting nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.

Active · 31 · 1 month ago

blockclust

Cheminformatics

Galaxy workflow for BlockClust pipeline.

Active · 123 · 1 month ago

pairedGSEA

DifferentialExpression

pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differential Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investigation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionalities for plotting the results.

Active · 4 · 1 month ago

PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data, such as sequence or structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features, such as gene age estimation and core gene identification.

Active · 38 · 1 month ago

psichomics

Sequencing

Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.

Active · 37 · 1 month ago

ImageArray

ImageArray provides a framework for on-disk and in-memory image arrays, specifically for pyramidal images stored in HDF5, Zarr and life sciences image file formats (OME Bio-Formats).

Active · 6 · 1 month ago

SpaceMarkers

Spatial transcriptomic technologies have helped to resolve the connection between gene expression and the 2D orientation of tissues relative to each other. However, the limited single-cell resolution makes it difficult to highlight the most important molecular interactions in these tissues. SpaceMarkers, R/Bioconductor software, can help to find molecular interactions, by identifying genes associated with latent space interactions in spatial transcriptomics.

Active · 8 · 1 month ago

gVenn

Tools to compute and visualize overlaps between gene sets or genomic regions. Venn diagrams with proportional areas are provided, while UpSet plots are recommended for larger numbers of sets. The package supports GRanges and GRangesList inputs, and integrates with analysis workflows for ChIP-seq, ATAC-seq, and other genomic interval data. It generates clean, interpretable, and publication-ready figures.

Active · 2 · 1 month ago

notameStats

BiomedicalInformatics

Provides univariate and multivariate statistics for feature prioritization in untargeted LC-MS metabolomics research.

Active · 0 · 1 month ago

assorthead

Vendors an assortment of useful header-only C++ libraries. Bioconductor packages can use these libraries in their own C++ code by LinkingTo this package without introducing any additional dependencies. The use of a central repository avoids duplicate vendoring of libraries across multiple R packages, and enables better coordination of version updates across cohorts of interdependent C++ libraries.

Active · 1 · 1 month ago

iscream

DataImport

BED files store ranged genomic data that can be queried even when the files are compressed. iscream can query data from BED files and return them in multiple formats: parsed records or their summary statistics as data frames or GenomicRanges objects, and matrices as matrix, GenomicRanges, or SummarizedExperiment objects. iscream also provides specialized support for importing methylation data.

Active · 0 · 1 month ago

BiocPkgTools

Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.

Active · 22 · 1 month ago

tripr

BatchEffect

TRIP is a software framework that provides analytics services on antigen receptor (B cell receptor immunoglobulin, BcR IG | T cell receptor, TR) gene sequence data. It is a web application written in R Shiny. It takes as input the output files of the IMGT/HighV-Quest tool. Users can select to analyze the data from each of the input samples separately, or the combined data files from all samples and visualize the results accordingly.

Active · 3 · 1 month ago

StatescopeR

GeneExpression

StatescopeR is an R wrapper around Statescope, a computational framework designed to discover cell states from cell type-specific gene expression profiles inferred from bulk RNA profiles.

Active · 0 · 1 month ago

splicelogic

AlternativeSplicing

Translate differential transcript usage results into discrete splice events.

Active · 1 · 1 month ago

DeeDeeExperiment

DeeDeeExperiment is an S4 class extending the SingleCellExperiment class, designed to integrate and manage omics analysis results. It introduces two dedicated slots to store Differential Expression Analysis (DEA) results and Functional Enrichment Analysis (FEA) results, providing a structured approach for downstream analysis.

Active · 0 · 1 month ago

scrapper

Normalization

Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection. Additional wrappers are provided for easy construction of end-to-end workflows involving Bioconductor objects like SingleCellExperiments.

Active · 8 · 1 month ago

smoothclust

Spatial

Method for identification of spatial domains and spatially-aware clustering in spatial transcriptomics data. The method generates spatial domains with smooth boundaries by smoothing gene expression profiles across neighboring spatial locations, followed by unsupervised clustering. Spatial domains consisting of consistent mixtures of cell types may then be further investigated by applying cell type compositional analyses or differential analyses.

Active · 1 · 1 month ago

sketchR

Provides an R interface for various subsampling algorithms implemented in Python packages. Currently, interfaces to the geosketch and scSampler Python packages are implemented. It also provides diagnostic plots to evaluate the subsampling.

Active · 3 · 1 month ago

methylclock

DNAMethylation

This package estimates chronological and gestational DNA methylation (DNAm) age, as well as biological age, using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.

Active · 53 · 1 month ago

SEMPLR

MotifAnnotation

SEMPLR computes transcription factor binding affinity scores for genomic positions and genetic variants. Scores are computed from SNP Effect Matrices (SEMs) produced by SEMpl. 223 pre-computed SEMs are included with the package or custom sets can be provided. Enrichment can be tested among sets of genomic positions to determine if transcription factor binding events occur more often than expected. Comparing binding affinity scores between alleles can reveal differences in transcription factor binding with genetic variation. This package also includes several visualization functions to view scores both on the motif and variant/position level.

Active · 1 · 1 month ago

immLynx

A comprehensive toolkit that brings popular Python-based immune repertoire analysis tools and Hugging Face protein language models into the R environment. Provides unified interfaces for TCR distance calculations (tcrdist3), sequence generation probability (OLGA), selection inference (soNNia), clustering (clusTCR), protein embeddings (ESM-2), and metaclone discovery (metaclonotypist). Fully compatible with the scRepertoire and immApex ecosystem for single-cell immune repertoire analysis.

Active · 2 · 1 month ago

DelayedMatrixStats

Infrastructure

A port of the 'matrixStats' API for use with DelayedMatrix objects from the 'DelayedArray' package. High-performing functions operating on rows and columns of DelayedMatrix objects, e.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). Functions are optimized per data type and for subsetted calculations such that both memory usage and processing time are minimized.

Active · 15 · 2 months ago

immApex

A set of tools for machine and deep learning in R from amino acid and nucleotide sequences, focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Active · 14 · 2 months ago

scLang

scLang is a suite for package development for scRNA-seq analysis. It offers functions that can operate on both Seurat and SingleCellExperiment objects. These functions are primarily aimed to help developers build tools compatible with both types of input.

Active · 2 · 2 months ago

CSOA

Cell Set Overlap Analysis (CSOA) is a tool for calculating per-cell gene signature scores in an scRNA-seq dataset. CSOA constructs a set for each gene in the signature, consisting of the cells that highly express the gene. Next, all overlaps of pairs of cell sets are computed, ranked, filtered and scored. The CSOA per-cell score is calculated by summing up all products of the overlap scores and the min-max-normalized expression of the two involved genes. CSOA can run on a Seurat object, a SingleCellExperiment object, a matrix and a dgCMatrix.

Active · 1 · 2 months ago
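The scoring idea described for CSOA can be sketched roughly as follows. This is a simplified Python illustration, not the package's implementation. It makes several assumptions the description does not specify: "highly express" means normalized expression at or above a fixed threshold, the overlap score is the Jaccard index of two cell sets, the ranking and filtering steps are skipped, and the product is summed for every cell. The function name `csoa_like_scores` is hypothetical.

```python
from itertools import combinations


def min_max(values):
    """Min-max normalize a list of numbers to [0, 1]; constant input maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def csoa_like_scores(expr, threshold=0.5):
    """Per-cell signature scores from pairwise cell-set overlaps.

    expr maps gene name -> list of expression values, one per cell.
    threshold (an assumption) defines which cells "highly express" a gene.
    """
    norm = {gene: min_max(values) for gene, values in expr.items()}
    # One cell set per gene: indices of cells with high normalized expression.
    cell_sets = {
        gene: {i for i, v in enumerate(values) if v >= threshold}
        for gene, values in norm.items()
    }
    n_cells = len(next(iter(expr.values())))
    scores = [0.0] * n_cells
    for g, h in combinations(sorted(expr), 2):
        union = cell_sets[g] | cell_sets[h]
        # Jaccard index as a stand-in overlap score (assumption).
        overlap = len(cell_sets[g] & cell_sets[h]) / len(union) if union else 0.0
        for cell in range(n_cells):
            scores[cell] += overlap * norm[g][cell] * norm[h][cell]
    return scores


# Toy signature of two genes measured in four cells.
expr = {
    "geneA": [0.0, 2.0, 4.0, 4.0],
    "geneB": [0.0, 0.0, 3.0, 3.0],
}
scores = csoa_like_scores(expr)
print(scores)
```

Cells that strongly co-express genes whose high-expression sets overlap receive the highest scores; cells expressing only one gene of a pair contribute nothing for that pair.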

zellkonverter

Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.

Active · 210 · 2 months ago