EDIRquery

Genetics

EDIRquery provides a tool to search for genes of interest within the Exome Database of Interspersed Repeats (EDIR). A gene name is a required input, and users can additionally specify repeat sequence lengths, minimum and maximum distance between sequences, and whether to allow a 1-bp mismatch. Outputs include a summary of results by repeat length, as well as a dataframe of query results. Example data provided includes a subset of the data for the gene GAA (ENSG00000171298). To query the full database requires providing a path to the downloaded database files as a parameter.

Source attribution

  • BioconductorEDIRquery

Related resources

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

Perform step by step methylation analysis of Next Generation Sequencing data.

MotifPeeker is used to compare and analyse datasets from epigenomic profiling methods with motif enrichment as the key benchmark. The package outputs an HTML report consisting of three sections: (1. General Metrics) Overview of peaks-related general metrics for the datasets (FRiP scores, peak widths and motif-summit distances). (2. Known Motif Enrichment Analysis) Statistics for the frequency of user-provided motifs enriched in the datasets. (3. Motif Discovery Enrichment Analysis) Statistics for the frequency of ab-initio discovered motifs enriched in the datasets and compared with known motifs.

Infrastructure shared by all the Biostrings-based genome data packages.

The package is aimed at inference on the amount of agreement in two sorted lists using the Rank-Rank Hypergeometric Overlap test.

The two main functions in the package are pairwiseAlignment() and stringDist(). The former solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. The latter computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings.