Find open-source science resources

Genome-wide association studies (GWAS) are a widely used tool for identifying genetic variants associated with phenotypes and diseases, though complex diseases featuring many genetic variants with small effects present difficulties for these traditional studies. By leveraging pleiotropy, the statistical power of a single GWAS can be increased. This package provides functions for fitting graph-GPA, a statistical framework that prioritizes GWAS results by integrating pleiotropy. The 'GGPA' package provides a user-friendly interface to fit graph-GPA models, implement association mapping, and generate a phenotype graph.

Stale · 1 · 6 years ago

GPL-2.0+

mimager

Infrastructure

Easily visualize and inspect microarrays for spatial artifacts.

Stale · 0 · 6 years ago

loci2path

FunctionalGenomics

loci2path performs statistically rigorous enrichment analysis of eQTLs in genomic regions of interest, using eQTL collections provided by the Genotype-Tissue Expression (GTEx) project and pathway collections from MSigDB.

Stale · 1 · 6 years ago
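Enrichment of a pathway among genes linked to query regions is often scored with a hypergeometric (one-sided Fisher) over-representation test. The sketch below shows that generic test only; it is not necessarily the statistic loci2path computes, and the function names and the idea of passing plain gene sets (for example, eGenes of eQTLs inside the regions) are assumptions for illustration.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def pathway_enrichment(region_genes: set, pathway_genes: set, universe: set) -> float:
    """Hypothetical helper: over-representation p-value of a pathway among region-linked genes."""
    region = region_genes & universe
    pathway = pathway_genes & universe
    overlap = len(region & pathway)
    return hypergeom_sf(overlap, len(universe), len(pathway), len(region))
```

Restricting both gene sets to a shared universe keeps the test's population consistent; choosing that universe (all genes with any eQTL, all tested genes, and so on) is a modeling decision the sketch leaves to the caller.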

Genome Browsers / Gene Diagrams

Circleator

Flexible circular visualization of genome-associated data with BioPerl and SVG.

Stale · 46 · 6 years ago

Perl

NOASSERTION

geneXtendeR

ChIPSeq

Genome Browsers / Gene Diagrams

geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to see peak summary statistics for the first-closest gene, second-closest gene, ..., n-closest gene whilst ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions on the original boundaries of each gene's coordinates. Since different ChIP-seq peak callers produce different differentially enriched peaks with a large variance in peak length distribution and total peak count, annotating peak lists with their nearest genes can often be a noisy process. As such, the goal of geneXtendeR is to robustly link differentially enriched peaks with their respective genes, thereby aiding experimental follow-up and validation in designing primers for a set of prospective gene candidates during qPCR.

Stale · 10 · 7 years ago

GPL-3.0+
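The core idea of extending gene boundaries and re-checking peak-to-gene overlap across a range of extensions can be sketched as follows. This is not geneXtendeR's R interface; the interval representation (1-based inclusive tuples), the function names, and the choice of a single fixed downstream extension are assumptions made to keep the example small.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with 1-based inclusive coordinates (an assumption of this sketch)
Interval = Tuple[str, int, int]


def extend_gene(gene: Interval, strand: str, upstream: int, downstream: int) -> Interval:
    """Extend a gene body upstream/downstream, respecting strand orientation."""
    chrom, start, end = gene
    if strand == "+":
        return chrom, max(1, start - upstream), end + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return chrom, max(1, start - downstream), end + upstream


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def peaks_per_extension(
    genes: List[Tuple[Interval, str]],
    peaks: List[Interval],
    upstream_extensions: List[int],
    downstream: int = 0,
) -> Dict[int, int]:
    """For each upstream extension, count peaks that overlap at least one extended gene."""
    counts: Dict[int, int] = {}
    for ext in upstream_extensions:
        extended = [extend_gene(g, strand, ext, downstream) for g, strand in genes]
        counts[ext] = sum(1 for p in peaks if any(overlaps(p, e) for e in extended))
    return counts
```

Comparing the counts across extensions shows where additional upstream sequence stops capturing new peaks, which is the kind of trade-off the package description refers to. The quadratic overlap check is fine for a sketch; a real implementation would use an interval index.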

scribl

JavaScript library for drawing canvas-based gene diagrams.

Stale · 76 · 7 years ago

JavaScript

CGHbase

Infrastructure

Contains functions and classes that are needed by arrayCGH packages.

Stale · 0 · 7 years ago

GPL

coRdon

Software

Tool for analysis of codon usage (CU) in various unannotated or KEGG/COG-annotated DNA sequences. Calculates different measures of CU bias and CU-based predictors of gene expressivity, and performs gene set enrichment analysis for annotated sequences. Implements several methods for visualizing CU and enrichment analysis results.

Stale · 23 · 7 years ago
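As a concrete example of a codon usage bias measure, the classic relative synonymous codon usage (RSCU) divides each codon's count by the count expected if all synonymous codons were used equally. The sketch below computes RSCU with the standard genetic code; it is offered only to illustrate what "a measure of CU bias" means and is not a claim about which measures or code coRdon uses.

```python
from collections import Counter, defaultdict
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(seq: str) -> Dict[str, float]:
    """Relative synonymous codon usage for an in-frame coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    synonyms = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        synonyms[aa].append(codon)

    values: Dict[str, float] = {}
    for aa, codons in synonyms.items():
        # Skip stop codons and amino acids encoded by a single codon (Met, Trp).
        if aa == "*" or len(codons) == 1:
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)  # count expected under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

An RSCU of 1.0 means no bias for that codon; values above 1.0 mark preferred codons. Codons containing ambiguous bases are counted but never looked up, so they are effectively ignored.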

VCFArray

Infrastructure

VCFArray extends the DelayedArray class to represent VCF data entries as array-like objects, with an on-disk or remote VCF file as the backend. Data entries from VCF files, including INFO fields, FORMAT fields, and the fixed columns (REF, ALT, QUAL, FILTER), can be converted into VCFArray instances with different dimensions.

Stale · 1 · 7 years ago

The Leek group guide to genomics papers

Miscellaneous

Expertly curated genomics papers to get up to speed on genomics, RNA-seq, statistics (used in genomics), software development, and more.

Stale · 502 · 7 years ago

Circos

Circos Related

Perl package for circular plots, which are well suited for genomic rearrangements.

Stale · 88 · 7 years ago

Perl

Telseq

BAM File Utilities

Telseq is a tool for estimating telomere length from whole genome sequence data.

Stale · 76 · 7 years ago

C++
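The first step of a read-counting telomere estimator is to flag reads that carry many copies of the vertebrate telomeric hexamer TTAGGG (or its reverse complement CCCTAA). The sketch below does only that step; the threshold `k = 7` and the function names are assumptions for illustration, and Telseq's subsequent normalization that turns the telomeric read count into a length estimate is not reproduced here.

```python
from typing import Iterable

TELOMERE_REPEAT = "TTAGGG"
TELOMERE_REPEAT_RC = "CCCTAA"  # reverse complement, for reads from the opposite strand


def is_telomeric(read: str, k: int = 7) -> bool:
    """True if the read contains at least k copies of the telomeric hexamer on either strand."""
    read = read.upper()
    return max(read.count(TELOMERE_REPEAT), read.count(TELOMERE_REPEAT_RC)) >= k


def telomeric_fraction(reads: Iterable[str], k: int = 7) -> float:
    """Fraction of reads classified as telomeric (0.0 for an empty input)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_telomeric(r, k) for r in reads) / len(reads)
```

`str.count` counts non-overlapping occurrences, which is what is wanted for a tandem hexamer repeat. In practice the reads would come from a BAM file rather than a Python list.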

abseqR

Sequencing

AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries and abseqR is one of its packages. abseqR empowers the users of abseqPy (https://github.com/malhamdoosh/abseqPy) with plotting and reporting capabilities and allows them to generate interactive HTML reports for the convenience of viewing and sharing with other researchers. Additionally, abseqR extends abseqPy to compare multiple repertoire analyses and perform further downstream analysis on its output.

Stale · 0 · 7 years ago

ipdDb

GenomicVariation

All alleles from the IPD IMGT/HLA <https://www.ebi.ac.uk/ipd/imgt/hla/> and IPD KIR <https://www.ebi.ac.uk/ipd/kir/> databases for Homo sapiens. Reference: Robinson J, Maccari G, Marsh SGE, Walter L, Blokhuis J, Bimber B, Parham P, De Groot NG, Bontrop RE, Guethlein LA, and Hammond JA. KIR Nomenclature in non-human species. Immunogenetics (2018), in preparation.

Stale · 0 · 7 years ago

runibic

Microarray

This package implements the UniBic algorithm in R. This biclustering algorithm for the analysis of gene expression data was introduced by Zhenjia Wang et al. in 2016. The package authors consider it the most promising biclustering method currently available for identifying meaningful structures in complex and noisy data.

Stale · 4 · 7 years ago

3D e-Chem Virtual Machine

Virtual Machine

Virtual machine with all the software and sample data needed to run 3D-e-Chem KNIME workflows.

Stale · 17 · 7 years ago

Shell

Apache-2.0

MetID

AssayDomain

This package uses a network-based approach to help determine the identities of significant ions detected by LC-MS.

Stale · 1 · 7 years ago

rCGH

aCGH

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44K to 400K, as well as Affymetrix SNP6.0 and CytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided the data comply with the expected format. The package handles all the steps required to analyze individual genomic profiles, from reading files to profile segmentation and gene annotation. It also provides several visualization functions (static or interactive) that facilitate the interpretation of individual profiles. Input files can be in compressed format, e.g. .bz2 or .gz.

Stale · 5 · 8 years ago

Awesome-alternative-splicing

Bioinformatics on GitHub

List of resources on alternative splicing including software, databases, and other tools.

Stale · 58 · 8 years ago

NP-Likeness

Small molecules

Natural Product-likeness calculator v-2.1: calculates the natural product-likeness of small molecules based on open data on natural products.

Stale · 4 · 8 years ago

Java

GA4GHclient

DataRepresentation

GA4GHclient provides an easy way to access public data servers through the Global Alliance for Genomics and Health (GA4GH) genomics API. It provides low-level access to the GA4GH API and translates response data into Bioconductor-based class objects.

Stale · 1 · 8 years ago

GPL-2.0+

pysic

Simulations

A calculator incorporating various empirical pair and many-body potentials.

Stale · 23 · 8 years ago

Fortran

LGPL-3.0

RJMCMCNucleosomes

BiologicalQuestion

This package performs nucleosome positioning using an informative Multinomial-Dirichlet prior in a t-mixture model, with reversible jump estimation of nucleosome positions, for genome-wide profiling.

Stale · 0 · 8 years ago

GSALightning

Software

GSALightning provides a fast implementation of permutation-based gene set analysis for two-sample problems. The package is particularly useful when simultaneously testing a large number of gene sets, or when a large number of permutations is needed for more accurate p-value estimation.

Stale · 5 · 8 years ago
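The principle behind permutation-based gene set analysis for a two-sample problem is to compare an observed gene set statistic with its distribution under random relabeling of the samples. The sketch below is a deliberately naive version: it uses a mean difference per gene and the average over the set, both of which are assumptions of this example; it does not reproduce GSALightning's statistics or the optimizations that make it fast.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def mean_difference(values: Sequence[float], labels: Sequence[int]) -> float:
    """Per-gene statistic: mean in group 1 minus mean in group 0."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)


def gene_set_statistic(expr: Dict[str, List[float]], gene_set: Sequence[str],
                       labels: Sequence[int]) -> float:
    """Average of the per-gene statistics over the genes in the set."""
    return mean(mean_difference(expr[g], labels) for g in gene_set)


def permutation_pvalue(expr: Dict[str, List[float]], gene_set: Sequence[str],
                       labels: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided p-value obtained by permuting sample labels (not gene labels)."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expr, gene_set, labels)
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if abs(gene_set_statistic(expr, gene_set, permuted)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Every permutation recomputes the statistic for all genes in the set, so the cost grows with gene sets times permutations. That cost is exactly what makes a fast implementation valuable when many sets or many permutations are needed.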

SMITE

ImmunoOncology

This package builds on the Epimods framework, which facilitates finding weighted subnetworks ("modules") on Illumina Infinium 27k arrays using the SpinGlass algorithm as implemented in the igraph package. We have created a class of gene-centric annotations associated with p-values, effect sizes, and scores from any researcher's prior statistical results to find functional modules.

Stale · 1 · 9 years ago

GPL-2.0+

isobar

ImmunoOncology

isobar provides methods for preprocessing, normalization, and report generation for the analysis of quantitative mass spectrometry proteomics data labeled with isobaric tags, such as iTRAQ and TMT. It features modules for integrating and validating PTM-centric datasets (isobar-PTM). More information at http://www.ms-isobar.org.

Stale · 10 · 9 years ago

LGPL-2.0

ctsGE

ImmunoOncology

Methodology for supervised clustering of potentially many predictor variables, such as genes, in time series datasets. Provides functions that help the user assign genes to a predefined set of model profiles.

Stale · 1 · 9 years ago

GOpro

Annotation

Find the most characteristic Gene Ontology terms for groups of human genes. This package was created as part of a thesis developed under the auspices of the MI^2 Group (http://mi2.mini.pw.edu.pl/, https://github.com/geneticsMiNIng).

Stale · 2 · 9 years ago

fmcsR

Cheminformatics

The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, which leads to the identification of more common features in their source structures as well as higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities for using the FMCS algorithm in pairwise compound comparisons, structure similarity searching, and clustering.

Stale · 6 · 10 years ago

GeneBreak

aCGH

Recurrent breakpoint gene detection on copy number aberration profiles.

Stale · 2 · 10 years ago

rTRMui

Transcription

This package provides a web interface to compute transcriptional regulatory modules with rTRM.

Stale · 1 · 10 years ago

Genome Browsers / Gene Diagrams

Island Plot

D3 JavaScript based genome viewer. Constructs SVGs.

Stale · 33 · 10 years ago

JavaScript

RBPBench

RNA

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, ranging from RBP motif search (database or user-supplied RBP motifs) in genomic regions, through motif enrichment and co-occurrence analysis and in-depth comparisons of multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods and comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.

HyPPI

Protein interactions

HyPPI classifies a protein-protein complex based on its interaction type into permanent, transient, or crystal artifact. Permanent protein-protein complexes are only stable in their complexed state. Their subunits would denature upon dissociation of the protein-protein complex. Transient protein-protein complexes are stable in the complexed as well as in the monomeric form, depending on the necessary function of the complex. Crystal artifacts have no biological function and are artificially formed during the crystallization process. The discrimination is performed using two characteristics of the protein-protein complex, the hydrophobicity of the interface (ΔGhydrophobic) and the quotient of interface area ratios (IF-quotient). The IF-quotient considers whether the protein-protein interface is symmetric.

JAMDA

Molecular modelling

JAMDA enables the preparation of individual protein structures and the docking of small molecules in preprocessed binding sites of choice. JAMDA simplifies the process of protein-ligand docking through automatic preprocessing protocols for the protein and binding sites of interest. The JAMDAscore scoring function retrieved 75% of the native poses in the three highest-ranked solutions for high-quality protein-ligand complexes with default settings. Individual configurations for protein preparation are available, e.g., considering protein ensembles, relevant binding site water molecules, or cofactors. A user-defined number of input conformations for the ligands of interest can be generated fully automatically using Conformator. Alternatively, users can provide externally prepared ligand conformers.

DoGSite3

Protein binding sites

DoGSite3 was developed for predicting robust and reliable small molecule binding sites and computing their geometrical and chemical descriptors. It is based on the grid-based DoGSite algorithm for predicting pockets and their sub-pockets. The new tool is largely rotation- and translation-invariant due to a normalization procedure before binding site prediction. Known ligands in the structure can be used to bias the grid by sufficiently buried ligand fragments. The output encompasses novel chemical binding site descriptors considering solvent accessibility. Compared to its predecessor, it shows increased robustness through comprehensive parameter optimization. DoGSite3 runs finish within seconds.

DoGSiteScorer

Protein binding sites

DoGSiteScorer is a grid-based automated pocket detection and analysis tool. It applies a Difference of Gaussian filter to detect potential binding pockets and splits them into sub-pockets. The method solely uses the 3D structure of the protein. Global properties, describing the size, shape, and chemical features of the predicted (sub-)pockets, are calculated. By default, a simple druggability score based on a linear combination of the three descriptors describing volume, hydrophobicity, and enclosure is provided for each (sub-)pocket. Furthermore, a subset of meaningful descriptors is incorporated in a support vector machine (libsvm) to predict the (sub-)pocket druggability score (values are between zero and one). The higher the score, the more druggable the pocket is estimated to be.
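A Difference of Gaussian (DoG) filter subtracts a wide Gaussian blur from a narrow one, which highlights structures of a characteristic size while suppressing both fine noise and slowly varying background. DoGSiteScorer applies such a filter to a 3D grid built from the protein; the sketch below uses a 1D signal and hypothetical sigma values purely to show the filter itself, not the tool's grid encoding or pocket extraction.

```python
import math
from typing import List


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized, truncated Gaussian kernel with radius ceil(3 * sigma)."""
    radius = math.ceil(3 * sigma)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: List[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def difference_of_gaussians(signal: List[float], sigma_small: float,
                            sigma_large: float) -> List[float]:
    """Band-pass response: narrow blur minus wide blur."""
    narrow = blur(signal, sigma_small)
    wide = blur(signal, sigma_large)
    return [a - b for a, b in zip(narrow, wide)]
```

For a short block of ones in a field of zeros, the response peaks at the block's center and turns negative on its flanks. The ratio of the two sigmas sets which feature sizes respond most strongly.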

QP-Insights Uploader

Medical imaging

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

CC-BY-NC-ND-4.0

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.

Apache-2.0
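In DICOM, Type 1 attributes must be present with a non-empty value, Type 2 attributes must be present but may be empty, Type 3 attributes are optional, and Types 1C and 2C follow the Type 1 or Type 2 rule only when their condition applies. The sketch below encodes those rules against a plain dictionary standing in for a parsed header; the data structures, function names, and example requirements are hypothetical and do not reflect the module's Excel requirements format or code.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Dataset = Dict[str, Any]  # hypothetical stand-in for a parsed DICOM header: keyword -> value
Condition = Optional[Callable[[Dataset], bool]]
Requirement = Tuple[str, str, Condition]  # (keyword, DICOM attribute type, condition for 1C/2C)


def is_empty(value: Any) -> bool:
    return value is None or value == "" or value == []


def validate(ds: Dataset, requirements: List[Requirement]) -> List[str]:
    """Return human-readable issues; an empty list means the dataset passed."""
    issues: List[str] = []
    for keyword, attr_type, condition in requirements:
        effective = attr_type
        if attr_type in ("1C", "2C"):
            # Conditional attributes are only required when their condition holds.
            if condition is None or not condition(ds):
                continue
            effective = attr_type[0]
        if effective == "3":
            continue  # Type 3: optional, nothing to enforce
        if keyword not in ds:
            issues.append(f"{keyword}: missing (Type {attr_type})")
        elif effective == "1" and is_empty(ds[keyword]):
            issues.append(f"{keyword}: empty value not allowed (Type {attr_type})")
    return issues
```

Keeping the condition as a callable lets each 1C/2C rule inspect any other attribute in the dataset, which is how the standard phrases most such conditions.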

PoseEdit

Protein interactions

PoseEdit automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The structure mining tool GeoMine also uses this model to describe binding sites. In addition, users can manipulate the diagrams by translating, rotating, or mirroring parts of the structure, and by adding or removing interactions. Furthermore, users can add individual labels or adjust available labels. Users can download the final 2D diagrams for a binding site of interest in JSON or SVG format.

METALizer

Protein interactions

METALizer predicts the coordination geometry of metal ions in metalloproteins. Users can compare potential coordination geometries to those found in the examined structure. The predicted coordination geometries and the observed metal interaction distances can be interactively compared to statistics calculated based on the PDB.

Data Integration Quality Check Tool (DIQCT)

Data quality management

A tool that checks the quality (validity, completeness) and accuracy of the provided clinical metadata, the integrity between the images and that metadata, the de-identification protocol applied, and the existence of annotations and their consistency with the images. It informs the user of corrective actions before data upload.

Proprietary

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review: no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using https://bio.tools/image_duplicate_check_tool.

Apache-2.0
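The pixel-level tiers of such a pipeline can be sketched as an exact check by hashing the raw pixel buffer, followed by a near-duplicate check with cosine similarity and mean absolute difference (MAD). This is a minimal sketch under stated assumptions: pixels are flat lists of unsigned 16-bit integers, the thresholds are hypothetical, and the metadata tier and SSIM are omitted. As in the tool's description, it only reports pairs and never deletes anything.

```python
import hashlib
import math
from array import array
from itertools import combinations
from typing import Dict, List, Tuple


def pixel_hash(pixels: List[int]) -> str:
    """SHA-256 of the raw pixel buffer, stored as unsigned 16-bit values."""
    return hashlib.sha256(array("H", pixels).tobytes()).hexdigest()


def cosine_similarity(a: List[int], b: List[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def mean_absolute_difference(a: List[int], b: List[int]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_duplicates(series: Dict[str, List[int]], min_cosine: float = 0.99,
                    max_mad: float = 2.0) -> List[Tuple[str, str, str]]:
    """Report (id_a, id_b, kind) pairs; nothing is modified or deleted."""
    hashes = {sid: pixel_hash(px) for sid, px in series.items()}
    findings = []
    for a, b in combinations(sorted(series), 2):
        if hashes[a] == hashes[b]:
            findings.append((a, b, "exact"))  # cheap tier: identical pixel bytes
            continue
        pa, pb = series[a], series[b]
        if len(pa) != len(pb):
            continue  # different geometry: not compared in this sketch
        if cosine_similarity(pa, pb) >= min_cosine and mean_absolute_difference(pa, pb) <= max_mad:
            findings.append((a, b, "near"))
    return findings
```

Running the hash comparison first means the more expensive similarity metrics are only computed for pairs that are not byte-identical. Requiring both metrics guards against cosine similarity alone, which ignores uniform intensity scaling.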

SNP-sites

Genomics

Finds SNP sites from a multi-FASTA alignment file.
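Finding SNP sites in a multi-FASTA alignment amounts to scanning alignment columns and keeping those where more than one nucleotide appears. The sketch below shows that idea with hypothetical function names; it ignores gaps and ambiguous bases when deciding whether a column is variable, and SNP-sites' own handling of such characters and its output formats may differ.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse (possibly multi-line) FASTA records into {name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def snp_sites(sequences: List[str]) -> List[int]:
    """0-based alignment columns where at least two different nucleotides occur."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences are not aligned to the same length")
    nucleotides = set("ACGT")
    return [
        i for i in range(length)
        if len({s[i] for s in sequences} & nucleotides) > 1
    ]
```

Usage: `snp_sites(list(parse_fasta(text).values()))` returns the variable columns, from which a reduced alignment or a VCF-style table can be built.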

NanoporeDB

Computational biology

NanoporeDB is an open-access structural database dedicated to the exploration, analysis, and design of protein nanopores, which serve as essential molecular gateways in biological membranes and form the basis of many advanced biosensing and sequencing technologies. This platform integrates large-scale structure-guided mining and deep learning-based modeling using AlphaFold-Multimer and AlphaFold3 to provide about 7,000 high-confidence multimeric nanopore structures. Each entry includes detailed information on membrane embedding, pore geometry annotation, and constriction profiling to support functional and biophysical interpretation. Through an interactive 3D visualization interface and quantitative parameters such as tilt angle, insertion depth, and pore geometry, NanoporeDB enables researchers to explore nanopore diversity, discover novel scaffolds, and accelerate innovation in molecular sensing, precision diagnostics, and synthetic biology.

generate_count_matrix

Transcriptomics

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
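The merge described above can be sketched in a few lines: parse each two-column count file, check that the feature column is identical across inputs, and emit one count column per input. The tab-separated layout, the `gene` header label, and the function names are assumptions for illustration and do not describe the Galaxy tool's exact input or output format.

```python
from typing import Dict, List, Tuple


def read_counts(text: str) -> List[Tuple[str, str]]:
    """Parse 'feature<TAB>count' lines, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        feature, count = line.split("\t")[:2]
        rows.append((feature, count))
    return rows


def combine_counts(named_inputs: Dict[str, str]) -> str:
    """Merge several count files into one matrix with one column per input."""
    parsed = {name: read_counts(text) for name, text in named_inputs.items()}
    names = list(parsed)
    features = [f for f, _ in parsed[names[0]]]
    for name in names[1:]:
        # The feature (gene) column must match exactly, including order.
        if [f for f, _ in parsed[name]] != features:
            raise ValueError(f"feature column of {name!r} differs from {names[0]!r}")
    lines = ["\t".join(["gene"] + names)]
    for i, feature in enumerate(features):
        lines.append("\t".join([feature] + [parsed[n][i][1] for n in names]))
    return "\n".join(lines) + "\n"
```

Failing loudly on mismatched feature columns, rather than silently aligning by name, mirrors the requirement that all input count files share an identical gene column.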

s3segmenter

Bioimaging

S3segmenter is a MATLAB-based set of functions that generates single-cell (nuclei and cytoplasm) label masks.

MATLAB

CompuCell3D

Systems biology

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

miniconda

Software management

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Proprietary

FigCanvas