Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 951–1,000

Genome-wide association studies (GWAS) are widely used to identify genetic variants associated with phenotypes and diseases, though complex diseases featuring many genetic variants with small effects present difficulties for traditional GWAS. By leveraging pleiotropy, the statistical power of a single GWAS can be increased. This package provides functions for fitting graph-GPA, a statistical framework to prioritize GWAS results by integrating pleiotropy. The 'GGPA' package provides a user-friendly interface to fit graph-GPA models, implement association mapping, and generate a phenotype graph.

Stale · 1 · 6 years ago
R
GPL-2.0+

Easily visualize and inspect microarrays for spatial artifacts.

Stale · 0 · 6 years ago
R
MIT

loci2path performs statistically rigorous enrichment analysis of eQTLs in genomic regions of interest, using eQTL collections provided by the Genotype-Tissue Expression (GTEx) project and pathway collections from MSigDB.

Stale · 1 · 6 years ago
R
Artistic-2.0

Flexible circular visualization of genome-associated data with BioPerl and SVG.

Stale · 46 · 6 years ago
Perl
NOASSERTION

geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to see peak summary statistics for the first-closest gene, second-closest gene, ..., n-closest gene whilst ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions on the original boundaries of each gene's coordinates. Since different ChIP-seq peak callers produce different differentially enriched peaks with a large variance in peak length distribution and total peak count, annotating peak lists with their nearest genes can often be a noisy process. As such, the goal of geneXtendeR is to robustly link differentially enriched peaks with their respective genes, thereby aiding experimental follow-up and validation in designing primers for a set of prospective gene candidates during qPCR.

Stale · 10 · 7 years ago
R
GPL-3.0+

JavaScript library for drawing canvas-based gene diagrams.

Stale · 76 · 7 years ago
JavaScript

Contains functions and classes that are needed by arrayCGH packages.

Stale · 0 · 7 years ago
R
GPL

Tool for analysis of codon usage in various unannotated or KEGG/COG annotated DNA sequences. Calculates different measures of CU bias and CU-based predictors of gene expressivity, and performs gene set enrichment analysis for annotated sequences. Implements several methods for visualization of CU and enrichment analysis results.

Stale · 23 · 7 years ago
R
Artistic-2.0

VCFArray extends DelayedArray to represent VCF data entries as array-like objects with an on-disk or remote VCF file as the backend. Data entries from VCF files, including info fields, FORMAT fields, and the fixed columns (REF, ALT, QUAL, FILTER), can be converted into VCFArray instances with different dimensions.

Stale · 1 · 7 years ago
R
GPL-3.0

Expertly curated genomics papers to get up to speed on genomics, RNA-seq, statistics (used in genomics), software development, and more.

Stale · 502 · 7 years ago

Perl package for circular plots, which are well suited for genomic rearrangements.

Stale · 88 · 7 years ago
Perl

Telseq is a tool for estimating telomere length from whole genome sequence data.

Stale · 76 · 7 years ago
C++
GPL-3.0

AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries and abseqR is one of its packages. abseqR empowers the users of abseqPy (https://github.com/malhamdoosh/abseqPy) with plotting and reporting capabilities and allows them to generate interactive HTML reports for the convenience of viewing and sharing with other researchers. Additionally, abseqR extends abseqPy to compare multiple repertoire analyses and perform further downstream analysis on its output.

Stale · 0 · 7 years ago
R
GPL-3.0

All alleles from the IPD IMGT/HLA <https://www.ebi.ac.uk/ipd/imgt/hla/> and IPD KIR <https://www.ebi.ac.uk/ipd/kir/> databases for Homo sapiens. Reference: Robinson J, Maccari G, Marsh SGE, Walter L, Blokhuis J, Bimber B, Parham P, De Groot NG, Bontrop RE, Guethlein LA, and Hammond JA. KIR Nomenclature in non-human species. Immunogenetics (2018), in preparation.

Stale · 0 · 7 years ago
R
Artistic-2.0

This package implements the UniBic algorithm in R. This biclustering algorithm for analysis of gene expression data was introduced by Zhenjia Wang et al. in 2016. Its authors describe it as a promising biclustering method for identification of meaningful structures in complex and noisy data.

Stale · 4 · 7 years ago
R
MIT

Virtual machine with all software and sample data needed to run 3D-e-Chem KNIME workflows.

Stale · 17 · 7 years ago
Shell
Apache-2.0

This package uses an innovative network-based approach that will enhance our ability to determine the identities of significant ions detected by LC-MS.

Stale · 1 · 7 years ago
R
Artistic-2.0

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.

Stale · 5 · 8 years ago
R
Artistic-2.0

List of resources on alternative splicing including software, databases, and other tools.

Stale · 58 · 8 years ago

Natural Product-likeness calculator v-2.1: calculates natural product-likeness of small molecules based on open data of natural products.

Stale · 4 · 8 years ago
Java

GA4GHclient provides an easy way to access public data servers through the Global Alliance for Genomics and Health (GA4GH) genomics API. It provides low-level access to the GA4GH API and translates response data into Bioconductor-based class objects.

Stale · 1 · 8 years ago
R
GPL-2.0+

A calculator incorporating various empirical pair and many-body potentials.

Stale · 23 · 8 years ago
Fortran
LGPL-3.0

This package does nucleosome positioning using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling.

Stale · 0 · 8 years ago
R
Artistic-2.0

GSALightning provides a fast implementation of permutation-based gene set analysis for the two-sample problem. This package is particularly useful when testing a large number of gene sets simultaneously, or when a large number of permutations is necessary for more accurate p-value estimation.

Stale · 5 · 8 years ago
R
GPL-2.0
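
To make the idea of a permutation-based gene set test for two groups concrete, here is a minimal Python sketch. It is not GSALightning's implementation or API: the function names, the choice of statistic (mean absolute difference in group means across the set), and the example data are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def set_statistic(expr, labels, gene_set):
    """Mean absolute difference in group means over the genes in the set.

    expr maps gene name -> list of expression values (one per sample);
    labels holds 0 or 1 per sample. The statistic itself is an assumption.
    """
    diffs = []
    for gene in gene_set:
        group1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        group0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        diffs.append(abs(mean(group1) - mean(group0)))
    return mean(diffs)


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling sample labels n_perm times."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(expr, shuffled, gene_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six samples, three genes.
expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [2.0, 2.1, 1.9, 6.0, 6.2, 5.9],
    "g3": [3.0, 3.1, 2.9, 3.0, 3.05, 2.95],
}
labels = [0, 0, 0, 1, 1, 1]
p_shifted = permutation_p_value(expr, labels, ["g1", "g2"])
p_flat = permutation_p_value(expr, labels, ["g3"])
```

With only three samples per group there are just 20 distinct labelings, so the smallest attainable p-value is about 0.1. That is why the description stresses large permutation counts, which pay off only when the sample size allows many distinct labelings.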

This package builds on the Epimods framework, which facilitates finding weighted subnetworks ("modules") on Illumina Infinium 27k arrays using the SpinGlass algorithm, as implemented in the igraph package. We have created a class of gene-centric annotations associated with p-values, effect sizes, and scores from any researcher's prior statistical results to find functional modules.

Stale · 1 · 9 years ago
R
GPL-2.0+

isobar provides methods for preprocessing, normalization, and report generation for the analysis of quantitative mass spectrometry proteomics data labeled with isobaric tags, such as iTRAQ and TMT. It features modules for integrating and validating PTM-centric datasets (isobar-PTM). More information at http://www.ms-isobar.org.

Stale · 10 · 9 years ago
R
LGPL-2.0

Methodology for supervised clustering of potentially many predictor variables, such as genes, in time series datasets. Provides functions that help the user assign genes to a predefined set of model profiles.

Stale · 1 · 9 years ago
R
GPL-2.0

Find the most characteristic gene ontology terms for groups of human genes. This package was created as a part of the thesis which was developed under the auspices of MI^2 Group (http://mi2.mini.pw.edu.pl/, https://github.com/geneticsMiNIng).

Stale · 2 · 9 years ago
R
GPL-3.0

The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, resulting in the identification of more common features in their source structures, as well as a higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities to use the FMCS algorithm for pairwise compound comparisons, structure similarity searching, and clustering.

Stale · 6 · 10 years ago
R
Artistic-2.0

Recurrent breakpoint gene detection on copy number aberration profiles.

Stale · 2 · 10 years ago
R
GPL-2.0

This package provides a web interface to compute transcriptional regulatory modules with rTRM.

Stale · 1 · 10 years ago
R
GPL-3.0

D3 JavaScript based genome viewer. Constructs SVGs.

Stale · 33 · 10 years ago
JavaScript
GPL-2.0

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, ranging from RBP motif search (database or user-supplied RBP motifs) in genomic regions, through motif enrichment and co-occurrence analysis and in-depth comparisons across multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods and comparing cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.

HyPPI classifies a protein-protein complex based on its interaction type into permanent, transient, or crystal artifact. Permanent protein-protein complexes are only stable in their complexed state. Their subunits would denature upon dissociation of the protein-protein complex. Transient protein-protein complexes are stable in the complexed as well as in the monomeric form, depending on the necessary function of the complex. Crystal artifacts have no biological function and are artificially formed during the crystallization process. The discrimination is performed using two characteristics of the protein-protein complex, the hydrophobicity of the interface (ΔGhydrophobic) and the quotient of interface area ratios (IF-quotient). The IF-quotient considers whether the protein-protein interface is symmetric.

JAMDA enables the preparation of individual protein structures and the docking of small molecules in preprocessed binding sites of choice. JAMDA simplifies the process of protein-ligand docking by automatic preprocessing protocols for the protein and binding sites of interest. The JAMDAscore scoring function retrieved 75% of the native poses in the three highest-ranked solutions for high-quality protein-ligand complexes with default settings. Individual configurations for protein preparation are available, e.g., considering protein ensembles, relevant binding site water molecules, or cofactors. A user-defined number of input conformations for the ligands of interest can be generated fully automatically using Conformator. Alternatively, users can provide externally prepared ligand conformers.

DoGSite3 was developed for predicting robust and reliable small molecule binding sites and computing their geometrical and chemical descriptors. It is based on the grid-based DoGSite algorithm for predicting pockets and their sub-pockets. The new tool is largely rotation- and translation-invariant due to a normalization procedure before binding site prediction. Known ligands in the structure can be used to bias the grid by sufficiently buried ligand fragments. The output encompasses novel chemical binding site descriptors considering solvent accessibility. Compared to its predecessor, it shows increased robustness through comprehensive parameter optimization. DoGSite3 runs finish within seconds.

DoGSiteScorer is a grid-based automated pocket detection and analysis tool. It applies a Difference of Gaussian filter to detect potential binding pockets and splits them into sub-pockets. The method solely uses the 3D structure of the protein. Global properties describing the size, shape, and chemical features of the predicted (sub-)pockets are calculated. By default, a simple druggability score based on a linear combination of three descriptors (volume, hydrophobicity, and enclosure) is provided for each (sub-)pocket. Furthermore, a subset of meaningful descriptors is incorporated in a support vector machine (libsvm) to predict the (sub-)pocket druggability score (values are between zero and one). The higher the score, the more druggable the pocket is estimated to be.

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.

PoseEdit automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The structure mining tool GeoMine also uses this model to describe binding sites. In addition, users can manipulate the diagrams by translating, rotating, mirroring parts of the structure, adding additional interactions, or removing them. Furthermore, users can add individual labels or adjust available labels. Users can download the final 2D diagrams for a binding site of interest in JSON or SVG format.

METALizer predicts the coordination geometry of metal ions in metalloproteins. Users can compare potential coordination geometries to those found in the examined structure. The predicted coordination geometries and the observed metal interaction distances can be interactively compared to statistics calculated based on the PDB.

A tool that checks the clinical metadata quality (validity, completeness), the integrity between images and clinical metadata provided as well as their accuracy, the de-identification protocol applied, and existence of annotation together with the consistency between the images and the annotation files and informs the user on corrective actions prior to data upload.

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
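
The tiered idea described above (an exact hash check first, then similarity metrics for near-duplicates) can be sketched in plain Python. This is an illustration only, not the tool's code. It assumes 8-bit pixel values supplied as flat lists, reads "MAD" as mean absolute difference, uses arbitrary thresholds, and omits SSIM and the DICOM metadata tier.

```python
import hashlib
import math


def pixel_hash(pixels):
    """SHA-256 digest of the raw 8-bit pixel values (exact-copy check)."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify_pair(a, b, cos_min=0.999, mad_max=1.0):
    """Label a pair of equally sized images; thresholds are hypothetical."""
    if len(a) != len(b):
        return "different"
    # Tier 1: identical pixel content gives identical hashes.
    if pixel_hash(a) == pixel_hash(b):
        return "exact"
    # Tier 2: small pixel-level deviations count as near-duplicates.
    if cosine_similarity(a, b) >= cos_min and mean_abs_diff(a, b) <= mad_max:
        return "near-duplicate"
    return "different"


original = [10, 200, 30, 40, 250, 60]
noisy = [11, 200, 30, 40, 249, 60]
other = [250, 5, 90, 180, 3, 120]
print(classify_pair(original, list(original)))  # exact
print(classify_pair(original, noisy))           # near-duplicate
print(classify_pair(original, other))           # different
```

The function only returns labels and never modifies its inputs, in keeping with the report-for-review approach the description emphasizes.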

Finds SNP sites from a multi-FASTA alignment file.
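
As a rough illustration of what finding SNP sites in an alignment involves, the sketch below reports the alignment columns in which more than one unambiguous base (A, C, G, T) occurs. It is not the tool's implementation: the function names and the decision to ignore gaps and ambiguity codes are assumptions.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def snp_columns(records):
    """Return 0-based columns holding more than one distinct A/C/G/T base."""
    seqs = [seq for _, seq in records]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")
    sites = []
    for i in range(length):
        # Gaps and ambiguity codes are dropped before counting (assumption).
        bases = {s[i] for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites


alignment = """>s1
ACGTAC
>s2
ACGAAC
>s3
A-GTAT
"""
print(snp_columns(read_fasta(alignment)))  # [3, 5]
```

Column 1 contains a gap but only one real base, so it is not reported.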

NanoporeDB is an open-access structural database dedicated to the exploration, analysis, and design of protein nanopores, which serve as essential molecular gateways in biological membranes and form the basis of many advanced biosensing and sequencing technologies. This platform integrates large-scale structure-guided mining and deep learning-based modeling using AlphaFold-Multimer and AlphaFold3 to provide about 7,000 high-confidence multimeric nanopore structures. Each entry includes detailed information on membrane embedding, pore geometry annotation, and constriction profiling to support functional and biophysical interpretation. Through an interactive 3D visualization interface and quantitative parameters such as tilt angle, insertion depth, and pore geometry, NanoporeDB enables researchers to explore nanopore diversity, discover novel scaffolds, and accelerate innovation in molecular sensing, precision diagnostics, and synthetic biology.

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
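
The merge described above can be sketched as follows. This is not the Galaxy tool's code: it assumes two-column, tab-separated inputs without header lines, and the function names and the "feature" header label are hypothetical.

```python
import csv
import io


def read_counts(text):
    """Read tab-separated 'feature<TAB>count' lines into (feature, count) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if row]


def build_count_matrix(inputs):
    """Combine per-sample count files into one matrix.

    inputs maps sample name -> count-file text. Returns a list of rows,
    the first being the header. Feature columns must match exactly.
    """
    if not inputs:
        return []
    samples = list(inputs)
    tables = {name: read_counts(text) for name, text in inputs.items()}
    features = [feature for feature, _ in tables[samples[0]]]
    for name in samples[1:]:
        if [feature for feature, _ in tables[name]] != features:
            raise ValueError(
                f"feature column of {name!r} differs from {samples[0]!r}"
            )
    rows = [["feature"] + samples]
    for i, feature in enumerate(features):
        rows.append([feature] + [tables[name][i][1] for name in samples])
    return rows


inputs = {
    "sampleA": "geneX\t5\ngeneY\t0\n",
    "sampleB": "geneX\t7\ngeneY\t3\n",
}
for row in build_count_matrix(inputs):
    print("\t".join(row))
```

Raising an error on mismatched feature columns mirrors the stated requirement that the gene (or feature) column be identical across inputs.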

S3segmenter is a MATLAB-based set of functions that generates single-cell (nuclei and cytoplasm) label masks.

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

FigCanvas is an AI scientific figure generator for life-science researchers. It produces publication-ready biological diagrams (mechanism diagrams, pathway figures, cell biology visuals), CONSORT and methodology flowcharts, and data visualizations such as volcano plots from text prompts or uploaded datasets. The tool turns methods-section text or structured data into editable vector figures suitable for manuscripts, posters, and slides, helping researchers iterate on figures without rebuilding them in Illustrator.