Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
Ularcirc reads in STAR aligned splice junction files and provides visualisation and analysis tools for splicing analysis. Users can assess backsplice junctions and forward canonical junctions.
The biodbChebi library provides access to the ChEBI database using the biodb package framework. It allows users to retrieve entries by accession number, and its web services can be used to search the database by name, mass or other fields.
Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.
The goal of MineICA is to perform Independent Component Analysis (ICA) on multiple transcriptome datasets while integrating additional data (e.g., molecular, clinical and pathological). This integrative ICA aids the biological interpretation of the components by studying their association with variables (e.g., sample annotations) and gene sets, and enables the comparison of components from different datasets using a correlation-based graph.
The soGGi package provides a toolset to create aggregate/summary plots of signal or motif occurrence over genomic intervals from BAM and bigWig files, as well as from PWM, RleList, GRanges and GAlignments Bioconductor objects. soGGi supports normalisation, transformation and arithmetic operations on and between summary plot objects, as well as grouping and subsetting of plots by GRanges objects and user-supplied metadata. Plots are created with the ggplot2 library, so users can further manipulate the returned plot object. Together, these features give soGGi a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, super-enhancers and transcription factor binding events.
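At its core, an aggregate (meta-profile) plot like those described above averages signal in fixed windows around a set of interval anchors. The following minimal Python sketch assumes per-base coverage for a single chromosome is already loaded into a NumPy array; `aggregate_profile` is a hypothetical helper for illustration, not part of soGGi (which is an R/Bioconductor package).

```python
import numpy as np

def aggregate_profile(coverage, centers, flank):
    """Average per-base signal in windows of +/- flank around each center.

    Hypothetical helper: coverage is assumed to be a 1D array for one
    chromosome, centers are 0-based positions. Windows that would run past
    either end of the chromosome are skipped rather than padded.
    """
    windows = []
    for c in centers:
        start, end = c - flank, c + flank + 1
        if start < 0 or end > len(coverage):
            continue  # window falls outside the chromosome
        windows.append(coverage[start:end])
    if not windows:
        raise ValueError("no window fits inside the coverage vector")
    # Stack windows as rows and average column-wise (position by position).
    return np.mean(np.vstack(windows), axis=0)
```

For example, with peaks of height 4 at position 5 and 2 at position 15 and `flank=2`, the averaged profile is `[0, 0, 3, 0, 0]`; real tools add strand handling, scaling of variable-length intervals and normalisation on top of this.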
We introduce motifbreakR, which allows the biologist to judge, first, whether the sequence surrounding a polymorphism is a good match to a motif and, second, how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is more flexible and extensible than previous offerings, giving users a choice of algorithms for interrogating genomes with motifs from public sources: 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighting by relative entropy. MotifbreakR can predict effects for novel variants or for previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently 32 species and 109 versions in total).
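The per-allele comparison described above can be illustrated with a simple log-odds position weight matrix: score the reference sequence and the sequence carrying the alternate allele, then take the difference. This Python sketch assumes a uniform background of 0.25 and a pseudocount of 0.25 per base. The function names are hypothetical, and the sketch is a generic log-odds scorer, not motifbreakR's own code or any of its three scoring options exactly.

```python
import math

BASES = "ACGT"

def log_odds_matrix(pfm, pseudocount=0.25, background=0.25):
    """Convert a position frequency matrix (list of {base: count}) to log2 odds.

    Assumes a uniform background and a per-base pseudocount (both hypothetical
    defaults chosen for illustration).
    """
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def score(seq, pwm):
    """Sum the per-position log-odds of a sequence the same length as the motif."""
    return sum(col[base] for col, base in zip(pwm, seq))

def allele_effect(ref_seq, offset, alt_base, pwm):
    """Score change (alt minus ref) when the base at `offset` is replaced."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return score(alt_seq, pwm) - score(ref_seq, pwm)
```

With a motif that strongly prefers `ACG` (10 counts per preferred base), changing the middle `C` to `T` lowers the score by log2(0.25/10.25), about 5.36 bits; a negative difference indicates the alternate allele disrupts the match.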
Starting with a BAM file, this package provides the necessary functions for quality assessment, read start position recalibration, the counting of reads on CDS, 3'UTR, and 5'UTR, plotting of count data: pairs, log fold-change, codon frequency and coverage assessment, principal component analysis on codon coverage.
The BPRMeth package implements a probabilistic method to quantify explicit features of methylation profiles, making it easier to formally use such profiles in downstream modelling efforts, such as predicting gene expression levels or clustering genomic regions or cells according to their methylation profiles.
This package provides an alternative interface to Bioconductor 'annotation' resources, in particular the gene identifier mapping functionality of the 'org' packages (e.g., org.Hs.eg.db) and the genome coordinate functionality of the 'TxDb' packages (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).
MFA models genomic bifurcations using a Bayesian hierarchical mixture of factor analysers.
MetaNeighbor allows users to quantify cell type replicability across datasets using neighbor voting.
RcisTarget identifies transcription factor binding motifs (TFBS) over-represented in a gene list. First, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs are then annotated to TFs, and those with a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene set, RcisTarget predicts the candidate target genes (i.e., genes in the gene set that are ranked above the leading edge).
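The enrichment step described above can be pictured as follows: for each motif, walk down its genome-wide gene ranking, accumulate how many gene-set genes have been recovered, and take the area under that recovery curve up to a cutoff; the NES then standardises each motif's AUC against all motifs as (AUC - mean) / SD. The Python sketch below illustrates that idea on toy rankings. The function names and the `max_rank` cutoff are hypothetical and do not reproduce RcisTarget's R API or its ranking-database format.

```python
import numpy as np

def recovery_auc(motif_ranking, gene_set, max_rank):
    """Area under the recovery curve over the top `max_rank` genes.

    motif_ranking: genes ordered from best to worst score for one motif.
    """
    hits = np.array([g in gene_set for g in motif_ranking[:max_rank]], dtype=float)
    # Cumulative number of gene-set genes recovered at each rank, summed.
    return float(np.cumsum(hits).sum())

def normalized_enrichment(rankings, gene_set, max_rank):
    """NES per motif: (AUC - mean AUC) / SD of AUCs across all motifs."""
    names = list(rankings)
    aucs = np.array([recovery_auc(rankings[n], gene_set, max_rank) for n in names])
    nes = (aucs - aucs.mean()) / aucs.std(ddof=1)  # undefined if all AUCs are equal
    return dict(zip(names, nes))
```

A motif whose ranking places the gene set near the top gets a large positive NES; thresholding on NES then keeps the most enriched motifs.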
This package allows users to perform DE analysis using multiple algorithms. It seeks consensus from multiple methods. Currently it supports "Voom", "EdgeR" and "DESeq". It uses RUV-seq (optional) to remove unwanted sources of variation.
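One simple way to seek consensus across DE methods, as described above, is to keep genes called significant by at least a minimum number of methods. The Python sketch below assumes each method has already produced a list of significant gene IDs; `consensus_genes` and the voting threshold are hypothetical, and the package's actual consensus rule may differ.

```python
from collections import Counter

def consensus_genes(calls, min_methods):
    """Return genes called DE by at least `min_methods` methods.

    calls: mapping of method name -> iterable of significant gene IDs
    (hypothetical input format). Each method votes at most once per gene.
    """
    counts = Counter(g for genes in calls.values() for g in set(genes))
    return sorted(g for g, n in counts.items() if n >= min_methods)
```

Raising `min_methods` trades sensitivity for agreement: requiring all methods to agree gives the most conservative list.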
An R package for fully unsupervised deconvolution of complex tissues. It provides basic functions to perform unsupervised deconvolution on mixture expression profiles by Convex Analysis of Mixtures (CAM) and some auxiliary functions to help understand the subpopulation-specific results. It also implements functions to perform supervised deconvolution based on prior knowledge of molecular markers, S matrix or A matrix. Combining molecular markers from CAM and from prior knowledge can achieve semi-supervised deconvolution of mixtures.
receptLoss identifies genes whose expression is lost in subsets of tumors relative to normal tissue. It is particularly well suited to cases where the number of normal tissue samples is small, because the distribution of gene expression in normal tissue samples is approximated by a Gaussian. It was originally designed to identify loss of nuclear hormone receptor expression but can also be applied transcriptome-wide.
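With a Gaussian approximation of normal-tissue expression, as described above, one simple rule is to flag tumours whose expression falls below a lower bound derived from that Gaussian. This Python sketch assumes a cutoff of mean - z * SD with a hypothetical default of z = 2; `fraction_lost` is an illustrative helper, and receptLoss's own boundary and scoring rules may differ.

```python
import statistics

def fraction_lost(normal, tumor, z=2.0):
    """Fraction of tumour samples below mean - z*SD of normal-tissue expression.

    Hypothetical rule for illustration: the normal distribution is summarised
    by its sample mean and sample standard deviation (Gaussian assumption).
    """
    mu = statistics.fmean(normal)
    sd = statistics.stdev(normal)
    threshold = mu - z * sd
    below = sum(1 for x in tumor if x < threshold)
    return threshold, below / len(tumor)
```

For normal values `[10, 12, 11, 9, 13]` the cutoff is about 7.84, so two of the four tumours in `[2, 3, 10, 11]` would be flagged as having lost expression.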
This package provides users with the ability to query the Human Cell Atlas data repository for single-cell experiment data. The `projects()`, `files()`, `samples()` and `bundles()` functions retrieve summary information on each of these indexes; corresponding `*_details()` are available for individual entries of each index. File-based resources can be downloaded using `files_download()`. Advanced use of the package allows the user to page through large result sets, and to flexibly query the 'list-of-lists' structure representing query responses.
Identifies motifs that are significantly co-enriched from enhancer-promoter interaction data. While enhancer-promoter annotation is commonly used to define groups of interaction anchors, spatzie also supports co-enrichment analysis between preprocessed interaction anchors. Supports BEDPE interaction data derived from genome-wide assays such as HiC, ChIA-PET, and HiChIP. Can also be used to look for differentially enriched motif pairs between two interaction experiments.
RgnTX allows the integration of transcriptome annotations to model complex alternative splicing patterns. It supports testing transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. It includes functions that perform permutation tests to evaluate the association between features and transcriptome regions.
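The permutation testing described above follows a general pattern: count how many features overlap the regions of interest, then compare that count with counts obtained after randomly relocating the features. This Python sketch works on a single 1D coordinate space and assumes uniform random placement as the null model; it is a generic illustration with hypothetical function names, not RgnTX's isoform-aware randomisation or its API.

```python
import random

def count_overlaps(points, regions):
    """Number of point features falling inside any half-open region [start, end)."""
    return sum(any(s <= p < e for s, e in regions) for p in points)

def permutation_test(points, regions, length, n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment of points in regions.

    Null model (assumed): each point is placed uniformly at random on
    [0, length). Uses the (k + 1) / (n + 1) estimator so p is never zero.
    """
    rng = random.Random(seed)
    observed = count_overlaps(points, regions)
    extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(length) for _ in points]
        if count_overlaps(shuffled, regions) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

The choice of null model matters most in practice: restricting random placement to plausible transcript regions, as transcriptome-aware tools do, gives a much fairer comparison than placing features uniformly.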
The toolkit 'µSTASIS', or microSTASIS, has been developed for the stability analysis of microbiota in a temporal framework by leveraging iterative clustering. Concretely, the core function runs the Hartigan-Wong k-means algorithm as many times as possible to stress-test paired samples from the same individuals, checking whether they remain together across multiple numbers of clusters over a whole data set of individuals. The package also includes functions to subset samples from paired times, validate the results and visualize the output.
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Google DeepMind's official collection of agentic science skills accelerating scientific workflows with better grounding and higher token efficiency, integrating insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools (2026)
Structure prediction and design of proteins with noncanonical amino acids, enabling AI-powered modeling of synthetic biology constructs and expanded genetic code systems (133+ stars, 2025)
Multimodal AI system generating virtual populations for tumor microenvironment modeling from H&E and multiplex immunofluorescence pathology images, enabling large-scale spatial analysis of cancer biology and therapeutic response prediction (Microsoft Research & Providence, 370+ stars)
Open-source medical large language model for complex clinical reasoning, extending the o1 long-chain-of-thought paradigm to biomedical question answering and diagnostic inference (FreedomIntelligence, 1.3K+ stars)
This package facilitates reading, preprocessing and manipulating Codelink microarray data. The raw data must be exported as a text file using the Codelink software.
This package provides functions for Interactive Differential Expression AnaLysis of RNA-sequencing datasets, to quickly and effectively extract information downstream of the differential expression step. A Shiny application encapsulates the whole package. Reproducibility of the whole analysis is supported by a template report that is compiled automatically and can be stored or shared.
This package integrates colocalization probabilities from colocalization analysis with transcriptome-wide association study (TWAS) scan summary statistics to implicate genes that may be biologically relevant to a complex trait. The probabilistic framework implemented in this package constrains the TWAS scan z-score-based likelihood using a gene-level colocalization probability. Given gene set annotations, this package can estimate gene set enrichment using posterior probabilities from the TWAS-colocalization integration step.
Phantasus is a web-application for visual and interactive gene expression analysis. Phantasus is based on Morpheus – a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis or differential expression analysis with limma package are supported.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs (460+ stars, 2024)
First bioinformatics-native AI agent skill library enabling local-first, reproducible genomic and population-genetics research workflows built on OpenClaw (871+ stars, MIT License, 2026)
A machine learning-based tool to estimate the overall survival probability in patients with neuroblastoma, supporting clinical decision-making and prognosis.
A machine learning model that predicts overall survival in patients with glioblastoma, using radiomic and clinical features.
Performs volumetric analysis of brain structures by segmenting and calculating the volume of grey matter, white matter, and CSF. Results support studies on neurodegeneration, development, or disease progression.
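Once the segmentation step described above has produced a label map, a structure's volume is the number of voxels with that label times the volume of one voxel. This Python sketch assumes an integer label map with hypothetical label values (for example 1 for grey matter, 2 for white matter, 3 for CSF) and voxel spacing in millimetres; `structure_volumes_ml` is an illustrative helper, not this tool's interface.

```python
import numpy as np

def structure_volumes_ml(label_map, voxel_spacing_mm, labels):
    """Volume in millilitres of each labelled structure.

    label_map: integer array from a segmentation (label values hypothetical).
    voxel_spacing_mm: spacing along each axis in mm.
    labels: mapping of structure name -> label value.
    """
    voxel_mm3 = float(np.prod(voxel_spacing_mm))  # volume of one voxel in mm^3
    # 1 mL = 1000 mm^3
    return {name: np.count_nonzero(label_map == value) * voxel_mm3 / 1000.0
            for name, value in labels.items()}
```

Reporting volumes in physical units rather than voxel counts keeps results comparable across scans acquired at different resolutions.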
Extracts deep features from MR images using pretrained neural networks. These features can be used for classification, clustering, or survival prediction tasks in medical imaging.
Computes R1 and T1 maps from MR images, showing the rate and time of longitudinal relaxation. These are key quantitative biomarkers for tissue characterization.
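The description above does not state how T1 is estimated. One common approach, assumed here purely for illustration, is the variable flip angle (VFA) method for spoiled gradient-echo data: the signal S = M0 sin(a)(1 - E1)/(1 - E1 cos(a)), with E1 = exp(-TR/T1), becomes linear as S/sin(a) = E1 * S/tan(a) + M0(1 - E1), so the fitted slope gives E1, T1 = -TR/ln(E1), and R1 = 1/T1. The Python sketch below fits one voxel on noise-free synthetic data; the function names are hypothetical.

```python
import numpy as np

def spgr_signal(m0, t1, tr, flip_deg):
    """Spoiled gradient-echo steady-state signal (TR and T1 in the same units)."""
    a = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(a) * (1 - e1) / (1 - e1 * np.cos(a))

def vfa_t1(signals, flip_deg, tr):
    """Estimate T1 and R1 for one voxel by the linearised VFA fit (assumed method)."""
    a = np.deg2rad(np.asarray(flip_deg, dtype=float))
    s = np.asarray(signals, dtype=float)
    y = s / np.sin(a)
    x = s / np.tan(a)
    slope, _intercept = np.polyfit(x, y, 1)  # slope estimates E1 = exp(-TR/T1)
    t1 = -tr / np.log(slope)
    return t1, 1.0 / t1
```

For example, signals simulated with T1 = 1200 ms, TR = 15 ms and flip angles of 2, 5, 10 and 15 degrees are fitted back to T1 = 1200 ms and R1 = 1/1200 ms^-1; real data additionally need B1 correction and noise handling.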
Extracts diffusion-related maps (e.g., ADC, IVIM, Kurtosis) from DWI sequences to evaluate microstructural properties of tissues, commonly used in oncology and neurology.
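Of the maps listed above, ADC is the simplest: under a monoexponential model S(b) = S0 * exp(-b * ADC), the log signal is linear in b, so a least-squares line through ln S versus b has slope -ADC. This Python sketch fits every voxel at once and assumes the DWI data are stacked as one volume per b-value; `adc_map` is a hypothetical helper, and IVIM and kurtosis models need nonlinear fits not shown here.

```python
import numpy as np

def adc_map(dwi, bvalues):
    """Monoexponential ADC per voxel from ln S(b) = ln S0 - b * ADC.

    dwi: array of shape (n_b, ...) with one volume per b-value (assumed layout).
    bvalues: b-values in s/mm^2, so ADC comes out in mm^2/s.
    """
    dwi = np.asarray(dwi, dtype=float)
    b = np.asarray(bvalues, dtype=float)
    # Clip to avoid log(0) in background voxels, then flatten voxels into columns.
    logs = np.log(np.clip(dwi, 1e-6, None)).reshape(len(b), -1)
    slope, _intercept = np.polyfit(b, logs, 1)  # one line fitted per column
    return (-slope).reshape(dwi.shape[1:])
```

`np.polyfit` accepts a 2D `y` and fits each column independently, which avoids an explicit loop over voxels.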
Tool for calculating R2* maps from T2*-weighted images. These maps reflect tissue relaxation rates and can be used to assess tissue properties and detect abnormalities.
Implemented by GIBI230, this Docker-based tool extracts radiomic features from 3D medical images in NIfTI format using the PyRadiomics library (DICOM images must first be converted with the DICOM to NIfTI converter). It streamlines radiomics calculation by generating a structured CSV file containing all variables extracted from the images. Users can configure parameters such as filters, bin width, resampling spacing and normalization settings. The resulting radiomic variables provide quantitative information for further analysis in medical imaging research and machine learning applications. The choice of bin width is especially important: a bin width of 5 is commonly recommended for robust and reproducible results, but it should be adjusted to the image resolution, modality and noise level.
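The bin width matters because texture features are computed on discretised grey levels. Under fixed-bin-width discretisation, each intensity x in the region of interest becomes floor((x - x_min) / w) + 1, so a larger w yields fewer grey levels. This Python sketch anchors bins at the ROI minimum as an assumption; PyRadiomics anchors its bin edges somewhat differently, so treat the helper (hypothetical) as an illustration of the parameter's effect rather than a reimplementation.

```python
import numpy as np

def discretize_fixed_bin_width(intensities, bin_width):
    """Map ROI intensities to 1-based grey-level bins of fixed width.

    Bins start at the minimum intensity in the ROI (an assumed convention).
    """
    x = np.asarray(intensities, dtype=float)
    return (np.floor((x - x.min()) / bin_width) + 1).astype(int)
```

With a bin width of 5, intensities `[0, 4, 5, 12]` map to bins `[1, 1, 2, 3]`; with a bin width of 25 they all collapse into bin 1, which would flatten most texture information.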
This tool extracts perfusion maps from dynamic imaging data (e.g., DCE-MRI) using pharmacokinetic models or semi-quantitative methods. It supports the evaluation of blood flow and tissue vascularity.
The tool is designed to perform radiomics harmonization on large and heterogeneous datasets where there is a risk of over-harmonization. Instead of directly applying harmonization based on predefined batch labels, the tool first identifies groups of batches that share similar characteristics by clustering the radiomics data, and then performs harmonization using these cluster-derived labels. Two harmonization methods are available: (1) the original ComBat method (Rabinovic, 2007), in which each original batch group is treated as a batch, and (2) a cluster-based ComBat method, in which batch groups with similar radiomics characteristics are grouped into clusters and these clusters are treated as the batches for harmonization.
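Both options described above share the same core operation; they differ only in which labels are passed as "batches" (original batch groups or cluster IDs). The Python sketch below shows a simplified location/scale adjustment that aligns each batch's per-feature mean and SD with the overall values. It is ComBat without the empirical-Bayes shrinkage step and without biological covariates, so it is a deliberately simplified stand-in; `location_scale_harmonize` is a hypothetical name.

```python
import numpy as np

def location_scale_harmonize(features, batch_labels):
    """Align each batch's per-feature mean and SD with the overall mean and SD.

    Simplified stand-in for ComBat (no empirical-Bayes shrinkage, no covariates).
    features: (n_samples, n_features) array; batch_labels: one label per sample,
    e.g. original batch IDs or cluster-derived IDs. Each batch needs >= 2 samples.
    """
    X = np.asarray(features, dtype=float)
    labels = np.asarray(batch_labels)
    grand_mean = X.mean(axis=0)
    overall_sd = X.std(axis=0, ddof=1)
    out = np.empty_like(X)
    for b in np.unique(labels):
        idx = labels == b
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0, ddof=1)
        # Standardise within the batch, then rescale to the overall distribution.
        out[idx] = (X[idx] - mu) / sd * overall_sd + grand_mean
    return out
```

Passing cluster IDs instead of scanner or site IDs means batches with similar feature distributions are adjusted together, which is the intuition behind limiting over-harmonization.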
This preprocessing tool is designed for 2D digital mammograms in DICOM format. It standardizes and harmonizes images through a configurable pipeline that includes spatial reorientation, pseudo-3D stacking, isotropic resampling, intensity normalization, optional denoising, contrast enhancement, and mask processing (if available).
The tool uses deep learning to automatically segment possible neuroblastoma tumours on contrast-enhanced CT images (CE-CTs). The model architecture is U-Net-based, with residual operations, atrous (dilated) convolutions and a specific batch generator. Preprocessing steps include RAS conversion, resizing, z-score normalization and patching, followed by postprocessing operations. It takes DICOM images as input and generates tumour masks in DICOM SEG or NIfTI format.
The tool automatically segments possible glioblastoma tumours and their subregions on MRI images: necrosis (intratumoral necrotic core), edema (peritumoral vasogenic edema), enhancing (contrast-enhancing tumor region), total (total tumor, including edema and necrosis, segmented by a single model) and total-fused (total tumor obtained by fusing necrosis, edema and enhancing). Preprocessing steps include skull stripping, intra-patient registration, z-score normalization and patching, among others. It takes DICOM images as input and generates tumour masks in DICOM SEG or NIfTI format.
The tool automatically segments possible DIPG tumours on MR images. DIPG (Diffuse Intrinsic Pontine Glioma), more recently termed DMG (Diffuse Midline Glioma), is an H3 K27M-mutant pediatric brainstem cancer detected on T1-weighted and FLAIR/T2-weighted magnetic resonance images. The tool includes a complete workflow from DICOM images to DICOM SEG tumour masks.
This tool is specifically designed and validated for automated detection and segmentation of neuroblastic tumours in T2-weighted magnetic resonance images (T2-MR) using deep learning. It processes DICOM or NIfTI input data and produces output in NIfTI or DICOM SEG format.
TRAINING & VALIDATION COHORTS:
Initial Development (Veiga-Canuto 2022):
- Training: 106 patients, 5-fold CV (median DSC 0.965 ± 0.018).
- Internal validation: 26 patients (median DSC 0.918 ± 0.067).
- Sources: La Fe (Spain), SIOPEN HR-NBL1/LINES, St. Anna (Austria), Pisa (Italy).
- Mean age: 37.6 ± 39.3 months.
- Median tumor volume: 116,518 mm³.
External Validation (Veiga-Canuto 2023):
- 300 patients, 535 independent T2 MRI scans (486 at diagnosis, 49 post-chemotherapy).
- Performance: median DSC 0.997 (0.944–1.000), 94% successful detection.
- Sources: 12 European countries (HR-NBL1/SIOPEN 119, LINES/SIOPEN 107, German Registry 62, others 12).
- Heterogeneous data: 1.5T (435), 3T (100); Siemens (318), Philips (109), GE (105), Canon (3).
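The DSC values reported above measure overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). This Python sketch computes it for binary masks of any shape; `dice` is a hypothetical helper, and returning 1.0 when both masks are empty is a convention that not every evaluation uses.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks of the same shape."""
    p = np.asarray(pred, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(p, t).sum() / denom
```

For example, a prediction of `[1, 1, 0, 0]` against a reference of `[1, 0, 0, 0]` gives 2 * 1 / (2 + 1), about 0.667.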
The tool performs customisable image pre-processing to reduce noise and intensity inhomogeneity (bias field) effects, thereby improving image quality and the reproducibility of radiomics features. It consists of two independent steps: denoising with one of the 5 integrated filters (Bilateral Filter, Anisotropic Diffusion Filter (ADF), Curvature Flow Filter (CFF), SUSAN and Non-Local Means (NLM)), and bias field correction with the ANTs N4 filter. The parameter configuration of this tool has been optimised for T1W, T2W, DWI and DCE sequences in neuroblastoma (NB) and paediatric brain tumours, but some parameters can also be customised through a JSON configuration file.
An AI-based tool that categorises MRI series using standardized DICOM tags. The categorisation covers the sequence type (e.g. spin echo, gradient echo), the weighting (e.g. T1W, T2W, DCE, ...), the presence of fat suppression, and the detection of non-relevant/junk series (e.g. localizers, calibrations, screenshots...).
Tool for visually validating the chronological order and logical consistency of dates associated with a patient's medical history. It generates a timeline visualization for each patient from an Excel file and highlights rule violations. Status: containerized.
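The rule checking described above amounts to comparing pairs of dated events that must occur in a given order. This Python sketch assumes the dates have already been read from the spreadsheet into a dictionary; the event names and rules are hypothetical examples, and rules involving a missing event are skipped rather than reported.

```python
from datetime import date

def chronology_violations(events, rules):
    """Return (earlier, later) rule pairs that a patient's dates violate.

    events: mapping of event name -> date (hypothetical names).
    rules: (earlier, later) pairs meaning `earlier` must not come after `later`.
    """
    violations = []
    for earlier, later in rules:
        if earlier in events and later in events and events[earlier] > events[later]:
            violations.append((earlier, later))
    return violations
```

Each returned pair identifies one inconsistency, which a timeline view can then highlight for manual review.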