Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

43 of 5,940 resources

Derives cells per well and suspension pipette volumes for standard 6-, 12-, 24-, 48-, 96-, and 384-well plates from a hemocytometer stock count, trypan blue viability, and target seeding confluency, with QC flags for low viability and impractical transfers. A browser calculator supports interactive planning with cell-line presets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured plate tables and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
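The seeding arithmetic described above can be sketched locally. This is a minimal illustration, not the Pepkio client or API (the listing does not document their calls). It assumes a standard hemocytometer whose large square holds 0.1 µL and cells mixed 1:1 with trypan blue (dilution factor 2). The per-well growth area and confluent density are inputs you supply, because they vary by plate manufacturer and cell line. The 80% viability and 2 µL transfer cut-offs are hypothetical QC thresholds.

```python
def viable_concentration(live_counts, dead_counts, dilution_factor=2.0):
    """Return (viable cells/mL, % viability) from hemocytometer large-square counts.

    Each large square holds 0.1 uL, so mean count x dilution x 1e4 gives cells/mL.
    dilution_factor=2.0 assumes cells mixed 1:1 with trypan blue.
    """
    if not live_counts or len(live_counts) != len(dead_counts):
        raise ValueError("need matching, non-empty live and dead counts")
    live, dead = sum(live_counts), sum(dead_counts)
    per_ml = live / len(live_counts) * dilution_factor * 1e4
    viability = 100.0 * live / (live + dead)
    return per_ml, viability


def seeding_plan(viable_per_ml, viability, area_cm2, confluent_per_cm2,
                 target_confluency, min_viability=80.0, min_transfer_ul=2.0):
    """Cells and stock volume per well, with QC flags (thresholds are hypothetical)."""
    cells_per_well = area_cm2 * confluent_per_cm2 * target_confluency
    volume_ul = cells_per_well / viable_per_ml * 1000.0  # mL -> uL
    flags = []
    if viability < min_viability:
        flags.append("low viability")
    if volume_ul < min_transfer_ul:
        flags.append("transfer volume too small; dilute the stock first")
    return cells_per_well, volume_ul, flags
```

Consider four squares averaging 50 live cells. The stock is then 1 × 10^6 viable cells/mL. A well with an assumed 9.5 cm² area and 1 × 10^5 cells/cm² at confluency, seeded to 50%, needs 475,000 cells, or 475 µL of stock.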

Computes weighed laboratory buffer recipes from target pH, concentration, and volume, accounting for separate preparation and working temperatures when pKa shifts with temperature. Supports calculator mode from dry reagents and stock dilution mode, returning acid and base masses, ionic strength estimates, optional NaCl adjustment, gravimetric and titration routes, and stepwise protocols. A browser calculator supports interactive recipe entry with shareable links; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured recipe tables, compatibility warnings, and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
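The core of a buffer recipe is the Henderson-Hasselbalch split between acid and conjugate base. The sketch below assumes a linear pKa shift with temperature, with the coefficient supplied by the user; values near -0.028 per °C are often quoted for Tris. To hit a target pH at the working temperature, pass the pKa at that temperature. The sketch ignores activity coefficients, so it does not model the ionic strength or NaCl corrections the tool reports. It is not the Pepkio implementation.

```python
def pka_at(pka_25c, dpka_per_c, temp_c):
    """Linearly shift a 25 C pKa to another temperature (dpKa/dT is an input, not looked up)."""
    return pka_25c + dpka_per_c * (temp_c - 25.0)


def buffer_masses(ph, pka, molarity, volume_l, acid_mw, base_mw):
    """Henderson-Hasselbalch split of a buffer into acid (HA) and conjugate base (A-) masses.

    Uses concentrations, not activities, so ionic-strength effects are not modelled.
    Returns grams of acid form and grams of base form.
    """
    ratio = 10 ** (ph - pka)            # [A-]/[HA]
    total_mol = molarity * volume_l
    acid_mol = total_mol / (1.0 + ratio)
    base_mol = total_mol - acid_mol
    return acid_mol * acid_mw, base_mol * base_mw
```

When pH equals pKa the ratio is 1, so the buffer's moles split evenly between the two forms. Each additional pH unit above pKa multiplies the base-to-acid ratio by 10.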

Plans geometric serial dilution series for molecular biology and biochemistry workflows, rounding transfer volumes to declared pipette ranges and optional 96- or 384-well plate layouts. A browser calculator supports interactive protocol design; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured step tables and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
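A geometric series with pipette-aware rounding can be sketched as follows. This is an illustration under stated assumptions, not the Pepkio client. Each tube is assumed to receive the transfer from the previous tube plus diluent, up to a fixed tube volume. The transfer is rounded to a hypothetical pipette resolution (`pipette_step_ul`), and the realised concentration is then recomputed from the rounded volume. Plate layout is omitted.

```python
def serial_dilution(start_conc, factor, steps, tube_volume_ul, pipette_step_ul=0.5):
    """Plan a geometric dilution series with transfer volumes rounded to pipette resolution.

    Each tube receives `transfer` from the previous tube plus `diluent`, totalling
    tube_volume_ul; concentrations are recomputed from the rounded transfer volume.
    """
    transfer = round(tube_volume_ul / factor / pipette_step_ul) * pipette_step_ul
    if not 0 < transfer < tube_volume_ul:
        raise ValueError("factor and pipette step give an impractical transfer volume")
    diluent = tube_volume_ul - transfer
    plan, conc = [], start_conc
    for step in range(1, steps + 1):
        conc = conc * transfer / tube_volume_ul
        plan.append({"step": step, "transfer_ul": transfer,
                     "diluent_ul": diluent, "conc": conc})
    return plan
```

Rounding matters for awkward factors. With a 3-fold series in 100 µL and 1 µL resolution, the transfer becomes 33 µL, so each step actually dilutes 100/33 ≈ 3.03-fold.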

Modular toolchain for an extensible, customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). It is designed to minimize manual data preparation and to ease customization and integration with other components, such as data quality assurance tools. The toolchain is containerized and currently supports input datasets in CSV, JSON, and XLSX formats.

Active · 0 · 2 weeks ago
Python

JCVI is a versatile toolkit for comparative genomics analysis. It is a collection of Python libraries for parsing bioinformatics files and performing computations related to assembly, annotation, and comparative genomics.

Active · 916 · 1 month ago
Python
BSD-2-Clause

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

Active · 266 · 2 months ago
Python
BSD-3-Clause

A Python script that converts positional information from a SAM dataset into interval format with a 0-based start and a 1-based end. The CIGAR string of each alignment is used to compute the end coordinate.

Active · 37 · 3 months ago
Python
MIT
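The coordinate conversion can be sketched as follows. This is not the script's own code. SAM `POS` is 1-based, and per the SAM specification only the CIGAR operations M, D, N, = and X consume reference bases. The reference span is the sum of those lengths. The 0-based start is `POS - 1`, and that start plus the span equals the 1-based inclusive end.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # operations that advance along the reference


def sam_to_interval(pos, cigar):
    """Convert a 1-based SAM POS and CIGAR into a 0-based start and 1-based (inclusive) end."""
    if cigar == "*":
        raise ValueError("no CIGAR; alignment span is unknown")
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar}")
    ref_len = sum(int(n) for n, op in ops if op in REF_CONSUMING)
    start = pos - 1
    return start, start + ref_len
```

For example, `sam_to_interval(100, "5S10M2D5M3I4M")` spans 10 + 2 + 5 + 4 = 21 reference bases, giving `(99, 120)`. The soft clip and insertion do not move along the reference.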

SCENIC+ is a Python package for building gene regulatory networks (GRNs) from combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Active · 257 · 4 months ago
Python
Other

Utility that performs integrated analyses of 'gene' data (a set of genes or other genomic features) with 'peak' data (a set of regions, for example ChIP peaks) to identify the genes nearest to each peak, and vice versa.

Idle · 5 · 1 year ago
Python
Artistic-2.0
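A minimal brute-force sketch of the "nearest gene to each peak, and vice versa" idea follows. It assumes half-open intervals and measures distance between interval edges, with overlapping features at distance 0. The utility's own distance definition (for example, from a gene's TSS) is not stated here, so treat that as an assumption. Swapping the arguments gives the nearest peak for each gene.

```python
def gap(a_start, a_end, b_start, b_end):
    """Distance between two half-open intervals on the same chromosome; 0 if they overlap."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def nearest_features(queries, targets):
    """For each (id, chrom, start, end) query, return (distance, target_id) of the nearest target.

    Brute force over targets on the same chromosome; ties go to the smaller target id.
    Queries on chromosomes with no targets map to None.
    """
    result = {}
    for qid, chrom, qs, qe in queries:
        candidates = [(gap(qs, qe, ts, te), tid)
                      for tid, tchrom, ts, te in targets if tchrom == chrom]
        result[qid] = min(candidates) if candidates else None
    return result
```

This is O(peaks × genes). Real tools usually sort intervals per chromosome and binary-search instead.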

AlphaPickle is a Python tool that converts AlphaFold and ColabFold output files into user-friendly CSV files and plots, enabling easy analysis and visualization of protein prediction data without requiring programming expertise. It processes .pkl, .json, and PDB files to extract and visualize metrics like pLDDT and PAE.

Stale · 33 · 2 years ago
Python
GPL-3.0
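AlphaFold writes per-residue pLDDT into the B-factor (temperature factor) field of its PDB output. A minimal sketch of extracting it from fixed PDB columns follows. It is not AlphaPickle's code, and it handles only PDB text, not the `.pkl` or `.json` files. It reads one value per residue from the CA atom and counts residues in the confidence bands commonly used by the AlphaFold Database (>90, 70-90, 50-70, <50).

```python
def plddt_per_residue(pdb_lines):
    """Return [(chain, residue_number, pLDDT)] from CA atoms of an AlphaFold-style PDB."""
    scores = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                 # column 22: chain identifier
            resseq = int(line[22:26])        # columns 23-26: residue number
            bfactor = float(line[60:66])     # columns 61-66: B-factor, holding pLDDT
            scores.append((chain, resseq, bfactor))
    return scores


def confidence_bands(scores):
    """Count residues per pLDDT band."""
    bands = {"very high (>90)": 0, "confident (70-90)": 0, "low (50-70)": 0, "very low (<50)": 0}
    for _, _, p in scores:
        if p > 90:
            bands["very high (>90)"] += 1
        elif p > 70:
            bands["confident (70-90)"] += 1
        elif p > 50:
            bands["low (50-70)"] += 1
        else:
            bands["very low (<50)"] += 1
    return bands
```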

NuclearPhaser is a method for phasing dikaryotic genomes into their two haplotypes using Hi-C contact graphs. It includes an overview of the phasing pipeline for dikaryons.

Stale · 13 · 3 years ago
Python
GPL-3.0

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · 14 · 3 years ago
Python
MIT

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · 92 · 6 years ago
Python
MIT

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

This module provides a command-line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests, and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C, and 3 attributes specified in the requirements file. It generates a detailed report highlighting issues such as missing, invalid, or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.
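The attribute-type rules can be illustrated with a simplified checker. The rules are: Type 1 must be present and non-empty, Type 2 must be present but may be empty, and Type 3 is optional. 1C and 2C behave like 1 and 2 only when their condition holds. This sketch is not the module's code. It uses a plain `{keyword: value}` dict instead of a DICOM library, and the `(keyword, type, condition)` requirement tuples are a hypothetical stand-in for the Excel requirements file. It also ignores the cases where a conditional attribute must be absent.

```python
def check_attributes(dataset, requirements):
    """Simplified DICOM attribute-type checks on a {keyword: value} mapping.

    requirements: list of (keyword, type, condition), where type is "1", "1C", "2", "2C" or "3"
    and condition is a callable(dataset) -> bool used only for the conditional types.
    """
    issues = []
    for keyword, attr_type, condition in requirements:
        if attr_type.endswith("C") and not condition(dataset):
            continue  # condition not met: treated as not required in this sketch
        present = keyword in dataset
        empty = present and dataset[keyword] in (None, "", [])
        base = attr_type.rstrip("C")
        if base in ("1", "2") and not present:
            issues.append((keyword, "missing"))
        elif base == "1" and empty:
            issues.append((keyword, "empty"))
        # Type 3 attributes are optional and never reported
    return issues
```

In an illustrative SEG rule, `MaximumFractionalValue` is required only when `SegmentationType` is `FRACTIONAL`. This can be expressed as `("MaximumFractionalValue", "1C", lambda d: d.get("SegmentationType") == "FRACTIONAL")`.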

A tool that checks the quality of clinical metadata (validity, completeness), the integrity and accuracy of the link between images and the clinical metadata provided, the de-identification protocol applied, and the existence of annotations and their consistency with the images, and informs the user of corrective actions before data upload.

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
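The tiered idea, exact hash match first and similarity metrics second, can be sketched on two flattened 8-bit images. This is not the tool's pipeline. The DICOM metadata tier and SSIM are omitted. "MAD" is interpreted here as mean absolute difference, which is an assumption. The thresholds `min_cosine` and `max_mad` are hypothetical. In keeping with the tool's design, the function only labels pairs for review and never deletes anything.

```python
import hashlib
import math


def pixel_digest(pixels):
    """SHA-256 of raw 8-bit pixel values (0-255): identical digests mean identical pixel data."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def compare_images(a, b, min_cosine=0.99, max_mad=2.0):
    """Tiered comparison of two images flattened to lists of 8-bit values."""
    if len(a) != len(b):
        return "different size"
    if pixel_digest(a) == pixel_digest(b):
        return "exact duplicate"
    if cosine_similarity(a, b) >= min_cosine and mean_abs_diff(a, b) <= max_mad:
        return "near duplicate (flag for review)"
    return "distinct"
```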

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
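A minimal sketch of combining per-sample count files into one matrix follows. It assumes headerless, tab-separated `feature<TAB>count` files, which is not necessarily the Galaxy tool's exact input contract. As the description requires, the feature column must be identical across inputs, so the sketch raises an error otherwise.

```python
import csv
import io


def read_counts(text):
    """Parse tab-separated 'feature<TAB>count' lines into a list of (feature, count)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if fields:
            rows.append((fields[0], int(fields[1])))
    return rows


def count_matrix(named_texts):
    """Combine several count files (sample name -> file text) into one matrix, one column per input."""
    names = list(named_texts)
    tables = [read_counts(named_texts[n]) for n in names]
    features = [f for f, _ in tables[0]]
    for name, table in zip(names, tables):
        if [f for f, _ in table] != features:
            raise ValueError(f"feature column in {name} differs from {names[0]}")
    matrix = [["feature"] + names]
    for i, feature in enumerate(features):
        matrix.append([feature] + [table[i][1] for table in tables])
    return matrix
```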

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

Implemented by GIBI230, this Docker-based tool extracts radiomic features from 3D medical images in NIfTI format using the PyRadiomics library (DICOM images must first be converted with the DICOM to NIfTI converter). It streamlines radiomics calculation by generating a structured CSV file containing all variables extracted from the images. Users can configure parameters such as filters, bin width, resampling spacing, and normalization settings. The output radiomic variables provide quantitative information for further analysis in medical imaging research and machine learning applications. The choice of bin width is especially important: for robust and reproducible results, a bin width of 5 is commonly recommended, but it should be adjusted based on image resolution, modality, and noise levels.
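To show why bin width matters, here is fixed-bin-width discretisation as we understand it from the PyRadiomics documentation: bin = floor(x / W) - floor(min(x) / W) + 1. The plain-Python sketch below reproduces only that formula. It is not the tool's code and does not call PyRadiomics.

```python
import math


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to grey-level bins of equal width, starting at bin 1."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]
```

For intensities `[0, 4, 5, 12, 25]`, a width of 5 gives bins `[1, 1, 2, 3, 6]`, while a width of 25 collapses them to `[1, 1, 1, 1, 2]`. Texture features computed on so few grey levels lose most of their detail.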

The tool performs radiomics harmonization on large, heterogeneous datasets where there is a risk of over-harmonization. Instead of applying harmonization directly to predefined batch labels, it first identifies groups of batches with similar characteristics by clustering the radiomics data, and then harmonizes using these cluster-derived labels. Two methods are available: (1) the original ComBat method (Rabinovic, 2007), in which each original batch group is used in the harmonization, and (2) a cluster-based ComBat method, in which batch groups with similar radiomics characteristics are grouped into clusters and the clusters are used in the harmonization.
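The effect of harmonizing on batch labels versus cluster labels can be illustrated with a simplified location-scale adjustment of a single feature. This is not ComBat. It has no empirical Bayes shrinkage and no covariates. Each group is standardised with its own mean and SD and then rescaled to the pooled within-group SD and the overall mean. The cluster assignment is assumed to be computed elsewhere.

```python
import math
from statistics import mean


def location_scale_harmonize(values, labels):
    """Align per-group mean and spread of one radiomic feature (simplified, no empirical Bayes).

    labels can be the original batch labels or cluster-derived labels.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    stats = {}
    for lab, vs in groups.items():
        m = mean(vs)
        stats[lab] = (m, math.sqrt(sum((v - m) ** 2 for v in vs) / len(vs)))
    # pooled within-group SD, so between-group shifts do not inflate the target spread
    pooled_sd = math.sqrt(sum((v - stats[lab][0]) ** 2
                              for v, lab in zip(values, labels)) / len(values))
    grand_mean = mean(values)
    out = []
    for v, lab in zip(values, labels):
        m, s = stats[lab]
        z = (v - m) / s if s > 0 else 0.0
        out.append(grand_mean + z * pooled_sd)
    return out
```

For the cluster-based variant, map each batch to its cluster first, for example `labels = [cluster_of[b] for b in batches]`. Batches that already look alike are then adjusted together rather than forced apart or together individually.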

This preprocessing tool is designed for 2D digital mammograms in DICOM format. It standardizes and harmonizes images through a configurable pipeline that includes spatial reorientation, pseudo-3D stacking, isotropic resampling, intensity normalization, optional denoising, contrast enhancement, and mask processing (if available).

The tool uses deep learning to automatically segment possible neuroblastoma tumours on contrast-enhanced CT images (CE-CTs). The model architecture is U-Net-based, with residual operations, atrous (dilated) convolutions, and a specific batch generator. It applies preprocessing steps such as RAS conversion, resizing, z-score normalization, and patching, as well as postprocessing operations. It takes DICOM images as input and generates tumour masks in DICOM SEG or NIfTI format.
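Two of the preprocessing steps named above, z-score normalisation and patching, can be sketched on a 2D image stored as a list of rows. This is an illustration only. The real pipelines work on 3D volumes and may normalise within a body mask. The patch size and stride are hypothetical parameters, and patches that would overrun the image edge are simply dropped.

```python
from statistics import mean, pstdev


def zscore(image):
    """Z-score normalise a 2D image (list of rows) over all of its pixels."""
    flat = [v for row in image for v in row]
    m, s = mean(flat), pstdev(flat)
    if s == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(v - m) / s for v in row] for row in image]


def extract_patches(image, size, stride):
    """Cut size x size patches with the given stride; edge remainders are dropped."""
    h, w = len(image), len(image[0])
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]
```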

The tool automatically segments possible glioblastoma tumours and their subregions on MRI images: necrosis (intratumoral necrotic core), edema (peritumoral vasogenic edema), enhancing (contrast-enhancing tumor region), total (total tumor including edema and necrosis, segmented by a single model), and total-fused (total tumor obtained by fusing the necrosis, edema, and enhancing regions). It applies preprocessing steps such as skull stripping, intra-patient registration, z-score normalization, and patching, among others. It takes DICOM images as input and generates tumour masks in DICOM SEG or NIfTI format.

The tool automatically segments possible DIPG tumours on MR images. DIPG (diffuse intrinsic pontine glioma), more recently termed DMG (diffuse midline glioma), is an H3 K27M-mutant paediatric brainstem cancer detected on T1-weighted and FLAIR/T2-weighted magnetic resonance images. The tool includes a complete workflow from DICOM images to DICOM SEG tumour masks.

This tool is specifically designed and validated for automated detection and segmentation of neuroblastic tumours in T2-weighted magnetic resonance images (T2-MR) using deep learning. It accepts DICOM or NIfTI input and outputs masks in NIfTI or DICOM SEG format.
TRAINING & VALIDATION COHORTS:
Initial development (Veiga-Canuto 2022):
- Training: 106 patients, 5-fold CV (median DSC 0.965 ± 0.018).
- Internal validation: 26 patients (median DSC 0.918 ± 0.067).
- Sources: La Fe (Spain), SIOPEN HR-NBL1/LINES, St. Anna (Austria), Pisa (Italy).
- Mean age: 37.6 ± 39.3 months.
- Median tumor volume: 116,518 mm³.
External validation (Veiga-Canuto 2023):
- 300 patients, 535 independent T2 MRI scans (486 at diagnosis, 49 post-chemotherapy).
- Performance: median DSC 0.997 (0.944–1.000), 94% successful detection.
- Sources: 12 European countries (HR-NBL1/SIOPEN 119, LINES/SIOPEN 107, German Registry 62, others 12).
- Heterogeneous data: 1.5T (435), 3T (100); Siemens (318), Philips (109), GE (105), Canon (3).
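The DSC (Dice similarity coefficient) figures measure overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). The sketch below computes DSC on flattened binary masks. Treating two empty masks as a perfect score of 1.0 is a convention chosen here, not necessarily the one used in the cited studies.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total
```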

The tool performs customisable image pre-processing to reduce noise and bias-field (intensity inhomogeneity) effects, improving image quality and the reproducibility of radiomics features. It consists of two independent steps: denoising with one of five integrated filters (Bilateral Filter, Anisotropic Diffusion Filter (ADF), Curvature Flow Filter (CFF), SUSAN, and Non-Local Means (NLM)), and bias-field correction with the ANTs N4 filter. The parameter configuration has been optimised for T1W, T2W, DWI, and DCE sequences in neuroblastoma (NB) and paediatric brain tumours, but parameters can also be set through a JSON configuration file.

An AI-based tool that categorises MRI series using standardized DICOM tags. The categorisation covers the sequence type (e.g., spin echo, gradient echo), the weighting (e.g., T1W, T2W, DCE), the presence of fat suppression, and the detection of non-relevant or junk series (e.g., localizers, calibrations, screenshots).

A tool for visually validating the chronological order and logical consistency of dates in a patient's medical history. It generates a timeline visualization for each patient from an Excel file and highlights rule violations. Status: Containerized
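One kind of rule such a tool checks, events occurring in an expected order, can be sketched without the Excel input or visualisation. The event names and the single ordering rule are hypothetical. Missing dates are skipped, and each adjacent inversion among the dated events is reported.

```python
from datetime import date


def chronology_violations(record, expected_order):
    """List adjacent pairs of dated events that break the expected order.

    record: {event_name: datetime.date or None}; missing events are skipped.
    expected_order: event names from earliest to latest.
    """
    dated = [(name, record[name]) for name in expected_order if record.get(name) is not None]
    violations = []
    for (earlier, d1), (later, d2) in zip(dated, dated[1:]):
        if d2 < d1:
            violations.append(f"{later} ({d2}) precedes {earlier} ({d1})")
    return violations


# Hypothetical patient record and ordering rule
patient = {"birth_date": date(2010, 1, 1), "diagnosis_date": date(2015, 5, 1),
           "surgery_date": date(2015, 4, 1), "death_date": None}
order = ["birth_date", "diagnosis_date", "surgery_date", "death_date"]
print(chronology_violations(patient, order))
# ['surgery_date (2015-04-01) precedes diagnosis_date (2015-05-01)']
```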

The tool performs a DICOM quality check. It verifies the number of files per sequence, detects corrupted files, and checks the directory hierarchy. It also merges dynamic series that were stored separately, filters or selects series of interest using lists of series descriptions, and identifies diffusion sequences by their b-values. It applies the requested changes to the dataset and generates a report on the selected sequences and on corrupted, missing, and merged files. Status: Deployed

Membrane Protein-Lipid Interaction Database. A large-scale experimentally validated dataset of 80,685 residue-level lipid contact annotations across 4,712 membrane proteins derived from PDB crystal and cryo-EM structures. Provides pre-computed binary contact labels, continuous distance values, sequence-identity-based cluster assignments, and ready-made train-validation-test splits for machine learning.
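Sequence-identity clusters are typically used for splitting so that near-identical proteins never appear in both training and test data. The sketch below shows that idea. It is not how this database's ready-made splits were produced, and the fractions and seed are arbitrary. Whole clusters are shuffled and assigned to a split, so the fractions apply to cluster counts, not protein counts.

```python
import random


def split_by_cluster(cluster_of, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole sequence-identity clusters to train/validation/test.

    cluster_of: {protein_id: cluster_id}. Every member of a cluster lands in the same split.
    """
    clusters = sorted(set(cluster_of.values()))
    random.Random(seed).shuffle(clusters)
    n_train = round(fractions[0] * len(clusters))
    n_val = round(fractions[1] * len(clusters))
    split_of_cluster = {}
    for i, cluster in enumerate(clusters):
        if i < n_train:
            split_of_cluster[cluster] = "train"
        elif i < n_train + n_val:
            split_of_cluster[cluster] = "validation"
        else:
            split_of_cluster[cluster] = "test"
    return {pid: split_of_cluster[c] for pid, c in cluster_of.items()}
```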

RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data.

Screens a bacterial assembly (contigs/CDS or proteins) for nucleotide or protein sequences. The pipeline checks for the presence of genes of interest (GOIs) in bacterial assemblies and generates multiple CSVs and plots describing which genes are present and how variable their sequences are. Query GOIs can be DNA or protein sequences, and the database (db) to search can be DNA contigs/FASTA or protein FASTA files.

Tandem repeat genotyping with long reads, based on a modified version of HipSTR.

Processes 96-well plate absorbance data through blank subtraction, regression fitting, and dilution correction to report sample concentrations with QC flags for BCA, Bradford, and ELISA workflows. A browser calculator supports interactive grid entry with CSV and PDF export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits plate layout and absorbance values and returns model comparison, per-sample concentrations, and shareable run identifiers.
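The three processing steps named above can be sketched as follows: blank subtraction, a standard-curve fit, and dilution correction. This is not the Pepkio client. The tool compares several regression models, whereas this sketch fits only an ordinary least-squares straight line. It flags samples whose blank-corrected signal exceeds the top standard, and the flag wording is hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sample_concentrations(std_conc, std_abs, blank_abs, sample_abs, dilution_factors):
    """Blank-subtract, fit a linear standard curve, back-calculate, then correct for dilution."""
    corrected_std = [a - blank_abs for a in std_abs]
    slope, intercept = fit_line(std_conc, corrected_std)
    top = max(corrected_std)
    results = []
    for a, dil in zip(sample_abs, dilution_factors):
        signal = a - blank_abs
        conc = (signal - intercept) / slope * dil
        flag = "above standard range" if signal > top else ""
        results.append((conc, flag))
    return results
```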

Estimates PCR primer melting temperatures and polymerase-specific annealing temperatures from sequence and buffer inputs, with per-pair QC for hairpins, dimers, and Tm balance. A browser calculator supports interactive single-pair and batch entry (up to 200 pairs) with method comparison and export; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic for the API client is hosted remotely; sequences are transmitted for programmatic runs while the web interface performs calculations in the browser.
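For orientation, here are two textbook Tm approximations: the Wallace rule, 2(A+T) + 4(G+C), meant for short oligos, and the basic GC formula, 64.9 + 41(G+C − 16.4)/N. The listing does not say which methods the tool uses. These are simple stand-ins, not the nearest-neighbour thermodynamic models that primer tools usually rely on, and they ignore buffer and polymerase inputs. The 5 °C pair-balance threshold is a hypothetical choice.

```python
def gc_count(seq):
    s = seq.upper()
    return s.count("G") + s.count("C")


def tm_wallace(seq):
    """Wallace rule, 2(A+T) + 4(G+C); intended only for short oligos."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * gc_count(s)


def tm_basic(seq):
    """Basic GC-content formula, 64.9 + 41(G+C - 16.4)/N, for longer primers."""
    return 64.9 + 41 * (gc_count(seq) - 16.4) / len(seq)


def pair_qc(fwd, rev, max_delta=5.0):
    """Return both Tm estimates and whether they are within max_delta degrees C."""
    t1, t2 = tm_basic(fwd), tm_basic(rev)
    return t1, t2, abs(t1 - t2) <= max_delta
```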

Plans PCR and qPCR master-mix reagent volumes from stock and final concentrations, reaction counts, and pipetting overage, with consolidated totals when several assays are prepared together. A browser calculator supports interactive recipe entry with printable bench sheets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured volume tables, dilution warnings, and shareable run identifiers.
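Master-mix volumes follow from C1V1 = C2V2 for each component, scaled by reaction count plus overage. The sketch below is not the Pepkio client and handles a single assay rather than consolidated totals. It assumes template is pipetted into each well separately, so template volume is reserved per reaction but excluded from the mix. Water makes up the remainder.

```python
def master_mix(components, reaction_ul, n_reactions, overage=0.10, template_ul=0.0):
    """Per-reaction and total volumes for a PCR master mix.

    components: list of (name, stock_conc, final_conc), with each pair in the same units.
    Returns rows of (name, uL per reaction, uL in the whole mix).
    """
    n_mix = n_reactions * (1 + overage)
    rows, used = [], template_ul
    for name, stock, final in components:
        per_rxn = final / stock * reaction_ul   # C1V1 = C2V2
        used += per_rxn
        rows.append((name, per_rxn, per_rxn * n_mix))
    water = reaction_ul - used
    if water < 0:
        raise ValueError("components plus template exceed the reaction volume")
    rows.append(("water", water, water * n_mix))
    return rows
```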

Computes laboratory solution preparation parameters—powder mass to weigh, stock and diluent volumes for single dilutions, and multi-step serial concentration tables—with correction for hydrated salts and supplier purity. A browser calculator supports interactive prep planning with saved recipes and shareable links; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured protocol steps and shareable run identifiers.
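The underlying formulas can be sketched as follows. Mass = molarity × volume × molar mass ÷ purity, where hydrated salts use the molar mass of the hydrate. A single dilution follows C1V1 = C2V2. This is not the Pepkio client, and serial tables and saved recipes are omitted. In the example, CaCl2 (110.98 g/mol anhydrous) as the dihydrate has a molar mass of about 147.01 g/mol.

```python
WATER_MW = 18.015  # g/mol


def hydrate_mw(anhydrous_mw, waters):
    """Molar mass of a hydrated salt, e.g. CaCl2 . 2H2O."""
    return anhydrous_mw + waters * WATER_MW


def mass_to_weigh(molarity, volume_l, mw, purity=1.0):
    """Grams of reagent for a target molarity, corrected for supplier purity (fraction 0-1)."""
    return molarity * volume_l * mw / purity


def single_dilution(stock_conc, final_conc, final_volume):
    """C1V1 = C2V2: returns (stock volume, diluent volume) in the units of final_volume."""
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume
```

For 500 mL of 0.1 M CaCl2 made from the dihydrate, `mass_to_weigh(0.1, 0.5, hydrate_mw(110.98, 2))` gives about 7.35 g. Weighing the anhydrous molar mass instead would undershoot by roughly a quarter.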

FlavoTyper is a bioinformatics tool that performs in silico serotyping of Flavobacterium psychrophilum genome assemblies.

MONAI Label is an intelligent open source image labeling and learning tool that enables users to create annotated datasets and build AI annotation models for clinical evaluation. MONAI Label enables application developers to build labeling apps in a serverless way, where custom labeling apps are exposed as a service through the MONAI Label Server.