Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
118 of 5,940 resources
Showing 51–100
Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
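The exact-copy tier of such a pipeline can be illustrated with a minimal sketch. It assumes the pixel data of each series has already been read into raw bytes and that SHA-256 is the hash in use; the description above only says "SHA-based". The function names and series IDs are hypothetical and are not the tool's API.

```python
import hashlib


def pixel_hash(pixel_bytes: bytes) -> str:
    """Return a SHA-256 digest of raw pixel data only (no metadata)."""
    return hashlib.sha256(pixel_bytes).hexdigest()


def group_exact_duplicates(series: dict[str, bytes]) -> list[list[str]]:
    """Group series IDs whose pixel data hash to the same digest."""
    groups: dict[str, list[str]] = {}
    for series_id, pixels in series.items():
        groups.setdefault(pixel_hash(pixels), []).append(series_id)
    # Only groups with more than one member are duplicate candidates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Because only pixel bytes are hashed, two series with different metadata but identical images land in the same group. Near-duplicates would need the similarity metrics mentioned above instead.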
NanoporeDB is an open-access structural database dedicated to the exploration, analysis, and design of protein nanopores, which serve as essential molecular gateways in biological membranes and form the basis of many advanced biosensing and sequencing technologies. This platform integrates large-scale structure-guided mining and deep learning-based modeling using AlphaFold-Multimer and AlphaFold3 to provide about 7,000 high-confidence multimeric nanopore structures. Each entry includes detailed information on membrane embedding, pore geometry annotation, and constriction profiling to support functional and biophysical interpretation. Through an interactive 3D visualization interface and quantitative parameters such as tilt angle, insertion depth, and pore geometry, NanoporeDB enables researchers to explore nanopore diversity, discover novel scaffolds, and accelerate innovation in molecular sensing, precision diagnostics, and synthetic biology.
Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
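The merging step described above can be sketched as follows. This is an illustration, not the Galaxy tool's code: it assumes each input is a headerless, tab-separated file with a feature name in the first column and a count in the second, and the function names are hypothetical.

```python
import csv
import io


def read_counts(handle):
    """Read a two-column, tab-separated file: feature name, count."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}


def combine_counts(inputs):
    """Combine per-sample counts into one matrix (a list of rows).

    inputs maps a sample name to an open text handle.
    """
    samples = list(inputs)
    tables = [read_counts(inputs[name]) for name in samples]
    features = list(tables[0])
    # The feature column must be identical across all input files.
    for table in tables[1:]:
        if list(table) != features:
            raise ValueError("feature columns differ between input files")
    header = ["feature"] + samples
    rows = [[f] + [table[f] for table in tables] for f in features]
    return [header] + rows
```

For example, passing `{"s1": io.StringIO("geneA\t5\n"), "s2": io.StringIO("geneA\t7\n")}` yields a header row followed by one row per feature, with one column per input.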
S3segmenter is a Matlab-based set of functions that generates single cell (nuclei and cytoplasm) label masks.
CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.
Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.
FigCanvas is an AI scientific figure generator for life-science researchers. It produces publication-ready biological diagrams (mechanism diagrams, pathway figures, cell biology visuals), CONSORT and methodology flowcharts, and data visualizations such as volcano plots from text prompts or uploaded datasets. The tool turns methods-section text or structured data into editable vector figures suitable for manuscripts, posters, and slides, helping researchers iterate on figures without rebuilding them in Illustrator.
AlphaFind v2 is a tool for fast, structure‑based search for protein structures against the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/). The tool uses protein‑level embeddings to provide a rapid pre‑filter, with top candidates undergoing TM-score, RMSD and residue‑level alignment computations. Four complementary search modes are available: (i) whole‑chain search, (ii) pLDDT‑aware search that restricts similarity to high‑confidence regions, (iii) domain search against the TED database, and (iv) multidomain search that combines several chain‑level matches into a single score. Users can restrict queries to a given organism, CATH superfamily or to proteins with experimental structures, and submit queries by UniProt/AlphaFold identifier. Results comprise a ranked list with similarity metrics, rich metadata and an interactive 3‑D superposition view. The service is freely accessible at https://alphafind.ics.muni.cz/.
Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.
PoseView automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. It adheres to the conventions of chemical structure diagram generation. The quality of the resulting diagrams is comparable to manually drawn examples from books and scientific publications.
An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.
MicroMiner assists in identifying single-residue substitutions in protein structure databases. It searches protein residue environments with local sequence and structural similarity based on the SIENA methodology. Users can search for structural mutation in the entire PDB, their in-house structure collection, or (subsets of) the AlphaFold Database. They can use the method to explore the mutation landscape of proteins with experimental or predicted structures. MicroMiner can be applied to single domains or even protein-protein or protein-ligand interfaces. Several filter options to simplify downstream analysis are available.
SIENA is a software pipeline enabling the fully automated construction of protein structure ensembles from the PDB. Starting with a single query structure, all binding sites with high sequence similarity are extracted from the PDB, aligned, and superimposed. SIENA also handles complicated cases, such as comparing binding sites at protein domain interfaces or within multimeric proteins.
GeoMine enables the automated mining of protein-ligand binding sites. Based on individually designed queries, users can search for spatial interaction patterns in huge collections of protein-ligand complexes and binding pockets. The regularly updated GeoMine database relies on the free database systems SQLite and PostgreSQL. It supports radius-based pockets (based on ligands) and predicted pockets (based on DoGSite3) for query generation. Query management is based on XML for the REST service or JSON in GUI mode. Its output consists of the query-based superpositions of the matched binding sites and statistics on matching points, distances, and angles.
WarPP predicts the position and orientation of water molecules in small-molecule binding sites. It places and scores water molecules in binding sites of crystallographic structures based on EDIAscorer results and interaction geometries as known from experimentally solved protein structures. WarPP was validated on a high-quality set of 1,500 protein-ligand complexes, containing 20,000 crystallographically observed water molecules. It is sufficiently fast for high-throughput analyses. It correctly places water molecules in approx. 80% of the cases. Users can export the predictions as PDB files for, e.g., molecular docking with JAMDA.
Protoss is a fully automated hydrogen atom placement tool for protein-ligand complexes. It adds missing hydrogen atoms to protein structures and detects reasonable protonation states, tautomeric states, and hydrogen coordinates of both protein and ligand molecules by optimizing the hydrogen bond network.
Primerpickr is an open-source tool for RT-PCR primer picking, powered by aggregating the public usage of RT-PCR primers reported in open-source papers. The database is validated with 154 genes and contains over 31,000 genes across 10 species.
The electron density score for individual atoms (EDIA) quantifies the electron density fit of each atom in a crystallographically resolved structure. Multiple EDIA values can be combined using the power mean to compute the EDIAm, i.e., the electron density score for a group of several atoms. It enables users to score a set of atoms, such as a ligand, a residue, or an active site.
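Combining per-atom scores with a power mean can be sketched as below. This is a generic power mean only: the exponent and any offsets used in the published EDIAm definition are not given in the description above, so `exponent` is left as a caller-supplied, hypothetical parameter.

```python
def power_mean(values, exponent):
    """Generalised (power) mean of positive values for a non-zero exponent."""
    if exponent == 0:
        raise ValueError("exponent must be non-zero")
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    mean_of_powers = sum(v ** exponent for v in values) / len(values)
    return mean_of_powers ** (1.0 / exponent)
```

With a negative exponent the result is pulled towards the lowest value in the group, so a single poorly supported atom lowers the combined score more than it would under an arithmetic mean.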
Three-dimensional protein structures play a vital role in drug design. Structure-based design necessitates an in-depth examination of the available quality data before using the structure in computational experiments and for method evaluation. StructureProfiler assists in automatically profiling sets of protein-ligand complex structures based on multiple quality indicators, ranging from model characteristics, e.g., the R factor, and active site features, e.g., bond length deviations, to ligand properties such as electron density support and the validity of torsion angles.
LifeSoaks was designed to find solvent channels in macromolecular structures solved by X-ray crystallography. It predicts their accessibility by molecules through an automated annotation of so-called bottleneck radii. It simplifies the process of manually checking a crystal structure for solvent channels. Bottleneck radii can be calculated for solvent channels and small molecule binding sites. The tool is ideally suited for channel analyses before the actual soaking experiments to select the most promising experimental conditions and crystal forms. LifeSoaks runs fully automated and will finish within seconds to minutes for moderately sized crystals.
PlantiSMASH is a specialized extension of antiSMASH for the identification and analysis of biosynthetic gene clusters (BGCs) in plant genomes. It supports advanced plant-specific detection rules and features for comparative genomics, visualization, and more.
DigestedProteinDB provides a scalable computational infrastructure for indexing and querying peptide cleavage data. Designed for seamless integration into high-throughput mass spectrometry pipelines, it enables low-latency searches and advanced filtering of digested protein datasets to accelerate experimental spectra cross-referencing.
The Open Neuroscience Graph (openneuroscience.org) is an open-access, curated knowledge graph that maps the open science ecosystem in neuroscience as a browsable digital garden. Built from an Obsidian vault and published as a static website using Quartz, the project replaces traditional linear presentation with a networked structure of interlinked Markdown notes. Bidirectional links, full-text search, and an integrated graph visualization allow users to navigate thematic relationships dynamically rather than sequentially. The complete source material is openly available to sustain, replicate and extend the resource, including all Markdown content, media attachments, Quartz configuration files, and site customizations. Researchers, educators, and open-science practitioners may explore the site directly, download the vault for offline use in Obsidian, or fork the material to build new, derivative knowledge bases. PID=https://doi.org/10.5281/zenodo.20181900
Phantasus is a web application for visual and interactive gene expression analysis. It is based on Morpheus, a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via the OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis and differential expression analysis with the limma package are supported.
CalibraCurve is a computational tool designed to generate calibration curves for targeted mass spectrometry-based quantitative data. It is applicable to various omics disciplines, including proteomics, lipidomics, and metabolomics. The package also offers functionalities for data and calibration curve visualization and concentration prediction from new datasets based on the established curves.
A machine learning-based tool to estimate the overall survival probability in patients with neuroblastoma, supporting clinical decision-making and prognosis.
A machine learning model that predicts overall survival in patients with glioblastoma, using radiomic and clinical features.
Performs volumetric analysis of brain structures by segmenting and calculating the volume of grey matter, white matter, and CSF. Results support studies on neurodegeneration, development, or disease progression.
Extracts deep features from MR images using pretrained neural networks. These features can be used for classification, clustering, or survival prediction tasks in medical imaging.
Computes R1 and T1 maps from MR images, showing the rate and time of longitudinal relaxation. These are key quantitative biomarkers for tissue characterization.
Extracts diffusion-related maps (e.g., ADC, IVIM, Kurtosis) from DWI sequences to evaluate microstructural properties of tissues, commonly used in oncology and neurology.
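As an illustration of the simplest of these maps, the sketch below computes ADC for a single voxel from two b-values under the standard mono-exponential model S(b) = S0 · exp(−b · ADC). It is not the tool's implementation; the function name is hypothetical, and IVIM and kurtosis maps require other models with more b-values.

```python
import math


def adc_two_point(s_low, s_high, b_low, b_high):
    """Estimate ADC from signals at two b-values (mono-exponential model).

    ADC = ln(S_low / S_high) / (b_high - b_low), in mm^2/s when b is in s/mm^2.
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signals must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)
```

Applying the same formula voxel by voxel across two DWI volumes gives an ADC map.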
Tool for calculating R2 maps from T2*-weighted images. These maps reflect tissue relaxation rates and can be used to assess tissue properties and detect abnormalities.
Implemented by GIBI230, this tool is a Docker-based software designed for extracting radiomic features from 3D medical images in NIfTI format using the PyRadiomics library (for DICOM images, the DICOM to NIfTI converter must be run before using this tool). It streamlines the radiomics calculation process by generating a structured CSV file containing all variables extracted from the medical images. The dockerized software lets users configure parameters such as filters, bin width, resampling spacing, and normalization settings. The output radiomic variables provide quantitative information for further analysis in medical imaging research and machine learning applications. The choice of bin width is especially important: for robust and reproducible results, a bin width of 5 is commonly recommended, but it should be adjusted based on image resolution, modality, and noise levels.
This tool extracts perfusion maps from dynamic imaging data (e.g., DCE-MRI) using pharmacokinetic models or semi-quantitative methods. It supports the evaluation of blood flow and tissue vascularity.
The tool performs radiomics harmonization on large, heterogeneous datasets where there is a risk of over-harmonization. Instead of directly applying harmonization based on predefined batch labels, the tool first identifies groups of batches that share similar characteristics by clustering the radiomics data. It then performs harmonization using these cluster-derived labels. Radiomics variables can be harmonized with two methods: (1) the original ComBat method (Rabinovic, 2007), where each original batch group is used in the harmonization process, and (2) a cluster-based ComBat method, where batch groups with similar radiomics characteristics form clusters and the clusters are used in the harmonization process.
This preprocessing tool is designed for 2D digital mammograms in DICOM format. It standardizes and harmonizes images through a configurable pipeline that includes spatial reorientation, pseudo-3D stacking, isotropic resampling, intensity normalization, optional denoising, contrast enhancement, and mask processing (if available).
The tool uses deep learning to automatically segment possible neuroblastoma tumours on contrast-enhanced CT images (CE-CTs). The model architecture is U-Net-based with residual operations, atrous (dilated) convolution and a specific batch generator. It applies preprocessing steps such as RAS conversion, resizing, z-score normalization and patching, as well as postprocessing operations. It takes DICOM images as input and generates tumoral masks in DICOM SEG or NIfTI formats.
The tool automatically segments possible glioblastoma tumours on MRI images, together with their subregions: necrosis (intratumoral necrotic core), edema (peritumoral vasogenic edema), enhancing (contrast-enhancing tumor region), total (total tumor including edema and necrosis, segmented by a single model) and total-fused (total tumor obtained by fusing necrosis + edema + enhancing). It applies preprocessing steps such as skull stripping, intra-patient registration, z-score normalization and patching, among others. It takes DICOM images as input and generates tumoral masks in DICOM SEG or NIfTI formats.
The tool automatically segments possible DIPG tumours on MR images. DIPG (Diffuse Intrinsic Pontine Glioma), or more recently DMG (Diffuse Midline Glioma), is an H3 K27M–mutant pediatric brainstem cancer detected in T1W and FLAIR/T2-weighted magnetic resonance images. The tool includes a complete workflow from DICOM images to DICOM SEG tumoral masks.
This tool is specifically designed and validated for automated detection and segmentation of neuroblastic tumours in T2-weighted magnetic resonance images (T2-MR) using deep learning. It processes DICOM or NIfTI input data and outputs in NIFTI or DICOM SEG. TRAINING & VALIDATION COHORTS: Initial Development (Veiga-Canuto 2022): -Training: 106 patients, 5-fold CV (median DSC 0.965 ± 0.018). -Internal validation: 26 patients (median DSC 0.918 ± 0.067). -Sources: La Fe (Spain), SIOPEN HR-NBL1/LINES, St. Anna (Austria), Pisa (Italy). -Mean age: 37.6 ± 39.3 months. -Median tumor volume: 116,518 mm³. External Validation (Veiga-Canuto 2023): -300 patients, 535 independent T2 MRI scans (486 at diagnosis, 49 post-chemotherapy). -Performance: median DSC 0.997 (0.944–1.000), 94% successful detection. -Sources: 12 European countries (HR-NBL1/SIOPEN 119, LINES/SIOPEN 107, German Registry 62, others 12). -Heterogeneous data: 1.5T (435), 3T (100); Siemens (318), Philips (109), GE (105), Canon (3).
The tool performs customisable image pre-processing to reduce noise and the effect of field inhomogeneity, thus improving image quality and the reproducibility of radiomics features. It consists of two independent steps: one for denoising using one of the five integrated filters (Bilateral Filter, Anisotropic Diffusion Filter (ADF), Curvature Flow Filter (CFF), SUSAN and Non Local Means (NLM)), and another for the ANTs N4 bias correction filter. The parameter configuration of this tool has been optimised for T1W, T2W, DWI and DCE sequences in neuroblastoma (NB) and paediatric brain tumours, but some of the parameters can also be set using a JSON parameter configuration file.
A tool based on artificial intelligence that is able to perform a categorisation of MRI series by using standardized DICOM tags. The categorisation includes the type of sequence (e.g. spin echo, gradient echo), the weighting (e.g. T1W, T2W, DCE, ...), the presence of fat suppression and the detection of non-relevant / junk series (e.g. localizers, calibrations, screenshots...).
Tool for visually validating the chronological order and logical consistency of the dates associated with a patient's medical history. It generates a timeline visualization for each patient from an Excel file and highlights rule violations. Status: Containerized
The tool performs a DICOM quality check covering the number of files per sequence, corrupted files, the directory hierarchy, merging of separated dynamic series, filtering/selection of series of interest by specific series description lists, and identification of diffusion sequences by b-values. It applies the desired changes to the dataset and generates a report containing information about the selected sequences, corrupted files, missing files and merged files. Status: Deployed
VIP is a web portal for medical imaging applications. It allows users to access scientific applications as a service (directly through the web browser with no installation required), as well as distributed computing resources in a transparent manner. It exploits the resources available in the biomed virtual organization of the EGI e-infrastructure to offer an open service to researchers worldwide.
Membrane Protein-Lipid Interaction Database. A large-scale experimentally validated dataset of 80685 residue-level lipid contact annotations across 4712 membrane proteins derived from PDB crystal and cryo-EM structures. Provides pre-computed binary contact labels, continuous distance values, sequence-identity-based cluster assignments, and ready-made train-validation-test splits for machine learning.
ekokrati computes habitat connectivity metrics (PC, IIC, EC(PC), dPC and its decomposition into intra-patch, flux and connector components) for habitat patch networks. Users upload polygon data as GeoPackage or shapefile, set species-specific dispersal parameters, and receive patch importance scores and landscape-level indices. Designed for conservation planners, landscape ecologists and environmental consultants. No installation required.
Verbex is a private, on-device Voice-to-ELN iOS app for scientists. It helps researchers capture experiment notes by voice as work happens, organize those notes into scientific sections, and prepare clean, reviewable, ELN-ready scientific records.
The MetaProteomeAnalyzer Cloud (MPA Cloud) is an intuitive, open-source tool for metaproteomics data analysis and interpretation, designed to analyse comprehensive metaproteomics data from tandem mass spectrometry experiments through a web interface.
A comprehensive R package for identifying and ranking influential nodes in biological and other complex networks. The package implements the Integrated Value of Influence (IVI), Experimental data-based Integrative Ranking (ExIR), SIRIR, and numerous network centrality measures, enabling network topology analysis, influential node detection, feature prioritization, and candidate biomarker discovery. It also provides functions for network reconstruction, centrality assessment, visualization, and analysis of relationships between centrality measures.