Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Nallo is a bioinformatics analysis pipeline for long reads from both PacBio and (targeted) ONT data, focused on rare disease. The pipeline detects a wide range of genetic variants, performs genome assembly, and reports CpG methylation. It also enables annotation and ranking of variants based on their predicted functional consequences.

Active · 67 · 4 days ago
Groovy
MIT

Auto-generates clean, customizable academic CVs from open research data (OpenAlex, ORCID, Crossref, DataCite, Open Editors Plus). A single canonical CV object drives every output format (HTML, PDF, DOCX, LaTeX, Markdown); citations render through CSL; and the account holder is matched by persistent identifier (ORCID / OpenAlex ID) rather than name string. Free for individuals, open-source, and FAIR by design.

Active · 0 · 5 days ago
JavaScript
Apache-2.0

A static web application presents an interactive knowledge graph of single-cell long-read RNA sequencing literature synthesized from seven source papers. Users navigate mind-tree, network graph, guided learning-path, and Sankey views linking platforms, protocols, methods, and software. A benchmark tab provides 34 question-answer pairs with category and difficulty filters, exportable as JSON or CSV for LLM and agent evaluation.

Active · 0 · 5 days ago
JavaScript
MIT

Phylo-Movies is an open-source React and Flask web application, also available as a desktop app, for inspecting ordered phylogenetic tree series. It computes and visualizes subtree-prune-and-regraft transition frames between consecutive trees, helping users see which taxa or subtrees move across sliding-window analyses, bootstrap replicates, and curated tree-series comparisons. The viewer includes timeline playback, tree comparison, MSA context, coloring, analytics, image export, and recording tools.

Active · 1 · 5 days ago
HTML
MIT

Databank of optimised macromolecular structures. PDB-REDO entries are refined, rebuilt and validated with one consistent protocol using the equivalent entry in the Protein Data Bank and its experimental data. PDB-REDO entries typically have higher structural quality and a better fit to the experimental data.

Active · 0 · 5 days ago
Freeware

Derives cells per well and suspension pipette volumes for standard 6-, 12-, 24-, 48-, 96-, and 384-well plates from a hemocytometer stock count, trypan blue viability, and target seeding confluency, with QC flags for low viability and impractical transfers. A browser calculator supports interactive planning with cell-line presets; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured plate tables and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
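
The arithmetic behind a seeding calculator like the one described above can be sketched in a few lines. This is only an illustration of the standard hemocytometer calculation, not the Pepkio Tools API or its client: the function names, the QC thresholds, and the cells-at-confluence figure are all assumptions chosen for the example.

```python
def viable_cells_per_ml(mean_count_per_square, dilution_factor, viability):
    """Viable cells/mL from a hemocytometer count.

    Each large square of a standard chamber holds 0.1 µL, hence the 1e4 factor.
    """
    return mean_count_per_square * dilution_factor * 1e4 * viability


def seeding_plan(mean_count_per_square, dilution_factor, viability,
                 cells_at_confluence, target_confluency,
                 min_viability=0.80, min_transfer_ul=2.0):
    """Cells per well and stock volume per well, with simple QC flags.

    cells_at_confluence is a per-well, cell-line-specific number supplied by
    the user; the thresholds are hypothetical defaults, not published limits.
    """
    conc = viable_cells_per_ml(mean_count_per_square, dilution_factor, viability)
    cells_per_well = cells_at_confluence * target_confluency
    volume_ul = cells_per_well / conc * 1000.0  # mL -> µL

    flags = []
    if viability < min_viability:
        flags.append("low_viability")
    if volume_ul < min_transfer_ul:
        flags.append("impractical_transfer")
    return {"cells_per_well": cells_per_well, "volume_ul": volume_ul, "flags": flags}


# Illustrative input: 50 cells per square, 1:1 trypan blue (dilution factor 2),
# 90% viability, an assumed 1.2e6 cells per well at confluence, 25% target.
plan = seeding_plan(50, 2, 0.9, cells_at_confluence=1.2e6, target_confluency=0.25)
print(plan)
```

Keeping the concentration step separate from the per-well step makes it easy to reuse one stock count across several plate formats.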

Computes weighed laboratory buffer recipes from target pH, concentration, and volume, accounting for separate preparation and working temperatures when pKa shifts with temperature. Supports calculator mode from dry reagents and stock dilution mode, returning acid and base masses, ionic strength estimates, optional NaCl adjustment, gravimetric and titration routes, and stepwise protocols. A browser calculator supports interactive recipe entry with shareable links; a Python library and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured recipe tables, compatibility warnings, and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
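
The core of a buffer recipe like the one described above is the Henderson-Hasselbalch relation, combined with a temperature-corrected pKa. The sketch below assumes a linear pKa shift with temperature and ignores ionic-strength (activity) corrections; the function names are hypothetical, and the Tris constants are approximate textbook values used only for illustration. It is not the Pepkio Tools implementation.

```python
def pka_at(pka_25, dpka_dt, temp_c):
    """Linear approximation of the pKa at temp_c, given the pKa at 25 °C."""
    return pka_25 + dpka_dt * (temp_c - 25.0)


def base_fraction(ph, pka):
    """Fraction of buffer in the conjugate-base form (Henderson-Hasselbalch)."""
    ratio = 10 ** (ph - pka)  # [base] / [acid]
    return ratio / (1.0 + ratio)


def buffer_masses(target_ph, total_molar, volume_l,
                  pka_25, dpka_dt, working_temp_c, mw_acid, mw_base):
    """Grams of acid and base form to weigh so the pH is right at working_temp_c."""
    pka = pka_at(pka_25, dpka_dt, working_temp_c)
    fb = base_fraction(target_ph, pka)
    moles = total_molar * volume_l
    return {
        "pka": pka,
        "acid_g": moles * (1.0 - fb) * mw_acid,
        "base_g": moles * fb * mw_base,
    }


# Illustrative: 1 L of 50 mM Tris, pH 8.0 at a 4 °C working temperature.
# Approximate constants: pKa 8.07 at 25 °C, dpKa/dT about -0.028 per °C,
# Tris-HCl 157.60 g/mol (acid form), Tris base 121.14 g/mol.
recipe = buffer_masses(8.0, 0.050, 1.0, 8.07, -0.028, 4.0, 157.60, 121.14)
print(recipe)
```

Because the pKa of Tris rises as the solution cools, the same target pH needs a larger share of the acid form at 4 °C than at 25 °C, which is why preparation and working temperatures are treated separately.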

Galaxy workflow for BlockClust pipeline.

Active · 123 · 1 week ago
R
MIT

Plans geometric serial dilution series for molecular biology and biochemistry workflows, rounding transfer volumes to declared pipette ranges and optional 96- or 384-well plate layouts. A browser calculator supports interactive protocol design; a Python client and command-line tool submit the same parameters to the Pepkio Tools API for scripted and pipeline use. Calculator arithmetic is hosted remotely; the client transmits parameters and returns structured step tables and shareable run identifiers.

Active · 1 · 1 week ago
Python
MIT
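
A geometric serial dilution plan of the kind described above reduces to one transfer volume and one diluent volume repeated at every step. The sketch below is a minimal illustration under stated assumptions: the function name, the rounding increment, and the minimum-volume check are hypothetical and do not reflect the Pepkio Tools API.

```python
def plan_serial_dilution(start_conc, dilution_factor, n_steps, total_volume_ul,
                         pipette_increment_ul=0.5, min_transfer_ul=1.0):
    """Return one row per dilution step.

    total_volume_ul is the volume in each tube after mixing and before the
    next transfer is withdrawn. The transfer volume is rounded to the pipette
    increment, and the concentrations use the resulting actual factor.
    """
    ideal_transfer = total_volume_ul / dilution_factor
    transfer = round(ideal_transfer / pipette_increment_ul) * pipette_increment_ul
    if transfer < min_transfer_ul:
        raise ValueError("transfer volume below the declared pipette range")
    diluent = total_volume_ul - transfer
    actual_factor = total_volume_ul / transfer

    steps = []
    conc = start_conc
    for step in range(1, n_steps + 1):
        conc = conc / actual_factor
        steps.append({
            "step": step,
            "transfer_ul": transfer,
            "diluent_ul": diluent,
            "concentration": conc,
        })
    return steps


# Illustrative: three 10-fold steps from a stock of 100 units, 200 µL per tube.
for row in plan_serial_dilution(100.0, 10, 3, 200.0):
    print(row)
```

Reporting the concentration from the rounded transfer volume, rather than the nominal factor, keeps the table honest when rounding changes the dilution slightly.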

xgt is a command-line tool for programmatic access to the GTDB REST API. It provides four subcommands: search (genome queries with pagination), genome (cards, metadata, taxonomic history), taxon (lineage and genome set retrieval), and diff (per-rank taxonomic comparison between any two GTDB releases). All subcommands support batch input, JSON/CSV/TSV output, file splitting, and automatic retry. Implemented in Rust as a self-contained binary with no runtime dependencies.

Active · 30 · 1 week ago
Rust
Apache-2.0

Tool for converting raw DNA data files between 23andMe, AncestryDNA, MyHeritage, and FamilyTreeDNA formats.

Active · 1 · 2 weeks ago
PHP
MIT

A Flexible Model For Record Linkage

Active · 1 · 2 weeks ago
C++
GPL-3.0-or-later

FlowVision is offline flow cytometry analysis software for Windows and macOS. It supports FCS 2.0, 3.0, 3.1 and 3.2 file formats, polygon/rectangle/ellipse/quadrant gating with auto-fit (snap to cluster), spillover compensation, biexponential and hyperlog scales, MFI statistics (median, geometric mean, CV%), multi-file batch analysis with per-file gate overrides, and hierarchical gating. Spectral unmixing supports linear, NNLS, and Poisson-weighted least squares algorithms, with autofluorescence extraction and spillover spreading matrix. UMAP dimensionality reduction with reproducible seed and landmark mode for high-parameter panels. Imports FlowJo .wsp (compensation matrix) and exports gates to FlowJo .wsp and Gating-ML 2.0 (ISAC open standard) for interoperability with FlowJo, R/flowWorkspace/CytoML, and FCS Express.

Active · 0 · 2 weeks ago
JavaScript
Proprietary

Modular toolchain for an extensible and customizable ETL pipeline that extracts, transforms, and loads clinical data and medical imaging metadata, applying dataset-specific mappings to generate outputs compatible with the EUCAIM Common Data Model (CDM). Its design aims to minimize manual data preparation efforts and facilitate customization and integration with other components, such as data quality assurance tools. Containerized, currently supports input datasets in CSV, JSON, XLSX.

Active · 0 · 2 weeks ago
Python

Standalone browser-based Gene Ontology network viewer for exploring, filtering, searching, and exporting GO term and gene annotation neighborhoods from locally preprocessed GO OBO and GAF data.

Active · 0 · 3 weeks ago
TypeScript
MIT

A small (<720 KB) C++ Windows utility that lets you load Ancestry, 23andMe, FTDNA, or Genes for Good raw DNA files, search them, merge them, and convert them to Ancestry format. It can also create files from peer-reviewed publications to compare with your loaded data, giving your genetic predisposition for the condition you entered the data for, and a statistical risk if OR values are included. Example files for type 2 diabetes risk factors are included with the program (I have type 2 diabetes, so I could test the results).

Active · 0 · 4 weeks ago
C++
GPL-3.0

JCVI is a versatile toolkit for comparative genomics analysis. It is a collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

Active · 916 · 1 month ago
Python
BSD-2-Clause

Pathogensurveillance is a population genomics pipeline for pathogen identification, variant detection, and biosurveillance. The pipeline accepts paths to raw reads for one or more organisms and creates reports in the form of an interactive HTML document. Significant features include the ability to analyze unidentified eukaryotic and prokaryotic samples, creation of reports for multiple user-defined groupings of samples, automated discovery and downloading of reference assemblies from NCBI RefSeq, and rapid initial identification based on k-mer sketches followed by a more robust multi gene phylogeny and SNP-based phylogeny.

Active · 60 · 1 month ago
Groovy
MIT

The complexity of high-throughput quantitative omics experiments often leads to low replicate numbers and many missing values. We implemented a new test to simultaneously consider missing values and quantitative changes, which we combined with well-performing statistical tests for high-confidence detection of differentially regulated features. The package contains functions to run the test and to visualize the results.

Active · 0 · 1 month ago
R
GPL-2.0

Design of linear and cyclic peptide binders from protein sequence information.

Active · 268 · 1 month ago
Jupyter Notebook

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

Active · 266 · 2 months ago
Python
BSD-3-Clause

Genome mapping and spliced alignment of cDNA or amino acid sequences

Active · 113 · 3 months ago
C
GPL-2.0

A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.

Active · 37 · 3 months ago
Python
MIT
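
The coordinate conversion described above follows directly from the SAM specification: POS is 1-based, and only the CIGAR operations M, D, N, = and X consume reference bases. The sketch below illustrates that rule; the function name is hypothetical and this is not the listed script's own code.

```python
import re

# One CIGAR element: a length followed by an operation code.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
_REF_CONSUMING = set("MDN=X")


def sam_to_interval(pos_1based, cigar):
    """Return (start, end) with a 0-based start and a 1-based (exclusive) end."""
    ref_len = sum(int(length)
                  for length, op in _CIGAR_ELEMENT.findall(cigar)
                  if op in _REF_CONSUMING)
    start = pos_1based - 1
    return start, start + ref_len


# 5 soft-clipped bases, 10 matches, a 2-base insertion, a 3-base deletion,
# then 20 matches: 10 + 3 + 20 = 33 reference bases are covered.
print(sam_to_interval(100, "5S10M2I3D20M"))  # (99, 132)
```

Insertions and soft clips are present in the read but not in the reference, so they do not extend the interval; deletions and skipped regions do.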

seqlib is a type-safe Rust library for working with DNA and RNA sequences.

Active · 0 · 4 months ago
Rust
Other

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Active · 257 · 4 months ago
Python
Other

A Python extension, written in C, for quick access to bigBed files and access to and creation of bigWig files.

Active · 244 · 5 months ago
C
MIT

Minigraph is a sequence-to-graph mapper and graph constructor. For graph generation, it aligns a query sequence against a sequence graph and incrementally augments an existing graph with long query subsequences diverged from the graph.

Idle · 481 · 10 months ago
C
MIT

A database system designed to store, organize, and manage large-scale nucleotide sequencing read data (like PacBio reads) for the Dazzler genome assembler

Idle · 36 · 1 year ago
C
Other

Git repo for Bio::DB::HTS module on CPAN, providing Perl links into HTSlib

Idle · 26 · 1 year ago
Shell
Apache-2.0

Utility that performs integrated analyses of 'gene' data (a set of genes or other genomic features) with 'peak' data (a set of regions, for example ChIP peaks) to identify the genes nearest to each peak, and vice versa.

Idle · 5 · 1 year ago
Python
Artistic-2.0
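
The nearest-gene lookup described above can be illustrated with a simple interval-distance search. This sketch measures the gap between interval edges and treats overlaps as distance zero; the listed utility may instead measure from transcription start sites or peak summits, so the distance rule, the function name, and the data layout here are all assumptions.

```python
def interval_gap(start_a, end_a, start_b, end_b):
    """Distance between two intervals; 0 if they overlap or touch."""
    if end_a < start_b:
        return start_b - end_a
    if end_b < start_a:
        return start_a - end_b
    return 0


def nearest_gene(peak, genes):
    """peak is (chrom, start, end); genes is a list of (name, chrom, start, end).

    Returns (gene_name, distance), or None if no gene shares the chromosome.
    """
    chrom, p_start, p_end = peak
    best = None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        dist = interval_gap(p_start, p_end, g_start, g_end)
        if best is None or dist < best[1]:
            best = (name, dist)
    return best


genes = [("geneA", "chr1", 100, 200),
         ("geneB", "chr1", 1000, 1200),
         ("geneC", "chr2", 150, 160)]
print(nearest_gene(("chr1", 260, 300), genes))  # ('geneA', 60)
```

Swapping the roles of the two inputs gives the reverse lookup (nearest peak for each gene) with the same distance function.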

In silico derivatization for GC. The GC-derivatization tool converts carbonyl groups to C=N-OCH3 (MeOX) and transforms acidic protons into -Si(CH3)3 (TMS). Key functionalities include checking for specific groups, removing derivatization groups, and adding derivatization groups to molecules.

Stale · 1 · 2 years ago
Jupyter Notebook
MIT

AlphaPickle is a Python tool that converts AlphaFold and ColabFold output files into user-friendly CSV files and plots, enabling easy analysis and visualization of protein prediction data without requiring programming expertise. It processes .pkl, .json, and PDB files to extract and visualize metrics like pLDDT and PAE.

Stale · 33 · 2 years ago
Python
GPL-3.0

NuclearPhaser is a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs. This is an overview of the phasing pipeline for dikaryons.

Stale · 13 · 3 years ago
Python
GPL-3.0

VerityMap is a tool for mapping long reads to assemblies of extra-long tandem repeats, producing SAM files and identifying potential heterozygous sites and assembly errors through analysis of rare k-mers. It supports PacBio HiFi and ONT reads and generates interactive HTML plots for variant analysis.

Stale · 39 · 3 years ago
C
GPL-3.0

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · 14 · 3 years ago
Python
MIT

MITObim - mitochondrial baiting and iterative mapping

Stale · 116 · 5 years ago
Perl
MIT

Finds SNP sites from a multi-FASTA alignment file.

Stale · 279 · 5 years ago
C
NOASSERTION
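
Finding SNP sites in a multi-FASTA alignment, as the entry above describes, amounts to scanning alignment columns for more than one distinct base. The sketch below is a minimal illustration, not the listed tool's code: it assumes all sequences are the same length and counts only A, C, G and T, so gaps and ambiguity codes never create a site on their own. Function names are hypothetical.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def snp_columns(sequences):
    """Return 0-based alignment columns holding more than one of A/C/G/T."""
    sites = []
    for index, column in enumerate(zip(*sequences)):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            sites.append(index)
    return sites


alignment = """>s1
ACGT
>s2
ACGA
>s3
A-GT
"""
records = parse_fasta(alignment)
print(snp_columns(list(records.values())))  # [3]
```

Column 1 holds a gap in one sequence but only one real base, so it is not reported; column 3 holds both T and A and is.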

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · 92 · 6 years ago
Python
MIT

Natural Product-likeness calculator v-2.1: calculates the natural product-likeness of small molecules based on open data of natural products.

Stale · 4 · 8 years ago
Java

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, from RBP motif search (database or user-supplied RBP motifs) in genomic regions, over motif enrichment and co-occurrence analysis, in-depth comparisons over multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods as well as comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.

HyPPI classifies a protein-protein complex based on its interaction type into permanent, transient, or crystal artifact. Permanent protein-protein complexes are only stable in their complexed state. Their subunits would denature upon dissociation of the protein-protein complex. Transient protein-protein complexes are stable in the complexed as well as in the monomeric form, depending on the necessary function of the complex. Crystal artifacts have no biological function and are artificially formed during the crystallization process. The discrimination is performed using two characteristics of the protein-protein complex, the hydrophobicity of the interface (ΔGhydrophobic) and the quotient of interface area ratios (IF-quotient). The IF-quotient considers whether the protein-protein interface is symmetric.

JAMDA enables the preparation of individual protein structures and the docking of small molecules in preprocessed binding sites of choice. JAMDA simplifies the process of protein-ligand docking by automatic preprocessing protocols for the protein and binding sites of interest. The JAMDAscore scoring function retrieved 75% of the native poses in the three highest-ranked solutions for high-quality protein-ligand complexes with default settings. Individual configurations for protein preparation are available, e.g., considering protein ensembles, relevant binding site water molecules, or cofactors. A user-defined number of input conformations for the ligands of interest can be generated fully automatically using Conformator. Alternatively, users can also provide externally prepared ligand conformers.

DoGSite3 was developed for predicting robust and reliable small molecule binding sites and computing their geometrical and chemical descriptors. It is based on the grid-based DoGSite algorithm for predicting pockets and their sub-pockets. The new tool is largely rotation- and translation-invariant due to a normalization procedure before binding site prediction. Known ligands in the structure can be used to bias the grid by sufficiently buried ligand fragments. The output encompasses novel chemical binding site descriptors considering solvent accessibility. Compared to its predecessor, it shows increased robustness through comprehensive parameter optimization. DoGSite3 runs finish within seconds.

DoGSiteScorer is a grid-based automated pocket detection and analysis tool. It applies a Difference of Gaussian filter to detect potential binding pockets and splits them into sub-pockets. The method solely uses the 3D structure of the protein. Global properties, describing the size, shape, and chemical features of the predicted (sub-)pockets, are calculated. Per default, a simple druggability score based on a linear combination of the three descriptors describing volume, hydrophobicity, and enclosure is provided for each (sub-)pocket. Furthermore, a subset of meaningful descriptors is incorporated in a support vector machine (libsvm) to predict the (sub-)pocket druggability score (values are between zero and one). The higher the score, the more druggable the pocket is estimated to be.

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.
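
The attribute-type rules mentioned above come from the DICOM standard: Type 1 attributes must be present with a value, Type 2 attributes must be present but may be empty, Type 3 attributes are optional, and the 1C and 2C variants apply only when their condition holds. The sketch below illustrates those rules on a plain dictionary; it does not use the listed tool's code or its Excel requirements format, and the function name and return values are hypothetical.

```python
def check_attribute(dataset, keyword, attr_type, condition_met=False):
    """Return None if compliant, else a short issue label.

    dataset is a plain dict of {attribute keyword: value}, standing in for a
    parsed DICOM file. condition_met matters only for types 1C and 2C.
    """
    if attr_type in ("1C", "2C") and not condition_met:
        return None  # conditional attribute whose condition does not apply
    if attr_type == "3":
        return None  # optional attribute, never an error

    present = keyword in dataset
    if not present:
        return "missing"
    is_empty = dataset[keyword] in (None, "", [])
    if attr_type in ("1", "1C") and is_empty:
        return "empty"  # Type 1 needs a value; Type 2 may be zero-length
    return None


dataset = {"Modality": "SEG", "PatientName": ""}
requirements = [("Modality", "1"), ("PatientName", "2"),
                ("SeriesDescription", "3"), ("SegmentLabel", "1")]
for keyword, attr_type in requirements:
    print(keyword, attr_type, check_attribute(dataset, keyword, attr_type))
```

A real validator would also report the file path and the DICOM tag for each issue, as the description says the tool does.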

PoseEdit automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The structure mining tool GeoMine also uses this model to describe binding sites. In addition, users can manipulate the diagrams by translating, rotating, mirroring parts of the structure, adding additional interactions, or removing them. Furthermore, users can add individual labels or adjust available labels. Users can download the final 2D diagrams for a binding site of interest in JSON or SVG format.

METALizer predicts the coordination geometry of metal ions in metalloproteins. Users can compare potential coordination geometries to those found in the examined structure. The predicted coordination geometries and the observed metal interaction distances can be interactively compared to statistics calculated based on the PDB.

A tool that checks the quality of clinical metadata (validity, completeness), the integrity between the images and the clinical metadata provided as well as their accuracy, the de-identification protocol applied, and the existence of annotations together with the consistency between the images and the annotation files. It informs the user of corrective actions to take prior to data upload.

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
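
The tiered idea described above (an exact hash check first, similarity metrics second) can be sketched with the standard library alone. This is an illustration under several assumptions, not the listed tool's pipeline: SHA-256 stands in for the unspecified SHA variant, MAD is read as mean absolute difference, SSIM and the metadata tier are omitted, and the thresholds and function names are hypothetical.

```python
import hashlib
import math


def pixel_hash(pixels):
    """SHA-256 digest of the raw pixel bytes (assumed hash variant)."""
    return hashlib.sha256(pixels).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify_pair(pixels_a, pixels_b, cos_threshold=0.999, mad_threshold=1.0):
    """Tier 1: identical hash means exact copy. Tier 2: similarity metrics."""
    if pixel_hash(pixels_a) == pixel_hash(pixels_b):
        return "exact"
    if len(pixels_a) != len(pixels_b) or len(pixels_a) == 0:
        return "different"
    # Iterating over bytes yields integers in the range 0-255.
    cos = cosine_similarity(pixels_a, pixels_b)
    mad = mean_abs_diff(pixels_a, pixels_b)
    if cos >= cos_threshold and mad <= mad_threshold:
        return "near_duplicate"
    return "different"


a = bytes([10, 20, 30, 40])
print(classify_pair(a, bytes([10, 20, 30, 40])))  # exact
print(classify_pair(a, bytes([10, 20, 30, 41])))  # near_duplicate
print(classify_pair(a, bytes([200, 0, 5, 90])))   # different
```

In keeping with the description, the function only labels pairs; deciding what to do with a flagged pair is left to a human reviewer.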