Find open-source science resources

AlphaFind v2 is a tool for fast, structure‑based search of protein structures against the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/). The tool uses protein‑level embeddings as a rapid pre‑filter; the top candidates then undergo TM-score, RMSD and residue‑level alignment computations. Four complementary search modes are available: (i) whole‑chain search, (ii) pLDDT‑aware search that restricts similarity to high‑confidence regions, (iii) domain search against the TED database, and (iv) multidomain search that combines several chain‑level matches into a single score. Users can restrict queries to a given organism, a CATH superfamily, or proteins with experimental structures, and submit queries by UniProt/AlphaFold identifier. Results comprise a ranked list with similarity metrics, rich metadata and an interactive 3‑D superposition view. The service is freely accessible at https://alphafind.ics.muni.cz/.
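
The two-stage idea described above (a cheap embedding pre-filter followed by a more expensive rescoring of the top candidates) can be sketched as follows. This is only an illustration, not AlphaFind's implementation: the cosine similarity measure, the function names, the protein identifiers and the placeholder rescoring values are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def prefilter(query_vec, database, top_k):
    """Stage 1: keep the top_k entries whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), pid) for pid, vec in database.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:top_k]]

def search(query_vec, database, top_k, rescore):
    """Stage 2: re-rank the pre-filtered candidates with a costlier score.

    `rescore` is a hypothetical callable standing in for a structural
    metric such as TM-score; it maps an identifier to a number.
    """
    candidates = prefilter(query_vec, database, top_k)
    return sorted(candidates, key=rescore, reverse=True)

# Hypothetical two-dimensional embeddings and placeholder structural scores.
database = {"protA": [1.0, 0.0], "protB": [0.9, 0.1], "protC": [0.0, 1.0]}
placeholder_scores = {"protA": 0.6, "protB": 0.8, "protC": 0.9}

candidates = prefilter([1.0, 0.0], database, top_k=2)
ranked = search([1.0, 0.0], database, top_k=2, rescore=placeholder_scores.get)
```

Here `protC` never reaches the rescoring stage despite its high placeholder score, which is the trade-off any pre-filter makes in exchange for speed.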

Circlator

Sequence assembly

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

Python

PoseView

Protein interactions

PoseView automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The layout adheres to the conventions of chemical structure diagram generation. The quality of the resulting diagrams is comparable to manually drawn examples from books and scientific publications.

EdinOmics Dash App

Metabolomics

Protein structure analysis

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

Python

CC-BY-4.0

MicroMiner

MicroMiner assists in identifying single-residue substitutions in protein structure databases. It searches protein residue environments with local sequence and structural similarity based on the SIENA methodology. Users can search for structural mutations in the entire PDB, their in-house structure collection, or (subsets of) the AlphaFold Database. They can use the method to explore the mutation landscape of proteins with experimental or predicted structures. MicroMiner can be applied to single domains or even protein-protein or protein-ligand interfaces. Several filter options are available to simplify downstream analysis.

SIENA

Protein binding sites

SIENA is a software pipeline enabling the fully automated construction of protein structure ensembles from the PDB. Starting with a single query structure, all binding sites with high sequence similarity are extracted from the PDB, aligned, and superimposed. SIENA also handles complicated cases, such as comparing binding sites at protein domain interfaces or within multimeric proteins.

GeoMine

Protein interactions

GeoMine enables the automated mining of protein-ligand binding sites. Based on individually designed queries, users can search for spatial interaction patterns in huge collections of protein-ligand complexes and binding pockets. The regularly updated GeoMine database relies on the free database systems SQLite and PostgreSQL. It supports radius-based pockets (based on ligands) and predicted pockets (based on DoGSite3) for query generation. Query management is based on XML (for the REST service) or JSON (in GUI mode). Its output consists of the query-based superpositions of the matched binding sites and statistics on matching points, distances, and angles.

WarPP

Protein properties

WarPP predicts the position and orientation of water molecules in small-molecule binding sites. It places and scores water molecules in binding sites of crystallographic structures based on EDIAscorer results and interaction geometries as known from experimentally solved protein structures. WarPP was validated on a high-quality set of 1,500 protein-ligand complexes, containing 20,000 crystallographically observed water molecules. It is sufficiently fast for high-throughput analyses. It correctly places water molecules in approx. 80% of the cases. Users can export the predictions as PDB files for, e.g., molecular docking with JAMDA.

Protoss

Protein interactions

Protoss is a fully automated hydrogen atom placement tool for protein-ligand complexes. It adds missing hydrogen atoms to protein structures and detects reasonable protonation states, tautomeric states, and hydrogen coordinates of both protein and ligand molecules by optimizing the hydrogen bond network.

PrimerPickr

PCR experiment

PrimerPickr is an open-source tool for RT-PCR primer picking, powered by aggregating the public usage of RT-PCR primers reported in open-source papers. The database is validated with 154 genes and contains over 31,000 genes across 10 species.

CC-BY-NC-ND-4.0

EDIAscorer

Structure analysis

The electron density score for individual atoms (EDIA) quantifies the electron density fit of each atom in a crystallographically resolved structure. Multiple EDIA values can be combined using the power mean to compute the EDIAm, i.e., the electron density score for a group of several atoms. It enables users to score a set of atoms, such as a ligand, a residue, or an active site.
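
As a rough illustration of combining per-atom scores with a power mean, the sketch below implements the generic power (generalized) mean. It is not the EDIAscorer code: the exponent that EDIAm actually uses is not stated here, so it is left as a parameter, and the example scores are hypothetical.

```python
def power_mean(values, exponent):
    """Generalized (power) mean of positive values for a non-zero exponent."""
    if not values:
        raise ValueError("values must not be empty")
    if exponent == 0:
        raise ValueError("exponent must be non-zero")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return (sum(v ** exponent for v in values) / len(values)) ** (1.0 / exponent)

# Hypothetical per-atom scores for a three-atom group (not real EDIA values).
atom_scores = [0.9, 0.8, 0.3]

arithmetic = power_mean(atom_scores, 1)   # exponent 1 is the ordinary average
penalizing = power_mean(atom_scores, -2)  # assumed negative exponent for illustration
```

With a negative exponent the result is pulled towards the lowest value, so a single poorly supported atom lowers the group score more than it would in a plain average.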

StructureProfiler

Structure analysis

Three-dimensional protein structures play a vital role in drug design. Structure-based design necessitates an in-depth examination of the available quality data before using the structure in computational experiments and for method evaluation. StructureProfiler assists in automatically profiling sets of protein-ligand complex structures based on multiple quality indicators, ranging from model characteristics, e.g., the R factor, and active site features, e.g., bond length deviations, to ligand properties such as electron density support and the validity of torsion angles.

LifeSoaks

X-ray diffraction

LifeSoaks was designed to find solvent channels in macromolecular structures solved by X-ray crystallography. It predicts their accessibility to molecules through an automated annotation of so-called bottleneck radii. It simplifies the process of manually checking a crystal structure for solvent channels. Bottleneck radii can be calculated for solvent channels and small-molecule binding sites. The tool is well suited for channel analyses before the actual soaking experiments, to select the most promising experimental conditions and crystal forms. LifeSoaks runs fully automatically and finishes within seconds to minutes for moderately sized crystals.

Transcription factors and regulatory sites

plantiSMASH

PlantiSMASH is a specialized extension of antiSMASH for the identification and analysis of biosynthetic gene clusters (BGCs) in plant genomes. It supports advanced plant-specific detection rules and features for comparative genomics, visualization, and more.

AGPL-3.0

DigestedProteinDB

Proteomics

DigestedProteinDB provides a scalable computational infrastructure for indexing and querying peptide cleavage data. Designed for seamless integration into high-throughput mass spectrometry pipelines, it enables low-latency searches and advanced filtering of digested protein datasets to accelerate experimental spectra cross-referencing.

Open Neuroscience Graph

Biosciences

The Open Neuroscience Graph (openneuroscience.org) is an open-access, curated knowledge graph that maps the open science ecosystem in neuroscience as a browsable digital garden. Built from an Obsidian vault and published as a static website using Quartz, the project replaces traditional linear presentation with a networked structure of interlinked Markdown notes. Bidirectional links, full-text search, and an integrated graph visualization allow users to navigate thematic relationships dynamically rather than sequentially. The complete source material is openly available to sustain, replicate and extend the resource, including all Markdown content, media attachments, Quartz configuration files, and site customizations. Researchers, educators, and open-science practitioners may explore the site directly, download the vault for offline use in Obsidian, or fork the material to build new, derivative knowledge bases. PID: https://doi.org/10.5281/zenodo.20181900

JavaScript

CC-BY-4.0

phantasus

Gene expression

Phantasus is a web application for visual and interactive gene expression analysis. It is based on Morpheus, a web-based tool for heatmap visualisation and analysis, which was integrated with an R environment via the OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis and differential expression analysis with the limma package are supported.

MIT

Bioperl

Package suites

International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences.

Rust-Bio

Package suites

Rust implementations of algorithms and data structures useful for bioinformatics.

Biojava

Package suites

Java framework for processing biological data.

SRA-Explorer

Downloading

Easily get SRA download links and other information.

BioNode

Modular and universal bioinformatics. Bionode provides pipeable UNIX command-line tools and JavaScript APIs for bioinformatics analysis workflows.

grabix

A wee tool for random access into BGZF files.

grepq

Fast FASTQ filtering by matching reads against one or more regex patterns.
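
The general technique (keeping FASTQ reads whose sequence matches at least one regular expression) can be sketched in a few lines. This is not grepq's implementation or command-line interface; the function names, patterns and reads below are hypothetical, and the sketch assumes well-formed four-line FASTQ records.

```python
import re

def parse_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    Blank lines are skipped and a trailing incomplete record is ignored.
    """
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual

def filter_reads(lines, patterns):
    """Yield records whose sequence matches any of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    for header, seq, qual in parse_fastq(lines):
        if any(rx.search(seq) for rx in compiled):
            yield header, seq, qual

# Two hypothetical reads; only the first contains the motif TTAGGG.
fastq_text = """@r1
ACGTTTAGGG
+
IIIIIIIIII
@r2
CCCCCCCCCC
+
IIIIIIIIII
"""

matches = list(filter_reads(fastq_text.splitlines(), [r"TTAGGG", r"^GATTACA"]))
matched_headers = [header for header, _, _ in matches]
```

Compiling the patterns once, outside the per-read loop, avoids repeated regex compilation on large files.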

wormtable

Write-once-read-many table for large datasets.

zindex

Create an index on a compressed text file.

BigDataScript

Workflow Managers

A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities.

SeqWare

Workflow Managers

Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments.

Workflow Description Language

Workflow Managers

Workflow standard developed by the Broad Institute.

Bactopia

A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes.

Bacannot

A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results.

R-Peridot

Customizable pipeline for differential expression analysis with an intuitive GUI.

ngs-preprocess

A pipeline for preprocessing short and long sequencing reads, built with Nextflow.

MultiQC

Sequence Processing

Aggregate results from bioinformatics analyses across many samples into a single report.

SeqKit

Sequence Processing

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang.

Seqtk

Sequence Processing

Toolkit for processing sequences in FASTA/Q formats.

Hail

Data Analysis

Scalable genomic analysis.

GLNexus

Data Analysis

Scalable gVCF merging and joint variant calling for population sequencing projects.

Bowtie 2

Pairwise

An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

WFA

Pairwise

The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment.

DIAMOND

Pairwise

Multiple Sequence Alignment

An ultrafast protein aligner for `blastp`- and `blastx`-like searches.

POA

Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences.

RSEM

Quantification

A software package for estimating gene and isoform expression levels from RNA-Seq data.

freebayes

Variant Calling

Bayesian haplotype-based polymorphism discovery and genotyping.

GATK

Variant Calling

Structural variant callers

Variant Discovery in High-Throughput Sequencing Data.

manta

Structural variant and indel caller for mapped sequencing data.

mergesam

BAM File Utilities

Automate common SAM & BAM conversions.

SAMstat

BAM File Utilities

Displaying sequence statistics for next-generation sequencing.

Somalier

BAM File Utilities

Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs.

vcfanno

VCF File Utilities