Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
3,202 of 5,923 resources
Showing 1,001–1,050
AlphaFind v2 is a tool for fast, structure‑based search of protein structures against the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/). The tool uses protein‑level embeddings to provide a rapid pre‑filter, with top candidates undergoing TM-score, RMSD, and residue‑level alignment computations. Four complementary search modes are available: (i) whole‑chain search, (ii) pLDDT‑aware search that restricts similarity to high‑confidence regions, (iii) domain search against the TED database, and (iv) multidomain search that combines several chain‑level matches into a single score. Users can restrict queries to a given organism, CATH superfamily or to proteins with experimental structures, and submit queries by UniProt/AlphaFold identifier. Results comprise a ranked list with similarity metrics, rich metadata and an interactive 3‑D superposition view. The service is freely accessible at https://alphafind.ics.muni.cz/.
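The two-stage idea described above (a cheap embedding pre-filter followed by a more expensive comparison of the top candidates) can be sketched roughly as follows. This is only an illustration of the general pattern, not AlphaFind's implementation: the function names, the cosine similarity measure, and the `expensive_score` callback are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prefilter_then_rerank(query_vec, database, expensive_score, k=2):
    """Hypothetical two-stage search.

    database: dict mapping an identifier to its embedding vector.
    expensive_score: callable(identifier) -> float, standing in for a costly
        structural comparison (higher is better); an assumption of this sketch.
    """
    # Stage 1: keep only the k entries whose embeddings are closest to the query.
    shortlist = sorted(
        database,
        key=lambda pid: cosine(query_vec, database[pid]),
        reverse=True,
    )[:k]
    # Stage 2: re-rank the shortlist with the expensive score.
    return sorted(shortlist, key=expensive_score, reverse=True)
```

The point of the design is that the expensive score is only evaluated `k` times, regardless of how large the database is.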
Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.
PoseView automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction mode that relies on atom types and simple geometric criteria. It adheres to the conventions of chemical structure diagram generation. The quality of the resulting diagrams is comparable to manually drawn examples from books and scientific publications.
An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.
MicroMiner assists in identifying single-residue substitutions in protein structure databases. It searches protein residue environments with local sequence and structural similarity based on the SIENA methodology. Users can search for structural mutations in the entire PDB, their in-house structure collection, or (subsets of) the AlphaFold Database. They can use the method to explore the mutation landscape of proteins with experimental or predicted structures. MicroMiner can be applied to single domains or even protein-protein or protein-ligand interfaces. Several filter options to simplify downstream analysis are available.
SIENA is a software pipeline enabling the fully automated construction of protein structure ensembles from the PDB. Starting with a single query structure, all binding sites with high sequence similarity are extracted from the PDB, aligned, and superimposed. SIENA also handles complicated cases, such as comparing binding sites at protein domain interfaces or within multimeric proteins.
GeoMine enables the automated mining of protein-ligand binding sites. Based on individually designed queries, users can search for spatial interaction patterns in huge collections of protein-ligand complexes and binding pockets. The regularly updated GeoMine database relies on the free database systems SQLite and PostgreSQL. It supports radius-based pockets (based on ligands) and predicted pockets (based on DoGSite3) for query generation. The query management is based on XML (for the REST service) or JSON in the GUI mode. Its output consists of the query-based superpositions of the matched binding sites and statistics on matching points, distances, and angles.
WarPP predicts the position and orientation of water molecules in small-molecule binding sites. It places and scores water molecules in binding sites of crystallographic structures based on EDIAscorer results and interaction geometries as known from experimentally solved protein structures. WarPP was validated on a high-quality set of 1,500 protein-ligand complexes, containing 20,000 crystallographically observed water molecules. It is sufficiently fast for high-throughput analyses. It correctly places water molecules in approx. 80% of the cases. Users can export the predictions as PDB files for, e.g., molecular docking with JAMDA.
Protoss is a fully automated hydrogen atom placement tool for protein-ligand complexes. It adds missing hydrogen atoms to protein structures and detects reasonable protonation states, tautomeric states, and hydrogen coordinates of both protein and ligand molecules by optimizing the hydrogen bond network.
Primerpickr is an open-source tool for rt-PCR primer picking, powered by aggregating the public usage of rt-PCR primers from open-source papers. The database is validated with 154 genes and contains over 31,000 genes across 10 species.
The electron density score for individual atoms (EDIA) quantifies the electron density fit of each atom in a crystallographically resolved structure. Multiple EDIA values can be combined using the power mean to compute the EDIAm, i.e., the electron density score for a group of several atoms. It enables users to score a set of atoms, such as a ligand, a residue, or an active site.
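The power mean mentioned above can be written as a short function. This is a generic sketch of the power mean only: the description does not state which exponent or any additional offsets EDIAm uses, so `exponent` is left as a parameter and the input values below are made up for illustration.

```python
import math


def power_mean(values, exponent):
    """Generalized (power) mean of positive values.

    exponent = 1 gives the arithmetic mean, exponent = -1 the harmonic mean,
    and exponent = 0 is treated as its limit, the geometric mean.
    Negative exponents pull the result toward the smallest value, so a single
    poorly fitting atom lowers the combined score noticeably.
    """
    if not values:
        raise ValueError("power_mean needs at least one value")
    if any(v <= 0 for v in values):
        # Zero or negative inputs are undefined for negative exponents and logs.
        raise ValueError("this sketch only handles positive values")
    n = len(values)
    if exponent == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** exponent for v in values) / n) ** (1.0 / exponent)
```

For example, with per-atom scores of 1.0 and 0.5, an exponent of -2 yields a combined value below the arithmetic mean of 0.75, which reflects the penalty for the weaker atom.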
Three-dimensional protein structures play a vital role in drug design. Structure-based design necessitates an in-depth examination of the available quality data before using the structure in computational experiments and for method evaluation. StructureProfiler assists in automatically profiling sets of protein-ligand complex structures based on multiple quality indicators, ranging from model characteristics, e.g., the R factor, and active site features, e.g., bond length deviations, to ligand properties such as electron density support and the validity of torsion angles.
LifeSoaks was designed to find solvent channels in macromolecular structures solved by X-ray crystallography. It predicts their accessibility by molecules through an automated annotation of so-called bottleneck radii. It simplifies the process of manually checking a crystal structure for solvent channels. Bottleneck radii can be calculated for solvent channels and small molecule binding sites. The tool is ideally suited for channel analyses before the actual soaking experiments to select the most promising experimental conditions and crystal forms. LifeSoaks runs fully automated and will finish within seconds to minutes for moderately sized crystals.
PlantiSMASH is a specialized extension of antiSMASH for the identification and analysis of biosynthetic gene clusters (BGCs) in plant genomes. It supports advanced plant-specific detection rules and features for comparative genomics, visualization, and more.
DigestedProteinDB provides a scalable computational infrastructure for indexing and querying peptide cleavage data. Designed for seamless integration into high-throughput mass spectrometry pipelines, it enables low-latency searches and advanced filtering of digested protein datasets to accelerate experimental spectra cross-referencing.
The Open Neuroscience Graph (openneuroscience.org) is an open-access, curated knowledge graph that maps the open science ecosystem in neuroscience as a browsable digital garden. Built from an Obsidian vault and published as a static website using Quartz, the project replaces traditional linear presentation with a networked structure of interlinked Markdown notes. Bidirectional links, full-text search, and an integrated graph visualization allow users to navigate thematic relationships dynamically rather than sequentially. The complete source material is openly available to sustain, replicate and extend the resource, including all Markdown content, media attachments, Quartz configuration files, and site customizations. Researchers, educators, and open-science practitioners may explore the site directly, download the vault for offline use in Obsidian, or fork the material to build new, derivative knowledge bases. PID=https://doi.org/10.5281/zenodo.20181900
Phantasus is a web application for visual and interactive gene expression analysis. It is based on Morpheus, a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via the OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis, or differential expression analysis with the limma package are supported.
International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences.
Rust implementations of algorithms and data structures useful for bioinformatics.
Java framework for processing biological data.
Easily get SRA download links and other information.
Modular and universal bioinformatics. Bionode provides pipeable UNIX command-line tools and JavaScript APIs for bioinformatics analysis workflows.
A wee tool for random access into BGZF files.
Fast FASTQ filtering by matching reads against one or more regex patterns.
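As a rough illustration of what regex-based FASTQ filtering involves, the sketch below keeps every read whose sequence matches at least one pattern. It is not the listed tool's implementation or command-line interface; the function names are hypothetical, and the input is assumed to be well-formed four-line FASTQ records.

```python
import re


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes well-formed records of exactly four lines each; blank lines
    between records are skipped.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def filter_fastq(lines, patterns):
    """Yield the records whose sequence matches at least one regex pattern."""
    compiled = [re.compile(p) for p in patterns]
    for header, seq, qual in read_fastq(lines):
        if any(rx.search(seq) for rx in compiled):
            yield header, seq, qual
```

Because both functions are generators, the file is streamed record by record, so memory use stays flat even for large inputs. In practice the `lines` argument could be an open file handle.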
Write-once-read-many table for large datasets.
Create an index on a compressed text file.
A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities.
Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments.
Workflow standard developed by the Broad.
A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes.
A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results.
Customizable pipeline for differential expression analysis with an intuitive GUI.
A pipeline for preprocessing short and long sequencing reads, built with Nextflow.
Aggregate results from bioinformatics analyses across many samples into a single report.
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang.
Toolkit for processing sequences in FASTA/Q formats.
Scalable genomic analysis.
Scalable gVCF merging and joint variant calling for population sequencing projects.
An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment.
Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences.
A software package for estimating gene and isoform expression levels from RNA-Seq data.
Bayesian haplotype-based polymorphism discovery and genotyping.
Variant Discovery in High-Throughput Sequencing Data.
Structural variant and indel caller for mapped sequencing data.
Automate common SAM & BAM conversions.
Displaying sequence statistics for next-generation sequencing.
Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs.
Annotate a VCF with other VCFs/BEDs/tabixed files.