Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active: 839
Idle: 417
Stale: 346
Archived: 14
(None): 4,345

Domain

Software: 422
ImmunoOncology: 251
Microarray: 138
Infrastructure: 123
GeneExpression: 117
Sequencing: 85
SingleCell: 72
Protein & Drug Discovery: 66
text-generation: 65
Visualization: 61
Genetics: 52
Annotation: 51
(None): 2,338

Language

R: 2,426
Python: 510
Jupyter Notebook: 57
HTML: 31
C: 24
Makefile: 20
C++: 19
JavaScript: 17
Java: 12
Shell: 11
TypeScript: 7
Web Ontology Language: 7
(None): 2,765

License

GPL-3.0: 624
MIT: 577
Artistic-2.0: 555
CC-BY-4.0: 270
GPL-2.0: 251
GPL-2.0+: 245
Apache-2.0: 126
CC0-1.0: 120
GPL-3.0+: 100
NOASSERTION: 94
CC-BY-3.0: 83
Other: 61
(None): 2,404

Source

bioregistry: 2,419
bioconductor: 2,418
github: 1,300
awesome-ai-for-science: 422
huggingface: 322
bio.tools: 131
awesome-bioinformatics: 126
awesome-python-chemistry: 87
awesome-cheminformatics: 45
awesome-scientific-python: 18

Type

Software tool: 3,220
Database: 2,419
AI model: 322

5,961 resources indexed

Showing 3,851–3,900

Zazuko Prefix Server

This service fills a gap between services like prefix.cc and LOV or looking up the original vocabulary specification. Not all vocabularies (or schema or ontology, whatever you want to call them) provide an HTML view. If you resolve some of the common prefixes all you get back is some RDF serialization which is not ideal. (from <https://prefix.zazuko.com/about>)

Standard-Thesaurus Wirtschaft

The thesaurus contains vocabulary covering all economic subject areas: more than 6,000 subject headings and more than 28,000 additional synonyms that serve as entry terms to catch individual phrasings during a search. Technical terms from neighboring fields such as law, sociology, or politics, as well as geographic terms, are also included. If you select subject headings from this vocabulary, you can be sure to get hits that match the desired subject area. [from homepage]

zea

zeco

Zenodo

Zenodo is an open repository that allows researchers to deposit research papers, data sets, research software, reports, and any other research related digital artefacts.

zfa

Zebrafish Information Network Gene

zfs

ZINC is not Commercial

zobodat.person

zobodat.taxon

zp

RBPBench

RBPBench is a multi-function tool to evaluate CLIP-seq and other related genomic region data using a comprehensive collection of known RNA-binding protein (RBP) binding motifs. RBPBench can be used for a variety of purposes, from RBP motif search (database or user-supplied RBP motifs) in genomic regions, over motif enrichment and co-occurrence analysis, in-depth comparisons over multiple datasets via sequence and genomic annotation statistics, to benchmarking CLIP-seq peak caller methods as well as comparisons across cell types and CLIP-seq protocols. RBPBench supports both sequence and structure motifs, as well as regular expressions (sequence and structure patterns). Moreover, users can easily provide their own motif collections.

HyPPI

Protein interactions

HyPPI classifies a protein-protein complex based on its interaction type into permanent, transient, or crystal artifact. Permanent protein-protein complexes are only stable in their complexed state. Their subunits would denature upon dissociation of the protein-protein complex. Transient protein-protein complexes are stable in the complexed as well as in the monomeric form, depending on the necessary function of the complex. Crystal artifacts have no biological function and are artificially formed during the crystallization process. The discrimination is performed using two characteristics of the protein-protein complex, the hydrophobicity of the interface (ΔGhydrophobic) and the quotient of interface area ratios (IF-quotient). The IF-quotient considers whether the protein-protein interface is symmetric.

JAMDA

Molecular modelling

JAMDA enables the preparation of individual protein structures and the docking of small molecules in preprocessed binding sites of choice. JAMDA simplifies the process of protein-ligand docking by automatic preprocessing protocols for the protein and binding sites of interest. The JAMDAscore scoring function retrieved 75% of the native poses in the three highest-ranked solutions for high-quality protein-ligand complexes with default settings. Individual configurations for protein preparation are available, e.g., considering protein ensembles, relevant binding site water molecules, or cofactors. A user-defined number of input conformations for the ligands of interest can be generated fully automatically using Conformator. Alternatively, users can also provide externally prepared ligand conformers.

DoGSite3

Protein binding sites

DoGSite3 was developed for predicting robust and reliable small molecule binding sites and computing their geometrical and chemical descriptors. It is based on the grid-based DoGSite algorithm for predicting pockets and their sub-pockets. The new tool is largely rotation- and translation-invariant due to a normalization procedure before binding site prediction. Known ligands in the structure can be used to bias the grid by sufficiently buried ligand fragments. The output encompasses novel chemical binding site descriptors considering solvent accessibility. Compared to its predecessor, it shows increased robustness through comprehensive parameter optimization. DoGSite3 runs finish within seconds.

DoGSiteScorer

Protein binding sites

DoGSiteScorer is a grid-based automated pocket detection and analysis tool. It applies a Difference of Gaussian filter to detect potential binding pockets and splits them into sub-pockets. The method solely uses the 3D structure of the protein. Global properties, describing the size, shape, and chemical features of the predicted (sub-)pockets, are calculated. Per default, a simple druggability score based on a linear combination of the three descriptors describing volume, hydrophobicity, and enclosure is provided for each (sub-)pocket. Furthermore, a subset of meaningful descriptors is incorporated in a support vector machine (libsvm) to predict the (sub-)pocket druggability score (values are between zero and one). The higher the score, the more druggable the pocket is estimated to be.

QP-Insights Uploader

Medical imaging

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

CC-BY-NC-ND-4.0

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.
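The attribute-type check described above can be illustrated with a minimal sketch. It assumes the standard DICOM meaning of the types (Type 1: required with a value; Type 2: required but may be empty; Type 3: optional; 1C/2C: the same rules, applied only when a condition holds). The function name, the tuple layout of the requirements, and the tag names `TagA` to `TagE` are hypothetical; they do not come from the tool or its Excel file.

```python
def validate_attributes(dataset, requirements):
    """Return a list of (tag, problem) pairs.

    dataset: dict mapping tag name -> value ("" or None means present but empty).
    requirements: list of (tag, attr_type, condition); condition is a callable
    taking the dataset, or None. This layout is an assumption for illustration.
    """
    issues = []
    for tag, attr_type, condition in requirements:
        # Conditional types only apply when their condition is met.
        if attr_type in ("1C", "2C") and not (condition and condition(dataset)):
            continue
        # Type 3 attributes are optional, so nothing to check.
        if attr_type == "3":
            continue
        if tag not in dataset:
            issues.append((tag, "missing"))
        elif attr_type in ("1", "1C") and dataset[tag] in (None, ""):
            # Type 1 and 1C attributes must carry a value.
            issues.append((tag, "empty"))
    return issues


# Hypothetical tags and requirements, not taken from the DICOM standard.
dataset = {"TagA": "SEG", "TagB": "", "TagC": ""}
requirements = [
    ("TagA", "1", None),
    ("TagB", "2", None),   # empty is allowed for Type 2
    ("TagC", "1", None),   # empty is not allowed for Type 1
    ("TagD", "1C", lambda ds: ds.get("TagA") == "SEG"),
    ("TagE", "3", None),
]
issues = validate_attributes(dataset, requirements)
for tag, problem in issues:
    print(f"{tag}: {problem}")
```

A real validator would also check value validity and report file paths, as the description notes; this sketch covers only presence and emptiness.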

PoseEdit

Protein interactions

PoseEdit automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. The structure mining tool GeoMine also uses this model to describe binding sites. In addition, users can manipulate the diagrams by translating, rotating, mirroring parts of the structure, adding additional interactions, or removing them. Furthermore, users can add individual labels or adjust available labels. Users can download the final 2D diagrams for a binding site of interest in JSON or SVG format.

METALizer

Protein interactions

METALizer predicts the coordination geometry of metal ions in metalloproteins. Users can compare potential coordination geometries to those found in the examined structure. The predicted coordination geometries and the observed metal interaction distances can be interactively compared to statistics calculated based on the PDB.

Data Integration Quality Check Tool (DIQCT)

Data quality management

A tool that checks the clinical metadata quality (validity, completeness), the integrity between images and clinical metadata provided as well as their accuracy, the de-identification protocol applied, and existence of annotation together with the consistency between the images and the annotation files and informs the user on corrective actions prior to data upload.

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
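A tiered comparison of this kind might look like the following sketch. It is not the tool's implementation: pixel data is represented as a flat list of 8-bit integers, "MAD" is assumed here to mean mean absolute difference, SSIM is left out, and the function names and thresholds (`cos_min`, `mad_max`) are hypothetical.

```python
import hashlib
import math


def pixel_hash(pixels):
    """SHA-256 digest of the raw pixel bytes (values must be 0-255)."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify_pair(a, b, cos_min=0.99, mad_max=2.0):
    """Tier 1: identical hash. Tier 2: similarity metrics. Thresholds are assumed."""
    if len(a) != len(b):
        return "different"
    if pixel_hash(a) == pixel_hash(b):
        return "exact"
    if cosine_similarity(a, b) >= cos_min and mean_abs_diff(a, b) <= mad_max:
        return "near-duplicate"
    return "different"


original = [10, 200, 30, 40]
reexport = [10, 201, 30, 40]   # one pixel off by one
unrelated = [200, 10, 40, 30]

print(classify_pair(original, list(original)))
print(classify_pair(original, reexport))
print(classify_pair(original, unrelated))
```

The sketch only labels pairs; consistent with the description, nothing is modified or deleted.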

NanoporeDB

Computational biology

NanoporeDB is an open-access structural database dedicated to the exploration, analysis, and design of protein nanopores, which serve as essential molecular gateways in biological membranes and form the basis of many advanced biosensing and sequencing technologies. This platform integrates large-scale structure-guided mining and deep learning-based modeling using AlphaFold-Multimer and AlphaFold3 to provide about 7,000 high-confidence multimeric nanopore structures. Each entry includes detailed information on membrane embedding, pore geometry annotation, and constriction profiling to support functional and biophysical interpretation. Through an interactive 3D visualization interface and quantitative parameters such as tilt angle, insertion depth, and pore geometry, NanoporeDB enables researchers to explore nanopore diversity, discover novel scaffolds, and accelerate innovation in molecular sensing, precision diagnostics, and synthetic biology.

generate_count_matrix

Transcriptomics

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
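As a rough illustration of the merge described above (not the Galaxy tool's code), the sketch below combines per-sample count tables into one matrix and checks that the feature column is identical across inputs. The two-column tab-separated layout, the function name `combine_counts`, and the sample names are assumptions.

```python
import csv
import io


def combine_counts(named_inputs):
    """Merge per-sample count tables into a single matrix.

    named_inputs: list of (sample_name, text) pairs, where each text holds
    tab-separated "feature<TAB>count" lines (assumed layout).
    """
    tables = []
    for name, text in named_inputs:
        rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
        tables.append((name, [r[0] for r in rows], [r[1] for r in rows]))

    # The feature column must be identical for all inputs.
    features = tables[0][1]
    for name, feats, _ in tables:
        if feats != features:
            raise ValueError(f"feature column of {name!r} differs from the first input")

    # One column per input, one row per feature.
    matrix = [["feature"] + [name for name, _, _ in tables]]
    for i, feat in enumerate(features):
        matrix.append([feat] + [counts[i] for _, _, counts in tables])
    return matrix


sample_a = "geneA\t10\ngeneB\t0\n"
sample_b = "geneA\t7\ngeneB\t3\n"
matrix = combine_counts([("sample_a", sample_a), ("sample_b", sample_b)])
for row in matrix:
    print("\t".join(row))
```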

s3segmenter

S3segmenter is a Matlab-based set of functions that generates single cell (nuclei and cytoplasm) label masks.

CompuCell3D

Systems biology

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

miniconda

Software management

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

FigCanvas

FigCanvas is an AI scientific figure generator for life-science researchers. It produces publication-ready biological diagrams (mechanism diagrams, pathway figures, cell biology visuals), CONSORT and methodology flowcharts, and data visualizations such as volcano plots from text prompts or uploaded datasets. The tool turns methods-section text or structured data into editable vector figures suitable for manuscripts, posters, and slides, helping researchers iterate on figures without rebuilding them in Illustrator.

AlphaFind v2

AlphaFind v2 is a tool for fast, structure‑based search for protein structures against the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/). The tool uses protein‑level embeddings to provide a rapid pre‑filter, with top candidates undergoing TM-score, RMSD, and residue‑level alignment computations. Four complementary search modes are available: (i) whole‑chain search, (ii) pLDDT‑aware search that restricts similarity to high‑confidence regions, (iii) domain search against the TED database, and (iv) multidomain search that combines several chain‑level matches into a single score. Users can restrict queries to a given organism, CATH superfamily or to proteins with experimental structures, and submit queries by UniProt/AlphaFold identifier. Results comprise a ranked list with similarity metrics, rich metadata and an interactive 3‑D superposition view. The service is freely accessible at https://alphafind.ics.muni.cz/.

Circlator

Sequence assembly

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

PoseView

Protein interactions

PoseView automatically generates 2D diagrams of protein-ligand complexes, focusing on the interactions between protein and ligand. Interactions between molecules are estimated by an underlying interaction model that relies on atom types and simple geometric criteria. It adheres to the conventions of chemical structure diagram generation. The quality of the resulting diagrams is comparable to manually drawn examples from books and scientific publications.

EdinOmics Dash App

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

MicroMiner

Protein structure analysis

MicroMiner assists in identifying single-residue substitutions in protein structure databases. It searches protein residue environments with local sequence and structural similarity based on the SIENA methodology. Users can search for structural mutations in the entire PDB, their in-house structure collection, or (subsets of) the AlphaFold Database. They can use the method to explore the mutation landscape of proteins with experimental or predicted structures. MicroMiner can be applied to single domains or even protein-protein or protein-ligand interfaces. Several filter options to simplify downstream analysis are available.

SIENA

Protein binding sites

SIENA is a software pipeline enabling the fully automated construction of protein structure ensembles from the PDB. Starting with a single query structure, all binding sites with high sequence similarity are extracted from the PDB, aligned, and superimposed. SIENA also handles complicated cases, such as comparing binding sites at protein domain interfaces or within multimeric proteins.

GeoMine

Protein interactions

GeoMine enables the automated mining of protein-ligand binding sites. Based on individually designed queries, users can search for spatial interaction patterns in huge collections of protein-ligand complexes and binding pockets. The regularly updated GeoMine database relies on the free database systems SQLite and PostgreSQL. It supports radius-based pockets (based on ligands) and predicted pockets (based on DoGSite3) for query generation. The query management is based on XML (for the REST service) or JSON in the GUI mode. Its output consists of the query-based superpositions of the matched binding sites and statistics on matching points, distances, and angles.

WarPP

Protein properties

WarPP predicts the position and orientation of water molecules in small-molecule binding sites. It places and scores water molecules in binding sites of crystallographic structures based on EDIAscorer results and interaction geometries as known from experimentally solved protein structures. WarPP was validated on a high-quality set of 1,500 protein-ligand complexes, containing 20,000 crystallographically observed water molecules. It is sufficiently fast for high-throughput analyses. It correctly places water molecules in approx. 80% of the cases. Users can export the predictions as PDB files for, e.g., molecular docking with JAMDA.

Protoss

Protein interactions

Protoss is a fully automated hydrogen atom placement tool for protein-ligand complexes. It adds missing hydrogen atoms to protein structures and detects reasonable protonation states, tautomeric states, and hydrogen coordinates of both protein and ligand molecules by optimizing the hydrogen bond network.

PrimerPickr

PrimerPickr is an open-source tool for RT-PCR primer picking, powered by aggregating the public usage of RT-PCR primers from open-source papers. The database is validated with 154 genes and contains over 31,000 genes across 10 species.

CC-BY-NC-ND-4.0

EDIAscorer

Structure analysis

The electron density score for individual atoms (EDIA) quantifies the electron density fit of each atom in a crystallographically resolved structure. Multiple EDIA values can be combined using the power mean to compute the EDIAm, i.e., the electron density score for a group of several atoms. It enables users to score a set of atoms, such as a ligand, a residue, or an active site.
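The power-mean combination mentioned above can be sketched as follows. This is a generic power mean, not EDIAscorer's code: the source does not state the exponent, so `exponent=-2` in the example is an assumed value, and the per-atom scores are made up for illustration.

```python
def power_mean(values, exponent):
    """Generalized (power) mean of positive values for a non-zero exponent."""
    if not values:
        raise ValueError("need at least one value")
    if exponent == 0:
        raise ValueError("exponent 0 (geometric mean) is not handled in this sketch")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return (sum(v ** exponent for v in values) / len(values)) ** (1.0 / exponent)


# Hypothetical per-atom scores for a small group of atoms, e.g. a ligand.
atom_scores = [0.9, 0.8, 0.2]
group_score = power_mean(atom_scores, exponent=-2)  # exponent is an assumption
arithmetic = sum(atom_scores) / len(atom_scores)

print(f"power mean: {group_score:.3f}, arithmetic mean: {arithmetic:.3f}")
```

With a negative exponent, a single poorly fitting atom pulls the group score down more strongly than it would in an arithmetic mean, which is one reason a power mean can be preferable for scoring a group of atoms.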

StructureProfiler

Structure analysis

Three-dimensional protein structures play a vital role in drug design. Structure-based design necessitates an in-depth examination of the available quality data before using the structure in computational experiments and for method evaluation. StructureProfiler assists in automatically profiling sets of protein-ligand complex structures based on multiple quality indicators, ranging from model characteristics, e.g., the R factor, and active site features, e.g., bond length deviations, to ligand properties such as electron density support and the validity of torsion angles.

LifeSoaks

X-ray diffraction

LifeSoaks was designed to find solvent channels in macromolecular structures solved by X-ray crystallography. It predicts their accessibility by molecules through an automated annotation of so-called bottleneck radii. It simplifies the process of manually checking a crystal structure for solvent channels. Bottleneck radii can be calculated for solvent channels and small molecule binding sites. The tool is ideally suited for channel analyses before the actual soaking experiments to select the most promising experimental conditions and crystal forms. LifeSoaks runs fully automated and will finish within seconds to minutes for moderately sized crystals.

plantiSMASH

Transcription factors and regulatory sites

PlantiSMASH is a specialized extension of antiSMASH for the identification and analysis of biosynthetic gene clusters (BGCs) in plant genomes. It supports advanced plant-specific detection rules and features for comparative genomics, visualization, and more.

DigestedProteinDB

DigestedProteinDB provides a scalable computational infrastructure for indexing and querying peptide cleavage data. Designed for seamless integration into high-throughput mass spectrometry pipelines, it enables low-latency searches and advanced filtering of digested protein datasets to accelerate experimental spectra cross-referencing.

Open Neuroscience Graph

The Open Neuroscience Graph (openneuroscience.org) is an open-access, curated knowledge graph that maps the open science ecosystem in neuroscience as a browsable digital garden. Built from an Obsidian vault and published as a static website using Quartz, the project replaces traditional linear presentation with a networked structure of interlinked Markdown notes. Bidirectional links, full-text search, and an integrated graph visualization allow users to navigate thematic relationships dynamically rather than sequentially. The complete source material is openly available to sustain, replicate and extend the resource, including all Markdown content, media attachments, Quartz configuration files, and site customizations. Researchers, educators, and open-science practitioners may explore the site directly, download the vault for offline use in Obsidian, or fork the material to build new, derivative knowledge bases. PID=https://doi.org/10.5281/zenodo.20181900

phantasus

Gene expression

Phantasus is a web application for visual and interactive gene expression analysis. It is based on Morpheus, a web-based software for heatmap visualisation and analysis, which was integrated with an R environment via the OpenCPU API. Aside from basic visualization and filtering methods, R-based methods such as k-means clustering, principal component analysis, or differential expression analysis with the limma package are supported.

Bioperl

International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences.

Rust-Bio

Rust implementations of algorithms and data structures useful for bioinformatics.

Biojava

Java framework for processing biological data.

SRA-Explorer

Easily get SRA download links and other information.

