Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
5,961 resources indexed
Showing 3,901–3,950
Modular and universal bioinformatics. Bionode provides pipeable UNIX command-line tools and JavaScript APIs for bioinformatics analysis workflows.
A wee tool for random access into BGZF files.
Write-once-read-many table for large datasets.
A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities.
Workflow standard developed by the Broad.
A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes.
A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results.
Customizable pipeline for differential expression analysis with an intuitive GUI.
A pipeline for preprocessing short and long sequencing reads, built with Nextflow.
Aggregate results from bioinformatics analyses across many samples into a single report.
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang.
Toolkit for processing sequences in FASTA/Q formats.
Scalable genomic analysis.
Scalable gVCF merging and joint variant calling for population sequencing projects.
The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment.
Bayesian haplotype-based polymorphism discovery and genotyping.
Variant Discovery in High-Throughput Sequencing Data.
Structural variant and indel caller for mapped sequencing data.
Automate common SAM & BAM conversions.
Displays sequence statistics for next-generation sequencing.
Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs.
A C++ library for parsing and manipulating VCF files.
VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).
**Comes with samtools!** - Read simulator.
Python library for blazing-fast genomic interval operations and genomic file format I/O on Polars DataFrames.
Scalable toolkit for analyzing single-cell gene expression data, including preprocessing, visualization, clustering, and trajectory inference.
SKESA is a de-novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.
Minimap2 is a pairwise aligner for genomic and spliced nucleotide sequences. It can perform assembly-to-assembly alignment and works with gzip-compressed FASTQ and FASTA files. It also finds overlaps between long reads.
Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.
De novo assembler for single molecule sequencing reads using repeat graphs.
Embeddable genome viewer. Integrates data from a wide variety of sources and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
Horizon chart D3-based JavaScript library for DNA data.
Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
JavaScript genome browser that is highly customizable via plugins and track customizations.
[@crazyhottommy](https://github.com/crazyhottommy)'s notes on various steps and considerations when doing RNA-seq analysis.
Universal molecular toolkit for molecular fingerprinting, substructure search, and molecular visualization. Written in C++, with Java, C#, and Python wrappers.
Molecule validation and standardization based on [RDKit](http://www.rdkit.org/).
A script to run structural alerts using RDKit and ChEMBL.
Chemical 2D structure editor application/applet based on the [Chemistry Development Kit](https://sourceforge.net/projects/cdk/).
Simple RDKit molecule editor GUI using PySide.
Molecular descriptor calculator based on [RDKit](http://www.rdkit.org/).
Descriptor computation (chemistry) and (optional) storage for machine learning.
Vector representations of molecular substructures.
Molecular property prediction with a unified API for diverse models and representations.
A Library for Deep Learning in Biology and Chemistry.
A Python package for optimizing chemical reactions using machine learning (contains 10 algorithms + several benchmarks).
Open source web framework for small molecule analysis based on Django.