Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
126 of 5,893 resources
Showing 1–50
A compressor for common genomic file formats (BAM, CRAM, FASTQ, VCF, etc.).
samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants.
A Workflow Management System geared towards scientific workflows.
A haplotype-resolved assembler for accurate HiFi reads.
A small language for defining pipeline stages and linking them together to make pipelines.
The modern C++ library for sequence analysis.
Structural variant discovery by integrated paired-end and split-read analysis.
Sequence manipulation toolkit for FASTA/FASTQ files written in Nim.
A quality control tool for high throughput sequence data.
Utilities for working with CSV/Tab-delimited files.
Suite of tools to handle gene annotations in any GTF/GFF format.
Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.
SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de facto standard for prokaryotic genome assemblies.
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Access to Biological Web Services from Python.
GFF and GTF file manipulation and interconversion.
Deep learning-based variant caller.
Genetic variant annotation and effect prediction toolbox.
A single molecule sequence assembler for genomes large and small.
FASTQ and SAM quality control using Python.
BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs.
lumpy: a general probabilistic framework for structural variant discovery.
A polymorphic Bayesian genotyping model with wide applicability.
A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.
Prokka: rapid prokaryotic genome annotation. Prokka is one of the most cited annotation command line tools for microbial genome annotations.
Biocaml aims to be a high-performance user-friendly library for Bioinformatics.
Sort genomic files according to a specified order.
SIMD C library for global, semi-global, and local pairwise sequence alignments.
GRIDSS: the Genomic Rearrangement IDentification Software Suite.
Collection of tools for working with BAM files.
A Go library and command line utility for engineering organisms.
Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.
Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, and providing powerful file naming and comprehensive audit reports for every output.
Resources on ChIP-seq data which include papers, methods, links to software, and analysis.
Structural variant calling and genotyping with existing tools, but smoothly.
A collection of research papers for AI-based protein design.
A solid path for those who want to complete a bioinformatics course on their own time, for free, with courses from the best universities in the world.
Convenient file format conversion using Biopython.
Predicts whether an amino acid substitution affects protein function.
A fuzzy Bruijn graph approach to assembling long, noisy reads.
Educational resource on performing RNA-seq analysis in the cloud using Amazon Web Services (AWS). Topics include preparing the data, preprocessing, differential expression, isoform discovery, data visualization, and interpretation.
Easily submit PBS jobs using a script template. Multiple input files are supported.
Syntax highlighting for computational biology file formats (SAM, VCF, GTF, FASTA, PDB, etc.) in vim/less/gedit/sublime.
Point-and-click, cross-platform suite for analysing and visualizing next-generation sequencing datasets.