Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

126 of 5,893 resources

Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets.

Active · 2.1K · 4 days ago
C
MIT

Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the [Open Bioinformatics Foundation](http://open-bio.org/). Contains the very useful [Entrez](https://biopython.org/DIST/docs/api/Bio.Entrez-module.html) package for API access to the NCBI databases.

Active · 5.1K · 4 days ago
Python
NOASSERTION

A compressor of common genomic file formats (BAM, CRAM, FASTQ, VCF etc).

Active · 183 · 1 week ago
C
NOASSERTION

samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants.

Active · 871 · 1 week ago
C
NOASSERTION

A Workflow Management System geared towards scientific workflows.

Active · 1.1K · 1 week ago
Scala
BSD-3-Clause

A haplotype-resolved assembler for accurate HiFi reads.

Active · 779 · 1 week ago
C++
MIT

A small language for defining pipeline stages and linking them together to make pipelines.

Active · 242 · 2 weeks ago
Groovy
NOASSERTION

The modern C++ library for sequence analysis.

Active · 454 · 2 weeks ago
C++
NOASSERTION

Structural variant discovery by integrated paired-end and split-read analysis.

Active · 521 · 2 weeks ago
C++
BSD-3-Clause

Sequence manipulation toolkit for FASTA/FASTQ files written in Nim.

Active · 127 · 3 weeks ago
Nim
GPL-3.0

A quality control tool for high throughput sequence data.

Active · 601 · 3 weeks ago
Java
GPL-3.0

Utilities for working with CSV/Tab-delimited files.

Active · 6.4K · 3 weeks ago
Python
MIT

Suite of tools to handle gene annotations in any GTF/GFF format.

Active · 571 · 3 weeks ago
HTML
GPL-3.0

Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF.

Active · 443 · 4 weeks ago
Cython
MIT

Pythonic Access to the Ensembl database.

Active · 400 · 1 month ago
Python
Apache-2.0

SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de facto standard for prokaryotic genome assemblies.

Active · 935 · 1 month ago
C++
NOASSERTION

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.

Active · 856 · 1 month ago
Nim
MIT

Another cross-platform, efficient, practical and pretty CSV/TSV toolkit.

Active · 1.2K · 1 month ago
Go
MIT

Access to Biological Web Services from Python.

Active · 337 · 1 month ago
Python
NOASSERTION

A list of pipeline resources.

Active · 6.6K · 1 month ago

A python-based workflow manager.

Active · 590 · 1 month ago
Python
Apache-2.0

GFF and GTF file manipulation and interconversion.

Active · 319 · 2 months ago
Python
MIT

Deep learning-based variant caller.

Active · 3.7K · 2 months ago
Python
BSD-3-Clause

Genetic variant annotation and effect prediction toolbox.

Active · 308 · 3 months ago
Java
NOASSERTION

A single molecule sequence assembler for genomes large and small.

Active · 700 · 3 months ago
C++

FASTQ and SAM quality control using Python.

Active · 109 · 3 months ago
Python
MIT

BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs.

Active · 22 · 3 months ago
C
MIT

lumpy: a general probabilistic framework for structural variant discovery.

Active · 342 · 3 months ago
C
MIT

A polymorphic Bayesian genotyping model with wide applicability.

Active · 323 · 3 months ago
C++
MIT

A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high-performance computing (HPC) environments.

Active · 1.5K · 5 months ago
Common Workflow Language
Apache-2.0

Prokka: rapid prokaryotic genome annotation. Prokka is one of the most cited command-line tools for microbial genome annotation.

Active · 982 · 5 months ago
Perl
GPL-3.0

Biocaml aims to be a high-performance user-friendly library for Bioinformatics.

Idle · 125 · 6 months ago
OCaml
NOASSERTION

Sort genomic files according to a specified order.

Idle · 36 · 7 months ago
Go
MIT

SIMD C library for global, semi-global, and local pairwise sequence alignments.

Idle · 284 · 9 months ago
C
NOASSERTION

A circos representation of multiple GWAS results.

Idle · 97 · 1 year ago
R
GPL-3.0

GRIDSS: the Genomic Rearrangement IDentification Software Suite.

Idle · 283 · 1 year ago
Java
NOASSERTION

Collection of tools for working with BAM files.

Idle · 430 · 1 year ago
C++
MIT

Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.

Idle · 1.7K · 1 year ago
C
GPL-3.0

A Swiss Army knife for genome arithmetic.

Idle · 1K · 1 year ago
C
MIT

A system for rapidly aligning entire genomes, whether in complete or draft form.

Idle · 561 · 1 year ago
C++
Artistic-2.0

A Go library and command line utility for engineering organisms.

Idle · 729 · 1 year ago
Go
MIT

Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.

Idle · 1K · 1 year ago
Python
MIT

Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, and providing powerful file naming and comprehensive audit reports for every output.

Idle · 1.1K · 1 year ago
Go
MIT

Resources on ChIP-seq data which include papers, methods, links to software, and analysis.

Idle · 850 · 1 year ago
Python
MIT

UNIX-style FASTA manipulation tools.

Idle · 17 · 1 year ago
Python
MIT

Structural variant calling and genotyping with existing tools, but smoothly.

Idle · 264 · 1 year ago
Go
Apache-2.0

A collection of research papers for AI-based protein design.

Stale · 306 · 2 years ago
Apache-2.0

A solid path for those who want to complete a bioinformatics course on their own time, for free, with courses from the best universities in the world.

Archived · 6.9K · 2 years ago

Convenient file format conversion using Biopython.

Stale · 118 · 2 years ago
Python
GPL-3.0

Predicts whether an amino acid substitution affects protein function.

Stale · 548 · 2 years ago
MIT