Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

17 of 5,893 resources

Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets.

Active · 2.1K · 5 days ago
C
MIT

This repository contains GGUF files for gemma4-12b-bioinfo, a fine-tuned Gemma 4 12B model for bioinformatics and computational biology.

Active · 659 · 6 days ago
C

A compressor for common genomic file formats (BAM, CRAM, FASTQ, VCF, etc.).

Active · 183 · 1 week ago
C
NOASSERTION

samtools/bcftools are a suite of tools for manipulating NGS data and can be used to call variants.

Active · 871 · 1 week ago
C
NOASSERTION

Fast and accurate protein structure search using a learned 3Di structural alphabet (VQ-VAE) that discretizes tertiary interactions into structural tokens. This enables structural alignment at protein-universe scale and at sequence-search speeds (4-5 orders of magnitude faster than DALI/TM-align), and it underpins many AI4S tools such as SaProt, ESMAtlas search, and AFDB clustering pipelines (Steinegger Lab, Nature Biotechnology 2023).

Active · 1.2K · 1 month ago
C
GPL-3.0

Genome mapping and spliced alignment of cDNA or amino acid sequences

Active · 113 · 3 months ago
C
GPL-2.0

BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs.

Active · 22 · 3 months ago
C
MIT

lumpy: a general probabilistic framework for structural variant discovery.

Active · 342 · 3 months ago
C
MIT

A Python extension, written in C, for quick access to bigBed files and for reading and creating bigWig files.

Active · 244 · 5 months ago
C
MIT

SIMD C library for global, semi-global, and local pairwise sequence alignments

Idle · 284 · 9 months ago
C
NOASSERTION

Minigraph is a sequence-to-graph mapper and graph constructor. For graph generation, it aligns a query sequence against a sequence graph and incrementally augments an existing graph with long query subsequences diverged from the graph.

Idle · 481 · 10 months ago
C
MIT

Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.

Idle · 1.7K · 1 year ago
C
GPL-3.0

A Swiss Army knife for genome arithmetic.

Idle · 1K · 1 year ago
C
MIT

A database system designed to store, organize, and manage large-scale nucleotide sequencing read data (like PacBio reads) for the Dazzler genome assembler

Idle · 36 · 1 year ago
C
Other

A fuzzy Bruijn graph approach to long noisy reads assembly

Stale · 530 · 2 years ago
C
GPL-3.0

VerityMap is a tool for mapping long reads to assemblies of extra-long tandem repeats, producing SAM files and identifying potential heterozygous sites and assembly errors through analysis of rare k-mers. It supports PacBio HiFi and ONT reads and generates interactive HTML plots for variant analysis.

Stale · 39 · 3 years ago
C
GPL-3.0

FASTQ/A short-read pre-processing tools: demultiplexing, trimming, clipping, quality filtering, and masking utilities.

Stale · 202 · 4 years ago
C
NOASSERTION

Table file index.

Archived · 92 · 4 years ago
C

Finds SNP sites from a multi-FASTA alignment file.

Adapter trimmer for Oxford Nanopore reads.

Comprehensive set of programs for phylogenetic analyses; available for PC and Mac; source code available for easy compiling in UNIX.