Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
15 of 5,923 resources
A haplotype-resolved assembler for accurate HiFi reads.
A Flexible Model For Record Linkage
The modern C++ library for sequence analysis.
Structural variant discovery by integrated paired-end and split-read analysis.
A collection of object-oriented software tools for problems involving chemical kinetics, thermodynamics, and transport processes.
A small (under 720 KB) C++ Windows utility that lets you load Ancestry, 23andMe, FTDNA, or Genes for Good raw DNA files, search them, merge them, and convert them to Ancestry format. You can also create files from peer-reviewed publications and compare them with your loaded data to report your genetic predisposition for the condition you entered, along with a statistical risk if odds ratio (OR) values are included. The program ships with example files for Type 2 Diabetes risk factors (I have Type 2 Diabetes, so I could test the results).
SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de facto standard for prokaryotic genome assemblies.
Descriptor library containing a variety of fingerprinting techniques, including the Smooth Overlap of Atomic Positions (SOAP).
Deep learning framework for molecular docking extending AutoDock Vina with convolutional neural network scoring functions, achieving superior virtual screening enrichment and pose prediction across diverse target classes; widely adopted in pharmaceutical structure-based drug design (J. Cheminformatics, 915+ stars, actively maintained)
A single molecule sequence assembler for genomes large and small.
A polymorphic Bayesian genotyping model with wide applicability.
Collection of tools for working with BAM files.
A system for rapidly aligning entire genomes, whether in complete or draft form.
distinct is a statistical method for differential testing between two or more groups of distributions; testing is performed via hierarchical non-parametric permutation tests on the cumulative distribution functions (cdfs) of each sample. While most methods for differential expression target differences in mean abundance between conditions, distinct compares full cdfs and so identifies both differential patterns involving changes in the mean and more subtle variations that do not involve the mean (e.g., unimodal vs. bimodal distributions with the same mean). distinct is a general and flexible tool: because it is fully non-parametric and makes no assumptions about how the data were generated, it can be applied to a variety of datasets. It is particularly suitable for differential state analyses on single-cell data (i.e., differential analyses within sub-populations of cells), such as single-cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. To use distinct, one needs data from two or more groups of samples (i.e., experimental conditions), with at least 2 samples (i.e., biological replicates) per group.
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
Telseq is a tool for estimating telomere length from whole genome sequence data.
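The idea in the distinct entry above, comparing full cdfs with a permutation test rather than comparing means, can be sketched in plain Python. This is a simplified, non-hierarchical illustration and not the package's implementation: the function names (`ecdf`, `cdf_distance`, `permutation_test`) and the mean-absolute-difference statistic are assumptions chosen for brevity, and the sketch pools observations from two groups instead of respecting a sample-within-group structure.

```python
import random
from bisect import bisect_right


def ecdf(values, grid):
    """Evaluate the empirical CDF of `values` at each point in `grid`."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def cdf_distance(a, b):
    """Mean absolute difference between the ECDFs of a and b on the pooled grid."""
    grid = sorted(set(a) | set(b))
    fa, fb = ecdf(a, grid), ecdf(b, grid)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(grid)


def permutation_test(a, b, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value) for two groups."""
    rng = random.Random(seed)
    observed = cdf_distance(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled observations.
        rng.shuffle(pooled)
        if cdf_distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Both groups are centered on 0 but differ in shape: unimodal vs. bimodal.
    unimodal = [rng.gauss(0, 1) for _ in range(200)]
    bimodal = [rng.gauss(rng.choice((-2, 2)), 0.5) for _ in range(200)]
    stat, p = permutation_test(unimodal, bimodal)
    print(f"statistic={stat:.3f}, p={p:.4f}")
```

Because the statistic is computed on whole ECDFs, a shape difference can register even when the group means are close, which is the case a mean-based test is poorly suited to.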