Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

15 of 5,923 resources

A haplotype-resolved assembler for accurate HiFi reads.

Active · 779 stars · 2 weeks ago
C++
MIT

A Flexible Model For Record Linkage

Active · 1 star · 2 weeks ago
C++
GPL-3.0-or-later

The modern C++ library for sequence analysis.

Active · 454 stars · 2 weeks ago
C++
NOASSERTION

Structural variant discovery by integrated paired-end and split-read analysis.

Active · 521 stars · 2 weeks ago
C++
BSD-3-Clause

A collection of object-oriented software tools for problems involving chemical kinetics, thermodynamics, and transport processes.

Active · 806 stars · 3 weeks ago
C++
NOASSERTION

A small (under 720 KB) C++ Windows utility that lets you load raw DNA files from Ancestry, 23andMe, FTDNA, or Genes for Good, search them, merge them, and convert them to Ancestry format. It can also create files from peer-reviewed publications to compare against your loaded data, giving your genetic predisposition for the condition you entered data for, along with a statistical risk estimate if odds ratio (OR) values are included. Example files for Type 2 diabetes risk factors are included with the program (I have Type 2 diabetes, so I could test the results).

Active · 0 stars · 4 weeks ago
C++
GPL-3.0

SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de-facto standard for prokaryotic genome assemblies.

Active · 935 stars · 1 month ago
C++
NOASSERTION

Descriptor library containing a variety of fingerprinting techniques, including the Smooth Overlap of Atomic Positions (SOAP).

Active · 466 stars · 1 month ago
C++
Apache-2.0

Deep learning framework for molecular docking that extends AutoDock Vina with convolutional neural network (CNN) scoring functions, reported to improve virtual screening enrichment and pose prediction over Vina across diverse target classes; used in structure-based drug design (J. Cheminformatics).

Active · 936 stars · 3 months ago
C++
Apache-2.0

A single molecule sequence assembler for genomes large and small.

Active · 700 stars · 3 months ago
C++

A polymorphic Bayesian genotyping model with wide applicability.

Active · 323 stars · 4 months ago
C++
MIT

Collection of tools for working with BAM files.

Idle · 430 stars · 1 year ago
C++
MIT

A system for rapidly aligning entire genomes, whether in complete or draft form.

Idle · 561 stars · 1 year ago
C++
Artistic-2.0

distinct is a statistical method for differential testing between two or more groups of distributions; testing is performed via hierarchical non-parametric permutation tests on the cumulative distribution functions (CDFs) of each sample. While most methods for differential expression target differences in mean abundance between conditions, distinct compares full CDFs and therefore identifies both differential patterns involving changes in the mean and more subtle variations that do not involve the mean (e.g., unimodal vs. bimodal distributions with the same mean). distinct is a general and flexible tool: because it is fully non-parametric and makes no assumptions about how the data were generated, it can be applied to a variety of datasets. It is particularly suited to differential state analyses on single-cell data (i.e., differential analyses within sub-populations of cells), such as single-cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. To use distinct, you need data from two or more groups of samples (i.e., experimental conditions), with at least 2 samples (i.e., biological replicates) per group.

Stale · 13 stars · 2 years ago
C++

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

Stale · 322 stars · 6 years ago
C++
BSL-1.0

Telseq is a tool for estimating telomere length from whole genome sequence data.

Stale · 76 stars · 7 years ago
C++
GPL-3.0

maeparser is a parser for Schrödinger Maestro files.