Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

3,202 of 5,923 resources

Showing 1,051–1,100

A C++ library for parsing and manipulating VCF files.

VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).

Tools for adding mutations to existing `.bam` files, used for testing mutation callers.

**Comes with samtools!** - Reads simulator.

Pythonic access to the UCSC Genome database.

A port of [pyVCF](https://github.com/jamescasbon/PyVCF) using Cython for speed.

Python library for blazing-fast genomic interval operations and genomic file format I/O on Polars DataFrames.

Python wrapper for [bedtools](https://github.com/arq5x/bedtools).

Pythonic access to FASTA files.

Python wrapper for [samtools](https://github.com/samtools/samtools).

Scalable toolkit for analyzing single-cell gene expression data, including preprocessing, visualization, clustering, and trajectory inference.

SKESA is a de-novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.

Minimap2 is a pairwise aligner for genomic and spliced nucleotide sequences. It can perform assembly-to-assembly alignment and works with gzip-compressed FASTQ and FASTA files. It also finds overlaps between long reads.

Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.

De novo assembler for single molecule sequencing reads using repeat graphs.

Embeddable genome viewer. Integrates data from a wide variety of sources and can load data directly from popular genomics file formats, including bigWig, BAM, and VCF.

D3-based JavaScript horizon chart library for DNA data.

Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.

JavaScript genome browser that is highly customizable via plugins and track customizations.

[@crazyhottommy](https://github.com/crazyhottommy)'s notes on various steps and considerations when doing RNA-seq analysis.

Universal molecular toolkit for molecular fingerprinting, substructure search, and molecular visualization, written in C++ with Java, C#, and Python wrappers.

Molecular Manipulation Made Easy. A light wrapper built on top of RDKit.

Molecule validation and standardization based on [RDKit](http://www.rdkit.org/).

A script to run structural alerts using the RDKit and ChEMBL

Chemical 2D structure editor application/applet based on the [Chemistry Development Kit](https://sourceforge.net/projects/cdk/).

Simple RDKit molecule editor GUI using PySide.

Molecular descriptor calculator based on [RDKit](http://www.rdkit.org/).

Descriptor computation (chemistry) and (optional) storage for machine learning.

Vector representations of molecular substructures.

Molecular property prediction with a unified API for diverse models and representations.

A Library for Deep Learning in Biology and Chemistry.

A Python package for optimizing chemical reactions using machine learning (contains 10 algorithms + several benchmarks).

Chemical Information from the Web.

Open source web framework for small molecule analysis based on Django.

Cheminformatic extension for the SQLAlchemy database.

Analysis of molecular dynamics trajectories.

Parsers and algorithms for computational chemistry logfiles.

An automated workflow for the generation and storage of DFT calculations for organic molecules.

Wrapper for RDKit's RunReactants to improve stereochemistry handling

Webapp for generating conformers

A list of papers, data sets, and other resources for machine learning for small-molecule drug discovery.

Conversational data analysis using natural language

Secure text-to-visualization through standardized chart specifications

Multi-type data labeling and annotation tool

Multi-agent system with a Parser-Planner-Painter architecture that converts `paper.pdf` to an editable `poster.pptx`; outperforms GPT-4o with 87% fewer tokens

Multimodal LLM for understanding and generating scientific charts and diagrams

Beyond text-to-slides generation with PPTEval multi-dimensional evaluation (EMNLP 2025)

Transform arXiv papers into Beamer slides using LLMs

Convert PDF files into editable slides with three lines of code

First benchmark for automatic video generation from scientific papers (NeurIPS 2025)