Find open-source science resources
A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.
A C++ library for parsing and manipulating VCF files.
VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).
Tools for adding mutations to existing `.bam` files, used for testing mutation callers.
**Comes with samtools!** Read simulator.
Python library for blazing-fast genomic interval operations and genomic file formats I/O on Polars DataFrames
Python wrapper for [bedtools](https://github.com/arq5x/bedtools).
Scalable toolkit for analyzing single-cell gene expression data, including preprocessing, visualization, clustering, and trajectory inference.
SKESA is a de-novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.
Minimap2 is a pairwise aligner for genomic and spliced nucleotide sequences. It can perform assembly-to-assembly alignment and works with gzip'd FASTQ and FASTA files. It also finds overlaps between long reads.
Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.
De novo assembler for single molecule sequencing reads using repeat graphs.
Embeddable genome viewer. Integrates data from a wide variety of sources and can load data directly from popular genomics file formats, including bigWig, BAM, and VCF.
D3-based JavaScript horizon chart library for DNA data.
Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
JavaScript genome browser that is highly customizable via plugins and track customizations.
[@crazyhottommy](https://github.com/crazyhottommy)'s notes on various steps and considerations when doing RNA-seq analysis.
Universal molecular toolkit for molecular fingerprinting, substructure search, and molecular visualization. Written in C++, with Java, C#, and Python wrappers.
Molecular Manipulation Made Easy. A light wrapper built on top of RDKit.
Molecule validation and standardization based on [RDKit](http://www.rdkit.org/).
A script to run structural alerts using the RDKit and ChEMBL
Chemical 2D structure editor application/applet based on the [Chemistry Development Kit](https://sourceforge.net/projects/cdk/).
Simple RDKit molecule editor GUI using PySide.
Molecular descriptor calculator based on [RDKit](http://www.rdkit.org/).
Descriptor computation (chemistry) and (optional) storage for machine learning.
Vector representations of molecular substructures.
Molecular property prediction with a unified API for diverse models and representations.
A Library for Deep Learning in Biology and Chemistry.
A Python package for optimizing chemical reactions using machine learning (contains 10 algorithms + several benchmarks).
Open source web framework for small molecule analysis based on Django.
Analysis of molecular dynamics trajectories.
Parsers and algorithms for computational chemistry logfiles.
An automated workflow for the generation and storage of DFT calculations for organic molecules.
A list of papers, data sets, and other resources for machine learning for small-molecule drug discovery.
Conversational data analysis using natural language
Secure text-to-visualization through standardized chart specifications
Multi-type data labeling and annotation tool
Multi-agent system with a Parser-Planner-Painter architecture that converts `paper.pdf` to an editable `poster.pptx`; reported to outperform GPT-4o while using 87% fewer tokens
Multimodal LLM for understanding and generating scientific charts and diagrams
Beyond text-to-slides generation with PPTEval multi-dimensional evaluation (EMNLP 2025)
Transform arXiv papers into Beamer slides using LLMs
Convert PDF files into editable slides with three lines of code
First benchmark for automatic video generation from scientific papers (NeurIPS 2025)