Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active 147
Idle 73
Stale 48
Archived 4
(None) 1

Domain

Software 32
Protein & Drug Discovery 15
SingleCell 9
GeneExpression 8
Genomics & Bioinformatics 6
Machine Learning 6
Autonomous Research Systems (2023-2025 Breakthroughs) 5
Climate Modeling 5
CRISPR 5
DNAMethylation 5
Command Line Utilities 4
Force Fields 4
(None) 8

Language

R 125
Python 85
Jupyter Notebook 23
C 6
Go 4
C++ 3
TypeScript 3
HTML 2
JavaScript 2
Julia 2
Ruby 2
CSS 1
(None) 7

License(1)

MIT 273
GPL-3.0 175
Artistic-2.0 139
Apache-2.0 92
NOASSERTION 82
GPL-2.0+ 38
BSD-3-Clause 37
GPL-3.0+ 35
GPL-2.0 33
CC-BY-4.0 30
CC0-1.0 18
Other 12
(None) 123

Source(1)

bioconductor 386
github 273
awesome-ai-for-science 75
awesome-bioinformatics 25
bio.tools 25
awesome-python-chemistry 20
bioregistry 12
awesome-cheminformatics 7
awesome-scientific-python 1

Type

Software tool 264
Database 9

273 of 5,923 resources

Showing 251–273

Cookiecutter Bioinformatics

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · ★14 · 3 years ago

PanomiR

PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and generating miRNAs targeting clusters of pathways. These functions can be used separately or sequentially to analyze RNA-Seq data.

Stale · ★3 · 3 years ago

hgraph2graph

General Chemistry

Hierarchical Generation of Molecular Graphs using Structural Motifs.

Stale · ★438 · 3 years ago

snapcount

snapcount is a client interface to the Snaptron webservices which support querying by gene name or genomic region. Results include raw expression counts derived from alignment of RNA-seq samples and/or various summarized measures of expression across one or more regions/genes per-sample (e.g. percent spliced in).

Stale · ★3 · 4 years ago

multiHiCcompare

multiHiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. This extension of the original HiCcompare package now allows for Hi-C experiments with more than 2 groups and multiple samples per group. multiHiCcompare operates on processed Hi-C data in the form of sparse upper triangular matrices. It accepts four column (chromosome, region1, region2, IF) tab-separated text files storing chromatin interaction matrices. multiHiCcompare provides cyclic loess and fast loess (fastlo) methods adapted to jointly normalizing Hi-C data. Additionally, it provides a general linear model (GLM) framework adapting the edgeR package to detect differences in Hi-C data in a distance dependent manner.

Stale · ★10 · 4 years ago

InterCellar

InterCellar is implemented as an R/Bioconductor Package containing a Shiny app that allows users to interactively analyze cell-cell communication from scRNA-seq data. Starting from precomputed ligand-receptor interactions, InterCellar provides filtering options, annotations and multiple visualizations to explore clusters, genes and functions. Finally, based on functional annotation from Gene Ontology and pathway databases, InterCellar implements data-driven analyses to investigate cell-cell communication in one or multiple conditions.

Stale · ★12 · 4 years ago

CluMSID

CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.

Stale · ★10 · 4 years ago

tomoda

This package provides many easy-to-use methods to analyze and visualize tomo-seq data. The tomo-seq technique is based on cryosectioning of tissue and performing RNA-seq on consecutive sections. (Reference: Kruse F, Junker JP, van Oudenaarden A, Bakkers J. Tomo-seq: A method to obtain genome-wide expression data with spatial resolution. Methods Cell Biol. 2016;135:299-307. doi:10.1016/bs.mcb.2016.01.006) The main purpose of the package is to find zones with similar transcriptional profiles and spatially expressed genes in a tomo-seq sample. Several visualization functions are available to create easy-to-modify plots.

Stale · ★0 · 4 years ago

maser

AlternativeSplicing

This package provides functionality for downstream analysis, annotation and visualization of alternative splicing events generated by rMATS.

Stale · ★21 · 4 years ago

MEAT

This package estimates epigenetic age in skeletal muscle, using DNA methylation data generated with the Illumina Infinium technology (HM27, HM450 and HMEPIC).

Stale · ★1 · 4 years ago

cyanoFilter

An approach to filter out and/or identify phytoplankton cells among all particles measured via flow cytometry, using pigment and cell complexity information. It does this using a sequence of one-dimensional gates on pre-defined channels measuring certain pigmentation and complexity. The package is especially tuned for cyanobacteria, but will work fine for phytoplankton communities where there is at least one cell characteristic that differentiates every phytoplankton in the community.

Stale · ★0 · 4 years ago

fedup

GeneSetEnrichment

An R package that tests for enrichment and depletion of user-defined pathways using Fisher's exact test. The method is designed for versatile pathway annotation formats (e.g. gmt, txt, xlsx) to allow the user to run pathway analysis on custom annotations. This package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results.

Stale · ★7 · 4 years ago

Ruffus

Workflow Managers

Computation pipeline library for Python, widely used in science and bioinformatics.

Stale · ★175 · 4 years ago

Squiggle

Genome Browsers / Gene Diagrams

Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.

Archived · ★42 · 4 years ago

MITObim

MITObim - mitochondrial baiting and iterative mapping

Stale · ★116 · 5 years ago

Python for Data Analysis

Luke Thompson, NOAA.

Stale · ★890 · 5 years ago

Jupyter Notebook

nanosv

Structural genomics

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · ★92 · 6 years ago

SomaticSignatures

The SomaticSignatures package identifies mutational signatures of single nucleotide variants (SNVs). It provides an infrastructure related to the methodology described in Nik-Zainal (2012, Cell), with flexibility in the matrix decomposition algorithms.

Archived · ★23 · 6 years ago

AfterQC

Sequence Processing

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.

Stale · ★214 · 6 years ago

microbiomeDASim

A toolkit for simulating differential microbiome data designed for longitudinal analyses. Several functional forms may be specified for the mean trend. Observations are drawn from a multivariate normal model. The objective of this package is to be able to simulate data in order to accurately compare different longitudinal methods for differential abundance.

Stale · ★3 · 6 years ago

mimager

Easily visualize and inspect microarrays for spatial artifacts.

Stale · ★0 · 6 years ago

runibic

This package implements the UniBic algorithm in R. This biclustering algorithm for analysis of gene expression data was introduced by Zhenjia Wang et al. in 2016. It is currently considered the most promising biclustering method for identification of meaningful structures in complex and noisy data.

Stale · ★4 · 7 years ago

The SEED

With the growing number of available genomes, the need for an environment to support effective comparative analysis increases. The original SEED Project was started in 2003 by the [Fellowship for Interpretation of Genomes (FIG)](http://thefig.info/) as a largely unfunded open source effort. Argonne National Laboratory and the University of Chicago joined the project, and now much of the activity occurs at those two institutions (as well as the University of Illinois at Urbana-Champaign, Hope College, San Diego State University, the Burnham Institute and a number of other institutions). The cooperative effort focuses on the development of the comparative genomics environment called the SEED and, more importantly, on the development of curated genomic data. This prefix provides identifiers for molecular roles that describe the function of one or more proteins in microbes and plants.

