Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active 325
Idle 128
Stale 99
Archived 5
(None) 156

Domain

Software 95
SingleCell 27
Protein & Drug Discovery 26
GeneExpression 23
ImmunoOncology 19
DataImport 15
Genomics & Bioinformatics 14
Autonomous Research Systems (2023-2025 Breakthroughs) 13
Visualization 13
Infrastructure 10
Machine Learning 10
RNASeq 10
(None) 21

Language

R 388
Python 198
Jupyter Notebook 36
C++ 10
C 8
JavaScript 8
Shell 7
TypeScript 7
Go 6
HTML 5
Julia 3
Nextflow 3
(None) 18

License (1)

MIT 713
GPL-3.0 653
Artistic-2.0 543
CC-BY-4.0 262
GPL-2.0 255
GPL-2.0+ 242
Apache-2.0 228
NOASSERTION 169
CC0-1.0 114
GPL-3.0+ 101
CC-BY-3.0 79
BSD-3-Clause 78
(None) 2434

Source

github 561
bioconductor 384
awesome-ai-for-science 169
bio.tools 54
awesome-bioinformatics 43
awesome-python-chemistry 35
bioregistry 21
awesome-cheminformatics 12
awesome-scientific-python 2

Type

Software tool 692
Database 21

713 of 6,358 resources

Showing 501–550

lipidr

lipidr is an easy-to-use R package implementing a complete workflow for downstream analysis of targeted and untargeted lipidomics data. Lipidomics results can be imported into lipidr as a numerical matrix or a Skyline export, allowing integration into current analysis frameworks. Data mining of lipidomics datasets is enabled through integration with the Metabolomics Workbench API. lipidr supports data inspection, normalization, and univariate and multivariate analysis, and displays informative visualizations. lipidr also implements a novel Lipid Set Enrichment Analysis (LSEA), harnessing molecular information such as lipid class, total chain length and unsaturation.

Stale · ★33 · 3 years ago

easy_qsub

Command Line Utilities

Easily submit PBS jobs using a script template. Multiple input files are supported.

Stale · ★29 · 3 years ago

chainer-chemistry

Machine Learning

A Library for Deep Learning in Biology and Chemistry.

Stale · ★702 · 3 years ago

chainer-chemistry

Machine Learning

A deep learning framework (based on Chainer) with applications in Biology and Chemistry.

Stale · ★702 · 3 years ago

HPiP

HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm to predict host-pathogen protein-protein interactions (HP-PPIs) from structural and physicochemical descriptors computed from the amino acid composition of host and pathogen proteins. The proposed package can effectively address data shortages and data unavailability for HP-PPI network reconstructions. Moreover, establishing computational frameworks in that regard will reveal mechanistic insights into infectious diseases and suggest potential HP-PPI targets, thus narrowing down the range of possible candidates for subsequent wet-lab experimental validations.

Stale · ★3 · 3 years ago

lineagespot

VariantDetection

Lineagespot is a framework written in R that aims to identify SARS-CoV-2-related mutations based on a single variant file or a list of variant files (i.e., in Variant Call Format). The method can facilitate the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing, and attempts to infer the potential distribution of the SARS-CoV-2 lineages.

Stale · ★2 · 3 years ago

GraphINVENT

Generative Molecular Design

A platform for graph-based molecular generation using graph neural networks.

Archived · ★381 · 3 years ago

FamAgg

Framework providing basic pedigree analysis and plotting utilities as well as a variety of methods to evaluate familial aggregation of traits in large pedigrees.

Stale · ★1 · 3 years ago

atom3d

Machine Learning

Enables machine learning on three-dimensional molecular structure.

Stale · ★319 · 3 years ago

uncoverappLib

A Shiny application containing a suite of graphical and statistical tools to support clinical assessment of low-coverage regions. It displays three web pages, each providing a different analysis module: Coverage analysis, calculate AF by allele frequency app and binomial distribution. uncoverAPP provides a statistical summary of coverage given a target file or gene names.

Stale · ★3 · 3 years ago

awesome-molecular-docking

A curated list of molecular docking software, datasets, and other closely related resources.

Stale · ★106 · 3 years ago

SOMNiBUS

This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built on a rich, flexible model that allows the effects of multiple covariates on methylation levels to vary smoothly along genomic regions. At the same time, this method also allows for sequencing errors and can adjust for variability in cell type mixture.

Stale · ★1 · 3 years ago

MoleOOD

Machine Learning

A robust molecular representation learning framework against distribution shifts.

Stale · ★61 · 3 years ago

censcyt

Methods for differential abundance analysis in high-dimensional cytometry data when a covariate is subject to right censoring (e.g. survival time) based on multiple imputation and generalized linear mixed models.

Stale · ★0 · 3 years ago

fermi-lite

Standalone C library for assembling Illumina short reads in small regions

Stale · ★72 · 3 years ago

brendaDb

ThirdPartyClient

R interface for importing and analyzing enzyme information from the BRENDA database.

Stale · ★2 · 3 years ago

wppi

GraphAndNetwork

Protein-protein interaction data is essential for omics data analysis and modeling. Database knowledge is general, not specific to cell type, physiological condition or any other context determining which connections are functional and contribute to signaling. Functional annotations such as Gene Ontology and Human Phenotype Ontology might help to evaluate the relevance of interactions. This package predicts the functional relevance of protein-protein interactions based on functional annotations such as Human Phenotype Ontology and Gene Ontology, and prioritizes genes based on network topology, functional scores and a path search algorithm.

Stale · ★1 · 3 years ago

GGD

Go Get Data: a command-line interface for obtaining genomic data.

Stale · ★42 · 3 years ago

ito

Stale · ★78 · 3 years ago

Jupyter Notebook

Cookiecutter Bioinformatics

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · ★13 · 3 years ago

PanomiR

PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and generating miRNAs targeting clusters of pathways. These functions can be used separately or sequentially to analyze RNA-Seq data.

Stale · ★3 · 3 years ago

scatterHatch

The objective of this package is to efficiently create scatterplots where groups can be distinguished by color and texture. Visualizations in computational biology tend to have many groups making it difficult to distinguish between groups solely on color. Thus, this package is useful for increasing the accessibility of scatterplot visualizations to those with visual impairments such as color blindness.

Stale · ★7 · 4 years ago

hgraph2graph

General Chemistry

Hierarchical Generation of Molecular Graphs using Structural Motifs.

Stale · ★441 · 4 years ago

snapcount

snapcount is a client interface to the Snaptron webservices which support querying by gene name or genomic region. Results include raw expression counts derived from alignment of RNA-seq samples and/or various summarized measures of expression across one or more regions/genes per-sample (e.g. percent spliced in).

Stale · ★3 · 4 years ago

TADCompare

TADCompare is an R package designed to identify and characterize differential Topologically Associated Domains (TADs) between multiple Hi-C contact matrices. It contains functions for finding differential TADs between two datasets, finding differential TADs over time and identifying consensus TADs across multiple matrices. It takes all of the main types of Hi-C input and returns simple, comprehensive, easy-to-analyze results.

Stale · ★27 · 4 years ago

preciseTAD

preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As an input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated on a base-level, preciseTAD predicts the probability for each base to be a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for this cell line.

Stale · ★8 · 4 years ago

multiHiCcompare

multiHiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. This extension of the original HiCcompare package now allows for Hi-C experiments with more than 2 groups and multiple samples per group. multiHiCcompare operates on processed Hi-C data in the form of sparse upper triangular matrices. It accepts four column (chromosome, region1, region2, IF) tab-separated text files storing chromatin interaction matrices. multiHiCcompare provides cyclic loess and fast loess (fastlo) methods adapted to jointly normalizing Hi-C data. Additionally, it provides a general linear model (GLM) framework adapting the edgeR package to detect differences in Hi-C data in a distance dependent manner.

Stale · ★10 · 4 years ago

InterCellar

InterCellar is implemented as an R/Bioconductor Package containing a Shiny app that allows users to interactively analyze cell-cell communication from scRNA-seq data. Starting from precomputed ligand-receptor interactions, InterCellar provides filtering options, annotations and multiple visualizations to explore clusters, genes and functions. Finally, based on functional annotation from Gene Ontology and pathway databases, InterCellar implements data-driven analyses to investigate cell-cell communication in one or multiple conditions.

Stale · ★12 · 4 years ago

DeepSphere

Astronomy & Astrophysics

Spherical CNNs for astronomy

Stale · ★167 · 4 years ago

CluMSID

CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.

Stale · ★10 · 4 years ago

tomoda

This package provides many easy-to-use methods to analyze and visualize tomo-seq data. The tomo-seq technique is based on cryosectioning of tissue and performing RNA-seq on consecutive sections. (Reference: Kruse F, Junker JP, van Oudenaarden A, Bakkers J. Tomo-seq: A method to obtain genome-wide expression data with spatial resolution. Methods Cell Biol. 2016;135:299-307. doi:10.1016/bs.mcb.2016.01.006) The main purpose of the package is to find zones with similar transcriptional profiles and spatially expressed genes in a tomo-seq sample. Several visualization functions are available to create easy-to-modify plots.

Stale · ★0 · 4 years ago

maser

AlternativeSplicing

This package provides functionality for downstream analysis, annotation and visualization of alternative splicing events generated by rMATS.

Stale · ★21 · 4 years ago

RNA-seq Analysis

[@crazyhottommy](https://github.com/crazyhottommy)'s notes on various steps and considerations when doing RNA-seq analysis.

Stale · ★1.1K · 4 years ago

strainberry

Automated strain separation of low-complexity metagenomes

Stale · ★52 · 4 years ago

subTOM

Electron microscopy

subTOM (subvolume processing scripts with the TOM toolbox) is a collection of scripts that form a pipeline for subvolume alignment and averaging of electron cryo-tomography data.

Stale · ★9 · 4 years ago

Crystal Graph CNNs

Materials Discovery

Crystal property prediction

Stale · ★875 · 4 years ago

MEAT

This package estimates epigenetic age in skeletal muscle, using DNA methylation data generated with the Illumina Infinium technology (HM27, HM450 and HMEPIC).

Stale · ★1 · 4 years ago

cyanoFilter

An approach to filter out and/or identify phytoplankton cells from all particles measured via flow cytometry, using pigment and cell complexity information. It does this using a sequence of one-dimensional gates on pre-defined channels measuring certain pigmentation and complexity. The package is especially tuned for cyanobacteria, but will work fine for phytoplankton communities where there is at least one cell characteristic that differentiates every phytoplankton in the community.

Stale · ★0 · 4 years ago

fedup

GeneSetEnrichment

An R package that tests for enrichment and depletion of user-defined pathways using Fisher's exact test. The method is designed for versatile pathway annotation formats (e.g., gmt, txt, xlsx) to allow the user to run pathway analysis on custom annotations. This package is also integrated with Cytoscape to provide network-based pathway visualization that enhances the interpretability of the results.

Stale · ★8 · 5 years ago

Ruffus

Workflow Managers

Computation pipeline library for Python, widely used in science and bioinformatics.

Stale · ★175 · 5 years ago

Squiggle

Genome Browsers / Gene Diagrams

Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.

Archived · ★42 · 5 years ago

biobtreeR

The biobtreeR package provides an interface to the [biobtree](https://github.com/tamerh/biobtree) tool, which covers a large set of bioinformatics datasets and provides search and chain mapping functionality.

Stale · ★3 · 5 years ago

MITObim

MITObim - mitochondrial baiting and iterative mapping

Stale · ★116 · 5 years ago

Python for Data Analysis

Luke Thompson, NOAA.

Stale · ★893 · 5 years ago

Jupyter Notebook

cruzdb

Pythonic access to the UCSC Genome database.

Stale · ★137 · 5 years ago

nanosv

Structural genomics

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · ★92 · 6 years ago

SomaticSignatures

The SomaticSignatures package identifies mutational signatures of single nucleotide variants (SNVs). It provides an infrastructure related to the methodology described in Nik-Zainal (2012, Cell), with flexibility in the matrix decomposition algorithms.

Archived · ★23 · 6 years ago

AfterQC

Sequence Processing

Automatic filtering, trimming, error removal and quality control for FASTQ data.

Stale · ★214 · 6 years ago

microbiomeDASim

A toolkit for simulating differential microbiome data designed for longitudinal analyses. Several functional forms may be specified for the mean trend. Observations are drawn from a multivariate normal model. The objective of this package is to be able to simulate data in order to accurately compare different longitudinal methods for differential abundance.

Stale · ★3 · 6 years ago

MolVS

Format Checking

Molecule validation and standardization based on [RDKit](http://www.rdkit.org/).

Stale · ★186 · 6 years ago
