Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active: 587
Stale: 284
Idle: 260
Archived: 13
(None): 6

Domain

Software: 125
Infrastructure: 42
ImmunoOncology: 39
Protein & Drug Discovery: 38
GeneExpression: 30
SingleCell: 27
Genomics & Bioinformatics: 24
Sequencing: 21
Autonomous Research Systems (2023-2025 Breakthroughs): 18
Simulations: 17
Medical AI & Clinical Applications: 15
Machine Learning: 14
(None): 156

Language

R: 590
Python: 267
Jupyter Notebook: 52
HTML: 30
Makefile: 19
C: 17
C++: 14
JavaScript: 13
Shell: 9
Java: 8
Web Ontology Language: 7
TypeScript: 6
(None): 71

License

MIT: 273
GPL-3.0: 175
Artistic-2.0: 139
Apache-2.0: 92
NOASSERTION: 82
GPL-2.0+: 38
BSD-3-Clause: 37
GPL-3.0+: 35
GPL-2.0: 33
CC-BY-4.0: 30
CC0-1.0: 18
Other: 12
(None): 123

Source (1)

bioconductor: 2,418
bioregistry: 2,418
github: 1,150
awesome-ai-for-science: 418
huggingface: 303
awesome-bioinformatics: 126
bio.tools: 116
awesome-python-chemistry: 87
awesome-cheminformatics: 45
awesome-scientific-python: 18

Type

Software tool: 987
Database: 163

1,150 of 5,923 resources

Showing 1,101–1,150

Citation Typing Ontology

An ontology that enables characterization of the nature or type of citations, both factually and rhetorically.

Stale · ★15 · 6 years ago

Publishing Roles Ontology

An ontology for characterising the roles of agents (people, corporate bodies, and computational agents) in the publication process. These agents can be, e.g., authors, editors, reviewers, publishers, or librarians.

Stale · ★1 · 6 years ago

Circleator

Genome Browsers / Gene Diagrams

Flexible circular visualization of genome-associated data with BioPerl and SVG.

Stale · ★46 · 6 years ago

geneXtendeR

geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. Unlike earlier techniques, geneXtendeR considers peak annotations beyond the closest gene. Users can view peak summary statistics for the first-closest gene, second-closest gene, ..., n-closest gene, with output ranked by biologically relevant events. They can also iteratively compare the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions of each gene's original coordinates. Different ChIP-seq peak callers produce different sets of differentially enriched peaks, with large variance in peak length distribution and total peak count, so annotating peak lists with their nearest genes is often a noisy process. geneXtendeR therefore aims to robustly link differentially enriched peaks to their respective genes. This aids experimental follow-up and validation, such as designing qPCR primers for a set of candidate genes.

Stale · ★10 · 7 years ago

scribl

Genome Browsers / Gene Diagrams

JavaScript library for drawing canvas-based gene diagrams.

Stale · ★76 · 7 years ago

TogoID Ontology

TogoID is an ID conversion service with unique features, an intuitive web interface, and an API for programmatic access. TogoID supports datasets from various biological categories, such as genes, proteins, chemical compounds, pathways, and diseases. TogoID users can perform exploratory multistep conversions to find a path between IDs. To guide the interpretation of the biological meaning of these conversions, we crafted an ontology that defines the semantics of the dataset relations. (from https://togoid.dbcls.jp/)

Stale · ★2 · 7 years ago

CGHbase

Contains functions and classes that are needed by arrayCGH packages.

Stale · ★0 · 7 years ago

coRdon

Tool for analyzing codon usage (CU) in various unannotated or KEGG/COG-annotated DNA sequences. It calculates different measures of CU bias and CU-based predictors of gene expressivity, and performs gene set enrichment analysis for annotated sequences. It also implements several methods for visualizing CU and enrichment analysis results.

Stale · ★23 · 7 years ago
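coRdon is an R package, and its specific CU bias measures are not reproduced here. The Python sketch below only illustrates what a codon usage bias measure computes. It implements relative synonymous codon usage (RSCU), a classic measure chosen for simplicity, which is not necessarily one coRdon implements. It assumes an in-frame coding DNA sequence and the standard genetic code. The names `codon_counts` and `rscu` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest); "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(seq):
    """Count in-frame codons, ignoring a trailing partial codon and
    any codon containing non-ACGT characters (e.g. N)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(
        seq[i:i + 3] for i in range(0, usable, 3) if seq[i:i + 3] in CODON_TABLE
    )


def rscu(seq):
    """Relative synonymous codon usage for each sense codon whose amino
    acid occurs in seq: observed count / mean count of its synonyms."""
    counts = codon_counts(seq)
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            synonyms.setdefault(aa, []).append(codon)
    result = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    values = rscu("CTGCTGCTGTTA")
    print(round(values["CTG"], 3), round(values["TTA"], 3))  # 4.5 1.5
```

An RSCU of 1 means a codon is used exactly as often as expected if all of its synonymous codons were equally likely. Values above 1 indicate a preference. In the example, leucine is encoded three times by CTG and once by TTA, so CTG gets an RSCU of 4.5 among leucine's six codons.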

VCFArray

VCFArray extends DelayedArray to represent VCF data entries as array-like objects, with an on-disk or remote VCF file as the backend. Data entries from VCF files can be converted into VCFArray instances with different dimensions. These include INFO fields, FORMAT fields, and the fixed columns (REF, ALT, QUAL, FILTER).

Stale · ★1 · 7 years ago

Shape Expression Vocabulary

The Shape Expressions (ShEx) language describes RDF nodes and graph structures. A node constraint describes an RDF node (IRI, blank node or literal) and a shape describes the triples involving nodes in an RDF graph. These descriptions identify predicates and their associated cardinalities and datatypes. ShEx shapes can be used to communicate data structures associated with some process or interface, generate or validate data, or drive user interfaces.

Stale · ★1 · 7 years ago

The Leek group guide to genomics papers

Expertly curated genomics papers to get up to speed on genomics, RNA-seq, statistics (used in genomics), software development, and more.

Stale · ★502 · 7 years ago

Circos

Perl package for circular plots, which are well suited to visualizing genomic rearrangements.

Stale · ★88 · 7 years ago

Telseq

BAM File Utilities

Telseq is a tool for estimating telomere length from whole genome sequence data.

Stale · ★76 · 7 years ago

abseqR

AbSeq is a comprehensive bioinformatics pipeline for analyzing sequencing datasets generated from antibody libraries, and abseqR is one of its packages. abseqR gives users of abseqPy (https://github.com/malhamdoosh/abseqPy) plotting and reporting capabilities. These include interactive HTML reports that are convenient to view and share with other researchers. Additionally, abseqR extends abseqPy to compare multiple repertoire analyses and to perform further downstream analysis on its output.

Stale · ★0 · 7 years ago

ipdDb

GenomicVariation

All alleles from the IPD IMGT/HLA <https://www.ebi.ac.uk/ipd/imgt/hla/> and IPD KIR <https://www.ebi.ac.uk/ipd/kir/> databases for Homo sapiens. Reference: Robinson J, Maccari G, Marsh SGE, Walter L, Blokhuis J, Bimber B, Parham P, De Groot NG, Bontrop RE, Guethlein LA, and Hammond JA. KIR Nomenclature in non-human species. Immunogenetics (2018), in preparation.

Stale · ★0 · 7 years ago

FAIR* Reviews Ontology

An ontology that enables the description of reviews of scientific articles and other scholarly resources.

Stale · ★0 · 7 years ago

runibic

This package implements the UniBic algorithm in R. This biclustering algorithm for analyzing gene expression data was introduced by Zhenjia Wang et al. in 2016. The package authors consider it the most promising current biclustering method for identifying meaningful structures in complex and noisy data.

Stale · ★4 · 7 years ago

3D e-Chem Virtual Machine

Virtual Machine

Virtual machine with all the software and sample data needed to run 3D-e-Chem KNIME workflows.

Stale · ★17 · 7 years ago

MetID

This package uses a network-based approach intended to improve the identification of significant ions detected by LC-MS.

Stale · ★1 · 7 years ago

rCGH

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated with commercial or custom aCGH arrays. As input, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44K to 400K. It also supports Affymetrix SNP6.0 and CytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided the data comply with the expected format. The package handles all the steps of individual genomic profile analysis, from reading files to profile segmentation and gene annotation. It also provides several visualization functions (static or interactive) that facilitate the interpretation of individual profiles. Input files can be compressed, e.g. .bz2 or .gz.

Stale · ★5 · 8 years ago

BioTop

Upper-level ontology for biology and medicine, compatible with BFO, DOLCE, and the UMLS Semantic Network.

Stale · ★4 · 8 years ago

Histopathology Ontology

An ontology of histopathological morphologies used by pathologists to classify animal lesions observed histologically during regulatory toxicology studies. The ontology was developed using real data from over 6000 regulatory toxicology studies, donated by 13 companies and spanning nine species. Its original structure was designed ab initio, when the [INHAND](http://www.goreni.org/) manuscripts were not yet available. However, the ontology has since been repeatedly reviewed and updated to align with the subsequently published INHAND manuscripts. During this process, cross-references to INHAND lesion identifiers were added to the ontology. [from GitHub]

Stale · ★9 · 8 years ago

Awesome-alternative-splicing

Bioinformatics on GitHub

List of resources on alternative splicing including software, databases, and other tools.

Stale · ★58 · 8 years ago

Drug Target Ontology

DTO integrates and harmonizes knowledge of the most important druggable protein families: kinases, GPCRs, ion channels and nuclear hormone receptors.

Stale · ★8 · 8 years ago

cmpo

Stale · ★6 · 8 years ago

NP-Likeness

Small molecules

Natural product-likeness calculator (v2.1): calculates the natural product-likeness of small molecules based on open data on natural products.

Stale · ★4 · 8 years ago

GA4GHclient

DataRepresentation

GA4GHclient provides an easy way to access public data servers through Global Alliance for Genomics and Health (GA4GH) genomics API. It provides low-level access to GA4GH API and translates response data into Bioconductor-based class objects.

Stale · ★1 · 8 years ago

Selventa Chemicals

Selventa legacy chemical namespace used with the Biological Expression Language

Archived · ★0 · 8 years ago

pysic

A calculator incorporating various empirical pair and many-body potentials.

Stale · ★23 · 8 years ago

RJMCMCNucleosomes

BiologicalQuestion

This package performs genome-wide nucleosome positioning. It uses an informative multinomial-Dirichlet prior in a t-mixture model, with reversible jump Markov chain Monte Carlo (RJMCMC) estimation of nucleosome positions.

Stale · ★0 · 8 years ago

GSALightning

GSALightning provides a fast implementation of permutation-based gene set analysis for the two-sample problem. This package is particularly useful when simultaneously testing a large number of gene sets, or when a large number of permutations is needed for more accurate p-value estimation.

Stale · ★5 · 8 years ago
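The GSALightning description above refers to permutation-based gene set analysis for a two-sample problem. The minimal Python sketch below illustrates only that general technique. It does not reproduce GSALightning's own statistic, algorithm, or R interface. The sketch makes several assumptions:

- a genes-by-samples expression matrix given as nested lists
- boolean group labels
- a gene set given as row indices

The set-level statistic (the mean, over genes in the set, of the difference in group means) is a hypothetical choice made for this sketch. So are the function names `set_statistic` and `permutation_gsa`.

```python
import random
from statistics import mean


def set_statistic(expr, labels, gene_set):
    """Mean over the genes in gene_set of (group-1 mean - group-2 mean).

    expr is a genes-by-samples matrix (list of rows); labels[j] is True
    when sample j belongs to group 1.
    """
    diffs = []
    for g in gene_set:
        row = expr[g]
        group1 = [x for x, lab in zip(row, labels) if lab]
        group2 = [x for x, lab in zip(row, labels) if not lab]
        diffs.append(mean(group1) - mean(group2))
    return mean(diffs)


def permutation_gsa(expr, labels, gene_set, n_perm=1000, seed=0):
    """Two-sided sample-permutation test for one gene set.

    Returns (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels, gene_set)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel samples; group sizes stay fixed
        if abs(set_statistic(expr, perm, gene_set)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: genes 0 and 1 are higher in the first five samples.
    expr = [[5.0] * 5 + [0.0] * 5, [5.0] * 5 + [0.0] * 5, [0.0] * 10]
    labels = [True] * 5 + [False] * 5
    print(permutation_gsa(expr, labels, [0, 1]))
```

Shuffling the sample labels keeps both group sizes fixed, so the shuffled statistics form a null distribution for the two-sample comparison. Recomputing every statistic from scratch like this gets slow for thousands of gene sets and permutations. That cost is what a dedicated fast implementation aims to reduce.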

SMITE

This package builds on the Epimods framework, which facilitates finding weighted subnetworks ("modules") on Illumina Infinium 27k arrays using the SpinGlass algorithm, as implemented in the igraph package. We have created a class of gene-centric annotations associated with p-values, effect sizes, and scores from any researcher's prior statistical results to find functional modules.

Stale · ★1 · 9 years ago

isobar

isobar provides methods for preprocessing, normalization, and report generation for the analysis of quantitative mass spectrometry proteomics data labeled with isobaric tags, such as iTRAQ and TMT. It includes modules for integrating and validating PTM-centric datasets (isobar-PTM). More information is available at http://www.ms-isobar.org.

Stale · ★10 · 9 years ago

ctsGE

Methodology for supervised clustering of potentially many predictor variables, such as genes, in time-series datasets. Provides functions that help the user assign genes to a predefined set of model profiles.

Stale · ★1 · 9 years ago

tib.ofm

Stale · ★3 · 9 years ago · Web Ontology Language

GOpro

Finds the most characteristic Gene Ontology terms for groups of human genes. This package was created as part of a thesis developed under the auspices of the MI^2 Group (http://mi2.mini.pw.edu.pl/, https://github.com/geneticsMiNIng).

Stale · ★2 · 9 years ago

Human Dermatological Disease Ontology

DermO is an ontology with broad coverage of the domain of dermatologic disease. Its authors demonstrate its utility for text mining and for investigating phenotypic relationships between dermatologic disorders.

Stale · ★4 · 10 years ago · Web Ontology Language

Open Biomedical Annotations

OBAN is an ontology model for describing associations between biomedical entities in triple format, based on W3C specifications. It is a generic association representation model that loosely couples a subject and an object through the class OBAN:association. An example is a disease and its associated phenotypes, supported by the source of evidence for that association. [from GitHub]

Stale · ★6 · 10 years ago · Web Ontology Language

NanoParticle Ontology

An ontology that represents the basic knowledge of physical, chemical and functional characteristics of nanotechnology as used in cancer diagnosis and therapy.

Stale · ★1 · 10 years ago · Web Ontology Language

fmcsR

Cheminformatics

The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy. This strategy allows atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs. As a result, they identify more common features in their source structures and give higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities for applying the FMCS algorithm to pairwise compound comparisons, structure similarity searching, and clustering.

Stale · ★6 · 10 years ago

GeneBreak

Recurrent breakpoint gene detection on copy number aberration profiles.

Stale · ★2 · 10 years ago

rTRMui

This package provides a web interface to compute transcriptional regulatory modules with rTRM.

Stale · ★1 · 10 years ago

Island Plot

Genome Browsers / Gene Diagrams

D3-based JavaScript genome viewer that constructs SVGs.

Stale · ★33 · 10 years ago

Annotation Ontology

The Annotation Ontology specification is currently used as input for the activities of the W3C Open Annotation Community Group (http://www.w3.org/community/openannotation/). The Group works towards a common, RDF-based specification for annotating digital resources. Its effort starts by reconciling two proposals that have emerged over the past two years: the Annotation Ontology (http://code.google.com/p/annotation-ontology/) and the Open Annotation Model (http://www.openannotation.org/spec/beta/). Initially, editors of these proposals will collaborate closely on a common draft specification. The draft will address the requirements and use cases identified during their respective efforts. The goal is to make this draft available for public feedback and experimentation in the second quarter of 2012. The final deliverable of the Open Annotation Community Group will be a specification published under an appropriate open license. It will be informed by the existing proposals, the common draft specification, and the community feedback. [from homepage]

Stale · ★0 · 11 years ago · Web Ontology Language

The SEED

With the growing number of available genomes, the need for an environment that supports effective comparative analysis increases. The original SEED Project was started in 2003 by the [Fellowship for Interpretation of Genomes (FIG)](http://thefig.info/) as a largely unfunded open source effort. Argonne National Laboratory and the University of Chicago later joined the project, and much of the activity now occurs at those two institutions. Other participants include the University of Illinois at Urbana-Champaign, Hope College, San Diego State University, the Burnham Institute, and a number of other institutions. The cooperative effort focuses on developing the comparative genomics environment called the SEED and, more importantly, on developing curated genomic data. This prefix provides identifiers for molecular roles that describe the function of one or more proteins in microbes and plants.

AlphaProteo

Protein & Drug Discovery

Deep learning system for de novo design of high-affinity protein binders, achieving strong binding across diverse target classes including challenging intracellular proteins with significantly higher success rates than traditional wet-lab screening methods (Google DeepMind, Nature 2024)

EcoNet

Ecological Modeling

Ecological modeling and conservation AI

MinervaAI

General Science Models

Mathematical reasoning

islify

This software is meant for classifying images of cell-based assays for neuronal surface autoantibody detection or similar techniques. It takes imaging files as input and creates a composite score from them. This score can be used, for example, to classify samples as negative or positive for a certain antibody specificity. The name comes from how I thought of each individual picture while creating it: as an archipelago in which different filters control the water level and the ground characteristics, thereby revealing islands of interest.

scBFA

This package models the gene detection pattern of scRNA-seq data with a binary factor analysis model. The model lets the user pass in a cell-level covariate matrix X and a gene-level covariate matrix Q to account for nuisance variance (e.g. batch effects). It outputs a low-dimensional embedding matrix for downstream analysis.
