Open Science Index

Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

Filters

Health

Active: 748
Idle: 370
Stale: 316
Archived: 13
(None): 4,476

Domain

Software: 422
ImmunoOncology: 251
Microarray: 138
Infrastructure: 123
GeneExpression: 117
Sequencing: 85
SingleCell: 72
Protein & Drug Discovery: 66
text-generation: 63
Visualization: 61
Annotation: 51
Genetics: 51
(None): 2,332

Language

R: 2,426
Python: 448
Jupyter Notebook: 52
HTML: 30
C: 21
Makefile: 19
JavaScript: 16
C++: 15
Java: 10
Shell: 9
Web Ontology Language: 7
Perl: 6
(None): 2,815

License

GPL-3.0: 620
Artistic-2.0: 550
MIT: 549
CC-BY-4.0: 268
GPL-2.0: 252
GPL-2.0+: 243
CC0-1.0: 120
Apache-2.0: 107
GPL-3.0+: 101
CC-BY-3.0: 83
NOASSERTION: 82
Other: 61
(None): 2,441

Source

bioconductor: 2,418
bioregistry: 2,418
github: 1,150
awesome-ai-for-science: 418
huggingface: 303
awesome-bioinformatics: 126
bio.tools: 116
awesome-python-chemistry: 87
awesome-cheminformatics: 45
awesome-scientific-python: 18

Type

Software tool: 3,202
Database: 2,418
AI model: 303

5,923 resources indexed

Showing 751–800

plant-llms/PlantBiMoE

by plant-llms

Model Overview: PlantBiMoE is a DNA language model trained on the genomes of 42 representative plant species. More specifically, PlantBiMoE uses the BiMamba and SparseMoE architectures with a masked language modeling objective to leverage highly available genotype data from 42 different plant species to…

Idle · ↓10 · 6 months ago

TITAN (Nature Medicine 2024)

Computational Pathology & Digital Pathology

Multimodal whole-slide pathology foundation model jointly pretrained on H&E histology and diagnostic text reports, enabling zero-shot cancer subtyping, biomarker prediction, and multimodal reasoning across diverse cancer types (Mahmood Lab, 341+ stars)

Idle · ★350 · 6 months ago

prov-gigatime/GigaTIME

by prov-gigatime

Idle · ↓284 · 6 months ago

hypeR

GeneSetEnrichment

An R Package for Geneset Enrichment Workflows.

Idle · ★79 · 6 months ago
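A common test behind gene set enrichment workflows like those hypeR packages is the hypergeometric overrepresentation test. The sketch below computes its upper-tail p-value from the overlap between a query gene list and a gene set, given a background size. It is a generic illustration, not hypeR's code or API, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, set_size: int, query_size: int, background_size: int) -> float:
    """Upper-tail p-value P(X >= overlap) for X ~ Hypergeometric.

    overlap: query genes that fall in the gene set
    set_size: genes in the gene set (within the background)
    query_size: genes in the query list (within the background)
    background_size: total genes considered
    """
    total = comb(background_size, query_size)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, min(set_size, query_size) + 1)
    )
    return tail / total


# Five query genes, all inside a five-gene set, out of a ten-gene background.
print(hypergeom_enrichment_p(5, 5, 5, 10))
```

Here the result is 1/252 (about 0.004), since only one of the comb(10, 5) = 252 possible five-gene queries lies entirely inside the set.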

xCell2

xCell2 provides methods for cell type enrichment analysis using cell type signatures. It includes three main functions: (1) xCell2Train, for training custom reference objects from bulk or single-cell RNA-seq datasets; (2) xCell2Analysis, for conducting cell type enrichment analysis using the custom reference; and (3) xCell2GetLineage, for identifying dependencies between different cell types using ontology.

Idle · ★21 · 6 months ago

SpliceWiz

The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis that processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports identification of novel splicing events. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a Shiny-based GUI facilitating interactive exploration of results, including gene ontology enrichment. It is performance-optimized, with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.

Idle · ★24 · 6 months ago

mspms

This package provides functions for the analysis of data generated by the multiplex substrate profiling by mass spectrometry for proteases (MSP-MS) method. Data exported from upstream proteomics software is accepted as input and subsequently processed for analysis. Tools for statistical analysis, visualization, and interpretation of the data are provided.

Idle · ★1 · 6 months ago

methrix

Bedgraph files generated by bisulfite pipelines often come in various flavors. A critical downstream step requires summarizing these files into methylation/coverage matrices. Methrix performs this data aggregation step and also provides many other useful downstream functions.

Idle · ★36 · 6 months ago
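The aggregation step methrix performs can be pictured as merging per-sample CpG records into two site-by-sample matrices, one of methylation fractions and one of read coverage. The sketch below assumes a hypothetical pre-parsed record layout of (chromosome, start, methylated count, unmethylated count) rather than any particular bedGraph flavor, and leaves sites a sample does not cover as None. It is not methrix's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-parsed record: (chromosome, start, methylated reads, unmethylated reads).
Record = Tuple[str, int, int, int]


def build_matrices(samples: Dict[str, List[Record]]):
    """Merge per-sample CpG records into site-by-sample beta and coverage matrices.

    Assumes at most one record per site per sample.
    """
    names = list(samples)
    # Union of all sites seen in any sample, sorted by (chromosome string, start).
    sites = sorted({(chrom, start) for recs in samples.values() for chrom, start, _, _ in recs})
    row = {site: i for i, site in enumerate(sites)}
    beta: List[List[Optional[float]]] = [[None] * len(names) for _ in sites]
    cov: List[List[Optional[int]]] = [[None] * len(names) for _ in sites]
    for j, name in enumerate(names):
        for chrom, start, meth, unmeth in samples[name]:
            i = row[(chrom, start)]
            depth = meth + unmeth
            cov[i][j] = depth
            # Methylation fraction; undefined when no reads cover the site.
            beta[i][j] = meth / depth if depth > 0 else None
    return sites, names, beta, cov
```

Keeping missing values as None, rather than 0, keeps "not measured" distinct from "measured as unmethylated".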

juppy44/plant-identification-2m-vit-b

by juppy44

image-classification

Idle · ↓320 · 6 months ago

Bibliographic Framework Initiative Vocabulary

The Bibframe vocabulary consists of RDF classes and properties used for the description of items cataloged principally by libraries, but may also be used to describe items cataloged by museums and archives. Classes include the three core classes - Work, Instance, and Item - in addition to many more classes to support description. Properties describe characteristics of the resource being described as well as relationships among resources. For example: one Work might be a "translation of" another Work; an Instance may be an "instance of" a particular Bibframe Work. Other properties describe attributes of Works and Instances. For example: the Bibframe property "subject" expresses an important attribute of a Work (what the Work is about), and the property "extent" (e.g. number of pages) expresses an attribute of an Instance.

Idle · ★54 · 6 months ago

VIVO Ontology

An ontology about scholarship

Idle · ★16 · 6 months ago

EnhancedVolcano

Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano will attempt to fit as many point labels in the plot window as possible, thus avoiding 'clogging' up the plot with labels that could not otherwise have been read. Other functionality allows the user to identify up to 4 different types of attributes in the same plot space via colour, shape, size, and shade parameter configurations.

Idle · ★464 · 6 months ago
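A volcano plot puts log2 fold change on the x-axis and -log10(p-value) on the y-axis, and points are commonly coloured by whether they pass a p-value cutoff, a fold-change cutoff, both, or neither. The sketch below computes those coordinates and categories for hypothetical (gene, log2FC, p-value) results. The cutoff values and category names are illustrative assumptions, not EnhancedVolcano's defaults or API.

```python
import math
from typing import List, Tuple


def volcano_points(results: List[Tuple[str, float, float]],
                   p_cutoff: float = 1e-5, fc_cutoff: float = 1.0):
    """Return (gene, x, y, category) with x = log2FC and y = -log10(p)."""
    points = []
    for gene, log2fc, pval in results:
        # Clamp p = 0 so the logarithm stays finite.
        y = -math.log10(max(pval, 1e-300))
        passes_p = pval < p_cutoff
        passes_fc = abs(log2fc) > fc_cutoff
        if passes_p and passes_fc:
            category = "p-value and log2FC"
        elif passes_p:
            category = "p-value only"
        elif passes_fc:
            category = "log2FC only"
        else:
            category = "NS"  # not significant by either criterion
        points.append((gene, log2fc, y, category))
    return points
```

The returned tuples can be handed to any plotting library, with the category driving colour.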

MedSwin/MedSwin-DaRE-TIES-KD-0.7

by MedSwin

question-answering

This is a merge of pre-trained language models created using mergekit.

Idle · ↓41 · 6 months ago

gatom

This package implements a metabolic network analysis pipeline to identify an active metabolic module based on high throughput data. The pipeline takes as input transcriptional and/or metabolic data and finds a metabolic subnetwork (module) most regulated between the two conditions of interest. The package further provides functions for module post-processing, annotation and visualization.

Idle · ★8 · 6 months ago

GladiaTOX

The GladiaTOX R package is an open-source, flexible solution for high-content screening data processing and reporting in biomedical research. GladiaTOX takes advantage of the tcpl core functionalities and provides a number of extensions: it provides a web-service solution to fetch raw data; it computes severity scores and exports ToxPi-formatted files; furthermore, it contains a suite of functionalities to generate PDF reports for quality control and data processing.

Idle · ★0 · 6 months ago

iSEEu

iSEEu (the iSEE universe) contains diverse functionality to extend the usage of the iSEE package, including additional classes for the panels, or modes allowing easy configuration of iSEE applications.

Idle · ★9 · 6 months ago

mosdef

This package provides functionality to run a number of tasks in the differential expression analysis workflow. This encompasses the most widely used steps, from running various enrichment analysis tools with a unified interface to creating plots and beautifying table components linking to external websites and databases. This streamlines the generation of comprehensive analysis reports.

Idle · ★0 · 6 months ago

flowcatchR

flowcatchR is a set of tools to analyze in vivo microscopy imaging data, focused on tracking flowing blood cells. It guides the steps from segmentation to calculation of features and filtering out particles not of interest, and also provides a set of utilities to help check the quality of the performed operations (e.g. how good the segmentation was). It allows investigating the tracking of flowing cells, such as in blood vessels, to categorize the particles as flowing, rolling, or adherent. This classification is applied in the study of phenomena such as hemostasis and thrombosis development. Moreover, flowcatchR presents an integrated workflow solution, based on integration with a Shiny app and Jupyter notebooks delivered alongside the package, that can enable fully reproducible bioimage analysis in the R environment.

Idle · ★4 · 6 months ago

mariner

FunctionalGenomics

Tools for manipulating paired ranges and working with Hi-C data in R. Functionality includes manipulating/merging paired regions, generating paired ranges, extracting/aggregating interactions from `.hic` files, and visualizing the results. Designed for compatibility with plotgardener for visualization.

Idle · ★12 · 6 months ago

Curriculum Course Syllabus Ontology

CCSO is an educational ontology acting as a data model for concepts and entities within an academic setting, enabling also the annotation of potentially available resources. The ontology aims to conceptualize educational entities within Curriculum and Syllabus with appropriate coverage and quality, in order to support rich services on top for improving curriculum management and automatically enabling syllabus semantic processes. (from homepage)

Idle · ★0 · 6 months ago

DESpace

Intuitive framework for identifying spatially variable genes (SVGs) and differential spatial variable patterns (DSP) between conditions via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression using a negative binomial (NB) model, via edgeR, with spatial clusters as covariates. SVGs are then identified by testing the significance of spatial clusters. For multi-sample, multi-condition datasets, we again fit an NB model via edgeR, incorporating spatial clusters, conditions, and their interactions as covariates. DSP genes, which represent differences in spatial gene expression patterns across experimental conditions, are identified by testing the interaction between spatial clusters and conditions.

Idle · ★7 · 6 months ago

GeDi

The package provides different distance measures to calculate the difference between gene sets. Based on these scores, the gene sets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy use.

Idle · ★2 · 6 months ago
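As an illustration of distance-based gene set clustering, the sketch below uses Jaccard distance (one common set-overlap measure, chosen here as an assumption; GeDi offers its own set of measures) and groups gene sets into connected components of a graph whose edges join sets within a hypothetical distance threshold. GeDi's actual clustering methods may differ.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(gene_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    return {(x, y): jaccard_distance(gene_sets[x], gene_sets[y])
            for x, y in combinations(sorted(gene_sets), 2)}


def cluster_gene_sets(gene_sets: Dict[str, Set[str]], max_dist: float) -> List[List[str]]:
    """Connected components of the graph linking sets with distance <= max_dist."""
    adj: Dict[str, Set[str]] = {name: set() for name in gene_sets}
    for (x, y), d in pairwise_distances(gene_sets).items():
        if d <= max_dist:
            adj[x].add(y)
            adj[y].add(x)
    seen: Set[str] = set()
    clusters = []
    for start in sorted(gene_sets):
        if start in seen:
            continue
        # Depth-first traversal collects every set reachable from `start`.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters
```

The edge list from `pairwise_distances` can also be passed directly to a graph-drawing tool for the visualization step.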

mace-foundations/mace-mh-1

by mace-foundations

MACE-MH-1 is a foundation machine-learning interatomic potential (MLIP) that bridges molecular, surface, and materials chemistry through cross-domain learning:

Idle · ↓0 · 6 months ago

ZJU-AI4H/Hulu-Med-4B

by ZJU-AI4H

image-text-to-text

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

Idle · ↓18.6K · 6 months ago

Biocaml

Biocaml aims to be a high-performance user-friendly library for Bioinformatics.

Idle · ★125 · 6 months ago

methylCC

A tool to estimate the cell composition of whole blood samples from DNA methylation measured on any platform technology (microarray and sequencing).

Idle · ★18 · 6 months ago

GEOquery

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.

Idle · ★112 · 6 months ago

microsoft/llava-med-v1.5-mistral-7b

by microsoft

image-text-to-text

Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only to facilitate reproducibility of the…

Idle · ↓21.4K · 6 months ago

ChEMBL_Structure_Pipeline (formerly standardiser)

Format Checking

Tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises.

Idle · ★241 · 6 months ago

biocmake

Manages the installation of CMake for building Bioconductor packages. This avoids the need for end-users to manually install CMake on their system. No action is performed if a suitable version of CMake is already available.

Idle · ★1 · 6 months ago

nullranges

Modular package for generation of sets of ranges representing the null hypothesis. These can take the form of bootstrap samples of ranges (using the block bootstrap framework of Bickel et al. 2010), or sets of control ranges that are matched across one or more covariates. nullranges is designed to be interoperable with other packages for analysis of genomic overlap enrichment, including the plyranges Bioconductor package.

Idle · ★28 · 6 months ago
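A minimal sketch of the block bootstrap idea for genomic ranges, assuming a single chromosome and half-open (start, end) coordinates: the chromosome is tiled with blocks, each tile is filled by copying the ranges that start inside a randomly chosen source block of the same length, and those ranges are shifted into place. This is a plain, unsegmented variant written for illustration; nullranges' own implementation is more elaborate, and the function name here is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def block_bootstrap(ranges: List[Tuple[int, int]], genome_length: int, block_length: int,
                    seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Resample half-open ranges on one chromosome by tiling it with random blocks."""
    if not 0 < block_length <= genome_length:
        raise ValueError("block_length must be in (0, genome_length]")
    rng = random.Random(seed)
    n_blocks = -(-genome_length // block_length)  # ceiling division
    resampled = []
    for b in range(n_blocks):
        target = b * block_length
        # Pick a source block uniformly among start positions that fit on the chromosome.
        source = rng.randrange(0, genome_length - block_length + 1)
        shift = target - source
        for start, end in ranges:
            if source <= start < source + block_length:
                new_start, new_end = start + shift, end + shift
                # The last tile may overhang the chromosome end: drop or clip.
                if new_start < genome_length:
                    resampled.append((new_start, min(new_end, genome_length)))
    return sorted(resampled)
```

Because whole blocks are copied, local clustering of ranges within a block is preserved in each bootstrap sample, which is the point of block resampling compared with shuffling ranges independently.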

chemmodlab

Machine Learning

A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models in R.

Idle · ★17 · 6 months ago

stPipe

This package serves as an upstream pipeline for pre-processing sequencing-based spatial transcriptomics data. Functions include FASTQ trimming, BAM file reformatting, index building, spatial barcode detection, demultiplexing, gene count matrix generation with UMI deduplication, QC, and relevant visualization. A config is an essential input for most of the functions, with the aim of improving reproducibility.

Idle · ★5 · 6 months ago

BiocSingular

Implements exact and approximate methods for singular value decomposition and principal components analysis, in a framework that allows them to be easily switched within Bioconductor packages or workflows. Where possible, parallelization is achieved using the BiocParallel framework.

Idle · ★8 · 6 months ago

beer

BEER implements a Bayesian model for analyzing phage-immunoprecipitation sequencing (PhIP-seq) data. Given a PhIPData object, BEER returns posterior probabilities of enriched antibody responses, point estimates for the relative fold-change in comparison to negative control samples, and more. Additionally, BEER provides a convenient implementation for using edgeR to identify enriched antibody responses.

Idle · ★11 · 7 months ago

infercnv

Using single-cell RNA-Seq expression to visualize CNV in cells.

Idle · ★671 · 7 months ago

Human Reference Atlas Common Coordinate Framework Ontology

Idle · ★3 · 7 months ago

CRISPRball

A Shiny application for visualization, exploration, comparison, and filtering of CRISPR screens analyzed with MAGeCK RRA or MLE. Features include interactive plots with on-click labeling, full customization of plot aesthetics, data upload and/or download, and much more. Quickly and easily explore your CRISPR screen results and generate publication-quality figures in seconds.

Idle · ★13 · 7 months ago

hoodscanR

hoodscanR is a user-friendly R package providing functions to assist cellular neighborhood analysis of any spatial transcriptomics data with single-cell resolution. All functions in the package are built on the SpatialExperiment object, allowing integration into various spatial transcriptomics-related packages from Bioconductor. The package can produce cell-level neighborhood annotation output, along with functions to perform neighborhood colocalization analysis and neighborhood-based cell clustering.

Idle · ★13 · 7 months ago

cellSAM

Medical AI & Clinical Applications

Foundation model for universal cell segmentation achieving state-of-the-art performance across bacteria, tissue, yeast, cell culture, and diverse imaging modalities (brightfield, fluorescence, phase), with pip-installable inference and Napari plugin (vanvalenlab/Caltech, bioRxiv 2024)

Idle · ★195 · 7 months ago

BridgeDbR

Use BridgeDb functions and load identifier mapping databases in R. When you use this package to download identifier mapping files, it retrieves them from GitHub, Zenodo, and Figshare.

Idle · ★4 · 7 months ago

Coastal and Marine Ecological Classification Standard

Use this database to browse the CMECS classification and to get definitions for individual CMECS Units. This database contains the units that were published in the Coastal and Marine Ecological Classification Standard.

Idle · ★8 · 7 months ago

waddR

The package offers statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. Functions for calculating the 2-Wasserstein distance and testing for differential distributions are provided, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.

Idle · ★28 · 7 months ago
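For one-dimensional samples, the 2-Wasserstein distance is the L2 distance between the two empirical quantile functions: W2(F, G)^2 = ∫_0^1 (F^-1(t) - G^-1(t))^2 dt. The sketch below evaluates that integral exactly for empirical distributions by summing over the jump points of both quantile functions, so the samples may differ in size. It computes only the distance, not waddR's statistical tests, and the function name is hypothetical.

```python
import math
from typing import Sequence


def wasserstein2(x: Sequence[float], y: Sequence[float]) -> float:
    """2-Wasserstein distance between the empirical distributions of two 1-D samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Jump points of both empirical quantile functions on (0, 1].
    grid = sorted({i / n for i in range(1, n + 1)} | {j / m for j in range(1, m + 1)})
    total, prev = 0.0, 0.0
    for t in grid:
        # Both quantile functions are constant on (prev, t]; read them at the midpoint.
        mid = (prev + t) / 2
        qx = xs[min(int(mid * n), n - 1)]
        qy = ys[min(int(mid * m), m - 1)]
        total += (t - prev) * (qx - qy) ** 2
        prev = t
    return math.sqrt(total)
```

For equal-sized samples this reduces to the root mean squared difference of the sorted values; for example, shifting every value of a sample by 1 gives a distance of 1.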

gsort

Command Line Utilities

Sort genomic files according to a specified order.

Idle · ★36 · 7 months ago
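The core of sorting a genomic file by a specified order is a two-part key: the chromosome's rank in a user-supplied list, then the numeric start position. The sketch below assumes tab-separated lines with the chromosome in column 1 and the start in column 2 (as in BED), keeps '#' header lines first, and rejects chromosomes missing from the order. It is not gsort's implementation or command-line interface, and the function name is hypothetical.

```python
from typing import Iterable, List, Sequence


def sort_genomic_lines(lines: Iterable[str], chrom_order: Sequence[str]) -> List[str]:
    """Sort tab-separated records by a given chromosome order, then by start."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    headers: List[str] = []
    records: List[str] = []
    for line in lines:
        (headers if line.startswith("#") else records).append(line)

    def key(line: str):
        fields = line.rstrip("\n").split("\t")
        chrom, start = fields[0], int(fields[1])
        if chrom not in rank:
            raise ValueError(f"chromosome {chrom!r} is not in the requested order")
        return rank[chrom], start

    return headers + sorted(records, key=key)
```

Using an explicit rank avoids the usual pitfall of lexicographic sorting, which would place chr10 before chr2.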

Awesome LLM Scientific Discovery

📋 Paper Collections & Repositories

LLM papers for scientific discovery

Idle · ★345 · 7 months ago

Cell2Sentence

Genomics & Bioinformatics

Teaching Large Language Models the Language of Biology through single-cell transcriptomics (ICML 2024)

Idle · ★862 · 7 months ago

Jupyter Notebook

gINTomics

gINTomics is an R package for multi-omics data integration and visualization. gINTomics is designed to detect the association between the expression of a target and of its regulators, also taking into account their genomic modifications, such as copy number variations (CNV) and methylation. In addition, gINTomics allows visualization of integration results via a Shiny-based interactive app.

Idle · ★3 · 7 months ago

mitoClone2

This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events, then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and provides additional genomic context.

Idle · ★1 · 7 months ago

gbyuvd/chemembed-chemselfies

by gbyuvd

sentence-similarity

ChemFIE-BED is a sentence-transformers model based on gbyuvd/chemselfies-base-bertmlm, fine-tuned on around 2 million (for now) pairs of valid molecules' SELFIES (Krenn et al. 2020) taken from COCONUTDB (Sorokina et al. 2021) and ChemBL34 (Zdrazil et al. 2023).

Idle · ↓117 · 7 months ago

ChemFormula

General Chemistry

ChemFormula provides a class for working with chemical formulas. It allows parsing chemical formulas, calculating formula weights, and generating formatted output strings (e.g. in HTML, LaTeX, or Unicode).

Idle · ★33 · 7 months ago
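Calculating a formula weight amounts to parsing element symbols and counts, then summing each count times the element's standard atomic weight. The sketch below handles only simple formulas such as C6H12O6 or NaCl (no parentheses, hydrates, or charges) and uses a small hand-written table of abridged standard atomic weights covering a few elements. It is not ChemFormula's class or API, and the function names are hypothetical.

```python
import re
from typing import Dict

# Abridged standard atomic weights for a few elements (illustrative subset only).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "Na": 22.990, "S": 32.06, "Cl": 35.45}

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a flat formula like 'C6H12O6' into element counts."""
    counts: Dict[str, int] = {}
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, digits = match.groups()
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
        pos = match.end()
    return counts


def formula_weight(formula: str) -> float:
    """Sum of count x atomic weight over all elements in the formula."""
    counts = parse_formula(formula)
    missing = set(counts) - set(ATOMIC_WEIGHTS)
    if missing:
        raise KeyError(f"no atomic weight for {sorted(missing)}")
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())
```

With these table values, `formula_weight("H2O")` is about 18.015 and `formula_weight("C6H12O6")` about 180.156.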