Find open-source science resources

Feature rankings can be distorted by a single case in high-dimensional data. Cases that exert abnormal influence on feature rankings are called influential points (IPs). The package detects IPs by case deletion and quantifies their effects by measuring the resulting rank changes (DOI:10.48550/arXiv.2303.10516). It applies a novel rank-comparison measure with adaptive weights that stress the top-ranked features and adapt to the properties of the ranking.

Idle · 0 · 1 year ago
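The case-deletion idea in the entry above can be illustrated with a minimal Python sketch. It is not the package's implementation: the scoring rule (absolute difference in group means), the fixed `1 / rank` weights, and all function names are assumptions chosen for illustration, whereas the package describes adaptive weights.

```python
from statistics import mean

def rank_features(samples, labels):
    """Rank features by absolute difference in group means (1 = top feature).
    This scoring rule is an illustrative assumption."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        g0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        g1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(mean(g1) - mean(g0)))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def weighted_rank_change(full_ranks, loo_ranks):
    """Sum of rank shifts, weighted by 1 / original rank so that changes
    among top-ranked features count most (hypothetical fixed weights)."""
    return sum(abs(a - b) / a for a, b in zip(full_ranks, loo_ranks))

def case_deletion_scores(samples, labels):
    """Delete each case in turn, re-rank, and score the rank change."""
    full = rank_features(samples, labels)
    result = []
    for i in range(len(samples)):
        kept_x = samples[:i] + samples[i + 1:]
        kept_y = labels[:i] + labels[i + 1:]
        result.append(weighted_rank_change(full, rank_features(kept_x, kept_y)))
    return result

# Toy data: the last case carries an extreme value in feature 1.
samples = [[1, 1, 1], [1, 1, 1], [1, 1, 1],
           [3, 1, 2], [3, 1, 2], [3, 10, 2]]
labels = [0, 0, 0, 1, 1, 1]
scores = case_deletion_scores(samples, labels)
print(scores)
```

In this toy data only the deletion of the last case changes the ranking, so it receives the single non-zero score.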

DMRScan

This package detects significant differentially methylated regions (for both qualitative and quantitative traits) using a scan statistic with underlying Poisson heuristics. The scan statistic depends on a sequence of window sizes (number of CpGs within each window) and on a threshold for each window size. This threshold can be calculated in three ways: i) analytically, using the Siegmund et al. (2012) solution (preferred); ii) by importance sampling, as suggested by Zhang (2008); and iii) by full MCMC modeling of the data, choosing among a number of options for modeling the dependency between CpGs.

Idle · 2 · 1 year ago
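As a rough illustration of how a scan over several window sizes with one threshold per size might look, here is a small Python sketch. It is not DMRScan's statistic: the window mean, the threshold values, and the function name `scan_windows` are assumptions, and the thresholds are simply given rather than derived analytically, by importance sampling, or by MCMC.

```python
def scan_windows(stats, thresholds):
    """Slide windows of each size over per-CpG test statistics.

    stats:      list of per-CpG statistics, ordered by genomic position
    thresholds: dict mapping window size (number of CpGs) to a threshold
                on the window mean (hypothetical values)
    Returns a list of (start, end_exclusive, window_size) hits.
    """
    hits = []
    for size, threshold in sorted(thresholds.items()):
        for start in range(len(stats) - size + 1):
            window_mean = sum(stats[start:start + size]) / size
            if window_mean > threshold:
                hits.append((start, start + size, size))
    return hits

# Toy statistics with an elevated stretch at positions 2 to 4.
stats = [0.1, 0.2, 3.0, 3.2, 2.9, 0.0, -0.1]
hits = scan_windows(stats, {2: 2.5, 3: 2.8})
print(hits)
```

Overlapping hits would typically be merged into regions in a later step, which this sketch leaves out.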

The data cube vocabulary

This vocabulary allows multi-dimensional data, such as statistics, to be published in RDF. It is based on the core information model from SDMX (and thus also DDI).

Idle · 13 · 1 year ago

HTML

The Artificial Intelligence Ontology

This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.

Idle · 49 · 1 year ago

Jupyter Notebook

ChemBERTa

Protein & Drug Discovery

Chemical language model

Idle · 496 · 1 year ago

Jupyter Notebook

crisprBowtie

CRISPR

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bowtie. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Both DNA- and RNA-targeting nucleases are supported.

Idle · 3 · 1 year ago

oncoscanR

CopyNumberVariation

The software reads copy number segments from a text file, identifies all chromosome arms that are globally altered, and computes various genome-wide scores. The following HRD scores (characteristic of BRCA-mutated cancers) are included: LST, HR-LOH, nLST and gLOH. The package is tailored for the ThermoFisher Oncoscan assay analyzed with their Chromosome Analysis Suite (ChAS) but can be adapted to any input.

Idle · 3 · 1 year ago

NOASSERTION

(Poly)merase

Package suites

A Go library and command line utility for engineering organisms.

Idle · 729 · 1 year ago

iSEEhex

This package provides panels summarising data points in hexagonal bins for `iSEE`. It is part of `iSEEu`, the iSEE universe of panels that extend the `iSEE` package.

Idle · 0 · 1 year ago

Artistic-2.0

EBImage

Visualization

EBImage provides general purpose functionality for image processing and analysis. In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.

Idle · 77 · 1 year ago

LGPL

Ontology for Biomarkers of Clinical Interest

The Ontology for Biomarkers of Clinical Interest (OBCI) formally defines biomarkers for diseases, phenotypes, and effects.

Idle · 1 · 1 year ago

CC-BY-4.0

coMethDMR

DNAMethylation

coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR first carries out an additional step that selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.

Idle · 7 · 1 year ago

Physics-Informed Neural Networks

SciANN

Keras-based scientific neural networks

Idle · 1 · 1 year ago

Sharkipedia Species

Sharkipedia is an open source research initiative to make all published biological traits and population trends on sharks, rays, and chimaeras accessible to everyone.

Idle · 4 · 1 year ago

Ruby

Rcpi

A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.

Idle · 39 · 1 year ago

Artistic-2.0

spatialSimGP

Spatial

This package simulates spatial transcriptomics data with the mean-variance relationship using a Gaussian Process model per gene.

Idle · 0 · 1 year ago

chihaya

DataImport

Saves the delayed operations of a DelayedArray to an HDF5 file. This enables efficient recovery of the DelayedArray's contents in other languages and analysis frameworks.

Idle · 0 · 1 year ago

zitools

zitools supports the analysis of zero-inflated count data by either down-weighting excess zeros or replacing an appropriate proportion of excess zeros with NA. By overloading frequently used statistical functions (such as mean, median, standard deviation), plotting functions (such as boxplots or heatmaps) and differential abundance tests, it enables a wide range of downstream analyses for zero-inflated data in a less biased manner. This is applicable in microbiome analyses, where the data is often overdispersed and zero-inflated, which makes data analysis extremely challenging.

Idle · 0 · 1 year ago

BSD-3-Clause

SeqVarTools

SNP

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Idle · 3 · 1 year ago

TnT

Infrastructure

Chart-to-Code & Reproducibility

An R interface to the TnT JavaScript library (https://github.com/tntvis) that provides interactive and flexible visualization of track-based genomic data.

Idle · 15 · 1 year ago

AGPL-3.0

ChartAssistant / ChartAst (ACL 2024)

Universal chart comprehension and reasoning model

Idle · 135 · 1 year ago

NOASSERTION

RnaChipIntegrator

Computational biology

Utility that performs integrated analyses of 'gene' data (a set of genes or other genomic features) with 'peak' data (a set of regions, for example ChIP peaks) to identify the genes nearest to each peak, and vice versa.

Idle · 5 · 1 year ago

Artistic-2.0
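The nearest-gene lookup described for RnaChipIntegrator can be sketched in a few lines of Python. This is a simplified illustration, not the utility's own code: the record types, the function names, and the distance rule (gap between interval edges, ignoring strand and transcription start sites) are assumptions.

```python
from collections import namedtuple

# Hypothetical minimal records; a real tool tracks strand and more.
Gene = namedtuple("Gene", "name chrom start end")
Peak = namedtuple("Peak", "name chrom start end")

def interval_gap(a, b):
    """Number of bases between two intervals; 0 if they overlap."""
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0

def nearest(query, targets):
    """Return the target on the same chromosome closest to the query,
    or None if no target shares the chromosome."""
    same_chrom = [t for t in targets if t.chrom == query.chrom]
    return min(same_chrom, key=lambda t: interval_gap(query, t), default=None)

genes = [Gene("A", "chr1", 100, 200),
         Gene("B", "chr1", 1000, 1200),
         Gene("C", "chr2", 50, 150)]
peaks = [Peak("P1", "chr1", 250, 300),
         Peak("P2", "chr1", 900, 950),
         Peak("P3", "chr2", 500, 600)]

# Peaks to genes, then genes to peaks ("and vice versa").
peak_to_gene = {p.name: nearest(p, genes).name for p in peaks}
gene_to_peak = {g.name: nearest(g, peaks).name for g in genes}
print(peak_to_gene)
print(gene_to_peak)
```

Because genes and peaks share the same interval fields, one `nearest` function covers both directions of the analysis.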

Academic Event Ontology

The academic event ontology, currently still in development and thus unstable, is an OBO compliant reference ontology for describing academic events such as conferences, workshops or seminars and their series. It is being developed as part of the [ConfIDent project](https://projects.tib.eu/confident/) to allow RDF representations of the academic events and series stored and curated in the [ConfIDent platform](https://www.confident-conference.org/index.php/main_page).

Idle · 14 · 1 year ago

Makefile

CC-BY-4.0

phantasusLite

GeneExpression

PhantasusLite is a lightweight package with helper functions of general interest extracted from the phantasus package. In particular, it simplifies working with public RNA-seq datasets from GEO by providing access to the remote HSDS repository with the precomputed gene counts from the ARCHS4 and DEE2 projects.

Idle · 11 · 1 year ago

multiMiR

miRNAData

A collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).

Idle · 25 · 1 year ago

bcbio-nextgen

Pipelines

Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.

Idle · 1K · 1 year ago

HERON

Microarray

HERON is a software package for analyzing peptide binding array data. In addition to identifying significant binding probes, HERON also provides functions for finding epitopes (string of consecutive peptides within a protein). HERON also calculates significance on the probe, epitope, and protein level by employing meta p-value methods. HERON is designed for obtaining calls on the sample level and calculates fractions of hits for different conditions.

Idle · 1 · 1 year ago

GPL-3.0+

ProteinMPNN

Protein & Drug Discovery

Deep learning-based protein sequence design (inverse folding) from backbone structures, achieving 52.4% sequence recovery vs 32.9% for Rosetta, core tool in modern protein design pipelines (Baker Lab, Science 2022)

Idle · 1.7K · 1 year ago

Jupyter Notebook

SciPipe

Workflow Managers

Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output

Idle · 1.1K · 1 year ago

LRMI Alignment Type Vocabulary

A concept scheme that defines the types of relationships between a learning resource and a node in an educational framework.

Idle · 26 · 1 year ago

HTML

zFPKM

ImmunoOncology

Performs the zFPKM transform on RNA-seq FPKM data. The algorithm is based on the publication by Hart et al., 2013 (PubMed ID 24215113). The reference recommends using zFPKM > -3 to select expressed genes. Validated with ENCODE open/closed chromosome data. Works well for gene-level data using FPKM or TPM. Does not appear to calibrate well for transcript-level data.

Idle · 9 · 1 year ago

Chinese Medical Dataset

Biology & Medicine

Comprehensive collection of Chinese medical datasets for AI research

Idle · 280 · 1 year ago

PepSetTest

DifferentialExpression

Peptide Set Test (PepSetTest) is a peptide-centric strategy to infer differentially expressed proteins in LC-MS/MS proteomics data. This test detects coordinated changes in the expression of peptides originating from the same protein and compares these changes against the rest of the peptidome. Compared to traditional aggregation-based approaches, the peptide set test demonstrates improved statistical power while still controlling the Type I error rate correctly in most cases. This test can be valuable for discovering novel biomarkers and prioritizing drug targets, especially when the direct application of statistical analysis to protein data fails to provide substantial insights.

Idle · 2 · 1 year ago

GPL-3.0+

ChIP-seq analysis notes from Tommy Tang

ChIP-Seq

Resources on ChIP-seq data which include papers, methods, links to software, and analysis.

Idle · 850 · 1 year ago

smof

Sequence Processing

UNIX-style FASTA manipulation tools.

Idle · 17 · 1 year ago

QDNAseq

CopyNumberVariation

Quantitative DNA sequencing for chromosomal aberrations. The genome is divided into non-overlapping fixed-sized bins; the number of sequence reads in each bin is counted, adjusted with a simultaneous two-dimensional loess correction for sequence mappability and GC content, and filtered to remove spurious regions in the genome. Downstream steps of segmentation and calling are also implemented via the packages DNAcopy and CGHcall, respectively.

Idle · 55 · 1 year ago

GPL
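The first step in the QDNAseq description, counting reads in non-overlapping fixed-size bins, can be sketched as follows. This is a minimal Python illustration under stated assumptions: read positions are 0-based start coordinates on a single chromosome, the function name is hypothetical, and the loess correction and filtering steps are omitted.

```python
def count_reads_per_bin(read_starts, chrom_length, bin_size):
    """Count reads in non-overlapping fixed-size bins along one chromosome.

    read_starts:  0-based start positions of aligned reads (assumed input)
    chrom_length: chromosome length in bases
    bin_size:     bin width in bases; the last bin may be shorter
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for position in read_starts:
        counts[position // bin_size] += 1
    return counts

# Toy chromosome of 100 bases split into four 25-base bins.
counts = count_reads_per_bin([0, 3, 24, 25, 60, 99],
                             chrom_length=100, bin_size=25)
print(counts)
```

The raw counts produced this way are what a subsequent correction for mappability and GC content would operate on.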

Semantic Web for Earth and Environment Technology Ontology

The Semantic Web for Earth and Environmental Terminology is a mature foundational ontology that contains over 6000 concepts organized in 200 ontologies represented in OWL. Top-level concepts include Representation (math, space, science, time, data), Realm (Ocean, Land Surface, Terrestrial Hydrosphere, Atmosphere, etc.), Phenomena (macro-scale ecological and physical), Processes (micro-scale physical, biological, chemical, and mathematical), and Human Activities (Decision, Commerce, Jurisdiction, Environmental, Research).

Idle · 140 · 1 year ago

Turtle

NOASSERTION

BioGPT

Domain-Specific Models

Biomedical text generation

Idle · 4.5K · 1 year ago

ccImpute

SingleCell

Dropout events make lowly expressed genes indistinguishable from true zero expression and different from the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. ccImpute is an imputation algorithm that uses cell similarity established by consensus clustering to impute the most probable dropout events in scRNA-seq datasets. ccImpute demonstrated performance exceeding that of existing imputation approaches while introducing the least amount of new noise, as measured by clustering performance characteristics on datasets with known cell identities.

Idle · 2 · 1 year ago

Biocompute Object

Chart-to-Code & Reproducibility

BioCompute is shorthand for the IEEE 2791-2020 standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to facilitate communication. This pipeline documentation approach has been adopted by a few FDA centers. The goal is to ease the communication burdens between research centers, organizations, and industries. This web portal allows users to build BioCompute Objects through the interface in a human- and machine-readable format.

Idle · 17 · 1 year ago

HTML

Chart-to-Text Datasets

Large-scale chart summarization datasets for training chart description capabilities

Idle · 127 · 1 year ago

OpenEdge ABL

dcanr

NetworkInference

This package implements methods and an evaluation framework to infer differential co-expression/association networks. Various methods are implemented and can be evaluated using simulated datasets. Inference of differential co-expression networks can allow identification of networks that are altered between two conditions (e.g., health and disease).

Idle · 7 · 1 year ago

hdxmsqc

QualityControl

The hdxmsqc package enables users to analyse and visualise the quality of HDX-MS experiments, either as a final quality check before downstream analysis and publication or as part of an iterative procedure to determine the quality of the data. The package builds on the QFeatures and Spectra packages to integrate with other mass-spectrometry data.

Idle · 1 · 1 year ago

Other

Genie 2

Protein & Drug Discovery

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle · 192 · 1 year ago

Apache-2.0

gypsum

DataImport

Client for the gypsum REST API (https://gypsum.artifactdb.com), a cloud-based file store in the ArtifactDB ecosystem. This package provides functions for uploads, downloads, and various administrative and management tasks. Check out the documentation at https://github.com/ArtifactDB/gypsum-worker for more details.

Idle · 1 · 1 year ago

Structural variant callers

smoove

Structural variant calling and genotyping with existing tools, but, smoothly.

Idle · 264 · 1 year ago

Apache-2.0

orth

Idle · 3 · 1 year ago

padma