Find open-source science resources

This package is for designing CRISPR/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (3) count off-target (mis)matches, and (4) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large CRISPR/Cas9 libraries.

Idle · 0 · 1 year ago

GPL-2.0

stJoincount

Transcriptomics

stJoincount facilitates the application of join count analysis to spatial transcriptomic data generated from the 10x Genomics Visium platform. This tool first converts a labeled spatial tissue map into a raster object, in which each spatial feature is represented by a pixel coded by label assignment. This process includes automatic calculation of optimal raster resolution and extent for the sample. A neighbors list is then created from the rasterized sample, in which adjacent and diagonal neighbors for each pixel are identified. After adding binary spatial weights to the neighbors list, a multi-categorical join count analysis is performed to tabulate "joins" between all possible combinations of label pairs. The function returns the observed join counts, the expected count under conditions of spatial randomness, and the variance calculated under non-free sampling. The z-score is then calculated as the difference between observed and expected counts, divided by the square root of the variance.
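The z-score step described above can be sketched in a few lines. This is only an illustration of the stated formula, not stJoincount's own code (the package itself is written in R); the function name and the sample numbers are hypothetical.

```python
import math


def join_count_z(observed: float, expected: float, variance: float) -> float:
    """Z-score for one label pair: (observed - expected) / sqrt(variance).

    All three inputs are assumed to come from a join count analysis
    that has already been run; this helper only applies the final formula.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (observed - expected) / math.sqrt(variance)


# Hypothetical numbers: 30 observed joins, 20 expected, variance 25.
z = join_count_z(30.0, 20.0, 25.0)
print(z)
```

A large positive value suggests more joins between the two labels than spatial randomness would predict; a large negative value suggests fewer.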

Idle · 5 · 1 year ago

findIPs

GeneExpression

Feature rankings can be distorted by a single case in the context of high-dimensional data. Cases that exert abnormal influence on feature rankings are called influential points (IPs). The package aims to detect IPs based on case deletion and quantifies their effects by measuring the rank changes (DOI:10.48550/arXiv.2303.10516). The package applies a novel rank-comparison measure using adaptive weights that stress the top-ranked important features and adjust to ranking properties.

Idle · 0 · 1 year ago

osmapi/Nidum-Gemma-2B-Uncensored-GGUF

by osmapi

text-generation

Welcome to the repository for Nidum-Limitless-Gemma-2B-GGUF, an advanced language model that provides unrestricted and versatile responses across a wide range of topics. This version is designed for maximum flexibility, allowing you to run it on both CPU and GPU.

Idle · 2.4K · 1 year ago

DMRScan

This package detects significant differentially methylated regions (for both qualitative and quantitative traits), using a scan statistic with underlying Poisson heuristics. The scan statistic depends on a sequence of window sizes (number of CpGs within each window) and on a threshold for each window size. This threshold can be calculated in three different ways: i) analytically, using the Siegmund et al. (2012) solution (preferred), ii) by importance sampling, as suggested by Zhang (2008), and iii) by full MCMC modeling of the data, choosing between a number of different options for modeling the dependency between CpGs.

Idle · 2 · 1 year ago

The data cube vocabulary

This vocabulary allows multi-dimensional data, such as statistics, to be published in RDF. It is based on the core information model from SDMX (and thus also DDI).

Idle · 13 · 1 year ago

HTML

The Artificial Intelligence Ontology

This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.

Idle · 49 · 1 year ago

Jupyter Notebook

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-HIV-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 13 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-FREESOLV-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 27 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-QM7-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 16 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BBBP-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 95 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-ESOL-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 13 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-CLINTOX-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 16 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOXCAST-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 8 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-LIPOPHILICITY-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image,…

Idle · 3.3K · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-TOX21-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 12 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-SIDER-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 13 · 1 year ago

ibm-research/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101

by ibm-research

biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Idle · 12 · 1 year ago

PULSE-ECG/PULSE-7B

by PULSE-ECG

image-text-to-text

Dataset for paper "Teach Multimodal LLMs to Comprehend Electrocardiographic Images".

Idle · 1.1K · 1 year ago

ChemBERTa

Protein & Drug Discovery

Chemical language model

Idle · 496 · 1 year ago

Jupyter Notebook

crisprBowtie

CRISPR

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bowtie. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Both DNA- and RNA-targeting nucleases are supported.

Idle · 3 · 1 year ago

oncoscanR

CopyNumberVariation

The software reads copy number segments from a text file, identifies all chromosome arms that are globally altered, and computes various genome-wide scores. The following HRD scores (characteristic of BRCA-mutated cancers) are included: LST, HR-LOH, nLST and gLOH. The package is tailored for the ThermoFisher Oncoscan assay analyzed with their Chromosome Alteration Suite (ChAS) but can be adapted to any input.

Idle · 3 · 1 year ago

NOASSERTION

(Poly)merase

Package suites

A Go library and command line utility for engineering organisms.

Idle · 729 · 1 year ago

iSEEhex

This package provides panels summarising data points in hexagonal bins for `iSEE`. It is part of `iSEEu`, the iSEE universe of panels that extend the `iSEE` package.

Idle · 0 · 1 year ago

Artistic-2.0

EBImage

Visualization

EBImage provides general purpose functionality for image processing and analysis. In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.

Idle · 77 · 1 year ago

LGPL

Ontology for Biomarkers of Clinical Interest

The Ontology for Biomarkers of Clinical Interest (OBCI) formally defines biomarkers for diseases, phenotypes, and effects.

Idle · 1 · 1 year ago

CC-BY-4.0

coMethDMR

DNAMethylation

coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that first selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.

Idle · 7 · 1 year ago

Physics-Informed Neural Networks

SciANN

Keras-based scientific neural networks

Idle · 1 · 1 year ago

Sharkipedia Species

Sharkipedia is an open source research initiative to make all published biological traits and population trends on sharks, rays, and chimaeras accessible to everyone.

Idle · 4 · 1 year ago

Ruby

mradermacher/Palmyra-Med-70B-GGUF

by mradermacher

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Idle · 381 · 1 year ago

Rcpi

A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.

Idle · 39 · 1 year ago

Artistic-2.0

spatialSimGP

Spatial

This package simulates spatial transcriptomics data with the mean-variance relationship using a Gaussian Process model per gene.

Idle · 0 · 1 year ago

chihaya

DataImport

Saves the delayed operations of a DelayedArray to an HDF5 file. This enables efficient recovery of the DelayedArray's contents in other languages and analysis frameworks.

Idle · 0 · 1 year ago

zitools

zitools supports the analysis of zero-inflated count data, either by down-weighting excess zeros or by replacing an appropriate proportion of excess zeros with NA. By overloading frequently used statistical functions (such as mean, median, standard deviation), plotting functions (such as boxplots or heatmaps) and differential abundance tests, it allows a wide range of downstream analyses for zero-inflated data in a less biased manner. This is applicable in the context of microbiome analyses, where the data is often overdispersed and zero-inflated, making data analysis extremely challenging.

Idle · 0 · 1 year ago

BSD-3-Clause

SeqVarTools

SNP

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Idle · 3 · 1 year ago

TnT

Infrastructure

Chart-to-Code & Reproducibility

An R interface to the TnT JavaScript library (https://github.com/tntvis) to provide interactive and flexible visualization of track-based genomic data.

Idle · 15 · 1 year ago

AGPL-3.0

ChartAssistant / ChartAst (ACL 2024)

Universal chart comprehension and reasoning model

Idle · 135 · 1 year ago

NOASSERTION

RnaChipIntegrator

Computational biology

Utility that performs integrated analyses of 'gene' data (a set of genes or other genomic features) with 'peak' data (a set of regions, for example ChIP peaks) to identify the genes nearest to each peak, and vice versa.

Idle · 5 · 1 year ago

Artistic-2.0

Academic Event Ontology

The academic event ontology, currently still in development and thus unstable, is an OBO compliant reference ontology for describing academic events such as conferences, workshops or seminars and their series. It is being developed as part of the [ConfIDent project](https://projects.tib.eu/confident/) to allow RDF representations of the academic events and series stored and curated in the [ConfIDent platform](https://www.confident-conference.org/index.php/main_page).

Idle · 14 · 1 year ago

Makefile

CC-BY-4.0

phantasusLite

GeneExpression

PhantasusLite is a lightweight package with helper functions of general interest extracted from the phantasus package. In particular, it simplifies working with public RNA-seq datasets from GEO by providing access to the remote HSDS repository with the precomputed gene counts from the ARCHS4 and DEE2 projects.

Idle · 11 · 1 year ago

multiMiR

miRNAData

A collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).

Idle · 25 · 1 year ago

bcbio-nextgen

Pipelines

Batteries-included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.

Idle · 1K · 1 year ago

HERON

Microarray

HERON is a software package for analyzing peptide binding array data. In addition to identifying significant binding probes, HERON also provides functions for finding epitopes (string of consecutive peptides within a protein). HERON also calculates significance on the probe, epitope, and protein level by employing meta p-value methods. HERON is designed for obtaining calls on the sample level and calculates fractions of hits for different conditions.

Idle · 1 · 1 year ago

GPL-3.0+

ProteinMPNN

Protein & Drug Discovery

Deep learning-based protein sequence design (inverse folding) from backbone structures, achieving 52.4% sequence recovery vs 32.9% for Rosetta, core tool in modern protein design pipelines (Baker Lab, Science 2022)

Idle · 1.7K · 1 year ago

Jupyter Notebook

SciPipe

Workflow Managers

Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output

Idle · 1.1K · 1 year ago

gbyuvd/synthaccess-chemselfies

by gbyuvd

text-classification

ChemFIE-SA is a BERT-like sequence classifier for predicting synthesis accessibility given a SELFIES string of a compound, fine-tuned from gbyuvd/chemselfies-base-bertmlm on DeepSA's expanded dataset from Wang et al. 2023.

Idle · 7 · 1 year ago

gbyuvd/drugtargetpred-chemselfies

by gbyuvd

text-classification

This model is a BERT-like sequence classifier for 221 human protein drug targets, fine-tuned from gbyuvd/chemselfies-base-bertmlm on a dataset derived from ChEMBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded…

Idle · 8 · 1 year ago

LRMI Alignment Type Vocabulary

A concept scheme that defines the types of relationships between a learning resource and a node in an educational framework.

Idle · 26 · 1 year ago

HTML

zFPKM

ImmunoOncology

Performs the zFPKM transform on RNA-seq FPKM data. The algorithm is based on the publication by Hart et al., 2013 (PubMed ID 24215113). The reference recommends using zFPKM > -3 to select expressed genes. Validated with ENCODE open/closed chromosome data. Works well for gene-level data using FPKM or TPM. Does not appear to calibrate well for transcript-level data.
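The recommended cutoff can be applied as a simple filter once zFPKM values are available. The sketch below assumes the values were already computed elsewhere (for example by the zFPKM package, which is written in R); the function name and the gene values are hypothetical, and the transform itself is not implemented here.

```python
def select_expressed(zfpkm_by_gene: dict[str, float], threshold: float = -3.0) -> list[str]:
    """Return the genes whose zFPKM value is strictly above the threshold.

    The default of -3 follows the cutoff mentioned in the description;
    the input values are assumed to be precomputed zFPKM scores.
    """
    return [gene for gene, z in zfpkm_by_gene.items() if z > threshold]


# Hypothetical zFPKM values for three genes.
example = {"geneA": -1.2, "geneB": -3.5, "geneC": 0.4}
print(select_expressed(example))
```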

Idle · 9 · 1 year ago

Chinese Medical Dataset

Biology & Medicine