Find open-source science resources

The AOPO provides classes and relationships for the semantic representation of the Adverse Outcome Pathway framework.

Stale · 13 · 2 years ago

Rich Text Format

RNAmodR.ML

Software

RNAmodR.ML extends the functionality of the RNAmodR package and classical detection strategies towards detection through machine learning models. RNAmodR.ML provides classes, functions and an example workflow to establish a detection strategy, which can be packaged.

Stale · 1 · 2 years ago

Minimum Information about a Biosynthetic Gene Cluster data repository

MIBiG (Minimum Information about a Biosynthetic Gene Cluster) is a data repository and associated data standard designed to describe biosynthetic gene clusters involved in the production of specialized metabolites. It also stores data on measured biological activities and links to other resources such as NCBI, NPAtlas, and ChEBI. MIBiG is used as a reference database, knowledgebase, and training dataset for machine learning.

Stale · 10 · 2 years ago

Genomics & Bioinformatics

GenePT

Generative pre-training for genomics

Stale · 320 · 2 years ago

seqmagick

Sequence Processing

Exposes the file format conversion in Biopython in a convenient way.

Stale · 118 · 2 years ago

MIMIC III Database

MIMIC-III is a dataset comprising health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

Stale · 146 · 2 years ago

PLpgSQL

gscreend

Software

Package for the analysis of pooled genetic screens (e.g. CRISPR-KO). The analysis of such screens is based on the comparison of gRNA abundances before and after a cell proliferation phase. The gscreend package takes gRNA counts as input and allows detection of genes whose knockout decreases or increases cell proliferation.

Stale · 12 · 2 years ago

Genomics & Bioinformatics

AlphaMissense

Google DeepMind's AlphaFold-derived classifier for proteome-wide missense variant effect prediction, providing pathogenicity scores for all ~71M possible human missense variants and classifying 89% with 90% precision; pre-computed predictions are integrated into Ensembl VEP and UCSC Genome Browser to support clinical variant interpretation (Science 2023)

Archived · 633 · 2 years ago

DEWSeq

Sequencing

DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions in eCLIP or iCLIP next-generation sequencing data.

Stale · 5 · 2 years ago

LGPL-3.0+

dinoR

NucleosomePositioning

dinoR tests for significant differences in NOMe-seq footprints between two conditions, using genomic regions of interest (ROI) centered around a landmark, for example a transcription factor (TF) motif. This package takes NOMe-seq data (GCH methylation/protection) in the form of a Ranged Summarized Experiment as input. dinoR can be used to group sequencing fragments into 3 or 5 categories representing characteristic footprints (TF bound, nucleosome bound, open chromatin), and to plot the percentage of fragments in each category in a heatmap or averaged across different ROI groups, for example groups containing a common TF motif. It is designed to compare footprints between two sample groups, using edgeR's quasi-likelihood methods on the total fragment counts per ROI, sample, and footprint category.

Stale · 0 · 2 years ago

Ontology of Core Data Mining Entities

OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. (from abstract)

Stale · 1 · 2 years ago

M3Drop

RNASeq

This package fits a model to the pattern of dropouts in single-cell RNASeq data. This model is used as a null to identify significantly variable (i.e. differentially expressed) genes for use in downstream analysis, such as clustering cells. Also includes a method for calculating exact Pearson residuals in UMI-tagged data using a library-size aware negative binomial model.

Stale · 33 · 2 years ago

GPL-2.0+

tpSVG

Spatial

The goal of `tpSVG` is to detect and visualize spatial variation in gene expression for spatially resolved transcriptomics (SRT) data analysis. Specifically, `tpSVG` introduces a family of count-based models with generalizable parametric assumptions, such as a Poisson or negative binomial distribution. In addition, compared to currently available count-based models for spatially resolved data analysis, the `tpSVG` models improve computational time, and hence greatly improve the applicability of count-based models in SRT data analysis.

Stale · 2 · 2 years ago

FindIT2

Software

This package implements functions to find influential TFs and targets based on different input types. It has five modules: multi-peak multi-gene annotation (mmPeakAnno module), calculation of regulation potential (calcRP module), finding influential targets based on ChIP-Seq and RNA-Seq data (Find influential Target module), finding influential TFs based on different inputs (Find influential TF module), and calculation of peak-gene or peak-peak correlation (peakGeneCor module). There are also other useful functions, such as integrating information from different sources and calculating Jaccard similarity for your TF.

Stale · 6 · 2 years ago

alphapickle

AlphaPickle is a Python tool that converts AlphaFold and ColabFold output files into user-friendly CSV files and plots, enabling easy analysis and visualization of protein prediction data without requiring programming expertise. It processes .pkl, .json, and PDB files to extract and visualize metrics like pLDDT and PAE.

Stale · 33 · 2 years ago

BioMistral/BioMistral-7B

by BioMistral

Stale · 102.4K · 2 years ago

BioMistral/BioMistral-7B-GGUF

by BioMistral

Stale · 875 · 2 years ago

knowledgator/SMILES2IUPAC-canonical-base

by knowledgator

SMILES2IUPAC-canonical-base was designed to accurately translate SMILES chemical names to IUPAC standards.

Stale · 5.1K · 2 years ago

Variant Prediction/Annotation

SIFT

Predicts whether an amino acid substitution affects protein function.

Stale · 548 · 2 years ago

BASiCStan

ImmunoOncology

Provides an interface to infer the parameters of BASiCS using the variational inference (ADVI), Markov chain Monte Carlo (NUTS), and maximum a posteriori (BFGS) inference engines in the Stan programming language. BASiCS is a Bayesian hierarchical model that uses an adaptive Metropolis within Gibbs sampling scheme. Alternative inference methods provided by Stan may be preferable in some situations, for example for particularly large data or posterior distributions with difficult geometries.

Stale · 0 · 2 years ago

Generative Molecular Design

GuacaMol

A package for benchmarking of models for _de novo_ molecular design.

Stale · 521 · 2 years ago

cellbaseR

Annotation

This R package makes use of the exhaustive RESTful Web service API that has been implemented for the CellBase database. It enables researchers to query and obtain a wealth of biological information from a single database, saving a lot of time. Another benefit is that researchers can easily make queries about different biological topics and link the results together, because all of the information is integrated.

Stale · 2 · 2 years ago

ESMFold

Protein & Drug Discovery

Protein structure prediction from ESM models

Archived · 4.1K · 2 years ago

Educational Sectors Vocabulary

This is a vocabulary for educational sectors as used in the OER World Map (https://oerworldmap.org)

Stale · 4 · 2 years ago

debrowser

Sequencing

Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. It allows users to visualize expression data in greater depth, interactively and quickly. By changing the parameters, users can easily explore parts of the data in ways that have not been possible before. Manually creating and inspecting these plots takes time. With DEBrowser, users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site, and the results are shown in various plots such as scatter, bar, box, volcano and MA plots, and heatmaps.

Stale · 61 · 2 years ago

Autonomous Research Systems (2023-2025 Breakthroughs)

FunSearch (DeepMind, Nature 2023)

LLM-based system reported as the first to make novel, verifiable discoveries on open problems in mathematics by pairing LLMs with evolutionary search; it found new constructions for the cap set problem in combinatorics and improved heuristics for online bin packing

Stale · 1.1K · 2 years ago

Cellosaurus

Stale · 14 · 2 years ago

CC-BY-4.0

demuxmix

SingleCell

A package for demultiplexing single-cell sequencing experiments of pooled cells labeled with barcode oligonucleotides. The package implements methods to fit regression mixture models for a probabilistic classification of cells, including multiplet detection. Demultiplexing error rates can be estimated, and methods for quality control are provided.

Stale · 5 · 2 years ago

songlab/tokenizer-dna-mlm

by songlab

Stale · 0 · 2 years ago

KIM German School Types

This SKOS vocabulary describes types of primary and secondary schools in Germany, such as Grundschule, Gymnasium, and Realschule. It does not include post-secondary institutions such as universities or Hochschulen.

Stale · 1 · 2 years ago

CC0-1.0

rootstrap-org/Alzheimer-Classifier-Demo

by rootstrap-org

image-classification

A machine learning model for waste classification

Stale · 0 · 2 years ago

rScudo

GeneExpression

SCUDO (Signature-based Clustering for Diagnostic Purposes) is a rank-based method for the analysis of gene expression profiles for diagnostic and classification purposes. It is based on the identification of sample-specific gene signatures composed of the most up- and down-regulated genes for that sample. Starting from gene expression data, functions in this package identify sample-specific gene signatures and use them to build a graph of samples. In this graph samples are joined by edges if they have a similar expression profile, according to a pre-computed similarity matrix. The similarity between the expression profiles of two samples is computed using a method similar to GSEA. The graph of samples can then be used to perform community clustering or to perform supervised classification of samples in a testing set.

Stale · 4 · 2 years ago

flowGraph

FlowCytometry

Identifies maximal differential cell populations in flow cytometry data taking into account dependencies between cell populations; flowGraph calculates and plots SpecEnr abundance scores given cell population cell counts.

Stale · 1 · 2 years ago

tib.mdo

Stale · 32 · 2 years ago

CSS

awst

Normalization

We propose an Asymmetric Within-Sample Transformation (AWST) to regularize RNA-seq read counts and reduce the effect of noise on the classification of samples. AWST comprises two main steps: standardization and smoothing. These steps transform gene expression data to reduce the noise of the lowly expressed features, which suffer from background effects and low signal-to-noise ratio, and the influence of the highly expressed features, which may be the result of amplification bias and other experimental artifacts.

Stale · 3 · 2 years ago

Pangu-Weather

Climate Modeling

Huawei's 3D high-resolution global weather forecast model at 0.25° resolution, first AI method to comprehensively outperform traditional NWP across all variables and lead times, integrated into ECMWF operational forecasts (Nature 2023)

Stale · 1.4K · 2 years ago

BiocFHIR

Infrastructure

FHIR R4 bundles in JSON format are derived from https://synthea.mitre.org/downloads. Transformation inspired by a kaggle notebook published by Dr Alexander Scarlat, https://www.kaggle.com/code/drscarlat/fhir-starter-parse-healthcare-bundles-into-tables. This is a very limited illustration of some basic parsing and reorganization processes. Additional tooling will be required to move beyond the Synthea data illustrations.

Stale · 4 · 2 years ago

targetdiff

Protein & Drug Discovery

3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR 2023)

Stale · 341 · 2 years ago

Psi4NumPy

Simulations

Psi4-based reference implementations and Jupyter notebook-based tutorials for foundational quantum chemistry methods.

Stale · 394 · 2 years ago

BSD-3-Clause

TachyHealth/Thealth_Mixtral-8x7B

by TachyHealth

Stale · 0 · 2 years ago

magpie

Epitranscriptomics

This package aims to perform power analysis for the MeRIP-seq study. It calculates FDR, FDC, power, and precision under various study design parameters, including but not limited to sample size, sequencing depth, and testing method. It can also output results into .xlsx files or produce corresponding figures of choice.

Stale · 0 · 2 years ago

idsa

Archived · 76 · 2 years ago

Shell

Genomics & Bioinformatics

scBERT

Single-cell BERT for gene expression

Stale · 357 · 2 years ago

scPipe

ImmunoOncology

A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.

Stale · 67 · 2 years ago

GPL-2.0+

WeatherBench

Climate Modeling

Weather prediction benchmark

Stale · 828 · 2 years ago

CARNIVAL

Transcriptomics

An upgraded version, in R, of the causal reasoning tool from Melas et al., with updated assignments of TFs' weights from PROGENy scores. Optimization parameters can be freely adjusted, and multiple solutions can be obtained and aggregated.

Stale · 62 · 2 years ago

VplotR

NucleosomePositioning

The pattern of digestion and protection from DNA nucleases such as DNase I, micrococcal nuclease, and Tn5 transposase can be used to infer the location of associated proteins. This package contains useful functions to analyze patterns of paired-end sequencing fragment density. VplotR facilitates the generation of V-plots and footprint profiles over single or aggregated genomic loci of interest.

Stale · 11 · 2 years ago

GPL-3.0+

OpenChem

Machine Learning

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.

Stale · 745 · 2 years ago

chimeraviz

Infrastructure

chimeraviz manages data from fusion gene finders and provides useful visualization tools.

Stale · 39 · 2 years ago

AmelieSchreiber/esm_interact

by AmelieSchreiber

fill-mask

This model was finetuned on concatenated pairs of interacting proteins in much the same way as PepMLM. It is meant to generate interaction partners for proteins using the masked language modeling capabilities of ESM-2. The model is not well tested, so use with caution.

Stale · 4 · 2 years ago