Find open-source science resources

A package to 'build' collections of materials properties from the output of computational materials calculations.

fromage

The "FRamewOrk for Molecular AGgregate Excitations" enables localised QM/QM' excited state calculations in a solid state environment.

HTMD

High-Throughput Molecular Dynamics: Programming Environment for Molecular Discovery.

OPEM

Open source PEM (Proton Exchange Membrane) fuel cell simulation tool.

PLAMS

Python Library for Automating Molecular Simulation: input preparation, job execution, file management, output processing and building data workflows.

ReNView

A program to visualize reaction networks.

yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.

acpype

Force Fields

Convert AMBER forcefields from ANTECHAMBER to GROMACS format.

global-chem

Force Fields

A Chemical Knowledge Graph and Toolkit, written in IUPAC/SMILES/SMARTS, for common small molecules from diverse communities to aid users in selecting compounds for forcefield parameterization.

matbench-discovery

Force Fields

A benchmark for ML-guided high-throughput materials discovery.

nglview

Molecular Visualization

A [Jupyter](https://jupyter.org/) widget to interactively view molecular structures and trajectories.

ChemInformant

Database Wrappers

High-throughput PubChem client for batch queries with caching, validation, rate-limit-aware retries, and a simple CLI.

SciCompforChemists

Learning Resources

Scientific Computing for Chemists with Python is a Jupyter book that teaches basic Python skills for chemistry, including relevant libraries, and applies them to solving chemical problems.

Dask

Parallel computing

Parallel computing with task scheduling.

khmer

Bioinformatics

k-mer counting, filtering, and graph traversal.

NiBabel

Neuroimaging

Neuro-imaging file formats.

Expyriment

Neuroimaging

Behavioral and neuroimaging experiments.

Klusta

Neuroscience

Spike detection and clustering-based spike sorting.

Useful libraries for data science in Python

Lists of libraries

by Sebastian Raschka.

Python for Scientific Audio

Lists of libraries

by Fabian-Robert Stöter.

Lectures on scientific computing with Python

Tutorials

Robert Johansson.

A gallery of interesting Jupyter Notebooks

Tutorials

List of Python Data Science Tutorials

Tutorials

Ujjwal Karn.

a4

Umbrella package for the entire Automated Affymetrix Array Analysis suite of packages.

a4Base

Base utility functions are available for the Automated Affymetrix Array Analysis set of packages.

a4Classif

Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.

a4Core

Utility functions for the Automated Affymetrix Array Analysis set of packages.

a4Preproc

Utility functions to pre-process data for the Automated Affymetrix Array Analysis set of packages.

a4Reporting

Utility functions to facilitate reporting for the Automated Affymetrix Array Analysis set of packages.

ABarray

Automated pipeline to perform gene expression analysis for the Applied Biosystems Genome Survey Microarray (AB1700) data format. Functions include data preprocessing, filtering, control probe analysis, and statistical analysis, all in a single function. A GUI is also provided. The raw data, processed data, graphics output and statistical results are organized into folders according to the analysis settings used.

GPL

ABSSeq

DifferentialExpression

Infers differentially expressed genes from the absolute count difference between two groups, using the negative binomial distribution and moderating fold change according to the heterogeneity of dispersion across expression levels.

GPL-3.0+

acde

DifferentialExpression

This package provides a multivariate inferential analysis method for detecting differentially expressed genes in gene expression data. It uses artificial components, close to the data's principal components but with an exact interpretation in terms of differential genetic expression, to identify differentially expressed genes while controlling the false discovery rate (FDR). The methods in this package are described in the vignette or in the article 'Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments' by J. P. Acosta, L. Lopez-Kleine and S. Restrepo (2015, pending publication).

aCGH

CopyNumberVariation

Functions for reading aCGH data from image analysis output files and clone information files, and for creating aCGH S3 objects to store these data. Basic methods for accessing/replacing, subsetting, printing and plotting aCGH objects.

GPL-2.0

ADaCGH2

Analysis and plotting of array CGH data. Allows usage of Circular Binary Segmentation, wavelet-based smoothing (both as in Liu et al., and HaarSeg as in Ben-Yaacov and Eldar), HMM, GLAD, CGHseg. Most computations are parallelized (either via forking or with clusters, including MPI and sockets clusters) and use ff for storing data.

GPL-3.0+

ADAM

GeneSetEnrichment

ADAM is a GSEA R package created to group a set of genes from comparative samples (control versus experiment) belonging to different species according to their respective functions (Gene Ontology and KEGG pathways as default) and show their significance by calculating p-values referring to gene diversity and activity. Each group of genes is called GFAG (Group of Functionally Associated Genes).

ADAMgui

GeneSetEnrichment

ADAMgui is a Graphical User Interface for the ADAM package. The ADAMgui package provides two shiny-based applications that allow the user to study the output files of the ADAM package through different plots. It is possible, for example, to choose a specific GFAG and observe the gene expression behavior with the plots created with the GFAGtargetUi function. Features such as differential expression and fold change can be easily seen with the aid of the plots made with the GFAGpathUi function.

ADAPT

DifferentialExpression

ADAPT carries out differential abundance analysis for microbiome metagenomics data in phyloseq format. It has two innovations. One is to treat zero counts as left censored and use Tobit models for log count ratios. The other is an innovative way to find non-differentially abundant taxa as reference, then use the reference taxa to find the differentially abundant ones.

MIT

adductomicsR

MassSpectrometry

Processes MS2 data to identify potentially adducted peptides from spectra that have been corrected for mass drift and retention time drift, and quantifies MS1-level mass spectral peaks.

Artistic-2.0

ADImpute

GeneExpression

Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Here we propose two novel methods: a gene regulatory network-based approach using gene-gene relationships learnt from external data and a baseline approach corresponding to a sample-wide average. ADImpute can implement these novel methods and also combine them with existing imputation methods (currently supported: DrImpute, SAVER). ADImpute can learn the best performing method per gene and combine the results from different methods into an ensemble.

adSplit

This package implements clustering of microarray gene expression profiles according to functional annotations. For each term genes are annotated to, splits into two subclasses are computed and a significance of the supporting gene set is determined.

adverSCarial

Software

adverSCarial is an R package for generating adversarial attacks on scRNA-seq classifiers and analyzing the classifiers' vulnerability to them. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks: the single-gene attack and the max-change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The CGD attack is based on an estimated gradient descent. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.

MIT

Aerith

Proteomics

Visualisation of peptide isotopic peaks and SIP peptide-spectrum matches (PSMs). Filtration of high-quality PSMs. Accurate isotopic abundance calculation for peptides and metabolites. Visualisation of SIP proteomics results.

AffiXcan

GeneExpression

Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known.

affxparser

Infrastructure

Package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory-efficient parsing of Affymetrix files using the Affymetrix Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.

LGPL-2.0+

affy

The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets concerns only a few convenience functions; 'affy' is fully functional without it.

LGPL-2.0+

affycomp

OneChannel

The package contains functions that can be used to compare expression measures for Affymetrix Oligonucleotide Arrays.

affyContam

Infrastructure

Structured corruption of CEL file data to demonstrate QA effectiveness.

Artistic-2.0

affycoretools

ReportWriting

Various wrapper functions that have been written to streamline the more common analyses that a core Biostatistician might see.

Artistic-2.0

affyILM

affyILM is a preprocessing tool that estimates gene expression levels for Affymetrix Gene Chips. Input from physical chemistry is used to first background-subtract intensities before calculating concentrations on the basis of the Langmuir model.

affyio