Find open-source science resources

A Python program to compute quasi-harmonic thermochemical data from Gaussian frequency calculations.

molmass

Calculate mass, elemental composition, and mass distribution spectrum of a molecule given by its chemical formula, relative element weights, or sequence.
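As a rough illustration of what computing a mass and elemental composition from a chemical formula involves, here is a minimal standard-library sketch. It does not use molmass or reproduce its API: the `ATOMIC_MASS` table, `composition`, and `average_mass` are hypothetical names introduced for this example, the mass table is truncated and rounded, and only flat formulas without parentheses, charges, or isotopes are handled.

```python
import re
from collections import Counter

# Hypothetical, truncated table of average atomic masses in g/mol (rounded values).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

# One element symbol (capital letter plus optional lowercase) and an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def composition(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C8H10N4O2'."""
    if not formula:
        raise ValueError("empty formula")
    counts = Counter()
    pos = 0
    for match in TOKEN.finditer(formula):
        # Any gap between tokens means the formula has characters we cannot parse.
        if match.start() != pos:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol, number = match.groups()
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"element {symbol!r} is not in the sketch's mass table")
        counts[symbol] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return counts


def average_mass(formula: str) -> float:
    """Sum average atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in composition(formula).items())


if __name__ == "__main__":
    print(composition("C8H10N4O2"))             # atom counts for caffeine
    print(round(average_mass("C8H10N4O2"), 3))  # average molar mass in g/mol
```

A real tool also needs full isotope data, nested groups, and charge handling, which is where a dedicated package is the better choice.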

pymatviz

A toolkit for visualizations in materials informatics.

spectrochempy

A library for processing, analyzing and modeling spectroscopic data.

Matminer

Library of descriptors to aid in the data-mining of materials properties, created by the Lawrence Berkeley National Laboratory.

MAML

Aims to provide useful high-level interfaces that make ML for materials science as easy as possible.

MORFEUS

Library for fast calculations of **mo**lecula**r** **fe**at**u**re**s** from 3D structures for machine learning with a focus on steric descriptors.

ROBERT

Ensemble of automated machine learning protocols that can be run sequentially through a single command line. The program works for regression and classification problems.

selfies

Generative Molecular Design

Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation.

moses

A benchmarking platform for molecular generation models.

alchemlyb

Makes alchemical free energy calculations easier by leveraging the full power and flexibility of the PyData stack.

ccinput

A tool and library for creating quantum chemistry input files.

emmet

A package to 'build' collections of materials properties from the output of computational materials calculations.

fromage

The "FRamewOrk for Molecular AGgregate Excitations" enables localised QM/QM' excited state calculations in a solid state environment.

HTMD

High-Throughput Molecular Dynamics: Programming Environment for Molecular Discovery.

OPEM

Open source PEM (Proton Exchange Membrane) fuel cell simulation tool.

PLAMS

Python Library for Automating Molecular Simulation: input preparation, job execution, file management, output processing and building data workflows.

ReNView

A program to visualize reaction networks.

yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.

acpype

Force Fields

Convert AMBER forcefields from ANTECHAMBER to GROMACS format.

global-chem

Force Fields

A Chemical Knowledge Graph and Toolkit, written in IUPAC/SMILES/SMARTS, for common small molecules from diverse communities to aid users in selecting compounds for forcefield parameterization.

matbench-discovery

Force Fields

A benchmark for ML-guided high-throughput materials discovery.

nglview

Molecular Visualization

A [Jupyter](https://jupyter.org/) widget to interactively view molecular structures and trajectories.

ChemInformant

Database Wrappers

High-throughput PubChem client for batch queries with caching, validation, rate-limit-aware retries, and a simple CLI.

SciCompforChemists

Learning Resources

Scientific Computing for Chemists with Python is a Jupyter book that teaches basic Python skills for chemistry, including relevant libraries, and applies them to solving chemical problems.

Dask

Parallel computing

Parallel computing with task scheduling.

khmer

Bioinformatics

k-mer counting, filtering, and graph traversal.
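To make "k-mer counting and filtering" concrete, the following is a minimal exact-counting sketch using only the standard library. It is not khmer's API, and it is only practical for small inputs; `count_kmers` and `filter_by_abundance` are hypothetical names chosen for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range and therefore no k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def filter_by_abundance(counts: Counter, min_count: int) -> dict:
    """Keep only the k-mers seen at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    counts = count_kmers("ATATAT", 2)
    print(counts)
    print(filter_by_abundance(counts, 3))
```

Exact counting with a dictionary grows with the number of distinct k-mers, so it does not scale to whole sequencing runs; that scaling problem is what dedicated tools address.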

NiBabel

Neuroimaging

Neuro-imaging file formats.

Expyriment

Neuroimaging

Behavioral and neuroimaging experiments.

Klusta

Neuroscience

Spike detection and clustering-based spike sorting.

Useful libraries for data science in Python

Lists of libraries

by Sebastian Raschka.

Lectures on scientific computing with Python

Tutorials

by Robert Johansson.

A gallery of interesting Jupyter Notebooks

Tutorials

List of Python Data Science Tutorials

Tutorials

by Ujjwal Karn.

a4

Umbrella package for the entire Automated Affymetrix Array Analysis suite of packages.

a4Base

Base utility functions are available for the Automated Affymetrix Array Analysis set of packages.

a4Classif

Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.

a4Core

Utility functions for the Automated Affymetrix Array Analysis set of packages.

a4Preproc

Utility functions to pre-process data for the Automated Affymetrix Array Analysis set of packages.

a4Reporting

Utility functions to facilitate reporting for the Automated Affymetrix Array Analysis set of packages.

ABarray

Automated pipeline to perform gene expression analysis for the Applied Biosystems Genome Survey Microarray (AB1700) data format. Functions include data preprocessing, filtering, control probe analysis, and statistical analysis in a single function. A GUI is also provided. The raw data, processed data, graphics output and statistical results are organized into folders according to the analysis settings used.

GPL

ABSSeq

DifferentialExpression

Infers differentially expressed genes from the absolute count difference between two groups, using a negative binomial distribution and moderating fold change according to the heterogeneity of dispersion across expression levels.

GPL-3.0+

acde

DifferentialExpression

This package provides a multivariate inferential analysis method for detecting differentially expressed genes in gene expression data. It uses artificial components, close to the data's principal components but with an exact interpretation in terms of differential genetic expression, to identify differentially expressed genes while controlling the false discovery rate (FDR). The methods in this package are described in the vignette or in the article 'Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments' by J. P. Acosta, L. Lopez-Kleine and S. Restrepo (2015, pending publication).

aCGH

CopyNumberVariation

Functions for reading aCGH data from image analysis output files and clone information files, creation of aCGH S3 objects for storing these data. Basic methods for accessing/replacing, subsetting, printing and plotting aCGH objects.

GPL-2.0

ADAM

GeneSetEnrichment

ADAM is a GSEA R package created to group a set of genes from comparative samples (control versus experiment) belonging to different species according to their respective functions (Gene Ontology and KEGG pathways by default) and show their significance by calculating p-values referring to gene diversity and activity. Each group of genes is called a GFAG (Group of Functionally Associated Genes).

GPL-2.0+

ADAMgui

GeneSetEnrichment

ADAMgui is a Graphical User Interface for the ADAM package. The ADAMgui package provides two shiny-based applications that allow the user to study the output files of the ADAM package through different plots. It is possible, for example, to choose a specific GFAG and observe the gene expression behavior with the plots created by the GFAGtargetUi function. Features such as differential expression and fold change can be easily seen with the aid of the plots made by the GFAGpathUi function.

GPL-2.0+

ADAPT

DifferentialExpression

ADAPT carries out differential abundance analysis for microbiome metagenomics data in phyloseq format. It has two innovations. One is to treat zero counts as left censored and use Tobit models for log count ratios. The other is an innovative way to find non-differentially abundant taxa as reference, then use the reference taxa to find the differentially abundant ones.

MIT

adductomicsR

MassSpectrometry

Processes MS2 data to identify potentially adducted peptides from spectra that have been corrected for mass drift and retention time drift, and quantifies MS1-level mass spectral peaks.

Artistic-2.0

ADImpute

GeneExpression

Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Here we propose two novel methods: a gene regulatory network-based approach using gene-gene relationships learnt from external data and a baseline approach corresponding to a sample-wide average. ADImpute can implement these novel methods and also combine them with existing imputation methods (currently supported: DrImpute, SAVER). ADImpute can learn the best performing method per gene and combine the results from different methods into an ensemble.

adSplit