Find open-source science resources

Resources on ChIP-seq data, including papers, methods, links to software, and analyses.

Idle · 850 · 1 year ago

smof

Sequence Processing

UNIX-style FASTA manipulation tools.

Idle · 17 · 1 year ago

BioGPT

Domain-Specific Models

Biomedical text generation

Idle · 4.5K · 1 year ago

Genie 2

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle · 192 · 1 year ago

Data Analysis & Visualization

AutoViz

Automated data visualization with minimal code

Stale · 1.9K · 2 years ago

Graphormer

General-purpose deep learning backbone for molecular modeling

Stale · 2.5K · 2 years ago

Chroma

Generative model for programmable protein design using diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample diverse all-atom structures; supports conditional generation via composable conditioners for substructure, symmetry, shape, and neural-network predictions; validated crystallographically (Generate Biomedicines, Nature 2023)

Stale · 819 · 2 years ago

Beaker

Web APIs

[RDKit](http://www.rdkit.org/) and [OSRA](https://cactus.nci.nih.gov/osra/) in the [Bottle](http://bottlepy.org/docs/dev/) on [Tornado](http://www.tornadoweb.org/en/stable/).

Archived · 50 · 2 years ago

seqmagick

Sequence Processing

Convenient file format conversion using Biopython.

Stale · 118 · 2 years ago

Genomics & Bioinformatics

AlphaMissense

Google DeepMind's AlphaFold-derived classifier for proteome-wide missense variant effect prediction, providing pathogenicity scores for all ~71M possible human missense variants and classifying 89% with 90% precision; pre-computed predictions are integrated into Ensembl VEP and UCSC Genome Browser to support clinical variant interpretation (Science 2023)

Archived · 633 · 2 years ago

alphapickle

AlphaPickle is a Python tool that converts AlphaFold and ColabFold output files into user-friendly CSV files and plots, enabling easy analysis and visualization of protein prediction data without requiring programming expertise. It processes .pkl, .json, and PDB files to extract and visualize metrics like pLDDT and PAE.

Stale · 33 · 2 years ago

Generative Molecular Design

GuacaMol

A package for benchmarking of models for _de novo_ molecular design.

Stale · 521 · 2 years ago

ESMFold

Protein structure prediction from ESM models

Archived · 4.1K · 2 years ago

Pangu-Weather

Climate Modeling

Huawei's 3D high-resolution global weather forecast model at 0.25° resolution, first AI method to comprehensively outperform traditional NWP across all variables and lead times, integrated into ECMWF operational forecasts (Nature 2023)

Stale · 1.4K · 2 years ago

targetdiff

3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR 2023)

Stale · 341 · 2 years ago

Genomics & Bioinformatics

scBERT

Single-cell BERT for gene expression

Stale · 357 · 2 years ago

OpenChem

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.

Stale · 745 · 2 years ago

DGL-LifeSci

DGL-LifeSci is a [DGL](https://www.dgl.ai/)-based package for various life science applications of graph neural networks.

Stale · 803 · 2 years ago

ClimaX

Climate Modeling

First foundation model for weather and climate by Microsoft, Vision Transformer-based architecture trained on heterogeneous datasets (ICML 2023)

Stale · 698 · 2 years ago

pyVCF

Tools

A VCF Parser for Python.

Stale · 419 · 2 years ago

Computational Pathology & Digital Pathology

PLIP (Nature Medicine 2023)

First vision-and-language foundation model for pathology AI, fine-tuned from CLIP on 249K image-caption pairs, enabling open-ended visual-semantic search and zero-shot diagnosis across histopathology (Pathology Foundation, 376+ stars)

Stale · 376 · 2 years ago

easy_qsub

Command Line Utilities

Easily submit PBS jobs using a script template. Multiple input files are supported.

Stale · 29 · 3 years ago

chainer-chemistry

A deep learning framework (based on Chainer) with applications in Biology and Chemistry.

Stale · 700 · 3 years ago

Generative Molecular Design

GraphINVENT

A platform for graph-based molecular generation using graph neural networks.

Archived · 380 · 3 years ago

atom3d

Enables machine learning on three-dimensional molecular structure.

Stale · 319 · 3 years ago

NuclearPhaser

Genomics

NuclearPhaser is a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs. This is an overview of the phasing pipeline for dikaryons.

Stale · 13 · 3 years ago

MoleOOD

A molecular representation learning framework that is robust to distribution shifts.

Stale · 61 · 3 years ago

oddt

Simulations

Open Drug Discovery Toolkit, a modular and comprehensive toolkit for use in cheminformatics, molecular modeling etc.

Stale · 464 · 3 years ago

BSD-3-Clause

CGRtools

General Purpose

Toolkit for processing molecules, reactions and condensed graphs of reactions. Can be used for chemical standardization, MCS search, tautomers generation with backward compatibility to RDKit and NetworkX.

Stale · 51 · 3 years ago

LGPL-3.0

GGD

Downloading

Go Get Data: a command-line interface for obtaining genomic data.

Stale · 42 · 3 years ago

Cookiecutter Bioinformatics

Bioinformatics

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · 14 · 3 years ago

hgraph2graph

General Chemistry

Hierarchical Generation of Molecular Graphs using Structural Motifs.

Stale · 438 · 3 years ago

Neural Operators & Model Discovery

DeepONet

Learning nonlinear operators

Stale · 819 · 3 years ago

Molecular Transformers

Chemical Synthesis

AI for chemical reaction prediction and synthesis planning

Stale · 424 · 4 years ago

Ruffus

Workflow Managers

Computation pipeline library for Python, widely used in science and bioinformatics.

Stale · 175 · 4 years ago

Genome Browsers / Gene Diagrams

Squiggle

Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.

Archived · 42 · 4 years ago

nanosv

Structural genomics

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · 92 · 6 years ago

AfterQC

Sequence Processing

Automatic filtering, trimming, error removal, and quality control for FASTQ data.

Stale · 214 · 6 years ago

QP-Insights Uploader

Medical imaging

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

DICOM-SEG Annotation

Data quality management

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.
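
The attribute-type check described above can be illustrated with a minimal sketch. It is not the tool's own code: it assumes the usual DICOM reading of Type 1 (must be present with a value), Type 2 (must be present, may be empty), and Type 3 (optional), skips the conditional types 1C and 2C, and models a file as a plain dictionary. The function name `validate_attributes` and the requirement assignments in the example are hypothetical.

```python
def validate_attributes(dataset, requirements):
    """Return a list of (attribute, issue) pairs.

    dataset: dict mapping attribute name -> value ("" or None means empty).
    requirements: dict mapping attribute name -> "1", "2" or "3".
    Conditional types (1C, 2C) are deliberately left out of this sketch.
    """
    issues = []
    for name, attr_type in requirements.items():
        present = name in dataset
        empty = present and dataset[name] in (None, "")
        if attr_type == "1":
            # Type 1: required and must carry a value.
            if not present:
                issues.append((name, "missing"))
            elif empty:
                issues.append((name, "empty value not allowed"))
        elif attr_type == "2":
            # Type 2: required, but an empty value is acceptable.
            if not present:
                issues.append((name, "missing"))
        # Type 3: optional, nothing to check.
    return issues


# Illustrative data only; the types below are not taken from the DICOM SEG IOD.
dataset = {"Modality": "SEG", "PatientID": ""}
requirements = {
    "Modality": "1",
    "SeriesInstanceUID": "1",
    "PatientID": "2",
    "PatientName": "2",
    "ContentDescription": "3",
}
issues = validate_attributes(dataset, requirements)
for name, issue in issues:
    print(f"{name}: {issue}")
```

A real validator would read the requirements from the Excel file and report file paths and DICOM tags alongside each issue, as the description states.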

Data Integration Quality Check Tool (DIQCT)

Data quality management

A tool that checks the quality (validity, completeness) of the clinical metadata, the integrity between the images and the clinical metadata provided as well as their accuracy, the de-identification protocol applied, and the existence of annotations together with their consistency with the images. It informs the user of corrective actions to take before data upload.

Proprietary

Image Duplicates Checker

Data quality management

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
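
A tiered comparison of this kind can be sketched as follows. This is a minimal illustration, not the tool's pipeline: it assumes 8-bit pixel data flattened into a list, leaves out the metadata tier and SSIM, and uses hypothetical function names and thresholds (`cos_min`, `mad_max`).

```python
import hashlib
import math


def pixel_hash(pixels):
    """SHA-256 over the raw pixel values (assumed 0-255), ignoring metadata."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify_pair(a, b, cos_min=0.999, mad_max=2.0):
    """Tier 1: identical hash. Tier 2: similarity metrics. Thresholds are illustrative."""
    if pixel_hash(a) == pixel_hash(b):
        return "exact"
    if (len(a) == len(b)
            and cosine_similarity(a, b) >= cos_min
            and mean_abs_diff(a, b) <= mad_max):
        return "near-duplicate"
    return "distinct"


original = [10, 20, 30, 40, 50, 60]
reexport = [10, 20, 30, 40, 50, 60]   # same pixels, e.g. re-exported series
noisy = [11, 20, 29, 40, 51, 60]      # small intensity differences
other = [60, 50, 40, 30, 20, 10]

for label, candidate in [("reexport", reexport), ("noisy", noisy), ("other", other)]:
    print(label, classify_pair(original, candidate))
```

The function only labels pairs and changes nothing, in line with the description's statement that findings go to human review.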

generate_count_matrix

Transcriptomics

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
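
The merge described above can be sketched in a few lines. This is an illustration rather than the Galaxy tool's code: it assumes tab-separated inputs with the gene name in the first column and the count in the second, and the helper name `combine_counts` is hypothetical.

```python
import csv
import io


def combine_counts(named_inputs):
    """Merge per-sample (gene, count) tables into one matrix.

    named_inputs: dict mapping a sample name to a file-like object with
    tab-separated "gene<TAB>count" lines. The gene column must be identical
    across inputs, otherwise a ValueError is raised.
    """
    columns = {}
    gene_order = None
    for sample, handle in named_inputs.items():
        rows = [(r[0], r[1]) for r in csv.reader(handle, delimiter="\t") if r]
        genes = [gene for gene, _ in rows]
        if gene_order is None:
            gene_order = genes
        elif genes != gene_order:
            raise ValueError(f"gene column of {sample!r} differs from the first input")
        columns[sample] = [count for _, count in rows]
    # One header row, then one row per gene with a column per input.
    matrix = [["gene"] + list(columns)]
    for i, gene in enumerate(gene_order or []):
        matrix.append([gene] + [columns[sample][i] for sample in columns])
    return matrix


sample_a = io.StringIO("geneA\t10\ngeneB\t0\n")
sample_b = io.StringIO("geneA\t7\ngeneB\t3\n")
matrix = combine_counts({"sample_a": sample_a, "sample_b": sample_b})
for row in matrix:
    print("\t".join(row))
```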

CompuCell3D

Systems biology

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

miniconda

Software management

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Proprietary

Circlator

Sequence assembly

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

EdinOmics Dash App

Metabolomics

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

CC-BY-4.0

Radiomic features extraction (EUCAIM-SW-055_T-02-01-003)

Implemented by GIBI230, this Docker-based tool extracts radiomic features from 3D medical images in NIfTI format using the PyRadiomics library (DICOM images must first be run through the DICOM to NIfTI converter). It streamlines the radiomics calculation by generating a structured CSV file containing all variables extracted from the images. Users can configure parameters such as filters, bin width, resampling spacing, and normalization settings. The output radiomic variables provide quantitative information for further analysis in medical imaging research and machine learning applications. The choice of bin width is especially important: for robust and reproducible results, a bin width of 5 is commonly recommended, but it should be adjusted based on image resolution, modality, and noise levels.
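
To show why the bin width matters, here is a minimal sketch of fixed-bin-width gray-level discretization. It does not call PyRadiomics; it assumes the commonly documented scheme in which a value x is assigned to bin floor(x / W) - floor(min / W) + 1, and the function name and intensities are hypothetical.

```python
import math


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to 1-based bin indices using a fixed bin width."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


intensities = [3.0, 7.5, 12.0, 24.9, 25.0]
bins_w5 = discretize_fixed_bin_width(intensities, 5)
bins_w25 = discretize_fixed_bin_width(intensities, 25)
print(bins_w5)   # [1, 2, 3, 5, 6]
print(bins_w25)  # [1, 1, 1, 1, 2]
```

With a width of 5 the five intensities fall into five distinct gray levels, while a width of 25 collapses four of them into one, which changes any texture feature computed on the discretized image.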

Data curation and archival

Cluster based harmonization (EUCAIM-SW-044_T-01-03-006)

The tool performs radiomics harmonization on large, heterogeneous datasets where there is a risk of over-harmonization. Instead of harmonizing directly on predefined batch labels, it first clusters the radiomics data to identify groups of batches with similar characteristics, then harmonizes using these cluster-derived labels. Two harmonization methods are available: (1) the original ComBat method (Rabinovic, 2007), in which each original batch group is harmonized as its own batch, and (2) the cluster-based ComBat method, in which batch groups with similar radiomics characteristics are merged into clusters and the clusters are used as batches.

Data curation and archival

2D Digital Mammography Harmonization (EUCAIM-SW-046_T-01-03-008)

This preprocessing tool is designed for 2D digital mammograms in DICOM format. It standardizes and harmonizes images through a configurable pipeline that includes spatial reorientation, pseudo-3D stacking, isotropic resampling, intensity normalization, optional denoising, contrast enhancement, and mask processing (if available).