Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

269 of 5,923 resources

Showing 201-250

Resources on ChIP-seq data, including papers, methods, links to software, and analysis.

Idle · 850 · 1 year ago
Python
MIT

UNIX-style FASTA manipulation tools.

Idle · 17 · 1 year ago
Python
MIT

Biomedical text generation

Idle · 4.5K · 1 year ago
Python
MIT

Diffusion model for scalable protein structure design with multi-motif scaffolding capabilities, achieving state-of-the-art designability, diversity, and novelty through SE(3)-equivariant attention and massive data augmentation (AlQuraishi Lab, 2024)

Idle · 192 · 1 year ago
Python
Apache-2.0

Automated data visualization with minimal code

Stale · 1.9K · 2 years ago
Python
Apache-2.0

General-purpose deep learning backbone for molecular modeling

Stale · 2.5K · 2 years ago
Python
MIT

Generative model for programmable protein design using diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample diverse all-atom structures; supports conditional generation via composable conditioners for substructure, symmetry, shape, and neural-network predictions; validated crystallographically (Generate Biomedicines, Nature 2023)

Stale · 819 · 2 years ago
Python
Apache-2.0

[RDKit](http://www.rdkit.org/) and [OSRA](https://cactus.nci.nih.gov/osra/) in the [Bottle](http://bottlepy.org/docs/dev/) on [Tornado](http://www.tornadoweb.org/en/stable/).

Archived · 50 · 2 years ago
Python
NOASSERTION

Convenient file format conversion using Biopython.

Stale · 118 · 2 years ago
Python
GPL-3.0

Google DeepMind's AlphaFold-derived classifier for proteome-wide missense variant effect prediction, providing pathogenicity scores for all ~71M possible human missense variants and classifying 89% with 90% precision; pre-computed predictions are integrated into Ensembl VEP and UCSC Genome Browser to support clinical variant interpretation (Science 2023)

Archived · 633 · 2 years ago
Python
Apache-2.0

AlphaPickle is a Python tool that converts AlphaFold and ColabFold output files into user-friendly CSV files and plots, enabling easy analysis and visualization of protein prediction data without requiring programming expertise. It processes .pkl, .json, and PDB files to extract and visualize metrics like pLDDT and PAE.

Stale · 33 · 2 years ago
Python
GPL-3.0

A package for benchmarking of models for _de novo_ molecular design.

Stale · 521 · 2 years ago
Python
MIT

Protein structure prediction from ESM models

Archived · 4.1K · 2 years ago
Python
MIT

Huawei's 3D high-resolution global weather forecast model at 0.25° resolution, first AI method to comprehensively outperform traditional NWP across all variables and lead times, integrated into ECMWF operational forecasts (Nature 2023)

Stale · 1.4K · 2 years ago
Python

3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR2023)

Stale · 341 · 2 years ago
Python

Single-cell BERT for gene expression

Stale · 357 · 2 years ago
Python
GPL-3.0

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend.

Stale · 745 · 2 years ago
Python
MIT

DGL-LifeSci is a [DGL](https://www.dgl.ai/)-based package for various applications in life science with graph neural networks.

Stale · 803 · 2 years ago
Python
Apache-2.0

First foundation model for weather and climate by Microsoft, Vision Transformer-based architecture trained on heterogeneous datasets (ICML 2023)

Stale · 698 · 2 years ago
Python
MIT

A VCF Parser for Python.

Stale · 419 · 2 years ago
Python
NOASSERTION

First vision-and-language foundation model for pathology AI, fine-tuned from CLIP on 249K image-caption pairs, enabling open-ended visual-semantic search and zero-shot diagnosis across histopathology (Pathology Foundation, 376+ stars)

Stale · 376 · 2 years ago
Python

Easily submit PBS jobs using a script template. Multiple input files are supported.

Stale · 29 · 3 years ago
Python
MIT

A deep learning framework (based on Chainer) with applications in Biology and Chemistry.

Stale · 700 · 3 years ago
Python
MIT

A platform for graph-based molecular generation using graph neural networks.

Archived · 380 · 3 years ago
Python
MIT

Enables machine learning on three-dimensional molecular structure.

Stale · 319 · 3 years ago
Python
MIT

NuclearPhaser is a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs. This is an overview of the phasing pipeline for dikaryons.

Stale · 13 · 3 years ago
Python
GPL-3.0

A molecular representation learning framework that is robust to distribution shifts.

Stale · 61 · 3 years ago
Python
MIT

Open Drug Discovery Toolkit, a modular and comprehensive toolkit for use in cheminformatics, molecular modeling, etc.

Stale · 464 · 3 years ago
Python
BSD-3-Clause

Toolkit for processing molecules, reactions, and condensed graphs of reactions. Can be used for chemical standardization, MCS search, and tautomer generation, with backward compatibility to RDKit and NetworkX.

Stale · 51 · 3 years ago
Python
LGPL-3.0

Go Get Data: a command-line interface for obtaining genomic data.

Stale · 42 · 3 years ago
Python
MIT

A cookiecutter template for bioinformatics projects, with a focus on building bioinformatics workflows that can run on the MPI-IE cluster according to FAIR principles.

Stale · 14 · 3 years ago
Python
MIT

Hierarchical Generation of Molecular Graphs using Structural Motifs.

Stale · 438 · 3 years ago
Python
MIT

Learning nonlinear operators

Stale · 819 · 3 years ago
Python
NOASSERTION

AI for chemical reaction prediction and synthesis planning

Stale · 424 · 4 years ago
Python
NOASSERTION

Computation pipeline library for Python, widely used in science and bioinformatics.

Stale · 175 · 4 years ago
Python
MIT

Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations.

Archived · 42 · 4 years ago
Python
MIT

NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences RSII or Sequel sequencers.

Stale · 92 · 6 years ago
Python
MIT

Automatic filtering, trimming, error removal, and quality control for FASTQ data.

Stale · 214 · 6 years ago
Python
MIT

This desktop application enables users to upload DICOM data along with associated clinical information to QP-Insights—the data management platform of the UPV Reference Node within EUCAIM.

This module provides a command line tool to validate DICOM SEG files against predefined requirements specified in an Excel file. It contains components for finding relevant DICOM files, loading and parsing validation requests and applying validation rules. The main validation process checks each DICOM file for compliance with the Type 1, 1C, 2, 2C and 3 attributes specified in the requirements file. A detailed report is generated highlighting issues such as missing, invalid or conditionally required attributes, including file paths and affected DICOM tags. The tool is designed to ensure data integrity and compliance with DICOM standards.
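The attribute-type check described above can be sketched roughly as follows. This is not the tool's own code: the dataset is modeled as a plain dictionary, the function names `check_attribute` and `validate` are hypothetical, the requirement list stands in for the Excel file, and the attribute types shown are illustrative. It also simplifies the conditional types by skipping them entirely when their condition is not met.

```python
def check_attribute(dataset, keyword, attr_type, condition_met=True):
    """Return an issue string, or None if the attribute complies.

    Assumed semantics: Type 1 must be present with a value, Type 2 must be
    present but may be empty, Type 3 is optional, and 1C/2C apply the same
    rules only when their condition holds.
    """
    if attr_type.endswith("C") and not condition_met:
        return None  # conditional attribute whose condition does not apply
    base_type = attr_type[0]
    if base_type == "3":
        return None  # optional attribute, nothing to enforce
    if keyword not in dataset:
        return f"{keyword}: missing (Type {attr_type})"
    if base_type == "1" and dataset[keyword] in ("", None):
        return f"{keyword}: empty value (Type {attr_type})"
    return None


def validate(dataset, requirements):
    """Collect issues for every (keyword, type, condition_met) requirement."""
    issues = []
    for keyword, attr_type, condition_met in requirements:
        issue = check_attribute(dataset, keyword, attr_type, condition_met)
        if issue is not None:
            issues.append(issue)
    return issues


# Hypothetical data: keywords and types are placeholders for illustration.
dataset = {"Modality": "SEG", "PatientName": "", "ContentLabel": ""}
requirements = [
    ("Modality", "1", True),
    ("PatientName", "2", True),
    ("ContentLabel", "1", True),
    ("SegmentSequence", "1", True),
    ("ExampleConditional", "1C", False),
    ("ExampleOptional", "3", True),
]
issues = validate(dataset, requirements)
for issue in issues:
    print(issue)
```

A real validator would read the files with a DICOM library and key attributes by tag rather than by keyword.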

A tool that checks the clinical metadata quality (validity, completeness), the integrity between images and clinical metadata provided as well as their accuracy, the de-identification protocol applied, and existence of annotation together with the consistency between the images and the annotation files and informs the user on corrective actions prior to data upload.

Automatically detects duplicate and near-duplicate DICOM image series in large medical imaging datasets. Uses a tiered pipeline combining DICOM metadata analysis, SHA-based pixel hashing, and image similarity metrics (SSIM, cosine, MAD) to identify exact copies, re-exported series, and near-identical acquisitions. All findings are reported for human expert review — no files are modified or deleted automatically. For scenarios requiring strict, image-level deduplication based on pixel content, fully agnostic to metadata changes, consider using [https://bio.tools/image_duplicate_check_tool]
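As a rough illustration of two of the tiers mentioned above, the sketch below groups series whose pixel data hash identically and computes a simple difference score for near-duplicates. It is not the tool's implementation: SHA-256 is assumed as the hash, MAD is assumed to mean mean absolute difference, pixel data is modeled as raw bytes, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def pixel_digest(pixel_bytes):
    """Hash only the pixel data, so metadata changes do not affect the result."""
    return hashlib.sha256(pixel_bytes).hexdigest()


def group_exact_duplicates(series):
    """Group series ids whose pixel data is byte-for-byte identical.

    `series` maps a series id to its raw pixel bytes.
    """
    groups = defaultdict(list)
    for series_id, pixel_bytes in series.items():
        groups[pixel_digest(pixel_bytes)].append(series_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def mean_absolute_difference(a, b):
    """Average per-byte intensity difference between two equal-sized images."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy 4-pixel "images": s2 is an exact copy of s1, s3 differs in one pixel.
series = {
    "s1": bytes([0, 1, 2, 3]),
    "s2": bytes([0, 1, 2, 3]),
    "s3": bytes([0, 1, 2, 4]),
}
duplicates = group_exact_duplicates(series)
mad = mean_absolute_difference(series["s1"], series["s3"])
print(duplicates, mad)
```

In keeping with the description, a sketch like this only reports candidate groups and scores; deciding what to do with them is left to a human reviewer.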

Tool to generate a count matrix for expression data in Galaxy. generate_count_matrix reads in one or more input text files with expression counts and produces a single combined file. Each input will have a column in the matrix containing expression values. The column containing gene (or feature) names should be identical for all input count files.
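The merge described above can be sketched in a few lines. This is not the Galaxy tool's code: the two-column tab-separated input layout, the function name `combine_counts`, and the sample names are assumptions made for illustration.

```python
import csv
import io


def combine_counts(named_inputs):
    """Merge per-sample count tables into one matrix (a list of rows).

    `named_inputs` is a list of (sample_name, file-like) pairs, where each
    file holds tab-separated "feature<TAB>count" rows. The feature column
    must be identical across inputs.
    """
    features = None
    columns = []
    for sample, handle in named_inputs:
        rows = [(r[0], r[1]) for r in csv.reader(handle, delimiter="\t") if r]
        names = [name for name, _ in rows]
        if features is None:
            features = names
        elif names != features:
            raise ValueError(f"feature column of {sample!r} differs from the first input")
        columns.append((sample, [count for _, count in rows]))
    # Header row, then one row per feature with one column per input.
    matrix = [["feature"] + [sample for sample, _ in columns]]
    for i, feature in enumerate(features or []):
        matrix.append([feature] + [counts[i] for _, counts in columns])
    return matrix


# Hypothetical inputs, held in memory instead of on disk.
sample_a = io.StringIO("geneA\t10\ngeneB\t0\n")
sample_b = io.StringIO("geneA\t7\ngeneB\t3\n")
matrix = combine_counts([("sample_a", sample_a), ("sample_b", sample_b)])
for row in matrix:
    print("\t".join(row))
```

Rejecting inputs whose feature columns differ mirrors the stated requirement that the gene (or feature) names be identical across all count files.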

CompuCell3D is a multiscale multicellular virtual tissue modeling and simulation environment. CompuCell3D is written in C++ and provides Python bindings for model and simulation development in Python.

Miniconda is a minimal Python distribution that includes the Conda package and environment manager plus only essential dependencies. It provides a lightweight way to create isolated environments and install Python packages as needed, without the large preinstalled package set of Anaconda.

Circlator is a tool to circularize genome assemblies. It will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.

An interactive platform that performs statistical analyses on metabolomics datasets and allows visualising results with ease. The interface gives users autonomy in creating figures suited to their reporting and publication needs.

Implemented by GIBI230, this Docker-based tool extracts radiomic features from 3D medical images in NIfTI format using the PyRadiomics library (DICOM images must first be run through the DICOM-to-NIfTI converter). It streamlines the radiomics calculation process by generating a structured CSV file containing all variables extracted from the medical images. Users can configure parameters such as filters, bin width, resampling spacing, and normalization settings. The output radiomic variables provide quantitative information for further analysis in medical imaging research and machine learning applications. The choice of bin width is especially important: for robust and reproducible results, a bin width of 5 is commonly recommended, but it should be adjusted based on image resolution, modality, and noise levels.
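To show why the bin width matters, here is a minimal sketch of fixed-bin-width gray-level discretization, the step this parameter controls. It assumes the common scheme in which each intensity is mapped to floor(x / width) minus the index of the lowest occupied bin, plus one; the function name is hypothetical and this is not code from the tool or from PyRadiomics.

```python
import math


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to 1-based bin indices using a fixed bin width."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


# Hypothetical intensities inside a region of interest.
intensities = [3.0, 7.5, 12.0, 26.0]
narrow_bins = discretize_fixed_bin_width(intensities, 5)
wide_bins = discretize_fixed_bin_width(intensities, 25)
print(narrow_bins)  # [1, 2, 3, 6]
print(wide_bins)    # [1, 1, 1, 2]
```

The same four intensities fall into four distinct gray levels with a width of 5 but only two with a width of 25, which is why texture features computed downstream depend on this setting.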

The tool is designed to perform radiomics harmonization on large and heterogeneous datasets, where there is a risk of over-harmonization. Instead of directly applying harmonization based on predefined batch labels, the tool first identifies groups of batches that share similar characteristics by clustering the radiomics data. It then performs harmonization using these cluster-derived labels. The tool supports two methods for harmonizing radiomics variables: (1) the original ComBat method (Rabinovic, 2007), where each original batch group is used in the harmonization process, and (2) a cluster-based ComBat method, where batch groups with similar radiomics characteristics form clusters and the clusters are used in the harmonization process.

This preprocessing tool is designed for 2D digital mammograms in DICOM format. It standardizes and harmonizes images through a configurable pipeline that includes spatial reorientation, pseudo-3D stacking, isotropic resampling, intensity normalization, optional denoising, contrast enhancement, and mask processing (if available).