Find open-source science resources

A directory of tools, AI models, datasets, and research resources for biotech, bioinformatics, and other scientific fields. Aggregated from curated GitHub awesome-lists, HuggingFace, bio.tools, Bioconductor, and more.

5,923 resources indexed

Showing 651700

Computational fluid dynamics in JAX, enabling differentiable Navier-Stokes simulations with automatic differentiation for ML-accelerated CFD research, supporting turbulence modeling, convection-diffusion, and complex boundary conditions on CPUs and GPUs (Google Research, 947+ stars)

Active9483 months ago
Jupyter Notebook
Apache-2.0

Foundation models for genomics and transcriptomics pretrained on 3,000+ human genomes and 850+ diverse species, enabling chromatin accessibility prediction, splice site detection, and promoter classification across multiple model scales (InstaDeep, NVIDIA & TUM, Nature Methods 2023)

Active8843 months ago
Jupyter Notebook
NOASSERTION

# ibm/biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-BACE-101 biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of…

Active3.1K3 months ago

# ibm-research/biomed.sm.mv-te-84m biomed.sm.mv-te-84m is a multimodal biomedical foundation model for small molecules created using MMELON (Multi-view Molecular Embedding with Late Fusion), a flexible approach to aggregate multiple views (sequence, image, graph) of molecules in a foundation model…

Active1223 months ago

A single molecule sequence assembler for genomes large and small.

Active7003 months ago
C++

FASTQ and SAM quality control using Python.

Active1093 months ago
Python
MIT

## Model Description This repository contains the weights for the Borzoi model, a model designed to predict functional genomic tracks from genomic DNA sequences.

Active03 months ago
Active6.5K3 months ago
Python

BWA-MEM drop-in replacement: 2-3x faster, 2-5x cheaper, 100% identical output on standard CPUs.

Active223 months ago
C
MIT

lumpy: a general probabilistic framework for structural variant discovery.

Active3423 months ago
C
MIT

ONNX export of the Cellpose cpsam (Cellpose-SAM) model for cell segmentation in microscopy images.

Active03 months ago

A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.

Active1993 months ago
Python
BSD-3-Clause

MIRit is an R package that provides several methods for investigating the relationships between miRNAs and genes in different biological conditions. In particular, MIRit allows to explore the functions of dysregulated miRNAs, and makes it possible to identify miRNA-gene regulatory axes that control biological pathways, thus enabling the users to unveil the complexity of miRNA biology. MIRit is an all-in-one framework that aims to help researchers in all the central aspects of an integrative miRNA-mRNA analyses, from differential expression analysis to network characterization.

Active23 months ago
R
GPL-3.0+

A Python script that converts positional information from a SAM dataset into interval format with 0-based start and 1-based end. CIGAR string of SAM format is used to compute the end coordinate.

Active373 months ago
Python
MIT

Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)

Active3833 months ago
Python
NOASSERTION

Model weights, configurations, and vocabularies for CpGPT: A Foundation Model for DNA Methylation.

Active03 months ago

Implements a variety of methods for combining p-values in differential analyses of genome-scale datasets. Functions can combine p-values across different tests in the same analysis (e.g., genomic windows in ChIP-seq, exons in RNA-seq) or for corresponding tests across separate analyses (e.g., replicated comparisons, effect of different treatment conditions). Support is provided for handling log-transformed input p-values, missing values and weighting where appropriate.

Active23 months ago
R
GPL-3.0

Provides visualization functionality for untargeted LC-MS metabolomics research. Includes quality control visualizations, feature-wise visualizations and results visualizations.

Active03 months ago
R
MIT

Deep learning-based object detection and segmentation for star-convex shapes, widely adopted for cell and nucleus segmentation in fluorescence and electron microscopy via a compact neural network architecture with non-maximum suppression and shape-based post-processing (Nature Methods 2020, 1.2K+ stars)

Active1.2K3 months ago
Python
BSD-3-Clause

cfDNA fragments carry important features for building cancer sample classification ML models, such as fragment size, and fragment end motif etc. Analyzing and visualizing fragment size metrics, as well as other biological features in a curated, standardized, scalable, well-documented, and reproducible way might be time intensive. This package intends to resolve these problems and simplify the process. It offers two sets of functions for cfDNA feature characterization and visualization.

Active433 months ago
R
GPL-3.0

sRACIPE implements a randomization-based method for gene circuit modeling. It allows us to study the effect of both the gene expression noise and the parametric variation on any gene regulatory circuit (GRC) using only its topology, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises. sRACIPE provides a holistic picture to evaluate the effects of both the stochastic nature of cellular processes and the parametric variation.

Active63 months ago
R
MIT

![Figure #](

Active30.6K4 months ago

Utility package to facilitate integration and analysis of EBI MGnify data in R. The package can be used to import microbial data for instance into TreeSummarizedExperiment (TreeSE). In TreeSE format, the data is directly compatible with miaverse framework.

Active234 months ago
R
Artistic-2.0

A polymorphic bayesian genotyping model with wide applicability.

Active3234 months ago
C++
MIT

TaxSEA is an R package for Taxon Set Enrichment Analysis, which utilises a Kolmogorov-Smirnov test analyses to investigate differential abundance analysis output for whether there are alternations in a-priori defined sets of taxa from public databases (BugSigDB, MiMeDB, GutMGene, mBodyMap, BacDive and GMRepoV2) and collated from the literature. TaxSEA takes as input a list of taxonomic identifiers (e.g. species names, NCBI IDs etc.) and a rank (E.g. fold change, correlation coefficient). TaxSEA be applied to any microbiota taxonomic profiling technology (array-based, 16S rRNA gene sequencing, shotgun metagenomics & metatranscriptomics etc.) and enables researchers to rapidly contextualize their findings within the broader literature to accelerate interpretation of results.

Active104 months ago
R
GPL-3.0

Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFDiffusion with 0.972 designability (ICML 2025)

Active844 months ago
Python

Physics-informed neural networks

Active5.9K4 months ago
Python
MIT

Azure Semantic Kernel multi-agent PPT generation reference

Active494 months ago
Python
MIT

Detection of rare aberrant splicing events in transcriptome profiles. Read count ratio expectations are modeled by an autoencoder to control for confounding factors in the data. Given these expectations, the ratios are assumed to follow a beta-binomial distribution with a junction specific dispersion. Outlier events are then identified as read-count ratios that deviate significantly from this distribution. FRASER is able to detect alternative splicing, but also intron retention. The package aims to support diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects.

Active554 months ago
R
Other

Uses platform-specific implemenations of the GatingML2.0 standard to exchange gated cytometry data with other software platforms.

Active354 months ago
R
AGPL-3.0

Family of large language models for materials research via continued pretraining of LLaMA-2/3 on ~30B materials science tokens, outperforming commercial LLMs on materials science tasks while identifying "adaptation rigidity" in overtrained models; includes MatNLP benchmark and CIF crystal generation capabilities (IIT Delhi M3RG, MIT License)

Active614 months ago
Jupyter Notebook
MIT

Neural Network Force Field based on PyTorch.

Active2904 months ago
Jupyter Notebook
MIT
Active04 months ago

MOSim package simulates multi-omic experiments that mimic regulatory mechanisms within the cell, allowing flexible experimental design including time course and multiple groups.

Active124 months ago
R
GPL-3.0

This package wraps the functionality of the Thermo Fisher Scientic RawFileReader .NET 8.0 assembly. Within the R environment, spectra and chromatograms are represented by S3 objects. The package provides basic functions to download and install the required third-party libraries. The package is developed, tested, and used at the Functional Genomics Center Zurich, Switzerland.

Active624 months ago
R
GPL-3.0

From Inquiry to Decision: Building Trustworthy Medical AI

Active204 months ago
Python
Active454 months ago
Python

A package for working with nuclear magnetic resonance (NMR) data including functions for reading common binary file formats and processing NMR data.

Active2654 months ago
Python
BSD-3-Clause

File parser/converter for QM, MD and plane-wave DFT programs.

Active1634 months ago
Python
LGPL-3.0

A Neural Machine Translation (NMT) model based on a custom Transformer (Encoder-Decoder) architecture, trained from scratch. This model is designed to translate English sentences into Hebrew using multilingual encoding and specialized layer configurations.

Active184 months ago
Python

A package for creating fast and accurate interatomic potentials.

Active3584 months ago
Python
MIT

Tools For analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.

Active824 months ago
R
MIT

Infrastructure to store and manage all aspects related to a complete proteomics or metabolomics mass spectrometry (MS) experiment. The MsExperiment package provides light-weight and flexible containers for MS experiments building on the new MS infrastructure provided by the Spectra, QFeatures and related packages. Along with raw data representations, links to original data files and sample annotations, additional metadata or annotations can also be stored within the MsExperiment container. To guarantee maximum flexibility only minimal constraints are put on the type and content of the data within the containers.

Active54 months ago
R
Artistic-2.0

The MsFeature package defines functionality for Mass Spectrometry features. This includes functions to group (LC-MS) features based on some of their properties, such as retention time (coeluting features), or correlation of signals across samples. This packge hence allows to group features, and its results can be used as an input for the `QFeatures` package which allows to aggregate abundance levels of features within each group. This package defines concepts and functions for base and common data types, implementations for more specific data types are expected to be implemented in the respective packages (such as e.g. `xcms`). All functionality of this package is implemented in a modular way which allows combination of different grouping approaches and enables its re-use in other R packages.

Active74 months ago
R
Artistic-2.0

SQL-based mass spectrometry (MS) data backend supporting also storange and handling of very large data sets. Objects from this package are supposed to be used with the Spectra Bioconductor package. Through the MsBackendSql with its minimal memory footprint, this package thus provides an alternative MS data representation for very large or remote MS data sets.

Active44 months ago
R
Artistic-2.0

scQTLtools is a comprehensive R/Bioconductor package that facilitates end-to-end single-cell eQTL analysis, from preprocessing to visualization

Active64 months ago
R
MIT

seqlib is a type-safe Rust library for working with DNA and RNA sequences.

Active04 months ago
Rust
Other

This package draws protein schematics from Uniprot API output. From the JSON returned by the GET command, it creates a dataframe from the Uniprot Features API. This dataframe can then be used by geoms based on ggplot2 and base R to draw protein schematics.

Active354 months ago
R
MIT

Evaluating multimodal autonomous agents in realistic scientific workflows across real scientific software environments (KAlgebra, Celestia, Grass GIS, Lean 4, etc.) with VM-based evaluation infrastructure and agent trajectories

Active1294 months ago
Python
MIT

A new clustering algorithm, "binary cut", for clustering similarity matrices of functional terms is implemeted in this package. It also provides functions for visualizing, summarizing and comparing the clusterings.

Active1254 months ago
R
MIT