Find open-source science resources

A circos representation of multiple GWAS results.

Idle · 97 · 1 year ago

Uni-Mol

Protein & Drug Discovery

Universal 3D molecular pretraining framework with 209M conformations, scaling to 1.1B parameters (Uni-Mol2) on 800M conformations for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)

Idle · 1.1K · 1 year ago

SpatialExperiment

DataRepresentation

Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.

Idle · 73 · 1 year ago

G4SNVHunter

Epigenetics

G-quadruplexes (G4s) are unique nucleic acid secondary structures predominantly found in guanine-rich regions and have been shown to be involved in various biological regulatory processes. G4SNVHunter is an R package designed to rapidly identify genomic sequences with G4-forming propensity and to accurately screen user-provided single nucleotide variants, as well as other small-scale variants such as indels and MNVs, for their potential to destabilize these structures. Researchers can then prioritize these variants for deeper study of how they might influence biological functions, such as gene regulation, by impairing G4 formation propensity.

Idle · 0 · 1 year ago

cytoviewer

ImmunoOncology

This R package supports interactive visualization of multi-channel images and segmentation masks generated by imaging mass cytometry and other highly multiplexed imaging techniques using shiny. The cytoviewer interface is divided into image-level (Composite and Channels) and cell-level visualization (Masks). It allows users to overlay individual images with segmentation masks, integrates well with SingleCellExperiment and SpatialExperiment objects for metadata visualization and supports image downloads.

Idle · 7 · 1 year ago

cytomapper

ImmunoOncology

Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.

Idle · 36 · 1 year ago

GPL-2.0+

Mid-level Energy Ontology

Genomics & Bioinformatics

The Mid-level Energy Ontology (MENO) is a BFO-based mid-level ontology. It comprises concepts for energy qualities, energy-based dispositions, and energy-driven transformation and transfer processes, along with their interrelations. Its goal is to provide an upper-level structure for these concepts for energy-related domain ontologies.

Idle · 2 · 1 year ago

Shell

RNA-FM (Nature Methods 2024)

RNA foundation model trained on millions of RNA sequences for generalist RNA sequence understanding, enabling downstream structure prediction, function annotation, and representation learning for non-coding RNAs (ml4bio, 372+ stars)

Idle · 374 · 1 year ago

REINVENT

Protein & Drug Discovery

Industrial-grade reinforcement-learning-based generative platform for de novo molecular design with transformer architectures, supporting multi-objective optimization, scaffold decoration, and curriculum learning (AstraZeneca MolecularAI, REINVENT 4, 2024)

Archived · 373 · 1 year ago

Apache-2.0

bartowski/PocketDoc_Dans-PersonalityEngine-V1.3.0-24b-GGUF

by bartowski

Using llama.cpp release b5466 for quantization.

Idle · 10.9K · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.3.0-24b

by PocketDoc

Idle · 165 · 1 year ago

PocketDoc/Dans-PersonalityEngine-V1.2.0-24b

by PocketDoc

Idle · 56 · 1 year ago

ProtTrans

Protein & Drug Discovery

State-of-the-art pretrained language models for proteins trained on thousands of GPUs and Google TPUs using Transformer architectures, enabling protein property prediction, feature extraction, and transfer learning across diverse downstream tasks (1.3K+ stars, MIT, 2020-2026)

Idle · 1.3K · 1 year ago

Structural variant callers

gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite.

Idle · 283 · 1 year ago

Java

NOASSERTION

CatKit

Simulations

General purpose tools for high-throughput catalysis.

Idle · 104 · 1 year ago

unsloth/medgemma-27b-text-it-GGUF

by unsloth

image-text-to-text

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Idle · 9.5K · 1 year ago

systemPipeShiny

ShinyApps

systemPipeShiny (SPS) extends the widely used systemPipeR (SPR) workflow environment with a versatile graphical user interface provided by a Shiny App. This allows non-R users, such as experimentalists, to use many of systemPipeR’s workflow design, control, and visualization functionalities interactively without requiring knowledge of R. Most importantly, SPS has been designed as a general-purpose framework for interacting with other R packages in an intuitive manner. Like most Shiny Apps, SPS can be used on local computers as well as in centralized server-based deployments that can be accessed remotely as a public web service for using SPR’s functionalities with community and/or private data. The framework can integrate many core packages from the R/Bioconductor ecosystem. Examples of SPS’ current functionalities include: (a) interactive creation of experimental designs and metadata using an easy-to-use tabular editor or file uploader; (b) visualization of workflow topologies combined with auto-generation of R Markdown previews for interactively designed workflows; (c) access to a wide range of data processing routines; and (d) an extendable set of visualization functionalities. Complex visual results can be managed on a 'Canvas Workbench', which lets users organize and compare plots efficiently, combined with a session snapshot feature to continue work at a later time. A suite of pre-configured visualization examples is included. The modular design of SPR makes it easy to design custom functions without any knowledge of Shiny, and to extend the environment in the future with contributions from the community.

Idle · 35 · 1 year ago

GPL-3.0+

Bamtools

BAM File Utilities

Collection of tools for working with BAM files.

Idle · 430 · 1 year ago

C++

Autonomous Research Systems (2023-2025 Breakthroughs)

POPPER

Automated hypothesis testing with agentic sequential falsifications

Idle · 274 · 1 year ago

DNAfusion

TargetedResequencing

DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position-deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina NextSeq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.

Idle · 4 · 1 year ago

Medical AI & Clinical Applications

MedSAM

Universal medical image segmentation foundation model trained on 1.57M image-mask pairs across 10 imaging modalities and 30+ cancer types (Nature Communications 2024)

Idle · 4.3K · 1 year ago

Apache-2.0

MassSpecWavelet

ImmunoOncology

Slides & Presentation Generation

Peak detection is one of the important preprocessing steps for mass spectrometry data. The performance of peak detection affects subsequent processes, including protein identification, profile alignment, and biomarker identification. Using the Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any smoothing or prior baseline correction, providing more consistent results across different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.

Idle · 11 · 1 year ago

LGPL-2.0+

PaperToSlides

AI-powered tool that automatically converts academic papers (PDF) into presentation slides

Idle · 13 · 1 year ago

Knowledge Graph Resources

Awesome-LLM-KG

Comprehensive collection of papers on unifying LLMs and knowledge graphs

Idle · 2.6K · 1 year ago

ibm-research/GP-MoLFormer-Uniq

by ibm-research

GP-MoLFormer is a class of models pretrained on SMILES string representations of 0.65-1.1B molecules from ZINC and PubChem. This repository is for the model pretrained on all the unique molecules from both datasets.

Idle · 1.5K · 1 year ago

XformAI-india/qwen-0.6b-mentalhealth-support

by XformAI-india

Model Repo: xformai/qwen-0.6b-mentalhealth-support; Base Model: Qwen/Qwen-0.5B; Task: Empathetic Conversational AI for mental health & emotional support; Fine-Tuned By: XformAI

Idle · 7 · 1 year ago

ProteinWorkshop

Biology & Medicine

Unified benchmarking framework for protein representation learning, providing standardized interfaces for pre-training and diverse downstream tasks including structure prediction, fitness prediction, and property prediction across multiple protein datasets and model architectures (ICLR 2024, 273+ stars, MIT License)

Idle · 274 · 1 year ago

QIAIUNCC/EYE-Llama_gqa

by QIAIUNCC

EYE-Llama_gqa is a large language model specifically designed for ophthalmic question-answering (QA). It is built upon the Llama 2 architecture and fine-tuned on the EYE-lit and EYE-QA+ datasets.

Idle · 106 · 1 year ago

oabed/Curated_ImmtorLig_DB_H2O_AutoML_Models

by oabed

This project focuses on curating and modeling bioactivity data of small molecules targeting immune receptors. Using datasets from ImmtorLig_DB, we applied machine learning techniques to predict interactions between small molecules and immune receptors or cytokines, aiding drug discovery…

Idle · 0 · 1 year ago

SEraster

SEraster is a rasterization preprocessing framework that aggregates cellular information into spatial pixels to reduce resource requirements for spatial omics data analysis. SEraster reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells’ gene expression or cell-type labels are aggregated into equally sized pixels based on a user-defined resolution. SEraster is built on an R/Bioconductor S4 class called SpatialExperiment. SEraster can be incorporated with other packages to conduct downstream analyses for spatial omics datasets, such as detecting spatially variable genes.

Idle · 19 · 1 year ago
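
The rasterization idea described for SEraster can be illustrated with a minimal Python sketch. This is not SEraster's API (SEraster is an R package operating on SpatialExperiment objects); the function name `rasterize`, the plain-list inputs, and the choice of the mean as the aggregation function are all assumptions made for illustration.

```python
from collections import defaultdict


def rasterize(coords, values, resolution):
    """Aggregate per-cell values into square pixels of side `resolution`.

    coords: list of (x, y) cell positions.
    values: one number per cell (e.g. expression of a single gene).
    Returns {(col, row): mean value of the cells falling in that pixel}.
    Hypothetical helper; mean aggregation is an assumption.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (x, y), value in zip(coords, values):
        # Floor division assigns each cell to exactly one pixel.
        pixel = (int(x // resolution), int(y // resolution))
        sums[pixel] += value
        counts[pixel] += 1
    return {pixel: sums[pixel] / counts[pixel] for pixel in sums}


# Four cells collapse into two pixels at a resolution of 2 units.
cells = [(0.5, 0.5), (1.5, 0.2), (2.5, 3.1), (3.9, 3.9)]
expression = [2.0, 4.0, 1.0, 5.0]
pixels = rasterize(cells, expression, resolution=2.0)
```

A coarser resolution yields fewer pixels and therefore fewer spatial points for downstream analysis, at the cost of spatial detail.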

scHiCcompare

This package provides functions for differential chromatin interaction analysis between two single-cell Hi-C data groups. It includes tools for imputation, normalization, and differential analysis of chromatin interactions. The package implements pooling techniques for imputation and offers methods to normalize and test for differential interactions across single-cell Hi-C datasets.

Idle · 0 · 1 year ago

prov-gigapath/prov-gigapath

by prov-gigapath

image-feature-extraction

Idle · 60.4K · 1 year ago

XAItest

XAItest is an R package that identifies features using eXplainable AI (XAI) methods such as SHAP or LIME. This package allows users to compare these methods with traditional statistical tests like t-tests, empirical Bayes, and Fisher's test. Additionally, it includes simThresh, a system that enables the comparison of feature importance with p-values by incorporating calibrated simulated data.

Idle · 1 · 1 year ago

SQLDataFrame

DataRepresentation

Implements bindings for SQL tables that are compatible with Bioconductor S4 data structures, namely the DataFrame and DelayedArray. This allows SQL-derived data to be easily used inside other Bioconductor objects (e.g., SummarizedExperiments) while keeping everything on disk.

Idle · 2 · 1 year ago

LGPL-3.0+

clustifyr

SingleCell

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Idle · 125 · 1 year ago

tidysbml

GraphAndNetwork

Starting from a single SBML file, tidysbml extracts information from each listOfCompartments, listOfSpecies, and listOfReactions element and saves it into data frames. Each table provides one row per entity (compartment, species, reaction, or speciesReference), one set of columns for the attributes, one column for the content of the 'notes' subelement, and one set of columns for the content of the 'annotation' subelement.

Idle · 2 · 1 year ago

CC-BY-4.0
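
The table extraction that tidysbml performs in R can be sketched in Python with the standard library. This is only an illustration of the idea, not tidysbml's interface: the toy SBML document, the helper names `local_name` and `list_to_rows`, and the use of plain dictionaries as rows are assumptions, and the 'notes' and 'annotation' subelements are not handled here.

```python
import xml.etree.ElementTree as ET

# Toy SBML document (hypothetical model used only for this sketch).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cytosol" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" compartment="cytosol" initialAmount="5"/>
      <species id="g6p" compartment="cytosol" initialAmount="0"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def list_to_rows(root, list_name):
    """Return one dict of attributes per child of every `list_name` element."""
    rows = []
    for element in root.iter():
        if local_name(element.tag) == list_name:
            for child in element:
                rows.append(dict(child.attrib))
    return rows


root = ET.fromstring(SBML)
species_rows = list_to_rows(root, "listOfSpecies")
compartment_rows = list_to_rows(root, "listOfCompartments")
```

Each dictionary corresponds to one table row, with the element's attributes as columns.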

spatialDE

SpatialDE is a method to find spatially variable genes (SVG) from spatial transcriptomics data. This package provides wrappers to use the Python SpatialDE library in R, using reticulate and basilisk.

Idle · 3 · 1 year ago

RUCova

Mass cytometry enables the simultaneous measurement of dozens of protein markers at the single-cell level, producing high dimensional datasets that provide deep insights into cellular heterogeneity and function. However, these datasets often contain unwanted covariance introduced by technical variations, such as differences in cell size, staining efficiency, and instrument-specific artifacts, which can obscure biological signals and complicate downstream analysis. This package addresses this challenge by implementing a robust framework of linear models designed to identify and remove these sources of unwanted covariance. By systematically modeling and correcting for technical noise, the package enhances the quality and interpretability of mass cytometry data, enabling researchers to focus on biologically relevant signals.

Idle · 2 · 1 year ago

medicalai/ClinicalBERT

by medicalai

fill-mask

This model card describes ClinicalBERT, which was trained on a large multicenter corpus of 1.2B words covering diverse diseases, constructed by the authors. The base language model was then fine-tuned on a large-scale corpus of EHRs from over 3 million patient records.

Idle · 21.6K · 1 year ago

co_320

Idle · 22 · 1 year ago

CSS

CC-BY-4.0

AHMassBank

MassSpectrometry

Supplies AnnotationHub with MassBank metabolite/compound annotations bundled in CompDb SQLite databases. CompDb SQLite databases contain general compound annotation as well as fragment spectra representing fragmentation patterns of compounds' ions. MassBank data is retrieved from https://massbank.eu/MassBank and processed using helper functions from the CompoundDb Bioconductor package into redistributable SQLite databases.

Idle · 1 · 1 year ago

Artistic-2.0

ibm-research/biomed.omics.bl.sm.ma-ted-458m.protein_solubility

by ibm-research

Neural Differential Equations

Protein solubility is a critical factor in both pharmaceutical research and production processes, as it can significantly impact the quality and function of a protein. This is an example for finetuning ibm/biomed.omics.bl.sm-ted-458m for protein solubility prediction (binary classification) based…

Idle · 118 · 1 year ago

torchdiffeq

PyTorch implementation of neural ODEs

Idle · 6.4K · 1 year ago

oim

schema

Idle · 2 · 1 year ago

Shell

google/derm-foundation

by google

image-classification

Idle · 183 · 1 year ago

pathview

Pathways

Pathview is a tool set for pathway-based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview integrates seamlessly with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Idle · 48 · 1 year ago

GPL-3.0+

CCAFE

GenomeWideAssociation

Functions to reconstruct case and control allele frequencies (AFs) from summary statistics. One function uses the odds ratio (OR), NCase, NControl, and SE(log(OR)). The second function uses OR, NCase, NControl, and the AF for the whole sample.

Idle · 1 · 1 year ago
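
As a worked illustration of the second approach listed for CCAFE (OR, NCase, NControl, and the whole-sample allele frequency), the sketch below solves for the case and control allele frequencies algebraically. It is an independent derivation under stated assumptions, not CCAFE's code: it assumes the whole-sample frequency is the sample-size-weighted average of the case and control frequencies, and that OR is the allelic odds ratio. The function name `case_control_af` is hypothetical.

```python
import math


def case_control_af(odds_ratio, n_case, n_control, af_total):
    """Recover (af_case, af_control) from OR, sample sizes, and total AF.

    Assumptions (not taken from CCAFE's documentation):
      af_total = (n_case * af_case + n_control * af_control) / (n_case + n_control)
      odds_ratio = [af_case / (1 - af_case)] / [af_control / (1 - af_control)]
    Substituting af_case = OR * p / (1 + (OR - 1) * p), with p = af_control,
    gives a quadratic a*p^2 + b*p + c = 0 with exactly one root in [0, 1].
    """
    t = (n_case + n_control) * af_total
    k = odds_ratio - 1.0
    if abs(k) < 1e-12:
        # OR = 1: cases and controls share the whole-sample frequency.
        return af_total, af_total
    a = n_control * k
    b = n_control + n_case * odds_ratio - t * k
    c = -t
    # This root lies in [0, 1] for both OR > 1 and OR < 1.
    af_control = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    af_case = odds_ratio * af_control / (1.0 + k * af_control)
    return af_case, af_control


# Round trip: build a whole-sample AF from known values, then recover them.
true_control = 0.2
true_case = 1.5 * true_control / (1.0 + 0.5 * true_control)
total = (1000 * true_case + 3000 * true_control) / 4000
af_case, af_control = case_control_af(1.5, 1000, 3000, total)
```

The round trip at the end is a convenient sanity check: the recovered frequencies should match the values used to construct the whole-sample frequency, up to floating-point error.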

prithivMLmods/Indian-Western-Food-34

by prithivMLmods

image-classification

Idle · 27 · 1 year ago

Computational Pathology & Digital Pathology

UNI (Nature Medicine 2024)

General-purpose pathology foundation model pretrained on 100K+ diagnostic whole-slide images across 20 major tissue types, achieving state-of-the-art transfer learning across 30+ clinical tasks and serving as a universal feature extractor for digital pathology (Mahmood Lab, 722+ stars)

Idle · 744 · 1 year ago

NOASSERTION

Biofactoid