popV

Welcome to the popV framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pretrained models for transferring cell-type labels to your own query dataset.

Defining cell types is a tedious process, and using reference data can significantly accelerate it. By combining several label-transfer tools, popV produces a well-calibrated certainty score that makes it possible to detect cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes rather than trusting them blindly.

This is an open-science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets. To do so, ask in our GitHub repository to have your dataset added.

Model Overview

popV trains up to nine different algorithms for automatic label transfer, computes a consensus score across them, and generates an automatic report. To learn how to apply popV to your own dataset, please refer to our tutorial.

Algorithms

Currently implemented algorithms are:

K-nearest neighbor classification after dataset integration with BBKNN
K-nearest neighbor classification after dataset integration with Scanorama
K-nearest neighbor classification after dataset integration with scVI
K-nearest neighbor classification after dataset integration with Harmony
Random forest classification
Support vector machine classification
OnClass cell type classification
scANVI label transfer
CellTypist cell type classification
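
To make the idea of a consensus score concrete, the following is a minimal sketch of plain majority voting across several classifiers. It is an illustration only and not popV's actual implementation: the function name `consensus_vote`, the method names used as dictionary keys, and the example labels are all hypothetical, and popV's own scoring may differ from a simple vote count.

```python
from collections import Counter


def consensus_vote(predictions):
    """Majority vote over per-method predictions.

    predictions: dict mapping a method name to a list of labels, one per cell.
    Returns a list of (label, n_agreeing_methods) tuples, one per cell.
    Ties are broken in favor of the label seen first (hypothetical choice).
    """
    methods = list(predictions)
    n_cells = len(predictions[methods[0]])
    if any(len(predictions[m]) != n_cells for m in methods):
        raise ValueError("all methods must predict the same number of cells")

    results = []
    for i in range(n_cells):
        votes = Counter(predictions[m][i] for m in methods)
        label, count = votes.most_common(1)[0]
        results.append((label, count))
    return results


# Hypothetical predictions from three methods for three cells.
predictions = {
    "knn_on_scvi": ["T cell", "B cell", "NK cell"],
    "random_forest": ["T cell", "B cell", "T cell"],
    "svm": ["T cell", "monocyte", "NK cell"],
}

result = consensus_vote(predictions)
print(result)  # [('T cell', 3), ('B cell', 2), ('NK cell', 2)]

# Cells where not every method agrees are candidates for manual review.
flagged = [i for i, (_, count) in enumerate(result) if count < len(predictions)]
print(flagged)  # [1, 2]
```

The number of agreeing methods acts as a simple per-cell agreement score: cells with low agreement are the ones worth inspecting by hand with marker genes.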

Key Applications

The purpose of these models is to perform cell-type label transfer. We provide one collection of models with cuML support for large-scale reference mapping and one collection without cuML support for use when no GPU is available. Without a GPU, popV scales well to 100k cells. popV has three prediction modes that differ in complexity:

retrain trains all classifiers from scratch. For 50k cells, this takes up to an hour of computing time on a GPU.
inference uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. For 50k cells, this takes up to half an hour of computing time on a GPU in our hands.
fast uses only the methods with pretrained classifiers and annotates only the query cells. For 50k cells, this takes 5 minutes without a GPU (when no UMAP embedding is computed).
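
The choice between the three modes can be summarized as a small decision helper. This is a sketch under stated assumptions: `choose_prediction_mode` and `APPROX_RUNTIME_50K` are hypothetical names that are not part of popV, and the runtimes simply restate the figures given above for 50k cells.

```python
# Approximate runtimes for 50k cells, restated from the descriptions above.
APPROX_RUNTIME_50K = {
    "retrain": "up to 1 hour on a GPU",
    "inference": "up to 30 minutes on a GPU",
    "fast": "5 minutes without a GPU (no UMAP embedding)",
}


def choose_prediction_mode(need_retraining, need_joint_embedding):
    """Pick the cheapest mode that satisfies the stated requirements.

    need_retraining: True if classifiers must be trained from scratch.
    need_joint_embedding: True if reference cells should be annotated and
        embedded jointly with the query cells.
    """
    if need_retraining:
        return "retrain"
    if need_joint_embedding:
        return "inference"
    return "fast"


mode = choose_prediction_mode(need_retraining=False, need_joint_embedding=False)
print(mode, "-", APPROX_RUNTIME_50K[mode])
```

In short, fast is the sensible default when only query annotations are needed, and the more expensive modes are worth their cost only when a joint embedding or freshly trained classifiers are required.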

Publications

Original popV paper:
- Published in Nature Genetics, this paper introduces popV and benchmarks it.

Contact

GitHub: https://github.com/YosefLab/popV
User questions: Discourse

models 11

popV/tabula_sapiens_Eye

popV/tabula_sapiens_Ear

popV/tabula_sapiens_Bone_Marrow

popV/tabula_sapiens_Blood

popV/tabula_sapiens_Bladder

popV/tabula_sapiens_Stromal

popV/tabula_sapiens_Neural

popV/tabula_sapiens_Immune

popV/tabula_sapiens_Germline

popV/tabula_sapiens_Endothelium
