popV


AI & ML interests

popularVoting for cell-type annotation in single-cell genomics


Welcome to the popV framework. popV delivers state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained models for transferring cell-type labels to your own query dataset.

Defining cell types is a tedious process, and reference data can accelerate it significantly. Because popV combines several label-transfer tools, it provides a well-calibrated certainty score that highlights cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes rather than trusting them blindly.

This is an open-science initiative. To contribute your own models and let the single-cell community leverage your reference datasets, ask in our GitHub repository to have your dataset added.


Model Overview

popV trains up to nine different algorithms for automatic label transfer and computes a consensus score across their predictions. It also generates an automatic report. To learn how to apply popV to your own dataset, please refer to our tutorial.

Algorithms

Currently implemented algorithms are:

  • K-nearest neighbor classification after dataset integration with BBKNN
  • K-nearest neighbor classification after dataset integration with Scanorama
  • K-nearest neighbor classification after dataset integration with scVI
  • K-nearest neighbor classification after dataset integration with Harmony
  • Random forest classification
  • Support vector machine classification
  • OnClass cell type classification
  • scANVI label transfer
  • CellTypist cell type classification
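The consensus idea behind combining these algorithms can be illustrated with a minimal sketch. It assumes a plain majority vote per cell and uses the number of agreeing methods as the score; the function name `consensus_vote` and the method keys are hypothetical, and popV's actual voting scheme may differ (for example, by taking the cell-type ontology into account).

```python
from collections import Counter


def consensus_vote(predictions):
    """Majority vote across methods (illustrative sketch, not popV's API).

    predictions: dict mapping a method name to a list of predicted labels,
    one label per cell, with all lists in the same cell order.
    Returns (labels, scores): the winning label for each cell and the
    number of methods that voted for it.
    """
    methods = list(predictions)
    if not methods:
        raise ValueError("at least one method is required")
    n_cells = len(predictions[methods[0]])
    if any(len(predictions[m]) != n_cells for m in methods):
        raise ValueError("all methods must predict the same number of cells")

    labels, scores = [], []
    for i in range(n_cells):
        votes = Counter(predictions[m][i] for m in methods)
        # Ties are resolved in favor of the label encountered first.
        label, count = votes.most_common(1)[0]
        labels.append(label)
        scores.append(count)
    return labels, scores


# Hypothetical predictions from three methods for two cells.
example = {
    "knn_scvi": ["T cell", "B cell"],
    "svm": ["T cell", "B cell"],
    "random_forest": ["T cell", "NK cell"],
}
labels, scores = consensus_vote(example)
```

In this toy example all three methods agree on the first cell, while only two agree on the second, so the second cell would be the better candidate for manual inspection.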

Key Applications

The purpose of these models is to perform cell-type label transfer. We provide models with CUML support for large-scale reference mapping and models without CUML support for use when no GPU is available. Without a GPU, popV scales well to 100k cells. popV offers three prediction modes of differing complexity:

  • retrain will train all classifiers from scratch. For 50k cells this takes up to an hour of computing time using a GPU.
  • inference will use pretrained classifiers to annotate query as well as reference cells and construct a joint embedding using all integration methods from above. For 50k cells this takes, in our experience, up to half an hour of computing time using a GPU.
  • fast will use only methods with pretrained classifiers to annotate only query cells. For 50k cells this takes 5 minutes without a GPU (without UMAP embedding).
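
The trade-off between the three modes above can be summarized as a small decision helper. This is only a sketch of the reasoning in the list: `choose_mode` and its parameters are hypothetical and not part of popV; only the returned mode names come from the description above.

```python
def choose_mode(has_pretrained_models: bool, need_joint_embedding: bool) -> str:
    """Pick a prediction mode name (hypothetical helper, not popV's API).

    - "retrain":   no pretrained classifiers are available, so everything
                   is trained from scratch (slowest).
    - "inference": pretrained classifiers exist and a joint embedding of
                   query and reference cells is wanted.
    - "fast":      pretrained classifiers exist and only the query cells
                   need labels (quickest).
    """
    if not has_pretrained_models:
        return "retrain"
    if need_joint_embedding:
        return "inference"
    return "fast"


# Example: pretrained models are available and only query labels are needed.
mode = choose_mode(has_pretrained_models=True, need_joint_embedding=False)
```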

Publications

  • Original popV paper:
    • Published in Nature Genetics, this paper introduces popV and benchmarks it.

Contact
