CellphoneDB Visualization for Cell Communication Analysis

Time flies so fast, another month has passed! In previous issues, I shared the study notes of CellChat, including:

On this basis, I also wrote CellphoneDB notes: A Preliminary Study of CellphoneDB for Cell Communication Analysis (1) , In this post, CellphoneDB is briefly introduced, as well as CellphoneDB's environment configuration, single-sample combat, and finally a visual function cellphoneDB_Dotplot is provided. In addition, cellphoneDB does not seem to support the data of other species such as mice, so I wrote One line of code completes single-cell data human-mouse gene homology conversion , provides a function, one line of code to complete the gene homology conversion of human and mouse, and then use the converted data to go through the cellphoneDB process.

The post in this issue is continued from the above, and mainly solves two problems of CellphoneDB:

  • Since a single-cell project basically contains several/dozens of samples, how to realize the batch processing of CellphoneDB;
  • Enable more and richer visualizations.

1. Batch processing of CellphoneDB

1.1 First output the expression data and phenotype data in the R language environment:

library(Seurat)
library(tidyverse)
library(data.table)
library(dplyr)
library(readr)

#### 1.load data
scRNA = read_rds("./Step4.m.seurat.rds")

#### 2. Batch output Output
out.dir = "Step8.CellphoneDB/"
for(sample in unique(scRNA$sampleID)){
  sp1 <- subset(scRNA,sampleID == sample)
  sp1_counts <- as.data.frame(sp1@assays$RNA@data) %>% 
    rownames_to_column(var = "Gene")
  
  sp1_meta <- data.frame(Cell=rownames(sp1@meta.data), 
                         cell_type=sp1@meta.data$celltype)
  
  fwrite(sp1_counts,file = paste0(out.dir,sample,"_counts.txt"), row.names=F, sep='\t')
  fwrite(sp1_meta, file = paste0(out.dir,sample,"_meta.txt"), row.names=F, sep='\t')
}
copy

After running, a txt file of the expression data and meta phenotype data of each sample will be generated in the out.dir directory.

Then run CellphoneDB in batches under Linux environment:

1.2 First write a general-purpose shell script

##Go to the destination folder
cd Step8.CellphoneDB/

##write a script first
cat >gc_cellphoneDB.bash
file_count=./$1_counts.txt
file_anno=./$1_meta.txt
outdir=./$1_Output

if [ ! -d ${outdir} ]; then
mkdir ${outdir}
fi
cellphonedb method statistical_analysis ${file_anno} ${file_count} --counts-data hgnc_symbol --output-path ${outdir} --threshold 0.01 --threads 10 
##If there are too many cells, you can add a downsampling parameter, and by default only 1/3 of the cells are analyzed
#--subsampling
#--subsampling-log true #For data without log transformation, add this parameter
copy

1.3 Then run in batches

##Activate the environment
conda activate cpdb 

##Run this script in batch
ls *counts.txt | while read id;
do 
id=${id%%_counts.txt}
echo $id
bash gc_cellphoneDB.bash $id 1>${id}.log 2>&1 
done
copy

The results for each sample are in the current directory.

2. Various visualization schemes of CellphoneDB

Except that I used the function cellphoneDB_Dotplot adapted from Mr. Jianming's code to draw a bubble chart in the last issue (see A Preliminary Study of CellphoneDB for Cell Communication Analysis (1) ), we can also use the following scheme for visualization.

2.1 Functions connected to Cellchat

In fact, as long as we have a deep understanding of the function logic of the Cellchat package, we can use the results of cellphoneDB as input data for visualization.

The file needed here is count_network.txt as input data.

library(CellChat)
library(tidyr)
df.net <- read.table("./Step8.CellphoneDB/GSM5573466_Output/Outplot/count_network.txt",
                     header = T,sep = "\t",stringsAsFactors = F)
meta.data <- read.table("./Step8.CellphoneDB/GSM5573466_meta.txt",
                        header = T,sep = "\t",stringsAsFactors = F)

groupSize <- as.numeric(table(meta.data$cell_type))

df.net <- spread(df.net, TARGET, count)
rownames(df.net) <- df.net$SOURCE
df.net <- df.net[, -1]
df.net <- as.matrix(df.net)

netVisual_circle(df.net, vertex.weight = groupSize,
                 weight.scale = T, label.edge= F,
                 title.name = "Number of interactions")
copy

image-20221229204130690

# par(mfrow = c(3,3), xpd=TRUE)
for (i in 1:nrow(df.net)) {
  mat2 <- matrix(0, nrow = nrow(df.net), ncol = ncol(df.net), dimnames = dimnames(df.net))
  mat2[i, ] <- df.net[i, ]
  
  netVisual_circle(mat2, 
                   vertex.weight = groupSize,
                   weight.scale = T, 
                   edge.weight.max = max(df.net),
                   title.name = rownames(df.net)[i],
                   arrow.size=0.5)
  
}
copy

Output the interaction results in batches, only one of them is displayed here:

2.2 R package ktplots

Refer to the notes written by Brother Zhuifeng in the short book: 10X Single Cell && 10X Spatial Transcriptome Analysis - cellphoneDB Visualization Advanced, github: https://github.com/zktuong/ktplots. This package comes with a lot of visualization solutions, here are one or two of them:

#devtools::install_github('zktuong/ktplots', dependencies = TRUE)
library(ktplots)
library(Seurat)
library(SingleCellExperiment)
library(tidyverse)
library(data.table)
library(dplyr)
library(readr)
copy
#### 1 Load data
mypvals <- read.table("./Step8.CellphoneDB/GSM5573466_Output/pvalues.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)
mymeans <- read.table("./Step8.CellphoneDB/GSM5573466_Output/means.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) 
mydecon <- read.table("./Step8.CellphoneDB/GSM5573466_Output/deconvoluted.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F)

#single cell data
seurat.data = read_rds("./Step4.m.seurat.rds")
scRNA =  subset(seurat.data,sampleID == "GSM5573466")
table(scRNA$celltype)
#Seurat to SCE
sce <- Seurat::as.SingleCellExperiment(scRNA)
copy
## Vis 1
p1 <- plot_cpdb3(cell_type1 = 'CD4T|Treg', 
                 cell_type2 = 'EpiC|EpiInt|EpiPit',
                 scdata = sce,
                 idents = 'celltype', # column name where the cell ids are located in the metadata
                 means = mymeans,
                 pvals = mypvals,
                 deconvoluted = mydecon, # new options from here on specific to plot_cpdb3
                 keep_significant_only = TRUE,
                 standard_scale = TRUE,
                 remove_self = TRUE,
                 legend.pos.x = -5,legend.pos.y = 5
)
p1
copy

A little more complicated:

p2 <- plot_cpdb2(cell_type1 = "CD4T|Treg", # same usage style as plot_cpdb
 cell_type2 = 'EpiC|EpiInt|EpiPit',
 idents = 'celltype',
 scdata = sce,
 means = mymeans,
 pvals = mypvals,
 deconvoluted = mydecon, # new options from here on specific to plot_cpdb2
 desiredInteractions = list(c('CD4T', 'Treg'), c('Treg', 'EpiC'), c('Treg', 'EpiInt'), c('Treg', 'EpiPit')),
 interaction_grouping = interaction_annotation,
    edge_group_colors = c("Activating" = "#e15759", "Chemotaxis" = "#59a14f", "Inhibitory" = "#4e79a7", "Intracellular trafficking" = "#9c755f", "DC_development" = "#B07aa1"),
    node_group_colors = c("CD4T" = "#86bc86", "Treg" = "#79706e", "EpiC" = "#ff7f0e", "EpiInt" = "#bcbd22"  ,"EpiPit" = "#17becf"),
    keep_significant_only = TRUE,
    standard_scale = TRUE,
    remove_self = TRUE)
p2
copy

The R package ktplots provides a lot of visualization solutions, see github for details: https://github.com/zktuong/ktplot

- END -

Tags: Cyber Security git Open Source https github

Posted by jcleary on Thu, 05 Jan 2023 19:42:10 +0530