Time flies so fast, another month has passed! In previous issues, I shared the study notes of CellChat, including:
- CellChat cell communication analysis (1)
- CellChat Cell Communication (2) Visualization
- CellChat cell communication analysis (3) Multi-group comparative analysis
On this basis, I also wrote CellphoneDB notes: A Preliminary Study of CellphoneDB for Cell Communication Analysis (1) , In this post, CellphoneDB is briefly introduced, as well as CellphoneDB's environment configuration, single-sample combat, and finally a visual function cellphoneDB_Dotplot is provided. In addition, cellphoneDB does not seem to support the data of other species such as mice, so I wrote One line of code completes single-cell data human-mouse gene homology conversion , provides a function, one line of code to complete the gene homology conversion of human and mouse, and then use the converted data to go through the cellphoneDB process.
The post in this issue is continued from the above, and mainly solves two problems of CellphoneDB:
- Since a single-cell project basically contains several/dozens of samples, how to realize the batch processing of CellphoneDB;
- Enable more and richer visualizations.
1. Batch processing of CellphoneDB
1.1 First output the expression data and phenotype data in the R language environment:
copylibrary(Seurat) library(tidyverse) library(data.table) library(dplyr) library(readr) #### 1.load data scRNA = read_rds("./Step4.m.seurat.rds") #### 2. Batch output Output out.dir = "Step8.CellphoneDB/" for(sample in unique(scRNA$sampleID)){ sp1 <- subset(scRNA,sampleID == sample) sp1_counts <- as.data.frame(sp1@assays$RNA@data) %>% rownames_to_column(var = "Gene") sp1_meta <- data.frame(Cell=rownames(sp1@meta.data), cell_type=sp1@meta.data$celltype) fwrite(sp1_counts,file = paste0(out.dir,sample,"_counts.txt"), row.names=F, sep='\t') fwrite(sp1_meta, file = paste0(out.dir,sample,"_meta.txt"), row.names=F, sep='\t') }
After running, a txt file of the expression data and meta phenotype data of each sample will be generated in the out.dir directory.
Then run CellphoneDB in batches under Linux environment:
1.2 First write a general-purpose shell script
copy##Go to the destination folder cd Step8.CellphoneDB/ ##write a script first cat >gc_cellphoneDB.bash file_count=./$1_counts.txt file_anno=./$1_meta.txt outdir=./$1_Output if [ ! -d ${outdir} ]; then mkdir ${outdir} fi cellphonedb method statistical_analysis ${file_anno} ${file_count} --counts-data hgnc_symbol --output-path ${outdir} --threshold 0.01 --threads 10 ##If there are too many cells, you can add a downsampling parameter, and by default only 1/3 of the cells are analyzed #--subsampling #--subsampling-log true #For data without log transformation, add this parameter
1.3 Then run in batches
copy##Activate the environment conda activate cpdb ##Run this script in batch ls *counts.txt | while read id; do id=${id%%_counts.txt} echo $id bash gc_cellphoneDB.bash $id 1>${id}.log 2>&1 done
The results for each sample are in the current directory.
2. Various visualization schemes of CellphoneDB
Except that I used the function cellphoneDB_Dotplot adapted from Mr. Jianming's code to draw a bubble chart in the last issue (see A Preliminary Study of CellphoneDB for Cell Communication Analysis (1) ), we can also use the following scheme for visualization.
2.1 Functions connected to Cellchat
In fact, as long as we have a deep understanding of the function logic of the Cellchat package, we can use the results of cellphoneDB as input data for visualization.
The file needed here is count_network.txt as input data.
copylibrary(CellChat) library(tidyr) df.net <- read.table("./Step8.CellphoneDB/GSM5573466_Output/Outplot/count_network.txt", header = T,sep = "\t",stringsAsFactors = F) meta.data <- read.table("./Step8.CellphoneDB/GSM5573466_meta.txt", header = T,sep = "\t",stringsAsFactors = F) groupSize <- as.numeric(table(meta.data$cell_type)) df.net <- spread(df.net, TARGET, count) rownames(df.net) <- df.net$SOURCE df.net <- df.net[, -1] df.net <- as.matrix(df.net) netVisual_circle(df.net, vertex.weight = groupSize, weight.scale = T, label.edge= F, title.name = "Number of interactions")
image-20221229204130690
copy# par(mfrow = c(3,3), xpd=TRUE) for (i in 1:nrow(df.net)) { mat2 <- matrix(0, nrow = nrow(df.net), ncol = ncol(df.net), dimnames = dimnames(df.net)) mat2[i, ] <- df.net[i, ] netVisual_circle(mat2, vertex.weight = groupSize, weight.scale = T, edge.weight.max = max(df.net), title.name = rownames(df.net)[i], arrow.size=0.5) }
Output the interaction results in batches, only one of them is displayed here:
2.2 R package ktplots
Refer to the notes written by Brother Zhuifeng in the short book: 10X Single Cell && 10X Spatial Transcriptome Analysis - cellphoneDB Visualization Advanced, github: https://github.com/zktuong/ktplots. This package comes with a lot of visualization solutions, here are one or two of them:
copy#devtools::install_github('zktuong/ktplots', dependencies = TRUE) library(ktplots) library(Seurat) library(SingleCellExperiment) library(tidyverse) library(data.table) library(dplyr) library(readr)
copy#### 1 Load data mypvals <- read.table("./Step8.CellphoneDB/GSM5573466_Output/pvalues.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) mymeans <- read.table("./Step8.CellphoneDB/GSM5573466_Output/means.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) mydecon <- read.table("./Step8.CellphoneDB/GSM5573466_Output/deconvoluted.txt",header = T,sep = "\t",stringsAsFactors = F,check.names = F) #single cell data seurat.data = read_rds("./Step4.m.seurat.rds") scRNA = subset(seurat.data,sampleID == "GSM5573466") table(scRNA$celltype) #Seurat to SCE sce <- Seurat::as.SingleCellExperiment(scRNA)
copy## Vis 1 p1 <- plot_cpdb3(cell_type1 = 'CD4T|Treg', cell_type2 = 'EpiC|EpiInt|EpiPit', scdata = sce, idents = 'celltype', # column name where the cell ids are located in the metadata means = mymeans, pvals = mypvals, deconvoluted = mydecon, # new options from here on specific to plot_cpdb3 keep_significant_only = TRUE, standard_scale = TRUE, remove_self = TRUE, legend.pos.x = -5,legend.pos.y = 5 ) p1
A little more complicated:
copyp2 <- plot_cpdb2(cell_type1 = "CD4T|Treg", # same usage style as plot_cpdb cell_type2 = 'EpiC|EpiInt|EpiPit', idents = 'celltype', scdata = sce, means = mymeans, pvals = mypvals, deconvoluted = mydecon, # new options from here on specific to plot_cpdb2 desiredInteractions = list(c('CD4T', 'Treg'), c('Treg', 'EpiC'), c('Treg', 'EpiInt'), c('Treg', 'EpiPit')), interaction_grouping = interaction_annotation, edge_group_colors = c("Activating" = "#e15759", "Chemotaxis" = "#59a14f", "Inhibitory" = "#4e79a7", "Intracellular trafficking" = "#9c755f", "DC_development" = "#B07aa1"), node_group_colors = c("CD4T" = "#86bc86", "Treg" = "#79706e", "EpiC" = "#ff7f0e", "EpiInt" = "#bcbd22" ,"EpiPit" = "#17becf"), keep_significant_only = TRUE, standard_scale = TRUE, remove_self = TRUE) p2
The R package ktplots provides a lot of visualization solutions, see github for details: https://github.com/zktuong/ktplot
- END -