The following are supporting notes for the live course "Introduction to R Programming".
1. Data Types and Vectors
1. Data Type
1.1 Checking the data type: class()
1.2 Pressing the Tab key for automatic completion
1.3 Testing and converting data types
(1) The is.* family of functions tests a type; the return value is TRUE or FALSE
is.numeric("123")   # FALSE: "123" is a character string
is.character("a")   # TRUE
is.logical(TRUE)    # TRUE
(2) The as.* family of functions converts between data types
as.matrix()
as.numeric()
as.character()
as.logical()
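A minimal sketch of testing and then converting a type, using a made-up character vector (the variable names are only for illustration). Values that cannot be converted become NA:

```r
# Toy input: numbers stored as text, plus one value that is not a number
x <- c("1", "2", "three")

class(x)        # "character"
is.numeric(x)   # FALSE

# "three" cannot be parsed, so it becomes NA (R would also emit a warning)
nums <- suppressWarnings(as.numeric(x))
is.numeric(nums)   # TRUE

# as.logical() only understands spellings such as "TRUE", "T", "FALSE", "F"
flags <- as.logical(c("TRUE", "yes", "F"))

# Numbers can always be turned back into text
chars <- as.character(1:3)
```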
2. Vector
(1) Generating vectors: rep() for repeated values, seq() for regular sequences, and rnorm() for normally distributed random numbers
rep("sample",6)
[1] "sample" "sample" "sample" "sample" "sample" "sample"
seq(4,30,3)
[1]  4  7 10 13 16 19 22 25 28
rnorm(3)
[1]  0.1511196  1.1105814 -0.8626667
(2) Combination
paste0(rep("x",3),1:3) # or paste0("x",1:3)
[1] "x1" "x2" "x3"
paste0("sample",seq(1,5,2))
[1] "sample1" "sample3" "sample5"
The difference between paste() and paste0(): paste() joins the corresponding elements of two or more vectors with the separator given by sep=, which defaults to a single space. paste0(v1, v2) is equivalent to paste(v1, v2, sep = ""): its separator cannot be set and is always the empty string.
paste("x",1:3,sep = "~")
[1] "x~1" "x~2" "x~3"
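The separator behaviour can be checked side by side. This is a small sketch; the collapse argument is not covered in the notes and is shown only as an extra option of paste():

```r
with_space <- paste("x", 1:3)              # default sep = " "
no_space   <- paste0("x", 1:3)             # no separator at all
same       <- paste("x", 1:3, sep = "")    # identical to paste0()

# collapse joins the resulting elements into one single string
one_string <- paste("x", 1:3, sep = "_", collapse = "+")
```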
(3) Operations on two vectors
Highlights:
x %in% y      # For each element of x: is it in y?
x[x %in% y]   # Note the order of x and y
x == y        # Is x equal to y at the corresponding position?
- intersection, union, difference
x = c(1,5,3,4)
y = c(5,12,24,3)
intersect(x,y)
[1] 5 3
union(x,y)
[1]  1  5  3  4 12 24
setdiff(x,y)
[1] 1 4
setdiff(y,x)
[1] 12 24
- When the two vectors have different lengths, the shorter one is recycled (repeated) to match the longer one

- sort(x) is equal to x[order(x)]
- match(y,x) returns, for each element of y, its position in x, so x[match(y,x)] reorders x using y as the template. Mnemonic: the vector outside the brackets goes second inside match().
x = c("A","B","C","D","E")
y = c("E","C","B","A")
match(y,x)
x[match(y,x)]
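The three points above (recycling, sort() versus order(), and match()) can be verified on small vectors. A minimal sketch; the names a, b, ids and template are illustrative only:

```r
# Recycling: b is repeated to the length of a, i.e. 10 20 10 20
a <- c(1, 5, 3, 4)
b <- c(10, 20)
recycled <- a + b

# order() returns the positions that would sort the vector
ord  <- order(a)
same <- identical(sort(a), a[ord])

# match(template, ids): position in ids of each element of template
ids      <- c("A", "B", "C", "D", "E")
template <- c("E", "C", "B", "A")
pos       <- match(template, ids)
reordered <- ids[pos]    # ids rearranged to follow the template
```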
2. Data frames, matrices and lists
1. Difference
(1) Vector: one-dimensional; matrix: two-dimensional, only one data type allowed; data.frame: two-dimensional, each column allows only one data type
2. Practice questions
(1) Find the median of the values in the first column of c1, then filter the rows of c1 whose last column value is a or c
c1 <- read.csv("./exercise.csv")
# Find the median of the values in the first column of c1
median(c1$Petal.Length)
# or
median(c1[,1])
# Keep the rows of c1 whose last column value is a or c
c1[c1$Species %in% c("c","a"),]
# or
c1[c1$Species == "a" | c1$Species == "c",]
# A common mistake:
c1[c1$Species == c("c","a"),]
# The two vectors differ in length, so c("c","a") is recycled and compared position by position instead of being tested for membership
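The difference between %in% and == is easiest to see on a tiny data frame. This is a sketch with made-up data standing in for exercise.csv, whose contents are not shown in the notes:

```r
# Toy stand-in for c1; the values are invented for illustration
df <- data.frame(value   = c(1.5, 2.0, 3.5, 4.0),
                 Species = c("a", "b", "c", "a"),
                 stringsAsFactors = FALSE)

by_in <- df$Species %in% c("c", "a")   # membership test, one answer per row
by_eq <- df$Species == c("c", "a")     # c("c","a") recycled to c, a, c, a

keep  <- df[by_in, ]   # all rows whose Species is a or c
wrong <- df[by_eq, ]   # silently misses the first row
```

No warning is raised here because the longer length (4) is a multiple of the shorter one (2), which makes this mistake easy to overlook.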
(2) Modify row and column names
# Change row and column names
rownames(df) <- c("r1","r2","r3","r4")
# Modify only the name of one row/column
colnames(df)[2] = "CHANGE"
(3) Joining two data frames
merge(test1,test2,by="name")
merge(test1,test3,by.x = "name",by.y = "NAME")
(4) Practice
1. Count which values appear in the last column of the built-in data iris, and how many times each value is repeated
2. Extract the first 5 rows and the first 4 columns of the built-in data iris, convert them to a matrix, and assign them to a
3. Change the row names of a to flower1, flower2...flower5
table(iris[,ncol(iris)])
a = as.matrix(iris[1:5,1:4])
rownames(a) = paste0("flower",1:5)
# or
rownames(a) = paste0("flower",1:nrow(a))
(5) Use of match() function

## Use the column names of y as a template to reorder x, then assign the matching ID column of x as the column names of y: match() function
# match(colnames(y),x$file_name)
# x[match(colnames(y),x$file_name),]
# x$ID[match(colnames(y),x$file_name)]
colnames(y) = x$ID[match(colnames(y),x$file_name)]
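The renaming step can be traced on toy objects. In this sketch the contents of x and y are invented; only the column names file_name and ID come from the notes:

```r
# x maps file names to sample IDs, in a different order than the columns of y
x <- data.frame(file_name = c("f3.txt", "f1.txt", "f2.txt"),
                ID        = c("S3", "S1", "S2"),
                stringsAsFactors = FALSE)

# y is a small matrix whose columns are named after the files
y <- matrix(1:6, nrow = 2,
            dimnames = list(c("g1", "g2"), c("f1.txt", "f2.txt", "f3.txt")))

# For each column of y, find the row of x that describes it
idx <- match(colnames(y), x$file_name)

# Replace the file names with the matching IDs
colnames(y) <- x$ID[idx]
```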
3. Several methods of installing and loading packages
# Method one:
install.packages("tidyr")
install.packages('BiocManager')
# Method two:
BiocManager::install("ggplot2")
# Method three:
devtools::install_github("jmzeng1314/idmap1") # The parentheses contain the author's username and the package name
# Method four: install only if the package cannot be loaded
if(!require(stringr))install.packages("stringr")
Recommended mirrors:
# Tsinghua mirror
# http://mirrors.tuna.tsinghua.edu.cn/CRAN/
# http://mirrors.tuna.tsinghua.edu.cn/bioconductor/
# University of Science and Technology of China mirror
# http://mirrors.ustc.edu.cn/CRAN/
# http://mirrors.ustc.edu.cn/bioc/
Symbols in R
[ ] : subset of a vector, data frame or matrix
[[ ]] : subset of a list
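A short sketch of the two kinds of brackets on made-up objects. Note that single brackets also work on a list, but they return a smaller list rather than the element itself:

```r
v <- c(a = 10, b = 20, c = 30)
second <- v[2]                      # still a (named) vector of length 1

m <- matrix(1:6, nrow = 2)
cell  <- m[2, 3]                    # row 2, column 3
first <- m[, 1]                     # whole first column

df <- data.frame(id = 1:3, score = c(90, 85, 70))
high <- df[df$score > 80, "id"]     # rows by condition, column by name

lst <- list(nums = 1:3, label = "demo")
as_list <- lst["nums"]              # a list containing one element
as_elem <- lst[["nums"]]            # the integer vector itself
```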
4. Reading and writing data
txt and csv
read.csv(): generally reads csv format
read.table(): generally reads txt format
ex1 <- read.table("./ex1.txt", header = T)
ex2 <- read.csv("./ex2.csv", row.names = 1) # The first column holds the row names
soft <- read.table("./soft.txt",
                   sep = "\t",     # tab-separated
                   fill = TRUE,    # pad rows of unequal length with blank fields
                   header = TRUE
                   )
write.table(ex1,file = "./ex1.txt")
write.csv(ex2,file = "./ex2.csv")
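Why row.names = 1 matters can be seen with a round trip through a temporary file. A minimal sketch with invented data; tempfile() is used here only so that the example does not touch the working directory:

```r
# A toy data frame with meaningful row names
ex2 <- data.frame(score = c(90, 85), row.names = c("s1", "s2"))

path <- tempfile(fileext = ".csv")
write.csv(ex2, file = path)            # row names are written as the first column

raw  <- read.csv(path)                 # row names come back as an ordinary column
back <- read.csv(path, row.names = 1)  # first column restored as row names
```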
Rdata
save() ---> save R objects to a file
load() ---> load the saved objects back into the environment
save(ex1,file = "./ex1.Rdata")
load("./ex1.Rdata")
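Unlike read.csv(), load() restores an object under its original name, so its result does not need to be assigned. A small sketch with a made-up ex1 and a temporary file path:

```r
ex1 <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

path <- tempfile(fileext = ".Rdata")
save(ex1, file = path)

rm(ex1)                       # remove the object to prove that load() restores it
loaded_names <- load(path)    # load() returns the names of the restored objects
```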
Read in data, ID conversion
Case:

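soft <- read.csv("./soft.csv",row.names = 1)
head(soft)
# Look up the gene name for each row ID of the expression matrix
exp$symbol <- soft$GeneName[match(rownames(exp),soft$ID)]
# Keep only the first row for each gene name
exp <- exp[!duplicated(exp$symbol),]
# Drop rows whose name starts with ENST
exp <- exp[!grepl("^ENST",exp$symbol),]
# Use the gene names as row names, then drop the helper column
rownames(exp) <- exp$symbol
exp = exp[,-ncol(exp)]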
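The same steps can be followed on a toy expression table. Every value below is invented for illustration; only the column names ID and GeneName and the ENST filter come from the case above:

```r
# Toy expression data: rows are probe IDs, columns are samples
exp <- data.frame(s1 = c(5, 3, 8, 1), s2 = c(6, 2, 7, 0),
                  row.names = c("id1", "id2", "id3", "id4"))

# Toy annotation table, deliberately in a different order
soft <- data.frame(ID       = c("id4", "id3", "id2", "id1"),
                   GeneName = c("ENST0001", "TP53", "GAPDH", "TP53"),
                   stringsAsFactors = FALSE)

exp$symbol <- soft$GeneName[match(rownames(exp), soft$ID)]
exp <- exp[!duplicated(exp$symbol), ]       # id3 repeats TP53 and is dropped
exp <- exp[!grepl("^ENST", exp$symbol), ]   # id4 maps to an ENST name and is dropped
rownames(exp) <- exp$symbol
exp <- exp[, -ncol(exp)]
```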
5. Drawing
(1) Drawing
(1) Plotting: ggplot2, ggpubr, base graphics
(2) Combining plots: patchwork package, mfrow in par(), grid.arrange, cowplot
(3) Export:
# Saving and exporting images
# 1. ggplot2 series
ggsave(p,filename = "")
# 2. General method in three steps: open the device, draw, close the device
pdf("test.pdf")   # sets the output format and file name
dev.off()         # close the graphics device
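The three-step method puts the plotting code between the call that opens the device and dev.off(). A minimal sketch using a base plot of iris and a temporary file name:

```r
out <- tempfile(fileext = ".pdf")

pdf(out)                                        # step 1: open the device
plot(iris$Sepal.Length, iris$Petal.Length)      # step 2: draw
invisible(dev.off())                            # step 3: close the device

saved <- file.exists(out)
```

The file is only complete once dev.off() has been called, so forgetting the last step is a common reason for an empty or unreadable PDF.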
(2) ggplot2 syntax
- ggplot2 special syntax: column names without quotes
- Attribute settings
Mapping: assign colours according to the content of a column of the data
Manual setting: set the whole plot to one or N colours, regardless of the data


- Hands-on practice
#1. Entry-level plotting template: data, horizontal and vertical coordinates
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length))
#2. Attribute settings (colour, size, transparency, point shape, line type, etc.)
ggplot(data = iris) +
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length),
             size = 5,     # point size 5 mm
             alpha = 0.5,  # transparency 50%
             shape = 8)    # point shape
## Specifying particular colours for a mapping
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species))+
  scale_color_manual(values = c("blue","grey","red"))
## Distinguishing the two attributes color and fill
### 1 Purely hollow and purely solid shapes both take their colour from color
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 17) # shape 17, a solid example
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 2) # shape 2, a hollow example
### 2 Shapes with both a border and an interior need both color and fill
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 24,
             fill = "black") # shape 24, a two-colour example
#3. Facets
ggplot(data = iris) +
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  facet_wrap(~ Species)
# Double facet
dat = iris
dat$Group = sample(letters[1:5],150,replace = T)
ggplot(data = dat) +
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  facet_grid(Group ~ Species)
#4. Geometric objects
# Local and global settings
ggplot(data = iris) +
  geom_smooth(mapping = aes(x = Sepal.Length,
                            y = Petal.Length))+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length))
ggplot(data = iris,mapping = aes(x = Sepal.Length, y = Petal.Length))+
  geom_smooth()+
  geom_point()
#5. Statistical transformations: usage scenarios
#5.1 No statistics, the data is plotted directly
fre = as.data.frame(table(diamonds$cut))
fre
ggplot(data = fre) +
  geom_bar(mapping = aes(x = Var1, y = Freq), stat = "identity")
#5.2 count changed to prop (newer ggplot2 versions prefer after_stat(prop) over ..prop..)
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
#6. Position adjustments
# 6.1 Jittered dot plot
ggplot(data = iris,mapping = aes(x = Species,
                                 y = Sepal.Width,
                                 fill = Species)) +
  geom_boxplot()+
  geom_point()
ggplot(data = iris,mapping = aes(x = Species,
                                 y = Sepal.Width,
                                 fill = Species)) +
  geom_boxplot()+
  geom_jitter()
# 6.2 Stacked bar chart
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut,fill=clarity))
# 6.3 Side-by-side bar chart
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
#7. Coordinate systems
# Flip: coord_flip()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()
# Polar coordinate system: coord_polar()
bar <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar
bar + coord_flip()
bar + coord_polar()
# Practice question: violin plot + box plot
ggplot(iris,mapping = aes(x = Sepal.Width,y = Species)) +
  geom_violin(aes(fill = Species)) +
  geom_boxplot()+
  geom_jitter(aes(shape = Species))
[Figures: single facet; double facet; statistical transformation; stacked bar chart; side-by-side bar chart; violin plot + box plot]

(3) ggpubr.R syntax
# The STHDA website has many example plots made with ggpubr
library(ggpubr)
ggscatter(iris,x="Sepal.Length",
          y="Petal.Length",
          color="Species")
p <- ggboxplot(iris, x = "Species",
               y = "Sepal.Length",
               color = "Species",
               shape = "Species",
               add = "jitter")
p
my_comparisons <- list( c("setosa", "versicolor"),
                        c("setosa", "virginica"),
                        c("versicolor", "virginica") )
p + stat_compare_means(comparisons = my_comparisons)+ # add p-values for the pairwise comparisons
  stat_compare_means(label.y = 9)                     # add the global p-value at y = 9
(4) Saving pictures
# Saving and exporting pictures
# 1. ggplot2 series
ggsave(p,filename = "")
# 2. General method in three steps
pdf("test.pdf")   # sets the output format and file name
# … …
dev.off()         # end

(5) Combining plots
# patchwork package
# violin_plot() appears to be a user-defined helper; it is not defined in these notes
p1.1 <- violin_plot(dat = dat,gene = dat$CCL5)
p1.2 <- violin_plot(dat = dat,gene = dat$MMP9)
p1.4 <- violin_plot(dat = dat,gene = dat$RAC2)
p1.5 <- violin_plot(dat = dat,gene = dat$CORO1A)
p1.6 <- violin_plot(dat = dat,gene = dat$CCL2)
library(patchwork)
p1 <- (p1.1 | p1.2 ) /      # split into two rows
  (p1.4 | p1.5 | p1.6)
library(ggplot2)
ggsave("./vertify/GSE100927_vertify.pdf", plot = p1, width = 15, height = 18)
6. Topics
1. Sorting the data frame
- order() or the arrange() function in the tidyverse
# order() can be used to sort a vector or a data frame
sort(test$Sepal.Length)
test$Sepal.Length[order(test$Sepal.Length)]
test[order(test$Sepal.Length),]
test[order(test$Sepal.Length,decreasing = T),]
# arrange(): more flexible sorting
library(tidyverse) # this package must be loaded first
arrange(test, Sepal.Length)
arrange(test, desc(Sepal.Length))
# Sort by Sepal.Width in descending order first; rows with the same Sepal.Width are then sorted by Sepal.Length
arrange(test, desc(Sepal.Width),Sepal.Length)
- mutate, select, filter, rename in the dplyr package
mutate(): add a column; rename(): rename a column
select(): choose columns; filter(): choose rows
Pipe symbol %>%: Ctrl + Shift + M
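The four verbs chain naturally with the pipe. A minimal sketch on the built-in iris data, assuming dplyr is installed; the new column names are made up for illustration:

```r
suppressPackageStartupMessages(library(dplyr))

res <- iris %>%
  filter(Species == "setosa") %>%                 # keep rows
  select(Sepal.Length, Species) %>%               # keep columns
  mutate(Sepal.Length.mm = Sepal.Length * 10) %>% # add a column
  rename(length_cm = Sepal.Length)                # new name = old name
```

Each step receives the result of the previous one as its first argument, which is why the data frame is only named once at the top.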
2. Draw a boxplot of the expression matrix
The goal is to draw the boxplot shown in the figure below from an expression matrix like the one shown. This will not work unless the table is reshaped first.

To turn the wide table into a long table, the conversion starts as follows
library(tidyr)
library(tibble)
library(dplyr)
dat = t(exp) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  mutate(group = group_list)
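The remaining reshaping step is not preserved in these notes. One way it could be done is with tidyr's pivot_longer(), sketched here on a made-up 2 x 3 matrix; the column names sample, gene and expression are assumptions, as is the choice of pivot_longer() itself:

```r
suppressPackageStartupMessages({
  library(tidyr)
  library(tibble)
  library(dplyr)
})

# Toy expression matrix: genes in rows, samples in columns
exp <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
              dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))
group_list <- c("control", "control", "treat")

# Transpose so that samples become rows, then attach the group labels
dat <- t(exp) %>%
  as.data.frame() %>%
  rownames_to_column("sample") %>%
  mutate(group = group_list)

# One row per sample-gene pair, which is the shape ggplot2 expects
long <- pivot_longer(dat, cols = c(gene1, gene2),
                     names_to = "gene", values_to = "expression")
```

With the data in this shape, a call such as ggplot(long, aes(x = sample, y = expression)) + geom_boxplot() has one column per aesthetic to work with.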



3. Joins
- inner_join: intersection
- left_join: left join
- right_join: right join
- full_join: full join
library(dplyr)
inner_join(test1,test2,by="name")
inner_join(test1,test2,by=c("name" = "Name"))
right_join(test1,test2,by="name")
full_join(test1,test2,by="name")
semi_join(test1,test2,by="name")
anti_join(test1,test2,by="name")
The base R merge() function can perform the same joins.
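As a sketch of how merge() covers the four join types, here are two made-up data frames sharing a name column. The all, all.x and all.y arguments decide which unmatched rows are kept:

```r
test1 <- data.frame(name = c("a", "b", "c"), x = 1:3,
                    stringsAsFactors = FALSE)
test2 <- data.frame(name = c("b", "c", "d"), y = c(10, 20, 30),
                    stringsAsFactors = FALSE)

inner <- merge(test1, test2, by = "name")                # like inner_join
left  <- merge(test1, test2, by = "name", all.x = TRUE)  # like left_join
right <- merge(test1, test2, by = "name", all.y = TRUE)  # like right_join
full  <- merge(test1, test2, by = "name", all = TRUE)    # like full_join
```

Rows without a partner are filled with NA, and merge() sorts the result by the key column by default.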

4. String functions: load the stringr package





library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
###1. Detect string length
str_length(x)   # number of characters
length(x)       # number of elements in the vector
###2. String splitting
str_split(x," ")
x2 = str_split(x," ")[[1]];x2
y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ")
str_split(y," ",simplify = T)
###3. Extract a substring by position
str_sub(x,5,9)
###4. Character detection
str_detect(x2,"h")
###5. String replacement
str_replace(x2,"o","A")       # first match in each element
str_replace_all(x2,"o","A")   # every match
###6. Character deletion
str_remove(x," ")             # first match only
str_remove_all(x," ")         # every match