Shengxin Skill Tree (Bioinformatics Skill Tree) R language course: live session notes

The following notes accompany the live course "Introduction to R Language Programming".

1. Data Types and Vectors

1. Data Type

1.1 Checking the data type: class()
1.2 Press the Tab key for auto-completion
1.3 Checking and converting data types

(1) The is.*() family of functions tests the type; the return value is TRUE or FALSE

is.numeric("123")
is.character("a")
is.logical(TRUE)

(2) The as.*() family of functions converts between data types

as.matrix()  
as.numeric()
as.character()
as.logical()
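As a quick illustration of how the two families work together, here is a minimal sketch. The variable names x and y are placeholders chosen for this example only.

```r
x <- "123"
class(x)            # "character"
is.numeric(x)       # FALSE: a quoted number is still a character string
y <- as.numeric(x)  # convert the string to a number
class(y)            # "numeric"
is.numeric(y)       # TRUE
as.logical("TRUE")  # TRUE
# A string that cannot be parsed becomes NA; without suppressWarnings()
# R also prints a coercion warning
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)          # TRUE
```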

2. Vector

(1) Use rep() for repeated values, seq() for regular sequences, and rnorm() for random numbers drawn from a normal distribution

rep("sample",6)
[1] "sample" "sample" "sample" "sample" "sample" "sample"

seq(4,30,3)
[1]  4  7 10 13 16 19 22 25 28

rnorm(3)
[1]  0.1511196  1.1105814 -0.8626667

(2) Combining vectors with paste0()
paste0(rep("x",3),1:3) # or paste0("x",1:3)
[1] "x1" "x2" "x3"

paste0("sample",seq(1,5,2))
[1] "sample1" "sample3" "sample5"

The difference between paste() and paste0(): paste() joins the corresponding elements of two or more vectors with the separator given by sep, which defaults to a single space, as in paste(v1, v2, sep = " "). paste0() has no sep argument; it always joins with an empty string, like paste(v1, v2, sep = "").

paste("x",1:3,sep  = "~")
[1] "x~1" "x~2" "x~3"
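The following sketch puts the variants side by side so the effect of sep is easy to compare. The result names (with_space and so on) are made up for this illustration.

```r
with_space <- paste("x", 1:3)             # default sep is a single space
no_sep     <- paste("x", 1:3, sep = "")   # empty separator
shortcut   <- paste0("x", 1:3)            # same result as sep = ""
with_tilde <- paste("x", 1:3, sep = "~")  # custom separator

with_space   # "x 1" "x 2" "x 3"
no_sep       # "x1" "x2" "x3"
shortcut     # "x1" "x2" "x3"
with_tilde   # "x~1" "x~2" "x~3"
```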

(3) Operations on two vectors

Highlights:

x %in% y # Is every element of x in y

x[x %in% y] #Note the x,y order
x == y # Is x equal to y at the corresponding position?
  1. Intersection, union, difference

x = c(1,5,3,4)
y = c(5,12,24,3)
intersect(x,y)
[1] 5 3
union(x,y)
[1]  1  5  3  4 12 24
setdiff(x,y)
[1] 1 4
setdiff(y,x)
[1] 12 24

  2. When the two vectors differ in length, the shorter one is recycled (repeated) to match the longer one
  3. sort(x) is equivalent to x[order(x)]
  4. match(x,y) and x[match(y,x)]: the vector outside the brackets is the second argument of match(); x[match(y,x)] uses y as a template to reorder x
x = c("A","B","C","D","E")
y = c("E","C","B","A")
match(y,x)
x[match(y,x)]
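The points above can be sketched in one short example. It reuses the x and y from the notes; z and idx are extra names introduced only for this illustration.

```r
x <- c("A", "B", "C", "D", "E")
y <- c("E", "C", "B", "A")

x %in% y            # TRUE TRUE TRUE FALSE TRUE: one value per element of x
x[x %in% y]         # "A" "B" "C" "E": elements of x found in y, in x's order
idx <- match(y, x)  # 5 3 2 1: position in x of each element of y
x[idx]              # "E" "C" "B" "A": x reordered to follow y

# Recycling: the shorter vector is repeated to the length of the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# sort(z) gives the same result as z[order(z)]
z <- c(3, 1, 2)
identical(sort(z), z[order(z)])         # TRUE
```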

2. Data frames, matrices and lists

1. Difference

(1) vector: one-dimensional; matrix: two-dimensional, only one data type allowed; data.frame: two-dimensional, each column allows only one data type, but different columns may have different types
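A small sketch of that difference, using made-up objects v, m and df: mixing numbers and strings in a matrix coerces everything to character, while a data frame keeps one type per column.

```r
v <- c(1, 2, 3)                                  # vector: no dimensions
m <- cbind(id = 1:3, name = c("a", "b", "c"))    # matrix: all values become character
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # data frame: one type per column

dim(v)            # NULL
class(m[, "id"])  # "character"
class(df$id)      # "integer"
class(df$name)    # "character" in R >= 4.0 (a factor by default in older versions)
```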

2. Practice questions

(1) Find the median of the first column of c1; filter the rows of c1 where the last column is a or c

c1 <- read.csv("./exercise.csv")
median(c1$Petal.Length)  # median of the first column of c1
# or median(c1[,1])
c1[c1$Species %in% c("c","a"),] # rows of c1 whose last column is a or c
# or c1[c1$Species == "a"| c1$Species == "c",]
# Incorrect form:
c1[c1$Species == c("c","a"),] # the two vectors differ in length, so c("c","a") is recycled and compared position by position; matching rows are missed
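To see why the == form goes wrong, here is a minimal sketch. exercise.csv is not included in these notes, so the data frame below is a hypothetical stand-in with the same column names.

```r
# Hypothetical stand-in for exercise.csv
c1 <- data.frame(Petal.Length = c(1.4, 4.7, 6.0, 1.3, 4.5, 5.1),
                 Species = c("a", "b", "c", "a", "c", "b"))
wanted <- c("c", "a")

by_in <- c1$Species %in% wanted  # TRUE FALSE TRUE TRUE TRUE FALSE
by_eq <- c1$Species == wanted    # FALSE FALSE TRUE TRUE TRUE FALSE
# == compares against the recycled pattern c, a, c, a, c, a,
# so row 1 (Species "a" compared with "c") is missed

nrow(c1[by_in, ])  # 4
nrow(c1[by_eq, ])  # 3
```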

(2) Modify row and column names

#Change row and column names
rownames(df) <- c("r1","r2","r3","r4")
#Modify only the name of a row/column
colnames(df)[2]="CHANGE"

(3) Joining two data frames

merge(test1,test2,by="name")
merge(test1,test3,by.x = "name",by.y = "NAME")
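The notes do not show test1, test2 and test3, so the sketch below builds small hypothetical versions to show what the two merge() calls return. The all = TRUE call is an extra illustration, not part of the original notes.

```r
# Hypothetical example tables
test1 <- data.frame(name = c("jimmy", "nicker", "tony"), score = c(150, 140, 152))
test2 <- data.frame(name = c("jimmy", "tony", "anna"), city = c("A", "B", "C"))
test3 <- data.frame(NAME = c("jimmy", "nicker"), age = c(30, 25))

m1 <- merge(test1, test2, by = "name")                   # only names present in both
m2 <- merge(test1, test3, by.x = "name", by.y = "NAME")  # key columns named differently
m3 <- merge(test1, test2, by = "name", all = TRUE)       # keep unmatched rows, filled with NA

m1$name   # "jimmy" "tony"
nrow(m3)  # 4
```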

(4) Practice

1. Count which values occur in the last column of the built-in dataset iris, and how many times each occurs.
2. Extract the first 5 rows and first 4 columns of iris, convert them to a matrix, and assign the result to a.
3. Change the row names of a to flower1, flower2, ..., flower5.

table(iris[,ncol(iris)])
a = as.matrix(iris[1:5,1:4])
rownames(a) = paste0("flower",1:5) # or
rownames(a) = paste0("flower",1:nrow(a))

(5) Use of match() function

## Use y as a template to reorder x, then use the ID column of x as the column names of y: match() function
# match(colnames(y),x$file_name)
# x[match(colnames(y),x$file_name),] 
# x$ID[match(colnames(y),x$file_name)]
colnames(y) = x$ID[match(colnames(y),x$file_name)]
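A self-contained sketch of this renaming step follows. The tables x and y are hypothetical: x maps file names to sample IDs, and y has one column per file in a different order.

```r
# Hypothetical annotation table: one row per sample file
x <- data.frame(file_name = c("f1.txt", "f2.txt", "f3.txt"),
                ID = c("S1", "S2", "S3"))
# Hypothetical data whose columns are named by file, in a different order
y <- data.frame(f3.txt = 1:2, f1.txt = 3:4, f2.txt = 5:6)

idx <- match(colnames(y), x$file_name)  # 3 1 2: row of x for each column of y
colnames(y) <- x$ID[idx]
colnames(y)                             # "S3" "S1" "S2"
```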

3. Several ways to install packages

# Method one:
install.packages("tidyr")
install.packages('BiocManager')
# Method Two:
BiocManager::install("ggplot2")
# Method three:
devtools::install_github("jmzeng1314/idmap1") # the argument is "author's GitHub username/package name"
# Method four:
if(!require(stringr))install.packages("stringr")

# Recommended mirrors:
# Tsinghua mirror
# http://mirrors.tuna.tsinghua.edu.cn/CRAN/
# http://mirrors.tuna.tsinghua.edu.cn/bioconductor/

# USTC (University of Science and Technology of China) mirror
# http://mirrors.ustc.edu.cn/CRAN/
# http://mirrors.ustc.edu.cn/bioc/

Symbols in R

[ ] : subsets a vector, data frame, or matrix
[[ ]] : extracts a single element of a list
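A brief sketch of the two bracket types, using made-up objects v, m and l:

```r
v <- c(a = 1, b = 2, c = 3)
v[2]             # the second element, still a (named) vector

m <- matrix(1:6, nrow = 2)
m[1, 2]          # row 1, column 2: 3

iris[1:2, 1:2]   # data frame: rows first, then columns

l <- list(num = 1:3, chr = "a")
l[1]             # single brackets on a list return a list of length 1
l[[1]]           # double brackets return the element itself: 1 2 3
l$num            # same as l[["num"]]
```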

4. Reading and writing data

txt and csv

read.csv(): usually for csv files
read.table(): usually for txt files

ex1 <- read.table("./ex1.txt",
                  header = T)

ex2 <- read.csv("./ex2.csv",
                row.names = 1) # The first column is the row name

soft <- read.table("./soft.txt",
                   sep = "\t", # fields are separated by tabs
                   fill = TRUE, # pad short rows with blank fields
                   header = TRUE
                   )
write.table(ex1,file = "./ex1.txt")
write.csv(ex2,file = "./ex2.csv")
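Since the example files are not part of these notes, the round trip below uses temporary files and a hypothetical data frame df. It is only a sketch of how row names survive writing and reading.

```r
df <- data.frame(height = c(150, 140), row.names = c("jimmy", "nicker"))

# csv: row names are written as the first column, row.names = 1 restores them
csv_file <- tempfile(fileext = ".csv")
write.csv(df, file = csv_file)
back_csv <- read.csv(csv_file, row.names = 1)

# tab-separated txt: the header line has one field fewer than the data lines,
# so read.table() treats the first column as row names
txt_file <- tempfile(fileext = ".txt")
write.table(df, file = txt_file, sep = "\t", quote = FALSE)
back_txt <- read.table(txt_file, header = TRUE, sep = "\t")

rownames(back_csv)  # "jimmy" "nicker"
rownames(back_txt)  # "jimmy" "nicker"
```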

Rdata

save(): save R objects to a file

load(): load them back into the workspace

save(ex1,file = "./ex1.Rdata")
load("./ex1.Rdata")
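One detail worth seeing in action is that load() restores objects under their original names rather than returning them. The sketch below uses a temporary file and a toy ex1 as assumptions.

```r
ex1 <- data.frame(a = 1:3)        # toy object standing in for the real data
f <- tempfile(fileext = ".Rdata")
save(ex1, file = f)               # stores the object together with its name
rm(ex1)                           # remove it from the workspace
loaded <- load(f)                 # recreates ex1; returns the restored names
loaded                            # "ex1"
```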

Read in data, ID conversion

Case:

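soft <- read.csv("./soft.csv",row.names = 1)
head(soft)
# look up the gene symbol for each ID in the row names of exp
exp$symbol <- soft$GeneName[match(rownames(exp),soft$ID)]
# keep only the first row for each symbol
exp <- exp[!duplicated(exp$symbol),]
# drop rows whose symbol starts with "ENST"
exp <- exp[!grepl("^ENST",exp$symbol),]
# use the symbols as row names, then drop the symbol column (the last one)
rownames(exp) <- exp$symbol
exp = exp[,-ncol(exp)]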

5. Plotting

(1) Plotting overview

(1) Plotting: ggplot2, ggpubr, base graphics
(2) Combining plots: patchwork package, mfrow in par(), grid.arrange, cowplot
(3) Export:

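# Saving and exporting plots
# 1. ggplot2 family
ggsave(p,filename = "")

# 2. General approach in three steps: open a device, draw, close the device
# Open the device: this sets the format and file name
pdf("test.pdf")
# (plotting code goes here)
dev.off() # close the graphics device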
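The three-step approach can be sketched with a base graphics plot. The temporary file path and the width and height values are assumptions made for this example.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 6, height = 4)             # 1. open the device (format and file name)
plot(iris$Sepal.Length, iris$Petal.Length)  # 2. draw
dev.off()                                   # 3. close the device so the file is written
file.exists(out)                            # TRUE
```

The same pattern should work with other devices such as png() or jpeg() in place of pdf().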

(2) ggplot2 syntax

  1. ggplot2-specific syntax: column names are written without quotes
  2. Setting attributes
     Mapping: assign colors according to the contents of a column of the data
     Manual setting: set the whole layer to one or N colors, independent of the data
  3. Hands-on examples
library(ggplot2)
#1. Entry-level plotting template: the data plus the x and y mappings
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length))

#2. Attribute settings (color, size, transparency, point shape, line type, etc.)
ggplot(data = iris) + 
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length), 
             size = 5,     # Dot size 5mm
             alpha = 0.5,  # Transparency 50%
             shape = 8)  # point shape


## Map color to a column, then choose the specific colors manually
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species))+
  scale_color_manual(values = c("blue","grey","red"))

## Distinguish the two attributes color and fill
### 1 Shapes that are only hollow or only solid take their color from color
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 17) # shape 17: a solid example

ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 2) # shape 2: a hollow example
### 2 Shapes with both a border and an interior use color for the border and fill for the interior
ggplot(data = iris)+
  geom_point(mapping = aes(x = Sepal.Length,
                           y = Petal.Length,
                           color = Species),
             shape = 24,
             fill = "black") # shape 24: a two-color example

#3. Faceting
ggplot(data = iris) + 
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) + 
  facet_wrap(~ Species) 

# Faceting by two variables
dat = iris
dat$Group = sample(letters[1:5],150,replace = T)
ggplot(data = dat) + 
  geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) + 
  facet_grid(Group ~ Species)

#4. Geometric objects
# Local mappings (set in each geom) versus global mappings (set in ggplot())
ggplot(data = iris) + 
  geom_smooth(mapping = aes(x = Sepal.Length, 
                          y = Petal.Length))+
  geom_point(mapping = aes(x = Sepal.Length, 
                           y = Petal.Length))

ggplot(data = iris,mapping = aes(x = Sepal.Length, y = Petal.Length))+
  geom_smooth()+
  geom_point()

#5. When to use statistical transformations
#5.1 No statistical transformation: precomputed values are plotted as-is
fre = as.data.frame(table(diamonds$cut))
fre
ggplot(data = fre) +
  geom_bar(mapping = aes(x = Var1, y = Freq), stat = "identity")
#5.2 Plot proportions instead of counts (ggplot2 3.4.0 and later prefer after_stat(prop) over ..prop..)
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

#6. Position adjustments
# 6.1 Jittered points
ggplot(data = iris,mapping = aes(x = Species, 
                                 y = Sepal.Width,
                                 fill = Species)) + 
  geom_boxplot()+
  geom_point()

ggplot(data = iris,mapping = aes(x = Species, 
                                 y = Sepal.Width,
                                 fill = Species)) + 
  geom_boxplot()+
  geom_jitter()

# 6.2 Stacked bar chart
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut,fill=clarity))

# 6.3 Side-by-side (dodged) bar chart
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
 
#7. Coordinate systems
# Flip the axes: coord_flip()

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()
# Polar coordinates: coord_polar()
bar <- ggplot(data = diamonds) + 
  geom_bar(
    mapping = aes(x = cut, fill = cut), 
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar
bar + coord_flip()
bar + coord_polar() 

# Practice question: violin plot + box plot
ggplot(iris,mapping = aes(x = Sepal.Width,y = Species)) +
  geom_violin(aes(fill = Species)) +
  geom_boxplot()+
  geom_jitter(aes(shape = Species)) 

[Figures not reproduced: single facet; double facet; statistical transformation; stacked bar chart; side-by-side bar chart; violin plot + boxplot]

(3) ggpubr.R syntax

# The STHDA website has many ggpubr example plots
library(ggpubr)
ggscatter(iris,x="Sepal.Length",
          y="Petal.Length",
          color="Species")

p <- ggboxplot(iris, x = "Species", 
               y = "Sepal.Length",
               color = "Species", 
               shape = "Species",
               add = "jitter")
p
my_comparisons <- list( c("setosa", "versicolor"), 
                        c("setosa", "virginica"), 
                        c("versicolor", "virginica") )
p + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 9)

(4) Saving plots

# Saving and exporting plots
# 1. ggplot2 family
ggsave(p,filename = "")

# 2. General approach in three steps: set the format and file name, plot, close
pdf("test.pdf")
# … …
dev.off() # end

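(5) Combining plots

# patchwork package
# violin_plot() is not defined in these notes; it is presumably a user-written helper that returns a ggplot object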
p1.1 <- violin_plot(dat = dat,gene = dat$CCL5)
p1.2 <- violin_plot(dat = dat,gene = dat$MMP9)
p1.4 <- violin_plot(dat = dat,gene = dat$RAC2)
p1.5 <- violin_plot(dat = dat,gene = dat$CORO1A)
p1.6 <- violin_plot(dat = dat,gene = dat$CCL2)

library(patchwork)
p1 <-  (p1.1 | p1.2 ) /     # two rows: two plots on top, three below
  (p1.4 | p1.5 | p1.6)
library(ggplot2)
ggsave("./vertify/GSE100927_vertify.pdf", plot = p1, width = 15, height = 18) 

6. Topics

1. Sorting the data frame

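  1. order() or the arrange() function from the tidyverse
# test is not defined in these notes; any data frame with iris-style columns works, e.g. test <- iris[c(1:2,51:52,101:102),]
# order() can sort a vector or a data frame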
sort(test$Sepal.Length)
test$Sepal.Length[order(test$Sepal.Length)]

test[order(test$Sepal.Length),]
test[order(test$Sepal.Length,decreasing = T),]

# arrange() offers more flexible sorting

library(tidyverse)  # arrange() comes from dplyr, which the tidyverse loads
arrange(test, Sepal.Length)
arrange(test, desc(Sepal.Length))
arrange(test, desc(Sepal.Width),Sepal.Length) # sort by Sepal.Width in descending order; ties are broken by Sepal.Length
  2. mutate, select, filter, rename in the dplyr package
     mutate(): add a column
     rename(): rename a column
     select(): choose columns
     filter(): choose rows
     Pipe operator %>%: Ctrl + Shift + M (RStudio shortcut)
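As a rough illustration of the four verbs chained with the pipe, here is a minimal sketch on the built-in iris data. The column name ratio and the result name res are made up for this example.

```r
library(dplyr)

res <- iris %>%
  filter(Species == "setosa") %>%                  # choose rows
  select(Sepal.Length, Sepal.Width, Species) %>%   # choose columns
  mutate(ratio = Sepal.Length / Sepal.Width) %>%   # add a column
  rename(species = Species)                        # new_name = old_name

dim(res)        # 50 4
colnames(res)   # "Sepal.Length" "Sepal.Width" "species" "ratio"
```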

2. Draw a boxplot of the expression matrix

Drawing this boxplot directly from an expression matrix will not work; the table has to be reshaped first.

The reshaping starts as follows:

library(tidyr)
library(tibble)
library(dplyr)
dat = t(exp) %>% as.data.frame() %>% rownames_to_column() %>%
mutate(group = group_list)
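The notes stop after the transpose step. One plausible way to finish, shown here only as a sketch, is to convert the table to long format with tidyr's pivot_longer() and pass it to geom_boxplot(). The toy matrix, group_list, the "sample" column name and the choice of pivot_longer() are all assumptions, not the course's confirmed method.

```r
library(tidyr)
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical toy expression matrix: 3 genes x 4 samples
set.seed(1)
exp <- matrix(rnorm(12), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
group_list <- c("control", "control", "treat", "treat")

# Same first step as in the notes: samples become rows, plus a group column
dat <- t(exp) %>% as.data.frame() %>% rownames_to_column("sample") %>%
  mutate(group = group_list)

# Assumed next step: one row per sample/gene pair
long <- pivot_longer(dat, cols = starts_with("gene"),
                     names_to = "gene", values_to = "expression")

p <- ggplot(long, aes(x = gene, y = expression, fill = group)) +
  geom_boxplot()
# print(p) draws the plot
```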

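3. Joins

  1. inner_join: keeps only rows with a match in both tables (intersection)
  2. left_join: keeps all rows of the left table
  3. right_join: keeps all rows of the right table
  4. full_join: keeps all rows of both tables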
library(dplyr)
inner_join(test1,test2,by="name")
inner_join(test1,test2,by=c("name" = "Name"))

right_join(test1,test2,by="name")
full_join(test1,test2,by="name")
semi_join(test1,test2,by="name")
anti_join(test1,test2,by="name")

The base R function merge() performs the same kinds of joins.
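To make the differences between the join types concrete, the sketch below counts the rows each one returns. test1 and test2 are hypothetical tables built for this example.

```r
library(dplyr)

# Hypothetical example tables sharing the key column "name"
test1 <- data.frame(name = c("jimmy", "nicker", "tony"), score = c(150, 140, 152))
test2 <- data.frame(name = c("jimmy", "tony", "anna"), city = c("A", "B", "C"))

n_inner <- nrow(inner_join(test1, test2, by = "name"))  # 2: jimmy, tony
n_left  <- nrow(left_join(test1, test2, by = "name"))   # 3: every row of test1
n_right <- nrow(right_join(test1, test2, by = "name"))  # 3: every row of test2
n_full  <- nrow(full_join(test1, test2, by = "name"))   # 4: every name from both

semi <- semi_join(test1, test2, by = "name")  # rows of test1 that have a match
anti <- anti_join(test1, test2, by = "name")  # rows of test1 without a match
semi$name   # "jimmy" "tony"
anti$name   # "nicker"
```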

4. String functions: load the stringr package

library(stringr)
x <- "The birch canoe slid on the smooth planks."

x
###1. String length
str_length(x) # number of characters in the string
length(x)     # number of elements in the vector (1 here)
###2. String splitting
str_split(x," ")
x2 = str_split(x," ")[[1]];x2

y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ")
str_split(y," ",simplify = T)

###3. Extract string by position
str_sub(x,5,9)

###4. Character detection
str_detect(x2,"h")

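###5. String replacement
str_replace(x2,"o","A")     # replaces only the first match in each string
str_replace_all(x2,"o","A") # replaces every match

###6. Character removal
str_remove(x," ")     # removes only the first space
str_remove_all(x," ") # removes every space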

Posted by neilcooper33 on Mon, 25 Jul 2022 22:07:50 +0530