Preface
I have been learning more about the R language recently, so I would like to summarize it here. This part mainly covers R's graphics and some of its data-processing functions. Many of the commands are mathematical; after all, R was originally designed for statistics.
P.S. Because I wrote the R programs in R Markdown, lines that begin with two "#" characters show the program output.
Main contents
- Common mathematical and statistical functions
- Statistical plotting functions (histogram, kernel density plot, box plot, normal Q-Q plot, stem-and-leaf plot, empirical distribution plot, etc.)
- High-level plotting functions (plot, coplot, pairs, qqnorm, contour, persp, etc.)
- Arguments of high-level plotting functions (add, axes, log, type, etc.)
- Low-level plotting functions and parameter settings
Mathematical and statistical functions
abs(-3)
## [1] 3
sqrt(9)
## [1] 3
ceiling(5/3)
## [1] 2
floor(5/3)
## [1] 1
round(4.55)
## [1] 5
log(exp(10))
## [1] 10
sin(pi/2)
## [1] 1
cos(pi/2) # Effectively 0; the tiny value is floating-point rounding error
## [1] 6.123032e-17
x <- c(1,2,3,3)
mean(x) # Equivalent to meanx <- sum(x)/length(x); meanx
## [1] 2.25
median(x)
## [1] 2.5
sd(x)
## [1] 0.9574271
var(x)
## [1] 0.9166667
min(x)
## [1] 1
max(x)
## [1] 3
Standardization of data
x <- c(1,3,5,4)
scale(x)
##           [,1]
## [1,] -1.317465
## [2,] -0.146385
## [3,]  1.024695
## [4,]  0.439155
## attr(,"scaled:center")
## [1] 3.25
## attr(,"scaled:scale")
## [1] 1.707825
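To make clear what scale() does by default, here is a minimal sketch that standardizes the same vector by hand: subtract the mean, then divide by the sample standard deviation. The names z_manual and z_scale are only illustrative.

```r
# Standardize by hand and compare with scale()
x <- c(1, 3, 5, 4)
z_manual <- (x - mean(x)) / sd(x)   # subtract the mean, divide by the sample sd
z_scale  <- as.vector(scale(x))     # scale() returns a one-column matrix
all.equal(z_manual, z_scale)        # TRUE
```

The "scaled:center" and "scaled:scale" attributes in the output above are simply mean(x) and sd(x).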
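Probability functions
# pretty() returns about 30 evenly spaced "round" values covering the range -3 to 3
x <- pretty(c(-3, 3), 30); x
##  [1] -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2
## [16]  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8
## [31]  3.0
# Standard normal density at each value of x
y <- dnorm(x)
plot(x, y)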
# Generate pseudorandom numbers from a normal distribution
rnorm(50, mean = 20, sd = 8)
##  [1] 11.233327 22.000808 10.066406 19.273047 21.771780 34.621389 21.706101
##  [8] 21.934236 19.608269 16.698233 13.597833 22.653223 20.194896 25.730459
## [15] 10.912291 18.101390 12.456593 21.678788  7.763024 27.193593 24.136049
## [22] 31.140113 22.691629 18.891392 18.009354 28.952892  8.203124 16.267587
## [29] 21.039319 26.668597 15.264060 15.474431 28.440294 14.970583 26.289378
## [36] 18.113167 11.175129  2.085909 26.948591 12.651352 17.815405 13.490284
## [43] 21.128309 41.396762 32.838635 14.187705 29.128805 16.050802 14.680583
## [50] 31.128813
# Generate pseudorandom numbers from the uniform distribution on (0, 1)
runif(5)
## [1] 0.6973212 0.8353123 0.1633793 0.7737247 0.3019795
# Set the random seed so that the random numbers can be reproduced
set.seed(12)
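The normal-distribution functions used above follow R's naming pattern of a one-letter prefix (d, p, q, r) plus the distribution name. The following is a small sketch of that pattern and of how set.seed() makes random draws repeatable; the variable names are only illustrative.

```r
# d = density, p = cumulative probability, q = quantile, r = random draws
d0 <- dnorm(0)        # density of N(0, 1) at 0, i.e. 1 / sqrt(2 * pi)
p  <- pnorm(1.96)     # P(X <= 1.96), roughly 0.975
q  <- qnorm(0.975)    # inverse of pnorm, roughly 1.96

# Resetting the same seed reproduces the same "random" numbers
set.seed(12); a <- rnorm(3)
set.seed(12); b <- rnorm(3)
identical(a, b)       # TRUE
```

The same prefixes work for other distributions, for example dunif(), punif(), qunif() and runif() for the uniform distribution.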
String handling functions
# Count the number of characters in a string
nchar("abcde")
## [1] 5
# Extract a substring (characters 3 to 5)
substr("abcde", 3, 5)
## [1] "cde"
# Search for a pattern; returns the indices of the matching elements
grep("a", c("a", "c", "b", "a"))
## [1] 1 4
# Replace the first match in a string
sub("a", "A", "abcde")
## [1] "Abcde"
# Split a string at a separator (the result is a list)
strsplit("abcde", "c")
## [[1]]
## [1] "ab" "de"
strsplit("abcde", "") # Split into individual characters
## [[1]]
## [1] "a" "b" "c" "d" "e"
# Concatenate strings (separated by a space by default)
paste("Today is", "Tuesday.")
## [1] "Today is Tuesday."
# Convert case
toupper("abc")
## [1] "ABC"
tolower("ABc")
## [1] "abc"
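A few close relatives of the functions above are easy to confuse with them, so here is a short sketch. It uses only base R; the string "banana" is just an illustrative value.

```r
s <- "banana"
first_only <- sub("a", "A", s)      # replaces only the first match: "bAnana"
all_of_them <- gsub("a", "A", s)    # replaces every match: "bAnAnA"

# grepl() returns TRUE/FALSE for each element instead of matching indices
hits <- grepl("a", c("a", "c", "b", "a"))   # TRUE FALSE FALSE TRUE

# paste() takes a sep argument; paste0() uses no separator at all
joined <- paste("a", "b", sep = "-")        # "a-b"

# strsplit() returns a list; unlist() flattens it to a character vector
parts <- unlist(strsplit("abcde", "c"))     # "ab" "de"
```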
Functions applied to matrices and data frames
b <- matrix(runif(12), nrow=3)
# Mathematical functions applied to a matrix work element by element
log(b) # Natural logarithm of each element of the matrix
##             [,1]      [,2]       [,3]       [,4]
## [1,] -2.66843174 -1.311625 -1.7215713 -4.7885131
## [2,] -0.20116780 -1.775799 -0.4436883 -0.9347165
## [3,] -0.05909021 -3.384469 -3.7775907 -0.2059417
mean(b) # Mean of all elements of the matrix
## [1] 0.3633845
# apply() processes the matrix along a dimension (1 = rows, 2 = columns)
apply(b, 1, mean)
## [1] 0.1314632 0.5053715 0.4533189
# lapply() applies a function to each component of a list and returns a list of the results
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
lapply(x, mean) # The logical value TRUE is treated as 1 and FALSE as 0
## $a
## [1] 5.5
##
## $beta
## [1] 4.535125
##
## $logic
## [1] 0.5
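As a small follow-up to apply() and lapply(), this sketch shows the effect of the second argument of apply() and uses sapply(), which simplifies the result to a vector where possible. The matrix m is a made-up example chosen so that the results are easy to check by eye.

```r
m <- matrix(1:6, nrow = 2)        # filled by column: rows are (1, 3, 5) and (2, 4, 6)
row_means <- apply(m, 1, mean)    # one value per row: 3 4
col_means <- apply(m, 2, mean)    # one value per column: 1.5 3.5 5.5

x <- list(a = 1:10, logic = c(TRUE, FALSE, FALSE, TRUE))
means <- sapply(x, mean)          # named numeric vector: a = 5.5, logic = 0.5
```

For row and column means specifically, rowMeans() and colMeans() give the same results.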
Drawing graphics
Histograms (hist)
A histogram shows the frequency distribution of the data.
# Basic histogram
x <- mtcars$mpg; x
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
hist(x)
# Parameter settings: breaks suggests the number of bins; by default the y-axis shows frequency (counts)
hist(x, breaks = 12, col = "red", xlab = "Miles Per Gallon")
# freq = FALSE makes the y-axis show probability density
hist(x, freq = FALSE, breaks = 12, col = "green", xlab = "Miles Per Gallon")
# Rug plot; jitter() adds a little noise so that tied values do not overlap
rug(jitter(x))
# Overlay the kernel density curve
lines(density(x), col = "red", lwd = 2)
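hist() also returns the binning it computed, which helps when checking what the picture shows. A minimal sketch, using plot = FALSE to skip the drawing; the name h is only illustrative.

```r
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)
h$breaks                          # bin boundaries actually used
h$counts                          # number of observations in each bin
sum(h$counts)                     # every observation falls in exactly one bin
sum(h$density * diff(h$breaks))   # the densities integrate to 1
```

Note that breaks = 12 is only a suggestion: R picks "round" boundaries, so the actual number of bins can differ.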
Kernel density plots
A tool for examining the distribution of a continuous variable.
The x-axis shows the values, and the y-axis shows the estimated probability density at each value.
x <- density(mtcars$mpg); x
##
## Call:
##  density.default(x = mtcars$mpg)
##
## Data: mtcars$mpg (32 obs.);  Bandwidth 'bw' = 2.477
##
##        x               y
##  Min.   : 2.97   Min.   :6.481e-05
##  1st Qu.:12.56   1st Qu.:5.461e-03
##  Median :22.15   Median :1.926e-02
##  Mean   :22.15   Mean   :2.604e-02
##  3rd Qu.:31.74   3rd Qu.:4.530e-02
##  Max.   :41.33   Max.   :6.795e-02
plot(x)
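The bandwidth reported in the density output controls how smooth the curve is. As a rough sketch (the names d and area are illustrative), the default bandwidth can be reproduced with bw.nrd0(), and the area under the estimated curve can be approximated with a simple Riemann sum:

```r
d <- density(mtcars$mpg)
d$bw                              # bandwidth chosen by the default rule
bw.nrd0(mtcars$mpg)               # the same value, computed directly

# density() evaluates the curve on an evenly spaced grid,
# so the sum of heights times the grid step approximates the area
area <- sum(d$y) * diff(d$x)[1]
area                              # close to 1
```

Passing a larger or smaller bw to density() gives a smoother or rougher curve.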
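attach(mtcars)
library(sm) # The sm package must be installed first
# Compare the kernel densities of mpg across the groups defined by cyl
sm.density.compare(mpg, cyl, xlab = "Miles Per Gallon")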
Box plots
boxplot(mtcars$mpg, main = "Box Plot", ylab = "Miles per gallon")
boxplot(mpg~cyl, data=mtcars, main = "Box Plot", xlab = "Number of Cylinders", ylab = "Miles per gallon")
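The numbers behind a box plot can be inspected without drawing anything. This is a small sketch using boxplot.stats() from base R; the name s is only illustrative.

```r
s <- boxplot.stats(mtcars$mpg)
s$stats            # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out              # points drawn individually as outliers (may be empty)
fivenum(mtcars$mpg)  # the hinges and median in s$stats come from fivenum()
```

The whiskers extend to the most extreme data points that lie within 1.5 times the box length from the box, so they match the minimum and maximum only when there are no outliers.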
Empirical distribution plots
Suitable for continuous distributions.
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5, 66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0); w
## [1] 75.0 64.0 47.4 66.9 62.2 62.2 58.7 63.5 66.6 64.0 57.0 69.0 56.9 50.0 72.0
# Five-number summary: the minimum, lower hinge, median, upper hinge and maximum of the data
fivenum(w)
## [1] 47.40 57.85 63.50 66.75 75.00
# Empirical distribution plot
ecdf(w) # ecdf() returns the empirical distribution function of w
## Empirical CDF
## Call: ecdf(w)
##  x[1:13] =   47.4,     50,   56.9,  ...,     72,     75
plot(ecdf(w), verticals = TRUE, do.p = TRUE)
# Overlay the normal distribution function with the same mean and standard deviation
x <- 44:78
lines(x, pnorm(x, mean(w), sd(w)))
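Because ecdf() returns a function, it can be evaluated at any point. The sketch below checks that its value is just the proportion of observations less than or equal to that point; the name Fn is only illustrative.

```r
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5, 66.6, 64.0,
       57.0, 69.0, 56.9, 50.0, 72.0)
Fn <- ecdf(w)       # Fn is an ordinary R function
Fn(62.2)            # proportion of observations <= 62.2
mean(w <= 62.2)     # the same proportion, computed directly
Fn(c(40, 80))       # 0 below the smallest value, 1 above the largest
```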
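Normal Q-Q plots
A normal Q-Q plot compares the sample quantiles with the corresponding quantiles of the normal distribution, which are obtained by applying the inverse of the normal distribution function to probabilities spread evenly over (0, 1). If the data are approximately normal, the points fall close to a straight line.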
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5, 66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0); w
## [1] 75.0 64.0 47.4 66.9 62.2 62.2 58.7 63.5 66.6 64.0 57.0 69.0 56.9 50.0 72.0
qqnorm(w)
qqline(w)
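To see where the coordinates of the Q-Q plot come from, here is a sketch that rebuilds them by hand. It assumes the default behaviour of qqnorm(), which takes its plotting probabilities from ppoints(); the names theo, samp and qq are illustrative.

```r
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5, 66.6, 64.0,
       57.0, 69.0, 56.9, 50.0, 72.0)
theo <- qnorm(ppoints(length(w)))   # theoretical quantiles of N(0, 1)
samp <- sort(w)                     # sample quantiles (the ordered data)

qq <- qqnorm(w, plot.it = FALSE)    # coordinates qqnorm() would plot
all.equal(sort(qq$x), theo)         # TRUE
all.equal(sort(qq$y), samp)         # TRUE
```

Plotting theo against samp therefore gives the same picture as qqnorm(w).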
Stem-and-leaf plots
x<-c(25, 45, 50, 54, 55, 61, 64, 68, 72, 75, 75, 78, 79, 81, 83, 84, 84, 84, 85, 86, 86, 86, 87, 89, 89, 89, 90, 91, 91, 92, 100); x
##  [1]  25  45  50  54  55  61  64  68  72  75  75  78  79  81  83  84  84  84  85
## [20]  86  86  86  87  89  89  89  90  91  91  92 100
stem(x)
##
##   The decimal point is 1 digit(s) to the right of the |
##
##    2 | 5
##    3 |
##    4 | 5
##    5 | 045
##    6 | 148
##    7 | 25589
##    8 | 1344456667999
##    9 | 0112
##   10 | 0
High-level and low-level plotting functions
High-level plotting functions include plot(), coplot(), pairs(), qqnorm(), hist() and contour(); each one creates a complete graph and accepts arguments for customizing it.
Low-level plotting functions, in contrast, cannot create a graph on their own; they can only add new elements to a graph that a high-level function has already drawn.
High-level plotting functions
- plot() function
Draws scatter plots and curves of data.
There are four common uses: a scatter plot of two vectors; a plot of a single vector (a numeric vector is plotted against its index, and a complex vector is plotted as real part against imaginary part); a box plot grouped by a factor; and scatter plots of the variables in a data frame (plot() also draws regression diagnostic plots, etc.).
x <- c(1,3,2,3,3,5); y <- c(3,2,3,4,5,6)
z <- complex(re = x, im = y)
plot(x)
plot(x, y)
plot(z)
# Box plot grouped by a factor
y <- c(1600, 1610, 1650, 1680, 1700, 1700, 1780, 1500, 1640, 1400, 1700, 1750, 1640,
       1550, 1600, 1620, 1640, 1600, 1740, 1800, 1510, 1520, 1530, 1570, 1640, 1600)
f <- factor(c(rep(1,7), rep(2,5), rep(3,8), rep(4,6)))
plot(f, y)
# Scatter plots of the variables in a data frame
df <- data.frame(
  Age = c(13, 13, 14, 12, 12, 15, 11, 15, 14, 14, 14, 15, 12, 13, 12, 16, 12, 11, 15),
  Height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3, 62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5, 59.0, 72.0, 64.8, 57.5, 66.5),
  Weight = c(84.0, 98.0, 90.0, 77.0, 84.5, 112.0, 50.5, 112.5, 102.5, 112.5, 102.5, 133.0, 83.0, 84.0, 99.5, 150.0, 128.0, 85.0, 112.0))
plot(df)
attach(df)
# Scatter plot of height against age
plot(~Age+Height)
# Scatter plots of weight against age and against height
plot(Weight~Age+Height)
- Functions for plotting multivariate data
pairs(): when the data are a matrix or a data frame, draws a matrix of scatter plots, one for each pair of columns.
coplot(): draws conditioning plots, that is, scatter plots of two variables for each range of values of a third variable, which shows the relationships among the columns in more detail.
# Same result as plot(df): a scatter plot matrix
pairs(df)
# Conditioning plot: scatter plots of weight against height, grouped by age
coplot(Weight ~ Height | Age)
- qqnorm(), hist(), dotchart(), contour(), image(), persp(), etc.
dotchart() draws a dot plot of the data x.
# Dot plot of death rates in Virginia in 1940
dotchart(VADeaths, main = "Death Rates in Virginia - 1940")
dotchart(t(VADeaths), main = "Death Rates in Virginia - 1940")
image(), contour() and persp() draw a color image, a contour map and a 3D surface of gridded data; the example below uses elevation data from a mountain area.
x <- seq(0,2800, 400); y <- seq(0,2400,400)
z <- c(1180,1320,1450,1420,1400,1300,700,900,
       1230,1390,1500,1500,1400,900,1100,1060,
       1270,1500,1200,1100,1350,1450,1200,1150,
       1370,1500,1200,1100,1550,1600,1550,1380,
       1460,1500,1550,1600,1550,1600,1600,1600,
       1450,1480,1500,1550,1510,1430,1300,1200,
       1430,1450,1470,1320,1280,1200,1080,940)
Z <- matrix(z, nrow = 8)
# Draw the image map
image(x, y, Z)
# Draw the contour map
contour(x, y, Z, levels = seq(min(z), max(z), by = 50))
# Draw the 3D surface
persp(x, y, Z, theta=30, phi=45, expand=.3)
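The three functions above all expect the same layout: Z must have one row for each value of x and one column for each value of y. When the surface is given by a formula rather than by measurements, outer() builds such a matrix directly. A minimal sketch with a made-up surface (the function and the names gx, gy and G are illustrative):

```r
gx <- seq(-2, 2, length.out = 30)
gy <- seq(-2, 2, length.out = 20)
# G[i, j] is the function value at (gx[i], gy[j])
G <- outer(gx, gy, function(a, b) exp(-(a^2 + b^2)))
dim(G)                      # 30 20: rows follow gx, columns follow gy

contour(gx, gy, G)
persp(gx, gy, G, theta = 30, phi = 45, expand = .5)
```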
Arguments of high-level plotting functions
- Logical arguments
Add the new graph to the existing one: add=T; the default is F, which replaces the existing graph.
Hide the coordinate axes: axes=F; the default is T, which displays the axes.
contour(x, y, Z)
contour(x, y, Z, levels = seq(min(z), max(z), by = 80), col=5, add=T)
contour(x, y, Z, levels = seq(min(z), max(z), by = 50), axes=F)
- Logarithmic axes
Logarithmic x-axis: log="x"
Logarithmic y-axis: log="y"
Logarithmic x-axis and y-axis: log="xy"
x1 <- 1:10; x2 <- 4:13
plot(x1, x2, log="x")
plot(x1, x2, log="xy", col=3)
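A typical reason to use a logarithmic axis is that exponential growth becomes a straight line. The following sketch uses made-up data (powers of 2) to illustrate this; the names gx, gy and fit are illustrative.

```r
gx <- 1:10
gy <- 2^gx                        # exponential growth

op <- par(mfrow = c(1, 2))        # two panels side by side
plot(gx, gy, main = "linear y-axis")
plot(gx, gy, log = "y", main = 'log = "y"')   # the points now lie on a line
par(op)                           # restore the previous layout

# The straight line corresponds to log10(gy) being linear in gx
fit <- lm(log10(gy) ~ gx)
coef(fit)                         # slope equals log10(2)
```

A logarithmic axis only works for strictly positive values; R warns about zero or negative values and omits them.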
- The type argument
Controls how the data are drawn.
Points only (the default): type="p"
Lines only: type="l"
Points connected by line segments that stop short of the points: type="b"
Points with lines passing through them: type="o"
Vertical lines from each point down to the x-axis: type="h"
Step curve: type="s"
Draw no points or lines: type="n"
x1 <- 1:10; x2 <- 4:13
plot(x1, x2, type = "s")
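To compare the values of type listed above side by side, one option is to draw them in a grid of panels. This is only a sketch; the vector types and the loop variable tp are illustrative names.

```r
x1 <- 1:10; x2 <- 4:13
types <- c("p", "l", "b", "o", "h", "s")

op <- par(mfrow = c(2, 3))        # 2 x 3 grid of panels
for (tp in types) {
  plot(x1, x2, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)                           # restore the previous layout
```

type="n" is left out because it draws only the axes; it is mainly useful for setting up an empty plot that low-level functions then fill in.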
- Other plotting arguments
pch: the plotting symbol
cex: the size of symbols and text, given as a number relative to the default size
lty: the line type
lwd: the line width
xlab, ylab: the axis titles
main: the main title
sub: the subtitle
x1 <- 1:10; x2 <- 4:13
plot(x1, x2, pch=12, cex=3, lty=2, lwd=2, col="red", xlab="x axis", ylab="y axis", main="straight line", sub="Minor line")
Low-level plotting functions
points(): add points
lines(): add line segments
text(): add a label at a given position on the graph
abline(): add a straight line to the graph. abline(a, b) draws the line y = a + bx, while h=y and v=x draw a horizontal line and a vertical line, respectively
title(main="", sub=""): add a main title and a subtitle to the graph
axis(side): add an axis; side = 1, 2, 3 and 4 mean bottom, left, top and right
legend(x, legend, title): add a legend at position x (for example "topleft")
x <- 1:10; y <- 4:13
plot(x, y)
xp <- c(8, 3, 4); yp <- c(9, 10, 5)
points(xp, yp, pch=16, col="green") # Add three green points
lines(xp, yp, col="blue") # Connect them with blue line segments
text(x, y) # Label each original point with its index
abline(3, 4) # Add the line y = 3 + 4x
legend("topleft", inset = .01, title = "legend", legend = c("A", "B"), lty=c(1, 2), pch=c(15, 17))
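A common practical use of these low-level functions is to annotate a scatter plot with a fitted line. The sketch below uses the built-in mtcars data and lm(), which are not part of the example above; the name fit is illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)        # least-squares line mpg = a + b * wt

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")                  # abline() also accepts a fitted lm object
abline(h = mean(mtcars$mpg), lty = 2)     # horizontal reference line at the mean
legend("topright", legend = c("least-squares fit", "mean mpg"),
       lty = c(1, 2), col = c("red", "black"))
```

Passing the fitted model is equivalent to calling abline(a, b) with the intercept and slope from coef(fit).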