R Language Learning Notes: Processing Functions and Basic Graphics

Preface

I have been learning more about R recently, so I would like to summarize it here. This part covers R's graphics and some of its data-processing functions. Many of the commands are mathematical in nature; after all, R was originally designed for statistics.

P.S. Because I wrote the R code in R Markdown, lines that begin with two "#" characters are the output of the code.

Main contents

  1. Common mathematical and statistical functions;

  2. Statistical plotting functions (histogram, kernel density plot, box plot, normal Q-Q plot, stem-and-leaf plot, empirical distribution plot, etc.);

  3. High-level plotting functions (plot, coplot, pairs, qqnorm, contour, persp, etc.);

  4. Arguments of high-level plotting functions (add, axes, log, type, etc.);

  5. Low-level plotting functions and parameter settings.

Mathematical and statistical functions

abs(-3)
## [1] 3
sqrt(9)
## [1] 3
ceiling(5/3)
## [1] 2
floor(5/3)
## [1] 1
round(4.55)
## [1] 5
log(exp(10))
## [1] 10
sin(pi/2)
## [1] 1
cos(pi/2) # Not exactly 0 because of floating-point rounding
## [1] 6.123234e-17
x <- c(1,2,3,3)
mean(x) # Equivalent to meanx <- sum(x)/length(x); meanx
## [1] 2.25
median(x)
## [1] 2.5
sd(x)
## [1] 0.9574271
var(x)
## [1] 0.9166667
min(x)
## [1] 1
max(x)
## [1] 3
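
Two details of the functions above are easy to trip over, so here is a minimal sketch of them, assuming a standard R installation. round() sends exact halves to the nearest even number rather than always up, and a result such as cos(pi/2) is zero only up to floating-point error, so all.equal() is usually a safer comparison than ==.

```r
# round() follows the IEC 60559 rule: exact halves go to the even neighbour
halves <- round(c(0.5, 1.5, 2.5))
halves

# Rounding to a given number of decimal places
two_dp <- round(3.14159, digits = 2)
two_dp

# cos(pi/2) is tiny but not exactly zero
exact_zero <- cos(pi/2) == 0
exact_zero

# all.equal() compares with a small numerical tolerance
near_zero <- isTRUE(all.equal(cos(pi/2), 0))
near_zero
```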

Standardization of data

x <- c(1,3,5,4)
scale(x)
##           [,1]
## [1,] -1.317465
## [2,] -0.146385
## [3,]  1.024695
## [4,]  0.439155
## attr(,"scaled:center")
## [1] 3.25
## attr(,"scaled:scale")
## [1] 1.707825
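
As a rough check of what scale() does with its default arguments, the sketch below standardizes the same vector by hand. The variable names z_manual and z_scale are only illustrative.

```r
x <- c(1, 3, 5, 4)

# Standardize by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops the matrix attributes
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)

# After standardization the mean is (numerically) 0 and the standard deviation is 1
mean(z_scale)
sd(z_scale)
```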

Probability function

x <- pretty(c(-3, 3), 30); x
##  [1] -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2
## [16]  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8
## [31]  3.0
y <- dnorm(x)
plot(x, y)
rnorm(50, mean = 20, sd = 8)
##  [1] 11.233327 22.000808 10.066406 19.273047 21.771780 34.621389 21.706101
##  [8] 21.934236 19.608269 16.698233 13.597833 22.653223 20.194896 25.730459
## [15] 10.912291 18.101390 12.456593 21.678788  7.763024 27.193593 24.136049
## [22] 31.140113 22.691629 18.891392 18.009354 28.952892  8.203124 16.267587
## [29] 21.039319 26.668597 15.264060 15.474431 28.440294 14.970583 26.289378
## [36] 18.113167 11.175129  2.085909 26.948591 12.651352 17.815405 13.490284
## [43] 21.128309 41.396762 32.838635 14.187705 29.128805 16.050802 14.680583
## [50] 31.128813
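
The normal-distribution functions used above follow R's general naming scheme for distributions: a prefix d, p, q, or r in front of the distribution name. The sketch below illustrates the four prefixes for the normal distribution; the specific values 1.96 and 0.975 are just convenient examples.

```r
# d: density, p: cumulative probability, q: quantile, r: random draws
dnorm(0)          # Height of the standard normal density at 0
pnorm(1.96)       # P(X <= 1.96) for a standard normal X
qnorm(0.975)      # Quantile function: the inverse of pnorm()

# The same functions accept mean and sd, as rnorm() does
q <- qnorm(0.975, mean = 20, sd = 8)
p <- pnorm(q, mean = 20, sd = 8)   # Recovers the probability passed to qnorm()
p
```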

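Generating pseudorandom numbers

# rnorm() above draws from a normal distribution; runif() draws from the uniform distribution on (0, 1)
runif(5)
## [1] 0.6973212 0.8353123 0.1633793 0.7737247 0.3019795
# Set the random seed so that subsequent draws are reproducible
set.seed(12)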
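
To see what setting the seed buys you, here is a small sketch: resetting the same seed before each draw reproduces exactly the same numbers. The names first and second are only illustrative.

```r
set.seed(12)
first <- runif(5)

set.seed(12)      # Resetting the same seed restarts the same sequence
second <- runif(5)

identical(first, second)
```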

String handling functions

# Count the number of characters in a string
nchar("abcde")
## [1] 5
# Extract a substring (from position 3 to position 5)
substr("abcde", 3, 5)
## [1] "cde"
# Search for a pattern: returns the indices of the matching elements
grep("a", c("a", "c", "b", "a"))
## [1] 1 4
# Substitution: sub() replaces the first match only
sub("a", "A", "abcde")
## [1] "Abcde"
# Split a string; strsplit() returns a list
strsplit("abcde", "c")
## [[1]]
## [1] "ab" "de"
strsplit("abcde", "") # Separate each character
## [[1]]
## [1] "a" "b" "c" "d" "e"
# Concatenate strings
paste("Today is", "Tuesday.")
## [1] "Today is Tuesday."
# Case conversion function
toupper("abc")
## [1] "ABC"
tolower("ABc")
## [1] "abc"
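
A few close relatives of the functions above are worth knowing. The sketch below uses only base R; the example strings are made up for illustration.

```r
# sub() replaces only the first match; gsub() replaces every match
first_only <- sub("a", "A", "banana")
all_matches <- gsub("a", "A", "banana")
first_only
all_matches

# grepl() returns a logical vector instead of indices
hits <- grepl("a", c("a", "c", "b", "a"))
hits

# strsplit() returns a list, so [[1]] extracts the character vector
parts <- strsplit("2022-06-01", "-")[[1]]
parts

# paste() separates with a space by default; sep and collapse change that
joined <- paste(parts, collapse = "/")
joined
labels <- paste0("x", 1:3)    # paste0() uses no separator
labels
```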

Functions applied to matrices and data frames

b <- matrix(runif(12), nrow=3)
# Functions dealing with matrices
log(b) # Take the natural logarithm of each element of the matrix
##             [,1]      [,2]       [,3]       [,4]
## [1,] -2.66843174 -1.311625 -1.7215713 -4.7885131
## [2,] -0.20116780 -1.775799 -0.4436883 -0.9347165
## [3,] -0.05909021 -3.384469 -3.7775907 -0.2059417
mean(b) # Average all elements of the matrix
## [1] 0.3633845
# apply() applies a function over one dimension of a matrix (1 = rows, 2 = columns)
apply(b, 1, mean)
## [1] 0.1314632 0.5053715 0.4533189
# lapply() applies a function to each component of a list and returns the results as a list
x <- list(a = 1:10, beta = exp(-3:3), 
		logic = c(TRUE,FALSE,FALSE,TRUE))
lapply(x, mean) # In arithmetic, TRUE counts as 1 and FALSE as 0
## $a
## [1] 5.5
## 
## $beta
## [1] 4.535125
## 
## $logic
## [1] 0.5
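
The following sketch extends the example slightly: it applies a function over columns as well as rows, and uses sapply(), which behaves like lapply() but tries to simplify the result to a vector. The seed is arbitrary and only makes the random matrix reproducible.

```r
set.seed(1)                       # Arbitrary seed, only for reproducibility
b <- matrix(runif(12), nrow = 3)

row_means <- apply(b, 1, mean)    # One value per row (length 3)
col_means <- apply(b, 2, mean)    # One value per column (length 4)

# rowMeans() and colMeans() are built-in shortcuts for the same results
all.equal(row_means, rowMeans(b))
all.equal(col_means, colMeans(b))

x <- list(a = 1:10, beta = exp(-3:3),
          logic = c(TRUE, FALSE, FALSE, TRUE))
res_list <- lapply(x, mean)       # Always a list
res_vec  <- sapply(x, mean)       # Simplified to a named numeric vector
res_vec
```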

Drawing graphics

Drawing histograms (hist)

A histogram shows the frequency distribution of the data.

# Basic histogram
x <- mtcars$mpg; x
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
hist(x)
# Parameter settings: breaks suggests the number of bins; by default the y-axis shows frequency (counts)
hist(x, breaks = 12, col = "red", xlab = "Miles Per Gallon")
# freq = FALSE makes the y-axis show probability density instead
hist(x, freq = FALSE, breaks = 12, col = "green",
		 xlab = "Miles Per Gallon")
# Rug plot
rug(jitter(x)) # jitter() adds a little noise so that tied values do not overlap
lines(density(x), col = "red", lwd = 2) # Overlay the kernel density curve
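
hist() also returns an object describing the bins, which can be inspected without drawing anything. The sketch below is one way to look at it; h is just an illustrative name.

```r
x <- mtcars$mpg

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = 12, plot = FALSE)
h$breaks      # Bin boundaries actually used (breaks is only a suggestion)
h$counts      # Number of observations in each bin
h$density     # Bar heights used when freq = FALSE

sum(h$counts) == length(x)          # Every observation falls into one bin
sum(h$density * diff(h$breaks))     # The bar areas sum to 1
```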

Kernel density plots

A tool for examining the distribution of a continuous variable.
The x-axis shows the values, and the y-axis shows the estimated probability density at each value.

x <- density(mtcars$mpg); x
## 
## Call:
## 	density.default(x = mtcars$mpg)
## 
## Data: mtcars$mpg (32 obs.);	Bandwidth 'bw' = 2.477
## 
##        x               y            
##  Min.   : 2.97   Min.   :6.481e-05  
##  1st Qu.:12.56   1st Qu.:5.461e-03  
##  Median :22.15   Median :1.926e-02  
##  Mean   :22.15   Mean   :2.604e-02  
##  3rd Qu.:31.74   3rd Qu.:4.530e-02  
##  Max.   :41.33   Max.   :6.795e-02
plot(x)
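attach(mtcars)
library(sm) # Requires the sm package: install.packages("sm")
# Compare the density of mpg across the groups defined by cyl
sm.density.compare(mpg, cyl, xlab = "Miles Per Gallon")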
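
The shape of a kernel density estimate depends strongly on the bandwidth. The sketch below, which uses only base R, checks that the estimated curve encloses an area of roughly 1 and overlays two other bandwidths; the values bw = 1 and bw = 5 are arbitrary choices for illustration.

```r
d <- density(mtcars$mpg)
d$bw                       # Bandwidth chosen by the default rule
length(d$x)                # Number of grid points (512 by default)

# Approximate the area under the curve with a Riemann sum
area <- sum(d$y) * diff(d$x[1:2])
area

# A smaller bandwidth follows the data more closely, a larger one is smoother
plot(d, main = "Effect of bandwidth")
lines(density(mtcars$mpg, bw = 1), col = "red")
lines(density(mtcars$mpg, bw = 5), col = "blue")
```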

Box plots

boxplot(mtcars$mpg, main = "Box Plot", ylab = "Miles per gallon")
boxplot(mpg~cyl, data=mtcars, main = "Box Plot", 
		xlab = "Number of Cylinders", ylab = "Miles per gallon")
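
The numbers behind a box plot can be obtained with boxplot.stats(). This is a small sketch of how the box and whiskers relate to the data, assuming the default whisker coefficient of 1.5.

```r
x <- mtcars$mpg
s <- boxplot.stats(x)
s$stats    # Lower whisker, lower hinge, median, upper hinge, upper whisker
s$out      # Points drawn individually as outliers (may be empty)

# The box itself is built from the hinges and median reported by fivenum()
all.equal(s$stats[2:4], fivenum(x)[2:4])

# The upper whisker reaches at most 1.5 box heights above the upper hinge
fence <- s$stats[4] + 1.5 * (s$stats[4] - s$stats[2])
s$stats[5] <= fence
```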

Empirical distribution plot

Suitable for continuous distributions

w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5,
66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0); w
##  [1] 75.0 64.0 47.4 66.9 62.2 62.2 58.7 63.5 66.6 64.0 57.0 69.0 56.9 50.0 72.0
# Five-number summary: minimum, lower hinge, median, upper hinge, and maximum
fivenum(w)
## [1] 47.40 57.85 63.50 66.75 75.00
# Drawing the empirical distribution plot
ecdf(w) # Builds the empirical distribution function (a step function) from the data
## Empirical CDF 
## Call: ecdf(w)
##  x[1:13] =   47.4,     50,   56.9,  ...,     72,     75
plot(ecdf(w),verticals = TRUE, do.p = TRUE)
x <- 44:78
lines(x, pnorm(x, mean(w), sd(w)))
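
The object returned by ecdf() is itself a function, so it can be evaluated at any point. A minimal sketch with the same data (Fn is just an illustrative name):

```r
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5,
       66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0)

Fn <- ecdf(w)        # Fn is itself an R function
Fn(63.5)             # Proportion of observations <= 63.5
mean(w <= 63.5)      # The same proportion computed directly

Fn(c(40, 80))        # 0 below the smallest value, 1 above the largest
```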

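Normal Q-Q plot

A normal Q-Q plot draws the sample quantiles against the corresponding quantiles of a normal distribution; if the data are approximately normal, the points lie close to a straight line. The theoretical quantiles come from the inverse of the normal distribution function, evaluated at probabilities spread evenly over (0, 1).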

w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5,
66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0); w
##  [1] 75.0 64.0 47.4 66.9 62.2 62.2 58.7 63.5 66.6 64.0 57.0 69.0 56.9 50.0 72.0
qqnorm(w)
qqline(w)
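
To make the construction concrete, the sketch below computes the coordinates of the Q-Q plot by hand and compares them with what qqnorm() returns. It relies on ppoints(), which is what qqnorm() uses to choose the probabilities; the names theoretical and sample_q are illustrative.

```r
w <- c(75.0, 64.0, 47.4, 66.9, 62.2, 62.2, 58.7, 63.5,
       66.6, 64.0, 57.0, 69.0, 56.9, 50.0, 72.0)
n <- length(w)

theoretical <- qnorm(ppoints(n))   # Normal quantiles at n evenly spread probabilities
sample_q    <- sort(w)             # Ordered sample values

# qqnorm(plot.it = FALSE) returns the same coordinates without drawing
qq <- qqnorm(w, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), sample_q)

# Drawing the pairs by hand reproduces the Q-Q plot
plot(theoretical, sample_q,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
```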

Stem-and-leaf plot

x<-c(25, 45, 50, 54, 55, 61, 64, 68, 72, 75, 75,
78, 79, 81, 83, 84, 84, 84, 85, 86, 86, 86,
87, 89, 89, 89, 90, 91, 91, 92, 100); x
##  [1]  25  45  50  54  55  61  64  68  72  75  75  78  79  81  83  84  84  84  85
## [20]  86  86  86  87  89  89  89  90  91  91  92 100
stem(x)
## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##    2 | 5
##    3 | 
##    4 | 5
##    5 | 045
##    6 | 148
##    7 | 25589
##    8 | 1344456667999
##    9 | 0112
##   10 | 0

High-level and low-level plotting functions

High-level plotting functions include plot(), coplot(), pairs(), qqnorm(), hist(), and contour(); each of them creates a complete plot and accepts arguments to customize it.
Low-level plotting functions cannot create a plot by themselves; they can only add new elements to a plot produced by a high-level function.

High-level plotting functions

  1. plot() function
    Draws scatter plots and curves of data.

There are four common ways to use it: a scatter plot of two vectors; an index plot of a single vector (the values plotted against their subscripts) or a plot of a complex vector (imaginary part against real part); a box plot of a numeric vector grouped by a factor; and a scatter plot matrix of the variables in a data frame. It can also draw other plots, such as regression diagnostics.

x <- c(1,3,2,3,3,5);
y <- c(3,2,3,4,5,6);
z <- complex(re = x, im = y);
plot(x)
plot(x, y)
plot(z)
# Box plot of y grouped by the factor f
y<-c(1600, 1610, 1650, 1680, 1700, 1700, 1780, 1500, 1640,
1400, 1700, 1750, 1640, 1550, 1600, 1620, 1640, 1600,
1740, 1800, 1510, 1520, 1530, 1570, 1640, 1600)
f<-factor(c(rep(1,7),rep(2,5), rep(3,8), rep(4,6)))
plot(f,y)
# Scatter plot matrix of the variables in a data frame
df<-data.frame(
Age=c(13, 13, 14, 12, 12, 15, 11, 15, 14, 14, 14,
        15, 12, 13, 12, 16, 12, 11, 15 ),
Height=c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
        62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
        59.0, 72.0, 64.8, 57.5, 66.5),
Weight=c( 84.0, 98.0, 90.0, 77.0, 84.5, 112.0, 50.5, 
          112.5, 102.5, 112.5, 102.5, 133.0, 83.0, 
          84.0, 99.5, 150.0, 128.0, 85.0, 112.0))
plot(df)
attach(df)
# Scatter plot of the Age and Height variables
plot(~Age+Height)
# Scatter plots of weight against age and against height (two plots, drawn one after the other)
plot(Weight~Age+Height)
  2. Functions for plotting multivariate data

pairs() function: when the data is a matrix or data frame, it draws a scatter plot for every pair of columns.
coplot() function: draws conditioning plots, that is, scatter plots of two variables for different ranges of a third variable, which shows the relationships among the columns in more detail.

# Same result as plot(df): a scatter plot matrix
pairs(df)
# Conditioning plot: scatter plots of weight against height, split by age
coplot(Weight ~ Height | Age)
  3. qqnorm(), hist(), dotchart(), contour(), image(), persp(), etc.

The dotchart() function draws a dot chart of the data x.

# Dot chart of death rates in Virginia, 1940
dotchart(VADeaths, main = "Death Rates in Virginia - 1940")
dotchart(t(VADeaths), main = "Death Rates in Virginia - 1940")

The image(), contour(), and persp() functions draw an image map, a contour map, and a 3D surface; the example below uses elevation data for a mountain area.

x <- seq(0,2800, 400); y <- seq(0,2400,400);
z <- c(1180,1320,1450,1420,1400,1300,700,900,
        1230,1390,1500,1500,1400,900,1100,1060,
        1270,1500,1200,1100,1350,1450,1200,1150,
        1370,1500,1200,1100,1550,1600,1550,1380,
        1460,1500,1550,1600,1550,1600,1600,1600,
        1450,1480,1500,1550,1510,1430,1300,1200,
        1430,1450,1470,1320,1280,1200,1080,940)
Z <- matrix(z, nrow = 8)
# Draw image map
image(x, y, Z)
# Draw contour map
contour(x, y, Z, levels = seq(min(z), max(z), by = 50))
# Draw 3D surfaces
persp(x, y, Z, theta=30, phi=45, expand=.3)
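
For image(), contour(), and persp(), the matrix must have one row per x value and one column per y value. When the surface comes from a formula rather than from measurements, outer() builds such a matrix directly. The sketch below assumes a made-up surface f, chosen only for illustration.

```r
x <- seq(-2, 2, length.out = 30)
y <- seq(-2, 2, length.out = 20)

# Hypothetical surface chosen only for illustration
f <- function(x, y) exp(-(x^2 + y^2))

# outer() evaluates f on every (x, y) pair; rows follow x, columns follow y
Z <- outer(x, y, f)
dim(Z)    # length(x) rows and length(y) columns

image(x, y, Z)
contour(x, y, Z, add = TRUE)    # Overlay contour lines on the image
persp(x, y, Z, theta = 30, phi = 45, expand = 0.5)
```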

Arguments of high-level plotting functions

  1. Logical arguments

Add the new plot to the existing one: add = TRUE; the default is FALSE, which replaces the existing plot.
Hide the axes: axes = FALSE; the default is TRUE, which draws the axes.

contour(x, y, Z)
contour(x, y, Z, levels = seq(min(z), 
		max(z), by = 80), col=5, add = TRUE)
contour(x, y, Z, levels = seq(min(z), max(z), by = 50), axes = FALSE)
  2. Logarithmic axes

Logarithmic x-axis: log="x"
Logarithmic y-axis: log="y"
Logarithmic x- and y-axes: log="xy"

x1 <- 1:10; x2 <- 4:13;
plot(x1, x2, log="x")
plot(x1, x2, log="xy", col=3)
  3. The type argument
    Sets how the data points are drawn
    Points only (default): type="p"
    Lines only: type="l"
    Points joined by line segments that stop short of the points: type="b"
    Points with lines drawn through them: type="o"
    Vertical lines from each point down to the x-axis: type="h"
    Step curve: type="s"
    Draw no points or lines (axes only): type="n"

x1 <- 1:10; x2 <- 4:13;
plot(x1, x2, type = "s")
  4. Other plotting arguments

pch: plotting symbol
cex: symbol size, as a number relative to the default size
lty: line type
lwd: line width
xlab, ylab: axis titles
main: main title
sub: subtitle

x1 <- 1:10; x2 <- 4:13;
plot(x1, x2, pch=12, cex=3, lty=2, lwd=2, col="red", 
		xlab="x axis", ylab="y axis", main="straight line", sub="Minor line")
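
The type values are easiest to compare side by side. The sketch below draws six of them on one page using par(mfrow = ...), one of the settings available through par(); the 2-by-3 layout is just a convenient choice.

```r
x1 <- 1:10; x2 <- 4:13
types <- c("p", "l", "b", "o", "h", "s")

# Put six panels on one page and remember the previous settings
old_par <- par(mfrow = c(2, 3))
for (tp in types) {
  plot(x1, x2, type = tp, main = paste0("type = \"", tp, "\""))
}
par(old_par)    # Restore the previous layout
```

Saving the value returned by par() and passing it back afterwards keeps the layout change from leaking into later plots.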

Low-level plotting functions

points(): add points
lines(): add line segments connecting points
text(): add text labels at given positions on the plot
abline(): add a straight line to the plot. abline(a, b) draws the line y = a + bx, while abline(h = y) and abline(v = x) draw horizontal and vertical lines respectively
title(main = "", sub = ""): add a main title and a subtitle to the plot
axis(side): add an axis; side = 1, 2, 3, 4 means bottom, left, top, and right
legend(): add a legend at a given position (for example "topleft")

x <- 1:10; y <- 4:13;
plot(x, y)
xp <- c(8, 3, 4); yp <- c(9, 10, 5);
points(xp, yp, pch=16, col="green")
lines(xp, yp, pch=16, col="blue")
text(x, y)
abline(3, 4) # Line with intercept 3 and slope 4
legend("topleft", inset = .01, title = "legend",
		legend = c("A", "B"), lty = c(1, 2), pch = c(15, 17))
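
A common pattern is to start with an empty plot (type = "n") and then build it up in layers with the low-level functions. The sketch below is one possible way to do that; the noisy data y_obs and the seed are made up for illustration, and passing a fitted lm model to abline() draws its regression line.

```r
x <- 1:10; y <- 4:13

# Hypothetical noisy data, generated only for illustration
set.seed(1)
y_obs <- y + rnorm(length(x), sd = 1)

plot(x, y_obs, type = "n", xlab = "x", ylab = "y")   # Set up the axes only
points(x, y_obs, pch = 16, col = "blue")             # Layer 1: the data
fit <- lm(y_obs ~ x)
abline(fit, col = "red", lty = 2)                    # Layer 2: fitted regression line
abline(h = mean(y_obs), lty = 3)                     # Layer 3: horizontal reference line
title(main = "Building a plot in layers")
legend("topleft", legend = c("data", "fitted line"),
       col = c("blue", "red"), pch = c(16, NA), lty = c(NA, 2))
```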

Tags: R Language

Posted by stevehossy on Wed, 01 Jun 2022 17:10:50 +0530