# 2. Data analysis - matplotlib

## Introduction

Why learn matplotlib

• It presents data more intuitively
• It makes data more objective and persuasive

What is matplotlib
matplotlib: the most popular low-level plotting library for Python, used mainly to draw data visualization charts. Its name comes from MATLAB, and its interface was designed to imitate MATLAB's.

The sections below introduce and compare the common statistical charts: line charts, scatter plots, bar charts, and histograms.

## Line chart

### Basic usage

Drawing a line chart with matplotlib:

```python
from matplotlib import pyplot as plt    #Import matplotlib's pyplot and alias plt
x=[2,4,8]   #the position of the data on the x-axis
y=[2,3,4]   #the position of the data on the y-axis
plt.plot(x,y)   #Pass in x and y, and draw a line chart through plot
plt.show()  #display graphics
```
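The `plt.*` calls above use pyplot's state-based interface, which always draws on the "current" figure. matplotlib also offers an object-oriented interface in which you hold the `Figure` and `Axes` objects yourself. A minimal sketch of the same chart in that style; the `Agg` backend is an assumption added only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
from matplotlib import pyplot as plt

x = [2, 4, 8]
y = [2, 3, 4]

# Create the Figure and its Axes explicitly instead of relying on the "current" figure
fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # Axes.plot returns a list of Line2D objects, here just one

# With an interactive backend, plt.show() would display the figure as before
```

The object-oriented style pays off once a figure has several subplots, because every call names the `Axes` it affects.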

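### Set the axis ticks and save the figure

```python
from matplotlib import pyplot as plt    # Import matplotlib's pyplot and alias it as plt

x = range(2, 26, 2)   # 2, 4, ..., 24 (12 points)
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 24, 22, 18, 15]

# Set the figure size (inches) and resolution (dots per inch)
plt.figure(figsize=(20, 8), dpi=80)

# Pass in x and y; plot() draws a line chart
plt.plot(x, y)

# Set the x-axis ticks
plt.xticks(x)           # one tick per data point
# plt.xticks(x[::2])    # or one tick every other point; a larger slice step spaces them out
plt.yticks(range(min(y), max(y) + 1))   # 13..26; range() needs ints, and min(y)/max(y) happen to be ints here

# Save the figure (call this before plt.show())
# plt.savefig('./t1.png')
# plt.savefig('./t1.svg')   # an .svg file is a vector image, so it stays sharp when zoomed in

# Display the figure
plt.show()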
```
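The `savefig` comment above mentions SVG. A minimal sketch that saves the same chart both as PNG and as SVG (matplotlib picks the format from the file extension) and thins the x ticks with a slice step. The temporary output folder and the `Agg` backend are assumptions so the sketch runs anywhere.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

x = range(2, 26, 2)
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 24, 22, 18, 15]

fig = plt.figure(figsize=(20, 8), dpi=80)
plt.plot(x, y)
plt.xticks(x[::2])               # slice step 2 keeps every other tick: 2, 6, 10, ...
plt.yticks(range(13, 27))
ax = plt.gca()                   # keep a handle on the Axes for inspection

out_dir = tempfile.mkdtemp()     # hypothetical output folder; any writable path works
png_path = os.path.join(out_dir, "t1.png")
svg_path = os.path.join(out_dir, "t1.svg")
plt.savefig(png_path)            # raster image
plt.savefig(svg_path)            # vector image, format inferred from the ".svg" extension
plt.close(fig)                   # free the figure once it has been saved
```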

### Set image size and resolution

Plot a temperature line chart
Suppose a list holds the temperature for every minute from 10:00 to 12:00 (simulated below with random integers). How can we draw a line chart to observe the minute-by-minute change in temperature?

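```python
from matplotlib import pyplot as plt
import random

x = range(0, 120)                                   # 120 minutes from 10:00 to 12:00
y = [random.randint(20, 35) for _ in range(120)]    # simulated temperatures, 20 to 35 degrees

# Set the figure size: figsize is (width, height) in inches, dpi is dots per inch
plt.figure(figsize=(20, 8), dpi=80)
plt.plot(x, y)
plt.show()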
```
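Because the data above is random, every run draws a different line. When figures must be reproducible (for example, to compare saved images), a seeded generator helps. This sketch also checks the rule that pixel size equals figsize times dpi; the seed value 42 is arbitrary.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

rng = random.Random(42)                  # seeded generator: same simulated data on every run
x = range(0, 120)                        # minute offsets after 10:00
y = [rng.randint(20, 35) for _ in x]     # randint includes both ends, so values are 20..35

fig = plt.figure(figsize=(20, 8), dpi=80)
plt.plot(x, y)

# Pixel size = size in inches x dots per inch
width_px, height_px = fig.get_size_inches() * fig.dpi
plt.close(fig)
```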

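### Set the font

Improving the line chart: label the x-axis ticks with clock times, and load a font file that can also render non-ASCII (for example Chinese) labels, which matplotlib's default font cannot display.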

```python
from matplotlib import pyplot as plt
import random
import matplotlib                      # needed for font method 1
from matplotlib import font_manager    # needed for font method 2

# Method 1 (Windows and Linux, using matplotlib.rc). This may fail, because the available
# font family names differ from computer to computer.
# font = {'family': 'Microsoft YaHei',
#         'weight': 'bold',
#         'size': 14}              # font size in points
# matplotlib.rc('font', **font)  # pass in the font dict as kwargs
# matplotlib.rc('font', family='Microsoft YaHei')

# Method 2: load a font file directly. The path is machine-specific; the raw string (r'...')
# stops the backslashes in a Windows path from being read as escape sequences.
my_font = font_manager.FontProperties(fname=r'D:\project\msyh.ttc')

x = range(0, 120)
y = [random.randint(20, 35) for _ in range(120)]

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

plt.plot(x, y)

# Adjust the x-axis ticks: one "H:MM" label per minute, showing every third one
_xtick_labels = ["10:{:02d}".format(i) for i in range(60)]
_xtick_labels += ["11:{:02d}".format(i) for i in range(60)]
plt.xticks(list(x)[::3], _xtick_labels[::3], rotation=45, fontproperties=my_font)
plt.show()
```
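Hard-coding a path such as `D:\project\msyh.ttc` only works on the author's machine. A more portable sketch: ask matplotlib's font manager which font families it has registered, use the first available CJK font from a candidate list, and fall back to DejaVu Sans, which ships with matplotlib. The candidate names are assumptions; which of them exist depends on the operating system.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import font_manager
from matplotlib import pyplot as plt

# Font family names matplotlib has registered on this machine
installed = {f.name for f in font_manager.fontManager.ttflist}

# Hypothetical candidate list of CJK fonts; availability varies by OS
candidates = ["Microsoft YaHei", "SimHei", "Noto Sans CJK SC", "PingFang SC"]
chosen = next((name for name in candidates if name in installed), "DejaVu Sans")

# Put the chosen font first in the sans-serif search list and use sans-serif globally
plt.rcParams["font.sans-serif"] = [chosen] + plt.rcParams["font.sans-serif"]
plt.rcParams["font.family"] = "sans-serif"
plt.rcParams["axes.unicode_minus"] = False  # use an ASCII hyphen for minus signs
```

Setting `rcParams` once affects every later text element, so `fontproperties=` no longer has to be passed to each call.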

### Set the axis description and title

```python
from matplotlib import pyplot as plt
import random
from matplotlib import font_manager

# Load the font file (machine-specific path; the matplotlib.rc method from the previous example also works)
my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

x = range(0, 120)
y = [random.randint(20, 35) for _ in range(120)]

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

plt.plot(x, y)

# Adjust the x-axis ticks
_xtick_labels = ["10:{:02d}".format(i) for i in range(60)]
_xtick_labels += ["11:{:02d}".format(i) for i in range(60)]
plt.xticks(list(x)[::3], _xtick_labels[::3], rotation=45, fontproperties=my_font)

# Axis descriptions and title
plt.xlabel('Time', fontproperties=my_font)
plt.ylabel('Temperature (unit: ℃)', fontproperties=my_font)
plt.title('Temperature change every minute from 10:00 to 12:00', fontproperties=my_font)
plt.show()
```
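The tick labels above come from two list comprehensions, one per hour. A small helper using `divmod` generalizes this to any start hour and length, and the same axis text can be set through the object-oriented `Axes` methods `set_xlabel`, `set_ylabel`, and `set_title`. `minute_labels` is a hypothetical helper name, and the flat line is placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

def minute_labels(start_hour, n_minutes):
    """Return an "H:MM" label for each of n_minutes consecutive minutes from start_hour:00."""
    labels = []
    for m in range(n_minutes):
        hour, minute = divmod(m, 60)   # carry every 60 minutes into the hour
        labels.append(f"{start_hour + hour}:{minute:02d}")
    return labels

labels = minute_labels(10, 120)        # "10:00" ... "11:59"

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
ax.plot(range(120), [25] * 120)        # placeholder data: a flat 25-degree line
ax.set_xticks(range(0, 120, 3))
ax.set_xticklabels(labels[::3], rotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Temperature change every minute from 10:00 to 12:00")
plt.close(fig)
```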

### Set the legend and line style

Suppose you and your deskmate are both 30 years old. Based on your actual experience, the number of girlfriends (boyfriends) each of you had in each year from age 11 to 30 is recorded in lists a and b. Draw both data sets as line charts in one figure to compare the two of you over these 20 years, and analyze the year-by-year trend.
a = [1,0,1,1,2,4,3,2,3,4,4,5,6,5,4,3,3,1,1,1]
b = [1,0,3,1,2,2,3,3,2,1,2,1,1,1,1,1,1,1,1,1]
Requirements:
The y-axis represents the count (number of girlfriends or boyfriends).
The x-axis represents age, e.g. 11 years old, 12 years old, and so on.

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

y_1 = [1,0,1,1,2,4,3,2,3,4,4,5,6,5,4,3,3,1,1,1]   # me
y_2 = [1,0,3,1,2,2,3,3,2,1,2,1,1,1,1,1,1,1,1,1]   # deskmate

x = range(11, 31)   # ages 11..30
my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# Set the line style: label names the line in the legend, color and linestyle style it
plt.plot(x, y_1, label='Me', color='r', linestyle='--')
plt.plot(x, y_2, label='Deskmate', color='b')

# prop sets the legend font; loc sets its position ('best' lets matplotlib choose;
# see the plt.legend documentation for the other values)
plt.legend(prop=my_font, loc='best')

# Set the x-axis ticks
xticks_label = ['{} yrs'.format(i) for i in x]
plt.xticks(x, xticks_label, fontproperties=my_font)
plt.yticks(range(0, 9))

# Add a grid; alpha sets its transparency
plt.grid(alpha=0.3)

plt.show()
```
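Color and line style can also be given as a compact format string such as `'r--'` (red, dashed), while keyword arguments like `linewidth` and `marker` give finer control. A minimal sketch with the same data, using the object-oriented API so the resulting lines and legend can be inspected; the marker and dotted style for the second line are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

y_1 = [1, 0, 1, 1, 2, 4, 3, 2, 3, 4, 4, 5, 6, 5, 4, 3, 3, 1, 1, 1]
y_2 = [1, 0, 3, 1, 2, 2, 3, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
x = range(11, 31)

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
# A format string combines color and line style: 'r--' = red dashed line
ax.plot(x, y_1, "r--", label="Me")
# Keyword arguments: dotted, thicker line with circle markers at each data point
ax.plot(x, y_2, color="b", linestyle=":", linewidth=2, marker="o", label="Deskmate")
legend = ax.legend(loc="upper left")
ax.grid(alpha=0.3)
plt.close(fig)
```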

## Scatter plot

Usage

```python
plt.scatter(x, y)
```

Example

Suppose a web crawler has collected the daily maximum temperatures in Beijing for March and October 2016 (lists a and b respectively). How can we find a pattern in how the temperature changes with time (day)?
a=[11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]
b = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]
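A scatter plot lets the eye spot the pattern; a quick numeric cross-check is the least-squares slope of temperature against day number, which is positive when a month warms up and negative when it cools down. This pure-Python sketch is an illustrative addition, not part of the plotting approach below; `slope` is a hypothetical helper name.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs, i.e. the average change in y per unit of x."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

march = [11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]
october = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]
days = range(1, 32)

print(f"March:   {slope(days, march):+.2f} degrees per day")
print(f"October: {slope(days, october):+.2f} degrees per day")
```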

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

y_3 = [11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]    # March
y_10 = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]  # October

x_3 = range(1, 32)     # March 1-31
x_10 = range(51, 82)   # October 1-31, shifted right to leave a gap between the two months

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# Draw the scatter plots
plt.scatter(x_3, y_3, label='March')
plt.scatter(x_10, y_10, label='October')

# Adjust the x-axis ticks: label the positions with the real dates
_x = list(x_3) + list(x_10)
_xtick_labels = ["Mar {}".format(i) for i in x_3]
_xtick_labels += ["Oct {}".format(i - 50) for i in x_10]
plt.xticks(_x[::3], _xtick_labels[::3], fontproperties=my_font, rotation=45)

plt.legend(loc="upper left", prop=my_font)

plt.xlabel('Date', fontproperties=my_font)
plt.ylabel('Temperature (℃)', fontproperties=my_font)
plt.title('Daily maximum temperature in Beijing, March and October 2016', fontproperties=my_font)

# Show
plt.show()
```
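`plt.scatter` also accepts styling parameters: `s` (marker area in points squared), `c` (color), `marker` (shape), and `alpha` (transparency). A minimal sketch with the same data; the specific sizes and colors are illustrative choices. The returned collection's `get_offsets()` holds the plotted (x, y) pairs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

y_3 = [11,17,16,11,12,11,12,6,6,7,8,9,12,15,14,17,18,21,16,17,20,14,15,15,15,19,21,22,22,22,23]
y_10 = [26,26,28,19,21,17,16,19,18,20,20,19,22,23,17,20,21,20,22,15,11,15,5,13,17,10,11,13,12,13,6]
x_3 = range(1, 32)
x_10 = range(51, 82)

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
# s = marker area in points^2, c = color, marker = shape, alpha = transparency
march = ax.scatter(x_3, y_3, s=40, c="tab:orange", marker="^", label="March")
october = ax.scatter(x_10, y_10, s=40, c="tab:blue", alpha=0.6, label="October")
ax.legend(loc="upper left")
plt.close(fig)
```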

## Bar chart

### Draw a vertical bar chart

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

# Movie titles and their box office (in units of 100 million yuan)
a = ["Wolf Warrior 2", "The Fate of the Furious", "Kung Fu Yoga",
     "Journey to the West: The Demons Strike Back", "Transformers: The Last Knight", "Dangal",
     "Pirates of the Caribbean: Dead Men Tell No Tales", "Kong: Skull Island",
     "xXx: Return of Xander Cage", "Resident Evil: The Final Chapter", "Duckweed",
     "Despicable Me 3", "The Taking of Tiger Mountain", "Buddies in India", "Logan",
     "Spider-Man: Homecoming", "Wu Kong", "Guardians of the Galaxy Vol. 2",
     "Some Like It Hot", "The Mummy"]

b = [56.01,26.94,17.53,16.49,15.45,12.96,11.8,11.61,11.28,11.12,10.49,10.3,8.75,7.55,7.32,6.99,6.88,6.86,6.58,6.23]

# Set the figure size
plt.figure(figsize=(20, 15), dpi=80)
# Draw the bar chart: bar i is placed at x = i
plt.bar(range(len(a)), b)
# Replace the numeric x ticks with the movie titles
plt.xticks(range(len(a)), a, fontproperties=my_font, rotation=90)

plt.savefig('./movie.png')

plt.show()
```
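To show the exact value on top of each bar, `Axes.bar_label` (available in matplotlib 3.4 or newer) annotates a bar container in one call. A minimal sketch using the first five values of list b; the "No. n" rank labels are placeholders standing in for the titles in list a.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

values = [56.01, 26.94, 17.53, 16.49, 15.45]   # first five box-office values from list b
positions = range(len(values))

fig, ax = plt.subplots(figsize=(10, 6), dpi=80)
bars = ax.bar(positions, values, width=0.6)
# bar_label writes each bar's value just above its top edge (padding in points)
texts = ax.bar_label(bars, fmt="%.2f", padding=3)
ax.set_xticks(list(positions))
ax.set_xticklabels([f"No. {i + 1}" for i in positions])  # placeholder tick labels
plt.close(fig)
```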

### Draw a horizontal bar chart

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

a = ["Wolf Warrior 2", "The Fate of the Furious", "Kung Fu Yoga",
     "Journey to the West: The Demons Strike Back", "Transformers: The Last Knight", "Dangal",
     "Pirates of the Caribbean: Dead Men Tell No Tales", "Kong: Skull Island",
     "xXx: Return of Xander Cage", "Resident Evil: The Final Chapter", "Duckweed",
     "Despicable Me 3", "The Taking of Tiger Mountain", "Buddies in India", "Logan",
     "Spider-Man: Homecoming", "Wu Kong", "Guardians of the Galaxy Vol. 2",
     "Some Like It Hot", "The Mummy"]

b = [56.01,26.94,17.53,16.49,15.45,12.96,11.8,11.61,11.28,11.12,10.49,10.3,8.75,7.55,7.32,6.99,6.88,6.86,6.58,6.23]

# Set the figure size
plt.figure(figsize=(20, 15), dpi=80)
# Draw a horizontal bar chart; for barh, height is the thickness of each bar
plt.barh(range(len(a)), b, height=0.3, color='orange')
# Use the movie titles as y-axis tick labels
plt.yticks(range(len(a)), a, fontproperties=my_font)

plt.grid(alpha=0.3)

plt.savefig('./movie_barh.png')

plt.show()
```
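`barh` places position 0 at the bottom, so with data sorted in descending order the top seller ends up last. Inverting the y-axis lists it first. A minimal sketch with the first five values of list b and placeholder rank labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

values = [56.01, 26.94, 17.53, 16.49, 15.45]   # first five box-office values from list b
labels = [f"No. {i + 1}" for i in range(len(values))]  # placeholder tick labels

fig, ax = plt.subplots(figsize=(10, 6), dpi=80)
bars = ax.barh(range(len(values)), values, height=0.3, color="orange")
ax.set_yticks(range(len(values)))
ax.set_yticklabels(labels)
# Flip the y-axis so position 0 (the largest value) is drawn at the top
ax.invert_yaxis()
plt.close(fig)
```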

### Draw several bar series side by side (grouped bar chart)

Case

Suppose you know the box office of the movies in list a on 2017-09-14 (b_14), 2017-09-15 (b_15), and 2017-09-16 (b_16). To show each movie's box office on these days and compare it with the other movies, how can the data be presented most intuitively?
a = ["War for the Planet of the Apes", "Dunkirk", "Spider-Man: Homecoming", "Wolf Warrior 2"]
b_16 = [15746,312,4497,319]
b_15 = [12357,156,2045,168]
b_14 = [2358,399,2358,362]
Data source: http://www.cbooo.cn/movieday

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

a = ["War for the Planet of the Apes", "Dunkirk", "Spider-Man: Homecoming", "Wolf Warrior 2"]
b_16 = [15746, 312, 4497, 319]
b_15 = [12357, 156, 2045, 168]
b_14 = [2358, 399, 2358, 362]

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# Bar positions: each day's bars are shifted right by one bar width
bar_width = 0.2
x_14 = list(range(len(a)))
x_15 = [i + bar_width for i in x_14]
x_16 = [i + bar_width * 2 for i in x_14]

# Draw the three bar series
plt.bar(x_14, b_14, width=bar_width, label='Sep 14')
plt.bar(x_15, b_15, width=bar_width, label='Sep 15')
plt.bar(x_16, b_16, width=bar_width, label='Sep 16')

# Set the legend
plt.legend(prop=my_font)

# Put each movie title under the middle bar of its group
plt.xticks(x_15, a, fontproperties=my_font)

plt.savefig('./movie_grouped.png')

plt.show()
```
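The manual offsets above work for three series. A small helper can compute centered positions for any number of series, so the ticks sit at the whole numbers 0, 1, 2, ... in the middle of each group. `group_offsets` is a hypothetical helper name, and "Movie 1" to "Movie 4" are placeholders for the titles in list a.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

def group_offsets(n_groups, n_series, bar_width):
    """Return one list of bar x-positions per series, centering each group on 0, 1, 2, ..."""
    start = -bar_width * (n_series - 1) / 2   # offset of the leftmost bar from the group center
    return [[g + start + s * bar_width for g in range(n_groups)] for s in range(n_series)]

days = {
    "Sep 14": [2358, 399, 2358, 362],
    "Sep 15": [12357, 156, 2045, 168],
    "Sep 16": [15746, 312, 4497, 319],
}
bar_width = 0.2
positions = group_offsets(4, len(days), bar_width)   # 4 movies, one list per day

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
for xs, (label, values) in zip(positions, days.items()):
    ax.bar(xs, values, width=bar_width, label=label)
ax.set_xticks(range(4))   # ticks at the group centers
ax.set_xticklabels(["Movie 1", "Movie 2", "Movie 3", "Movie 4"])  # placeholders for list a
ax.legend()
plt.close(fig)
```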

## Histogram

Case

Suppose you have the running times of 250 movies (list a) and want to summarize how they are distributed, for example how many movies last between 100 and 120 minutes and how often each duration range occurs. How should these data be presented?
a=[131, 98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115, 99, 136, 126, 134, 95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117, 86, 95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123, 86, 101, 99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140, 83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144, 83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137, 92,121, 112, 146, 97, 137, 105, 98, 117, 112, 81, 97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112, 83, 94, 146, 133, 101,131, 116, 111, 84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

a=[131,  98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115,  99, 136, 126, 134,  95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117,  86,  95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123,  86, 101,  99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140,  83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144,  83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137,  92,121, 112, 146,  97, 137, 105,  98, 117, 112,  81,  97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112,  83,  94, 146, 133, 101,131, 116, 111,  84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]

# Calculate the number of bins, with a bin width of d = 3 minutes
d = 3
num_bins = (max(a) - min(a)) // d   # number of bins
print(max(a), min(a), max(a) - min(a))
print(num_bins)

# Set the figure size
plt.figure(figsize=(20, 8), dpi=80)

# Draw the histogram
plt.hist(a, num_bins)
# plt.hist(a, num_bins, density=True)   # density=True scales the bars so their total area is 1;
#                                       # the older normed parameter was deprecated and later removed

# Set the x-axis ticks at the bin edges (they line up only if max(a) - min(a) is a multiple of d)
plt.xticks(range(min(a), max(a) + d, d))

plt.grid()

plt.show()

```
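`plt.hist` returns the counts and bin edges it computed, which makes its binning easy to inspect. With an integer `bins`, matplotlib creates that many equal-width bins from min to max, so the width is exactly d only when the range divides evenly; passing explicit edges guarantees width d. A minimal sketch on a small hypothetical list (not the 250-movie data).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

data = [0, 1, 1, 2, 4, 5, 10]   # hypothetical durations, not the 250-movie list
d = 3

fig, ax = plt.subplots()
# bins=<int>: that many equal-width bins from min to max; here each is 10/3 wide, not 3
counts, edges, _ = ax.hist(data, (max(data) - min(data)) // d)

# bins=<sequence>: explicit edges, so every bin is exactly d wide and matches the ticks
edges_wanted = list(range(min(data), max(data) + d, d))   # [0, 3, 6, 9, 12]
counts_fixed, edges_fixed, _ = ax.hist(data, bins=edges_wanted)

# density=True: heights are scaled so that sum(height * bin width) == 1
dens, _, _ = ax.hist(data, bins=edges_wanted, density=True)
plt.close(fig)
```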

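Case: the data below has already been counted. interval holds the start of each interval, width its width, and quantity the number of observations in it.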

interval = [0,5,10,15,20,25,30,35,40,45,60,90]
width = [5,5,5,5,5,5,5,5,5,15,30,60]
quantity = [836,2737,3723,3926,3596,1438,3273,642,824,613,215,47]
Data source: https://en.wikipedia.org/wiki/Histogram

```python
from matplotlib import pyplot as plt
from matplotlib import font_manager

my_font = font_manager.FontProperties(fname=r'D:\project\winter vacation python2\msyh.ttc')

interval = [0,5,10,15,20,25,30,35,40,45,60,90]
width = [5,5,5,5,5,5,5,5,5,15,30,60]
quantity = [836,2737,3723,3926,3596,1438,3273,642,824,613,215,47]

#Set the figure size
plt.figure(figsize=(20,8),dpi=80)

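# Draw the "histogram" as a bar chart: adjacent bars of equal width (width=1), one per interval
plt.bar(range(len(quantity)), quantity, width=1)

# Put the x ticks on the bar edges and label them with the interval boundaries
_x = [i - 0.5 for i in range(13)]
_xtick_label = interval + [150]   # 12 interval starts plus the final end point, 90 + 60 = 150
plt.xticks(_x, _xtick_label)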

plt.grid(alpha=0.4)
plt.show()

```
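Drawing every interval with the same bar width, as above, hides the fact that the last intervals are 15, 30, and 60 wide. A sketch of a width-true alternative: place each bar at its interval start with its real width (`align="edge"`) and plot count divided by width, so each bar's area is proportional to its count. This density-style variant is a standard histogram convention, not the original author's approach.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

interval = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 60, 90]
width = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]
quantity = [836, 2737, 3723, 3926, 3596, 1438, 3273, 642, 824, 613, 215, 47]

# Divide each count by its interval width so bar AREA, not height, is proportional to the count
heights = [q / w for q, w in zip(quantity, width)]

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
# align="edge": each bar starts at its interval's left boundary and spans its real width
bars = ax.bar(interval, heights, width=width, align="edge", edgecolor="black")
ax.set_xticks(interval + [interval[-1] + width[-1]])   # boundaries 0, 5, ..., 90, 150
plt.close(fig)
```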

The data in this case has already been counted, so to get the effect of a histogram we draw a bar chart instead.

In general, plt.hist is meant for raw data that has not been counted yet; it does the counting itself.
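The difference can be seen by drawing the same small hypothetical data set both ways: `hist` counts the raw values itself, while for the bar version the values are counted first with `numpy.histogram` and the counts drawn as adjacent bars. Both panels end up with identical bars.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

raw = [0, 1, 1, 2, 4, 5, 10]   # hypothetical raw (uncounted) values
edges = [0, 3, 6, 9, 12]

fig, (left, right) = plt.subplots(1, 2)
# Raw data: hist does the counting
hist_counts, _, _ = left.hist(raw, bins=edges)
# Pre-counted data: count first, then draw the counts as adjacent bars
counts, bin_edges = np.histogram(raw, bins=edges)
bars = right.bar(bin_edges[:-1], counts, width=np.diff(bin_edges), align="edge")
plt.close(fig)
```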

## Summary

1. Choosing which kind of chart to use for the data
2. plt.plot(x, y) (plt is matplotlib.pyplot)
3. plt.bar(x, y)
4. plt.scatter(x, y)
5. plt.hist(data, bins, density)
6. Setting xticks and yticks
7. Axis labels, titles, and grid settings
8. Figure size and saving images
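The four plotting functions in the summary can be compared side by side with `plt.subplots`, which returns a grid of `Axes`. A minimal sketch with toy values chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 5]   # toy values for illustration only

fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=80)
axes[0, 0].plot(x, y)      # line chart: trend over an ordered x
axes[0, 0].set_title("plot")
axes[0, 1].scatter(x, y)   # scatter plot: relationship between two variables
axes[0, 1].set_title("scatter")
axes[1, 0].bar(x, y)       # bar chart: compare counted or categorical values
axes[1, 0].set_title("bar")
axes[1, 1].hist([1, 2, 2, 3, 3, 3, 4], bins=[1, 2, 3, 4])  # histogram: distribution of raw data
axes[1, 1].set_title("hist")
for ax in axes.flat:
    ax.grid(alpha=0.3)
fig.tight_layout()         # reduce overlap between titles and tick labels
plt.close(fig)
```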

matplotlib supports many more chart types. For other needs, see the gallery:
http://matplotlib.org/gallery/index.html

In practice, charts are often drawn on the front end with JavaScript.
Drawing examples: https://echarts.apache.org/examples/zh/index.html

More plotting tools
plotly: a popular visualization library on GitHub; simpler to use and better-looking than matplotlib, and compatible with matplotlib and pandas

Usage: simple; just follow the documentation