This was my first attempt at processing time series data with Python, and I ran into a few pitfalls along the way. I hope those who come after me can avoid the same detours.
Background: I use an existing CSV data table as the raw material for processing.
Objective: to visualize a time series and its periodicity.
1. The first pitfall is importing the time data. By default, the imported values are strings, so when you visualize them the points may not be drawn in chronological order.
The strings therefore need to be parsed into a datetime type.
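One way this ordering problem can show up is when dates are not zero-padded: text sorts character by character, not by calendar. The sketch below uses made-up date strings (they are not from my data table) and the dateutil parser to compare the two orderings.

```python
from dateutil.parser import parse

# Hypothetical, non-zero-padded date strings (assumed for illustration only).
date_strings = ["2018/9/2", "2018/10/1", "2018/9/10"]

# Sorting the raw text compares characters, so "10" lands before "9".
sorted_as_text = sorted(date_strings)

# Sorting by the parsed datetime gives the calendar order.
sorted_as_dates = sorted(date_strings, key=parse)

print(sorted_as_text)
print(sorted_as_dates)
```

Once the values are real datetime objects, ordering and plotting no longer depend on how the text happened to be written.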
Method 1: pass parse_dates=True when reading the data. Together with index_col, this makes pandas parse the index column as dates automatically.
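A minimal sketch of Method 1, assuming a tiny in-memory CSV in place of the real file (the column names match my table, but the two rows and their values are made up). It reads the same text with and without parse_dates=True and compares the resulting index types.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for the real file.
csv_text = "date,week,Sales\n2018-08-01,Wed,100\n2018-08-02,Thu,200\n"

# Without parse_dates the index holds plain strings.
as_strings = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With parse_dates=True the index column is parsed into datetimes.
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(as_strings.index))
print(type(as_dates.index))
```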
Method 2: use dateutil's parser.parse to parse a time string:
    from dateutil.parser import parse
    v1 = parse('2018-09-02')
    print("The parsed time format is:", v1)
    # v1 here is a datetime object. If you need to convert it to str, use strftime:
    print(v1.strftime('%Y-%m-%d %H:%M:%S'))
Method 3: use pandas' to_datetime to process a list of time strings:
    import pandas as pd
    datestrs = ['2018/09/02', '2018/09/03', '2018/09/04']
    print(pd.to_datetime(datestrs))
2. The second pitfall is processing the numerical data. When pandas imports the column, its dtype defaults to object. The column then has to be cast to a numeric type explicitly, but at first the cast kept failing.
The error was: ValueError: could not convert string to float
It took me a long time to find the reason: the values contained spaces or "," characters, so the strings could not be converted to numbers.
Solution: chain .replace(' ', '').replace(',', '') on each value, which strips the spaces and deletes the "," characters.
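The fix above can be sketched on a whole column at once with the pandas string methods. The three sample values are assumed for illustration (same shape as my Sales column, with stray spaces added).

```python
import pandas as pd

# Hypothetical sample values with thousands separators and stray spaces.
sales = pd.Series(["4,702,986", " 5,034,151", "5,636,981 "])

# Strip spaces and commas as literal text, then cast to integers.
cleaned = (
    sales.str.replace(" ", "", regex=False)
         .str.replace(",", "", regex=False)
         .astype(int)
)

print(cleaned.tolist())
print(cleaned.dtype)
```

pandas' read_csv also has a thousands parameter that may take care of the comma case while loading, although I have not checked how it behaves with stray spaces.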
3. The plotting that follows is straightforward. The only thing worth mentioning is the periodicity chart.
I plot by week, so the period is fixed at 7 days. See the code below for the implementation.
4. In addition, encoding='gbk' should be set when reading the file. The default is utf-8, but reading my file with the default raised an error.
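A sketch of why the encoding matters, assuming the file holds Chinese text saved as GBK (the row below is made up, and the weekday label is an assumption about what such a file might contain). The bytes are not valid UTF-8, while reading them as GBK works.

```python
import io
import pandas as pd

# Hypothetical file content: one row with a Chinese weekday label, saved as GBK.
raw = "date,week,Sales\n2018-08-01,周三,4702986\n".encode("gbk")

# The same bytes are not valid UTF-8, which is what the default decoding trips over.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Telling pandas the real encoding lets the read succeed.
df = pd.read_csv(io.BytesIO(raw), encoding="gbk", parse_dates=True, index_col=0)

print(decodes_as_utf8)
print(df)
```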
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    # Author: Leslie Dang

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # 01 Import data from file
    data1 = pd.read_csv('01series.csv', parse_dates=True, index_col=0, encoding='gbk')
    print(data1)
    # print(type(data1.index))
    print(data1.dtypes)

    # 02 Cast data type
    print('***02 Cast data type***')

    # ValueError: could not convert string to float
    # Reason: the data most likely contains \t, spaces, or ","
    # Solution: .replace(' ', '').replace(',', '')

    # Clean the whole column at once. Assigning cell by cell with
    # data1['Sales'][i] = ... is chained assignment and may not modify data1.
    data1['Sales'] = data1['Sales'].str.replace(' ', '').str.replace(',', '')
    data1['Sales'] = data1['Sales'].astype(int)
    print(data1.dtypes)

    # 03 Plotting - line chart
    print('***03 plotting***')
    # plt.plot(data1['Sales'], label='Sales')
    # plt.show()

    # 04 Plotting - periodicity analysis chart
    print('***04 plotting-Periodicity analysis chart***')

    data1 = data1.set_index('week')
    print(data1)

    count = data1['Sales'].count()
    circle = count // 7  # number of complete 7-day cycles
    print(count, circle)
    for i in range(circle):
        # One line per week, selected by position
        plt.plot(data1['Sales'].iloc[7*i:7*i+7])
    plt.show()

    # Thinking: how to quantify periodicity? What parameters can be used to express it? How strong is the periodicity?
Here is the data source I used:
               week      Sales
    date
    2018-08-01  Wed  4,702,986
    2018-08-02  Thu  5,034,151
    2018-08-03  Fri  5,636,981
    2018-08-04  Sat  6,377,764
    2018-08-05  Sun  6,138,548
    2018-08-06  Mon  5,335,273
    2018-08-07  Tue  5,055,513
    2018-08-08  Wed  5,159,413
    2018-08-09  Thu  5,393,767
    2018-08-10  Fri  5,920,339
    2018-08-11  Sat  6,637,867
    2018-08-12  Sun  6,292,839
    2018-08-13  Mon  5,485,055
    2018-08-14  Tue  5,274,536
    2018-08-15  Wed  5,171,561
    2018-08-16  Thu  5,269,780
    2018-08-17  Fri  5,359,121
    2018-08-18  Sat  6,353,952
    2018-08-19  Sun  6,334,198
    2018-08-20  Mon  5,577,552
    2018-08-21  Tue  5,276,165
    2018-08-22  Wed  5,403,919
    2018-08-23  Thu  5,611,874
    2018-08-24  Fri  6,073,795
    2018-08-25  Sat  6,754,291
    2018-08-26  Sun  6,333,426
    2018-08-27  Mon  5,570,875
    2018-08-28  Tue  5,327,305
    2018-08-29  Wed  5,425,794
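The script ends with an open question about how to quantify periodicity. One possible approach, which is not part of my original script, is lag autocorrelation: correlate the series with a copy of itself shifted by a number of days and see which shift matches best. The sketch below uses the Sales values from the table above and pandas' Series.autocorr. Treat it as a starting point, not a definitive measure.

```python
import pandas as pd

# Daily Sales values from the data table (2018-08-01 through 2018-08-29).
sales = pd.Series([
    4702986, 5034151, 5636981, 6377764, 6138548, 5335273, 5055513,
    5159413, 5393767, 5920339, 6637867, 6292839, 5485055, 5274536,
    5171561, 5269780, 5359121, 6353952, 6334198, 5577552, 5276165,
    5403919, 5611874, 6073795, 6754291, 6333426, 5570875, 5327305,
    5425794,
])

# Correlation of the series with itself shifted by `lag` days.
# A value close to 1 at lag 7 would suggest a strong weekly cycle.
autocorr_by_lag = {lag: sales.autocorr(lag=lag) for lag in range(1, 11)}

for lag, value in autocorr_by_lag.items():
    print(lag, round(value, 3))
```

With only 29 days of data the estimates are rough, so comparing the lag-7 value against the other lags is more informative than reading any single number on its own.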