Processing time series data with Python

This was my first attempt at processing time series data with Python, and I ran into a few pitfalls. I hope those who come after me can avoid the same detours.

 

Background note: I use an existing csv data table as raw material for processing.

Objective: visualize the time series and its periodicity.

1. The first pitfall was importing the time data. By default the dates are read in as strings, so the plot is not necessarily drawn in chronological order.

The strings therefore need to be parsed into a datetime type.

Method 1: pass parse_dates=True when reading the data, which parses the time data automatically. With parse_dates=True pandas tries to parse the index, so it is used together with index_col.
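As a minimal sketch of Method 1, the snippet below reads a small in-memory CSV twice, once without and once with parse_dates=True. The three rows are taken from the data listed at the end of this post; the raw CSV layout (quoted Sales values) and the variable names are my assumptions, not the actual file.

```python
import io
import pandas as pd

# Assumed raw layout of the CSV; only the values come from the post's data.
csv_text = (
    'date,week,Sales\n'
    '2018-08-01,Wed,"4,702,986 "\n'
    '2018-08-02,Thu,"5,034,151 "\n'
    '2018-08-03,Fri,"5,636,981 "\n'
)

# Without parse_dates the index holds plain strings.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(type(raw.index))

# With parse_dates=True pandas tries to parse the index as dates.
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(parsed.index))
```

The second read should give a DatetimeIndex, which is what lets the plot follow the time axis.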

Method 2: use parse from dateutil.parser to parse a time string:

from dateutil.parser import parse
v1 = parse('2018-09-02')
print("The parsed time format is:",v1)
# v1 here is a datetime.datetime object. To convert it back to str, use strftime as below
print(v1.strftime('%Y-%m-%d %H:%M:%S'))

Method 3: use pandas' to_datetime to process a list of time strings:

import pandas as pd
datestrs = ['2018/09/02','2018/09/03','2018/09/04']
print(pd.to_datetime(datestrs))

2. The second pitfall was processing the numeric data. When pandas imported the column, its dtype defaulted to object. The column has to be cast explicitly, but at first the conversion kept failing.

The error was: ValueError: could not convert string to float
It took me a long time to find the cause: the values contained spaces and "," separators, so the strings could not be converted to a number.
Solution: call .replace(' ', '').replace(',', '') on each value to remove the spaces and delete the ",".
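The fix can be sketched on a few values from the data at the end of this post. This version uses the vectorized .str accessor instead of a loop; the variable names are my own.

```python
import pandas as pd

# Raw values as they appear in the data: thousands separators plus a trailing space.
sales = pd.Series(['4,702,986 ', '5,034,151 ', '5,636,981 '])

# Remove spaces and commas first, then cast to int.
cleaned = (sales.str.replace(' ', '', regex=False)
                .str.replace(',', '', regex=False)
                .astype(int))

print(cleaned.tolist())  # [4702986, 5034151, 5636981]
```

Calling astype(int) on the uncleaned strings is what raises the ValueError.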

3. The plotting that follows is straightforward. The only thing worth mentioning is the periodicity chart.

I plot by "week" with a fixed period of 7 days, drawing one line per week. See the code below for the implementation.

4. One more thing: encoding='gbk' had to be set when reading the file. The default is utf-8, which raised an error with my file.
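If you do not know the encoding in advance, one option is to try utf-8 first and fall back to gbk. The sketch below is only an illustration: the helper read_csv_with_fallback, the sample file, and its Chinese column names are hypothetical, and it assumes the utf-8 failure shows up as a UnicodeDecodeError.

```python
import os
import tempfile
import pandas as pd

def read_csv_with_fallback(path, encodings=('utf-8', 'gbk'), **kwargs):
    """Hypothetical helper: try each encoding in turn until one decodes."""
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the encodings could decode ' + path)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file saved as GBK, with non-ASCII column names.
    path = os.path.join(tmp, 'sample_gbk.csv')
    with open(path, 'w', encoding='gbk') as f:
        f.write('日期,销量\n2018-08-01,4702986\n')

    df = read_csv_with_fallback(path)

print(df)
```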

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: Leslie Dang

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 01 import data from file
data1 = pd.read_csv('01series.csv', parse_dates=True, index_col=0, encoding='gbk')
print(data1)
# print(type(data1.index))
print(data1.dtypes)

# 02 cast data type
print('***02 Cast data type***')

# ValueError: could not convert string to float
# Reason: the data most likely contains \t, spaces, or ","
# Solution: .replace(' ', '').replace(',', '')

# Vectorized cleanup of the whole column; this avoids the
# element-by-element chained assignment data1['Sales'][i] = ...
data1['Sales'] = (data1['Sales'].str.replace(' ', '', regex=False)
                                .str.replace(',', '', regex=False))

data1['Sales'] = data1['Sales'].astype(int)
print(data1.dtypes)

# 03 plotting - line chart
print('***03 plotting***')
# plt.plot(data1['Sales'], label='Sales')
# plt.show()

# 04 plotting - periodicity analysis chart
print('***04 plotting-Periodicity analysis chart***')

data1 = data1.set_index('week')
print(data1)

count = data1['Sales'].count()
circle = count // 7  # number of complete 7-day periods
print(count, circle)
for i in range(circle):
    # One line per complete week; .iloc slices by position
    plt.plot(data1['Sales'].iloc[7*i:7*i+7])
plt.show()

# Thinking: how to quantify periodicity? What parameters can be used to express it? How strong is the periodicity?
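One possible answer to the closing question, which is my suggestion rather than part of the original analysis, is the autocorrelation at a given lag: if the series repeats every 7 days, the correlation between the series and itself shifted by 7 days should stand out. The sketch below uses pandas' Series.autocorr on the 29 daily values listed in this post; the variable names are my own.

```python
import pandas as pd

# Daily sales from the table in this post, 2018-08-01 through 2018-08-29.
values = [4702986, 5034151, 5636981, 6377764, 6138548, 5335273, 5055513,
          5159413, 5393767, 5920339, 6637867, 6292839, 5485055, 5274536,
          5171561, 5269780, 5359121, 6353952, 6334198, 5577552, 5276165,
          5403919, 5611874, 6073795, 6754291, 6333426, 5570875, 5327305,
          5425794]
sales = pd.Series(
    values,
    index=pd.date_range('2018-08-01', periods=len(values), freq='D'),
)

# Pearson correlation between the series and a copy shifted by `lag` days.
for lag in range(1, 8):
    print(lag, round(sales.autocorr(lag=lag), 3))

r3 = sales.autocorr(lag=3)
r7 = sales.autocorr(lag=7)
```

A value of r7 close to 1, clearly above the values at other lags, would point to a strong weekly cycle. With only four weeks of data this is a rough indicator, not a statistical test.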

For reference, here is the data I used:

            week    Sales
 date                         
2018-08-01  Wed  4,702,986 
2018-08-02  Thu  5,034,151 
2018-08-03  Fri  5,636,981 
2018-08-04  Sat  6,377,764 
2018-08-05  Sun  6,138,548 
2018-08-06  Mon  5,335,273 
2018-08-07  Tue  5,055,513 
2018-08-08  Wed  5,159,413 
2018-08-09  Thu  5,393,767 
2018-08-10  Fri  5,920,339 
2018-08-11  Sat  6,637,867 
2018-08-12  Sun  6,292,839 
2018-08-13  Mon  5,485,055 
2018-08-14  Tue  5,274,536 
2018-08-15  Wed  5,171,561 
2018-08-16  Thu  5,269,780 
2018-08-17  Fri  5,359,121 
2018-08-18  Sat  6,353,952 
2018-08-19  Sun  6,334,198 
2018-08-20  Mon  5,577,552 
2018-08-21  Tue  5,276,165 
2018-08-22  Wed  5,403,919 
2018-08-23  Thu  5,611,874 
2018-08-24  Fri  6,073,795 
2018-08-25  Sat  6,754,291 
2018-08-26  Sun  6,333,426 
2018-08-27  Mon  5,570,875 
2018-08-28  Tue  5,327,305 
2018-08-29  Wed  5,425,794 

 

Posted by dmphotography on Wed, 01 Jun 2022 02:37:33 +0530