20212329 Experiment IV, Python Programming: Python Comprehensive Practice Report

 

 


Course: Python Programming

Class: 2123
Name: chenpengyu
Student No.: 20212329
Experiment date: May 28, 2022
Compulsory / elective course: Public Elective Course

1, Experiment content

2, Achievement forecast

3, Project process

4, Problems and Solutions

5, Actual effect

6, Course summary

7, Thoughts and suggestions

 

1, Experiment content

Comprehensive applications of Python: web crawlers, data processing, visualization, machine learning, neural networks, games, network security, etc.
Class representatives and team leaders collect the homework (source code, video, comprehensive practice report).
Note: programming is done on a Huawei ECS server (openEuler system) and a physical machine (Windows/Linux system) using Vim, PDB, IDLE, PyCharm, and other tools.


Grading: note that this experiment does not count toward the experiment total. Each of the first three experiments is worth 10 points, 30 points in total. This practice counts as the comprehensive practice and is worth 25 points.
Scoring criteria:
(1) The program can run with rich functions. (it is required to submit the source code, and it is recommended to record the video of the program running) 10 points
(2) The comprehensive practice report should reflect the experimental analysis, design, implementation process, results and other information, with standardized format, clear logic and reasonable structure. 10 points.
(3) In the practice report, it is necessary to summarize the whole course and write the course feelings, opinions and suggestions. 5 points
(4) 10 points will be deducted for this practice if Huawei Cloud services (either ECS or MindSpore) are not used.

2, Project background

Securities analysts with a deep understanding of the stock market predict its future direction and the size of its rises and falls from how the market has developed so far. Such a prediction holds only under the assumption that the influencing factors stay within established preconditions.

 

 

3, Project process

 

Machine learning has an important application in stock price forecasting. In this machine learning project, we will discuss forecasting stock returns. There is uncertainty in this very complex task.

Crawl stock history

1 Configure the Python environment, log in to baostock, and query the historical K-line data of the stock matching the chosen code: date, code, opening price, daily high, daily low, closing price, previous day's closing price, trading volume, turnover, and other fields.

2 Collect the returned rows for the selected stock, output the result set to a CSV file and an XLSX file, and log out of the system.

3 Divide the data into a training set and a test set, with the training set much larger than the test set. (Training set: used to train the model or determine its parameters. Validation set: used for model selection. Test set: used to measure the performance of the finally selected model. A validation set is optional; for a small data set, a training set and a test set are enough.) Set the start date and end date of each set; an end date later than the latest available data is cut off automatically.

 

import baostock
import pandas
import openpyxl  # used by pandas to write and read .xlsx files
from sklearn import linear_model
import numpy
import matplotlib.pyplot
from sklearn.metrics import r2_score

stock = 'sh.600519'  # stock code to query; change as needed

# Log in once and keep the response for both messages.
login_result = baostock.login()
print('login respond error_code:' + login_result.error_code)
print('login respond  error_msg:' + login_result.error_msg)

# Training set: daily K-line data for 2020.
bqh = baostock.query_history_k_data_plus(stock,
    "date,code,open,high,low,close,preclose,volume,amount,turn,pctChg",
    start_date='2020-01-01', end_date='2021-01-01')
print('query_history error_code:' + bqh.error_code)
print('query_history  error_msg:' + bqh.error_msg)
rows = []
while bqh.error_code == '0' and bqh.next():
    rows.append(bqh.get_row_data())
result = pandas.DataFrame(rows, columns=bqh.fields)
result.to_csv("Training set.csv")
result.to_excel("Training set.xlsx")

# Test set: an end date beyond the available data is cut off at the latest date.
bqh = baostock.query_history_k_data_plus(stock,
    "date,code,open,high,low,close,preclose,volume,amount,turn,pctChg",
    start_date='2022-03-01', end_date='2029-07-20')
print('query_history error_code:' + bqh.error_code)
print('query_history  error_msg:' + bqh.error_msg)
rows = []
while bqh.error_code == '0' and bqh.next():
    rows.append(bqh.get_row_data())
result = pandas.DataFrame(rows, columns=bqh.fields)
result.to_csv("Test set.csv")
result.to_excel("Test set.xlsx")

baostock.logout()
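As a rough sketch of the split described in step 3, the rows of a single download could also be divided at a cutoff date instead of being fetched with two queries. The function name `split_by_date`, the cutoff, and the sample rows are made up for illustration and are not part of the report's program.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Split (ISO date string, value) rows into training and test lists.

    Rows dated strictly before `cutoff` go to training; the rest go to test.
    """
    cutoff_day = date.fromisoformat(cutoff)
    train = [r for r in rows if date.fromisoformat(r[0]) < cutoff_day]
    test = [r for r in rows if date.fromisoformat(r[0]) >= cutoff_day]
    return train, test


# Made-up sample rows, for illustration only.
rows = [
    ("2020-01-02", 10.0),
    ("2020-06-15", 11.5),
    ("2020-12-31", 12.0),
    ("2022-03-01", 13.2),
    ("2022-03-02", 13.0),
]
train, test = split_by_date(rows, "2021-01-01")
print(len(train), len(test))  # 3 2
```

Splitting on time rather than at random keeps every test row later than every training row, which matches how the report uses 2020 data for training and 2022 data for testing.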

 

 

Processing data

4 Process the data in Excel: delete the "-" in each date to turn it into a pure number that can be used for regression. Renumbering the rows in sequence is an alternative, but it leads to excessive error between the date values of the training set and the test set.

5 Read the Excel data into lists, one list per column. Note that reading a column directly does not give a usable result:

 

# Each call returns a one-column DataFrame, not a list.
date1 = pandas.read_excel('Training set.xlsx', usecols='B')
open1 = pandas.read_excel('Training set.xlsx', usecols='D')
high1 = pandas.read_excel('Training set.xlsx', usecols='E')
low1 = pandas.read_excel('Training set.xlsx', usecols='F')
close1 = pandas.read_excel('Training set.xlsx', usecols='G')
preclose1 = pandas.read_excel('Training set.xlsx', usecols='H')
volume1 = pandas.read_excel('Training set.xlsx', usecols='I')
amount1 = pandas.read_excel('Training set.xlsx', usecols='J')
turn1 = pandas.read_excel('Training set.xlsx', usecols='K')
pctchg1 = pandas.read_excel('Training set.xlsx', usecols='L')

date2 = pandas.read_excel('Test set.xlsx', usecols='B')
open2 = pandas.read_excel('Test set.xlsx', usecols='D')
high2 = pandas.read_excel('Test set.xlsx', usecols='E')
low2 = pandas.read_excel('Test set.xlsx', usecols='F')
close2 = pandas.read_excel('Test set.xlsx', usecols='G')
preclose2 = pandas.read_excel('Test set.xlsx', usecols='H')
volume2 = pandas.read_excel('Test set.xlsx', usecols='I')
amount2 = pandas.read_excel('Test set.xlsx', usecols='J')
turn2 = pandas.read_excel('Test set.xlsx', usecols='K')
pctchg2 = pandas.read_excel('Test set.xlsx', usecols='L')

  

 

Read this way, each variable is a DataFrame rather than a list, so the regression cannot be run. Instead, a loop reads every row of each column into a list, ready for the scatter plot.

def read_column(path, col):
    """Read one column by position and return its values as a flat list."""
    pre = pandas.read_excel(path, usecols=[col])
    return [s_list[0] for s_list in pre.values.tolist()]


# Training set (column 0 is the index written by to_excel, column 2 is the code).
date1 = read_column("Training set.xlsx", 1)
open1 = read_column("Training set.xlsx", 3)
high1 = read_column("Training set.xlsx", 4)
low1 = read_column("Training set.xlsx", 5)
close1 = read_column("Training set.xlsx", 6)
preclose1 = read_column("Training set.xlsx", 7)
volume1 = read_column("Training set.xlsx", 8)
amount1 = read_column("Training set.xlsx", 9)
turn1 = read_column("Training set.xlsx", 10)
pctchg1 = read_column("Training set.xlsx", 11)

# Test set, same column layout.
date2 = read_column("Test set.xlsx", 1)
open2 = read_column("Test set.xlsx", 3)
high2 = read_column("Test set.xlsx", 4)
low2 = read_column("Test set.xlsx", 5)
close2 = read_column("Test set.xlsx", 6)
preclose2 = read_column("Test set.xlsx", 7)
volume2 = read_column("Test set.xlsx", 8)
amount2 = read_column("Test set.xlsx", 9)
turn2 = read_column("Test set.xlsx", 10)
pctchg2 = read_column("Test set.xlsx", 11)
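Step 4 above turns a date such as 2020-01-02 into the number 20200102. The sketch below, which assumes the dates arrive as ISO strings, shows that conversion and one thing to watch for: consecutive calendar days are not always 1 apart in this encoding. The helper names and the day-count alternative (`date.toordinal`) are illustrative assumptions, not what the report uses.

```python
from datetime import date


def to_yyyymmdd(text):
    """'2020-01-02' -> 20200102, the integer left after deleting the dashes."""
    return int(text.replace("-", ""))


def to_ordinal(text):
    """'2020-01-02' -> a running day count; consecutive days differ by 1."""
    return date.fromisoformat(text).toordinal()


last_day_of_jan = "2020-01-31"
first_day_of_feb = "2020-02-01"

gap_yyyymmdd = to_yyyymmdd(first_day_of_feb) - to_yyyymmdd(last_day_of_jan)
gap_ordinal = to_ordinal(first_day_of_feb) - to_ordinal(last_day_of_jan)
print(gap_yyyymmdd, gap_ordinal)  # 70 1
```

A line fitted against the YYYYMMDD number treats the step from January 31 to February 1 as 70 units, so the jump across a year boundary is larger still. That is worth keeping in mind when the test set starts more than a year after the training set ends.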

Fit a linear regression to each data series and compute its goodness of fit (the R² score). A series whose goodness of fit is higher than 0.8 is used; one lower than 0.1 is not.
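For reference, the goodness of fit is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it for a least-squares line in plain Python and then applies the 0.8 cutoff. The function names, the single threshold, and the data are illustrative assumptions; the report itself relies on `numpy.polyfit` and `sklearn.metrics.r2_score`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def keep_feature(xs, ys, threshold=0.8):
    """True when a straight line explains ys well enough to be used."""
    slope, intercept = fit_line(xs, ys)
    predicted = [slope * x + intercept for x in xs]
    return r_squared(ys, predicted) >= threshold


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
trending = [2.1, 3.9, 6.2, 7.8, 10.1]  # close to a straight line
noisy = [5.0, 1.0, 4.0, 2.0, 3.0]      # no clear trend
print(keep_feature(xs, trending), keep_feature(xs, noisy))  # True False
```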

Then there are two methods:

① Find the regression relationship between the date and each of the other parameters and predict the future values of those parameters. Then find the regression relationship between those parameters and the closing price, and use the predicted parameter values to predict the future closing price (several regressions chained together).

 

# Fit each training series against the date, plot the line over the
# scatter, and print the goodness of fit on the training data.
train_series = [open1, high1, low1, close1, preclose1,
                volume1, amount1, turn1, pctchg1]
for values in train_series:
    mymodel = numpy.poly1d(numpy.polyfit(date1, values, 1))
    myline = numpy.linspace(20200102, 20201231)
    matplotlib.pyplot.scatter(date1, values)
    matplotlib.pyplot.plot(myline, mymodel(myline))
    matplotlib.pyplot.show()
    print(r2_score(values, mymodel(date1)))

 

From the regression lines and the goodness of fit of the scatter plots, the opening price, daily high, daily low, closing price, and previous day's closing price each show an approximately linear relationship with the date, with a goodness of fit of about 0.9. Next, test the models on the test data for the same series to check whether they give the same result.

 

# Fit on the training series, then score each model on the test series.
series_pairs = [
    (open1, open2), (high1, high2), (low1, low2),
    (close1, close2), (preclose1, preclose2), (volume1, volume2),
    (amount1, amount2), (turn1, turn2), (pctchg1, pctchg2),
]
for train_values, test_values in series_pairs:
    mymodel = numpy.poly1d(numpy.polyfit(date1, train_values, 1))
    r2 = r2_score(test_values, mymodel(date2))
    print(r2)

      

 

The results show that the models also fit the test set, so they can be used to predict future values. Prediction of new values can begin.
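A check of this kind compares the score on the training data with the score on the held-out data. The sketch below uses made-up numbers in which the trend flattens after the training period, to show that a line can fit perfectly in-sample and still score below zero out of sample. The helper names and the data are assumptions for illustration and say nothing about the report's actual results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: a steady rise in training, a flat market afterwards.
train_x = [1, 2, 3, 4, 5]
train_y = [10.0, 11.0, 12.0, 13.0, 14.0]
test_x = [8, 9, 10]
test_y = [13.0, 12.5, 13.5]

slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_y, [slope * x + intercept for x in train_x])
test_r2 = r_squared(test_y, [slope * x + intercept for x in test_x])
print(train_r2, test_r2)
```

A test score close to the training score supports using the model for prediction; a much lower or negative test score suggests the training trend did not continue.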

Then use the same method to find the regression relationship between the other parameters and the closing price. The opening price, daily high, daily low, and previous day's closing price each fit a linear relationship with the closing price, with a goodness of fit of about 1. These models were then tested as well; the high goodness of fit indicates that they are good enough to predict with.

 

want_date = int(input())  # date to predict, as a YYYYMMDD integer

# Stage 1: fit each price series against the date, score it on the test
# set, and extrapolate it to the requested date.
mymodel = numpy.poly1d(numpy.polyfit(date1, open1, 1))
r2 = r2_score(open2, mymodel(date2))
future_open = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, high1, 1))
r2 = r2_score(high2, mymodel(date2))
future_high = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, low1, 1))
r2 = r2_score(low2, mymodel(date2))
future_low = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, close1, 1))
r2 = r2_score(close2, mymodel(date2))
future_close = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, preclose1, 1))
r2 = r2_score(preclose2, mymodel(date2))
future_preclose = mymodel(want_date)

# Stage 2: fit the closing price against each series, score the model on
# the test set (true closes vs. predicted closes), and feed in the
# stage 1 estimate.
mymodel = numpy.poly1d(numpy.polyfit(open1, close1, 1))
r2 = r2_score(close2, mymodel(open2))
future_close1 = mymodel(future_open)
mymodel = numpy.poly1d(numpy.polyfit(high1, close1, 1))
r2 = r2_score(close2, mymodel(high2))
future_close2 = mymodel(future_high)
mymodel = numpy.poly1d(numpy.polyfit(low1, close1, 1))
r2 = r2_score(close2, mymodel(low2))
future_close3 = mymodel(future_low)
mymodel = numpy.poly1d(numpy.polyfit(preclose1, close1, 1))
r2 = r2_score(close2, mymodel(preclose2))
future_close4 = mymodel(future_preclose)

# Average the direct estimate and the four chained estimates.
print((future_close + future_close1 + future_close2 + future_close3 + future_close4) / 5)
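The chaining used in method ① can be seen in isolation in the following sketch: one line maps the date to the opening price, a second maps the opening price to the closing price, and the prediction feeds the output of the first into the second. The data, the day numbering, and the helper names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up numbers: open = 2 * day + 8, close = open + 0.5.
day = [1, 2, 3, 4]
open_price = [10.0, 12.0, 14.0, 16.0]
close_price = [10.5, 12.5, 14.5, 16.5]

date_to_open = fit_line(day, open_price)          # stage 1
open_to_close = fit_line(open_price, close_price)  # stage 2

future_open = predict(date_to_open, 6)
future_close = predict(open_to_close, future_open)
print(future_open, future_close)  # 20.0 20.5
```

Because the second stage consumes an estimate rather than an observed value, any error in the first stage carries through to the final closing price.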

 

 

② In Excel, replace the previous day's closing price with tomorrow's closing price, leaving the cell in the last row empty (it is not included in the regression calculation). Then find the regression relationship between the other parameters and tomorrow's closing price, and predict from the known parameters. This avoids the problem of date differences.

The general steps are the same as in the first method. The parameters with a high goodness of fit to the future closing price are the opening price, daily high, daily low, and closing price, all at about 0.9. These four models are tested.
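The Excel edit in method ② amounts to pairing each day with the closing price of the following day. A minimal sketch of that pairing on a plain list is shown here; the function name and the prices are made up for illustration, and the report performs this step by hand in the spreadsheet.

```python
def next_day_pairs(closes):
    """Pair each day's close with the following day's close.

    The last day has no "tomorrow", so it is left out of both lists.
    """
    today = closes[:-1]
    tomorrow = closes[1:]
    return today, tomorrow


closes = [100.0, 101.5, 99.8, 102.3]  # made-up closing prices
today, tomorrow = next_day_pairs(closes)
print(list(zip(today, tomorrow)))
# [(100.0, 101.5), (101.5, 99.8), (99.8, 102.3)]
```

The same slicing applies to the opening, high, and low series: drop their last element so that each remaining row lines up with a known next-day close.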

 

 

# Column 7 now holds tomorrow's closing price (edited in Excel as described above).
pre = pandas.read_excel("Training set.xlsx", usecols=[7])
futclose1 = [s_list[0] for s_list in pre.values.tolist()]
# Assumes "Test set.xlsx" was edited in the same way.
pre = pandas.read_excel("Test set.xlsx", usecols=[7])
futclose2 = [s_list[0] for s_list in pre.values.tolist()]


def fit_without_gaps(x, y):
    """Degree-1 fit of y on x, skipping rows where y is empty (the last row)."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    keep = ~numpy.isnan(y)
    return numpy.poly1d(numpy.polyfit(x[keep], y[keep], 1))


def r2_without_gaps(model, x, y):
    """Goodness of fit of model on the rows where y is present."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    keep = ~numpy.isnan(y)
    return r2_score(y[keep], model(x[keep]))


# Plot each fit; uncomment the last two lines of the loop to show the
# figure and print the goodness of fit.
for feature, low_x, high_x in [(open1, 1500, 2600), (high1, 1600, 2600),
                               (low1, 1500, 2500), (close1, 1500, 2600)]:
    mymodel = fit_without_gaps(feature, futclose1)
    myline = numpy.linspace(low_x, high_x)
    matplotlib.pyplot.scatter(feature, futclose1)
    matplotlib.pyplot.plot(myline, mymodel(myline))
    #matplotlib.pyplot.show()
    #print(r2_without_gaps(mymodel, feature, futclose1))

openx = float(input("Enter today's opening price:\n"))
highx = float(input("Enter today's highest price:\n"))
lowx = float(input("Enter today's lowest price:\n"))
closex = float(input("Enter today's closing price:\n"))

# Fit each model, score it on the test set, and predict from today's value.
mymodel = fit_without_gaps(open1, futclose1)
r2 = r2_without_gaps(mymodel, open2, futclose2)
future_close1 = mymodel(openx)
mymodel = fit_without_gaps(high1, futclose1)
r2 = r2_without_gaps(mymodel, high2, futclose2)
future_close2 = mymodel(highx)
mymodel = fit_without_gaps(low1, futclose1)
r2 = r2_without_gaps(mymodel, low2, futclose2)
future_close3 = mymodel(lowx)
mymodel = fit_without_gaps(close1, futclose1)
r2 = r2_without_gaps(mymodel, close2, futclose2)
future_close4 = mymodel(closex)
print("It is predicted that the closing price tomorrow will be:")
print((future_close4 + future_close3 + future_close2 + future_close1) / 4)

Upload to Gitee (Code Cloud)

 

Running on Huawei cloud

      

 

 

Full code

  1 import baostock
  2 import pandas
  3 import openpyxl
  4 from sklearn import linear_model
  5 import numpy
  6 import matplotlib.pyplot
  7 from sklearn.metrics import r2_score
  8 
  9 stock='sh.600519' #Optional stock
 10 
 11 print('login respond error_code:'+baostock.login().error_code)
 12 print('login respond  error_msg:'+baostock.login().error_msg)
 13 
 14 bqh = baostock.query_history_k_data_plus(stock,
 15     "date,code,open,high,low,close,preclose,volume,amount,turn,pctChg",
 16     start_date='2020-01-01', end_date='2021-01-01')
 17 print('query_hushen300 error_code:'+bqh.error_code)
 18 print('query_hushen300  error_msg:'+bqh.error_msg)
 19 hushen300_stocks = []
 20 while (bqh.error_code == '0') & bqh.next():
 21 
 22     hushen300_stocks.append(bqh.get_row_data())
 23 result = pandas.DataFrame(hushen300_stocks, columns=bqh.fields)
 24 result.to_csv("Training set.csv")
 25 result.to_excel("Training set.xlsx")
 26 
 27 bqh = baostock.query_history_k_data_plus(stock,"date,code,open,high,low,close,preclose,volume,amount,turn,pctChg",
 28     start_date='2022-03-01', end_date='2029-7-20')#It will automatically expire on the latest date
 29 print('query_hushen300 error_code:'+bqh.error_code)
 30 print('query_hushen300  error_msg:'+bqh.error_msg)
 31 hushen300_stocks = []
 32 while (bqh.error_code == '0') & bqh.next():
 33 
 34     hushen300_stocks.append(bqh.get_row_data())
 35 result = pandas.DataFrame(hushen300_stocks, columns=bqh.fields)
 36 result.to_csv("Test set.csv")
 37 result.to_excel("Test set.xlsx")
 38 
 39 baostock.logout()
 40 
 41 
 42 
 43 
 44 pre = pandas.read_excel("Training set.xlsx", usecols=[1])
 45 pre_list = pre.values.tolist()
 46 date1 = []
 47 for s_list in pre_list:
 48     date1.append(s_list[0])
 49 pre = pandas.read_excel("Training set.xlsx", usecols=[3])
 50 pre_list = pre.values.tolist()
 51 open1 = []
 52 for s_list in pre_list:
 53     open1.append(s_list[0])
 54 pre = pandas.read_excel("Training set.xlsx", usecols=[4])
 55 pre_list = pre.values.tolist()
 56 high1 = []
 57 for s_list in pre_list:
 58     high1.append(s_list[0])
 59 pre = pandas.read_excel("Training set.xlsx", usecols=[5])
 60 pre_list = pre.values.tolist()
 61 low1 = []
 62 for s_list in pre_list:
 63     low1.append(s_list[0])
 64 pre = pandas.read_excel("Training set.xlsx", usecols=[6])
 65 pre_list = pre.values.tolist()
 66 close1 = []
 67 for s_list in pre_list:
 68     close1.append(s_list[0])
 69 pre = pandas.read_excel("Training set.xlsx", usecols=[7])
 70 pre_list = pre.values.tolist()
 71 preclose1 = []
 72 for s_list in pre_list:
 73     preclose1.append(s_list[0])
 74 pre = pandas.read_excel("Training set.xlsx", usecols=[8])
 75 pre_list = pre.values.tolist()
 76 volume1 = []
 77 for s_list in pre_list:
 78     volume1.append(s_list[0])
 79 pre = pandas.read_excel("Training set.xlsx", usecols=[9])
 80 pre_list = pre.values.tolist()
 81 amount1 = []
 82 for s_list in pre_list:
 83     amount1.append(s_list[0])
 84 pre = pandas.read_excel("Training set.xlsx", usecols=[10])
 85 pre_list = pre.values.tolist()
 86 turn1 = []
 87 for s_list in pre_list:
 88     turn1.append(s_list[0])
 89 pre = pandas.read_excel("Training set.xlsx", usecols=[11])
 90 pre_list = pre.values.tolist()
 91 pctchg1 = []
 92 for s_list in pre_list:
 93     pctchg1.append(s_list[0])
 94 pre = pandas.read_excel("Test set.xlsx", usecols=[1])
 95 pre_list = pre.values.tolist()
 96 date2 = []
 97 for s_list in pre_list:
 98     date2.append(s_list[0])
 99 pre = pandas.read_excel("Test set.xlsx", usecols=[3])
100 pre_list = pre.values.tolist()
101 open2 = []
102 for s_list in pre_list:
103     open2.append(s_list[0])
104 pre = pandas.read_excel("Test set.xlsx", usecols=[4])
105 pre_list = pre.values.tolist()
106 high2 = []
107 for s_list in pre_list:
108     high2.append(s_list[0])
109 pre = pandas.read_excel("Test set.xlsx", usecols=[5])
110 pre_list = pre.values.tolist()
111 low2 = []
112 for s_list in pre_list:
113     low2.append(s_list[0])
114 pre = pandas.read_excel("Test set.xlsx", usecols=[6])
115 pre_list = pre.values.tolist()
116 close2 = []
117 for s_list in pre_list:
118     close2.append(s_list[0])
119 pre = pandas.read_excel("Test set.xlsx", usecols=[7])
120 pre_list = pre.values.tolist()
121 preclose2 = []
122 for s_list in pre_list:
123     preclose2.append(s_list[0])
124 pre = pandas.read_excel("Test set.xlsx", usecols=[8])
125 pre_list = pre.values.tolist()
126 volume2 = []
127 for s_list in pre_list:
128     volume2.append(s_list[0])
129 pre = pandas.read_excel("Test set.xlsx", usecols=[9])
130 pre_list = pre.values.tolist()
131 amount2 = []
132 for s_list in pre_list:
133     amount2.append(s_list[0])
134 pre = pandas.read_excel("Test set.xlsx", usecols=[10])
135 pre_list = pre.values.tolist()
136 turn2 = []
137 for s_list in pre_list:
138     turn2.append(s_list[0])
139 pre = pandas.read_excel("Test set.xlsx", usecols=[11])
140 pre_list = pre.values.tolist()
141 pctchg2 = []
142 for s_list in pre_list:
143     pctchg2.append(s_list[0])
144 
145 mymodel = numpy.poly1d(numpy.polyfit(date1, open1, 1))
146 myline = numpy.linspace(20200102, 20201231)
147 matplotlib.pyplot.scatter(date1, open1)
148 matplotlib.pyplot.plot(myline, mymodel(myline))
149 matplotlib.pyplot.show()
150 print(r2_score(open1, mymodel(date1)))
151 
152 mymodel = numpy.poly1d(numpy.polyfit(date1, high1, 1))
153 myline = numpy.linspace(20200102, 20201231)
154 matplotlib.pyplot.scatter(date1, high1)
155 matplotlib.pyplot.plot(myline, mymodel(myline))
156 matplotlib.pyplot.show()
157 print(r2_score(high1, mymodel(date1)))
158 
159 mymodel = numpy.poly1d(numpy.polyfit(date1, low1, 1))
160 myline = numpy.linspace(20200102, 20201231)
161 matplotlib.pyplot.scatter(date1, low1)
162 matplotlib.pyplot.plot(myline, mymodel(myline))
163 matplotlib.pyplot.show()
164 print(r2_score(low1, mymodel(date1)))
165 
166 mymodel = numpy.poly1d(numpy.polyfit(date1, close1, 1))
167 myline = numpy.linspace(20200102, 20201231)
168 matplotlib.pyplot.scatter(date1, close1)
169 matplotlib.pyplot.plot(myline, mymodel(myline))
170 matplotlib.pyplot.show()
171 print(r2_score(close1, mymodel(date1)))
172 
173 mymodel = numpy.poly1d(numpy.polyfit(date1, preclose1, 1))
174 myline = numpy.linspace(20200102, 20201231)
175 matplotlib.pyplot.scatter(date1, preclose1)
176 matplotlib.pyplot.plot(myline, mymodel(myline))
177 matplotlib.pyplot.show()
178 print(r2_score(preclose1, mymodel(date1)))
179 
180 mymodel = numpy.poly1d(numpy.polyfit(date1, volume1, 1))
181 myline = numpy.linspace(20200102, 20201231)
182 matplotlib.pyplot.scatter(date1, volume1)
183 matplotlib.pyplot.plot(myline, mymodel(myline))
184 matplotlib.pyplot.show()
185 print(r2_score(volume1, mymodel(date1)))
186 
187 mymodel = numpy.poly1d(numpy.polyfit(date1, amount1, 1))
188 myline = numpy.linspace(20200102, 20201231)
189 matplotlib.pyplot.scatter(date1, amount1)
190 matplotlib.pyplot.plot(myline, mymodel(myline))
191 matplotlib.pyplot.show()
192 print(r2_score(amount1, mymodel(date1)))
193 
194 mymodel = numpy.poly1d(numpy.polyfit(date1, turn1, 1))
195 myline = numpy.linspace(20200102, 20201231)
196 matplotlib.pyplot.scatter(date1, turn1)
197 matplotlib.pyplot.plot(myline, mymodel(myline))
198 matplotlib.pyplot.show()
199 print(r2_score(turn1, mymodel(date1)))
200 
201 mymodel = numpy.poly1d(numpy.polyfit(date1, pctchg1, 1))
202 myline = numpy.linspace(20200102, 20201231)
203 matplotlib.pyplot.scatter(date1, pctchg1)
204 matplotlib.pyplot.plot(myline, mymodel(myline))
205 matplotlib.pyplot.show()
206 print(r2_score(pctchg1, mymodel(date1)))
# Refit each column against the training dates and print its R^2 on the test set.
for train_values, test_values in (
    (open1, open2),
    (high1, high2),
    (low1, low2),
    (close1, close2),
    (preclose1, preclose2),
    (volume1, volume2),
    (amount1, amount2),
    (turn1, turn2),
    (pctchg1, pctchg2),
):
    mymodel = numpy.poly1d(numpy.polyfit(date1, train_values, 1))
    r2 = r2_score(test_values, mymodel(date2))
    print(r2)

# Ask for a target date as a YYYYMMDD integer.
print("Please enter the date:\n")
rate = 0
want_date = int(input())
# The fits above are plotted over 2020 dates (20200102 to 20201231), so later
# dates are folded back into 2020: subtract 10000 (one year in YYYYMMDD form)
# per year past 2020, then add 100 (one month in YYYYMMDD form) per year removed.
while want_date > 20210000:
    want_date = want_date - 10000
    rate = rate + 1
want_date = want_date + rate * 100

# Predict each price column at the target date from its date-based line.
mymodel = numpy.poly1d(numpy.polyfit(date1, open1, 1))
r2 = r2_score(open2, mymodel(date2))
future_open = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, high1, 1))
r2 = r2_score(high2, mymodel(date2))
future_high = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, low1, 1))
r2 = r2_score(low2, mymodel(date2))
future_low = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, close1, 1))
r2 = r2_score(close2, mymodel(date2))
future_close = mymodel(want_date)

mymodel = numpy.poly1d(numpy.polyfit(date1, preclose1, 1))
r2 = r2_score(preclose2, mymodel(date2))
future_preclose = mymodel(want_date)

# Predict the close from each predicted price with a price-to-close line.
# Each r2 compares the true test closes with the closes predicted from the
# matching test-set price column.
mymodel = numpy.poly1d(numpy.polyfit(open1, close1, 1))
r2 = r2_score(close2, mymodel(open2))
future_close1 = mymodel(future_open)
mymodel = numpy.poly1d(numpy.polyfit(high1, close1, 1))
r2 = r2_score(close2, mymodel(high2))
future_close2 = mymodel(future_high)
mymodel = numpy.poly1d(numpy.polyfit(low1, close1, 1))
r2 = r2_score(close2, mymodel(low2))
future_close3 = mymodel(future_low)
mymodel = numpy.poly1d(numpy.polyfit(preclose1, close1, 1))
r2 = r2_score(close2, mymodel(preclose2))
future_close4 = mymodel(future_preclose)
# Average the five closing-price estimates.
print((future_close + future_close1 + future_close2 + future_close3 + future_close4) / 5)


######################################################


# Column 7 of each spreadsheet is the prediction target (futclose), which the
# prompts below treat as the next day's closing price.
pre = pandas.read_excel("Training set.xlsx", usecols=[7])
futclose1 = [row[0] for row in pre.values.tolist()]
pre = pandas.read_excel("Test set.xlsx", usecols=[7])
futclose2 = [row[0] for row in pre.values.tolist()]

# Fit and plot each of today's prices against the target.
for values, start, stop in (
    (open1, 1500, 2600),
    (high1, 1600, 2600),
    (low1, 1500, 2500),
    (close1, 1500, 2600),
):
    mymodel = numpy.poly1d(numpy.polyfit(values, futclose1, 1))
    myline = numpy.linspace(start, stop)
    matplotlib.pyplot.scatter(values, futclose1)
    matplotlib.pyplot.plot(myline, mymodel(myline))
    #matplotlib.pyplot.show()
    #print(r2_score(futclose1, mymodel(values)))

openx = float(input("Enter today's opening price:\n"))
highx = float(input("Enter today's highest price:\n"))
lowx = float(input("Enter today's lowest price:\n"))
closex = float(input("Enter today's closing price:\n"))
# One single-feature line per price; each r2 compares the true test targets
# with the targets predicted from the matching test-set price column.
mymodel = numpy.poly1d(numpy.polyfit(open1, futclose1, 1))
r2 = r2_score(futclose2, mymodel(open2))
future_close1 = mymodel(openx)
mymodel = numpy.poly1d(numpy.polyfit(high1, futclose1, 1))
r2 = r2_score(futclose2, mymodel(high2))
future_close2 = mymodel(highx)
mymodel = numpy.poly1d(numpy.polyfit(low1, futclose1, 1))
r2 = r2_score(futclose2, mymodel(low2))
future_close3 = mymodel(lowx)
mymodel = numpy.poly1d(numpy.polyfit(close1, futclose1, 1))
r2 = r2_score(futclose2, mymodel(close2))
future_close4 = mymodel(closex)
# Average the four estimates.
print("It is predicted that the closing price tomorrow will be:")
print((future_close4 + future_close3 + future_close2 + future_close1) / 4)

 

4, Problems and Solutions

 

1 Crawling stock information with BeautifulSoup can only capture the current day's data

Solution: ① crawl the stock market data through the pandas_datareader library (a library that can read many kinds of financial data); ② bring in the baostock library (which provides a large amount of accurate and complete historical securities market data and financial data of listed companies).

2 Writing Excel files with pandas raises FutureWarning: As the xlwt package is no longer maintained

Solution: the xlwt package is no longer maintained, so the xlwt engine will be removed in a future version of pandas. It is the only engine in pandas that supports writing the xls format. Install openpyxl and write an xlsx file instead.

3 When MATLAB runs the file, it reports that the file is not found in the current folder or on the MATLAB path.

Solution: csvread is not recommended in MATLAB. Its call forms are:

M = csvread(filename) reads a comma-separated value (CSV) file into the array M. The file can contain only numeric values.

M = csvread(filename,R1,C1) reads the data in the file starting at row offset R1 and column offset C1. For example, the offsets R1=0, C1=0 specify the first value in the file.

M = csvread(filename,R1,C1,[R1 C1 R2 C2]) reads only the range bounded by row offsets R1 and R2 and column offsets C1 and C2. Another way to define the range is spreadsheet notation (e.g. 'A1..B7') instead of [0 0 6 1].

4 The columns of the generated xlsx file are too narrow to display the data completely

Solution: use the openpyxl library to set a specific column width, or adjust the width automatically to fit the content (not implemented).
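The adaptive column width mentioned above was not implemented in the project. One possible approach, shown here only as a sketch, is to take the longest cell text in each column and add some padding. The helper names `column_widths` and `apply_widths`, the padding of 2, and the minimum width of 8 are all hypothetical choices. `apply_widths` assumes an openpyxl worksheet and relies on its `column_dimensions` mapping and on `openpyxl.utils.get_column_letter`.

```python
def column_widths(rows, padding=2, minimum=8):
    """Return one width per column: longest cell text plus padding.

    Hypothetical helper. len(str(cell)) counts characters, so wide
    characters such as Chinese text may still need extra room.
    """
    widths = []
    for row in rows:
        for index, cell in enumerate(row):
            length = len(str(cell)) + padding
            if index == len(widths):
                # First time this column is seen.
                widths.append(max(length, minimum))
            else:
                widths[index] = max(widths[index], length)
    return widths


def apply_widths(worksheet, widths):
    """Set the widths on an openpyxl worksheet (assumes openpyxl is installed)."""
    from openpyxl.utils import get_column_letter

    for index, width in enumerate(widths, start=1):
        worksheet.column_dimensions[get_column_letter(index)].width = width


# Made-up sample rows, for illustration only.
sample_rows = [["date", "close"], ["2020-01-02", 1130.0]]
sample_widths = column_widths(sample_rows)
```

With a workbook opened through openpyxl, `apply_widths(worksheet, column_widths(rows))` would be called before saving the file.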

5 MATLAB training takes too long and tends to stop early because the mu value grows too high, so the model sees too little data and the fit does not meet the requirements

Solution: drop MATLAB. In Python, import the matplotlib.pyplot module and the stats module of the scipy library, look for regression relationships, and draw linear regression lines.
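To make the regression step concrete, the following is a minimal pure-Python sketch of what a degree-1 fit and an R² score compute, and of averaging several single-feature estimates as the project code does. It is not the project's code, which calls numpy.polyfit and r2_score. The function names `fit_line`, `predict`, and `r_squared` are hypothetical, and all the numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up training data: today's open and high, and the next day's close.
train_open = [10.0, 11.0, 12.0, 13.0]
train_high = [11.0, 12.0, 13.0, 14.0]
train_next_close = [10.5, 11.5, 12.5, 13.5]

# Made-up held-out data for scoring the open-based model.
test_open = [14.0, 15.0]
test_next_close = [14.4, 15.6]

open_model = fit_line(train_open, train_next_close)
high_model = fit_line(train_high, train_next_close)

# Score on held-out data: true values first, predictions from the
# matching test feature second.
test_r2 = r_squared(test_next_close, [predict(open_model, x) for x in test_open])

# Average the single-feature estimates for one new day.
estimates = [predict(open_model, 14.0), predict(high_model, 15.0)]
average_estimate = sum(estimates) / len(estimates)
```

The order of arguments in the scoring call matters: the first argument holds the observed targets, and the second holds the values the model predicts from the matching input column.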
 

5, Actual effect

 

 

6, Course summary

"Life is short, I use Python!" Python is a high-level language with great potential, and it is thoroughly object-oriented. After years of development, it plays an increasingly important role in programming. From print("hello world") to the large final project, I learned more about Python in this semester's elective course than I did in my first semester. Because I studied Python and C at the same time this semester, I inevitably compared them: Python statements are simpler and easier to understand, and using indentation to mark nesting is clearer. In addition, Python variables do not need to be declared in advance, which makes programming very comfortable. C, by contrast, takes more steps, and marking nesting with {} is somewhat less intuitive, although I find C better in areas such as for loops. In recent years Python has become more and more popular and is used in more and more fields, such as machine learning and big data, where it plays an important role. There is too much to cover in full, so I will simply list the Python usage points I noted:

1. The list is equivalent to a dynamic array, and the tuple is equivalent to a static array.

2. The dictionary type is very convenient. It is directly implemented in python as a basic data type.

3. Variables can be assigned directly without defining types.

4. There are no autoincrement and autodecrement operators in python.

5. The identity operator is differs from ==: is compares whether two variables refer to the same object (the same memory address), while == compares whether their values are equal.

6. A set is unordered, so {1, 2, 3} == {3, 2, 1} is True; tuples and lists are ordered, so (1, 2, 3) == (3, 2, 1) is False.

7. python relies on indentation to distinguish code blocks instead of {}.

8. The header line of an if, else, or while statement ends with a colon (:).

9. Python has no true constants, only a convention: a name written entirely in capital letters is treated as a constant.

10. Python has no switch statement; an if/elif chain can be used instead, where elif is equivalent to else if.

11. while can be used with else, and for can also be used with else, but it is rarely used.

12. The format of the for loop is for target_list in expression_list; on each pass, target_list is bound to the next item automatically. A loop written in other languages as for (int i=0; i<10; i++) can be replaced by for x in range(0,10).

13. range(0,10) indicates 0 to 9. range(0,10, 2) sets the step size to 2, indicating 0, 2, 4, 6, and 8. range(10,0,-1) indicates 10 to 1.

The format of range() is range(start, stop[, step]):

start: counting begins at start. The default is 0, so range(5) is equivalent to range(0, 5).

stop: counting ends before stop, which is not included. For example, range(0, 5) yields 0, 1, 2, 3, 4 but not 5.

step: the step size. The default is 1, so range(0, 5) is equivalent to range(0, 5, 1).
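Several of the points above (identity versus equality, ordered versus unordered comparison, range(), and the else clause on a loop) can be checked in a few lines. This is only an illustrative sketch; the variable names and the helper `first_negative` are hypothetical.

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

same_value = first == second      # equal contents
same_object = first is second     # two separate list objects
alias_is_first = alias is first   # both names refer to one object

sets_equal = {1, 2, 3} == {3, 2, 1}      # sets ignore order
tuples_equal = (1, 2, 3) == (3, 2, 1)    # tuples compare position by position

evens = list(range(0, 10, 2))       # start 0, stop before 10, step 2
countdown = list(range(10, 0, -1))  # 10 down to 1; 0 is excluded


def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for number in numbers:
        if number < 0:
            found = number
            break
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found
```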

About importing:

14. Create an __init__.py file in a folder so that Python treats the folder as a package. A module is imported with import, optionally followed by the as keyword to give it an alias.

15. To import one or more names from a module, use from ... import ... (the wildcard * can be used here). Importing a module executes all of the code in that module. Be careful to avoid circular imports.

16. Each module has built-in variables: __name__ is the full name of the module, __package__ is the name of the package that contains the module, __file__ is the absolute path of the module, and __doc__ is the module's docstring. For the entry file, however, __name__ is forced to __main__, the entry file does not belong to any package, and (before Python 3.9) __file__ is the path as given on the command line rather than an absolute path.

17. The logical operators and, or, and not are normally used on Boolean values. However, non-Boolean values also have a truth value, so and and or can return non-Boolean values. This is equivalent to using an if statement in other languages to decide which value to return. An int or float is considered True if it is not zero, and a string is considered True if it is not empty. When the operands are not Booleans, and and or return one of the operands, namely the one that decides the truth value of the expression: and returns the first falsy operand, or the last operand if all are truthy; or returns the first truthy operand, or the last operand if all are falsy. For example, 1 and 2 returns 2, and 1 or 2 returns 1. The not operator always returns a Boolean. Python has no && or || operators; & and | exist, but they are bitwise operators.
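The return rules of and and or described above are easy to verify directly. The snippet below is a small sketch; the variable names and the helper `display_name` are hypothetical.

```python
# Each result of "and" / "or" below is one of the operands,
# not necessarily True or False.
both_truthy_and = 1 and 2          # all operands truthy: "and" gives the last one
both_truthy_or = 1 or 2            # "or" gives the first truthy operand
falsy_first_and = 0 and 2          # "and" stops at the first falsy operand
falsy_first_or = 0 or 2            # "or" skips the falsy operand
empty_string_or = "" or "default"  # an empty string is falsy
negated = not "text"               # "not" always returns a Boolean


def display_name(nickname, full_name):
    """Hypothetical helper: fall back to full_name when nickname is empty."""
    return nickname or full_name
```

The `nickname or full_name` pattern is a common way to supply a fallback value without writing an explicit if statement.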

About functions:

18. When defining a function in Python, you do not need to specify the type of the return value. A function is defined with def. The parameter list can be empty, and the parameters need no declared types. The header line still ends with a colon (:), and the body is indented rather than wrapped in {}.

19. A function can return multiple values with return A, B, and the caller can receive them directly in multiple variables.

20. Sequence unpacking is a major feature of Python. After d = 1, 2, 3, Python automatically packs the values into a tuple, so d is (1, 2, 3). After a, b, c = d, Python automatically unpacks the sequence, which is equivalent to a=1, b=2, c=3.

21. When calling a function, you can pass keyword arguments of the form name=value in the argument list (their order can differ from the order of the formal parameters in the function definition).

22. When defining a function, you can define default parameters in the parameter list, such as def add(x,y=1):..., and you only need to enter a parameter value when calling. Note that the default parameters must follow the non default parameters in order during definition and call. You can also use keyword parameters to modify the default values during call.

23. python also follows the rules of variable scope, but there is no block level scope. Therefore, the variables defined in the for loop can be called directly outside the loop.

24. The scope of a global variable in Python is the module in which it is defined; other modules can reach it only by importing that module.
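The function-related points above (multiple return values, packing and unpacking, default and keyword arguments, and the lack of block scope) can be tried out with a short sketch. The function names `min_max` and `add` and all values are illustrative assumptions.

```python
def min_max(values):
    """Return two results at once; Python packs them into a tuple."""
    return min(values), max(values)


def add(x, y=1):
    """y has a default value, so it may be omitted in the call."""
    return x + y


lowest, highest = min_max([3, 1, 2])  # receive both results directly

d = 1, 2, 3      # packing: d is a tuple
a, b, c = d      # unpacking into three names

default_used = add(5)          # y falls back to 1
keyword_used = add(5, y=10)    # keyword argument overrides the default
reordered = add(y=2, x=1)      # keyword arguments may appear in any order

for index in range(3):
    pass
after_loop = index  # no block scope: the loop variable is still visible here
```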

About classes:

25. When instantiating a class, you do not need to use new, such as student=Student(), instead of student= new Student().

26. In class, variables are divided into instance variables and class variables, and methods are divided into instance methods, class methods and static methods.

27. When defining a class, every instance method must have self as its first parameter, whether or not it takes other arguments. Every class method must have cls as its first parameter and @classmethod on the line above the method, whether or not it takes other arguments.

28. The constructor template is def __init__(self): ..., and the constructor is called automatically on instantiation. To pass different initial values at instantiation, add parameters to the constructor and assign them to instance variables inside it.

29. Inside an instance method, access an instance variable as self.variable_name and a class variable as self.__class__.variable_name; outside the class, access an instance variable as object_name.variable_name and a class variable as ClassName.variable_name.

30. Inside a class method, access class variables as cls.variable_name. Outside the class, call a class method as ClassName.method_name or object_name.method_name (the former is recommended).

31. To define a static method, add @staticmethod on the line above it.

32. A variable or method name in a class that starts with a double underscore (__) is private; otherwise it is public (the constructor __init__ is an exception). Python has no public or private keywords, and the privacy is implemented only by name mangling.

33. For inheritance, put the name of the parent class in the parentheses of the child class definition. The constructor of the subclass needs to explicitly call the constructor of the parent class through super(). Compared with larger languages such as C and Java, Python is light, nimble, and quick to pick up.
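The class-related points above can be gathered into one small sketch. The classes `Person` and `Student`, their attributes, and the sample values are hypothetical and serve only to illustrate instance and class variables, the three kinds of methods, double-underscore names, and super().

```python
class Person:
    species = "human"  # class variable, shared by all instances

    def __init__(self, name):
        self.name = name      # instance variable
        self.__secret = 42    # double underscore: stored as _Person__secret

    def describe(self):
        """Instance method: reads an instance and a class variable."""
        return f"{self.name} is a {self.__class__.species}"

    @classmethod
    def set_species(cls, new_species):
        """Class method: receives the class itself as cls."""
        cls.species = new_species

    @staticmethod
    def is_valid_name(name):
        """Static method: needs neither self nor cls."""
        return bool(name.strip())


class Student(Person):
    def __init__(self, name, student_no):
        super().__init__(name)  # explicitly run the parent constructor
        self.student_no = student_no


student = Student("Ann", "0001")  # no "new" keyword
description = student.describe()
```

Calling `Person.set_species(...)` changes the value seen by every instance that has not overridden it, which is the practical difference between class and instance variables.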

  ......

  ......

7, Thoughts and suggestions

This semester I learned Python with Mr. Wang and gained a lot. Mr. Wang teaches at a very high level. He can always make complex, obscure, and abstract things simple, so that I understand them more deeply. The phrase I heard from him most often is "for example". Mr. Wang is also approachable and always teaches with a smile. The code for love confessions, screen flooding, and crawling pictures of beautiful women was very "practical", which made me really enjoy his class and have fun while acquiring knowledge. Because teaching myself Python in the first semester of my freshman year gave me a certain foundation, I found the early part of the course fairly easy. But as the content went deeper, I began to find it hard to keep up with the pace of teaching. Because my foundation was not solid enough, I kept running into all kinds of problems. Although the teaching resources in the cloud class are plentiful, the sheer number of video resources is daunting and makes it hard to stay motivated. In class, I often felt that we moved on to the next chapter before I had thoroughly understood some points, which is not friendly enough to students with a weak foundation. Therefore, I hope Mr. Wang can adjust the pace: speed up the first few basic classes and slow down the later, more difficult ones. He could also assign some exercises to consolidate what we learn, so that students can find and fill their gaps. In this stumbling learning process, I followed the teacher, learned many aspects of Python, and did a lot of programming practice. Although I did not complete a huge project, I mastered a lot of knowledge along the way and came to appreciate the joy of writing scripts and projects in Python.
For me, being able to implement programs I had never written before brings a real sense of achievement. Finally, I hope to keep learning Python, write more practical programs, and keep moving forward. I thank Mr. Wang for his hard work and for his company this semester.

Before Python, the mainstream software for data analysis included MATLAB, R, and SAS, and companies with modest needs for scientific computing could meet them fully with Excel. During this semester, while learning Python, I was also learning about currently popular topics such as big data analysis, artificial intelligence, and neural networks, as well as MATLAB and the Qt application development framework. I am very interested in machine learning and hope to deepen my Python skills while studying it. In my opinion, the defining feature of Python is that it is easy to use. Writing Python feels like writing an English composition: once you understand what class and def mean, much code is easy to read and write. Compared with C or Java, Python's simplicity is a great advantage. Even so, judging from how friends around me learned, the many technical terms are a headache, and many programming books are not friendly to readers. At the beginning I too was plagued by jargon. It felt like reading an indecipherable text: I could not understand what I read, and everything I wrote raised errors. So I have only one suggestion for students who are getting started: practice! Real knowledge comes from practice. Through my own programming, I kept strengthening my understanding of concepts while continually fixing bugs. For students with weak computer skills, many terms cannot be explained clearly at once: one concept leads to explanations of more concepts, and an explanation that is too brief or too concrete can restrict how they think about programming later.

 

 

 

Posted by Bobulous on Tue, 31 May 2022 06:18:03 +0530