Copyright Note: Zhejiang University of Finance and Economics Professional Practice Deep Learning TensorFlow - Qi Feng
Boston house price forecast
Table of contents
Univariate linear regression with TensorFlow
** See Example: Univariate Linear Regression.ipynb **
Multivariate linear regression with TensorFlow
** In the previous section, we used TensorFlow to build our first complete model: univariate linear regression. In this section, we build a multivariate linear model to perform regression on multidimensional data. We also show how to analyze the training process with TensorBoard, the visualization tool that ships with TensorFlow. **
Load data
** Import related libraries **
%matplotlib notebook
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.learn as skflow
from sklearn.utils import shuffle
import numpy as np
import pandas as pd
import os

# Hide all GPUs so that TensorFlow runs on the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
print(tf.__version__)
print(tf.test.is_gpu_available())
1.12.0
False
** Dataset Introduction **
This dataset contains multiple factors related to Boston housing prices:
**CRIM**: Per capita crime rate by town
**ZN**: Proportion of residential land zoned for lots over 25,000 sq. ft.
**INDUS**: Proportion of non-retail business acres per town
**CHAS**: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise)
**NOX**: Nitric oxides concentration (parts per 10 million)
**RM**: Average number of rooms per dwelling
**AGE**: Proportion of owner-occupied units built before 1940
**DIS**: Weighted distance to five Boston employment centers
**RAD**: Index of accessibility to radial highways
**TAX**: Full-value property tax rate per $10,000
**PTRATIO**: Pupil-teacher ratio by town
**LSTAT**: Percentage of lower-status population
**MEDV**: Median value of owner-occupied homes, in thousands of dollars
** The dataset is stored in CSV format and can be read and formatted with the Pandas library **
The ** Pandas library ** helps us quickly read data files of moderate size:
- It can read CSV files, text files, MS Excel files, SQL databases, and the HDF5 format used in scientific computing
- Its data can easily be converted to NumPy multidimensional arrays
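As a small illustration of the two points above, the sketch below reads CSV text into a DataFrame and converts it to a NumPy array. The three-row inline CSV and its values are made up for the example; they only stand in for data/boston.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row sample standing in for data/boston.csv
csv_text = """CRIM,RM,MEDV
0.1,6.5,24.0
0.2,6.0,21.5
0.3,7.0,34.5
"""

# header=0: the first row holds the column names
sample_df = pd.read_csv(io.StringIO(csv_text), header=0)

# Convert the DataFrame to a 2-D NumPy array
sample_array = sample_df.values

print(sample_df.columns.tolist())
print(sample_array.shape)
```

The same two calls, read_csv followed by a conversion to an array, are what the notebook applies to the full dataset.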
** Import data via Pandas **
df = pd.read_csv("data/boston.csv", header=0)
print(df.describe())
             CRIM          ZN       INDUS        CHAS         NOX          RM  \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500
75%      3.677082   12.500000   18.100000    0.000000    0.624000    6.623500
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000

              AGE         DIS         RAD         TAX     PTRATIO       LSTAT  \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean    68.574901    3.795043    9.549407  408.237154   18.455534   12.653063
std     28.148861    2.105710    8.707259  168.537116    2.164946    7.141062
min      2.900000    1.129600    1.000000  187.000000   12.600000    1.730000
25%     45.025000    2.100175    4.000000  279.000000   17.400000    6.950000
50%     77.500000    3.207450    5.000000  330.000000   19.050000   11.360000
75%     94.075000    5.188425   24.000000  666.000000   20.200000   16.955000
max    100.000000   12.126500   24.000000  711.000000   22.000000   37.970000

             MEDV
count  506.000000
mean    22.532806
std      9.197104
min      5.000000
25%     17.025000
50%     21.200000
75%     25.000000
max     50.000000
** Load the data required for this example **
df = np.array(df)
# Min-max normalize the 12 feature columns to [0, 1]; the label column (MEDV) is left unscaled
for i in range(12):
    df[:,i] = (df[:,i]-df[:,i].min())/(df[:,i].max()-df[:,i].min())
#x_data = df[['CRIM', 'DIS', 'LSTAT']].values.astype(float) #Select 3 of the more important influencing factors
x_data = df[:,:12] # all 12 features
#y_data = df['MEDV'].values.astype(float) #get y
y_data = df[:,12] # label: MEDV
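The loop above applies min-max normalization, \(x' = (x - x_{min}) / (x_{max} - x_{min})\), to each feature column so that every feature lies in [0, 1]. A vectorized sketch of the same computation on a made-up 3x2 array follows; the helper name min_max_normalize is hypothetical and not part of the notebook.

```python
import numpy as np


def min_max_normalize(features):
    """Scale each column of a 2-D array to the range [0, 1]."""
    col_min = features.min(axis=0)  # per-column minimum
    col_max = features.max(axis=0)  # per-column maximum
    return (features - col_min) / (col_max - col_min)


# Made-up values, two feature columns on different scales
toy = np.array([[1.0, 10.0],
                [2.0, 30.0],
                [3.0, 20.0]])
scaled = min_max_normalize(toy)
print(scaled)
```

Note that this sketch, like the loop it mirrors, assumes no column is constant; a constant column would make the denominator zero.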
Build the model
** Define placeholders for \(x\) and \(y\) **
x = tf.placeholder(tf.float32, [None,12], name = "x") # 12 influencing factors (features)
y = tf.placeholder(tf.float32, [None,1], name = "y") # label (MEDV)
** Create the variables and define the model **
with tf.name_scope("Model"):
    w = tf.Variable(tf.random_normal([12,1], stddev=0.01), name="w0")
    b = tf.Variable(1., name="b0")
    def model(x, w, b):
        return tf.matmul(x, w) + b
    pred= model(x, w, b)
In the TensorBoard graph view, both b0 and w0 appear under the name scope Model.
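In matrix form, the model defined above computes the following for a batch of \(n\) samples. The symbols \(X\), \(\hat{y}\) and \(n\) are notation introduced here for illustration; \(w\) and \(b\) are the variables w0 and b0.

\[
\hat{y} = Xw + b, \qquad X \in \mathbb{R}^{n \times 12},\quad w \in \mathbb{R}^{12 \times 1},\quad b \in \mathbb{R},\quad \hat{y} \in \mathbb{R}^{n \times 1}
\]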
** Supplementary introduction - namespace name_scope **
A TensorFlow graph often contains thousands of nodes, which are difficult to display all at once in a visualization. name_scope can therefore be used to group variables and ops into scopes; in the visualization, each scope appears as one level of the computation graph.
- name_scope **does not affect** the names of variables created with get_variable()
- name_scope **affects** the names of variables created with Variable() and the names of ops (op_name)
The following examples illustrate:
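A minimal sketch of the two rules. The names w0, v0 and shifted are illustrative. The try/except import is an assumption that lets the same code run on TensorFlow 1.12 (as used here) and on newer releases that expose the 1.x API as tensorflow.compat.v1.

```python
try:
    # Newer TensorFlow releases expose the 1.x API here (assumption, see lead-in)
    import tensorflow.compat.v1 as tf
except ImportError:
    # TensorFlow 1.12, as used in this notebook
    import tensorflow as tf

graph = tf.Graph()  # a fresh graph keeps the example independent of earlier cells
with graph.as_default():
    with tf.name_scope("Model"):
        w = tf.Variable(tf.zeros([12, 1]), name="w0")  # created with Variable()
        v = tf.get_variable("v0", shape=[1])           # created with get_variable()
        s = tf.add(w, 1.0, name="shifted")             # an ordinary op

print(w.name)  # Model/w0:0       -> prefixed by the name scope
print(v.name)  # v0:0             -> name scope ignored
print(s.name)  # Model/shifted:0  -> prefixed by the name scope
```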
Train the model
** Set training parameters **
train_epochs = 500 # number of training epochs
learning_rate = 0.01 # learning rate
** Define the mean squared loss function **
with tf.name_scope("LossFunction"):
    loss_function = tf.reduce_mean(tf.pow(y-pred, 2)) #Mean Squared Error MSE
Similarly, TensorBoard shows the operations (ops) under the name scope LossFunction: mean, pow and sub (subtraction). This is consistent with our definition loss_function = tf.reduce_mean(tf.pow(y-pred, 2)).
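The ops sub, pow and mean correspond to the three steps of the mean squared error. A NumPy sketch of the same steps on made-up numbers (the arrays y_true and y_pred are illustrative and not taken from the dataset):

```python
import numpy as np

y_true = np.array([[24.0], [21.0], [35.0]])  # made-up labels
y_pred = np.array([[22.0], [23.0], [32.0]])  # made-up predictions

diff = y_true - y_pred       # sub:  [2, -2, 3]
squared = np.power(diff, 2)  # pow:  [4, 4, 9]
mse = np.mean(squared)       # mean: 17 / 3
print(mse)
```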
** Select optimizer **
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)
** Declare Session **
sess = tf.Session()
init = tf.global_variables_initializer()
** Generate graph protocol file **
tf.train.write_graph(sess.graph, 'log2/boston','graph.pbtxt')
'log2/boston\\graph.pbtxt'
loss_op = tf.summary.scalar("loss", loss_function)
merged = tf.summary.merge_all()
** Supplementary introduction - TensorBoard **
TensorBoard is the visualization tool that ships with TensorFlow.
It currently supports 7 kinds of visualization: **SCALARS, IMAGES, AUDIO, GRAPHS, DISTRIBUTIONS, HISTOGRAMS, EMBEDDINGS**.
During training, structured summary data is recorded to a log directory; TensorBoard then serves a visualization of that data from a local server listening on port 6006.
First specify the data to be recorded, and then open the TensorBoard panel with the following command:
** tensorboard --logdir=/your/log/path **
After the command is entered, the command window displays the address at which TensorBoard is being served.
At this point, we can open **http://192.168.2.102:6006** in the browser to explore the various functions of the panel.
Note: The specific IP address varies from machine to machine; use the address displayed after "You can navigate to" in the command window.
In this example, the value of loss_function is recorded through **tf.summary.scalar**, so its curve can be viewed in the SCALARS panel of TensorBoard.
** Start the session and initialize the variables **
sess.run(init)
** Create a file writer (FileWriter) for summaries **
writer = tf.summary.FileWriter('log/boston', sess.graph)
The path passed to tf.summary.FileWriter (written generically as tf.summary.FileWriter('/path/to/logs', sess.graph); here it is 'log/boston') is the value to give the logdir parameter when running the tensorboard command.
** Iterative training **
loss_list = [] # average loss of each epoch
for epoch in range (train_epochs):
    loss_sum=0.0
    # One optimizer step per sample (batch size 1)
    for xs, ys in zip(x_data, y_data):
        z1 = xs.reshape(1,12)
        z2 = ys.reshape(1,1)
        _,loss = sess.run([optimizer,loss_function], feed_dict={x: z1, y: z2})
        summary_str = sess.run(loss_op, feed_dict={x: z1, y: z2})
        #lossv+=sess.run(loss_function, feed_dict={x: z1, y: z2})/506.00
        loss_sum = loss_sum + loss
        # loss_list.append(loss)
        writer.add_summary(summary_str, epoch)
    # Reshuffle the samples before the next epoch
    x_data, y_data = shuffle(x_data, y_data)
    print (loss_sum)
    b0temp=b.eval(session=sess)
    w0temp=w.eval(session=sess)
    loss_average = loss_sum/len(y_data)
    loss_list.append(loss_average)
    print("epoch=", epoch+1,"loss=",loss_average,"b=", b0temp,"w=", w0temp )
149248.76506266068
epoch= 1 loss= 294.95803372067326 b= 3.94445 w= [[0.9625533] [1.7546748] [2.3211656] [1.1788806] [2.3749657] [2.9518034] [2.4899516] [2.6423771] [2.4740942] [2.3962796] [2.405994 ] [1.7411354]]
79301.29809246541
epoch= 2 loss= 156.72193298906208 b= 5.731459 w= [[-0.10316267] [ 3.2055295 ] [ 2.7016158 ] [ 1.8831778 ] [ 2.8047607 ] [ 4.819791 ] [ 3.5140483 ] [ 4.73509 ] [ 2.2039707 ] [ 2.4170697 ] [ 3.5754337 ] [ 1.6520783 ]]
59155.64258796818
epoch= 3 loss= 116.90838456120193 b= 6.8220057 w= [[-1.4175 ] [ 4.436672 ] [ 2.2779799] [ 2.4264417] [ 2.4682086] [ 6.168085 ] [ 3.792487 ] [ 6.3616514] [ 1.288084 ] [ 1.6558697] [ 3.8651628] [ 0.6330488]]
47703.83335633681
....................
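# Note: x_data holds all 12 features, so w0temp[0], w0temp[1] and w0temp[2] are the weights of
# CRIM, ZN and INDUS. The labels below are left over from the 3-feature (CRIM, DIS, LSTAT) variant.
print("y=",w0temp[0], "x1+",w0temp[1], "x2+",w0temp[2], "x3+", [b0temp])
print("y=",w0temp[0], "CRIM+", w0temp[1], 'DIS+', w0temp[2], "LSTAT+", [b0temp])
y= [-10.775753] x1+ [4.629923] x2+ [0.36049515] x3+ [30.468655]
y= [-10.775753] CRIM+ [4.629923] DIS+ [0.36049515] LSTAT+ [30.468655]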
plt.plot(loss_list)