TensorFlow Summer Practice - Boston Housing Price Prediction

Copyright Note: Zhejiang University of Finance and Economics Professional Practice Deep Learning TensorFlow - Qi Feng

Boston house price forecast


TensorFlow implements univariate linear regression

** See Example: Univariate Linear Regression.ipynb **

TensorFlow implements multivariate linear regression

** In the previous section, we used TensorFlow to build our first complete model: univariate linear regression. In this section, we build a multivariate linear model to perform regression on multidimensional data. We also show how to analyze the training process with TensorBoard, the visualization tool that ships with TensorFlow. **

Load data

** Import related libraries **

%matplotlib notebook

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.learn as skflow
from sklearn.utils import shuffle
import numpy as np
import pandas as pd
import os 
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
print(tf.__version__)
print(tf.test.is_gpu_available())
1.12.0
False

** Dataset Introduction **

This dataset contains multiple factors related to Boston housing prices:

**CRIM**: Per capita crime rate by town

**ZN**: Proportion of residential land zoned for lots over 25,000 sq. ft.

**INDUS**: Proportion of non-retail business acres per town

**CHAS**: Charles River dummy variable (1 if the tract bounds the river; otherwise 0)

**NOX**: Nitric oxides concentration (parts per 10 million)

**RM**: Average number of rooms per dwelling

**AGE**: Proportion of owner-occupied units built before 1940

**DIS**: Weighted distance to five Boston employment centers

**RAD**: Index of accessibility to radial highways

**TAX**: Full-value property tax rate per $10,000

**PTRATIO**: Pupil-teacher ratio by town

**LSTAT**: Percentage of lower-status population

**MEDV**: Median value of owner-occupied homes, in thousands of dollars

** The dataset is stored in CSV format, which can be read and formatted with the Pandas library **

The ** Pandas library ** can quickly read data files of ordinary size.

It can read CSV files, text files, MS Excel files, SQL databases, and the HDF5 files used in scientific computing.

Its data structures can be converted to NumPy multidimensional arrays.

** Import data via Pandas **

df = pd.read_csv("data/boston.csv", header=0)
print (df.describe())
             CRIM         ZN       INDUS         CHAS         NOX          RM  \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000   
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000   
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   
75%      3.677082   12.500000   18.100000    0.000000    0.624000    6.623500   
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000   

              AGE         DIS         RAD         TAX     PTRATIO       LSTAT  \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000   
mean    68.574901    3.795043    9.549407  408.237154   18.455534   12.653063   
std     28.148861    2.105710    8.707259  168.537116    2.164946    7.141062   
min      2.900000    1.129600    1.000000  187.000000   12.600000    1.730000   
25%     45.025000    2.100175    4.000000  279.000000   17.400000    6.950000   
50%     77.500000    3.207450    5.000000  330.000000   19.050000   11.360000   
75%     94.075000    5.188425   24.000000  666.000000   20.200000   16.955000   
max    100.000000   12.126500   24.000000  711.000000   22.000000   37.970000   

             MEDV  
count  506.000000  
mean    22.532806  
std      9.197104  
min      5.000000  
25%     17.025000  
50%     21.200000  
75%     25.000000  
max     50.000000  

** Load the data required for this example **

df = np.array(df)

for i in range(12):
    df[:,i] = (df[:,i]-df[:,i].min())/(df[:,i].max()-df[:,i].min())
#x_data = df[['CRIM', 'DIS', 'LSTAT']].values.astype(float) #Select 3 of the more important influencing factors
x_data = df[:,:12]
#y_data = df['MEDV'].values.astype(float) #get y
y_data = df[:,12]
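
The loop above applies min-max normalization, rescaling each of the 12 feature columns to the range [0, 1] so that no single feature dominates the gradient. As a rough sketch, the same step can be written without an explicit loop. The helper name min_max_scale and the toy array are hypothetical, and mapping constant columns to 0 is an assumption added here to avoid division by zero; the original loop does not handle that case.

```python
import numpy as np

def min_max_scale(features):
    """Rescale each column of a 2-D array to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    # Assumption: a constant column has range 0; divide by 1 so it maps to 0.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (features - col_min) / safe_range

# Toy data for illustration only, not taken from boston.csv.
toy = np.array([[1.0, 10.0, 5.0],
                [2.0, 20.0, 5.0],
                [3.0, 40.0, 5.0]])
scaled = min_max_scale(toy)
print(scaled)
```

With the array from the article, this would correspond to something like x_data = min_max_scale(df[:, :12]).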

Build the model

** Define placeholders for \(x\) and \(y\) **

x = tf.placeholder(tf.float32, [None,12], name = "x") # 12 feature columns
y = tf.placeholder(tf.float32, [None,1], name = "y")

** Create variables **

with tf.name_scope("Model"):
    w = tf.Variable(tf.random_normal([12,1], stddev=0.01), name="w0")
    b = tf.Variable(1., name="b0")
    def model(x, w, b):
        return tf.matmul(x, w) + b

    pred= model(x, w, b)

You can see that both b0 and w0 are under the namespace Model.
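
The model defined above is a single matrix product plus a bias: a batch of shape [n, 12] multiplied by weights of shape [12, 1] gives predictions of shape [n, 1]. The following NumPy sketch only illustrates those shapes; linear_model, x_batch, w_demo and b_demo are hypothetical names with arbitrary values, not the trained parameters.

```python
import numpy as np

def linear_model(x, w, b):
    """Return x @ w + b, the same expression as tf.matmul(x, w) + b."""
    return np.matmul(x, w) + b

rng = np.random.default_rng(0)      # fixed seed, for illustration only
x_batch = rng.random((4, 12))       # 4 samples with 12 features each
w_demo = np.full((12, 1), 0.5)      # hypothetical weights
b_demo = 1.0                        # same starting bias as b0 above
pred_demo = linear_model(x_batch, w_demo, b_demo)
print(pred_demo.shape)
```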

** Supplementary introduction - namespace name_scope **

A TensorFlow graph often contains thousands of nodes, which are hard to display all at once in a visualization. name_scope groups variables and operations into named scopes, and each scope appears as one level in the computation graph.

  • name_scope **does not affect** the names of variables created with get_variable()
  • name_scope **affects** the names of variables created with Variable() and the names of ops

    For example, w0 and b0 above are created with Variable() inside name_scope("Model"), so their full names are Model/w0 and Model/b0.

Train the model

** Set training parameters **

train_epochs = 500 # number of iterations
learning_rate = 0.01 #learning rate

** Define the mean squared loss function **

with tf.name_scope("LossFunction"):
    loss_function = tf.reduce_mean(tf.pow(y-pred, 2)) #Mean Squared Error MSE

Similarly, TensorBoard shows the operations (ops) under the namespace LossFunction: mean, pow and sub (subtraction). This is consistent with our definition loss_function = tf.reduce_mean(tf.pow(y-pred, 2)).
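
The three ops named above (sub, pow, mean) are the three steps of the mean squared error. A small NumPy sketch of the same computation, using made-up target and prediction values and a hypothetical helper name mean_squared_error:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean squared error, written as the three steps sub, pow, mean."""
    diff = np.subtract(y_true, y_pred)   # sub
    squared = np.power(diff, 2)          # pow
    return float(np.mean(squared))       # mean

# Made-up values for illustration only.
y_true_demo = np.array([[10.0], [20.0], [30.0]])
y_pred_demo = np.array([[12.0], [20.0], [27.0]])
mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
print(mse_demo)
```

The differences here are -2, 0 and 3, so the result is (4 + 0 + 9) / 3.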

** Select optimizer **

optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)
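
To give a feel for what the Adam optimizer does at each step, here is a minimal sketch of the textbook Adam update for a single scalar parameter. It is not TensorFlow's internal implementation. The names adam_step and minimize_quadratic are hypothetical, the quadratic objective is a toy stand-in for the loss, and beta1, beta2 and eps are set to the commonly used values 0.9, 0.999 and 1e-8 as an assumption.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar parameter."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def minimize_quadratic(steps, target=3.0):
    """Minimize the toy objective (w - target)**2, starting from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - target)                 # derivative of (w - target)**2
        w, m, v = adam_step(w, grad, m, v, t)
    return w

print(minimize_quadratic(1))      # after a single update
print(minimize_quadratic(2000))   # after many updates
```

Because the update divides by the square root of the squared-gradient average, the first step has a size close to the learning rate regardless of how large the gradient is.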

** Declare Session **

sess = tf.Session()
init = tf.global_variables_initializer()

** Generate graph protocol file **

tf.train.write_graph(sess.graph, 'log2/boston','graph.pbtxt')
'log2/boston\\graph.pbtxt'
loss_op = tf.summary.scalar("loss", loss_function)
merged = tf.summary.merge_all()

** Supplementary introduction - TensorBoard **

TensorBoard is a visualization tool that comes with TensorFlow.

It currently supports 7 kinds of visualization: **SCALARS, IMAGES, AUDIO, GRAPHS, DISTRIBUTIONS, HISTOGRAMS, EMBEDDINGS**.

During training, structured data is recorded to log files, and a local server listening on port 6006 then serves the visualizations.

First specify the data to be recorded, and then open the TensorBoard panel with the following command:

** tensorboard --logdir=/your/log/path **

After entering the above command, TensorBoard prints the address it is serving on.

At this point, we can open that address in the browser, for example http://192.168.2.102:6006, to view the various functions of the panel.

Note: The specific IP address will vary from machine to machine, just check the IP displayed after "You can navigate to" in the command window.

In this example, the value of loss_function is recorded through tf.summary.scalar, so its curve can be viewed in the SCALARS panel of TensorBoard.

** start session **

sess.run(init)

** Create a file writer (FileWriter) for summaries **

writer = tf.summary.FileWriter('log/boston', sess.graph) 

The path passed to tf.summary.FileWriter, here 'log/boston', is the value to give the logdir parameter when running the tensorboard command, i.e. tensorboard --logdir=log/boston.

** Iterative training **

loss_list = []
for epoch in range(train_epochs):
    loss_sum = 0.0
    for xs, ys in zip(x_data, y_data):
        z1 = xs.reshape(1, 12)
        z2 = ys.reshape(1, 1)
        # Run the update, the loss, and its summary in a single call
        _, summary_str, loss = sess.run([optimizer, loss_op, loss_function], feed_dict={x: z1, y: z2})
        loss_sum = loss_sum + loss
        writer.add_summary(summary_str, epoch)
    # Reshuffle the samples before the next epoch
    x_data, y_data = shuffle(x_data, y_data)
    print(loss_sum)
    b0temp = b.eval(session=sess)
    w0temp = w.eval(session=sess)
    # Average loss per sample over this epoch
    loss_average = loss_sum / len(y_data)
    loss_list.append(loss_average)
    print("epoch=", epoch+1, "loss=", loss_average, "b=", b0temp, "w=", w0temp)
    
149248.76506266068
epoch= 1 loss= 294.95803372067326 b= 3.94445 w= [[0.9625533]
 [1.7546748]
 [2.3211656]
 [1.1788806]
 [2.3749657]
 [2.9518034]
 [2.4899516]
 [2.6423771]
 [2.4740942]
 [2.3962796]
 [2.405994 ]
 [1.7411354]]
79301.29809246541
epoch= 2 loss= 156.72193298906208 b= 5.731459 w= [[-0.10316267]
 [ 3.2055295 ]
 [ 2.7016158 ]
 [ 1.8831778 ]
 [ 2.8047607 ]
 [ 4.819791  ]
 [ 3.5140483 ]
 [ 4.73509   ]
 [ 2.2039707 ]
 [ 2.4170697 ]
 [ 3.5754337 ]
 [ 1.6520783 ]]
59155.64258796818
epoch= 3 loss= 116.90838456120193 b= 6.8220057 w= [[-1.4175   ]
 [ 4.436672 ]
 [ 2.2779799]
 [ 2.4264417]
 [ 2.4682086]
 [ 6.168085 ]
 [ 3.792487 ]
 [ 6.3616514]
 [ 1.288084 ]
 [ 1.6558697]
 [ 3.8651628]
 [ 0.6330488]]
47703.83335633681

....................
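
The structure of the training loop above (one update per sample, a running loss sum, a reshuffle each epoch, and an average loss per epoch) can be reproduced without TensorFlow. The sketch below swaps Adam for plain stochastic gradient descent and uses synthetic data with a known linear relationship, so it illustrates the loop only and does not reproduce the numbers printed above. The name train_sgd and all values are hypothetical.

```python
import numpy as np

def train_sgd(x_data, y_data, epochs=20, lr=0.01, seed=0):
    """Per-sample SGD for y = x @ w + b; returns w, b and the average loss per epoch."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = x_data.shape
    w = np.zeros(n_features)
    b = 0.0
    loss_history = []
    for _ in range(epochs):
        order = rng.permutation(n_samples)       # reshuffle every epoch
        loss_sum = 0.0
        for i in order:
            error = x_data[i] @ w + b - y_data[i]
            loss_sum += error ** 2               # squared error of this sample
            w -= lr * 2.0 * error * x_data[i]    # gradient of error**2 w.r.t. w
            b -= lr * 2.0 * error                # gradient of error**2 w.r.t. b
        loss_history.append(loss_sum / n_samples)
    return w, b, loss_history

# Synthetic data for illustration only (not the Boston data).
data_rng = np.random.default_rng(1)
x_toy = data_rng.random((50, 3))
y_toy = x_toy @ np.array([2.0, -1.0, 0.5]) + 4.0
w_fit, b_fit, history = train_sgd(x_toy, y_toy)
print(history[0], history[-1])
```

As in the article's loop, the value appended per epoch is the loss summed over all samples divided by the number of samples.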

# Only the first three of the 12 learned weights are printed here; they belong to CRIM, ZN and INDUS
print("y=",w0temp[0], "x1+",w0temp[1], "x2+",w0temp[2], "x3+", [b0temp])
print("y=",w0temp[0], "CRIM+", w0temp[1], 'ZN+', w0temp[2], "INDUS+", [b0temp])
y= [-10.775753] x1+ [4.629923] x2+ [0.36049515] x3+ [30.468655]
y= [-10.775753] CRIM+ [4.629923] ZN+ [0.36049515] INDUS+ [30.468655]
plt.plot(loss_list)
[<matplotlib.lines.Line2D at 0x1567f8cda58>]

Posted by laurus on Wed, 01 Jun 2022 01:04:34 +0530