Recommendation system FM - a super detailed Python hands-on


As background, you can read the MF model I wrote about earlier: Recommendation system MF - SVD and SVD++ matrix factorization

1. FM model

On top of the plain linear model, the FM model adds combined features to account for correlations between features. In the data model, the combination of features $x_i$ and $x_j$ is represented by the product $x_i x_j$. Only the second-order polynomial model is considered here, i.e. pairwise combinations of features. The formula is as follows:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \, x_i x_j$$
First, implement the basic linear model: a randomly generated, normally distributed vector of length 10353-1 (one weight per feature, excluding the rating to be predicted) represents the individual weight of each feature. A blogger wrote a very detailed FM derivation: [Recommended algorithm] FM high-order decomposition model - including feature intersection, POLY2 model

2. Dataset

FM is meant to solve higher-order factorization: plain matrix factorization has trouble incorporating side information such as timestamps and tags. So higher-order data is needed, and the MovieLens dataset is chosen again. On top of the original user-movie ratings, the movie's genre vector and the timestamp of the user's rating are added.

Feature name    Meaning
movieId         movie id
userId          user id
genres          movie genres
timestamp       timestamp of the rating
rating          score

The 18 movie genres are: Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War and Western.

Read in and merge the data

import numpy as np
import pandas as pd

# Load the MovieLens "latest-small" ratings and movie metadata
datapath1 = r"ml-latest-small\ratings.csv"
data1 = pd.read_csv(datapath1)
datapath2 = r"ml-latest-small\movies.csv"
data2 = pd.read_csv(datapath2)
data1, data2

Results👇 (for reference: ratings.csv has the columns userId, movieId, rating, timestamp; movies.csv has movieId, title, genres)

What we need from movies.csv is the genres column. Because a movie may belong to several genres at once, we encode it with one-hot (multi-hot) encoding.

string="* Action* Adventure* Animation* Children's* Comedy* Crime* Documentary* Drama* Fantasy* Film-Noir* Horror* Musical* Mystery* Romance* Sci-Fi* Thriller* War* Western"
list=string.split("* ")[1:]
list

number = len(genre_list)
ans = []
for _, row in data2.iterrows():
    type_mov = [0] * (number + 1)      # one slot per genre, plus one for movieId
    genres = row["genres"].split("|")
    type_mov[number] = row["movieId"]  # keep movieId in the last slot
    for g in genres:
        for idx, name in enumerate(genre_list):
            if g == name:
                type_mov[idx] = 1
    ans.append(type_mov.copy())
column = ["type_%d" % t for t in range(number)]
column.append("movieId")
ans_pd = pd.DataFrame(ans, columns=column)
ans_pd
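As an aside, pandas can build the same multi-hot table in essentially one line with Series.str.get_dummies. A minimal sketch (note that the columns come out named after the genre strings themselves, not type_0 ... type_17 as above):

# Multi-hot encode the pipe-separated genres column directly
ans_pd2 = data2["genres"].str.get_dummies(sep="|")
ans_pd2["movieId"] = data2["movieId"].values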
        


Similarly, the userId and movieId columns of data1 are categorical IDs: their numeric values carry no linear meaning, so they are one-hot encoded as well.

# One indicator column per user and per movie
one_hot_user = pd.get_dummies(data1["userId"].values, prefix="user")
one_hot_mov = pd.get_dummies(data1["movieId"].values, prefix="mov")
one_hot_user, one_hot_mov


Then join data1 with the one-hot encoded userId and movieId columns, and merge the result with ans_pd on movieId.

data = data1.join(one_hot_user).join(one_hot_mov)
data = pd.merge(data, ans_pd)  # merges on the shared movieId column
data


Because the raw timestamp values are huge, they can easily dominate the model (the "time is not handled well" problem my teacher warned about). So the original userId, movieId and timestamp columns are dropped. At this point the data preparation is complete, and we are left with a huge sparse matrix of 100836 rows × 10356 columns.

# Drop the raw ID and timestamp columns; only the encoded features remain
data = data.drop(columns=["userId", "movieId", "timestamp"])
data
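A side note on memory: pd.get_dummies accepts sparse=True, which stores the indicator columns as pandas sparse arrays. A hedged sketch, in case the dense 100836 × 10356 frame becomes a problem (downstream numpy calls like np.dot would then need the data densified, or converted to a scipy sparse matrix, first):

# Same encoding as above, but stored sparsely to save memory
one_hot_user_sparse = pd.get_dummies(data1["userId"].values, prefix="user", sparse=True)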

3. FM solution

As described in section 1, FM extends the linear model with pairwise feature combinations; only the second-order model is considered here. First, implement the basic linear part: generate a random, normally distributed vector of length 10353-1 (one weight per feature, excluding the rating to be predicted) as the individual feature weights.

w0 = 0  # global bias
# One normally distributed weight per feature (the rating column is excluded)
w = np.random.normal(0, 0.5, 10353 - 1)
w


The linear regression model at this point is expressed as $\hat{y} = w_0 + \sum_{i} w_i x_i$:

# Linear prediction for every sample at once: X · w + w0
ans_data = np.dot(data.drop(columns=["rating"]), w) + w0
pd.DataFrame(ans_data)


The sum of the absolute differences between the predicted values and the actual ratings is used as the loss function:

abs(data["rating"] - ans_data).sum()  # L1 loss over all samples


On this basis, FM adds the pairwise interaction term. Here a 10352 × 10352 matrix is used, where entry [i][j] holds the interaction weight of the i-th and j-th features, again initialized from a normal distribution. Testing showed that a standard deviation of 0.001 keeps the initial predictions closer to the actual ratings, which reduces the number of initial iterations needed. Computing the prediction does not require the whole matrix: only the part above the diagonal is needed, because every product x_i x_j would otherwise be counted twice. np.triu with k=1 zeroes out the diagonal and the lower triangle.

# Pairwise interaction weights; keep only the strict upper triangle
wij = np.random.normal(0, 0.001, [10352, 10352])
wij = np.triu(wij, k=1)
pd.DataFrame(wij)
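Before using this matrix, it is worth convincing ourselves that, for a strictly upper-triangular W, the quadratic form x·W·x equals the pairwise sum of W[i][j]·x_i·x_j over i < j, since that is how the prediction code below computes the interaction term. A tiny self-contained check on toy values (nothing here comes from the dataset):

# Toy verification: x @ W @ x == sum over i < j of W[i][j] * x[i] * x[j]
x_toy = np.array([1.0, 2.0, 3.0])
W_toy = np.triu(np.ones((3, 3)), k=1)
manual = sum(W_toy[i][j] * x_toy[i] * x_toy[j]
             for i in range(3) for j in range(i + 1, 3))
assert np.isclose(x_toy @ W_toy @ x_toy, manual)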


With the pairwise feature weights added, the prediction formula gains the interaction term x·W·x. Taking all the data, a single pass runs for more than five minutes: the matrix products over the full 100836 × 10353 design matrix are simply too large. So only the tail rows from index 100815 onward (a couple dozen samples) are used here.

# Keep only a small tail slice of the rows so the demo finishes quickly
data_ij = data.drop(columns=["rating"])[100815:]
data_ij

k = []
from tqdm import tqdm
for name, data_t in tqdm(data_ij.iterrows(), desc="pairwise term per row"):
    x = data_t.values
    # x @ W @ x = sum over i < j of W[i][j] * x_i * x_j for upper-triangular W
    ans_ij = np.dot(np.dot(x, wij), x)
    k.append(ans_ij)
k

# Full FM prediction on the slice: linear part + pairwise part + bias
ans_data = np.dot(data_ij, w) + k + w0
pd.DataFrame(ans_data)

abs(data["rating"][100815:] - ans_data).sum()


Because the 10352 × 10352 matrix is too large and slows the computation down, and only the upper triangle is needed anyway, matrix factorization can shrink it: introduce an auxiliary vector v_i for each feature component x_i and let V·Vᵀ = W, where V is a 10352 × k matrix. Then repeat the computation above (a fully vectorized shortcut that never materializes W is sketched at the end of the post).

K = 10
# One k-dimensional latent vector per feature
V = np.random.normal(0, 0.01, [10352, K])
wij = np.dot(V, V.T)     # W reconstructed from the factors
wij = np.triu(wij, k=1)  # keep only the strict upper triangle
pd.DataFrame(wij)

k = []
for name, data_t in tqdm(data_ij.iterrows(), desc="pairwise term per row"):
    x = data_t.values
    ans_ij = np.dot(np.dot(x, wij), x)  # x @ W @ x, as before
    k.append(ans_ij)
k

ans_data = np.dot(data_ij, w) + k + w0
pd.DataFrame(ans_data)
loss = abs(data["rating"][100815:] - ans_data).sum()
loss


After that, the parameters are updated from the partial derivatives using stochastic gradient descent: a sample is picked at random from the data, its prediction error is computed, and each weight is moved a step along its gradient. The learning rate is set to U = 0.1. The partial derivatives of the FM prediction are:

$$\frac{\partial \hat{y}}{\partial \theta} = \begin{cases} 1, & \text{if } \theta = w_0 \\ x_i, & \text{if } \theta = w_i \\ x_i \sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^2, & \text{if } \theta = v_{i,f} \end{cases}$$

import random
# Pick one random sample for the stochastic update
number = random.randrange(0, len(data), 1)
data_t = data.drop(columns=["rating"]).iloc[[number]]
data_t
x = data_t.values.ravel()           # the sample's feature vector
ans_ij = np.dot(np.dot(x, wij), x)  # its pairwise interaction term
ans_ij

ans_data = np.dot(data_t, w) + ans_ij + w0  # full FM prediction for this sample
ans_data

loss = (data["rating"].iloc[[number]] - ans_data).sum()  # signed error y - y_hat
loss

# Sanity check: a single gradient entry wrt V at i = 0, f = 0:
# x_i * sum_j(v_{j,f} * x_j) - v_{i,f} * x_i^2
i = 0
f = 0
x[i] * np.dot(V[:, f], x) - V[i][f] * x[i] ** 2

U = 0.1  # learning rate
w0 += U * 1 * loss                   # d y_hat / d w0 = 1
w += U * loss * x                    # d y_hat / d w_i = x_i
for f in tqdm(range(K)):
    s = np.dot(V[:, f], x)           # sum_j v_{j,f} * x_j, shared by every i
    # d y_hat / d v_{i,f} = x_i * s - v_{i,f} * x_i^2, vectorized over i
    V[:, f] += U * loss * (x * s - V[:, f] * x ** 2)
w0, w, V

Running it with plain Python for loops is still slow, but it works. hehehehe~~~~
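One closing note: the standard FM trick removes the need to ever build the 10352 × 10352 matrix. Since $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_f \left[ \left( \sum_i v_{i,f} x_i \right)^2 - \sum_i v_{i,f}^2 x_i^2 \right]$, the interaction term can be computed straight from V in O(kn) per sample. A hedged sketch built on that identity (fm_predict is my own helper name, not something from the post; X is expected as a numpy array, e.g. data_ij.values):

def fm_predict(X, w0, w, V):
    # Linear part for every row at once: w0 + X · w
    linear = w0 + X @ w
    # (X @ V) ** 2 gives (sum_i v_{i,f} x_i)^2 per row and factor f;
    # (X ** 2) @ (V ** 2) gives sum_i v_{i,f}^2 x_i^2 per row and factor f
    pairwise = 0.5 * ((X @ V) ** 2 - (X ** 2) @ (V ** 2)).sum(axis=1)
    return linear + pairwise

With this, the per-row loops above collapse into a few matrix multiplications, e.g. fm_predict(data_ij.values, w0, w, V), and the gradient of the pairwise part can be vectorized the same way.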
