As background, you can refer to the MF model I wrote about before: Recommendation system MF - SVD and SVD++ matrix factorization.
1. FM model
On the basis of the original linear model, the FM model adds feature combinations that take the correlation between features into account: the interaction between features $x_i$ and $x_j$ is expressed by the product $x_i x_j$. Only the second-order polynomial model is considered here, i.e. the pairwise combination of features. The formula is as follows:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} w_{ij}\, x_i x_j$$
First, implement the basic linear model, using a randomly generated vector of length 10353 − 1 (one normally distributed weight for every column except the rating to be predicted) to represent the individual weight of each feature. A blogger has written a very detailed FM derivation: [Recommended algorithm] FM high-order decomposition model - including feature intersection, POLY2 model.
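For concreteness, with $n = 3$ features the second-order model above expands to:

$$\hat{y}(x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3$$

so the number of pairwise weights grows as $n(n-1)/2$, which is exactly what the factorization in Section 3 is meant to tame.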
2. Dataset
What FM solves is a higher-order factorization: plain matrix factorization has limitations in processing additional information such as time and tags. So higher-order data is needed, and the MovieLens dataset is again selected here. On top of the original user movie ratings, the movie-genre vector and the timestamp of each user's rating are added.
| Feature (column) | Meaning |
|---|---|
| movieId | movie id |
| userId | user id |
| genres | movie genres |
| timestamp | rating timestamp |
| rating | score |
The movie genres comprise the 18 categories from Action to Western, listed in the genre string used for one-hot encoding below.
Reading in and merging the data
```python
import numpy as np
import pandas as pd

datapath1 = r"ml-latest-small\ratings.csv"
data1 = pd.read_csv(datapath1)
datapath2 = r"ml-latest-small\movies.csv"
data2 = pd.read_csv(datapath2)
data1, data2
```
Results👇
What we need here is the genres (movie type) column of movies.csv. Because a movie may belong to multiple categories, one-hot encoding is used to represent it.
string="* Action* Adventure* Animation* Children's* Comedy* Crime* Documentary* Drama* Fantasy* Film-Noir* Horror* Musical* Mystery* Romance* Sci-Fi* Thriller* War* Western" list=string.split("* ")[1:] list
```python
number = len(genre_list)
ans = []
for _, row in data2.iterrows():
    type_mov = [0] * (number + 1)   # one flag per genre, plus the movieId
    k = row["genres"].split("|")
    type_mov[number] = row["movieId"]
    for g in k:                     # set the flag for every genre the movie has
        for t2, st in enumerate(genre_list):
            if g == st:
                type_mov[t2] = 1
    ans.append(type_mov.copy())

column = ["type_%d" % t for t in range(number)]
column.append("movieId")
ans_pd = pd.DataFrame(ans, columns=column)
ans_pd
```
Similarly, the userId and movieId in data1 are categorical values with no linear relationship, so one-hot encoding is applied to them as well.
```python
one_hot_user = pd.get_dummies(data1["userId"].values, prefix="user")
one_hot_mov = pd.get_dummies(data1["movieId"].values, prefix="mov")
one_hot_user, one_hot_mov
```
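To see what pd.get_dummies produces, here is a tiny illustrative example (not part of the original pipeline; recent pandas versions print the indicator cells as True/False rather than 1/0):

```python
# each distinct id becomes its own indicator column
pd.get_dummies(pd.Series([1, 2, 1]), prefix="user")
#    user_1  user_2
# 0    True   False
# 1   False    True
# 2    True   False
```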
Then join data1 with the one-hot encoded userId and movieId columns, and merge with ans_pd on movieId.
```python
# one_hot_user, one_hot_mov
data = data1.join(one_hot_user).join(one_hot_mov)
data = pd.merge(data, ans_pd)  # merges on the shared movieId column
data
```
Because the raw timestamp values are far larger than the other features, they cause the time-handling problem the teacher mentioned. So the original userId, movieId, and timestamp columns are deleted. At this point the data processing is complete: the merged 100836 × 10356 frame becomes a huge sparse matrix of 100836 rows × 10353 columns.
```python
data = data.drop(columns=["userId", "movieId", "timestamp"])
data
```
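As a quick sanity check on the column count (assuming the standard ml-latest-small release: 610 users and 9724 distinct rated movies), the total should line up with the weight-vector length 10353 − 1 used below:

```python
# 610 user one-hots + 9724 movie one-hots + 18 genre flags + rating = 10353
print(data.shape)  # expect (100836, 10353)
```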
3. FM solution
Recall the second-order model from Section 1: $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} w_{ij} x_i x_j$. First implement the basic linear part, using a randomly generated normally distributed vector of length 10353 − 1 (every column except the rating to be predicted) as the individual weight of each feature.
```python
w0 = 0
w = np.random.normal(0, 0.5, 10353 - 1)  # one weight per feature column
w
```
The linear regression model at this point is $\hat{y} = w_0 + \sum_i w_i x_i$:
```python
# w = pd.DataFrame(w)
ans_data = np.dot(data.drop(columns=["rating"]), w) + w0
pd.DataFrame(ans_data)
```
The sum of the absolute values of (predicted value minus actual rating) is used as the loss function:
abs(data["rating"]-ans_data).sum()
FM adds pairwise feature interactions on this basis. Here a 10352 × 10352 matrix is used, with W[i][j] representing the interaction weight of the i-th and j-th features, still generated from a normal distribution; in testing, a standard deviation of 0.001 keeps the initial predictions closer to the true ratings and reduces the number of initial iterations. Computing the prediction does not require the whole matrix: since $x_i x_j = x_j x_i$, every entry below the diagonal would duplicate one above it, so only the strict upper triangle is needed. np.triu(wij, k=1) zeroes the diagonal and the lower triangle.
```python
wij = np.random.normal(0, 0.001, [10352, 10352])
wij = np.triu(wij, k=1)  # keep only the strict upper triangle
pd.DataFrame(wij)
```
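As a small standalone illustration of what np.triu(..., k=1) keeps:

```python
demo = np.arange(9).reshape(3, 3)
np.triu(demo, k=1)  # k=1 zeroes the diagonal and everything below it
# array([[0, 1, 2],
#        [0, 0, 5],
#        [0, 0, 0]])
```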
After adding the pairwise feature weights, the model expression becomes $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} w_{ij} x_i x_j$. If all the data is used, a single pass runs for more than 5 minutes: multiplying the 100836 × 10352 feature matrix against the 10352 × 10352 weight matrix is simply too large. Here only a small slice of rows (index 100815 onward) is used for the computation.
```python
data_ij = data.drop(columns=["rating"])[100815:]
data_ij
```
```python
from tqdm import tqdm

k = []  # second-order term for each row
for name, data_t in tqdm(data_ij.iterrows(), desc="Pairwise term per row"):
    x = data_t.values.astype(float)
    ans_ij = x @ wij @ x  # x^T W x, with W strictly upper triangular
    k.append(ans_ij)
k
```
```python
ans_data = np.dot(data_ij, w) + k + w0
pd.DataFrame(ans_data)
```
(data["rating"][100815:]-ans_data).sum()
Because the 10352 × 10352 matrix is too large and slows the computation, and only its upper triangle is needed anyway, matrix factorization can be applied: introduce an auxiliary vector $v_i$ for each feature component $x_i$ and let $V V^T = W$, where $V$ is a 10352 × k matrix. Then continue the operation above.
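The factorized form also satisfies the standard FM identity, which allows the pairwise term to be computed in $O(kn)$ without ever materializing $W$ (a vectorized sketch using it is given at the end):

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2}\right]$$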
```python
K = 10
V = np.random.normal(0, 0.01, [10352, K])
wij = np.dot(V, V.T)     # W = V V^T
wij = np.triu(wij, k=1)  # keep only the strict upper triangle
pd.DataFrame(wij)
```
```python
from tqdm import tqdm

k = []  # second-order term for each row
for name, data_t in tqdm(data_ij.iterrows(), desc="Pairwise term per row"):
    x = data_t.values.astype(float)
    ans_ij = x @ wij @ x  # x^T W x, with W strictly upper triangular
    k.append(ans_ij)
k
```
```python
ans_data = np.dot(data_ij, w) + k + w0
pd.DataFrame(ans_data)

loss = (data["rating"][100815:] - ans_data).sum()
loss
```
After that, the parameters are updated according to the partial derivatives of the result, again using stochastic gradient descent: a sample is randomly selected from the original matrix, its prediction is computed, and each weight is updated by subtracting its scaled partial derivative, with the learning rate set to U = 0.1. The partial derivative formulas are as follows:

$$\frac{\partial \hat{y}}{\partial w_0} = 1, \qquad \frac{\partial \hat{y}}{\partial w_i} = x_i, \qquad \frac{\partial \hat{y}}{\partial v_{i,f}} = x_i \sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^2$$
```python
import random

number = random.randrange(0, len(data))
x = data.drop(columns=["rating"]).iloc[number].values.astype(float)  # sampled feature vector
ans_ij = x @ wij @ x  # second-order term for this sample
ans_ij
```
```python
ans_data = np.dot(x, w) + ans_ij + w0  # full FM prediction for the sample
ans_data
```
loss=(data["rating"].iloc[[number]]-ans_data).sum() loss
```python
# probe the V gradient at i = 0, f = 0:
# dy/dV[i][f] = x_i * sum_j(V[j][f] * x_j) - V[i][f] * x_i**2
i = 0
f = 0
x[i] * (x @ V[:, f]) - V[i][f] * x[i] ** 2
```
```python
U = 0.1  # learning rate

# SGD step: move every parameter along its partial derivative, scaled by the error
w0 += U * 1 * loss
w += U * loss * x

vx = x @ V  # vx[f] = sum_j V[j][f] * x_j, computed once per sample
for i in tqdm(range(len(V))):
    for f in range(K):
        grad = x[i] * vx[f] - V[i][f] * x[i] ** 2
        V[i][f] += U * loss * grad
w0, w, V
```
Using the for loop directly is still too slow, but it does work. hehehehe~~~~
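For reference, the identity above lets the whole second-order term be computed for every row at once, with no per-row loop and no 10352 × 10352 matrix. A minimal sketch (the names X, pair_term, preds are mine; it reuses data, w, w0, V from above and sticks to the same row slice, since densifying the full matrix would take several GB):

```python
# vectorized FM prediction via
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (X V)_f**2 - (X**2)(V**2)_f ]
X = data.drop(columns=["rating"])[100815:].values.astype(float)

XV = X @ V                                       # shape (n_rows, K)
pair_term = 0.5 * ((XV ** 2) - (X ** 2) @ (V ** 2)).sum(axis=1)
preds = X @ w + pair_term + w0                   # FM prediction for every row

np.abs(data["rating"][100815:].values - preds).sum()
```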