Sentiment Analysis Based on a Sentiment Dictionary [Easy to Understand]

Hello everyone, nice to see you again. I am your friend Quanzhanjun.

Sentiment analysis means mining the opinions expressed in a text, identifying whether the subject's evaluation of an object is positive or negative, and studying the tendency of that attitude. Text sentiment analysis methods fall into two groups: sentiment classification based on machine learning, and sentiment analysis based on semantic understanding. The machine learning approach needs a large training set that has to be classified and labeled by hand. The method I use belongs to the semantic understanding group: sentiment analysis using a sentiment dictionary.

Here is the sentiment dictionary I use:

Link: https://pan.baidu.com/s/1toX2wqlIe2H-o_T6MFen1A Extraction code: gobt

There are many sentiment dictionaries, such as the one from Harbin Institute of Technology, the HowNet sentiment dictionary and the NTUSD Simplified Chinese sentiment dictionary from National Taiwan University. Not every dictionary is useful for a given task, though; we have to choose the sentiment dictionary that suits our own text content.

Sentiment analysis cannot be done purely on intuition; it needs a supporting basis. The algorithm I use is based on the paper "Sentiment Dictionary Construction and Analysis Methods for Weibo Sentiment Analysis" by Yang Liyue and Wang Yizhi of Beijing Jiaotong University. The paper can be found on CNKI under the title "Research on Sentiment Dictionary Construction and Analysis Method of Weibo Sentiment Analysis".

The general process of sentiment analysis is as follows:

The first step is to preprocess the text:

Preprocessing here means segmenting the sentences into words. There are many word segmentation tools; I chose jieba ("stutter" segmentation) for Python. It works well and can also tag the part of speech of each word while segmenting. Before segmentation, however, note that not all of the content in a piece of text helps sentiment analysis, for example book titles, Weibo topic titles (#...#) and non-Chinese content. We can use regular expressions to keep only what we need:

import re

import jieba.posseg as pseg  # segmentation with part-of-speech tags: (word, flag)

def seg_word(sentence):
    # Strip #topic# tags, @mentions (up to the next space), <...> markup
    # and [...] bracketed emoticons (the last pattern is presumably what
    # the unescaped "[.*?]" was meant to match)
    sentence = re.sub(r"#.*?#|@.*? |<.*?>|\[.*?\]", "", sentence)
    # Strip URLs (http and https)
    sentence = re.sub(r"https?://[a-zA-Z0-9.?/&=:]*", "", sentence)
    segResult = []  # Chinese words only (collected but not returned here)
    d = ""
    for word, flag in pseg.cut(sentence):  # segment and tag each word
        if '\u4e00' <= word <= '\u9fa5':  # rough check that the word is Chinese
            segResult.append(word)
        d = d + word + "/" + flag + "  "
    data_c = [d]
    return data_c  # list holding one string of "word/flag" pairs
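To make the cleanup step easier to try on its own, here is a minimal sketch that uses only the standard `re` module. The function name `clean_weibo` and the sample sentence are hypothetical, and the patterns assume that bracketed emoticons such as [哈哈] and both http and https links should be stripped.

```python
import re

# Hypothetical helper: patterns modeled on the cleanup done in seg_word
NOISE = re.compile(r"#.*?#|@.*? |<.*?>|\[.*?\]")
URL = re.compile(r"https?://[a-zA-Z0-9.?/&=:]*")

def clean_weibo(text):
    """Remove #topics#, @mentions, <markup>, [emoticons] and URLs."""
    text = NOISE.sub("", text)
    return URL.sub("", text)

sample = "#电影# @小明 今天的电影真好看[哈哈] http://t.cn/abc"
print(clean_weibo(sample).strip())  # 今天的电影真好看
```

Only the cleaned sentence would then be passed on to the segmenter.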

The second step is to run the segmented words (the word vector) through the sentiment dictionary:

The Chinese-English sentiment dictionary includes sentiment words, negation words, degree adverbs and stop words.

Sentiment words: words with which the subject expresses an inner evaluation of an object; they carry strong emotional color.

Degree adverbs: words that have no sentiment of their own but can strengthen or weaken the intensity of a sentiment.

Negation words: words that have no sentiment of their own but can reverse the polarity of a sentiment (the "Negative Words" list in the code).

Stop words: words that carry no useful meaning for the analysis.

First, we use the stop words to remove meaningless words such as "this" and "that". After filtering, most of the remaining words are useful for sentiment analysis. Once the stop words are removed, we can apply an algorithm that uses the sentiment words, degree adverbs and negation words to analyze the sentiment.

Below is the code that filters the words against the stop word dictionary:

def stopchineseword(segResult):
    # Load the stop word list, one word per line, into a set for fast lookup
    with open("f:\\chineseStopWords.txt", "r") as file:
        data = {line.strip() for line in file}
    # Keep only the words that are not stop words
    new_segResult = [word for word in segResult if word not in data]
    return new_segResult
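The same filtering can be tried without any file on disk. The sketch below is only an illustration: `remove_stop_words` is a hypothetical name, and the tiny stop word list stands in for the real stop word file.

```python
def remove_stop_words(words, stop_words):
    """Return the words that do not appear in stop_words, keeping their order."""
    stop_set = set(stop_words)  # a set makes each membership test O(1)
    return [w for w in words if w not in stop_set]

# Illustrative stop words only; a real list contains many more entries
sample_stop_words = ["的", "了", "这", "那"]
tokens = ["这", "电影", "的", "剧情", "很", "精彩"]
print(remove_stop_words(tokens, sample_stop_words))
# ['电影', '剧情', '很', '精彩']
```

Note that degree adverbs such as 很 must not be in the stop word list, because the scoring step still needs them.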

After completing the above two steps, we can start to use the sentiment dictionary for analysis

Some people will ask: once we know the sentiment words, how do we analyze them? Aren't they just words? The answer is that we score them. Sentiment words are divided into positive and negative ones (finer categories are possible, but only these two polarities are discussed here). Some sentiment dictionaries give a score for each sentiment word (I don't know how those scores are calculated). Following the paper mentioned above, we assign the values ourselves: a positive sentiment word scores 1, a negative sentiment word scores -1, and a neutral word scores 0. Degree adverbs can also be given weights according to the dictionary, and all negation words are set to -1.

Semantics is an important feature for sentence-level sentiment classification, and the judgment should proceed step by step from words to sentences to the whole Weibo short text. A sentiment word is often preceded by a degree adverb, which strengthens or weakens its sentiment. When a negation word precedes a sentiment word, the sentiment is reversed. One thing to watch out for is that the relative position of the negation word and the degree adverb gives two different results: one is "negation word + degree adverb + sentiment word", and the other is "degree adverb + negation word + sentiment word".
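As a small illustration of this scoring rule, the sketch below assigns 1, -1 or 0 to a word. The two word sets and the name `sentiment_weight` are assumptions made for the example; in practice the sets would be loaded from the dictionary files.

```python
# Illustrative entries only; real dictionaries hold thousands of words
POSITIVE = {"喜欢", "精彩"}
NEGATIVE = {"讨厌", "失望"}

def sentiment_weight(word):
    """Return 1 for a positive word, -1 for a negative word, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0  # neutral or unknown word

print([sentiment_weight(w) for w in ["喜欢", "电影", "失望"]])  # [1, 0, -1]
```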

For "negation word + degree adverb + sentiment word", the calculation is

w = t * (-1) * a * 0.5

For "degree adverb + negation word + sentiment word", the calculation is

w = t * (-1) * a * 2

Here w is the computed sentiment intensity of the sentiment word, t is the weight of the sentiment word, and a is the weight of the degree adverb that precedes the sentiment word t.
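A quick worked example may help here. The sketch below evaluates both formulas for a positive sentiment word (t = 1) and a degree adverb whose weight a is assumed to be 1.5; the function name `intensity` and the flag `negation_first` are hypothetical.

```python
def intensity(t, a, negation_first):
    """Sentiment intensity w for a word modified by a negation and a degree adverb.

    t: weight of the sentiment word (1 or -1)
    a: weight of the degree adverb
    negation_first: True for "negation + degree adverb + sentiment word",
                    False for "degree adverb + negation + sentiment word"
    """
    factor = 0.5 if negation_first else 2
    return t * (-1) * a * factor

print(intensity(1, 1.5, negation_first=True))   # -0.75
print(intensity(1, 1.5, negation_first=False))  # -3.0
```

With the same words, putting the degree adverb in front of the negation gives a much stronger negative score than putting the negation first.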

Once the weights of all sentiment words in the word vector have been computed, they are summed. If the total score is greater than 0, the sentiment is positive; if it is less than 0, the sentiment is negative; if it is 0, the sentiment is neutral.

The overall algorithm is shown in the figure below. My figure still has a flaw: it does not show the loop that runs when judging the category of each word.

The code for scoring sentiment words is as follows (the focus is on calculating the weight of each sentiment word in the whole sentence):
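The final decision can be sketched in a few lines. The function name `classify` and the label strings are assumptions for this example; the input is a list of per-word scores.

```python
def classify(scores):
    """Sum the per-word scores and map the total to a sentiment label."""
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(classify([1, 0, -0.75]))  # positive
print(classify([0, -3.0]))      # negative
print(classify([1, -1]))        # neutral
```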

def load_words(path, encoding=None):
    # Read one word per line into a set for fast membership tests
    with open(path, "r", encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def classify_words(dict_data):
    positive_words = load_words("f:\\positive emotion words.txt", "utf-8")
    negative_words = load_words("f:\\negative emotion words.txt", "utf-8")
    privative_words = load_words("f:\\Negative Words.txt", "utf-8")  # negation words

    # Degree adverbs: one file per weight, the file name states the multiplier.
    # If a word appears in several files, the first (largest) weight wins.
    degree_weights = {}
    for weight, name in [(2, "2 times"), (1.5, "1.5 times"), (1.25, "1.25 times"),
                         (1.2, "1.2 times"), (0.8, "0.8 times"), (0.5, "0.5 times")]:
        for word in load_words("f:\\" + name + ".txt"):
            degree_weights.setdefault(word, weight)

    z = 0      # index just after the previous sentiment word
    data = []  # one score per input word, 0 for non-sentiment words
    for k, v in enumerate(dict_data):
        if v in positive_words:    # positive sentiment word
            w = 1
        elif v in negative_words:  # negative sentiment word
            w = -1
        else:
            data.append(0)
            continue
        # Modifiers between the previous sentiment word and this one
        window = dict_data[z:k]
        for i, word in enumerate(window):
            if word in privative_words:
                w *= -1  # a negation word reverses the polarity
                if any(m in degree_weights for m in window[:i]):
                    w *= 2    # degree adverb + negation word + sentiment word
                elif any(m in degree_weights for m in window[i + 1:]):
                    w *= 0.5  # negation word + degree adverb + sentiment word
            elif word in degree_weights:
                w *= degree_weights[word]  # apply the degree adverb weight a
        z = k + 1
        data.append(w)
    return data

Publisher: Full-stack programmer, please indicate the source: https://javaforall.cn/172411.html Original link: https://javaforall.cn


Posted by gray8110 on Fri, 23 Sep 2022 22:30:09 +0530