I drew several word clouds with Python, and they amazed everyone

Word clouds are among the most common charts in data visualization. A word cloud extracts word frequencies from a piece of input text and then displays the high-frequency words prominently, scaled by how often they occur, which makes it concise, intuitive, and efficient.
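
To make the idea concrete, here is a minimal sketch of the two steps behind a word cloud: counting how often each word occurs, then mapping each count to a font size. The linear scaling and the names font_sizes, min_size, and max_size are assumptions chosen for illustration; they are not how any particular library computes sizes.

```python
from collections import Counter


def font_sizes(words, min_size=20, max_size=100):
    """Map each distinct word to a font size that grows with its frequency."""
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    low = min(counts.values())
    span = (top - low) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


sample = "data cloud data word data cloud".split()
sizes = font_sizes(sample)
print(sizes)  # {'data': 100.0, 'cloud': 60.0, 'word': 20.0}
```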

Today I will share how to draw striking word clouds in Python.

A first attempt

Let's start by drawing a simple word cloud with Python's wordcloud module. First, the imports:

import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

We read the text file and remove the line breaks and full-width spaces (\u3000). The code is as follows:

text = open(r"tomorrow's things.txt",encoding='utf8').read()
text = text.replace('\n',"").replace("\u3000","")
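
The same loading step can be wrapped in a small helper that closes the file when it is done. This is only a sketch: the function name load_clean_text is hypothetical, and it removes the same two things as the code above, namely line breaks and the full-width space \u3000.

```python
def load_clean_text(path, encoding="utf8"):
    """Read a text file and strip line breaks and full-width spaces."""
    with open(path, encoding=encoding) as f:  # the file is closed automatically
        text = f.read()
    return text.replace("\n", "").replace("\u3000", "")


# Example call (assumes the file exists next to the script):
# text = load_clean_text(r"tomorrow's things.txt")
```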

Next we need to split the text into individual words, which is the job of the jieba module. The code is as follows:

text_cut = jieba.lcut(text)
# Join the segmented words into one string, separated by spaces
text_cut = ' '.join(text_cut)

The result may contain a lot of irrelevant words that we don't want to show, so we need a stop-word list. We can build one ourselves or use a list that someone else has already compiled. Here I use the latter. The code is as follows:

stop_words = open(r"List of common Chinese stop words.txt").read().split("\n")
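
Stop-word filtering can also be applied directly to the segmented tokens before they are joined. The sketch below assumes the stop-word file holds one word per line; the helper name remove_stop_words and the sample tokens are made up for illustration.

```python
def remove_stop_words(tokens, stop_words):
    """Drop stop words and whitespace-only tokens from a list of tokens."""
    # A set gives fast lookups; blank entries (e.g. from a trailing newline) are skipped
    stop_set = {w.strip() for w in stop_words if w.strip()}
    return [t for t in tokens if t.strip() and t not in stop_set]


tokens = ["the", "emperor", "of", "the", "Ming", " ", "Dynasty"]
filtered = remove_stop_words(tokens, ["the", "of", ""])
print(filtered)  # ['emperor', 'Ming', 'Dynasty']
```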

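The following is the core code for drawing the word cloud:

# Generate the word cloud with WordCloud
word_cloud = WordCloud(font_path="simsun.ttc",  # Font used to render the words
                       background_color="white",  # Background color of the image
                       stopwords=stop_words)  # Stop words to exclude
word_cloud.generate(text_cut)
word_cloud.to_file("1.png")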

output

That completes a very simple word cloud. We can also give it a shape by supplying a background image as a mask.

The main code to add is as follows:

from PIL import Image
import numpy as np
background = Image.open(r"5.png")
graph = np.array(background)

Then pass the mask parameter to WordCloud:

# Generate the word cloud with WordCloud
word_cloud = WordCloud(font_path="simsun.ttc",  # Font used to render the words
                       background_color="white",  # Background color of the image
                       stopwords=stop_words,  # Stop words to exclude
                       mask=graph)  # Shape taken from the background image
word_cloud.generate(text_cut)
word_cloud.to_file("1.png")

output
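
A mask does not have to come from an image file. In the wordcloud library, white pixels (value 255) in the mask are treated as off-limits, and words are drawn only on the remaining pixels. The sketch below builds a circular mask with NumPy; the 300-pixel size and 130-pixel radius are arbitrary choices for illustration.

```python
import numpy as np

size, radius = 300, 130
center = size // 2

# Row and column index grids that broadcast to a size x size array
rows, cols = np.ogrid[:size, :size]

# True outside the circle, False inside it
outside = (rows - center) ** 2 + (cols - center) ** 2 > radius ** 2

# 255 (white) marks pixels to leave empty; 0 marks pixels words may occupy
circle_mask = (255 * outside).astype(np.uint8)
```

Passing mask=circle_mask to WordCloud in place of graph should then confine the words to a circle.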

Going further

Another module, stylecloud, also draws very attractive word clouds. Its main function is gen_stylecloud, shown here with a subset of its parameters:

gen_stylecloud(text=None,
               icon_name='fas fa-flag',
               colors=None,
               palette='cartocolors.qualitative.Bold_5',
               background_color="white",
               max_font_size=200,
               max_words=2000,
               stopwords=True,
               custom_stopwords=STOPWORDS,
               output_name='stylecloud.png',
)

Some commonly used parameters are:

  • icon_name: the icon that gives the word cloud its shape, given as a Font Awesome class name such as 'fas fa-flag'

  • max_font_size: the maximum font size

  • max_words: the maximum number of words to include

  • stopwords: whether to filter out common stop words

  • custom_stopwords: your own stop-word list, if you have one

  • palette: the color palette, given as a palettable palette name

Let's try drawing a word cloud. The code is as follows:

import stylecloud

stylecloud.gen_stylecloud(text=text_cut,
                          palette='tableau.BlueRed_6',
                          icon_name='fas fa-apple-alt',
                          font_path=r'Tian Yingzhang regular script 3500 words.ttf',
                          output_name='2.png',
                          stopwords=True,
                          custom_stopwords=stop_words)

output

The palette parameter sets the color scheme and can be changed freely. For the available palettes, see https://jiffyclub.github.io/palettable/.
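
Palette names follow palettable's naming scheme: a module path, then the palette name, then the number of colors after an underscore. The helper below is hypothetical and for illustration only. It splits such a string into its parts and assumes a plain name without palettable's _r suffix for reversed palettes.

```python
def split_palette_name(palette):
    """Split e.g. 'tableau.BlueRed_6' into (module path, name, color count)."""
    module_path, _, full_name = palette.rpartition(".")
    name, _, count = full_name.rpartition("_")
    return module_path, name, int(count)


print(split_palette_name("tableau.BlueRed_6"))
# ('tableau', 'BlueRed', 6)
print(split_palette_name("cartocolors.qualitative.Bold_5"))
# ('cartocolors.qualitative', 'Bold', 5)
```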

pyecharts

Finally, let's see how to draw a word cloud with the pyecharts module. The code is as follows:

from pyecharts import options as opts
from pyecharts.charts import WordCloud  # note: shadows wordcloud.WordCloud imported earlier

words = [
    ("emperor", 10000),
    ("Zhu Yuanzhang", 6181),
    ("Ming Dynasty", 4386),
    ("court", 4055),
    ("Ming Army", 2467),
    ("Soldier", 2244),
    ("Zhang Juzheng", 1868),
    ("Wang Shouren", 1281)
]

c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100])
    .set_global_opts(title_opts=opts.TitleOpts(title="Basic example"))
)

c.render("1.html")

output

The result is somewhat plainer. Note, however, that the WordCloud chart in pyecharts takes a list of words paired with their frequencies, unlike the previous examples, which take the segmented text itself.
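
Starting from a list of segmented tokens, such (word, frequency) pairs can be built with collections.Counter. This is a minimal sketch; the helper name top_word_pairs, the limit parameter, and the sample tokens are illustrative assumptions.

```python
from collections import Counter


def top_word_pairs(tokens, limit=100):
    """Return up to `limit` (word, count) tuples, most frequent first."""
    counts = Counter(t for t in tokens if t.strip())  # skip whitespace-only tokens
    return counts.most_common(limit)


tokens = ["emperor", "court", "emperor", " ", "court", "emperor"]
pairs = top_word_pairs(tokens)
print(pairs)  # [('emperor', 3), ('court', 2)]
```

The resulting list has the same shape as the words list in the pyecharts example, so it can be passed to .add() in its place.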

Tags: Python programming language

Posted by lice200 on Mon, 12 Sep 2022 22:01:54 +0530