Word clouds are among the most common charts in data visualization. A word cloud counts how often each word occurs in an input text and then displays the high-frequency words prominently, sized by how often they occur, which makes it concise, intuitive, and efficient.
Today I will share how to draw attractive word clouds in Python. The complete code can be obtained at the end of the article.
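The word-frequency idea described above can be sketched with the standard library alone. This is only an illustration: the token list is hand-written for the example, and `word_frequencies` is a hypothetical helper name, not part of any word cloud library.

```python
from collections import Counter


def word_frequencies(words):
    """Count how often each word occurs in an already tokenized text."""
    return Counter(words)


# Hypothetical, hand-written token list used only for illustration.
sample_words = ["data", "cloud", "word", "cloud", "word", "cloud"]
frequencies = word_frequencies(sample_words)

# most_common() sorts from most to least frequent; a word cloud would
# typically draw the first entries in the largest font.
for word, count in frequencies.most_common():
    print(word, count)
```

The libraries used in the rest of the article do this counting internally, so the sketch is only meant to show what "frequency" means here.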
A first try
Let's start with a simple word cloud, drawn with Python's wordcloud module. First, the imports:
    import jieba
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
We read in the text and remove the line breaks and full-width spaces. The code is as follows:
    with open(r"tomorrow's things.txt", encoding='utf8') as f:
        text = f.read()
    # Strip line breaks and full-width (ideographic) spaces
    text = text.replace('\n', "").replace("\u3000", "")
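The cleaning step above can be wrapped in a small reusable function. The sketch below is an assumption about how one might organize it: `clean_text` and `read_clean_text` are hypothetical helper names, and the sample string is made up for the example.

```python
def clean_text(raw):
    """Remove newlines and full-width (ideographic) spaces from raw text."""
    return raw.replace("\n", "").replace("\u3000", "")


def read_clean_text(path, encoding="utf8"):
    """Read a text file and return its cleaned content."""
    # The with-statement closes the file even if reading fails.
    with open(path, encoding=encoding) as handle:
        return clean_text(handle.read())


# Made-up sample: two lines, the second indented with full-width spaces.
sample = "第一行\n\u3000\u3000第二行\n"
print(clean_text(sample))
```

Keeping the cleaning logic separate from the file reading makes it easy to try on short strings before running it on a whole book.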
Next we need to split the text into individual words, which is what the jieba module is for. The code is as follows:
    text_cut = jieba.lcut(text)
    # Join the segmented words into one string, separated by spaces
    text_cut = ' '.join(text_cut)
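To make the joining step concrete without needing jieba installed, here is a minimal sketch that starts from a hand-written token list standing in for a segmentation result. The list is hypothetical, `join_tokens` is an illustrative helper name, and skipping whitespace-only tokens is a small addition that the original code does not make.

```python
def join_tokens(tokens, separator=" "):
    """Join segmented words into one string, skipping whitespace-only tokens."""
    return separator.join(token for token in tokens if token.strip())


# Hypothetical token list; a real one would come from jieba.lcut(text).
tokens = ["今天", "天气", " ", "很", "好"]
text_cut = join_tokens(tokens)
print(text_cut)
```

The space separator matters because the word cloud generator later splits the string back into words, and Chinese text has no spaces of its own.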
The result may still contain many irrelevant words that we do not want to show, so we need a list of stop words. We can build one ourselves or use a stop word list that someone else has already compiled. Here I use the latter. The code is as follows:
    stop_words = open(r"List of common Chinese stop words.txt").read().split("\n")
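What the stop word list does can be sketched by hand. In this illustration the stop word list and the tokens are made-up samples, and `parse_stop_words` and `remove_stop_words` are hypothetical helper names. In the article itself the filtering is left to the word cloud library through its stopwords parameter.

```python
def parse_stop_words(raw):
    """Turn a newline-separated stop word list into a set, ignoring blank lines."""
    return {line.strip() for line in raw.splitlines() if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token not in stop_words]


# Made-up stop word file content with one blank line in it.
raw_list = "的\n了\n\n是\n"
stop_words = parse_stop_words(raw_list)

# Made-up tokens; a real list would come from the segmentation step.
tokens = ["今天", "的", "天气", "是", "晴天"]
filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Using a set rather than a list keeps each membership check fast, which helps when the text is as long as a book.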
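The core code for drawing the word cloud is as follows:
    # Generate the word cloud with WordCloud
    word_cloud = WordCloud(font_path="simsun.ttc",     # Font used for the words
                           background_color="white",   # Background color of the word cloud
                           stopwords=stop_words)       # Stop words to remove
    word_cloud.generate(text_cut)
    word_cloud.to_file("1.png")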
output
That completes a very simple word cloud. We can also give it a shape by adding a background image, such as the image 5.png loaded below.
The main code that needs to be added is as follows:
    from PIL import Image
    import numpy as np

    background = Image.open(r"5.png")
    graph = np.array(background)
Then pass the mask parameter to WordCloud:
    # Generate the word cloud with WordCloud
    word_cloud = WordCloud(font_path="simsun.ttc",     # Font used for the words
                           background_color="white",   # Background color of the word cloud
                           stopwords=stop_words,       # Stop words to remove
                           mask=graph)                 # Background image used as the shape
    word_cloud.generate(text_cut)
    word_cloud.to_file("1.png")
output
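It may help to see what a mask contains. In the wordcloud module, pure white entries (255) of the mask are treated as masked out, and words are drawn only on the other entries. The sketch below builds a small circular mask in plain Python so that it runs without extra dependencies. `circular_mask` is a hypothetical helper, and the size and radius are arbitrary illustration values.

```python
def circular_mask(size, radius):
    """Build a size x size grid: 0 inside the circle, 255 (white) outside."""
    center = (size - 1) / 2
    return [
        [
            0 if (row - center) ** 2 + (col - center) ** 2 <= radius ** 2 else 255
            for col in range(size)
        ]
        for row in range(size)
    ]


mask_rows = circular_mask(size=9, radius=3)

# Print the mask: '#' marks drawable cells, '.' marks masked-out (white) cells.
for row in mask_rows:
    print("".join("#" if value == 0 else "." for value in row))

# With NumPy installed, np.array(mask_rows, dtype=np.uint8) would turn this
# grid into an array of the kind passed as mask= in the code above.
```

An image loaded with Image.open works the same way: its white background is left empty and the words fill the non-white shape.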
Deep optimization
Another module, stylecloud, also draws very attractive word clouds. We mainly use the following function:
    gen_stylecloud(text=None,
                   icon_name='fas fa-flag',
                   colors=None,
                   palette='cartocolors.qualitative.Bold_5',
                   background_color="white",
                   max_font_size=200,
                   max_words=2000,
                   stopwords=True,
                   custom_stopwords=STOPWORDS,
                   output_name='stylecloud.png',
                   )
Some of the commonly used parameters are:
- icon_name: the shape of the word cloud
- max_font_size: the maximum font size
- max_words: the maximum number of words that can be shown
- stopwords: whether to filter common stop words
- custom_stopwords: your own stop word list, if you have one
- palette: the color palette
Let's try drawing a word cloud with it. The code is as follows:
    import stylecloud

    stylecloud.gen_stylecloud(text=text_cut,
                              palette='tableau.BlueRed_6',
                              icon_name='fas fa-apple-alt',
                              font_path=r'Tian Yingzhang regular script 3500 words.ttf',
                              output_name='2.png',
                              stopwords=True,
                              custom_stopwords=stop_words)
output
The palette parameter selects the color palette and can be changed freely. For the available palettes, see https://jiffyclub.github.io/palettable/.
pyecharts
Finally, let's look at how to draw a word cloud with the pyecharts module. The code is as follows:
    from pyecharts import options as opts
    from pyecharts.charts import Page, WordCloud

    words = [
        ("emperor", 10000),
        ("Zhu Yuanzhang", 6181),
        ("Ming Dynasty", 4386),
        ("court", 4055),
        ("Ming Army", 2467),
        ("Soldier", 2244),
        ("Zhang Juzheng", 1868),
        ("Wang Shouren", 1281)
    ]
    c = (
        WordCloud()
        .add("", words, word_size_range=[20, 100])
        .set_global_opts(title_opts=opts.TitleOpts(title="Basic example"))
    )
    c.render("1.html")
output
The result is somewhat plainer. Note that WordCloud() in pyecharts takes the words together with their frequencies as input, unlike the previous examples, which took the segmented text itself.
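To bridge the two input styles, the space-separated string from the earlier steps can be counted into (word, frequency) pairs. This is a minimal sketch using only the standard library. `to_word_pairs` is a hypothetical helper, and the sample string is made up rather than taken from the book.

```python
from collections import Counter


def to_word_pairs(text_cut, top_n=None):
    """Convert a space-separated string of words into (word, count) tuples."""
    counts = Counter(text_cut.split())
    # most_common(None) returns every word, ordered from most to least frequent.
    return counts.most_common(top_n)


# Made-up sample standing in for the real segmented text.
sample_text_cut = "emperor court emperor army emperor court"
words = to_word_pairs(sample_text_cut)
print(words)
```

A list built this way has the same shape as the hand-written words list above, so it could be passed to .add() in its place. Stop words would still need to be filtered out beforehand.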
Contact details
A technical exchange group with more than 3,000 members is now open. When adding, it is best to note your source and area of interest, which makes it easier to find like-minded friends. You can also join to obtain materials and the code.
Method 1. Add WeChat ID: dkl88191, with the remark: from CSDN
Method 2. Search WeChat for the public account "Python learning and data mining" and reply in the background: add group