jieba and sklearn implementation of TF-IDF for Python text mining

1. What is TF-IDF? TF-IDF (term frequency-inverse document frequency) is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases with the number of times it appears in the document, but decre...

Posted by journy101 on Wed, 22 Sep 2021 18:13:40 +0530
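
To make the definition in the TF-IDF entry above concrete, here is a minimal sketch in plain Python. The excerpt is cut off before it gives a formula, so the textbook weighting tf(t, d) * log(N / df(t)) is an assumption. The function name tf_idf and the toy documents are also illustrative, not the post's own code.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term of each document (hypothetical helper).

    docs is a list of token lists. Returns one {term: score} dict per
    document, using tf = count / document length and
    idf = log(N / document frequency), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy, already-tokenized documents (made up for illustration).
docs = [
    ["text", "mining", "with", "python"],
    ["text", "classification", "with", "python"],
    ["python", "word", "segmentation"],
]
scores = tf_idf(docs)
# A term found in every document gets idf = log(1) = 0, so it scores 0.
print(scores[0])
```

Chinese text would first need to be segmented into tokens, for example with jieba.lcut. Note that sklearn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its numbers will differ from this unsmoothed sketch.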

Chinese email text classification [actual project]

Project introduction: Text classification is one of the application areas of natural language processing and the basis of many other tasks. This project is a binary classification problem, the simplest kind, and introduces how to convert text data into numerical fe...

Posted by Marqis on Wed, 29 Sep 2021 00:24:50 +0530

fastai 2019 lesson8 notes

lesson8 video address: https://course19.fast.ai/videos/?lesson=8 Preface: This second part is very different from the 2018 version. The course name is "Deep Learning from the Foundations". We will learn to implement many things in fastai and PyTorch. Basically, we will learn something that can be...

Posted by Visualant on Wed, 29 Sep 2021 06:25:27 +0530

Using XGBoost for time series prediction: a birth prediction mini tutorial

Content introduction: XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is fast, performs well or even best on a wide range of predictive modeling tasks, and is a favorite of winners of data science competitions, such as th...

Posted by Harbinger on Wed, 29 Sep 2021 07:28:41 +0530

(A concise version of the ultimate edition) Understanding the Transformer source code in depth

Original link: https://blog.csdn.net/zhaojc1995/article/details/109276945 Reference material: the original Transformer paper; In-depth understanding of Transformer and source code; Illustrated Transformer (full version); The Annotated Transformer; Chinese annotated version of The Annotated Transformer. pref...

Posted by Teddy B. on Wed, 29 Sep 2021 08:40:54 +0530

HMM hidden Markov model for Chinese text word segmentation

I. HMM overview 1. Introduction: Folklore tells us that the state of algae is probabilistically related to the state of the weather, so the two are closely linked. In this example, we have two sets of states: the observed state (algae state) and the hidden state (we...

Posted by ernie on Sun, 03 Oct 2021 01:05:55 +0530

CRF conditional random field for Chinese text segmentation

I. Problem analysis 1. Brief introduction and comparison of CRF. CRF word segmentation principle: CRF treats word segmentation as a classification problem over each character's position within a word, and usually defines that position information as follows: B is often used for the start of a word. In words, I is oft...

Posted by The MA on Sun, 03 Oct 2021 06:04:39 +0530

Qidian Chinese web novel text classification project [beginner-friendly NLP practical project]

Novel text classification task. Code link: https://github.com/a1097304791/fiction-classification Data set: The data set has 13 categories crawled from Qidian (the "Starting Point" Chinese web novel site), with 20 books in each category and 10 chapters in each book, a total of 260 novels and 3600 chapters. Algorithm ...

Posted by Who27 on Tue, 05 Oct 2021 02:17:22 +0530

Semantic similarity model SBERT -- a beautiful example of a Siamese network

Paper address: https://arxiv.org/abs/1908.10084 Chinese translation of the paper: https://www.cnblogs.com/gczr/p/12874409.html Source code download: https://github.com/UKPLab/sentence-transformers Related website: https://www.sbert.net/ The Chinese translation of the paper is quite clear, so this...

Posted by HFD on Wed, 06 Oct 2021 02:33:32 +0530

[RNN architecture analysis] GRU model & attention mechanism

1. GRU model. Learning objectives: understand the internal structure and calculation formulas of GRU; master the use of GRU tools in PyTorch; understand the advantages and disadvantages of GRU. GRU (Gated Recurrent Unit) is a variant of the traditional RNN ...

Posted by Gruessle on Thu, 07 Oct 2021 07:16:24 +0530