Interpretation and Cases of Prompting in Natural Language Processing

Note: if you find this blog useful, don't forget to like and bookmark it. I post content on artificial intelligence and big data every week, most of it original: Python, Java, Scala, and SQL code; CV, NLP, and recommendation systems; Spark, Flink, Kafka, HBase, Hive, Flume, and so on, all practical material, plus interpretations of papers from major conferences. Let's make progress together.
Paper: https://arxiv.org/pdf/2111.01998.pdf
code: https://github.com/thunlp/OpenPrompt

Preface

Prompting has become a new paradigm in modern natural language processing. It is still a relatively new concept, and over the past few months many researchers have studied it; it may well be a new direction for the field. Today I would like to share the idea behind prompting together with a worked case.

1, Popularity

Let's start with a few charts.


Needless to say: just look at the search-interest trend and the number of GitHub stars OpenPrompt has collected in only a few months; the growth is essentially exponential.

2, The core idea of Prompt

The core idea of Prompt is to make better and fuller use of the pre-trained model. In natural language processing there are generally three types of pre-trained models to choose from (a short sketch of each follows the list below):

  1. Masked language models, typified by BERT, which are trained with a mask mechanism similar to a cloze test.
  2. Autoregressive models, typified by GPT-3, which predict the next token.
  3. Sequence-to-sequence (Seq2Seq) models, typified by T5, used for tasks such as machine translation.
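
As a rough illustration of these three pre-training styles (this snippet is not from the paper or the OpenPrompt repo; it simply uses standard Hugging Face pipelines to show the difference):

from transformers import pipeline

# 1. Masked LM (BERT-style): fill in a blanked-out token, like a cloze test.
fill = pipeline("fill-mask", model="bert-base-cased")
print(fill("Paris is the [MASK] of France.")[0]["token_str"])

# 2. Autoregressive LM (GPT-style): predict the next tokens left to right.
gen = pipeline("text-generation", model="gpt2")
print(gen("The capital of France is", max_new_tokens=5)[0]["generated_text"])

# 3. Seq2Seq (T5-style): map an input sequence to an output sequence.
seq2seq = pipeline("text2text-generation", model="t5-small")
print(seq2seq("translate English to German: Hello, how are you?")[0]["generated_text"])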

Today, let's take the first case, BERT, as an example to share with you. For more cases, I will keep following the latest prompt papers and production projects; prompting has even started to be used in the computer vision field.


You must be familiar with BERT. It is trained with the classic mask mechanism and is one of the most commonly used pre-trained models. But what if my downstream task is not a cloze task? Suppose it is a classification task and I use BERT as the pre-trained model: will the performance suffer because the downstream task differs from how BERT was originally trained? Conversely, if I reformulate my downstream task as a cloze test, exactly matching BERT's training objective, can I improve the accuracy of the downstream task?
This is the core idea of Prompt: recast the downstream task into exactly the same form as the pre-training task. Only then can we take full advantage of the pre-trained model.
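
As a concrete illustration of this reformulation, the sketch below turns a news-classification example into a cloze-style input for BERT. The template wording matches the one used later in this post, but the sample text and label words are made up for illustration:

# Classification example: predict the topic of a news article.
title = "Stocks rally as tech earnings beat expectations"
body = "Major indexes closed higher on Tuesday ..."

# Standard fine-tuning would feed the raw text to a classifier head.
# Prompting instead rewrites the input as a cloze question that BERT
# was already pre-trained to answer:
cloze_input = f"{title} {body} In this sentence, the topic is [MASK]."

# Candidate label words (one per class, illustrative); the class whose word
# gets the highest probability at the [MASK] position wins.
label_words = {0: "world", 1: "sports", 2: "business", 3: "technology"}
print(cloze_input)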

3, Cases and codes

1. Dataset

Suppose I have a multi-class task: determining which category an article belongs to. The data comes from the public AG News (agnews) dataset.

The data has three columns. The first column is the label, indicating the article's category (there are four categories in total). The second column is the article title, and the third column is the article body. The goal is to predict each article's category from its title and body, which makes this a multi-class classification problem.
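
For intuition, you could peek at the raw file like this (the "../agnews" directory matches the path used later, but the "train.csv" file name and this snippet are purely illustrative):

import csv

# Peek at the raw three-column file: label, title, description.
with open("../agnews/train.csv", newline="", encoding="utf-8") as f:
    for label, title, description in csv.reader(f):
        print(label, "|", title[:40], "|", description[:40])
        break  # only show the first row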

2. Read the dataset

First, subsample the data. The full AG News training set is large (on the order of 120,000 examples), so with all of it the task is not particularly hard. Here we sample 16 examples from each of the four classes, leaving only 64 training examples in total, which makes the task harder (a few-shot setting).

The code is as follows (example):

from openprompt.data_utils.text_classification_dataset import AgnewsProcessor
from openprompt.data_utils.data_sampler import FewShotSampler  # few-shot sampler: 16 examples per class
dataset = {}  # AG News corpus: four classes; each example has a news title and body
dataset['train'] = AgnewsProcessor().get_train_examples("../agnews")
# Sample 16 training and 16 validation examples per label (few-shot setting)
sampler = FewShotSampler(num_examples_per_label=16, num_examples_per_label_dev=16, also_sample_dev=True)
dataset['train'], dataset['validation'] = sampler(dataset['train'])
dataset['test'] = AgnewsProcessor().get_test_examples("../agnews")
print('Training set size after sampling:', len(dataset['train']))
print('Sample example:', dataset['train'][0])
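
Each entry is an OpenPrompt InputExample; a quick check like the one below (an illustrative addition, not part of the original code) prints the fields the template will reference:

# Inspect one sampled example (field contents depend on the dataset)
example = dataset['train'][0]
print('text_a:', example.text_a[:80])  # first text field (the article title)
print('text_b:', example.text_b[:80])  # second text field (the article body)
print('label: ', example.label)        # integer class label in [0, 3]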

3. Assemble the downstream task into a BERT template

from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate
from openprompt import PromptDataLoader
from openprompt.prompts import SoftVerbalizer
from openprompt import PromptForClassification
import torch

plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")  # load the pre-trained BERT model, its tokenizer, config, and wrapper class

# This is the key line: define the template that reformulates the classification task as a cloze question
mytemplate = ManualTemplate(tokenizer=tokenizer, text='{"placeholder":"text_a"} {"placeholder":"text_b"} In this sentence, the topic is {"mask"}.')

wrapped_example = mytemplate.wrap_one_example(dataset['train'][0]) 
print('wrapped_example:',wrapped_example)

train_dataloader = PromptDataLoader(dataset=dataset["train"], template=mytemplate, tokenizer=tokenizer, 
    tokenizer_wrapper_class=WrapperClass, max_seq_length=256, decoder_max_length=3, 
    batch_size=4,shuffle=True, teacher_forcing=False, predict_eos_token=False,
    truncate_method="head")

# The verbalizer maps the model's probability distribution over the vocabulary to a distribution over the classification labels. In essence, a SoftVerbalizer sets up k trainable virtual tokens and scores each class by the similarity between the model output and these tokens.
myverbalizer = SoftVerbalizer(tokenizer, plm, num_classes=4)
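
Alternatively, if you want to inject prior knowledge by hand (as mentioned in the summary below), OpenPrompt also provides a ManualVerbalizer where you choose the label words yourself. The label words below are an illustrative choice for the AG News classes, not taken from the original post:

from openprompt.prompts import ManualVerbalizer

# Hand-picked label words per class (World, Sports, Business, Sci/Tech) -- illustrative
myverbalizer_manual = ManualVerbalizer(
    tokenizer,
    num_classes=4,
    label_words=[["world", "politics"], ["sports"], ["business", "finance"], ["technology", "science"]],
)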

4. Model training

from transformers import AdamW, get_linear_schedule_with_warmup
use_cuda = False
prompt_model = PromptForClassification(plm=plm, template=mytemplate, verbalizer=myverbalizer, freeze_plm=False)
if use_cuda:
    prompt_model = prompt_model.cuda()

loss_func = torch.nn.CrossEntropyLoss()
no_decay = ['bias', 'LayerNorm.weight']

# It is good practice to exempt bias and LayerNorm parameters from weight decay
optimizer_grouped_parameters1 = [
    {'params': [p for n, p in prompt_model.plm.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in prompt_model.plm.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

# Use a separate optimizer (with its own learning rates) for the verbalizer parameters

optimizer_grouped_parameters2 = [
    {'params': prompt_model.verbalizer.group_parameters_1, "lr":3e-5},
    {'params': prompt_model.verbalizer.group_parameters_2, "lr":3e-4},
]


optimizer1 = AdamW(optimizer_grouped_parameters1, lr=3e-5)
optimizer2 = AdamW(optimizer_grouped_parameters2)

for epoch in range(5):
    tot_loss = 0 
    for step, inputs in enumerate(train_dataloader):
        if use_cuda:
            inputs = inputs.cuda()
        logits = prompt_model(inputs)
        labels = inputs['label']
        loss = loss_func(logits, labels)
        loss.backward()
        tot_loss += loss.item()
        optimizer1.step()
        optimizer1.zero_grad()
        optimizer2.step()
        optimizer2.zero_grad()
        print(tot_loss / (step + 1))  # running average loss for this epoch
# The training loop itself is fairly standard; adapt it to your own setting.
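
The post stops at training. As a minimal sketch (not part of the original code, and reusing the objects defined above), you could evaluate the few-shot model on the sampled validation split like this:

# Illustrative evaluation: build a loader for the validation split and measure accuracy
validation_dataloader = PromptDataLoader(dataset=dataset["validation"], template=mytemplate,
    tokenizer=tokenizer, tokenizer_wrapper_class=WrapperClass, max_seq_length=256,
    decoder_max_length=3, batch_size=4, shuffle=False, teacher_forcing=False,
    predict_eos_token=False, truncate_method="head")

prompt_model.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs in validation_dataloader:
        if use_cuda:
            inputs = inputs.cuda()
        logits = prompt_model(inputs)
        preds = torch.argmax(logits, dim=-1)
        correct += (preds == inputs['label']).sum().item()
        total += inputs['label'].size(0)
print('Validation accuracy:', correct / total)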

Summary

Prompt is one of the hottest ways right now to make full use of a pre-trained model. You can also change the label words to inject prior knowledge and improve the overall performance of the model. I hope to see more production projects using it in the future.

Tags: Big Data AI Pytorch NLP
