Regular Expression I


1. Concept

A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept in computer science. Regular expressions are usually used to search for and replace text that matches a certain pattern (rule).
There is more than one regular expression syntax, and different programs on Linux may use different ones. Tools that use regular expressions include grep, sed, awk, and egrep.

Regular expressions are usually used in conditional statements to check whether a string satisfies a certain format.
Regular expressions are composed of ordinary characters and metacharacters.
Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols.
Metacharacters are characters with a special meaning in regular expressions. They can be used to specify how the preceding character (that is, the character before the metacharacter) may appear in the target text.

There are two regular expression syntaxes commonly used on Linux:
Basic Regular Expression: BRE
Extended Regular Expression: ERE
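
The difference between the two syntaxes can be sketched with a small test. This is a minimal illustration that assumes GNU grep; the three sample words are made up for the example. In BRE a bare + is an ordinary character, while in ERE (grep -E) it is a metacharacter.

```shell
# Hypothetical sample input: three words, one per line
words=$(printf 'wd\nwood\nwo+d\n')

# BRE: a bare + is an ordinary character, so only the literal text "wo+d" matches
bre_plus=$(printf '%s\n' "$words" | grep 'wo+d')

# ERE: + means "one or more of the preceding character", so "wood" matches
ere_plus=$(printf '%s\n' "$words" | grep -E 'wo+d')

echo "BRE matched: $bre_plus"
echo "ERE matched: $ere_plus"
```

The same pattern text therefore selects different lines depending on which syntax the tool uses.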

2. Basic regular expression common metacharacters

\:          Escape character; removes the special meaning of the metacharacter that follows
^:          Matches the beginning of a line; for example, ^tux matches lines starting with tux
$:          Matches the end of a line; for example, tux$ matches lines ending with tux
.:          Matches any single character except a newline
[list]:     Matches any one character in list
[^list]:    Matches any one character not in list
*:          Matches the preceding subexpression 0 or more times
\{n\}:      Matches the preceding subexpression exactly n times
\{n,\}:     Matches the preceding subexpression at least n times
\{n,m\}:    Matches the preceding subexpression between n and m times
 Note: egrep and awk use {n}, {n,}, and {n,m}; no "\" is needed before the braces
egrep -E -n 'wo{2}d' test.txt   //-E enables extended regular expressions (already implied by egrep); -n shows line numbers
egrep -E -n 'wo{2,3}d' test.txt
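
The basic metacharacters can be tried out on a few lines of input. The following is a sketch using plain grep (BRE); the sample lines and variable names are invented for the illustration.

```shell
# Hypothetical sample lines, one per line
sample=$(printf 'tux rocks\nI like tux\nwd\nwood\nwoood\ncat\ncut\n')

starts=$(printf '%s\n' "$sample" | grep '^tux')            # lines beginning with tux
ends=$(printf '%s\n' "$sample" | grep 'tux$')              # lines ending with tux
any_char=$(printf '%s\n' "$sample" | grep '^c.t$')         # c, any one character, t
not_in_list=$(printf '%s\n' "$sample" | grep '^c[^a]t$')   # middle character is not a
zero_or_more=$(printf '%s\n' "$sample" | grep '^wo*d$')    # w, any number of o, d
exactly_two=$(printf '%s\n' "$sample" | grep '^wo\{2\}d$') # exactly two o

printf '%s\n' "$starts" "$ends" "$any_char" "$not_in_list" "$zero_or_more" "$exactly_two"
```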

Anchors
^ matches the position at the start of the input string
$ matches the position at the end of the input string

non-printing characters
\n matches a newline
\r matches a carriage return
\t matches a tab

3. grep command

-E : Enable extended regular expressions (ERE)
-w : Match whole words only
-c : Count the number of lines that contain the 'search string'
-i : Ignore case, so uppercase and lowercase letters are treated as the same
-o : Show only the part of each line matched by the pattern
-v : Invert the match, i.e. show the lines that do not contain the 'search string'
--color=auto : Highlight the matched keyword in color
-n : Also output the line number
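
A short sketch of these options in action. The file name and its contents are assumptions made for the example, and the file is created in a temporary directory so nothing existing is touched.

```shell
# Hypothetical sample file in a temporary directory
tmpdir=$(mktemp -d)
printf 'The cat sat\nthe dog\ncatalog\nno match here\n' > "$tmpdir/test.txt"

count=$(grep -c 'cat' "$tmpdir/test.txt")         # number of lines containing cat
numbered=$(grep -n 'dog' "$tmpdir/test.txt")      # matching line prefixed with its number
ignore_case=$(grep -ci 'the' "$tmpdir/test.txt")  # The and the both count
whole_word=$(grep -w 'cat' "$tmpdir/test.txt")    # catalog is not a whole-word match
only_match=$(grep -o 'cat' "$tmpdir/test.txt")    # only the matched text, one per line
inverted=$(grep -v 'cat' "$tmpdir/test.txt")      # lines without cat

printf '%s\n' "$count" "$numbered" "$ignore_case" "$whole_word" "$only_match" "$inverted"
rm -r "$tmpdir"
```

Note that -c counts matching lines, not individual occurrences within a line.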

4. Extended regular expression egrep

Usually basic regular expressions are sufficient, but sometimes the wider range of extended regular expressions is needed to simplify the whole command.

For example, to list the lines of a file other than blank lines and lines beginning with "#" (commonly done to view the effective settings in a configuration file) using basic regular expressions, execute "grep -v '^$' test.txt | grep -v '^#'". This requires a pipe and two searches. With extended regular expressions it can be shortened to "egrep -v '^$|^#' test.txt", where the pipe symbol inside the single quotes means or.
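
Filtering out blank lines and comment lines can be sketched both ways. The configuration file below is hypothetical; grep -E is used in place of egrep, to which it is equivalent.

```shell
# Hypothetical configuration file with comments and blank lines
tmpdir=$(mktemp -d)
printf '# comment\n\nport=22\n\n# another\nuser=root\n' > "$tmpdir/app.conf"

# Basic regular expressions: two searches joined by a pipe
two_pass=$(grep -v '^$' "$tmpdir/app.conf" | grep -v '^#')

# Extended regular expressions: one search using | for "or"
one_pass=$(grep -Ev '^$|^#' "$tmpdir/app.conf")

printf '%s\n' "$one_pass"
rm -r "$tmpdir"
```

Both commands leave only the effective settings, port=22 and user=root.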

The egrep command is largely the same as grep (it is equivalent to grep -E). It searches one or more files for lines matching a pattern. The pattern can be a single character, a string, a word, or a sentence.

Like basic regular expressions, extended regular expressions also contain multiple metacharacters. The common extended regular expression metacharacters are the following:

+   Role: Matches the preceding character one or more times
 Example: execute "egrep -n 'wo+d' test.txt" to find strings such as "wood", "woood", and "woooooood"

?   Role: Matches the preceding character zero or one time
 Example: execute "egrep -n 'bes?t' test.txt" to find the two strings "bet" and "best"

|   Role: Matches any one of several alternatives (or)
 Example: execute "egrep -n 'of|is|on' test.txt" to find the strings "of", "is", or "on"

()  Role: Groups characters into a single unit
 Example: "egrep -n 't(a|e)st' test.txt". The words "tast" and "test" share the "t" and "st", so "a" and "e" are listed inside "()" and separated by "|" to find either "tast" or "test"

()+	Role: Matches one or more repetitions of a group
 Example: "egrep -n 'A(xyz)+C' test.txt". The command finds strings that start with "A", end with "C", and have one or more "xyz" in between
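
The extended metacharacters above can be checked against a small word list. This is a sketch only: the word list is invented, and the patterns are anchored with ^ and $ so that each one must match the whole line.

```shell
# Hypothetical word list, one word per line
sample=$(printf 'wd\nwood\nbet\nbest\nbesst\ntast\ntest\ntost\nAC\nAxyzC\nAxyzxyzC\n')

one_or_more=$(printf '%s\n' "$sample" | grep -E '^wo+d$')        # at least one o
optional=$(printf '%s\n' "$sample" | grep -E '^bes?t$')          # zero or one s
alternation=$(printf '%s\n' "$sample" | grep -E '^(bet|AC)$')    # either alternative
group=$(printf '%s\n' "$sample" | grep -E '^t(a|e)st$')          # a or e in the middle
repeated_group=$(printf '%s\n' "$sample" | grep -E '^A(xyz)+C$') # one or more xyz

printf '%s\n' "$one_or_more" "$optional" "$alternation" "$group" "$repeated_group"
```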

5. Small command-line tools

Small command-line tools: cut, sort, uniq, tr

5.1 cut

cut: column cutting tool

Instructions for use:

The cut command cuts bytes, characters, and fields from each line of a file and writes them to standard output.

If no file argument is given, the cut command reads standard input. One of the -b, -c, or -f options must be specified.

Options:
-b: Select by byte
-c: Select by character; commonly used for multibyte text such as Chinese
-d: Specify the field delimiter; the default is tab
-f: Select by field; usually used together with -d
example:
[root@lwb lwb]# cat /etc/passwd | cut -d ':' -f 1
root
bin
daemon
adm
lp
sync
.
.
.
[root@lwb lwb]# who
root     pts/0        2022-07-12 10:05 (192.168.36.1)
[root@lwb lwb]# who | cut -b 3
o
[root@lwb lwb]# who | cut -b 10
p
[root@lwb lwb]# cat 1.txt
 Eight hundred pacesetters run to the north slope, Artillery running north side by side
[root@lwb lwb]# cat 1.txt | cut -b 2
 
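[root@lwb lwb]# cat 1.txt | cut -c 2
 Hundred
#Note: 1.txt originally contained Chinese (multibyte) text, shown here in English translation. -b 2 returns one byte from the middle of a multibyte character, which is not printable by itself, while -c 2 returns the complete second character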

Note: cut works best on text whose fields are separated by a single delimiter character
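
The field options, and the single-character delimiter limitation, can be sketched as follows. The passwd-style line and the two-space string are sample data chosen for the illustration.

```shell
# A passwd-style line (sample data)
line='root:x:0:0:root:/root:/bin/bash'

user=$(printf '%s\n' "$line" | cut -d ':' -f 1)             # first field
user_and_shell=$(printf '%s\n' "$line" | cut -d ':' -f 1,7) # fields 1 and 7
first_four=$(printf '%s\n' "$line" | cut -c 1-4)            # characters 1 to 4

# Two spaces between a and b: every single space starts a new field,
# so field 2 is empty and b ends up in field 3
spaced='a  b'
second_field=$(printf '%s\n' "$spaced" | cut -d ' ' -f 2)
third_field=$(printf '%s\n' "$spaced" | cut -d ' ' -f 3)

printf '%s\n' "$user" "$user_and_shell" "$first_four" "[$second_field]" "[$third_field]"
```

Because cut does not merge repeated delimiters, columns aligned with a variable number of spaces are usually easier to split with awk.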

5.2 sort

sort: a tool that sorts the contents of a file line by line; it can also sort according to different data types

Common options
-t: Specifies the field delimiter; by default fields are separated by blanks (spaces or tabs)
-k: Specifies the sort key, i.e. which field to sort by
-n: Sort numerically; the default is text sorting
-u: Equivalent to uniq; identical lines are output only once. Note: if one of the lines has a trailing space, the lines differ and deduplication will not succeed.
-r: Reverse the sort; the default is ascending order, -r gives descending order
-o: Write the sorted result to the specified file
example:
sort test.txt                            #Without any options, whole lines are sorted as text in ascending order
sort -n -t: -k3 test.txt                 #Sort by the third field numerically (ascending order) with colon as the delimiter
sort -nr -t: -k3 test.txt                #Sort by the third field numerically (descending order) with colon as the delimiter
sort -nr -t: -k3 test.txt -o test.bak    #The result is written to the test.bak file instead of the screen
sort -u passwd.txt                       #Remove duplicate lines from the file (the duplicates need not be adjacent)
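
The effect of -n, -r, -u, and -o on a colon-separated file can be sketched like this. The file contents are made up for the example, and the file lives in a temporary directory.

```shell
# Hypothetical colon-separated data: name:x:number
tmpdir=$(mktemp -d)
printf 'carol:x:10\nalice:x:9\nbob:x:100\nalice:x:9\n' > "$tmpdir/test.txt"

# Text comparison of field 3: "10" and "100" sort before "9"
text_key=$(sort -t: -k3 "$tmpdir/test.txt")

# Numeric comparison of field 3: 9, 9, 10, 100
numeric_key=$(sort -n -t: -k3 "$tmpdir/test.txt")

# Numeric, descending
numeric_desc=$(sort -nr -t: -k3 "$tmpdir/test.txt")

# Numeric, one line per key value, written to a file instead of the screen
sort -n -u -t: -k3 -o "$tmpdir/test.bak" "$tmpdir/test.txt"
saved=$(cat "$tmpdir/test.bak")

printf '%s\n' "$numeric_key"
rm -r "$tmpdir"
```

Comparing text_key with numeric_key shows why -n matters whenever the key holds numbers of different lengths.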

5.3 uniq

uniq: mainly used to remove consecutive duplicate lines

Common options
-c: Prefix each line with the number of times it occurs
-d: Show only duplicated lines (one copy of each)
-u: Show only lines that appear once
example:
[root@lwb lwb]# cat animal.txt
monkey
monkey
pig
pig
cat
dog
cat
dog
giraffe
#Sample file
[root@lwb lwb]# cat animal.txt | uniq -c
      2 monkey
      2 pig
      1 cat
      1 dog
      1 cat
      1 dog
      1 giraffe
#Count how often each line occurs; duplicates that are not adjacent are not counted together
[root@lwb lwb]# cat animal.txt | sort | uniq -c
      2 cat
      2 dog
      1 giraffe
      2 monkey
      2 pig
#Combined with sort: sort first, then count the duplicate lines
[root@lwb lwb]# cat animal.txt | sort | uniq -d
cat
dog
monkey
pig
#Combined with sort to show only the duplicated lines
[root@lwb lwb]# cat animal.txt | sort | uniq -u
giraffe
#Combined with sort to show only the lines that appear once
[root@lwb lwb]# cat animal.txt | sort | uniq
cat
dog
giraffe
monkey
pig
#Combined with sort to remove duplicates; sort -u can also be used directly

5.4 tr

tr: translates one set of characters into another; it can also delete characters or squeeze runs of repeated characters

Common options
-d: Delete the specified characters
-s: Squeeze each run of a repeated character into a single occurrence
example:
[root@lwb lwb]# cat animal.txt | tr 'a-z' 'A-Z'
MONKEY
MONKEY
PIG
PIG
CAT
DOG
CAT
DOG
GIRAFFE
#Replace lowercase letters with uppercase letters
[root@lwb lwb]# cat animal.txt | tr 'dog' 'DOG'
mOnkey
mOnkey
piG
piG
cat
DOG
cat
DOG
Giraffe
#Characters are replaced one for one by position: d becomes D, o becomes O, g becomes G
[root@lwb lwb]# cat animal.txt | tr 'g' ' '
monkey
monkey
pi 
pi 
cat
do 
cat
do 
 iraffe
#Enclose the character sets in single quotes, including special characters such as a space
[root@lwb lwb]# cat animal.txt | tr 'do' '/'
m/nkey
m/nkey
pig
pig
cat
//g
cat
//g
giraffe
#Replace several characters with one: both d and o become /
[root@lwb lwb]# vim animal.txt
[root@lwb lwb]# cat animal.txt
monkey
monkey
pig
pig
cat
dog
cat
dog
'giraffe'
[root@lwb lwb]# cat animal.txt | tr "'" '/'
monkey
monkey
pig
pig
cat
dog
cat
dog
/giraffe/
#To replace a single quote, enclose it in double quotes
[root@lwb lwb]# cat animal.txt | tr -d 'g'
monkey
monkey
pi
pi
cat
do
cat
do
'iraffe'
#remove all g
[root@lwb lwb]# cat animal.txt | tr -d 'dog'
mnkey
mnkey
pi
pi
cat

cat

'iraffe'
#Delete every d, o, and g; the set is treated as three separate characters, not as the word dog
[root@lwb lwb]# cat animal.txt | tr -s 'f'
monkey
monkey
pig
pig
cat
dog
cat
dog
'girafe'
#Squeeze repeated f characters, keeping only the first one

cat animal.txt | tr -s '\n'
#When several consecutive newlines are encountered, only one is kept, which is equivalent to removing blank lines
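
The newline squeeze can be sketched on a short text. The input strings are invented for the example; the second case shows a detail worth knowing, namely that blank lines at the very start of the input are reduced to one empty line rather than removed entirely.

```shell
# Hypothetical text with blank lines between the words
text=$(printf 'one\n\n\ntwo\n\nthree\n')

# Runs of newlines collapse to a single newline, so the blank lines disappear
squeezed=$(printf '%s\n' "$text" | tr -s '\n')

# Blank lines at the start: the run of newlines collapses to one,
# which still leaves a single empty first line
leading=$(printf '\n\none\n' | tr -s '\n')

printf '%s\n' "$squeezed"
```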

6. sed tool

6.1 Concept

sed (Stream EDitor) is a powerful yet simple text parsing and transformation tool. It reads text, edits the content (delete, replace, add, move, etc.) according to specified conditions, and finally outputs either all lines or only the processed lines. sed can carry out quite complex text processing without user interaction and is widely used in shell scripts to complete various automated processing tasks.

The workflow of sed consists of three steps: read, execute, and display:

● Read: sed reads a line from the input stream (file, pipe, standard input) and stores it in a temporary buffer (known as the pattern space).

● Execute: By default, all sed commands are executed in order on the pattern space. Unless line addresses are specified, the commands are applied to every line in turn.

● Display: Send the modified content to the output stream. After sending the data, the pattern space is cleared.

This process is repeated until all of the input has been processed.

Note: by default all sed commands operate on the pattern space, so the input file itself is not changed unless the -i option is used or the output is redirected to a file
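
The read, execute, display cycle can be observed with a small experiment. This is a sketch with a made-up two-line file: the edit only reaches standard output, the file on disk stays the same, and the automatic display step can be switched off with -n.

```shell
# Hypothetical two-line file in a temporary directory
tmpdir=$(mktemp -d)
printf 'the cat\nthe dog\n' > "$tmpdir/test.txt"

# The substitution is applied in the pattern space and sent to standard output
to_stdout=$(sed 's/the/THE/' "$tmpdir/test.txt")

# The input file itself is unchanged
after_plain=$(cat "$tmpdir/test.txt")

# Redirection stores the edited text in another file
sed 's/the/THE/' "$tmpdir/test.txt" > "$tmpdir/out.txt"
redirected=$(cat "$tmpdir/out.txt")

# With automatic printing, p shows every line twice; -n leaves only the p output
auto_print_lines=$(sed 'p' "$tmpdir/test.txt" | grep -c '')
quiet_lines=$(sed -n 'p' "$tmpdir/test.txt" | grep -c '')

echo "with auto print: $auto_print_lines lines, with -n: $quiet_lines lines"
rm -r "$tmpdir"
```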

6.2 Common options

Common options
-e or --expression=: Process the input with the specified command or script
-f or --file=: Process the input with the commands in the specified script file
-h or --help: Show help
-n, --quiet or --silent: Suppress automatic printing; only lines printed explicitly (for example with p) are shown
-i.bak: Edit the file in place, keeping a backup of the original with the .bak suffix
-r, -E: Use extended regular expressions
-s: Treat multiple files as separate files rather than a single continuous stream
Common operations
a: Append; add a line with the specified content below the selected line
c: Change; replace the selected line with the specified content
d: Delete; delete the selected line
i: Insert; insert a line with the specified content above the selected line
p: Print; if an address is given, print the selected lines, otherwise print all lines. Usually used together with the "-n" option. (To show non-printing characters in escaped form, use the l command.)
s: Substitute; replace the specified text
y: Transliterate; convert characters one for one
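
Each operation can be tried on a three-line input. This is a minimal sketch that assumes GNU sed, whose one-line forms of a, i, and c are used here; other sed implementations may require a backslash and a newline after these commands.

```shell
# Hypothetical three-line input
nums=$(printf '1\n2\n3\n')

appended=$(printf '%s\n' "$nums" | sed '2a New')       # New is added below line 2
inserted=$(printf '%s\n' "$nums" | sed '2i New')       # New is added above line 2
changed=$(printf '%s\n' "$nums" | sed '2c New')        # line 2 is replaced by New
deleted=$(printf '%s\n' "$nums" | sed '2d')            # line 2 is removed
substituted=$(printf '%s\n' "$nums" | sed 's/2/two/')  # text 2 is replaced by two
translated=$(printf '%s\n' "$nums" | sed 'y/123/abc/') # 1 to a, 2 to b, 3 to c

printf '%s\n' "$appended"
```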

6.3 Usage examples

[root@lwb lwb]# vim num
[root@lwb lwb]# cat num
1
2
3
4
5
6
7
8
9
10
[root@lwb lwb]# sed -n 'p' num
1
2
3
4
5
6
7
8
9
10
#print every line

[root@lwb lwb]# sed -n '3p' num
3
#print the third line

[root@lwb lwb]# sed -n '3,5p' num
3
4
5
#print 3-5 lines

[root@lwb lwb]# sed -n 'p;n' num
1
3
5
7
9
#print odd lines

[root@lwb lwb]# sed -n 'n;p' num
2
4
6
8
10
#print even lines

[root@lwb lwb]# sed -n '1,5{p;n}' num
1
3
5
#print odd lines 1-5

[root@lwb lwb]# sed -n '6,${p;n}' num
6
8
10
#print even lines starting from the sixth line

[root@lwb lwb]# ifconfig ens33 | sed -n 2p
inet 192.168.36.131  netmask 255.255.255.0  broadcast 192.168.36.255
#print the second line of the ifconfig output for the network card

When sed is combined with a regular expression, the format is slightly different: the regular expression is enclosed in "/". The following are examples of using the sed command with regular expressions

[root@lwb lwb]# sed -n '/user/p' /etc/passwd
saslauth:x:994:76:Saslauthd user:/run/saslauthd:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
#output lines containing user

[root@lwb lwb]# sed -n '4,/user/p' /etc/passwd
#output from line 4 to the first line containing user

[root@lwb lwb]# sed -n '/user/=' /etc/passwd
22
28
31
32
34
35
#Output the line numbers of the lines containing user; the equals sign (=) prints the line number

[root@lwb lwb]# sed -n '/^root/p' /etc/passwd
root:x:0:0:root:/root:/bin/bash
#print lines starting with root

[root@lwb lwb]# sed -n '/[0-9]$/p' test.txt
#output lines ending with numbers
[root@lwb lwb]# nl passwd
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
     3	daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4	adm:x:3:4:adm:/var/adm:/sbin/nologin
     5	lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

[root@lwb lwb]# nl passwd | sed '3d'
     1	root:x:0:0:root:/root:/bin/bash
     2	bin:x:1:1:bin:/bin:/sbin/nologin
     4	adm:x:3:4:adm:/var/adm:/sbin/nologin
     5	lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
#delete the third line

[root@lwb lwb]# nl passwd | sed '2,4d'
     1	root:x:0:0:root:/root:/bin/bash
     5	lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
#delete 2-4 lines

[root@lwb lwb]# sed '/^[a-z]/d' test.txt
#delete lines starting with lowercase letters

[root@lwb lwb]# sed '/\.$/d' test.txt
#delete lines ending with "."

[root@lwb lwb]# sed '/^$/d' test.txt
#delete all blank lines
sed 's/the/THE/' test.txt	    #Replace the first the in each line with THE
sed 's/l/L/2' test.txt	        #Replace the 2nd l in each line with L
sed 's/the/THE/g' test.txt	    #Replace every the in the file with THE
sed 's/o//g' test.txt           #Delete every o in the file (replace it with an empty string)
sed 's/^/#/' test.txt	        #Insert a # at the beginning of each line
sed '/the/s/^/#/' test.txt	    #Insert a # at the beginning of each line containing the
sed 's/$/EOF/' test.txt	        #Append the string EOF to the end of each line
sed '3,5s/the/THE/g' test.txt	#Replace every the in lines 3-5 with THE
sed '/the/s/o/O/g' test.txt  	#Replace o with O in all lines containing the
sed '/the/{H;d};$G' test.txt          #Move the lines containing the to the end of the file; {;} groups multiple operations
sed '1,5{H;d};17G' test.txt  	      #Move lines 1-5 to after line 17
sed '/the/w out.file' test.txt     	  #Save the lines containing the to the file out.file
sed '/the/r /etc/hostname' test.txt	  #Add the contents of /etc/hostname after each line containing the
sed '3aNew' test.txt	              #Insert a new line with the content New after line 3
sed '/the/aNew' test.txt	          #Insert a new line with the content New after each line containing the
sed '3aNew1\nNew2' test.txt	          #Insert multiple lines after line 3; the \n in the middle means a newline
sed '1,5{H;d};16G' test.txt	          #Move lines 1-5 to after line 16
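
A few of these one-liners can be checked on small inputs. The following sketch assumes GNU sed and uses invented sample text. It also shows a side effect of the H;d;$G idiom: because H appends to an initially empty hold space, a blank line precedes the moved lines.

```shell
# Substitution flags on a single word
second_l=$(printf 'hello\n' | sed 's/l/L/2')   # only the 2nd l is replaced
all_l=$(printf 'hello\n' | sed 's/l/L/g')      # every l is replaced
commented=$(printf 'hello\n' | sed 's/^/#/')   # # inserted at the start of the line

# Hypothetical four-line text: move the lines containing "the" to the end
text=$(printf 'the first\nplain one\nthe second\nplain two\n')
moved=$(printf '%s\n' "$text" | sed '/the/{H;d};$G')

printf '%s\n' "$second_l" "$all_l" "$commented"
printf '%s\n' "$moved"
```

If the last line of the input itself contains the pattern, d ends the cycle before $G runs and the held lines are not output, so the idiom is best used when the final line is known not to match.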

Tags: regex

Posted by surfinglight on Wed, 20 Jul 2022 21:56:34 +0530