Regular Expressions
1. Concept
A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept in computer science. Regular expressions are usually used to search for and replace text that matches a certain pattern (rule).
There is more than one regular expression syntax, and different programs on Linux may use different ones, for example the tools grep, sed, awk, and egrep.
Regular expressions are often used in conditional statements to check whether a string matches a certain format.
Regular expressions are composed of ordinary characters and metacharacters.
Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols.
Metacharacters are characters with a special meaning in regular expressions. Many of them specify how the preceding character (the character just before the metacharacter) may appear in the target text.
Two regular expression flavors are commonly used on Linux:
Basic Regular Expression: BRE
Extended Regular Expression: ERE
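The use of a regular expression in a conditional statement, mentioned above, can be sketched as follows. The helper name is_date_like and the YYYY-MM-DD format are assumptions chosen for illustration; they do not come from the original text.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the argument looks like YYYY-MM-DD.
# It only checks the shape of the string, not whether the date is valid.
is_date_like() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

for value in 2022-07-12 2022/07/12; do
    if is_date_like "$value"; then
        echo "$value: matches"
    else
        echo "$value: does not match"
    fi
done
```

grep -q prints nothing and reports the result only through its exit status, which is what the if statement tests.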
2. Common metacharacters in basic regular expressions
\ : escape character; removes the special meaning of the character that follows
^ : matches the beginning of a line; ^tux matches lines starting with tux
$ : matches the end of a line; tux$ matches lines ending with tux
. : matches any single character except a newline
[list] : matches any one character in list
[^list] : matches any one character not in list
* : matches the preceding subexpression 0 or more times
\{n\} : matches the preceding subexpression exactly n times
\{n,\} : matches the preceding subexpression at least n times
\{n,m\} : matches the preceding subexpression from n to m times
Note: egrep and awk use {n}, {n,}, and {n,m}; no "\" is needed before "{}".
egrep -E -n 'wo{2}d' test.txt    # -E selects extended regular expressions (already the default for egrep); -n prints line numbers
egrep -E -n 'wo{2,3}d' test.txt
Anchors
^ : matches the position where the input string begins
$ : matches the position where the input string ends
Non-printing characters
\n : matches a newline
\r : matches a carriage return
\t : matches a tab
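The difference between the escaped interval braces of basic regular expressions and the bare braces of extended ones can be tried with a short sketch. The word list is made up for this illustration.

```shell
#!/bin/sh
# Sample data invented for the demo: "w", one to four "o", then "d".
words=$(printf '%s\n' wod wood woood wooood)

# BRE: interval braces must be escaped with backslashes.
bre=$(printf '%s\n' "$words" | grep 'wo\{2,3\}d')

# ERE (grep -E): the same interval without backslashes.
ere=$(printf '%s\n' "$words" | grep -E 'wo{2,3}d')

# Anchors: only the line that is exactly "wood".
exact=$(printf '%s\n' "$words" | grep '^wo\{2\}d$')

echo "BRE:"; echo "$bre"
echo "ERE:"; echo "$ere"
echo "anchored:"; echo "$exact"
```

Both interval forms select wood and woood, and the anchored pattern selects only wood.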
3. grep command
-E : use extended regular expressions
-w : match whole words only
-c : print the number of matching lines
-i : ignore case differences, so uppercase and lowercase are treated as the same
-o : show only the part of each line that matches the pattern
-v : invert the selection, showing only the lines that do not match
--color=auto : highlight the matched text in color
-n : also print the line number of each matching line
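A small sketch of how several of these options behave on the same input. The sample file and its contents are invented for the demo.

```shell
#!/bin/sh
# Throwaway sample file; the contents are made up for this illustration.
tmp=$(mktemp)
printf '%s\n' 'The cat sat' 'concatenate' 'a dog' 'CAT food' > "$tmp"

count=$(grep -c -i 'cat' "$tmp")     # number of lines containing cat, any case
whole=$(grep -n -w 'cat' "$tmp")     # "cat" as a whole word, with line number
only=$(grep -o -i 'cat' "$tmp")      # just the matched text, one match per line
inverse=$(grep -v -i 'cat' "$tmp")   # lines that do not contain cat

echo "count: $count"
echo "whole word: $whole"
echo "matches only:"; echo "$only"
echo "inverted: $inverse"

rm -f "$tmp"
```

Note that -c counts matching lines rather than individual matches, and -w rejects "concatenate" because cat appears there only as part of a longer word.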
4. Extended regular expressions (egrep)
Basic regular expressions are usually sufficient, but extended regular expressions offer additional metacharacters that can simplify a whole command.
For example, to list the lines of a file other than blank lines and lines beginning with "#" (often done to view the effective settings in a configuration file) with basic regular expressions, run "grep -v '^$' test.txt | grep -v '^#'". This needs a pipe and two searches. With extended regular expressions the command can be shortened to "egrep -v '^$|^#' test.txt", where the pipe symbol inside the single quotes means or.
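The two ways of hiding blank lines and comment lines can be compared with a short sketch. The configuration file here is hypothetical and created only for the demo; grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical configuration file, created only for this demo.
conf=$(mktemp)
printf '%s\n' '# comment' '' 'port=22' '#another' 'user=root' > "$conf"

# Basic regular expressions: two grep passes joined by a pipe.
two_pass=$(grep -v '^$' "$conf" | grep -v '^#')

# Extended regular expressions: one pass, with | meaning "or".
one_pass=$(grep -E -v '^$|^#' "$conf")

echo "$one_pass"
rm -f "$conf"
```

Both commands leave only the port=22 and user=root lines.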
The egrep command works much like grep (it is equivalent to grep -E). It searches one or more files for a pattern, which can be a single character, a string, a word, or a sentence.
Like basic regular expressions, extended regular expressions have their own metacharacters. The most common ones are the following:
+ : matches the preceding character one or more times
    Example: "egrep -n 'wo+d' test.txt" finds strings such as "wood", "woood", and "woooooood"
? : matches the preceding character zero or one time
    Example: "egrep -n 'bes?t' test.txt" finds the two strings "bet" and "best"
| : alternation (or); finds any one of several strings
    Example: "egrep -n 'of|is|on' test.txt" finds "of", "is", or "on"
() : groups part of a pattern
    Example: "egrep -n 't(a|e)st' test.txt". "tast" and "test" share the "t" and the "st", so "a" and "e" are listed inside "()" and separated by "|" to find "tast" or "test"
()+ : matches one or more repetitions of a group
    Example: "egrep -n 'A(xyz)+C' test.txt" finds strings that start with "A", end with "C", and have one or more "xyz" in between
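The metacharacters above can be checked against a small word list. This is a sketch: the word list and the helper name match are assumptions for illustration, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Made-up word list covering the examples above plus some near misses.
sample=$(printf '%s\n' wd wod wood bet best beast tast test toast AC AxyzC AxyzxyzC)

# Hypothetical helper: print the sample lines that match an ERE pattern.
match() {
    printf '%s\n' "$sample" | grep -E "$1"
}

plus=$(match 'wo+d')        # one or more "o": wod, wood (not wd)
opt=$(match 'bes?t')        # optional "s": bet, best (not beast)
group=$(match 't(a|e)st')   # "a" or "e" in the middle: tast, test (not toast)
repeat=$(match 'A(xyz)+C')  # one or more "xyz": AxyzC, AxyzxyzC (not AC)

printf '%s\n' "$plus" "$opt" "$group" "$repeat"
```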
5. Command-line utilities
Utilities covered: cut, sort, uniq, tr
5.1 cut
cut: a tool for extracting columns
Usage:
The cut command cuts bytes, characters, or fields from each line of a file and writes them to standard output.
If no file is specified, cut reads standard input. One of the -b, -c, or -f options must be specified.
Options:
-b : select by byte
-c : select by character; useful for multi-byte text such as Chinese
-d : specify the field delimiter; the default is tab
-f : select fields; usually used together with -d
Example:
[root@lwb lwb]# cat /etc/passwd | cut -d ':' -f 1
root
bin
daemon
adm
lp
sync
. . .
[root@lwb lwb]# who
root     pts/0        2022-07-12 10:05 (192.168.36.1)
[root@lwb lwb]# who | cut -b 3
o
[root@lwb lwb]# who | cut -b 10
p
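[root@lwb lwb]# cat 1.txt
Eight hundred pacesetters run to the north slope, Artillery running north side by side
[root@lwb lwb]# cat 1.txt | cut -b 2

[root@lwb lwb]# cat 1.txt | cut -c 2
Hundred
In the original example 1.txt contains Chinese text (shown here in translation). Each Chinese character occupies several bytes, so -b 2 cuts out part of a character and prints nothing readable, while -c 2 prints the whole second character.
Note: cut works well only on text whose fields are separated by a single character.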
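The options and the single-character limitation can be seen in a short sketch. The records are made up in /etc/passwd style, and squeezing repeated spaces with tr -s before cut is one possible workaround, not something the original text prescribes.

```shell
#!/bin/sh
# Made-up records in /etc/passwd style (colon-separated fields).
records=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin')

names=$(printf '%s\n' "$records" | cut -d ':' -f 1)         # field 1
name_shell=$(printf '%s\n' "$records" | cut -d ':' -f 1,7)  # fields 1 and 7
first3=$(printf '%s\n' "$records" | cut -c 1-3)             # characters 1 to 3

# Fields separated by several spaces: every single space counts as a delimiter,
# so field 2 is empty until the run of spaces is squeezed into one.
spaced='alice    30'
raw=$(printf '%s\n' "$spaced" | cut -d ' ' -f 2)
squeezed=$(printf '%s\n' "$spaced" | tr -s ' ' | cut -d ' ' -f 2)

echo "$names"
echo "$name_shell"
echo "$first3"
echo "raw field 2: [$raw]"
echo "after tr -s: [$squeezed]"
```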
5.2 sort
sort: a tool that sorts file contents line by line; it can also sort according to different data types
Common options:
-t : specify the field delimiter; by default fields are separated by blanks (spaces or tabs)
-k : specify the sort key, that is, which field to sort on
-n : sort numerically; the default is text order
-u : like uniq, output only one line for identical lines. Note: a trailing space at the end of a line makes it a different line, so it will not be removed as a duplicate.
-r : reverse the order; the default is ascending, -r gives descending
-o : write the sorted result to the specified file
Example:
sort test.txt                          # no options: ascending text order, compared from the start of each line
sort -n -t: -k3 test.txt               # colon as delimiter, sort by the third field numerically (ascending)
sort -nr -t: -k3 test.txt              # colon as delimiter, sort by the third field numerically (descending)
sort -nr -t: -k3 test.txt -o test.bak  # write the result to the file test.bak instead of the screen
sort -u passwd.txt                     # remove duplicate lines from the file (the duplicates need not be adjacent)
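The effect of -n, -r, and -u can be compared on a few lines of sample data. This is a sketch: the records are invented, and LC_ALL=C is set as an assumption so that the text ordering is the same on every system.

```shell
#!/bin/sh
# Fix the locale so that text ordering is predictable (assumption for the demo).
LC_ALL=C
export LC_ALL

# Made-up colon-separated records; the third field is a number.
data=$(printf '%s\n' 'carol:x:10' 'alice:x:9' 'bob:x:100' 'alice:x:9')

text_order=$(printf '%s\n' "$data" | sort -t: -k3)    # as text: 10 < 100 < 9
numeric=$(printf '%s\n' "$data" | sort -n -t: -k3)    # as numbers: 9 < 10 < 100
reverse=$(printf '%s\n' "$data" | sort -nr -t: -k3)   # numbers, descending
unique=$(printf '%s\n' "$data" | sort -u)             # whole lines, duplicates removed

echo "text:";    echo "$text_order"
echo "numeric:"; echo "$numeric"
echo "reverse:"; echo "$reverse"
echo "unique:";  echo "$unique"
```

The text ordering puts 100 before 9 because the keys are compared character by character, which is why -n is needed for numeric fields.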
5.3 uniq
uniq: mainly used to remove consecutive duplicate lines
Common options:
-c : prefix each line with the number of times it occurs
-d : show only duplicated lines
-u : show only lines that appear once
Example:
[root@lwb lwb]# cat animal.txt
monkey
monkey
pig
pig
cat
dog
cat
dog
giraffe
# contents of the sample file
[root@lwb lwb]# cat animal.txt | uniq -c
      2 monkey
      2 pig
      1 cat
      1 dog
      1 cat
      1 dog
      1 giraffe
# count repeated lines; duplicates that are not adjacent are not counted together
[root@lwb lwb]# cat animal.txt | sort | uniq -c
      2 cat
      2 dog
      1 giraffe
      2 monkey
      2 pig
# combined with sort: sort first, then count repeated lines
[root@lwb lwb]# cat animal.txt | sort | uniq -d
cat
dog
monkey
pig
# combined with sort: show only the duplicated lines
[root@lwb lwb]# cat animal.txt | sort | uniq -u
giraffe
# combined with sort: show only the lines that appear once
[root@lwb lwb]# cat animal.txt | sort | uniq
cat
dog
giraffe
monkey
pig
# combined with sort: remove duplicates; sort -u does the same directly
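A common extension of the sort and uniq -c combination is to rank the values by how often they occur, by sorting the counts numerically in descending order. The sketch below uses a made-up list similar to animal.txt, with one extra monkey so that there is a clear winner.

```shell
#!/bin/sh
# Made-up input: like animal.txt, plus one extra "monkey".
animals=$(printf '%s\n' monkey monkey pig pig cat dog cat dog giraffe monkey)

# sort groups equal lines, uniq -c counts them, sort -nr ranks by count.
ranked=$(printf '%s\n' "$animals" | sort | uniq -c | sort -nr)

# The most frequent value is on the first line.
top=$(printf '%s\n' "$ranked" | head -n 1)

echo "$ranked"
echo "most frequent: $top"
```

The counts printed by uniq -c are padded with leading spaces, so the exact column width may vary between implementations.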
5.4 tr
tr: replaces one character with another, deletes characters, or squeezes repeated characters
Common options:
-d : delete the listed characters
-s : squeeze each run of a repeated character into a single occurrence
Example:
[root@lwb lwb]# cat animal.txt | tr 'a-z' 'A-Z'
MONKEY
MONKEY
PIG
PIG
CAT
DOG
CAT
DOG
GIRAFFE
# replace lowercase letters with uppercase letters
[root@lwb lwb]# cat animal.txt | tr 'dog' 'DOG'
mOnkey
mOnkey
piG
piG
cat
DOG
cat
DOG
Giraffe
# the replacement maps characters one to one (d to D, o to O, g to G), not the word dog
[root@lwb lwb]# cat animal.txt | tr 'g' ' '
monkey
monkey
pi
pi
cat
do
cat
do
 iraffe
# enclose the characters in single quotes, including special characters such as a space
[root@lwb lwb]# cat animal.txt | tr 'do' '/'
m/nkey
m/nkey
pig
pig
cat
//g
cat
//g
giraffe
# replace several characters with the same one
[root@lwb lwb]# vim animal.txt
[root@lwb lwb]# cat animal.txt
monkey
monkey
pig
pig
cat
dog
cat
dog
'giraffe'
[root@lwb lwb]# cat animal.txt | tr "'" '/'
monkey
monkey
pig
pig
cat
dog
cat
dog
/giraffe/
# to replace a single quote, enclose it in double quotes
[root@lwb lwb]# cat animal.txt | tr -d 'g'
monkey
monkey
pi
pi
cat
do
cat
do
'iraffe'
# delete every g
[root@lwb lwb]# cat animal.txt | tr -d 'dog'
mnkey
mnkey
pi
pi
cat

cat

'iraffe'
# delete every d, o, and g character (the dog lines become empty)
[root@lwb lwb]# cat animal.txt | tr -s 'f'
monkey
monkey
pig
pig
cat
dog
cat
dog
'girafe'
# squeeze repeated f characters, keeping only the first one
cat animal.txt | tr -s '\n'
# when several newlines occur in a row only one is kept, which removes blank lines
6. sed tool
6.1 Concept
sed (Stream EDitor) is a powerful yet simple text parsing and transformation tool. It reads text, edits the content (delete, replace, add, move, and so on) according to specified conditions, and finally outputs either all lines or only the processed lines. sed can perform quite complex text processing without interaction and is widely used in shell scripts to complete various automated processing tasks.
The sed workflow consists of three steps: read, execute, and display:
● Read: sed reads a line from the input stream (file, pipe, standard input) and stores it in a temporary buffer (also known as the pattern space).
● Execute: By default, the sed commands are executed in order on the pattern space. Unless a line address is specified, the commands are applied to every line in turn.
● Display: The modified content is sent to the output stream. After the data is sent, the pattern space is cleared.
This process repeats until all of the input has been processed.
Note: by default all sed commands operate on the pattern space, so the input file itself is not changed; use redirection to save the output.
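The note above can be verified with a short sketch: the edited text appears only in the redirected output, while the input file keeps its original content. The file names come from mktemp and the sample text is made up.

```shell
#!/bin/sh
# Scratch files for the demo.
src=$(mktemp)
out=$(mktemp)
printf '%s\n' 'the first' 'the second' > "$src"

# sed edits a copy of each line in the pattern space and prints the result;
# the redirection stores that result in a second file.
sed 's/the/THE/' "$src" > "$out"

original=$(cat "$src")   # still lowercase: the input file was not modified
edited=$(cat "$out")     # the redirected output holds the changes

echo "input file:";  echo "$original"
echo "output file:"; echo "$edited"

rm -f "$src" "$out"
```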
6.2 Common options
Common options:
-e or --expression= : process the input with the specified command or script
-f or --file= : process the input with the specified script file
-h or --help : show help
-n, --quiet, or --silent : suppress automatic printing, so only explicitly printed results are shown
-i.bak : edit the file in place, keeping a backup with the given suffix (here .bak)
-r, -E : use extended regular expressions
-s : treat multiple files as separate files rather than a single continuous stream
Common commands:
a : append; add a line with the specified content below the current line
c : change; replace the selected line with the specified content
d : delete; delete the selected line
i : insert; add a line with the specified content above the selected line
p : print; with an address it prints the addressed lines, without one it prints everything. It is usually used together with the "-n" option. (Showing non-printing characters in escaped form is done by the separate l command.)
s : substitute; replace the specified text
y : transliterate; convert characters one to one
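Several of these options and commands can be tried on a three-line file. This is a sketch that assumes GNU sed, whose one-line forms of a, i, and c are also used elsewhere in this document; the file content is made up.

```shell
#!/bin/sh
# Scratch file with three made-up lines.
f=$(mktemp)
printf '%s\n' one two three > "$f"

printed=$(sed -n '2p' "$f")                     # -n with p: print line 2 only
multi=$(sed -e 's/one/1/' -e 's/two/2/' "$f")   # -e: several commands in one run
appended=$(sed '1a NEW' "$f")                   # a: add a line below line 1
inserted=$(sed '1i NEW' "$f")                   # i: add a line above line 1
changed=$(sed '2c NEW' "$f")                    # c: replace line 2
translit=$(sed 'y/on/ON/' "$f")                 # y: map o to O and n to N

# -i.bak edits the file in place and keeps the original as "$f.bak".
sed -i.bak 's/three/3/' "$f"
in_place=$(cat "$f")
backup=$(cat "$f.bak")

echo "$printed"
echo "$translit"
echo "$in_place"

rm -f "$f" "$f.bak"
```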
6.3 Usage examples
[root@lwb lwb]# vim num
[root@lwb lwb]# cat num
1
2
3
4
5
6
7
8
9
10
[root@lwb lwb]# sed -n 'p' num
1
2
3
4
5
6
7
8
9
10
# print every line
[root@lwb lwb]# sed -n '3p' num
3
# print the third line
[root@lwb lwb]# sed -n '3,5p' num
3
4
5
# print lines 3 to 5
[root@lwb lwb]# sed -n 'p;n' num
1
3
5
7
9
# print the odd lines
[root@lwb lwb]# sed -n 'n;p' num
2
4
6
8
10
# print the even lines
[root@lwb lwb]# sed -n '1,5{p;n}' num
1
3
5
# print the odd lines among lines 1 to 5
[root@lwb lwb]# sed -n '6,${p;n}' num
6
8
10
# print the even lines starting from the sixth line
[root@lwb lwb]# ifconfig ens33 | sed -n 2p
        inet 192.168.36.131  netmask 255.255.255.0  broadcast 192.168.36.255
# print the second line of the network interface information
When sed is combined with a regular expression, the format is slightly different: the regular expression is enclosed in "/". The following examples use the sed command with regular expressions.
[root@lwb lwb]# sed -n '/user/p' /etc/passwd
saslauth:x:994:76:Saslauthd user:/run/saslauthd:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
# output the lines containing user
[root@lwb lwb]# sed -n '4,/user/p' /etc/passwd
# output from line 4 to the first line containing user
[root@lwb lwb]# sed -n '/user/=' /etc/passwd
22
28
31
32
34
35
# output the line numbers of the lines containing user; the equals sign (=) prints line numbers
[root@lwb lwb]# sed -n '/^root/p' /etc/passwd
root:x:0:0:root:/root:/bin/bash
# print the lines starting with root
[root@lwb lwb]# sed -n '/[0-9]$/p' test.txt
# output the lines ending with a digit
[root@lwb lwb]# nl passwd
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
     3  daemon:x:2:2:daemon:/sbin:/sbin/nologin
     4  adm:x:3:4:adm:/var/adm:/sbin/nologin
     5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[root@lwb lwb]# nl passwd | sed '3d'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
     4  adm:x:3:4:adm:/var/adm:/sbin/nologin
     5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
# delete the third line
[root@lwb lwb]# nl passwd | sed '2,4d'
     1  root:x:0:0:root:/root:/bin/bash
     5  lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
# delete lines 2 to 4
[root@lwb lwb]# sed '/^[a-z]/d' test.txt
# delete the lines starting with a lowercase letter
[root@lwb lwb]# sed '/\.$/d' test.txt
# delete the lines ending with "."
[root@lwb lwb]# sed '/^$/d' test.txt
# delete all blank lines
sed 's/the/THE/' test.txt       # replace the first the in each line with THE
sed 's/l/L/2' test.txt          # replace the 2nd l in each line with L
sed 's/the/THE/g' test.txt      # replace every the in the file with THE
sed 's/o//g' test.txt           # delete every o in the file (replace it with an empty string)
sed 's/^/#/' test.txt           # insert a # at the beginning of each line
sed '/the/s/^/#/' test.txt      # insert a # at the beginning of each line containing the
sed 's/$/EOF/' test.txt         # append the string EOF at the end of each line
sed '3,5s/the/THE/g' test.txt   # replace every the in lines 3 to 5 with THE
sed '/the/s/o/O/g' test.txt     # replace o with O in every line containing the
sed '/the/{H;d};$G' test.txt          # move the lines containing the to the end of the file; {} groups several commands separated by ;
sed '1,5{H;d};17G' test.txt           # move lines 1 to 5 to after line 17
sed '/the/w out.file' test.txt        # save the lines containing the to the file out.file
sed '/the/r /etc/hostname' test.txt   # add the contents of /etc/hostname after each line containing the
sed '3aNew' test.txt                  # insert a new line with the content New after line 3
sed '/the/aNew' test.txt              # insert a new line with the content New after each line containing the
sed '3aNew1\nNew2' test.txt           # insert several lines after line 3; the \n in the middle means a newline
sed '1,5{H;d};16G' test.txt           # move lines 1 to 5 to after line 16
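How the H, d, and G commands move lines can be traced on a five-line file. This is a sketch assuming GNU sed; the second variant, which uses h for the first moved line, is one possible way to avoid the empty line that H leaves when the hold space starts out empty, and is not taken from the original text.

```shell
#!/bin/sh
# Scratch file containing the lines 1 to 5.
f=$(mktemp)
printf '%s\n' 1 2 3 4 5 > "$f"

# Lines 1-2: H appends each line to the hold space, d removes it from the output.
# Line 4:    G appends the hold space after the current line.
# H always adds a newline before the appended text, so the moved block
# is preceded by an empty line.
with_gap=$(sed '1,2{H;d};4G' "$f")

# Variant: h (copy, overwriting the hold space) for line 1 avoids the empty line.
no_gap=$(sed '1h;2H;1,2d;4G' "$f")

echo "with H only:";    echo "$with_gap"
echo "with h then H:";  echo "$no_gap"

rm -f "$f"
```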