Regular Expressions JS-1212

There are two ways to create a regular expression. One is a literal, which starts and ends with a slash; the other is the RegExp constructor.

	// Method 1: literal
	var reg = /xyz/;
	// Method 2: constructor, new RegExp('pattern', 'modifiers')
	var reg2 = new RegExp("xyz", "g");
	// The constructor takes the pattern as a string, without the surrounding slashes,
	// so each backslash in a predefined pattern such as \d must itself be escaped as '\\'
	var reg3 = /\d*/;
	var reg4 = new RegExp('\\d*');
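
The constructor form is mostly useful when the pattern is only known at run time. The following is a minimal sketch of that idea; the helper name escapeRegExp and the sample input are assumptions made for illustration, not part of the original article.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern,
// so that arbitrary text can be matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// The pattern comes from a string, so the constructor is required.
var userInput = '1+1';
var dynamic = new RegExp(escapeRegExp(userInput), 'g');

// A literal and the equivalent constructor call produce the same pattern.
var literalDigits = /\d+/;
var constructedDigits = new RegExp('\\d+');

console.log(dynamic.source);                                    // 1\+1
console.log('1+1=2, 1+1'.match(dynamic));                       // [ '1+1', '1+1' ]
console.log(literalDigits.source === constructedDigits.source); // true
```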

Modifier

A modifier (flag) specifies additional rules for the pattern and is placed after the closing slash of the pattern. Modifiers can be used singly or in combination:

  1. The g modifier. By default, matching stops after the first successful match. The g modifier stands for global matching: with it, the regular expression object matches all eligible results, which is mainly useful for search and replace.
    	var regex = /b/;
    	var str = 'abba';
    	regex.test(str); // true
    	regex.test(str); // true
    	regex.test(str); // true
    	
    	var regex = /b/g;
    	var str = 'abba';
    	regex.test(str); // true
    	regex.test(str); // true
    	regex.test(str); // false
    
  2. The i modifier ignores case. By default, regular expressions are case-sensitive.
    /abc/.test('ABC') // false
    /abc/i.test('ABC') // true
    
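  3. The m modifier indicates multiline mode (multiline). In this mode, ^ and $ match at the start and end of every line, not only at the start and end of the whole string.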
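
Multiline mode is easiest to see on a string that contains a newline. This is a small sketch with a made-up sample string:

```javascript
var text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the string.
var withoutM = /^second/.test(text);  // false
// With m, ^ also matches right after a line break.
var withM = /^second/m.test(text);    // true
// With m, $ matches before each line break as well as at the end.
var lineEnds = text.match(/line$/gm); // ['line', 'line']

console.log(withoutM, withM, lineEnds);
```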

Instance methods

1. RegExp.prototype.test()

Returns a boolean indicating whether the current pattern matches the argument string.

/cat/.test('cats and dogs') // true: the string contains "cat"
// If the regular expression has the g modifier, each call to test() starts matching from the position where the previous match ended.
var r = /x/g;
var s = '_x_x';

r.lastIndex // 0
r.test(s) // true

r.lastIndex // 2
r.test(s) // true

r.lastIndex // 4
r.test(s) // false

The regular expression above uses the g modifier, so it performs a global search that can produce multiple results. The test method is then called three times, and each search starts from the position right after the previous match, which lastIndex records. After the third call fails, lastIndex is reset to 0.
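
Because lastIndex is stored on the regular expression object, reusing one global regex on different strings can give surprising results. The sketch below illustrates this with made-up inputs; resetting lastIndex by hand is one possible workaround, not the only one.

```javascript
var shared = /x/g;

// The match ends at index 2, so lastIndex becomes 2.
var first = shared.test('_x');  // true

// The next search starts at index 2 of the NEW string, past its only 'x'.
var second = shared.test('x_'); // false, and lastIndex is reset to 0

// Resetting lastIndex before each independent check avoids the problem.
shared.lastIndex = 0;
var third = shared.test('x_');  // true

console.log(first, second, third);
```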

2. RegExp.prototype.exec()

Returns the match result. If a match is found, it returns an array whose first member is the matched substring, followed by any captured groups; otherwise it returns null.

var s = '_x_x';
var r1 = /x/;
var r2 = /y/;

r1.exec(s) // ["x"]
r2.exec(s) // null
// If the regular expression has the g modifier, exec() can be called repeatedly, and each search starts from the position where the previous successful match ended.
var reg = /a/g;
var str = 'abc_abc_abc'

var r1 = reg.exec(str);
r1 // ["a"]
r1.index // 0
reg.lastIndex // 1

var r2 = reg.exec(str);
r2 // ["a"]
r2.index // 4
reg.lastIndex // 5

var r3 = reg.exec(str);
r3 // ["a"]
r3.index // 8
reg.lastIndex // 9

var r4 = reg.exec(str);
r4 // null
reg.lastIndex // 0
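
The repeated calls above are usually written as a loop that stops when exec() returns null. A minimal sketch, with variable names chosen for illustration:

```javascript
var pattern = /a/g;
var input = 'abc_abc_abc';
var positions = [];
var m;

// exec() returns null once there are no more matches, which ends the loop
// and also resets pattern.lastIndex to 0.
while ((m = pattern.exec(input)) !== null) {
  positions.push(m.index);
}

console.log(positions); // [ 0, 4, 8 ]
```

Note that the loop relies on the g modifier; without it, exec() would keep returning the first match and the loop would never end.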

String instance methods

1. String.prototype.match()

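Similar to the **exec()** method. Without the g modifier, match() returns the same result as exec(): an array holding the first match and its captured groups. With the g modifier, match() returns an array of all matched substrings, without group details. Both return null if the match fails.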

//default mode
console.log(/x/.exec("_x_xy"));//['x', index: 1, input: '_x_xy', groups: undefined]
console.log('_x_xy'.match(/x/));//['x', index: 1, input: '_x_xy', groups: undefined]
// global pattern
console.log(/x/g.exec("_x_xy"));//['x', index: 1, input: '_x_xy', groups: undefined]
console.log('_x_xy'.match(/x/g));//['x', 'x']

var str="12[ab]";
// In non-global mode, the first element of the returned array is the whole match, and the following elements are the captured groups. In global mode, only the whole matches are returned.
console.log(str.match(/(\d+)\[(\w+)\]/));//['12[ab]', '12', 'ab', index: 0, input: '12[ab]', groups: undefined]
console.log(str.match(/(\d+)\[(\w+)\]/g));//['12[ab]'];

2. String.prototype.search()

The search method of the string object returns the position of the first matching result that meets the condition in the entire string. Returns -1 if nothing matches.

'_x_x'.search(/x/);// 1

3. String.prototype.replace()

The replace method of a string object replaces matched values. It accepts two parameters: the first is the search pattern, either a string or a regular expression, and the second is the replacement.
If the pattern is a string, only the first occurrence is replaced. If it is a regular expression without the g modifier, only the first match is replaced; with the g modifier, all matches are replaced.

console.log('aaa'.replace('a', 'b'));// 'baa': a string pattern replaces only the first occurrence
console.log('aaa'.replace(/a/, 'b'));// 'baa': without g, only the first match is replaced
console.log('aaa'.replace(/a/g, 'b'));// 'bbb': with g, all matches are replaced

var str="3[ab]2[cd]"; 
console.log(str.match(/(\d+)\[(\w+)\]/g));//['3[ab]','2[cd]'];
// The callback receives the whole match, followed by each captured group (as strings)
str=str.replace(/(\d+)\[(\w+)\]/g,function(t,$1,$2){
   return  $2.repeat(Number($1));
});
console.log(str);//"abababcdcd"
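
When no logic is needed, the replacement string itself can refer to the match: $1 and $2 stand for the captured groups and $& for the whole match. A short sketch with made-up inputs:

```javascript
// Swap two words using the captured groups.
var swapped = 'hello world'.replace(/(\w+) (\w+)/, '$2 $1'); // 'world hello'

// Wrap every run of digits in brackets using the whole match.
var wrapped = 'a1b22'.replace(/\d+/g, '[$&]');               // 'a[1]b[22]'

console.log(swapped, wrapped);
```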

4. String.prototype.split()

Splits the string according to a regular expression and returns an array of the resulting parts.
This method accepts two parameters: the first is the separator, a string or a regular expression, and the second is the maximum number of members in the returned array.

// Plain string separator
'a,  b,c, d'.split(',')
// [ 'a', '  b', 'c', ' d' ]

// Regular expression separator, which also removes the extra spaces
'a,  b,c, d'.split(/, */)
// [ 'a', 'b', 'c', 'd' ]

// Specify the maximum number of members in the returned array
'a,  b,c, d'.split(/, */, 2)
// [ 'a', 'b' ]

// Split on either of two separator characters
var str="abcdef";
console.log(str.split(/b|d/));//['a', 'c', 'ef']

Predefined patterns

  • \d matches any digit from 0 to 9, equivalent to [0-9].
  • \D matches any character other than 0-9, equivalent to [^0-9].
  • \w matches any letter, digit, or underscore, equivalent to [A-Za-z0-9_].
  • \W matches any character other than letters, digits, and underscores, equivalent to [^A-Za-z0-9_].
  • \s matches whitespace (spaces, tabs, newlines, etc.), roughly equivalent to [ \t\r\n\v\f]; it also covers other Unicode whitespace such as the non-breaking space.
  • \S matches non-whitespace characters, roughly equivalent to [^ \t\r\n\v\f].
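
The predefined patterns above can be tried on a small sample. The strings here are made up for illustration:

```javascript
var sample = 'Order 42: 3 apples_ok!';

// \d+ picks out runs of digits.
var digits = sample.match(/\d+/g); // ['42', '3']

// \w+ picks out runs of letters, digits, and underscores.
var words = sample.match(/\w+/g);  // ['Order', '42', '3', 'apples_ok']

// \s+ matches any run of whitespace, so it can collapse mixed spacing.
var collapsed = 'a \t b\n c'.replace(/\s+/g, ' '); // 'a b c'

console.log(digits, words, collapsed);
```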

Character class

A character class means that a series of characters is available to choose from, and matching any one of them is enough. The candidate characters are placed in square brackets: [xyz] matches any one of x, y, and z.

/[abc]/.test('hello world') // false
/[abc]/.test('apple') // true

1. Caret (^)

If the first character inside the square brackets is ^, the class matches any character except those listed. For example, [^xyz] matches anything other than x, y, and z.

/[^abc]/.test('bbc news'); // true
/[^abc]/.test('bbc'); // false

If there are no other characters in the square brackets, that is, only [^], it means to match all characters, including newline characters. In contrast, a dot as a metacharacter (.) does not include line breaks.

Note that the caret has special meaning only in the first position of a character class, otherwise it is literal.

2. Hyphen (-)

For contiguous sequences of characters, the hyphen (-) is used to provide a shorthand form, indicating a contiguous range of characters. For example, [abc] can be written as [a-c], and [0123456789] can be written as [0-9]. Similarly, [A-Z] represents 26 capital letters.
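
The rules for ranges and the caret can be checked with a few short tests. This is a sketch with made-up inputs; the name isHex is only illustrative:

```javascript
// Ranges can be combined inside one class: digits plus hexadecimal letters.
var isHex = /^[0-9a-fA-F]+$/;
var hexOk = isHex.test('1aF');  // true
var hexBad = isHex.test('xyz'); // false

// A caret that is not in the first position is an ordinary character.
var literalCaret = /[a^]/.test('^');   // true

// [^] matches any character, including a newline; the dot does not.
var anyChar = /[^]/.test('\n');        // true
var dotOnNewline = /./.test('\n');     // false

console.log(hexOk, hexBad, literalCaret, anyChar, dotOnNewline);
```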

Metacharacters

1. Dot character (.)

The dot character (.) matches all characters except carriage return (\r), newline (\n), line separator (\u2028), and paragraph separator (\u2029).

2. Positional characters

  1. ^: Indicates the starting position of the string
  2. $: Indicates the position at the end of the string
// test must appear at the beginning
/^test/.test('test123') // true

// test must appear at the end position
/test$/.test('new test') // true

// The whole string, from start to end, must be exactly test
/^test$/.test('test') // true
/^test$/.test('test test') // false

3. Alternation (|)

The vertical bar (|) expresses an "or" relationship in a regular expression: the pattern matches if either side matches.

/11|22/.test('911') // true

Repeating Classes and Quantifiers

1. Repetition count

Braces {} specify exactly how many times a pattern repeats. {n} means exactly n times, {n,} means at least n times, and {n,m} means at least n and at most m times.

2. Quantifier character

  • ? A question mark indicates that a pattern occurs 0 or 1 time, which is equivalent to {0,1}.
  • * An asterisk means that a pattern occurs 0 or more times, which is equivalent to {0,}.
  • + The plus sign indicates that a certain pattern occurs 1 or more times, which is equivalent to {1,}.
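
The braces and the three quantifier characters can be compared side by side. The patterns below are made-up examples, anchored with ^ and $ so that the whole string has to fit:

```javascript
var fourDigits = /^\d{4}$/;   // exactly 4 digits
var atLeastTwo = /^\d{2,}$/;  // 2 or more digits
var twoToThree = /^\d{2,3}$/; // 2 or 3 digits

console.log(fourDigits.test('2022'));  // true
console.log(fourDigits.test('202'));   // false
console.log(atLeastTwo.test('7'));     // false
console.log(atLeastTwo.test('12345')); // true
console.log(twoToThree.test('1234'));  // false

var optionalU = /^colou?r$/; // ? : the u may appear 0 or 1 time
var anyO = /^go*gle$/;       // * : the o may appear 0 or more times
var someO = /^go+gle$/;      // + : the o must appear at least once

console.log(optionalU.test('color'), optionalU.test('colour')); // true true
console.log(anyO.test('ggle'));                                 // true
console.log(someO.test('ggle'), someO.test('google'));          // false true
```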

Greedy and non-greedy matching

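By default the quantifiers in /a+/, /a*/, and /a?/ are greedy: they match as many characters as possible, so /a+/ keeps matching until a no longer appears.
Adding a ? after a quantifier makes it non-greedy, so it matches as few characters as possible: /a+?/, /a*?/, /a??/.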

  • +?: Indicates that a pattern appears one or more times, and a non-greedy pattern is used for matching.
  • *?: Indicates that a pattern appears 0 or more times, and a non-greedy pattern is used for matching.
  • ??: Indicates that a pattern appears 0 or 1 time, and a non-greedy pattern is used for matching.
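
The difference shows up most clearly when a pattern could stop early or run on. A small sketch, with a made-up HTML-like string:

```javascript
var html = '<b>one</b><b>two</b>';

// Greedy: .+ runs as far as it can, so the match ends at the last </b>.
var greedy = html.match(/<b>.+<\/b>/)[0]; // '<b>one</b><b>two</b>'
// Non-greedy: .+? stops at the first </b> it can reach.
var lazy = html.match(/<b>.+?<\/b>/)[0];  // '<b>one</b>'

var plusGreedy = 'aaa'.match(/a+/)[0];  // 'aaa'
var plusLazy = 'aaa'.match(/a+?/)[0];   // 'a'  (one is the minimum for +)
var starLazy = 'aaa'.match(/a*?/)[0];   // ''   (zero is the minimum for *)
var optLazy = 'aaa'.match(/a??/)[0];    // ''   (zero is the minimum for ?)

console.log(greedy, lazy, plusGreedy, plusLazy, starLazy, optLazy);
```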

Group

Parentheses in a regular expression form a group. A quantifier after a group applies to the whole group, and the content matched by each group is captured and returned in the match result.

/fred+/.test('fredd') // true: + applies only to the letter d
/(fred)+/.test('fredfred') // true: + applies to the whole word fred
// Capturing groups
'abcabc'.match(/(.)b(.)/);
// ['abc', 'a', 'c']
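
Captured groups are read from the result array by index, starting at 1. The sketch below uses a made-up date string to show this, and a nested pattern to show how groups are numbered:

```javascript
var dateMatch = '2022-12-13'.match(/(\d{4})-(\d{2})-(\d{2})/);
var year = dateMatch[1];  // '2022'
var month = dateMatch[2]; // '12'
var day = dateMatch[3];   // '13'

// Groups are numbered by the position of their opening parenthesis,
// so the outer group comes before the groups nested inside it.
var nested = 'ab'.match(/((a)(b))/); // ['ab', 'ab', 'a', 'b']

console.log(year, month, day, nested);
```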

1. Assertions (lookaround), which match according to a condition

1. Positive lookahead: (?=n)
x(?=y) matches x only when it is immediately followed by y; the y in parentheses is not part of the returned match.
2. Negative lookahead: (?!n)
x(?!y) matches x only when it is not immediately followed by y.
3. Positive lookbehind: (?<=n)
(?<=y)x matches x only when it is immediately preceded by y.
4. Negative lookbehind: (?<!n)
(?<!y)x matches x only when it is not immediately preceded by y.
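
The four assertions can be compared using search(), which reports where the match starts. This is a sketch with made-up strings; the lookbehind forms assume an engine with ES2018 support, such as a current Node.js or browser.

```javascript
// Lookahead: the condition is about what FOLLOWS x.
var lookahead = 'xy xz'.search(/x(?=y)/);      // 0: the x followed by y
var negLookahead = 'xy xz'.search(/x(?!y)/);   // 3: the x not followed by y

// Lookbehind: the condition is about what PRECEDES x.
var lookbehind = 'yx zx'.search(/(?<=y)x/);    // 1: the x preceded by y
var negLookbehind = 'yx zx'.search(/(?<!y)x/); // 4: the x not preceded by y

// The asserted text is not included in the returned match.
var percent = 'up 15%'.match(/\d+(?=%)/)[0];      // '15'
var dollars = 'cost: $42'.match(/(?<=\$)\d+/)[0]; // '42'

console.log(lookahead, negLookahead, lookbehind, negLookbehind, percent, dollars);
```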

Tags: Javascript regex

Posted by celsoendo on Tue, 13 Dec 2022 03:48:31 +0530