JavaScript RegExp Basics | Developer.com
In JavaScript, a RegExp is an object that describes a pattern: a sequence of characters that defines what to match within another string. RegExp (or simply RegEx) is short for regular expression, and the concept is nothing new. In fact, it dates all the way back to 1951, when mathematician Stephen Cole Kleene described regular languages using a mathematical notation he called regular events.
Most programming languages implement their own version of RegExp. JavaScript’s programming interface for them is clumsy, but they are a powerful tool for inspecting and processing strings. Properly understanding regular expressions will definitely make you a more effective programmer, at least when it comes to working with strings. This web development tutorial will provide some RegExp basics for testing and matching strings in JavaScript.
How to Create a RegExp in JavaScript
Being a type of object, a RegExp can be either constructed with the RegExp constructor or written as a literal value by enclosing a pattern in forward slash (/) characters, as shown in the following code example:
let regex1 = new RegExp("xyz");
let regex2 = /xyz/;
Both of those regular expression objects represent the same pattern: an 'x' character followed by a 'y' followed by a 'z'.
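The two forms differ in how backslashes are written, which matters as soon as a pattern contains escape sequences. The following is a minimal sketch of that difference; the sample strings and the `word` variable are hypothetical values chosen for illustration:

```javascript
// The same pattern written as a literal and as a constructor call.
const literal = /\d+\.\d+/;
// Inside a string, each backslash must itself be escaped.
const constructed = new RegExp("\\d+\\.\\d+");

console.log(literal.source === constructed.source); // true
console.log(constructed.test("version 3.14"));      // true

// The constructor is useful when part of the pattern is only known at run time.
// `word` is a hypothetical value used for illustration.
const word = "rain";
const dynamic = new RegExp(word + "s?");
console.log(dynamic.test("It rains a lot")); // true
```

As a rule of thumb, the literal form is easier to read for fixed patterns, while the constructor is the only option when the pattern is assembled from variables.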
Using Modifiers in JavaScript RegExp
Programmers can specify one or more flags to change the default match behavior of the RegExp object:
- g: Performs a global match, finding all matches rather than just the first.
- i: Makes matches case-insensitive, so uppercase and lowercase letters match each other.
- m: Performs multiline matching: ^ and $ match at the start and end of each line, not only of the whole string.
- s: Allows '.' to match newline characters.
- u: Enables full Unicode matching, treating the pattern as a sequence of Unicode code points.
- y: Makes the match sticky: it is attempted only at the position given by the RegExp's lastIndex property.
Here is a code example showing how to make the above RegExp objects perform a global, case-insensitive, multiline search:
let regex1 = new RegExp("xyz", "gim");
let regex2 = /xyz/gim;
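To make the effect of each flag more concrete, here is a short sketch that compares the same pattern with and without a flag. The two-line sample string is an assumption made up for this example:

```javascript
// Hypothetical two-line sample text.
const text = "First line\nsecond line";

// i: ignore case
console.log(/FIRST/.test(text));  // false
console.log(/FIRST/i.test(text)); // true

// m: ^ and $ also match at line breaks
console.log(/^second/.test(text));  // false
console.log(/^second/m.test(text)); // true

// s: '.' also matches the newline character
console.log(/line.second/.test(text));  // false
console.log(/line.second/s.test(text)); // true

// g: find every match instead of only the first
console.log(text.match(/line/g).length); // 2

// y: only try to match at lastIndex
const sticky = /line/y;
console.log(sticky.test(text)); // false, "line" does not start at index 0
sticky.lastIndex = 6;
console.log(sticky.test(text)); // true, "line" starts at index 6

// The flags property reports which flags are set.
console.log(/xyz/gim.flags); // "gim"
```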
Finding Matches within a String in JavaScript
Regular expression objects have a number of methods. The simplest one is test. If you pass it a string, it will return a boolean telling you whether the string contains a match of the pattern in the expression.
console.log(/xyz/.test("vwxyz")); // true
console.log(/xyz/.test("yyz")); // false
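One detail worth knowing about test: when the expression has the g (or y) flag, the RegExp object remembers where the last match ended in its lastIndex property, so repeated calls on the same object can give different answers. A small sketch, with variable names chosen only for illustration:

```javascript
// Without the g flag, test() always starts at the beginning of the string.
const plain = /xyz/;
console.log(plain.test("vwxyz")); // true
console.log(plain.test("vwxyz")); // true

// With the g flag, each call resumes from lastIndex.
const globalRe = /xyz/g;
console.log(globalRe.test("vwxyz")); // true
console.log(globalRe.lastIndex);     // 5
console.log(globalRe.test("vwxyz")); // false, the search resumed at index 5
console.log(globalRe.lastIndex);     // 0, reset after the failed match
```

For a simple yes-or-no check, leaving off the g flag avoids this statefulness.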
JavaScript strings also have a match() method, which matches the string against a regular expression; if a string is passed instead, it is first converted to a RegExp. The match() method returns an array with the matches, or null if no match is found. Here are a few code examples:
let text = "The rain in SPAIN stays mainly in the plain";
// A search for "ain" using a string:
text.match("ain"); // ain
// A search for "ain" using a regular expression:
text.match(/ain/); // ain
// A global search for "ain":
text.match(/ain/g); // ain,ain,ain
// A global, case-insensitive search:
text.match(/ain/gi); // ain,AIN,ain,ain
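The shape of the array returned by match() depends on the g flag, and the null result needs guarding before use. The sketch below illustrates both points; the `|| []` fallback is one common way to handle null, not the only one:

```javascript
const text = "The rain in SPAIN stays mainly in the plain";

// Without g: the first match, plus details such as its position.
const first = text.match(/ain/);
console.log(first[0]);    // "ain"
console.log(first.index); // 5

// With g: a plain array of every matched substring.
const all = text.match(/ain/g);
console.log(all.length); // 3

// No match: the result is null, not an empty array.
const none = text.match(/xyz/);
console.log(none); // null

// Guard against null before reading length.
const count = (text.match(/xyz/g) || []).length;
console.log(count); // 0
```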
Brackets in JavaScript RegExp
Brackets define a character class, which matches any single character from a set or range:
- [abc]: Matches any one of the characters listed between the brackets
- [^abc]: Matches any one character NOT listed between the brackets
- [0-9]: Matches any one character in the range 0 to 9 (any digit)
- [^0-9]: Matches any one character outside the range 0 to 9 (any non-digit)
We can see a few examples of how to use brackets in JavaScript regular expressions below:
let text = "The rain in SPAIN stays mainly in the plain";
/[abc]/.test(text); // true
/r[abc][eiu]n/.test(text); // true
/[xyz]s/.test(text); // true
/[0-9]/.test(text); // false
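Character classes become more useful when combined with the g flag and match(), since each match is exactly one character from the class. A brief sketch using the same sample sentence:

```javascript
const text = "The rain in SPAIN stays mainly in the plain";

// A range: collect every uppercase letter.
console.log(text.match(/[A-Z]/g).join("")); // "TSPAIN"

// A negated class: everything that is neither a lowercase letter nor a space.
console.log(text.match(/[^a-z ]/g).join("")); // "TSPAIN"

// A class followed by a literal: any lowercase vowel followed by "n".
console.log(text.match(/[aeiou]n/g).length); // 5

// Inside brackets, '.' loses its special meaning and matches a literal dot.
console.log(/[.]/.test("3.14")); // true
console.log(/[.]/.test("314"));  // false
```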
Metacharacters in JavaScript RegExp
Metacharacters are characters that specify a given type of character to match:
- . : Matches any character except line terminators. When the s flag is set, it also matches line terminators.
- \d : Matches any digit (Arabic numeral).
- \D : Matches any character that is not a digit (Arabic numeral).
- \w : Matches any alphanumeric character from the basic Latin alphabet, including underscore.
- \W : Matches any character that is not an alphanumeric character from the basic Latin alphabet or an underscore.
- \s : Matches any whitespace character (space, tab, newline, non-breaking space, and similar).
- \S : Matches any character that isn’t a whitespace character.
- \t : Matches a horizontal tab.
- \r : Matches a carriage return.
- \n : Matches a linefeed.
- \v : Matches a vertical tab.
- \f : Matches a form-feed.
- [\b] : Matches a backspace.
- \0 : Matches a NUL character (when not followed by another digit).
- \xnn : Matches the character code nn (two hexadecimal digits).
- \unnnn : Matches a UTF-16 code unit with the value nnnn (four hexadecimal digits).
- \ : Followed by a special character, means that the character should be matched literally.
The following two JavaScript RegExes match individual word characters and whitespace characters:
let text = "The rain in SPAIN stays mainly in the plain";
let result = text.match(/\w/gi); // T,h,e,r,a,i,n,i,n,S,P,A,I,N,s,t,a,y,s,m,a,i,n,l,y,i,n,t,h,e,p,l,a,i,n
result = text.match(/\s/gi); // eight matches, each a single space
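A few of the other metacharacters from the list can be illustrated the same way. In this sketch, the `entry` string is a made-up log line used only for demonstration:

```javascript
// Hypothetical sample string containing digits, a tab, and a literal dot.
const entry = "Order #42 shipped on 2024-01-15\tvia air.";

// \d: collect every digit.
console.log(entry.match(/\d/g).join("")); // "4220240115"

// \t: check for a horizontal tab.
console.log(/\t/.test(entry)); // true

// \s: split on any single whitespace character (spaces and the tab).
console.log(entry.split(/\s/).length); // 7

// Escaping: \. matches a literal dot, while an unescaped . matches any character.
console.log(/air\./.test("airy")); // false
console.log(/air./.test("airy"));  // true

// \xnn and \unnnn: match a character by its code.
console.log(/\x41/.test("A"));   // true
console.log(/\u0041/.test("A")); // true
```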
Quantifiers in JavaScript RegExp
Quantifiers specify how many times the preceding character, class, or group must occur. The list also includes anchors and lookahead assertions, which restrict where a match may occur rather than how often. Here, n stands for any pattern element:
- n+ : Matches one or more occurrences of n
- n* : Matches zero or more occurrences of n
- n? : Matches zero or one occurrence of n
- n{X} : Matches exactly X occurrences of n
- n{X,Y} : Matches between X and Y occurrences of n
- n{X,} : Matches at least X occurrences of n
- n$ : Matches n only at the end of the string (or of a line, with the m flag)
- ^n : Matches n only at the beginning of the string (or of a line, with the m flag)
- x(?=n) : Matches x only if it is followed by n; n itself is not part of the match
- x(?!n) : Matches x only if it is not followed by n
Here are some text matches using JavaScript RegExp quantifiers:
let text = "The rain in SPAIN stays mainly in the plain";
// Find all words of 5 characters or more
let result = text.match(/\w{5,}/gi); // SPAIN,stays,mainly,plain
// Match words that contain the letter i
result = text.match(/\w*i+\w*/gi); // rain,in,SPAIN,mainly,in,plain
// Match words that contain the letter i in the middle
result = text.match(/\w+i+\w+/gi); // rain,SPAIN,mainly,plain
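The remaining entries in the list (the optional and repetition quantifiers, the anchors, and the lookaheads) can be sketched in the same style. The short strings such as "gooogle" and "colour" are hypothetical inputs chosen for illustration:

```javascript
const text = "The rain in SPAIN stays mainly in the plain";

// + requires at least one occurrence, * allows none, ? allows at most one.
console.log(/go+gle/.test("gooogle")); // true
console.log(/go+gle/.test("ggle"));    // false
console.log(/go*gle/.test("ggle"));    // true
console.log(/colou?r/.test("color") && /colou?r/.test("colour")); // true

// {X}: exactly six word characters in a row.
console.log(text.match(/\w{6}/g)); // [ 'mainly' ]

// Anchors: ^ ties the match to the start, $ to the end.
console.log(/^The/.test(text));   // true
console.log(/^rain/.test(text));  // false
console.log(/plain$/.test(text)); // true

// Positive lookahead: the word that is followed by " stays".
console.log(text.match(/\w+(?= stays)/)[0]); // "SPAIN"

// Negative lookahead: "ain" not followed by "ly" (skips "mainly").
console.log(text.match(/ain(?!ly)/g).length); // 2
```

Note that the lookahead text is checked but not consumed, which is why " stays" does not appear in the matched result.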
Final Thoughts on JavaScript RegExp
This web development tutorial provided some RegExp basics for testing and matching strings in JavaScript. There is a lot more to RegExp than what we covered here, including subgroups, replace operations, compiling RegExes, etc. We will cover those topics in future articles.