Chapter 9: JavaScript Regular Expressions - Complete Guide to Pattern Matching
Regular Expressions (RegEx) are pattern-matching tools that let you search, validate, and manipulate text precisely. They are widely used for form validation, text processing, data extraction, and many other programming tasks.
Why Regular Expressions Matter in JavaScript
Regular Expressions in JavaScript are essential because they:
- Enable Text Validation: Validate email addresses, phone numbers, passwords, and other formats
- Support Data Extraction: Parse and extract specific information from text
- Facilitate Text Processing: Search, replace, and transform text efficiently
- Improve User Experience: Provide real-time input validation and formatting
- Handle Complex Patterns: Match sophisticated text patterns that would be difficult with simple string methods
- Support Internationalization: With the u flag and Unicode property escapes such as \p{L}, match letters and symbols from many languages and scripts
Learning Objectives
Through this chapter, you will master:
- Basic regex syntax and patterns
- Character classes and quantifiers
- Anchors and boundaries
- Groups and capturing
- Lookahead and lookbehind assertions
- String methods that work with regex
- Practical validation and text processing examples
- Performance considerations and best practices
Basic Regex Syntax
Creating Regular Expressions
// Literal notation (most common)
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// Constructor notation
const phoneRegex = new RegExp('^\\d{3}-\\d{3}-\\d{4}$');
// With flags
const globalRegex = /pattern/g; // Global flag
const caseInsensitiveRegex = /pattern/i; // Case insensitive
const multilineRegex = /pattern/m; // Multiline
const combinedFlags = /pattern/gim; // Multiple flags
// Dynamic regex creation
function createRegex(pattern, flags = '') {
return new RegExp(pattern, flags);
}
const dynamicRegex = createRegex('\\d+', 'g');
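When a pattern is built from user input with the RegExp constructor, characters such as `.`, `(` and `$` keep their special meaning unless they are escaped. The sketch below uses a hypothetical `escapeRegExp` helper (a widely used recipe, not a built-in) and also shows the `source` and `flags` properties that every RegExp object exposes.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter so the
// text is matched literally when passed to the RegExp constructor.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& = the matched character
}

const userInput = 'price (USD): $5.00';
const literalRegex = new RegExp(escapeRegExp(userInput), 'i');

console.log(literalRegex.source); // price \(USD\): \$5\.00
console.log(literalRegex.flags); // i
console.log(literalRegex.test('Total PRICE (USD): $5.00 today')); // true
console.log(literalRegex.test('price USD: 5500')); // false
```

Without the escaping step, `(USD)` would be a capturing group and `.` would match any character, so the second test could succeed by accident.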
Basic Pattern Matching
// Simple text matching
const text = 'Hello, World!';
const helloRegex = /Hello/;
console.log(helloRegex.test(text)); // true
console.log(helloRegex.exec(text)); // ['Hello', index: 0, input: 'Hello, World!', groups: undefined]
// Case insensitive matching
const caseInsensitiveRegex = /hello/i;
console.log(caseInsensitiveRegex.test(text)); // true
// Global matching
const globalRegex = /l/g;
const matches = text.match(globalRegex);
console.log(matches); // ['l', 'l', 'l']
// Replace with regex
const replaced = text.replace(/World/, 'JavaScript');
console.log(replaced); // 'Hello, JavaScript!'
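One behavior worth knowing before using the g flag with `test()` or `exec()`: a global (or sticky, y) regex is stateful. It stores where the last match ended in `lastIndex` and resumes from there, so calling `test()` twice on the same string can return different answers. A minimal demonstration:

```javascript
// With g, test() records the end of the match in digitRegex.lastIndex
const digitRegex = /\d/g;

const first = digitRegex.test('a1'); // true, lastIndex is now 2
const second = digitRegex.test('a1'); // false: the search starts at index 2
console.log(first, second, digitRegex.lastIndex); // true false 0 (reset after the failure)

// Without g there is no such state, so repeated calls agree
const statelessRegex = /\d/;
console.log(statelessRegex.test('a1'), statelessRegex.test('a1')); // true true
```

When you only need a yes/no answer, drop the g flag, or set `lastIndex = 0` before reusing a global regex.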
Character Classes and Quantifiers
Character Classes
// Basic character classes
const text = 'abc123XYZ!@#';
// Word characters (letters, digits, underscore)
const wordChars = /\w/g;
console.log(text.match(wordChars)); // ['a', 'b', 'c', '1', '2', '3', 'X', 'Y', 'Z']
// Non-word characters
const nonWordChars = /\W/g;
console.log(text.match(nonWordChars)); // ['!', '@', '#']
// Digits
const digits = /\d/g;
console.log(text.match(digits)); // ['1', '2', '3']
// Non-digits
const nonDigits = /\D/g;
console.log(text.match(nonDigits)); // ['a', 'b', 'c', 'X', 'Y', 'Z', '!', '@', '#']
// Whitespace
const whitespace = /\s/g;
const textWithSpaces = 'hello world\n\t';
console.log(textWithSpaces.match(whitespace)); // [' ', '\n', '\t']
// Custom character classes
const vowels = /[aeiou]/gi;
const consonants = /[bcdfghjklmnpqrstvwxyz]/gi;
const hexDigits = /[0-9a-fA-F]/g;
const specialChars = /[!@#$%^&*()]/g;
// Negated character classes
const nonVowels = /[^aeiou]/gi;
const notDigits = /[^0-9]/g; // equivalent to \D (a second "const nonDigits" would be a SyntaxError)
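The classes above (`\w`, `\d`, `[a-z]`) only cover ASCII characters. For text in other languages, the u flag enables Unicode property escapes such as `\p{L}` (any letter). The sample string below is illustrative and is written with escapes so each accented letter is a single code point.

```javascript
// 'Café naïve' written with escapes (é = \u00e9, ï = \u00ef)
const greeting = 'Caf\u00e9 na\u00efve';

// \w covers only [A-Za-z0-9_], so accented letters break words apart
console.log(greeting.match(/\w+/g)); // ['Caf', 'na', 've']

// With the u flag, \p{L} matches a letter from any script
console.log(greeting.match(/\p{L}+/gu)); // ['Café', 'naïve']
```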
Quantifiers
// Basic quantifiers
const text = 'aaabbbcccddd';
// Zero or more (*)
const zeroOrMore = /a*/g;
console.log(text.match(zeroOrMore)); // ['aaa', '', ..., ''] ('aaa' followed by 10 empty matches, at positions 3 through 12)
// One or more (+)
const oneOrMore = /a+/g;
console.log(text.match(oneOrMore)); // ['aaa']
// Zero or one (?)
const zeroOrOne = /a?/g;
console.log(text.match(zeroOrOne)); // ['a', 'a', 'a', '', ..., ''] (three 'a' matches followed by 10 empty matches)
// Exact count {n}
const exactCount = /a{3}/g;
console.log(text.match(exactCount)); // ['aaa']
// Range {n,m}
const range = /a{2,4}/g;
console.log(text.match(range)); // ['aaa']
// At least n {n,}
const atLeast = /a{2,}/g;
console.log(text.match(atLeast)); // ['aaa']
// Practical examples
const phoneNumber = /^\d{3}-\d{3}-\d{4}$/;
const zipCode = /^\d{5}(-\d{4})?$/;
const creditCard = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;
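All of the quantifiers above are greedy: they match as much as they can and give characters back only if the rest of the pattern fails. Adding `?` after a quantifier (`*?`, `+?`, `??`, `{n,m}?`) makes it lazy, so it matches as little as possible. The sample HTML string is illustrative.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .* runs to the end, then backtracks to the LAST '>'
console.log(html.match(/<.*>/)[0]); // '<b>bold</b> and <i>italic</i>'

// Lazy: .*? stops at the FIRST '>' that lets the match succeed
console.log(html.match(/<.*?>/)[0]); // '<b>'

// Lazy with the g flag lists every tag
console.log(html.match(/<.*?>/g)); // ['<b>', '</b>', '<i>', '</i>']
```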
Anchors and Boundaries
Anchors
// String anchors
const text = 'Start middle end';
// Start of string (^)
const startAnchor = /^Start/;
console.log(startAnchor.test(text)); // true
// End of string ($)
const endAnchor = /end$/;
console.log(endAnchor.test(text)); // true
// Both anchors
const exactMatch = /^Start middle end$/;
console.log(exactMatch.test(text)); // true
// Word boundaries (\b)
const wordBoundary = /\bword\b/g;
const textWithWord = 'word wordy sword words';
console.log(textWithWord.match(wordBoundary)); // ['word'] (only the standalone word)
// Non-word boundaries (\B)
const nonWordBoundary = /\Bword/g; // 'word' preceded by another word character
console.log(textWithWord.match(nonWordBoundary)); // ['word'] (from 'sword')
Multiline and Global Flags
const multilineText = `Line 1
Line 2
Line 3`;
// Without multiline flag
const withoutM = /^Line/g;
console.log(multilineText.match(withoutM)); // ['Line'] (only first line)
// With multiline flag
const withM = /^Line/gm;
console.log(multilineText.match(withM)); // ['Line', 'Line', 'Line']
// End of line matching
const endOfLine = /Line \d$/gm; // each line ends with a digit, so /Line$/gm would match nothing
console.log(multilineText.match(endOfLine)); // ['Line 1', 'Line 2', 'Line 3']
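The m flag changes only what `^` and `$` mean. A related flag, s (dotAll, added in ES2018), changes what `.` matches: by default `.` does not match line terminators such as `\n`. A short comparison:

```javascript
const twoLines = 'first\nsecond';

// By default, . matches any character except line terminators
console.log(/first.second/.test(twoLines)); // false

// With the s flag, . also matches \n
console.log(/first.second/s.test(twoLines)); // true

// m and s are independent: m affects ^ and $, s affects .
console.log(/^second$/m.test(twoLines), /^second$/s.test(twoLines)); // true false
```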
Groups and Capturing
Capturing Groups
// Basic capturing groups
const text = 'John Doe, Jane Smith, Bob Johnson';
// Capture first and last name
const nameRegex = /(\w+)\s+(\w+)/g;
let match;
const names = [];
while ((match = nameRegex.exec(text)) !== null) {
names.push({
fullName: match[0],
firstName: match[1],
lastName: match[2]
});
}
console.log(names);
// [
// { fullName: 'John Doe', firstName: 'John', lastName: 'Doe' },
// { fullName: 'Jane Smith', firstName: 'Jane', lastName: 'Smith' },
// { fullName: 'Bob Johnson', firstName: 'Bob', lastName: 'Johnson' }
// ]
// Named capturing groups (ES2018)
const namedGroupRegex = /(?<firstName>\w+)\s+(?<lastName>\w+)/g;
const namedMatches = [...text.matchAll(namedGroupRegex)];
console.log(namedMatches[0].groups); // { firstName: 'John', lastName: 'Doe' }
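Named groups are also convenient outside of loops: the `groups` object can be destructured directly, and replacement strings can refer to a group with `$<name>`. A small sketch using an illustrative ISO date:

```javascript
const isoDateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Destructure the named groups straight from the match result
const { year, month, day } = '2024-03-15'.match(isoDateRegex).groups;
console.log(year, month, day); // 2024 03 15

// In a replacement string, $<name> inserts a named group
const usDate = '2024-03-15'.replace(isoDateRegex, '$<month>/$<day>/$<year>');
console.log(usDate); // 03/15/2024
```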
Non-Capturing Groups
// Non-capturing groups
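const text = 'color colour';
// Capturing group: (u) stores what it matched in match[1]
const capturing = /colo(u)?r/g;
console.log(text.match(capturing)); // ['color', 'colour']
console.log([...text.matchAll(capturing)].map(m => m[1])); // [undefined, 'u']
// Non-capturing group: (?:u) groups without storing anything
const nonCapturing = /colo(?:u)?r/g;
console.log(text.match(nonCapturing)); // ['color', 'colour']
console.log([...text.matchAll(nonCapturing)].map(m => m.length)); // [1, 1] (full match only)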
// Practical example: optional protocol
const urlRegex = /(?:https?:\/\/)?(?:www\.)?([^\/]+)/;
const urls = [
'https://www.example.com',
'http://example.com',
'www.example.com',
'example.com'
];
urls.forEach(url => {
const match = url.match(urlRegex);
console.log(`${url} -> ${match[1]}`);
});
Backreferences
// Backreferences
const text = 'The the quick brown fox jumps over the lazy dog';
// Find repeated words
const repeatedWords = /\b(\w+)\s+\1\b/gi;
console.log(text.match(repeatedWords)); // ['The the']
// HTML tag matching
const htmlText = '<div>Content</div><span>Text</span>';
const tagRegex = /<(\w+)>(.*?)<\/\1>/g;
let match;
const tags = [];
while ((match = tagRegex.exec(htmlText)) !== null) {
tags.push({
tag: match[1],
content: match[2]
});
}
console.log(tags);
// [
// { tag: 'div', content: 'Content' },
// { tag: 'span', content: 'Text' }
// ]
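Backreferences also have a named form: `\k<name>` refers back to a named group, which reads better than `\1` in longer patterns. The sketch below, with an illustrative sample sentence, pairs each opening quote with the same kind of closing quote.

```javascript
// (?<quote>...) captures the opening quote; \k<quote> requires the same character to close it
const quotedRegex = /(?<quote>['"])(?<body>.*?)\k<quote>/g;
const sample = `He said "hello" and then 'bye'.`;

const bodies = [...sample.matchAll(quotedRegex)].map(match => match.groups.body);
console.log(bodies); // ['hello', 'bye']

// Mismatched quotes do not pair up
console.log(`"oops'`.match(quotedRegex)); // null
```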
Lookahead and Lookbehind Assertions
Lookahead Assertions
// Positive lookahead (?=...)
const text = 'password123 strongpass456 weakpass';
// Find runs of letters immediately followed by a digit
const lettersBeforeDigit = /[a-z]+(?=\d)/g; // \w+ would also swallow digits ('password12')
console.log(text.match(lettersBeforeDigit)); // ['password', 'strongpass']
// Negative lookahead (?!...)
const text2 = 'hello world hello there';
// Find 'hello' not followed by 'world'
const helloNotWorld = /hello(?!\s+world)/g;
console.log(text2.match(helloNotWorld)); // ['hello'] (from 'hello there')
// Password validation with lookahead
const passwordRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
function validatePassword(password) {
return passwordRegex.test(password);
}
console.log(validatePassword('Password123!')); // true
console.log(validatePassword('password')); // false
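Because lookaheads match positions rather than characters, they are also useful for inserting text. A classic example is adding thousands separators; `addThousandsSeparators` is a hypothetical helper that assumes a plain string of digits (no sign or decimal part).

```javascript
// \B(?=(?:\d{3})+(?!\d)) matches a position inside the number that is followed
// by a multiple of three digits and then no further digit; nothing is consumed.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(?:\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('1000')); // '1,000'
console.log(addThousandsSeparators('999')); // '999'
```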
Lookbehind Assertions
// Positive lookbehind (?<=...)
const text = '$100 $200 €150 ¥300';
// Find numbers preceded by dollar sign
const dollarAmounts = /(?<=\$)\d+/g;
console.log(text.match(dollarAmounts)); // ['100', '200']
// Negative lookbehind (?<!...)
const text2 = 'price: $100 cost: €50 total: $200';
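// Find numbers not preceded by a dollar sign. The lookbehind also rejects a
// preceding digit; otherwise /(?<!\$)\d+/g would match the '00' inside '$100'.
const nonDollarAmounts = /(?<![$\d])\d+/g;
console.log(text2.match(nonDollarAmounts)); // ['50']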
// Extract the amount that follows any of the three currency symbols
const currencyRegex = /(?<=\$|€|¥)\d+/g;
const amounts = text.match(currencyRegex);
console.log(amounts); // ['100', '200', '150', '300']
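A lookbehind reports only the number, so the currency symbol is lost. Capturing both parts with named groups keeps the pairing, and it also works in older engines that lack lookbehind support (for example, Safari before 16.4). A sketch on the same sample prices:

```javascript
const prices = '$100 $200 €150 ¥300';

// Capture the symbol and the digits instead of asserting the symbol with a lookbehind
const currencyAmountRegex = /(?<currency>[$€¥])(?<amount>\d+)/g;

const parsed = [...prices.matchAll(currencyAmountRegex)].map(({ groups }) => ({
  currency: groups.currency,
  amount: Number(groups.amount)
}));
console.log(parsed);
// [
//   { currency: '$', amount: 100 },
//   { currency: '$', amount: 200 },
//   { currency: '€', amount: 150 },
//   { currency: '¥', amount: 300 }
// ]
```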
String Methods with Regex
Search and Replace
// String.search()
const text = 'Hello, World!';
const index = text.search(/World/);
console.log(index); // 7
// String.match()
const text2 = 'The quick brown fox jumps over the lazy dog';
const words = text2.match(/\w+/g);
console.log(words); // ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
// String.matchAll() (ES2020)
const text3 = 'John: 25, Jane: 30, Bob: 35';
const ageRegex = /(\w+):\s*(\d+)/g;
const matches = [...text3.matchAll(ageRegex)];
const people = matches.map(match => ({
name: match[1],
age: parseInt(match[2], 10)
}));
console.log(people);
// [
// { name: 'John', age: 25 },
// { name: 'Jane', age: 30 },
// { name: 'Bob', age: 35 }
// ]
// String.replace()
const text4 = 'Hello, World!';
const replaced = text4.replace(/World/, 'JavaScript');
console.log(replaced); // 'Hello, JavaScript!'
// String.replace() with function
const text5 = 'The quick brown fox';
const replaced2 = text5.replace(/\b\w/g, match => match.toUpperCase());
console.log(replaced2); // 'The Quick Brown Fox'
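Two more string methods accept regexes: `split()`, which can handle several separators at once, and `replaceAll()` (ES2021), which requires the g flag when given a regex and throws a TypeError otherwise. The sample strings are illustrative.

```javascript
// split() with a regex: commas or semicolons, with any surrounding whitespace
const listText = 'apple, banana;cherry ,  date';
console.log(listText.split(/\s*[,;]\s*/)); // ['apple', 'banana', 'cherry', 'date']

// replaceAll() accepts a regex only if it has the g flag
console.log('a-b-c'.replaceAll(/-/g, '+')); // 'a+b+c'
try {
  'a-b-c'.replaceAll(/-/, '+');
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```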
Advanced Replace Operations
// Replace with capturing groups
const text = 'John Doe, Jane Smith';
const swapped = text.replace(/(\w+)\s+(\w+)/g, '$2, $1');
console.log(swapped); // 'Doe, John, Smith, Jane'
// Replace with function
const text2 = 'The price is $100 and $200';
const doubled = text2.replace(/\$(\d+)/g, (match, amount) => {
return `$${parseInt(amount, 10) * 2}`;
});
console.log(doubled); // 'The price is $200 and $400'
// Format phone numbers
function formatPhoneNumber(phone) {
return phone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}
console.log(formatPhoneNumber('1234567890')); // '(123) 456-7890'
// Remove extra whitespace
const text3 = ' Hello world ! ';
const cleaned = text3.replace(/\s+/g, ' ').trim();
console.log(cleaned); // 'Hello world !'
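Besides `$1`, `$2`, replacement strings support other special patterns: `$&` inserts the whole match and `$$` inserts a literal dollar sign. A replacer function also receives the offset of each match; with no capturing groups its arguments are (match, offset, wholeString). The inputs below are illustrative.

```javascript
// '$$' becomes '$', '$&' becomes the matched digits
console.log('Cost: 5'.replace(/\d+/, '$$$&.00')); // 'Cost: $5.00'

// Collect where each number starts; returning the match leaves the text unchanged
const offsets = [];
'a1b22c333'.replace(/\d+/g, (match, offset) => {
  offsets.push(offset);
  return match;
});
console.log(offsets); // [1, 3, 6]
```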
Practical Validation Examples
Email Validation
// Comprehensive email validation
const emailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
function validateEmail(email) {
return emailRegex.test(email);
}
// Test cases
const emails = [
'[email protected]',
'[email protected]',
'[email protected]',
'invalid.email',
'@example.com',
'user@',
'[email protected]'
];
emails.forEach(email => {
console.log(`${email}: ${validateEmail(email)}`);
});
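Neither the short pattern from the start of the chapter nor the stricter one above (which closely follows the pattern the HTML standard uses for `<input type="email">`) implements the full email grammar; they simply disagree at the edges. The addresses below are illustrative, chosen to show where.

```javascript
const simpleEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const strictEmail = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

const samples = ['user@example.com', 'user@localhost', 'user@exa_mple.com'];

for (const address of samples) {
  console.log(address, simpleEmail.test(address), strictEmail.test(address));
}
// user@example.com true true
// user@localhost false true   (simple pattern requires a dot after @)
// user@exa_mple.com true false (strict pattern rejects '_' in the domain)
```

Whichever pattern you pick, it only checks the shape of the address, not whether a mailbox exists.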
Phone Number Validation
// Phone number validation for different formats
const phoneRegexes = {
us: /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/,
international: /^\+?[1-9]\d{1,14}$/,
flexible: /^\+?[1-9]\d{0,15}$/
};
function validatePhoneNumber(phone, format = 'us') {
return phoneRegexes[format].test(phone);
}
// Test cases
const phones = [
'123-456-7890',
'(123) 456-7890',
'123.456.7890',
'123 456 7890',
'+1-123-456-7890',
'1234567890'
];
phones.forEach(phone => {
console.log(`${phone}: ${validatePhoneNumber(phone)}`);
});
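The capture groups in the `us` pattern can do more than validate: they can rewrite every accepted format into one canonical form. `normalizeUsPhone` is a hypothetical helper, and the XXX-XXX-XXXX output format is an assumption.

```javascript
const usPhoneRegex = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

// Return the number as XXX-XXX-XXXX, or null if the pattern rejects it
function normalizeUsPhone(phone) {
  const match = phone.match(usPhoneRegex);
  return match ? `${match[1]}-${match[2]}-${match[3]}` : null;
}

console.log(normalizeUsPhone('(123) 456-7890')); // '123-456-7890'
console.log(normalizeUsPhone('123.456.7890')); // '123-456-7890'
console.log(normalizeUsPhone('+1-123-456-7890')); // null
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced input such as `'(123 456-7890'`; checking for balanced parentheses would need an alternation such as `(?:\(\d{3}\)|\d{3})`.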
URL Validation
// URL validation
const urlRegex = /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$/;
function validateURL(url) {
return urlRegex.test(url);
}
// Extract domain from URL
function extractDomain(url) {
const domainRegex = /^https?:\/\/(?:www\.)?([^\/]+)/;
const match = url.match(domainRegex);
return match ? match[1] : null;
}
// Test cases
const urls = [
'https://www.example.com',
'http://example.com/path',
'https://subdomain.example.co.uk',
'invalid-url',
'ftp://example.com'
];
urls.forEach(url => {
console.log(`${url}: ${validateURL(url)} (domain: ${extractDomain(url)})`);
});
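For anything beyond a quick sanity check, the `URL` class built into browsers and Node.js is usually more reliable than a hand-written URL pattern: it parses the URL the same way the platform does and exposes its parts. A minimal sketch; `parseHttpUrl` is a hypothetical helper that keeps only http and https URLs.

```javascript
// new URL() throws a TypeError for input it cannot parse
function parseHttpUrl(text) {
  try {
    const url = new URL(text);
    return url.protocol === 'http:' || url.protocol === 'https:' ? url : null;
  } catch {
    return null;
  }
}

console.log(parseHttpUrl('https://www.example.com/path?q=1')?.hostname); // www.example.com
console.log(parseHttpUrl('invalid-url')); // null
console.log(parseHttpUrl('ftp://example.com')); // null
```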
Text Processing Examples
Data Extraction
// Extract data from text
const text = `
Name: John Doe
Email: [email protected]
Phone: (555) 123-4567
Address: 123 Main St, Anytown, NY 12345
`;
// Extract name
const nameMatch = text.match(/Name:\s*(.+)/);
const name = nameMatch ? nameMatch[1] : null;
// Extract email
const emailMatch = text.match(/Email:\s*([^\s]+)/);
const email = emailMatch ? emailMatch[1] : null;
// Extract phone
const phoneMatch = text.match(/Phone:\s*([^\n]+)/);
const phone = phoneMatch ? phoneMatch[1] : null;
// Extract address
const addressMatch = text.match(/Address:\s*(.+)/);
const address = addressMatch ? addressMatch[1] : null;
console.log({ name, email, phone, address });
// Extract all key-value pairs
const keyValueRegex = /(\w+):\s*(.+)/g;
const data = {};
let match;
while ((match = keyValueRegex.exec(text)) !== null) {
data[match[1].toLowerCase()] = match[2].trim();
}
console.log(data);
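The `exec()` loop above can also be written with `matchAll()`, named groups, and `Object.fromEntries()`. This sketch assumes one `Key: value` pair per line and uses the m flag so `^` and `$` anchor each line; the sample record is illustrative.

```javascript
const record = `
Name: Jane Roe
City: Springfield
Zip: 12345
`;

// (?<value>.*\S) ends on a non-space character, so trailing spaces are dropped
const fieldRegex = /^(?<key>\w+):[ \t]*(?<value>.*\S)[ \t]*$/gm;

const fields = Object.fromEntries(
  [...record.matchAll(fieldRegex)].map(({ groups }) => [groups.key.toLowerCase(), groups.value])
);
console.log(fields); // { name: 'Jane Roe', city: 'Springfield', zip: '12345' }
```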
Text Cleaning and Formatting
// Text cleaning functions
function cleanText(text) {
return text
.replace(/[ \t]+/g, ' ') // Collapse runs of spaces and tabs (newlines are kept; \s+ would remove them)
.replace(/\n\s*\n/g, '\n') // Remove empty lines
.trim(); // Remove leading/trailing whitespace
}
function formatText(text) {
return text
.replace(/\b\w/g, match => match.toUpperCase()) // Capitalize first letter of each word
.replace(/\s+/g, ' ') // Normalize spaces
.trim();
}
function removeSpecialChars(text) {
return text.replace(/[^\w\s]/g, ''); // Keep only word characters (letters, digits, _) and whitespace
}
// Example usage
const messyText = ' hello world !!! \n\n how are you? ';
console.log('Original:', messyText);
console.log('Cleaned:', cleanText(messyText));
console.log('Formatted:', formatText(messyText));
console.log('No special chars:', removeSpecialChars(messyText));
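A related cleaning task is turning a title into a URL slug. A minimal sketch; `slugify` is a hypothetical name, and it assumes ASCII input (accented letters are treated as separators).

```javascript
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ''); // drop hyphens at either end
}

console.log(slugify('  Hello, World! Regex 101  ')); // 'hello-world-regex-101'
```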
Performance Considerations
Optimizing Regex Performance
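// Cache RegExp objects built at runtime so the same pattern string is not turned
// into a new RegExp on every call. Caution: a cached regex with the g or y flag
// shares one lastIndex between all callers.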
const regexCache = new Map();
function getCachedRegex(pattern, flags = '') {
const key = `${pattern}:${flags}`;
if (!regexCache.has(key)) {
regexCache.set(key, new RegExp(pattern, flags));
}
return regexCache.get(key);
}
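// Use specific character classes instead of broad ones
// Good: [^"]* cannot run past a closing quote
const specificRegex = /"[^"]*"/;
// Avoid: .* runs to the end of the line, then backtracks to the last quote
// (on 'say "a" or "b"' it matches '"a" or "b"' instead of '"a"')
const broadRegex = /".*"/;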
// Use non-capturing groups when you don't need the captured content
// Good: non-capturing group
const nonCapturing = /(?:https?:\/\/)?(?:www\.)?([^\/]+)/;
// Avoid: capturing groups when not needed
const capturing = /(https?:\/\/)?(www\.)?([^\/]+)/;
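// Anchor validation patterns
// Good: ^...$ checks the whole string
const anchoredRegex = /^[a-zA-Z0-9]+$/;
// Avoid for validation: without anchors, test() accepts any string that merely
// contains a match (unanchoredRegex.test('abc!') is true), and a failing input
// is retried at every start position
const unanchoredRegex = /[a-zA-Z0-9]+/;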
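The most serious backtracking problems come from nested quantifiers. The sketch below compares two patterns that accept exactly the same strings; on a near-miss input, the nested one tries every way of splitting the run of `a`s between its inner and outer `+`. It assumes a global `performance` object (browsers, Node 16+), and the timings it prints will vary by engine and machine.

```javascript
// Both accept one or more 'a's and nothing else
const flat = /^a+$/;
const nested = /^(a+)+$/; // nested quantifier: avoid

// Each extra 'a' roughly doubles the nested pattern's work, so keep this short
const nearMiss = 'a'.repeat(18) + '!';

for (const [label, regex] of [['flat', flat], ['nested', nested]]) {
  const start = performance.now();
  const matched = regex.test(nearMiss);
  const elapsed = performance.now() - start;
  console.log(`${label}: ${matched} in ${elapsed.toFixed(2)}ms`);
}
```

Both calls return false; only the time spent differs. Rewriting `(a+)+` as `a+` removes the ambiguity without changing what the pattern accepts.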
Benchmarking Regex Performance
// Simple regex performance test
function benchmarkRegex(text, regex, iterations = 10000) {
const start = performance.now();
for (let i = 0; i < iterations; i++) {
regex.test(text);
}
const end = performance.now();
return end - start;
}
// Test different approaches
const text = 'Hello, World! This is a test string.';
const iterations = 100000;
// Simple match
const simpleRegex = /Hello/;
const simpleTime = benchmarkRegex(text, simpleRegex, iterations);
// Complex match
const complexRegex = /^[A-Z][a-z]+,\s+[A-Z][a-z]+!\s+This\s+is\s+a\s+test\s+string\.$/;
const complexTime = benchmarkRegex(text, complexRegex, iterations);
console.log(`Simple regex: ${simpleTime.toFixed(2)}ms`);
console.log(`Complex regex: ${complexTime.toFixed(2)}ms`);
Best Practices
1. Use Appropriate Regex Complexity
// Good: Simple and clear
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// Avoid: Overly complex regex
const complexEmailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
// Combine several simple checks instead of one dense pattern
function validateEmail(email) {
const basicFormat = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // something@something.something
const noConsecutiveDots = !email.includes('..'); // rejects 'a..b@example.com'
const withinLengthLimit = email.length <= 254; // common upper bound for an address
return basicFormat.test(email) && noConsecutiveDots && withinLengthLimit;
}
2. Comment Complex Regex
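// JavaScript has no x (extended) flag, so a regex literal cannot span lines
// or hold inline comments. Build the pattern from commented parts instead:
const phoneRegex = new RegExp([
'^',
'\\(?', // Optional opening parenthesis
'([0-9]{3})', // Area code (3 digits)
'\\)?', // Optional closing parenthesis
'[-. ]?', // Optional separator
'([0-9]{3})', // Exchange code (3 digits)
'[-. ]?', // Optional separator
'([0-9]{4})', // Line number (4 digits)
'$'
].join(''));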
// Or use descriptive variable names
const AREA_CODE = '([0-9]{3})';
const EXCHANGE_CODE = '([0-9]{3})';
const LINE_NUMBER = '([0-9]{4})';
const SEPARATOR = '[-. ]?';
const phoneRegex2 = new RegExp(`^\\(?${AREA_CODE}\\)?${SEPARATOR}${EXCHANGE_CODE}${SEPARATOR}${LINE_NUMBER}$`);
3. Handle Edge Cases
// Robust regex with edge case handling
function extractNumbers(text) {
if (!text || typeof text !== 'string') {
return [];
}
const numberRegex = /-?\d+(?:\.\d+)?/g;
const matches = text.match(numberRegex);
return matches ? matches.map(Number) : [];
}
// Test edge cases
console.log(extractNumbers('')); // []
console.log(extractNumbers(null)); // []
console.log(extractNumbers('No numbers here')); // []
console.log(extractNumbers('123 -456 78.9')); // [123, -456, 78.9]
Summary
Regular expressions are powerful tools for text processing:
- Basic Syntax: Use literals or constructors to create regex patterns
- Character Classes: Match specific sets of characters efficiently
- Quantifiers: Control how many times patterns can match
- Anchors: Match positions in text (start, end, word boundaries)
- Groups: Capture and reference parts of matches
- Assertions: Use lookahead and lookbehind for complex patterns
- String Methods: Leverage built-in methods for search and replace
- Performance: Reuse RegExp objects, prefer specific character classes and anchored patterns, and measure before optimizing
- Best Practices: Write clear, maintainable regex patterns
Mastering regular expressions enables you to handle complex text processing tasks efficiently and create robust validation systems for your applications.
This tutorial is part of the JavaScript Mastery series by syscook.dev