Chapter 9: JavaScript Regular Expressions - Complete Guide to Pattern Matching

Regular expressions (regex) are patterns used to search, validate, and manipulate text. They are commonly used for form validation, text processing, and data extraction.

Why Regular Expressions Matter in JavaScript

Regular Expressions in JavaScript are essential because they:

Enable Text Validation: Validate email addresses, phone numbers, passwords, and other formats
Support Data Extraction: Parse and extract specific information from text
Facilitate Text Processing: Search, replace, and transform text efficiently
Improve User Experience: Provide real-time input validation and formatting
Handle Complex Patterns: Match sophisticated text patterns that would be difficult with simple string methods
Support Internationalization: Work with non-ASCII text through the u flag and Unicode property escapes such as \p{L}

Learning Objectives

In this chapter, you will learn:

Basic regex syntax and patterns
Character classes and quantifiers
Anchors and boundaries
Groups and capturing
Lookahead and lookbehind assertions
String methods that work with regex
Practical validation and text processing examples
Performance considerations and best practices

Basic Regex Syntax

Creating Regular Expressions

// Literal notation (most common)
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Constructor notation
const phoneRegex = new RegExp('^\\d{3}-\\d{3}-\\d{4}$');

// With flags
const globalRegex = /pattern/g; // Global flag
const caseInsensitiveRegex = /pattern/i; // Case insensitive
const multilineRegex = /pattern/m; // Multiline
const combinedFlags = /pattern/gim; // Multiple flags

// Dynamic regex creation
function createRegex(pattern, flags = '') {
  return new RegExp(pattern, flags);
}

const dynamicRegex = createRegex('\\d+', 'g');
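
When the pattern passed to the constructor comes from user input, characters such as . or + keep their special meaning unless they are escaped first. The sketch below shows one common way to handle that. The names escapeRegExp and containsLiteral are hypothetical helpers introduced for illustration, not built-ins.

```javascript
// Hypothetical helper: escape every character that is special inside a regex
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: check whether `text` contains `literal` verbatim
function containsLiteral(text, literal, flags = '') {
  return new RegExp(escapeRegExp(literal), flags).test(text);
}

console.log(escapeRegExp('1+1=2?'));                   // '1\+1=2\?'
console.log(containsLiteral('Total: $5.00', '$5.00')); // true
console.log(containsLiteral('Total: $5x00', '$5.00')); // false ('.' is no longer a wildcard)
```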

Basic Pattern Matching

// Simple text matching
const text = 'Hello, World!';
const helloRegex = /Hello/;

console.log(helloRegex.test(text)); // true
console.log(helloRegex.exec(text)); // ['Hello', index: 0, input: 'Hello, World!']

// Case insensitive matching
const caseInsensitiveRegex = /hello/i;
console.log(caseInsensitiveRegex.test(text)); // true

// Global matching
const globalRegex = /l/g;
const matches = text.match(globalRegex);
console.log(matches); // ['l', 'l', 'l']

// Replace with regex
const replaced = text.replace(/World/, 'JavaScript');
console.log(replaced); // 'Hello, JavaScript!'
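
One behavior worth knowing when the g flag is combined with test() or exec(): the regex object remembers where the last match ended in its lastIndex property, so repeated calls on the same object can give different results. A minimal sketch of the effect (variable names are illustrative):

```javascript
const stateful = /l/g;
const sample = 'Hello';

console.log(stateful.test(sample)); // true  (match at index 2, lastIndex becomes 3)
console.log(stateful.test(sample)); // true  (match at index 3, lastIndex becomes 4)
console.log(stateful.test(sample)); // false (no match from index 4, lastIndex resets to 0)
console.log(stateful.lastIndex);    // 0

// Without the g flag, test() always starts searching at index 0
const stateless = /l/;
console.log(stateless.test(sample)); // true
console.log(stateless.test(sample)); // true
```

For simple yes/no checks, leaving off the g flag avoids this state entirely.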

Character Classes and Quantifiers

Character Classes

// Basic character classes
const text = 'abc123XYZ!@#';

// Word characters (letters, digits, underscore)
const wordChars = /\w/g;
console.log(text.match(wordChars)); // ['a', 'b', 'c', '1', '2', '3', 'X', 'Y', 'Z']

// Non-word characters
const nonWordChars = /\W/g;
console.log(text.match(nonWordChars)); // ['!', '@', '#']

// Digits
const digits = /\d/g;
console.log(text.match(digits)); // ['1', '2', '3']

// Non-digits
const nonDigits = /\D/g;
console.log(text.match(nonDigits)); // ['a', 'b', 'c', 'X', 'Y', 'Z', '!', '@', '#']

// Whitespace
const whitespace = /\s/g;
const textWithSpaces = 'hello world\n\t';
console.log(textWithSpaces.match(whitespace)); // [' ', '\n', '\t']

// Custom character classes
const vowels = /[aeiou]/gi;
const consonants = /[bcdfghjklmnpqrstvwxyz]/gi;
const hexDigits = /[0-9a-fA-F]/g;
const specialChars = /[!@#$%^&*()]/g;

// Negated character classes
const nonVowels = /[^aeiou]/gi;
const notDigits = /[^0-9]/g; // Equivalent to \D (renamed: nonDigits is already declared above)
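
The shorthand classes above are ASCII-oriented: \w matches only A-Z, a-z, 0-9, and underscore. For text in other scripts, Unicode property escapes with the u flag (ES2018) are one option. A small sketch, with a sample string chosen purely for illustration:

```javascript
// normalize('NFC') ensures 'é' is a single code point however the file was saved
const mixed = 'José 東京 42'.normalize('NFC');

// \w only covers ASCII letters, digits, and underscore
console.log(mixed.match(/\w+/g)); // ['Jos', '42']

// \p{L} matches a letter in any script (requires the u flag)
console.log(mixed.match(/\p{L}+/gu)); // ['José', '東京']

// Letters or numbers in any script
console.log(mixed.match(/[\p{L}\p{N}]+/gu)); // ['José', '東京', '42']
```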

Quantifiers

// Basic quantifiers
const text = 'aaabbbcccddd';

// Zero or more (*)
const zeroOrMore = /a*/g;
console.log(text.match(zeroOrMore)); // ['aaa', '', ..., ''] ('aaa' plus 10 empty matches, one at each remaining position through the end of the string)

// One or more (+)
const oneOrMore = /a+/g;
console.log(text.match(oneOrMore)); // ['aaa']

// Zero or one (?)
const zeroOrOne = /a?/g;
console.log(text.match(zeroOrOne)); // ['a', 'a', 'a', '', ..., ''] (three 'a' matches plus 10 empty matches)

// Exact count {n}
const exactCount = /a{3}/g;
console.log(text.match(exactCount)); // ['aaa']

// Range {n,m}
const range = /a{2,4}/g;
console.log(text.match(range)); // ['aaa']

// At least n {n,}
const atLeast = /a{2,}/g;
console.log(text.match(atLeast)); // ['aaa']

// Practical examples
const phoneNumber = /^\d{3}-\d{3}-\d{4}$/;
const zipCode = /^\d{5}(-\d{4})?$/;
const creditCard = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;
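
All of the quantifiers above are greedy: they take as much text as they can and give some back only if the rest of the pattern fails. Appending ? makes a quantifier lazy, so it takes as little as possible. A short sketch of the difference (the sample string is illustrative):

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the end, then backtracks to the last '>'
console.log(html.match(/<.+>/)[0]);  // '<b>bold</b> and <i>italic</i>'

// Lazy: .+? stops at the first '>'
console.log(html.match(/<.+?>/)[0]); // '<b>'

// Lazy with the g flag finds each tag separately
console.log(html.match(/<.+?>/g));   // ['<b>', '</b>', '<i>', '</i>']

// A negated character class gives the same result without relying on laziness
console.log(html.match(/<[^>]+>/g)); // ['<b>', '</b>', '<i>', '</i>']
```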

Anchors and Boundaries

Anchors

// String anchors
const text = 'Start middle end';

// Start of string (^)
const startAnchor = /^Start/;
console.log(startAnchor.test(text)); // true

// End of string ($)
const endAnchor = /end$/;
console.log(endAnchor.test(text)); // true

// Both anchors
const exactMatch = /^Start middle end$/;
console.log(exactMatch.test(text)); // true

// Word boundaries (\b)
const wordBoundary = /\bword\b/g;
const textWithWord = 'word wordy sword words';
console.log(textWithWord.match(wordBoundary)); // ['word'] (only the standalone 'word')

// Non-word boundaries (\B)
const nonWordBoundary = /\Bword/g;
console.log(textWithWord.match(nonWordBoundary)); // ['word'] (from 'sword')

Multiline and Global Flags

const multilineText = `Line 1
Line 2
Line 3`;

// Without multiline flag
const withoutM = /^Line/g;
console.log(multilineText.match(withoutM)); // ['Line'] (only first line)

// With multiline flag
const withM = /^Line/gm;
console.log(multilineText.match(withM)); // ['Line', 'Line', 'Line']

// End of line matching
const endOfLine = /\d$/gm;
console.log(multilineText.match(endOfLine)); // ['1', '2', '3']

Groups and Capturing

Capturing Groups

// Basic capturing groups
const text = 'John Doe, Jane Smith, Bob Johnson';

// Capture first and last name
const nameRegex = /(\w+)\s+(\w+)/g;
let match;
const names = [];

while ((match = nameRegex.exec(text)) !== null) {
  names.push({
    fullName: match[0],
    firstName: match[1],
    lastName: match[2]
  });
}

console.log(names);
// [
//   { fullName: 'John Doe', firstName: 'John', lastName: 'Doe' },
//   { fullName: 'Jane Smith', firstName: 'Jane', lastName: 'Smith' },
//   { fullName: 'Bob Johnson', firstName: 'Bob', lastName: 'Johnson' }
// ]

// Named capturing groups (ES2018)
const namedGroupRegex = /(?<firstName>\w+)\s+(?<lastName>\w+)/g;
const namedMatches = [...text.matchAll(namedGroupRegex)];
console.log(namedMatches[0].groups); // { firstName: 'John', lastName: 'Doe' }
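
Named groups pair well with destructuring, because match.groups is a plain object keyed by group name. The sketch below assumes an ISO-style date as input; parseIsoDate is a hypothetical helper used only for illustration.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Hypothetical helper: return the first date found in `input`, or null
function parseIsoDate(input) {
  const match = isoDate.exec(input);
  if (!match) return null;

  // Pull the named groups out by name instead of by position
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('Released on 2024-03-15')); // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate('no date here'));           // null
```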

Non-Capturing Groups

// Non-capturing groups
const text = 'color colour';

// Capturing group: the optional 'u' is captured as group 1
const capturing = /colo(u)?r/g;
console.log(text.match(capturing)); // ['color', 'colour']

// Non-capturing group: same matches, but nothing is captured
const nonCapturing = /colo(?:u)?r/g;
console.log(text.match(nonCapturing)); // ['color', 'colour']

// Practical example: optional protocol
const urlRegex = /(?:https?:\/\/)?(?:www\.)?([^\/]+)/;
const urls = [
  'https://www.example.com',
  'http://example.com',
  'www.example.com',
  'example.com'
];

urls.forEach(url => {
  const match = url.match(urlRegex);
  console.log(`${url} -> ${match[1]}`);
});

Backreferences

// Backreferences
const text = 'The the quick brown fox jumps over the lazy dog';

// Find repeated words
const repeatedWords = /\b(\w+)\s+\1\b/gi;
console.log(text.match(repeatedWords)); // ['The the']

// HTML tag matching
const htmlText = '<div>Content</div><span>Text</span>';
const tagRegex = /<(\w+)>(.*?)<\/\1>/g;
let match;
const tags = [];

while ((match = tagRegex.exec(htmlText)) !== null) {
  tags.push({
    tag: match[1],
    content: match[2]
  });
}

console.log(tags);
// [
//   { tag: 'div', content: 'Content' },
//   { tag: 'span', content: 'Text' }
// ]
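
Backreferences also work with named groups through the \k<name> syntax. A minimal sketch that finds quoted strings whose closing quote matches the opening one (the sample text and group names are illustrative):

```javascript
// (?<quote>['"]) captures the opening quote; \k<quote> requires the same character to close
const quoted = /(?<quote>['"])(?<body>.*?)\k<quote>/g;
const source = `say "hello" or 'it"s fine' but not "mixed'`;

const bodies = [...source.matchAll(quoted)].map(m => m.groups.body);
console.log(bodies); // ['hello', 'it"s fine']
```

The trailing "mixed' is skipped because its opening and closing quotes differ.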

Lookahead and Lookbehind Assertions

Lookahead Assertions

// Positive lookahead (?=...)
const text = 'password123 strongpass456 weakpass';

// Find the leading letters of each password that are followed by a digit
const passwordWithNumbers = /[a-z]+(?=\d)/g;
console.log(text.match(passwordWithNumbers)); // ['password', 'strongpass']

// Negative lookahead (?!...)
const text2 = 'hello world hello there';

// Find 'hello' not followed by 'world'
const helloNotWorld = /hello(?!\s+world)/g;
console.log(text2.match(helloNotWorld)); // ['hello'] (from 'hello there')

// Password validation with lookahead
const passwordRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

function validatePassword(password) {
  return passwordRegex.test(password);
}

console.log(validatePassword('Password123!')); // true
console.log(validatePassword('password')); // false
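
Because lookaheads match a position rather than characters, they can also be used to insert text without consuming anything. A common illustration is adding thousands separators. The sketch below assumes the input is a string of digits only (no sign, no decimal part); addThousandsSeparators is a hypothetical helper.

```javascript
// Hypothetical helper: '1234567' -> '1,234,567' (digits-only input assumed)
function addThousandsSeparators(digits) {
  // \B            : a position between two digits (never at the very start)
  // (?=(?:\d{3})+ : followed by one or more complete groups of three digits
  // (?!\d))       : ...and then no further digit
  return digits.replace(/\B(?=(?:\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('1000'));    // '1,000'
console.log(addThousandsSeparators('999'));     // '999'
```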

Lookbehind Assertions

// Positive lookbehind (?<=...)
const text = '$100 $200 €150 ¥300';

// Find numbers preceded by dollar sign
const dollarAmounts = /(?<=\$)\d+/g;
console.log(text.match(dollarAmounts)); // ['100', '200']

// Negative lookbehind (?<!...)
const text2 = 'price: $100 cost: €50 total: $200';

// Find numbers not preceded by dollar sign
// (\d in the lookbehind stops a match from starting mid-number, e.g. at '00' in '$100')
const nonDollarAmounts = /(?<![$\d])\d+/g;
console.log(text2.match(nonDollarAmounts)); // ['50']

// Extract amounts preceded by any of the three currency symbols
const currencyRegex = /(?<=\$|€|¥)\d+/g;
const amounts = text.match(currencyRegex);
console.log(amounts); // ['100', '200', '150', '300']

String Methods with Regex

Search and Replace

// String.search()
const text = 'Hello, World!';
const index = text.search(/World/);
console.log(index); // 7

// String.match()
const text2 = 'The quick brown fox jumps over the lazy dog';
const words = text2.match(/\w+/g);
console.log(words); // ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

// String.matchAll() (ES2020)
const text3 = 'John: 25, Jane: 30, Bob: 35';
const ageRegex = /(\w+):\s*(\d+)/g;
const matches = [...text3.matchAll(ageRegex)];

const people = matches.map(match => ({
  name: match[1],
  age: parseInt(match[2], 10)
}));

console.log(people);
// [
//   { name: 'John', age: 25 },
//   { name: 'Jane', age: 30 },
//   { name: 'Bob', age: 35 }
// ]

// String.replace()
const text4 = 'Hello, World!';
const replaced = text4.replace(/World/, 'JavaScript');
console.log(replaced); // 'Hello, JavaScript!'

// String.replace() with function
const text5 = 'The quick brown fox';
const replaced2 = text5.replace(/\b\w/g, match => match.toUpperCase());
console.log(replaced2); // 'The Quick Brown Fox'
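
Two more string methods accept a regex: split() and replaceAll() (ES2021). A brief sketch with illustrative sample data:

```javascript
// String.split() with a regex separator: commas or semicolons plus surrounding whitespace
const csvLine = 'apple, banana;cherry ,  date';
const items = csvLine.split(/\s*[,;]\s*/);
console.log(items); // ['apple', 'banana', 'cherry', 'date']

// String.replaceAll() with a regex requires the g flag
const dashed = '2024/03/15'.replaceAll(/\//g, '-');
console.log(dashed); // '2024-03-15'

// Without the g flag, replaceAll() throws a TypeError
try {
  '2024/03/15'.replaceAll(/\//, '-');
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```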

Advanced Replace Operations

// Replace with capturing groups
const text = 'John Doe, Jane Smith';
const swapped = text.replace(/(\w+)\s+(\w+)/g, '$2, $1');
console.log(swapped); // 'Doe, John, Smith, Jane'

// Replace with function
const text2 = 'The price is $100 and $200';
const doubled = text2.replace(/\$(\d+)/g, (match, amount) => {
  return `$${parseInt(amount, 10) * 2}`;
});
console.log(doubled); // 'The price is $200 and $400'

// Format phone numbers
function formatPhoneNumber(phone) {
  return phone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}

console.log(formatPhoneNumber('1234567890')); // '(123) 456-7890'

// Remove extra whitespace
const text3 = '  Hello    world   !  ';
const cleaned = text3.replace(/\s+/g, ' ').trim();
console.log(cleaned); // 'Hello world !'
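
Replacement strings can refer to named groups with $<name>, which tends to read better than $1, $2 when a pattern has several groups. A sketch that converts US-style dates to ISO order (the sample text and group names are illustrative):

```javascript
const usDate = /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/g;
const log = 'Start 03/15/2024, end 04/01/2024';

// Named references in the replacement string
const iso = log.replace(usDate, '$<year>-$<month>-$<day>');
console.log(iso); // 'Start 2024-03-15, end 2024-04-01'

// With a replacer function, the groups object is passed as the last argument
const viaFunction = log.replace(usDate, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${year}-${month}-${day}`;
});
console.log(viaFunction); // 'Start 2024-03-15, end 2024-04-01'
```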

Practical Validation Examples

Email Validation

// Comprehensive email validation
const emailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

function validateEmail(email) {
  return emailRegex.test(email);
}

// Test cases (the example.* addresses are illustrative placeholders)
const emails = [
  'user@example.com',
  'first.last@example.org',
  'user+tag@example.co.uk',
  'invalid.email',
  '@example.com',
  'user@',
  'user.name@sub.example.com'
];

emails.forEach(email => {
  console.log(`${email}: ${validateEmail(email)}`);
});

Phone Number Validation

// Phone number validation for different formats
const phoneRegexes = {
  us: /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/,
  international: /^\+?[1-9]\d{1,14}$/,
  flexible: /^[\+]?[1-9][\d]{0,15}$/
};

function validatePhoneNumber(phone, format = 'us') {
  return phoneRegexes[format].test(phone);
}

// Test cases
const phones = [
  '123-456-7890',
  '(123) 456-7890',
  '123.456.7890',
  '123 456 7890',
  '+1-123-456-7890',
  '1234567890'
];

phones.forEach(phone => {
  console.log(`${phone}: ${validatePhoneNumber(phone)}`);
});

URL Validation

// URL validation
const urlRegex = /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$/;

function validateURL(url) {
  return urlRegex.test(url);
}

// Extract domain from URL
function extractDomain(url) {
  const domainRegex = /^https?:\/\/(?:www\.)?([^\/]+)/;
  const match = url.match(domainRegex);
  return match ? match[1] : null;
}

// Test cases
const urls = [
  'https://www.example.com',
  'http://example.com/path',
  'https://subdomain.example.co.uk',
  'invalid-url',
  'ftp://example.com'
];

urls.forEach(url => {
  console.log(`${url}: ${validateURL(url)} (domain: ${extractDomain(url)})`);
});
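
As an alternative to a hand-written pattern, the built-in URL constructor can do the parsing, leaving only the protocol check to your own code. This is a sketch of that approach, not a drop-in replacement for the regex above; isHttpUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: accept only parseable http(s) URLs
function isHttpUrl(candidate) {
  try {
    const parsed = new URL(candidate); // throws a TypeError on unparseable input
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(isHttpUrl('https://www.example.com')); // true
console.log(isHttpUrl('http://example.com/path')); // true
console.log(isHttpUrl('invalid-url'));             // false
console.log(isHttpUrl('ftp://example.com'));       // false

// The parsed object also exposes the hostname directly
console.log(new URL('https://subdomain.example.co.uk/a?b=1').hostname); // 'subdomain.example.co.uk'
```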

Text Processing Examples

Data Extraction

// Extract data from text
const text = `
Name: John Doe
Email: john.doe@example.com
Phone: (555) 123-4567
Address: 123 Main St, Anytown, NY 12345
`;

// Extract name
const nameMatch = text.match(/Name:\s*(.+)/);
const name = nameMatch ? nameMatch[1] : null;

// Extract email
const emailMatch = text.match(/Email:\s*([^\s]+)/);
const email = emailMatch ? emailMatch[1] : null;

// Extract phone
const phoneMatch = text.match(/Phone:\s*([^\n]+)/);
const phone = phoneMatch ? phoneMatch[1] : null;

// Extract address
const addressMatch = text.match(/Address:\s*(.+)/);
const address = addressMatch ? addressMatch[1] : null;

console.log({ name, email, phone, address });

// Extract all key-value pairs
const keyValueRegex = /(\w+):\s*(.+)/g;
const data = {};
let match;

while ((match = keyValueRegex.exec(text)) !== null) {
  data[match[1].toLowerCase()] = match[2].trim();
}

console.log(data);
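
The same key-value extraction can be written without a loop by combining matchAll() with Object.fromEntries() (ES2019). A sketch using a shortened, illustrative record; the m flag anchors each pair to its own line:

```javascript
const record = `
Name: John Doe
Phone: (555) 123-4567
Address: 123 Main St, Anytown, NY 12345
`;

// ^ and $ match at line boundaries because of the m flag
const pairRegex = /^(\w+):\s*(.+)$/gm;

const fields = Object.fromEntries(
  [...record.matchAll(pairRegex)].map(([, key, value]) => [key.toLowerCase(), value.trim()])
);

console.log(fields.name);    // 'John Doe'
console.log(fields.phone);   // '(555) 123-4567'
console.log(fields.address); // '123 Main St, Anytown, NY 12345'
```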

Text Cleaning and Formatting

// Text cleaning functions
function cleanText(text) {
  return text
    .replace(/[ \t]+/g, ' ') // Collapse runs of spaces and tabs (newlines are kept for the next step)
    .replace(/\n\s*\n/g, '\n') // Remove empty lines
    .trim(); // Remove leading/trailing whitespace
}

function formatText(text) {
  return text
    .replace(/\b\w/g, match => match.toUpperCase()) // Capitalize first letter of each word
    .replace(/\s+/g, ' ') // Normalize spaces
    .trim();
}

function removeSpecialChars(text) {
  return text.replace(/[^\w\s]/g, ''); // Keep only word characters and whitespace
}

// Example usage
const messyText = '  hello    world   !!!  \n\n  how   are   you?  ';
console.log('Original:', messyText);
console.log('Cleaned:', cleanText(messyText));
console.log('Formatted:', formatText(messyText));
console.log('No special chars:', removeSpecialChars(messyText));

Performance Considerations

Optimizing Regex Performance

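// Cache regex objects built from dynamic patterns
// (regexes with the g or y flag keep lastIndex state, so avoid sharing those between callers)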
const regexCache = new Map();

function getCachedRegex(pattern, flags = '') {
  const key = `${pattern}:${flags}`;
  
  if (!regexCache.has(key)) {
    regexCache.set(key, new RegExp(pattern, flags));
  }
  
  return regexCache.get(key);
}

// Use specific character classes instead of broad ones
// Good: specific character class
const specificRegex = /[a-zA-Z0-9]/;

// Avoid: overly broad character class
const broadRegex = /./;

// Use non-capturing groups when you don't need the captured content
// Good: non-capturing group
const nonCapturing = /(?:https?:\/\/)?(?:www\.)?([^\/]+)/;

// Avoid: capturing groups when not needed
const capturing = /(https?:\/\/)?(www\.)?([^\/]+)/;

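// Anchor patterns that are meant to match the whole string
// Good: an anchored regex can only match from index 0, so non-matching input fails fast
const anchoredRegex = /^[a-zA-Z0-9]+$/;

// Avoid for validation: an unanchored regex is retried at every position and accepts partial matches
const unanchoredRegex = /[a-zA-Z0-9]+/;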
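
The most expensive patterns are usually those with nested quantifiers, where the same text can be split between the inner and outer repetition in many ways. In a backtracking engine, a failing input can then take time that grows exponentially with its length. The sketch below contrasts such a pattern with a rewrite that behaves the same on the listed samples; both patterns are illustrative, and the risky one is deliberately never run against the long input.

```javascript
// Risky: in (\w+\s?)+ a run of word characters can be divided between
// iterations in many ways, so a failed match may backtrack exponentially
const risky = /^(\w+\s?)+$/;

// Safer: each word and each separating space can be matched in only one way
const safer = /^\w+(?:\s\w+)*\s?$/;

const samples = ['hello world', 'hello', 'trailing space ', 'two  spaces', 'bad!'];
for (const sample of samples) {
  console.log(JSON.stringify(sample), risky.test(sample), safer.test(sample));
}

// A long non-matching input is rejected quickly by the safer pattern
const hostile = 'a'.repeat(40) + '!';
console.log(safer.test(hostile)); // false
```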

Benchmarking Regex Performance

// Simple regex performance test
function benchmarkRegex(text, regex, iterations = 10000) {
  const start = performance.now();
  
  for (let i = 0; i < iterations; i++) {
    regex.test(text);
  }
  
  const end = performance.now();
  return end - start;
}

// Test different approaches
const text = 'Hello, World! This is a test string.';
const iterations = 100000;

// Simple match
const simpleRegex = /Hello/;
const simpleTime = benchmarkRegex(text, simpleRegex, iterations);

// Complex match
const complexRegex = /^[A-Z][a-z]+,\s+[A-Z][a-z]+!\s+This\s+is\s+a\s+test\s+string\.$/;
const complexTime = benchmarkRegex(text, complexRegex, iterations);

console.log(`Simple regex: ${simpleTime.toFixed(2)}ms`);
console.log(`Complex regex: ${complexTime.toFixed(2)}ms`);

Best Practices

1. Use Appropriate Regex Complexity

// Good: Simple and clear
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Avoid: Overly complex regex
const complexEmailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Use multiple simple regexes when appropriate
function validateEmail(email) {
  const basicFormat = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  const hasAtSymbol = email.includes('@');
  const hasDomain = email.split('@')[1]?.includes('.');
  
  return basicFormat.test(email) && hasAtSymbol && hasDomain;
}

2. Comment Complex Regex

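// Well-commented regex
// JavaScript has no free-spacing (x) flag, so build the pattern from commented pieces
const phoneRegex = new RegExp([
  '^',
  '\\(?',        // Optional opening parenthesis
  '([0-9]{3})',  // Area code (3 digits)
  '\\)?',        // Optional closing parenthesis
  '[-. ]?',      // Optional separator
  '([0-9]{3})',  // Exchange code (3 digits)
  '[-. ]?',      // Optional separator
  '([0-9]{4})',  // Line number (4 digits)
  '$'
].join(''));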

// Or use descriptive variable names
const AREA_CODE = '([0-9]{3})';
const EXCHANGE_CODE = '([0-9]{3})';
const LINE_NUMBER = '([0-9]{4})';
const SEPARATOR = '[-. ]?';

const phoneRegex2 = new RegExp(`^\\(?${AREA_CODE}\\)?${SEPARATOR}${EXCHANGE_CODE}${SEPARATOR}${LINE_NUMBER}$`);

3. Handle Edge Cases

// Robust regex with edge case handling
function extractNumbers(text) {
  if (!text || typeof text !== 'string') {
    return [];
  }
  
  const numberRegex = /-?\d+(?:\.\d+)?/g;
  const matches = text.match(numberRegex);
  
  return matches ? matches.map(Number) : [];
}

// Test edge cases
console.log(extractNumbers('')); // []
console.log(extractNumbers(null)); // []
console.log(extractNumbers('No numbers here')); // []
console.log(extractNumbers('123 -456 78.9')); // [123, -456, 78.9]

Summary

Regular expressions are powerful tools for text processing:

Basic Syntax: Use literals or constructors to create regex patterns
Character Classes: Match specific sets of characters efficiently
Quantifiers: Control how many times patterns can match
Anchors: Match positions in text (start, end, word boundaries)
Groups: Capture and reference parts of matches
Assertions: Use lookahead and lookbehind for complex patterns
String Methods: Leverage built-in methods for search and replace
Performance: Optimize regex for better performance
Best Practices: Write clear, maintainable regex patterns

Mastering regular expressions enables you to handle complex text processing tasks efficiently and create robust validation systems for your applications.

This tutorial is part of the JavaScript Mastery series by syscook.dev
