Komodo User Guide

Introduction to Regular Expressions

This section introduces you to regular expressions and describes the symbols and what they mean.

When you feel comfortable with regular expressions, you can use Rx Toolkit to help you build, test and debug your regular expressions. For more information, see Using Rx Toolkit.

What are regular expressions?

Regular expressions are sets of symbols and syntactic elements used to match patterns of text. Regular expressions are not a language or a tool; they are a syntax convention which many languages and tools support.

The syntax uses modifiers, metacharacters, anchors, escape characters, quantifiers, and alternation.

How can I use regular expressions?

You can use regular expressions to perform text-manipulation tasks, such as searching and replacing text or testing for certain conditions in a text file or data file. Email filtering programs often use regular expressions to sort incoming email by topic or by sender.  
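
As a rough illustration of sorting email by sender, the sketch below groups a few header lines by the sender's domain. It assumes Python's re module purely for demonstration (this guide does not prescribe a language), and the header lines and the names sender_domain and by_domain are invented for the example.

```python
import re

# Hypothetical message headers, invented for this illustration.
headers = [
    "From: alice@example.com",
    "From: bob@example.org",
    "From: carol@example.com",
]

# Capture everything after the last @ sign on a "From:" line.
sender_domain = re.compile(r"^From: .*@(.+)$")

by_domain = {}
for header in headers:
    match = sender_domain.search(header)
    if match:
        # Group the header under the captured domain.
        by_domain.setdefault(match.group(1), []).append(header)

for domain, lines in by_domain.items():
    print(domain, len(lines))
```

A real mail filter would parse messages with a dedicated library; the point here is only that one pattern both tests a condition and extracts the part of interest.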

How can I create regular expressions?

You create regular expressions by determining the pattern you want to isolate in your string or data. Then you build the regular expression using the elements of the syntax.

What do those symbols mean in English?

The symbols in regular expressions express concepts concisely. This table introduces some of the types of symbols used in regular expressions. Each type is described in its own section of this introduction:

Type of symbol      Examples
modifiers           /o, /s, /g, /m, /i
metacharacters      ., [...], [^...]
quantifiers         ?, *, +, {num}
anchors             ^, $, \<, \>, \b, \B
escape characters   \
backreferences      \1
alternation         |
character class     [0123456789], [A-Z], [a-z]

What are modifiers?

Modifiers change how a match is performed. These are common modifiers:

Modifier   Meaning
i          Ignore case when matching letters.
m          Treat the string as multiple lines. Allow "^" and "$" to match next to newline characters.
s          Treat the string as a single line. Allow "." to match a newline character.
x          Ignore whitespace and newline characters in the regular expression. Allow comments.
o          Compile the regular expression only once.
g          Match all instances of the pattern in the target string.

To learn how to use modifiers in Rx Toolkit, see Using Rx Toolkit.

For more detailed information about modifiers, see perlre in the ActivePerl documentation.
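
The modifiers above are written in Perl notation. As a hedged sketch of what they do, the example below uses the closest equivalents in Python's re module, which is an assumption made for illustration only. Python has no g or o modifier: findall already returns every match, and re.compile builds the pattern once. The sample text and variable names are invented.

```python
import re

text = "Apple pie\napple tart\nAPPLE jam"

# i: ignore case (re.IGNORECASE).
ignore_case = re.findall(r"apple", text, re.IGNORECASE)

# m: let ^ and $ match next to newline characters (re.MULTILINE).
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# s: let . match a newline character as well (re.DOTALL).
across_lines = re.search(r"pie.apple", text, re.DOTALL)

# x: ignore whitespace in the pattern and allow comments (re.VERBOSE).
verbose = re.compile(
    r"""
    apple   # the literal word
    \s+     # one or more whitespace characters
    tart    # the literal word
    """,
    re.VERBOSE,
)

print(ignore_case)                       # ['Apple', 'apple', 'APPLE']
print(line_starts)                       # ['Apple', 'apple', 'APPLE']
print(across_lines is not None)          # True
print(verbose.search(text) is not None)  # True
```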

What are metacharacters?

All alphanumeric characters match themselves. Metacharacters, by contrast, stand for a set of possible characters rather than for themselves.

Some metacharacters match single characters. These include:

Metacharacter   Meaning
.               Matches any single character except a newline (unless the s modifier is used).
[...]           Matches any one character listed inside the brackets.
[^...]          Matches any one character except those listed inside the brackets.

To learn how to use metacharacters in Rx Toolkit, see Using Rx Toolkit.
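
The three metacharacters in the table can be compared side by side. This is a minimal sketch that assumes Python's re module for illustration; the word list and variable names are invented. fullmatch is used so that each pattern has to account for the whole word.

```python
import re

words = ["bat", "bet", "bit", "bt"]

# . stands for exactly one character, whatever it is.
any_char = [w for w in words if re.fullmatch(r"b.t", w)]

# [ae] stands for one character that must be "a" or "e".
listed = [w for w in words if re.fullmatch(r"b[ae]t", w)]

# [^ae] stands for one character that is neither "a" nor "e".
not_listed = [w for w in words if re.fullmatch(r"b[^ae]t", w)]

# Without the s modifier (re.DOTALL), . does not match a newline.
dot_matches_newline = re.fullmatch(r"b.t", "b\nt") is not None

print(any_char)             # ['bat', 'bet', 'bit']
print(listed)               # ['bat', 'bet']
print(not_listed)           # ['bit']
print(dot_matches_newline)  # False
```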

What are quantifiers?

Quantifiers are metacharacters that specify how many times the preceding element should match. These include:

Quantifier   Meaning
?            Matches the preceding element zero or one times.
*            Matches the preceding element zero or more times.
+            Matches the preceding element one or more times.
{num}        Matches the preceding element exactly num times.
{min,max}    Matches the preceding element at least min times, but not more than max times.

For example, when you use the asterisk like this,

a.*e

this matches an "a", followed by zero or more of any character, followed by an "e". It matches "alpine" and "apple".

If you want to restrict the number of times a character is matched, use curly braces with a number. For example,

apple[0-9]{2}

matches "apple" followed by exactly two digits in a row, such as "apple34".

If you want to restrict matches to a range, use curly braces with a range. For example,

apple[0-9]{3,5}

matches "apple" followed by at least three and at most five digits in a row. This matches "apple123", "apple4321", and "apple15243", but not "apple21".

To learn how to use quantifiers in Rx Toolkit, see Using Rx Toolkit.
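
The quantifier examples in this section can be checked with a short script. The sketch assumes Python's re module for illustration; the helper name matches and the colou?r pattern are additions made for the example and do not come from Rx Toolkit.

```python
import re

def matches(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [c for c in candidates if re.search(pattern, c)]

# * : zero or more of any character between "a" and "e".
star = matches(r"a.*e", ["alpine", "apple", "ant"])

# ? : the "u" is optional.
optional = matches(r"colou?r", ["color", "colour", "colouur"])

# {num} : exactly two digits after "apple".
exact = matches(r"apple[0-9]{2}", ["apple34", "apple5"])

# {min,max} : between three and five digits after "apple".
ranged = matches(
    r"apple[0-9]{3,5}",
    ["apple123", "apple4321", "apple15243", "apple21"],
)

print(star)      # ['alpine', 'apple']
print(optional)  # ['color', 'colour']
print(exact)     # ['apple34']
print(ranged)    # ['apple123', 'apple4321', 'apple15243']
```

Because these patterns are not anchored, they also succeed inside longer strings: apple[0-9]{2} finds a match in "apple345" by using only the first two digits.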

What are anchors?

Anchors specify the position where the pattern occurs. For example:

Anchor   Meaning
^        Matches at the start of a line.
$        Matches at the end of a line.
\<       Matches at the beginning of a word.
\>       Matches at the end of a word.
\b       Matches at the beginning or the end of a word.
\B       Matches at any position that is not the beginning or end of a word.

To learn how to use anchors in Rx Toolkit, see Using Rx Toolkit.
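
Anchors match positions rather than characters, which is easiest to see by printing where each match starts. This sketch assumes Python's re module for illustration. Python's re has no \< or \> anchors, so \b is used for both word edges, and ^ and $ match at line boundaries only when re.MULTILINE (the m modifier) is set. The sample text and the helper name starts are invented.

```python
import re

text = "cat concat cats\nthe cat"

def starts(pattern, flags=0):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

line_start = starts(r"^the", re.MULTILINE)  # "the" at the start of line two
line_end = starts(r"cat$", re.MULTILINE)    # "cat" at the end of a line
whole_word = starts(r"\bcat\b")             # "cat" as a complete word
inside_word = starts(r"\Bcat")              # "cat" not at the start of a word

print(line_start)   # [16]
print(line_end)     # [20]
print(whole_word)   # [0, 20]
print(inside_word)  # [7]
```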

What are escape characters?

Escape characters help you search for asterisks, question marks, periods, slashes, and similar characters in a string. Because many non-alphanumeric characters have a special meaning in regular expressions, place a backslash before such a character to match it literally.

For example, ".*" matches any character any number of times, but "\.*" matches a run of zero or more literal periods, such as ellipses of various lengths. The backslash lets you search for a plain period: "\.".
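
The difference between an escaped and an unescaped period can be seen in a short sketch, which assumes Python's re module for illustration. It uses \.+ rather than \.* because a pattern that can match zero characters also reports empty matches. The sample strings are invented.

```python
import re

text = "Wait... what.. ok."

# Escaped: only runs of literal periods are found.
dot_runs = re.findall(r"\.+", text)

# Unescaped: . matches any character, so the whole line is one match.
anything = re.findall(r".+", text)

# re.escape adds the backslashes needed to match a string literally.
literal = re.escape("1+1=2?")
found = re.search(literal, "is 1+1=2? yes") is not None

print(dot_runs)  # ['...', '..', '.']
print(anything)  # ['Wait... what.. ok.']
print(found)     # True
```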

What are backreferences?

Backreferences let you capture the text matched by a parenthesized part of the pattern in a buffer and reuse it later, either in the same expression or in the replacement text of a search and replace.

For example, 

s/(apple)/pies, \1 and cherry/g

finds all instances of "apple", captures each one in memory, and replaces it with "pies, apple and cherry".

This technique handles strings of data that change slightly from instance to instance, such as page numbering schemes.
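
Both uses of a backreference, in the replacement text and inside the pattern itself, are sketched below. The example assumes Python's re module for illustration, where re.sub plays the role of the s/// operator and replaces every match by default. The sample strings are invented.

```python
import re

# In the replacement: \1 stands for whatever the parentheses captured.
text = "apple tart and apple crumble"
replaced = re.sub(r"(apple)", r"pies, \1 and cherry", text)

# In the pattern: \1 must match the same text as the first group,
# which finds a word that is accidentally repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

print(replaced)
# pies, apple and cherry tart and pies, apple and cherry crumble
print(doubled.group(1))  # is
```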

What is alternation?

Alternation allows a regular expression to express a logical OR. If you want to search for apple or fruit, you could use the following:

apple|fruit

Add parentheses to limit the scope of alternate matches. This is useful when you search for words with two different spellings. For example,

gr(a|e)y

searches for both gray and grey.
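
The effect of the parentheses on the scope of the alternation is sketched below, assuming Python's re module for illustration. The sample strings and the helper name found are invented.

```python
import re

def found(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group() for m in re.finditer(pattern, text)]

# Either whole word.
either = found(r"apple|fruit", "apple pie and fruit salad")

# The parentheses limit the choice to the single letter a or e.
spellings = found(r"gr(a|e)y", "gray and grey, not groy")

# Without parentheses the choice is between "gra" and "ey".
unscoped = found(r"gra|ey", "gray and grey")

print(either)     # ['apple', 'fruit']
print(spellings)  # ['gray', 'grey']
print(unscoped)   # ['gra', 'ey']
```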

What is a character class?

A character class matches any one of the characters listed inside it. Square brackets set the class off from the rest of the regular expression. For example,

apple[0123456789]

matches "apple" followed by a zero, a one, a two, a three, a four, a five, a six, a seven, an eight, or a nine. You can abbreviate this using a dash. For example,

apple[0-9]

means the same as the longer regular expression just above. To match "apple" followed by any digit or any uppercase or lowercase letter, we could write

apple[0-9A-Za-z]

If we separated those three ranges with a space, to make it easier to read,

apple[0-9 A-Z a-z]

the space becomes part of the class, so this matches "apple" followed by any digit, any uppercase or lowercase letter, or a space.

More character classes:

\d matches any digit

\w matches any word character (a letter, a digit, or an underscore)
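
The character classes in this section, including the effect of the stray space, can be compared with a short sketch. It assumes Python's re module for illustration; the candidate strings and the helper name matching are invented.

```python
import re

candidates = ["apple7", "appleQ", "apple z", "apple_"]

def matching(pattern):
    """Return the candidates in which the pattern is found."""
    return [c for c in candidates if re.search(pattern, c)]

digits = matching(r"apple[0-9]")
alnum = matching(r"apple[0-9A-Za-z]")
spaced = matching(r"apple[0-9 A-Z a-z]")  # the space is part of the class
shorthand_d = matching(r"apple\d")
shorthand_w = matching(r"apple\w")        # \w also accepts an underscore

print(digits)       # ['apple7']
print(alnum)        # ['apple7', 'appleQ']
print(spaced)       # ['apple7', 'appleQ', 'apple z']
print(shorthand_d)  # ['apple7']
print(shorthand_w)  # ['apple7', 'appleQ', 'apple_']
```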

Credits

Information in the Introduction to Regular Expressions was compiled from Using Regular Expressions, by Stephen Ramsay, at http://etext.lib.virginia.edu/helpsheets/regex.html and from Mastering Regular Expressions by Jeffrey E. F. Friedl, (c)1997, Sebastopol, O'Reilly & Associates.
