Introduction to Regular Expressions

This section introduces you to regular expressions and describes the symbols and what they mean. When you feel comfortable with regular expressions, you can use Rx Toolkit to help you build, test, and debug your regular expressions. For more information, see Using Rx Toolkit.

What are regular expressions?

Regular expressions are sets of symbols and syntactic elements used to match patterns of text. Regular expressions are not a language or a tool; they are a syntax convention that many languages and tools support. The syntax uses modifiers, metacharacters, anchors, escape characters, quantifiers, and alternation.

How can I use regular expressions?

You can use regular expressions to perform text-manipulation tasks, such as searching and replacing text or testing for certain conditions in a text file or data file. Email filtering programs often use regular expressions to sort incoming email by topic or by sender.

How can I create regular expressions?

You create regular expressions by determining the pattern you want to isolate in your string or data. Then you build the regular expression using the elements of the syntax.

What do those symbols mean in English?

The symbols in regular expressions express concepts concisely. This table introduces some of the types of symbols used in regular expressions. The name of each symbol is a link to further information:
What are modifiers?

Modifiers change how a match is performed. These are common modifiers:
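As a minimal sketch of how modifiers change a match, the example below assumes Python's `re` module, which this page does not itself use. In Python, modifiers are passed as flags: `re.IGNORECASE` plays the role of Perl's `/i`, and `re.MULTILINE` plays the role of `/m`. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text spanning two lines.
text = "Apple pie\napple tart"

# No modifier: matching is case-sensitive, so only the lowercase word matches.
case_sensitive = re.findall(r"apple", text)

# Ignore-case modifier: both spellings match.
ignore_case = re.findall(r"apple", text, re.IGNORECASE)

# Without the multi-line modifier, ^ matches only at the start of the string.
string_start = re.findall(r"^apple", text, re.IGNORECASE)

# With the multi-line modifier, ^ also matches at the start of each line.
line_starts = re.findall(r"^apple", text, re.IGNORECASE | re.MULTILINE)

print(case_sensitive)
print(ignore_case)
print(string_start)
print(line_starts)
```

The same pattern gives four different results depending only on the modifiers, which is why a pattern should be read together with its modifiers.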
To learn how to use modifiers in Rx Toolkit, see Using Rx Toolkit. For more detailed information about modifiers, see perlre in the ActivePerl documentation.

What are metacharacters?

All alphanumeric characters match themselves. Metacharacters, however, match in a generalized fashion. Some metacharacters match single characters. These include:
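To illustrate single-character metacharacters, here is a short sketch that assumes Python's `re` module and a made-up sample string. It uses the period (any character except a newline), `\d` (a digit), `\w` (a word character), and `\s` (a whitespace character), all of which Python shares with Perl-style syntax.

```python
import re

# Hypothetical sample string.
sample = "apple 42!"

# The period stands for any single character (except a newline by default).
any_char = re.findall(r"a..le", sample)

# \d matches a single digit.
digits = re.findall(r"\d", sample)

# \s matches a single whitespace character.
whitespace = re.findall(r"\s", sample)

# \w matches a letter, a digit, or an underscore.
word_chars = re.findall(r"\w", "a_1 !")

print(any_char, digits, whitespace, word_chars)
```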
To learn how to use metacharacters in Rx Toolkit, see Using Rx Toolkit.

What are quantifiers?

Quantifiers are metacharacters that specify the number of times a particular character should match. These include:
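The following sketch shows the usual quantifiers at work. It assumes Python's `re` module, and the word lists are invented for the example: `*` means zero or more, `+` one or more, `?` zero or one, `{n}` exactly n times, and `{n,m}` between n and m times.

```python
import re

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [c for c in candidates if re.search(pattern, c)]

# * : zero or more of the preceding element (here, any character).
star = matching(r"a.*e", ["ale", "apple", "alpine", "ant"])

# + : one or more of the preceding element.
plus = matching(r"ap+le", ["ale", "aple", "apple"])

# ? : zero or one of the preceding element.
optional = matching(r"colou?r", ["color", "colour", "colouur"])

# {2} : exactly two digits in a row.
exactly_two = re.findall(r"[0-9]{2}", "apple34")

# {3,5} : at least three and at most five digits in a row.
three_to_five = matching(r"[0-9]{3,5}",
                         ["apple123", "apple4321", "apple15243", "apple21"])

print(star, plus, optional, exactly_two, three_to_five)
```

Note that `re.search` looks for the pattern anywhere in the string, so a longer run of digits would still contain a shorter matching run.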
For example, when you use the asterisk like this,

    a.*e
this matches an "a" followed by any character zero or more times, followed by an "e". It matches "alpine" and "apple". To restrict the number of times a character is matched, use curly braces with a number. For example,

    [0-9]{2}
matches any digit from zero to nine exactly two times in a row, such as the "34" in "apple34". To restrict matches to a range, use curly braces with a range. For example,

    [0-9]{3,5}
matches any digit from zero to nine at least three times in a row and no more than five times in a row. This matches "apple123", "apple4321", and "apple15243", but not "apple21". To learn how to use quantifiers in Rx Toolkit, see Using Rx Toolkit.

What are anchors?

Anchors specify the position where the pattern occurs. For example:
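As a small sketch of anchors, the example below assumes Python's `re` module and made-up sample strings. It uses `^` (start of the string), `$` (end of the string), and `\b` (a word boundary), which are common to Perl-style regular expressions.

```python
import re

# ^ anchors the pattern to the start of the string.
at_start = re.search(r"^apple", "apple pie") is not None
not_at_start = re.search(r"^apple", "green apple") is not None

# $ anchors the pattern to the end of the string.
at_end = re.search(r"apple$", "green apple") is not None

# \b anchors the pattern to a word boundary, so "pineapple" and
# "apples" are skipped and only the standalone word is found.
whole_words = re.findall(r"\bapple\b", "apple pineapple apples")

print(at_start, not_at_start, at_end, whole_words)
```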
To learn how to use anchors in Rx Toolkit, see Using Rx Toolkit.

What are escape characters?

Escape characters help you search for asterisks, question marks, periods, slashes, and so on in a string. Because many non-alphanumeric characters are treated as special characters in regular expressions, place a backslash before such a character to remove its special meaning. For example, ".*" finds any character any number of times, but "\.*" finds runs of literal periods, such as ellipses of various lengths. The backslash lets you search for a plain period with "\.".

What are backreferences?

Backreferences allow you to load the results of a matched pattern into a buffer and then reuse it later in the expression. This allows regular expressions to behave as a search and replace. For example, a substitution that captures "apple" in parentheses and reuses the captured text in its replacement
finds all instances of "apple", loads them into memory, and then replaces them with "pies, apple and cherry". This technique handles strings of data that change slightly from instance to instance, such as page numbering schemes.

What is alternation?

Alternation allows a regular expression to express a logical OR. To search for apple or fruit, you could use the following:

    apple|fruit
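A brief sketch of alternation, assuming Python's `re` module and invented sample text, is shown here. It also shows how parentheses change the scope of the `|` operator: without them, the alternation splits the whole pattern in two.

```python
import re

# Plain alternation: match either whole word.
either = re.findall(r"apple|fruit", "apple pie and fruit salad")

# Without parentheses, | splits the entire pattern: "gra" OR "ey".
unscoped = re.findall(r"gra|ey", "gray grey")

# With parentheses, only the single letter alternates: "gray" OR "grey".
# finditer with group(0) returns the full matched text, not just the group.
scoped = [m.group(0) for m in re.finditer(r"gr(a|e)y", "gray grey groy")]

print(either, unscoped, scoped)
```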
Add parentheses to limit the scope of alternate matches. This is useful when you search for words with two different spellings. For example,

    gr(a|e)y
searches for both gray and grey.

What is a character class?

A character class matches any one character listed inside it and is enclosed in square brackets to separate it from the rest of the regular expression. For example,

    apple[0123456789]
matches "apple" followed by a zero, a one, a two, a three, a four, a five, a six, a seven, an eight, or a nine. You can abbreviate this using a dash. For example,

    apple[0-9]
means the same as the longer regular expression just above. To match "apple" followed by any digit or any uppercase or lowercase letter, we could write

    apple[0-9a-zA-Z]
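The character classes described so far can be checked with a short sketch. It assumes Python's `re` module, and the sample strings are hypothetical. It compares the spelled-out digit class with the dash abbreviation and then tries a class that combines three ranges.

```python
import re

# The explicit class and the dash abbreviation should behave identically.
explicit = re.compile(r"apple[0123456789]")
ranged = re.compile(r"apple[0-9]")

samples = ["apple0", "apple7", "appleX", "apple"]
same_results = all(
    bool(explicit.fullmatch(s)) == bool(ranged.fullmatch(s)) for s in samples
)
digit_hits = [s for s in samples if ranged.fullmatch(s)]

# Three ranges in one class: a digit, a lowercase letter, or an uppercase letter.
alnum = re.compile(r"apple[0-9a-zA-Z]")
candidates = ["apple7", "appleX", "applez", "apple ", "apple_"]
alnum_hits = [s for s in candidates if alnum.fullmatch(s)]

print(same_results, digit_hits, alnum_hits)
```

`fullmatch` is used so that the whole string must fit the pattern, which makes it clear that the class consumes exactly one character.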
If we separated those three ranges with a space, to make it easier to read,

    apple[0-9 a-z A-Z]
this matches "apple" followed by any digit from zero to nine, any uppercase or lowercase letter, or a space, because the space inside the brackets is itself part of the class.

More character classes: \w matches any word character

Credits

Information in the Introduction to Regular Expressions was compiled from Using Regular Expressions, by Stephen Ramsay, at http://etext.lib.virginia.edu/helpsheets/regex.html and from Mastering Regular Expressions by Jeffrey E. F. Friedl, (c) 1997, Sebastopol, O'Reilly & Associates.

Sending Feedback

You can send us feedback, request features, and report bugs. For instructions, see our Sending Feedback page.