Komodo User Guide

Using Rx Toolkit

Komodo includes Rx Toolkit, a tool to help you build, edit, and debug your regular expressions. This section describes how to use Rx Toolkit to work with your regular expressions.

Note - This release only supports Perl regular expressions.

If you are new to regular expressions, see the Regular Expressions Primer.

Starting Rx Toolkit

To start Rx Toolkit, do one of the following:

  • In the Standard toolbar, click Rx.
    or
  • From the Tools menu select Rx Toolkit.

Getting Acquainted with the Rx Toolkit Window

Take a Visual Tour of the Rx Toolkit

To do this                                                   Do this
create and edit your regular expression                      enter your text in the Regular Expression box, which is the top box in the Rx Toolkit
apply one or more metacharacters to your regular expression  enter your metacharacters in the Regular Expression box, which is the top box in the Rx Toolkit
apply one or more modifiers to your regular expression       select one or more modifiers from the list under the Regular Expression box
test your regular expression against a string                enter the test string in the Test String box, which is the second box in the Rx Toolkit
step or run forward through your regular expression          click Step Forward or Run Forward
step or run backward through your regular expression         click Step Back or Run Backward
see the strings that match a group variable                  click Advanced
open this page in the Komodo Help                            click Help
close Rx Toolkit                                             click X, or click Rx Toolkit in the toolbar, or from the Tools menu select Rx Toolkit

Creating Regular Expressions

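Use the features of Rx Toolkit to create and edit your regular expressions. This section describes how to enter a regular expression, use node highlighting and node tips, and add metacharacters and modifiers.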

Entering Your Regular Expression

To enter your regular expression:

  • In the Regular Expression box type your regular expression. 

Your regular expression can include metacharacters, anchors, quantifiers, digits, and alphanumeric characters.

Note - Do not enclose your regular expression in forward slashes "/". Rx Toolkit does not use enclosing slashes.

Viewing Node Highlighting

Rx Toolkit features node highlighting to help you create your regular expressions.

When you hover over a node in your regular expression, Rx Toolkit highlights the node in green.

For example, if the regular expression is (.*)(\d+), when you hover over the \d, Rx Toolkit highlights this in green.

If you hover over the modifier or metacharacter for a node, the symbol is highlighted in green and the part that it modifies is highlighted in grey.

For example, if the regular expression is (.*)(\d+), when you hover over the + part of \d+, Rx Toolkit highlights the + in green and highlights the \d in grey.

If you hover over the parentheses or braces that indicate a group match, Rx Toolkit highlights the parentheses or braces in green and highlights their contents in grey. Rx Toolkit also highlights the group match in the test string.

For example, if the regular expression is (.*)(\d+), when you hover over the ( or the ) part of (.*), Rx Toolkit highlights the ( and ) in green and highlights the .* in grey. Rx Toolkit also highlights "testing12" in the test string "testing123".
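
The group boundaries in this example can be cross-checked outside Rx Toolkit. The following is a minimal sketch in Python, which is an assumption made for illustration (Rx Toolkit itself works with Perl regular expressions); Python's re module handles this particular pattern the same way. The variable names are illustrative only.

```python
import re

# Same pattern and test string as the hover example above.
pattern = re.compile(r"(.*)(\d+)")
match = pattern.search("testing123")

# The greedy .* takes as much as it can and leaves \d+ a single digit.
group1, group2 = match.groups()
print(group1)  # testing12
print(group2)  # 3
```

This is why hovering over the first group highlights "testing12" rather than "testing": the greedy .* gives back only the one digit that \d+ needs.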

Viewing Node Tips

Rx Toolkit features node tips to help you design an effective regular expression. As you hover over your regular expression, the node tips appear in the Rx Toolkit status bar below the Regular Expression box.

For example, if the regular expression is (.*)(\d+) and you hover over each character in order from left to right:

When you hover over    The Rx Toolkit status bar says
(                      Capture group $1
.                      Match any one character
*                      Match <.> 0 or more times
)                      Capture group $1
(                      Capture group $2
\d                     Match any numeric character
+                      Match <\d> 1 or more times
)                      Capture group $2

Adding Valid Metacharacters to Your Regular Expression

The Shortcuts menu provides a list of all the metacharacters and metasymbols that are valid at the current cursor position in your regular expression. This menu lists the metacharacter and a brief description of the metacharacter. When you move the cursor position, this list changes to reflect only the valid metacharacters for that current cursor position.

To add a valid metacharacter to your regular expression:

  1. Click Shortcuts to the right of the top box.
  2. From the list of valid metacharacters for that point in your regular expression, select the desired metacharacter for your pattern.
  3. Repeat as needed.

or

  1. Click inside your regular expression.
  2. Type a backslash "\" and your selected metacharacter.

Adding Modifiers to Your Regular Expression

To add a modifier to your regular expression:

  • From the Modifiers list below the Regular Expression box, select the modifiers you want. You have a choice of:
    • Global - Match all occurrences of the specified regular expression. Use this when you want Rx Toolkit to cycle through several repetitions in your test string.
    • Multi-line Mode - Let caret "^" and dollar "$" match next to newline characters. Use this when your test string is more than one line long and you want to anchor a match at the start or end of each line.
    • Ignore Case - Ignore alphabetic case distinctions while matching. Use this when you don't want to specify the case in the pattern you're trying to match.
    • Single-line Mode - Let dot "." match newline characters. Use this when your test string is more than one line long and you want a single match to span the newline characters.
    • Extended - Ignore whitespace and comments. Use this when you want to pretty print your regular expression or when you want to annotate your regular expression with comments.

Note - Ensure you use the Modifiers checkboxes to add modifiers to your regular expression. Rx Toolkit does not understand modifiers entered in the Regular Expression box.
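
For readers who want to reproduce a modifier selection outside Rx Toolkit, the sketch below maps each modifier name to its closest counterpart in Python's re module. Python is an assumption made for illustration, since Rx Toolkit works with Perl regular expressions; the Perl flag letters appear in the comments. The names MODIFIER_FLAGS and apply_modifiers are hypothetical helpers, not part of Komodo.

```python
import re

# Hypothetical lookup table: Rx Toolkit modifier name -> Python re flag.
MODIFIER_FLAGS = {
    "Multi-line Mode": re.MULTILINE,   # Perl /m
    "Ignore Case": re.IGNORECASE,      # Perl /i
    "Single-line Mode": re.DOTALL,     # Perl /s
    "Extended": re.VERBOSE,            # Perl /x
}


def apply_modifiers(pattern, text, modifiers):
    """Return the matched substrings for a pattern under the given modifiers.

    "Global" (Perl /g) has no flag in Python; it is modelled here by
    collecting every match instead of only the first one.
    """
    flags = 0
    for name in modifiers:
        if name != "Global":
            flags |= MODIFIER_FLAGS[name]
    compiled = re.compile(pattern, flags)
    if "Global" in modifiers:
        return [m.group(0) for m in compiled.finditer(text)]
    first = compiled.search(text)
    return [first.group(0)] if first else []


print(apply_modifiers(r"\d+", "a1 b22", ["Global"]))  # ['1', '22']
print(apply_modifiers(r"\d+", "a1 b22", []))          # ['1']
```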

Viewing Sample Regular Expressions

This section shows some sample regular expressions with different modifiers applied, both singly and in combination.

Using Global

The Global modifier matches all occurrences of the specified regular expression. Use this when you want Rx Toolkit to cycle through several repetitions in your test string.

To match the following test string

testing123
foobar75

you could use the following regular expression with Global selected

(.*?)(\d+)

Click Advanced to view your Group Match Variables output, which would look like this

$1 (match1)        testing
$2 (match1)        123
$1 (match2)        foobar
$2 (match2)        75

Discussion

This regular expression matches the entire test string.

The .* matches any character zero or more times, and the ? tells the * not to be greedy, so (.*?) matches the words "testing" and "foobar". The (\d+) matches any digit one or more times, which matches the numbers "123" and "75". The Global modifier means that Rx Toolkit matches more than the first occurrence of the pattern. That's why Rx Toolkit matched $1 and $2 twice.
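
The effect of Global can be reproduced with a short script. This is a sketch in Python rather than Perl (an assumption for illustration; the pattern behaves the same in both for this input), where iterating over all matches plays the role of the Global modifier and a single search plays the role of leaving it unselected.

```python
import re

test_string = "testing123\nfoobar75"
pattern = r"(.*?)(\d+)"

# Global selected: collect the groups of every match.
all_matches = [m.groups() for m in re.finditer(pattern, test_string)]

# Global not selected: only the first match is reported.
first_only = re.search(pattern, test_string).groups()

print(all_matches)  # [('testing', '123'), ('foobar', '75')]
print(first_only)   # ('testing', '123')
```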

Using Multi-line Mode

The Multi-line Mode modifier allows ^ and $ to match next to newline characters. Use this when your test string is more than one line long and you want ^ or $ to match at the start or end of each line.

To match the following test string

"okay?"

you could use the following regular expression with Multi-line Mode selected

^(\"okay\?\")

Click Advanced to view your Group Match Variables output, which would look like this

$1        "okay?"

Discussion

This regular expression matches the entire test string.

The ^ matches the beginning of any line. The first \" matches the opening double quote (") in the test string. The okay matches the literal word "okay". The \? matches the question mark "?". The final \" matches the closing double quote. There is only one capture group in this regular expression, and it contains the entire test string.
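
Because the test string above has only one line, the pattern would match with or without Multi-line Mode. The sketch below makes the difference visible by placing the quoted text on the second line of a two-line string. Both the two-line string and the use of Python (instead of Perl, which Rx Toolkit uses) are assumptions added for illustration.

```python
import re

pattern = r'^(\"okay\?\")'
text = 'Subject: reply\n"okay?"'  # hypothetical two-line test string

# Without Multi-line Mode, ^ matches only at the start of the whole string.
without_m = re.search(pattern, text)

# With Multi-line Mode, ^ also matches just after the newline.
with_m = re.search(pattern, text, re.MULTILINE)

print(without_m)        # None
print(with_m.group(1))  # "okay?"
```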

Using Ignore Case

The Ignore Case modifier ignores alphabetic case distinctions while matching. Use this when you don't want to specify the case in the pattern you're trying to match.

To match the following test string

Testing123

you could use the following regular expression with Ignore Case selected

^([a-z]+)(\d+)

Click Advanced to view your Group Match Variables output, which would look like this

$1 (match1)        Testing
$2 (match1)        123

Discussion

This regular expression matches the entire test string.

The ^ matches the beginning of a string. The [a-z] matches any lowercase letter from "a" to "z", and the + repeats that match one or more times. The Ignore Case modifier lets [a-z] match uppercase letters as well. Therefore ^([a-z]+) matches "Testing". The (\d+) matches any digit one or more times, so it matches "123".
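
A quick way to see what Ignore Case contributes is to run the pattern both ways. The following sketch uses Python as a stand-in for Perl (an assumption for illustration; the character class and flag behave the same for this input).

```python
import re

pattern = r"^([a-z]+)(\d+)"
test_string = "Testing123"

# Case-sensitive: the leading "T" is not in [a-z], so nothing matches.
case_sensitive = re.search(pattern, test_string)

# Ignore Case selected: [a-z] also accepts uppercase letters.
ignore_case = re.search(pattern, test_string, re.IGNORECASE)

print(case_sensitive)        # None
print(ignore_case.groups())  # ('Testing', '123')
```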

Using Single-line Mode

The Single-line Mode modifier allows . to match newline characters. Use this when your test string is more than one line long and you want . to match the newline characters in it.

To match the following test string

Subject: Why did this
work?

you could use the following regular expression with Single-line Mode selected

(:[\t ]+)(.*)work\?

Click Advanced to view your Group Match Variables output, which would look like this

$1        :<space>
$2        Why did this <newline>

Discussion

This regular expression matches everything in the test string following the word Subject, including the colon and the question mark.

The (:[\t ]+) matches the colon followed by a Tab character or a space one or more times, so it matches the colon and the space after it. The (.*) matches any character zero or more times, and the Single-line Mode modifier allows the period to match the newline character. Therefore (.*) matches "Why did this <newline>". The work matches the literal "work" on the second line. The \? matches the terminal question mark "?".
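
The same test can be sketched in a few lines of code. Python is used here as an assumption for illustration (Rx Toolkit uses Perl regular expressions); Python's re.DOTALL flag corresponds to Single-line Mode.

```python
import re

text = "Subject: Why did this\nwork?"
pattern = r"(:[\t ]+)(.*)work\?"

# Without Single-line Mode, . stops at the newline and "work" is never reached.
without_s = re.search(pattern, text)

# With Single-line Mode, .* can cross the newline.
with_s = re.search(pattern, text, re.DOTALL)

print(without_s)        # None
print(with_s.groups())  # (': ', 'Why did this\n')
```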

Using Extended

The Extended modifier ignores whitespace and comments in the regular expression. Use this when you want to pretty print your regular expression or when you want to annotate your regular expression with comments.

To match the following test string

testing123

you could use the following regular expression with Extended selected

(.*?)  (\d+)  # this matches testing123

Click Advanced to view your Group Match Variables output, which would look like this

$1        testing   
$2        123

Discussion

This regular expression matches the entire test string.

The .* matches any character zero or more times, the ? makes the * not greedy, and the Extended modifier ignores the spaces after the (.*?). Therefore, (.*?) matches "testing" and populates the variable $1. The (\d+) matches any digit one or more times, so this matches "123" and populates the variable $2. The Extended modifier also ignores the spaces after (\d+) and the comment at the end of the regular expression.
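
To see that the spaces and the comment really are ignored, the annotated pattern can be run with and without the modifier. This is a sketch in Python, an assumption for illustration; its re.VERBOSE flag corresponds to the Extended modifier.

```python
import re

pattern = r"(.*?)  (\d+)  # this matches testing123"
test_string = "testing123"

# Extended selected: whitespace and the trailing comment are ignored.
extended = re.search(pattern, test_string, re.VERBOSE)

# Extended not selected: the spaces and the comment text are literal
# characters, and the test string does not contain them.
literal = re.search(pattern, test_string)

print(extended.groups())  # ('testing', '123')
print(literal)            # None
```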

Using Multi-line Mode and Single-line Mode

To match more of the following test string

Subject: Why did this
work?

you would need both Multi-line Mode and Single-line Mode selected for this regular expression

([\t ]+)(.*)^work\?

Click Advanced to view your Group Match Variables output, which would look like this

$1        <space>
$2        Why did this <newline>

Discussion

This regular expression matches everything in the test string following the colon, from the space through the question mark.

The ([\t ]+) matches a Tab character or a space one or more times, which matches the space after the colon. The (.*) matches any character zero or more times, which matches "Why did this <newline>". The ^work matches the literal "work" at the beginning of the second line. The \? matches the terminal question mark "?".

If you used only the Single-line Mode modifier, this match would fail because the caret "^" would only match the beginning of a string.

If you used only the Multi-line Mode modifier, this match would fail because the period "." would not match the newline character.
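
The three cases described above (both modifiers, Single-line Mode only, Multi-line Mode only) can be checked side by side. The sketch below uses Python as a stand-in for Perl, an assumption for illustration; re.MULTILINE and re.DOTALL correspond to Multi-line Mode and Single-line Mode.

```python
import re

text = "Subject: Why did this\nwork?"
pattern = r"([\t ]+)(.*)^work\?"

# Both modifiers: . crosses the newline and ^ matches after it.
both = re.search(pattern, text, re.MULTILINE | re.DOTALL)

# Single-line Mode only: ^ can match only at the very start of the string.
only_single_line = re.search(pattern, text, re.DOTALL)

# Multi-line Mode only: . cannot consume the newline before "work".
only_multi_line = re.search(pattern, text, re.MULTILINE)

print(both.groups())      # (' ', 'Why did this\n')
print(only_single_line)   # None
print(only_multi_line)    # None
```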

Using Multi-line Mode, Ignore Case and Global

If you want to match the contents of a standard email header, you would compose your regular expression and apply three modifiers: multi-line, ignore case, and global.

To match each line of the following test string

From: joe@domain.com
To: sue@domain2.com
Cc: bob@domain3.com
Subject: Use modifiers in regular expressions

you could use this regular expression with Multi-line Mode, Ignore Case, and Global selected

^([a-z]+):\s+(.+?)$

Click Advanced to view your Group Match Variables output, which would look like this

$1 (match1)        From
$2 (match1)        joe@domain.com
$1 (match2)        To
$2 (match2)        sue@domain2.com
$1 (match3)        Cc
$2 (match3)        bob@domain3.com
$1 (match4)        Subject
$2 (match4)        Use modifiers in regular expressions

Discussion

This regular expression matches each part of the email header.

The ^ matches the beginning of any line. The ([a-z]+) matches any lowercase letter one or more times. The : matches a colon and the \s+ matches one or more whitespace characters. The (.+?) matches any character one or more times, non-greedily, so it stops at the first point where $ can match. The $ matches the end of a line.

Multi-line Mode allows caret "^" to match the beginning of any line. Without the multi-line mode modifier, caret "^" matches the beginning of a string.

Ignore Case allows Rx Toolkit to match any lower or uppercase letter. Without the ignore case modifier, there would be no group match variables.

Global allows Rx Toolkit to keep matching after the first match is found. Without global, Rx Toolkit only matches the first line, "From: joe@domain.com".
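
The contribution of each modifier can be verified by dropping one at a time. This is a minimal sketch in Python (an assumption for illustration, since Rx Toolkit uses Perl regular expressions), where findall stands in for Global and a single search stands in for leaving Global unselected.

```python
import re

header = (
    "From: joe@domain.com\n"
    "To: sue@domain2.com\n"
    "Cc: bob@domain3.com\n"
    "Subject: Use modifiers in regular expressions"
)
pattern = r"^([a-z]+):\s+(.+?)$"

# Multi-line Mode, Ignore Case, and Global together: one pair per header line.
fields = re.findall(pattern, header, re.MULTILINE | re.IGNORECASE)

# Without Ignore Case, every field name starts with an uppercase letter,
# so [a-z]+ cannot match at the start of any line.
no_ignore_case = re.findall(pattern, header, re.MULTILINE)

# Without Global, only the first line is reported.
first_only = re.search(pattern, header, re.MULTILINE | re.IGNORECASE).groups()

for name, value in fields:
    print(name, "->", value)
```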

Evaluating Regular Expressions

A debugged regular expression correctly matches the patterns you intend and provides information about which variable contains which pattern. 

If there's a match...

If your regular expression matches the test string:

  • the Rx Toolkit status bar shows "Match succeeded"
  • Komodo highlights the green light in the Rx Toolkit status bar
  • Komodo highlights your entire test string in green

If there's no match...

If your regular expression does not match the test string:

  • the Rx Toolkit status bar shows "No matches found", then shows details of the error
  • Komodo highlights the red light in the Rx Toolkit status bar
  • if there are syntax errors in your regular expression, Komodo underlines the errors with red squiggles
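
The three outcomes described above (a match, no match, and a syntax error) can be told apart in a script as well. The sketch below is written in Python as an assumption for illustration; the function name evaluate and the wording of the syntax error message are hypothetical and are not taken from Rx Toolkit.

```python
import re


def evaluate(pattern, test_string):
    """Classify a pattern against a test string, loosely mirroring the status bar."""
    try:
        compiled = re.compile(pattern)
    except re.error as err:
        # Python reports an invalid pattern with re.error.
        return f"Syntax error: {err}"
    if compiled.search(test_string):
        return "Match succeeded"
    return "No matches found"


print(evaluate(r"(\d+)", "abc123"))  # Match succeeded
print(evaluate(r"(\d+)", "abc"))     # No matches found
print(evaluate(r"(\d+", "abc"))      # Syntax error: ... (unbalanced parenthesis)
```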

Evaluating Your Regular Expression on a Node by Node Basis
(Moving Forward and Backward through Your Regular Expression)

Perl compiles small bits of a regular expression into nodes. Perl then connects nodes together into a graph which Perl's regular expression engine interprets to perform the match.

For example, the regular expression "abc*" contains three nodes:

ab - an exact node, which matches the exact string "ab"
* - a star quantifier node, which matches zero or more occurrences of its child node, which happens to be an exact string node which matches "c"
c - an exact node, which matches the exact string "c"

You can use the buttons below the Test String box to move through your regular expression and test your match on a node by node basis instead of all at once. As you move forward and backward through your regular expression, Komodo highlights the regular expression and the part it matches in the Test String box. Komodo also populates the Group Match Variables pane with the results of variables such as $1, $2.

To step forward or backward through one node of your regular expression:

  • Click Step Forward or Step Back.

To run forward to the end or backward to the beginning of your regular expression:

  • Click Run Forward or Run Backward.

Viewing the Group Match Variables Output

If your regular expression collects several words or numbers and stores them in variables such as $1, $2, you can use the Group Match Variables output to view what part of the test string each variable matches.

To view the variables output:

  • In Rx Toolkit, click Advanced.
    The window expands vertically and the Group Match Variables pane appears.

As you debug, your output appears in the Group Match Variables pane. The left column lists variables by name and the right column lists the values of the variables. If you click a variable name or variable value, Rx Toolkit highlights the corresponding text in the test string and the corresponding part of your regular expression. Review your output carefully. When you try to match a pattern, you may match a longer or shorter string than you expect.
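
A comparable two-column listing can be produced in a script, together with the position of each value in the test string (the part that Rx Toolkit would highlight). This is a sketch in Python, an assumption for illustration; the function name group_table and the row layout are hypothetical.

```python
import re


def group_table(pattern, text, flags=0):
    """Return (variable name, value, (start, end)) rows for every match."""
    rows = []
    for match_no, m in enumerate(re.finditer(pattern, text, flags), start=1):
        for group_no in range(1, m.re.groups + 1):
            name = f"${group_no} (match{match_no})"
            rows.append((name, m.group(group_no), m.span(group_no)))
    return rows


for name, value, span in group_table(r"(.*?)(\d+)", "testing123\nfoobar75"):
    print(f"{name:<16}{value!r:<12}{span}")
```

Printing the spans makes it easier to notice when a group captured a longer or shorter stretch of the test string than intended.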

Closing Rx Toolkit

To close Rx Toolkit, click the X in the top right corner.