This tutorial assumes...
- ...that ActivePerl build 623 or greater is installed on your system. ActivePerl is a free
distribution of the core Perl language. See Komodo's Installation Guide for
configuration instructions.
- ...that you have a connection to the Internet.
- ...that you are interested in Perl. You don't need to have previous
knowledge of Perl; the tutorial will walk you through a simple program and
suggest some resources for further information.
You have exported a number of email messages to a text file. You want to
extract the name of the sender and the contents of each email, and convert them to
XML format. You intend to eventually transform the XML to HTML using XSLT. To create an XML file from a text
source file, you will use a Perl program that parses the data and places it
within XML tags. In this tutorial you will:
- Install a Perl module for parsing text files containing comma-separated values.
- Open the Perl Tutorial Project and associated files.
- Analyze parse.pl, the Perl program included in the Tutorial Project.
- Generate output by running the program.
- Debug the program using the Komodo debugger.
One of the great strengths of Perl is the wealth of free modules available
for extending the core Perl distribution. ActivePerl includes the Perl Package
Manager (PPM), which makes it easy to browse, download and update Perl modules from
module repositories on the Internet. These modules are added to the core
ActivePerl installation.
Komodo Professional Edition includes the Visual Package Manager (VPM), a graphical
interface for PPM.
The Text::CSV_XS Perl module is necessary for this tutorial. If you are running
Komodo Professional on Windows or Linux, you can install this module using the
Visual Package Manager (VPM). Users of Mac OS X and/or Komodo Personal must install
it using the Perl Package Manager (PPM).
To install the module using VPM:
- Select Tools|Visual Package Manager, or click the VPM button on the Toolbar. The VPM
Install tab opens in a browser.
- In the Search field, enter:
Text::CSV_XS
- Click Search. Modules that match the search
criteria are displayed in a list in the lower part of the screen.
Note: If Text::CSV_XS is already installed, VPM will
not show the module in the search result. Visit the
Remove tab to check if it is installed.
- Select the check box next to Text-CSV_XS and click Install.
VPM connects to the default repository, downloads the necessary files and
installs them.
The Text::CSV_XS Perl module is necessary for this tutorial. To install it
using PPM:
- Open the Run Command dialog box. Select Tools|Run Command.
- In the Run field, enter the command:
ppm install Text::CSV_XS
- Click the Run button to run the command. PPM connects
to the default repository, downloads the necessary files and installs them.
- PPM can be run directly from the command line with the ppm command. Enter
ppm help for more information on command-line options.
- By default, PPM and VPM access the Perl Package repository at http://ppm.activestate.com. The
ActiveState repository contains binary versions of most packages available from
CPAN, the Comprehensive Perl Archive Network.
- More information about PPM is available on
ASPN. PPM documentation is also included with your ActivePerl
distribution.
- On Linux and Solaris systems where ActivePerl has been installed by the
super-user (i.e. root), most users will not have permissions to
install packages with VPM or PPM. Run ppm as root at the command
line to install packages.
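Once VPM or PPM reports that the installation has finished, one way to confirm that Perl can find the module is to try loading it. The following is a minimal sketch and is not part of the tutorial project; the variable name $installed is only illustrative.

```perl
#!/usr/bin/perl -w
use strict;

# Try to load the module at run time. The eval traps the error
# that "require" raises when the module cannot be found.
my $installed = eval { require Text::CSV_XS; 1 } ? 1 : 0;

if ($installed) {
    # VERSION is a method that every Perl module inherits.
    print "Text::CSV_XS ", Text::CSV_XS->VERSION, " is installed\n";
}
else {
    print "Text::CSV_XS is not installed\n";
}
```

If the second message is printed, repeat the installation steps above before continuing with the tutorial.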
Perl Pointer: It is also possible to
install Perl modules without VPM or PPM using the CPAN shell. See the CPAN
FAQ for more information.
Select File|Open|Project and choose
perl_tutorial.kpf from the perl_tutorials
subdirectory. The location differs depending on your operating system.
Windows
<komodo-install-directory>\lib\support\samples\perl_tutorials
Linux and Solaris
<komodo-install-directory>/lib/support/samples/perl_tutorials
Mac OS X
<User-home-directory>/Library/Application Support/Komodo/3.x/samples/perl_tutorials
The files included in the tutorial
project are displayed on the Projects tab in the Left Pane. No
files open automatically in the Editor Pane.
On the Projects tab,
double-click the files parse.pl, mailexport.xml
and mailexport.txt. These files will open in the Editor Pane; a tab at the top of
the pane displays their names.
- mailexport.txt: This file was generated by exporting the contents of an email folder
(using the email program's own Export function) to a comma-separated text file. Notice that the keys to the file
contents are listed on the first line. The Perl program will use this line as a reference when parsing the email
messages.
- parse.pl: This is the Perl program that will parse mailexport.txt and generate mailexport.xml.
- mailexport.xml: This file was generated by parse.pl, using mailexport.txt as input.
When you run parse.pl (in Generating Output),
this file will be regenerated.
In this step, you will examine the Perl program on a line-by-line basis. Ensure that Line Numbers
are enabled in Komodo (View|View Line Numbers). Ensure that the file "parse.pl"
is displayed in the Komodo Editor Pane.
Line 1 - Shebang Line
- Komodo analyzes this line for hints about what language the file contains
- warning messages are enabled with the "-w" switch
Komodo Tip: Notice that syntax elements are
displayed in different colors. You can adjust the display options for language elements
in the Preferences dialog box.
Lines 2 to 4 - External Modules
- these lines load external Perl modules used by the program
- Perl module files have a ".pm" extension; "use strict" uses the "strict.pm" module, part of the
core Perl distribution
- "use Text::CSV_XS" refers to the module installed in Step One
Lines 6 to 7 - Open Files
- input and output files are opened; if the output file does not exist, it is created
- scalar variables, indicated by the "$" symbol, store the handles to the open files
- "strict" mode (enabled by loading "strict.pm" in line 2) requires that variables be declared
using the format "my $variable"
Perl Pointer: Scalar variables store "single" items; their symbol ("$") is
shaped like an "s", for "scalar".
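The pattern described above (open a file, keep the handle in a scalar declared with "my", and stop if the open fails) can be sketched as follows. This is an illustration only: it uses Perl's built-in open function and a scratch file with a hypothetical name, and parse.pl may open its files in a different way.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch does not touch the tutorial files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.txt";   # hypothetical file name

# Opening for writing creates the file if it does not exist.
open(my $out, '>', $path) or die "Can't write $path: $!";
print $out "hello\n";
close($out) or die "Can't close $path: $!";

# Opening for reading fails if the file is missing, so check the result.
open(my $in, '<', $path) or die "Can't read $path: $!";
my $line = <$in>;
close($in);

print "Read back: $line";
```

Because of "use strict", removing "my" from any of the declarations makes Perl refuse to compile the program.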
Lines 9 to 13 - Print the Header to the Output File
- "<<" is a "here document" indicator that defines the string to be printed
- the text "EOT" is arbitrary and user-defined, and defines the beginning and end of the string
- the second EOT on line 13 indicates the end of output
- lines 10 and 11 are data that will be printed to the output file
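To experiment with here documents outside the tutorial project, a short sketch such as the following may help. The root element name stored in $root is hypothetical and is not taken from parse.pl.

```perl
#!/usr/bin/perl -w
use strict;

my $root = "EMAIL";   # hypothetical root element name

# Everything between the "<<EOT;" line and the closing EOT marker is one string.
# With an unquoted marker, variables inside the text are interpolated.
my $header = <<EOT;
<?xml version="1.0"?>
<$root>
EOT

print $header;
```

The closing marker must appear on a line by itself, and any other word could be used in place of EOT as long as both markers match.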
Lines 15 to 16 - Assign Method Call to Scalar Variable
- the result of the method call "new" is assigned to the scalar variable $csv
- the method "new" is contained in the module Text::CSV_XS
- "({binary => 1})" tells the method to treat the data as binary
Perl Pointer: Good Perl code is
liberally annotated with comments (indicated by the "#" symbol).
Lines 18 to 19 - Method "getline"
- the method "getline" is contained in the module Text::CSV_XS, referenced in the $csv scalar variable
- "getline" reads the first line of mailexport.txt (referenced in the $in variable), parses the line into fields,
and returns a reference to the resulting array to the $fields variable
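The two steps above (create the parser object, then read the line of field names) can be tried on a small piece of sample data. This is a sketch under stated assumptions: the column names and values are hypothetical, an in-memory file handle stands in for mailexport.txt, and the module is loaded at run time so that the sketch still runs if Text::CSV_XS has not been installed.

```perl
#!/usr/bin/perl -w
use strict;
use IO::Handle;   # gives file handles their method interface on older Perls

# Hypothetical sample data; the real field names are on the
# first line of mailexport.txt.
my $data = qq{From,Subject\n"Doe, Jane",Hello\n};

my $fields;
if (eval { require Text::CSV_XS; 1 }) {
    # An in-memory file handle stands in for the real input file.
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!";

    my $csv = Text::CSV_XS->new({ binary => 1 });

    # getline reads one line and returns a reference to an array of fields.
    $fields = $csv->getline($in);
    print "Field names: @$fields\n";
    close($in);
}
else {
    print "Text::CSV_XS is not installed\n";
}
```

The "@$fields" notation follows the reference in $fields to reach the array it points to.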
Line 21 - "while" Loop
- the "while" statement is conditional
- the condition is "1", so the program endlessly repeats the loop because the condition is always met
- the logic for breaking out of the loop is on line 25
- the loop is enclosed in braces; the opening brace is on line 21, the closing brace on line 51
Komodo Tip: Click on the
minus symbol to the left of line 21. The entire section of nested code
will be collapsed. This is Code Folding.
Komodo Tip: Click the mouse pointer on line 21. Notice that
the opening brace changes to a bold red font. The closing brace on line 51 is displayed the same way.
Lines 22 to 25 - Extracting a Line of Input Data
- the "getline" function extracts one line of data from the input file and places it in the $record scalar
variable
- if "getline" returns an empty array, the input file has been fully processed and the program exits the loop
and proceeds to line 52
Perl Pointer: Array variables store lists of items indexed by
number; their symbol ("@") is shaped like an "a", for "array".
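The combination of an endless "while" loop and an exit test inside the loop body can be sketched with core Perl alone. Note the assumptions: this sketch reads plain lines with Perl's built-in line-reading operator instead of the "getline" method, tests for an undefined value rather than an empty array, and uses hypothetical in-memory data instead of mailexport.txt.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical input held in memory instead of mailexport.txt.
my $data = "first\nsecond\nthird\n";
open(my $in, '<', \$data) or die "Can't open in-memory file: $!";

my @seen;
while (1) {                       # the condition is always true
    my $line = <$in>;             # read the next line of input
    last unless defined $line;    # leave the loop at the end of the input
    chomp $line;                  # remove the trailing newline
    push @seen, $line;            # add the line to the array
}
close($in);

print scalar(@seen), " lines read: @seen\n";
```

The "last" statement jumps to the first statement after the closing brace of the loop, which is how the program reaches the code that follows the loop.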
Lines 27 to 31 - "foreach"
- "foreach" cycles through the elements stored in the @$record array
- the regular expressions on lines 29 and 30 find the characters "<" and "&", and replace them with
their character entity values, "&lt;" and "&amp;" ("<" and "&" are reserved characters in XML)
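A small sketch of this kind of substitution is shown below, using hypothetical sample values. The order of the two substitutions is a choice made for this sketch and may differ from parse.pl.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical field values containing reserved XML characters.
my @record = ('Tom & Jerry <tj@example.com>', 'a < b');

foreach (@record) {
    # Inside foreach, $_ is an alias, so s/// changes the array element itself.
    s/&/&amp;/g;   # escape "&" first so the "&" in "&lt;" is not escaped again
    s/</&lt;/g;
}

print "$_\n" foreach @record;
```

The "g" modifier makes each substitution apply to every match in the string instead of only the first.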
Lines 33 to 35 - hash slice
- line 35 uses a hash slice to pair each value in the @$record array with the corresponding field name
from the field reference generated in line 19
Perl Pointer: Hash variables are indicated by the symbol "%",
and store lists of items indexed by string.
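A hash slice can be tried in isolation with a few lines of core Perl. The field names and values below are hypothetical; in the tutorial they come from mailexport.txt.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical field names and values, held as array references.
my $fields = ['From', 'Subject', 'Body'];
my $record = ['jane@example.com', 'Hello', 'See you soon'];

# %record (a hash) and $record (a scalar) are separate variables
# that happen to share a name.
my %record;

# Hash slice: assign all values at once, each keyed by the matching field name.
@record{@$fields} = @$record;

print "$_ = $record{$_}\n" foreach sort keys %record;
```

After the slice assignment, each value can be looked up by name, for example $record{Subject}, instead of by its position in the array.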
Lines 37 to 50 - Writing Data to the Output File
- one line at a time, lines from the input file are processed and written to the output file
- portions of the data line (stored in the $record scalar variable) are extracted based on the
corresponding text in the field reference (the first line in the input file, stored in the $fields variable)
Line 51 - Closing the Processing Loop
- at line 51, processing will loop back to the opening brace on line 21
- the logic to exit the loop is on line 25
Lines 52 to 54 - Ending the Program
- line 52 prints the closing tag to the XML file
- line 53 closes the output file or, if it cannot, fails with the error
"Can't write mailexport.xml"
- line 54 closes the input file (it is not necessary to check the status
when closing the input file because this only fails if the program contains
a logic error)
To start, you will simply generate the output by running the program through the debugger without setting any
breakpoints.
- Clear the contents of mailexport.xml: Click on the "mailexport.xml" tab in the Editor Pane.
Delete the contents of the file - you will regenerate it in the next step. Save the file.
- Run the Debugger: Click on the "parse.pl" tab in the editor. From the menu, select
Debug|Go/Continue. In the Debugging Options dialog box,
click OK to accept the defaults.
- View the contents of mailexport.xml: Click on the "mailexport.xml" tab in the editor.
Komodo informs you that the file has changed. Click OK to reload the file.
In this step you'll add breakpoints to the program and "debug" it. Adding
breakpoints lets you run the program in chunks, making it possible to watch
variables and view output as it is generated. Before you begin, ensure that line
numbering is enabled in Komodo (View|View Line Numbers).
- Set a breakpoint: On the "parse.pl" tab, click in the grey
margin immediately to the left of the code on line 9 of the program. This will
set a breakpoint, indicated by a red circle.
- Run the Debugger: Select Debug|Go/Continue.
In the Debugging Options dialog box, click OK to accept the defaults. The debugger will process
the program until it encounters the first breakpoint.
Komodo Tip: Debugger commands can be accessed
from the Debug menu, by shortcut keys, or from the Debug Toolbar. For a summary of debugger
commands, see Debugger Command List.
- Watch the debug process: A yellow arrow on the
breakpoint indicates the position at which the debugger has halted.
Click on the "mailexport.xml" tab. Komodo informs you that the file has changed.
Click OK to reload the file.
- View variables: In the Bottom Pane,
see the Debug tab. The variables "$in" and "$out" appear in the
Locals tab.
- Line 9 - Step In: Select Debug|Step In.
"Step In" is a debugger command that causes the debugger to execute the
current line and then stop at the next processing line (notice that the lines
between 9 and 16 are raw output indicated by "here" document markers).
- Line 16 - Step In: On line 16, the processing transfers to
the module Text::CSV_XS. Komodo opens the file CSV_XS.pm and stops the debugger
at the active line in the module.
- Line 61 - Step Out: Select Debug|Step Out. The Step
Out command will make the debugger execute the function in Text::CSV_XS and
pause at the next line of processing, which is back in parse.pl on line 19.
- Line 19 - Step Over: Select Debug|Step Over. The
debugger will process the function in line 19 without opening the module
containing the "getline" function.
Komodo Tip: What do the debugger commands do?
- Step In executes the current line of code and pauses at the following line.
- Step Over executes the current line of code. If the line of code calls a
function or method, the function or method is executed in the background and the debugger
pauses at the line that follows the original line.
- Step Out executes the rest of the current function or method without stepping
through the code line by line. The debugger stops on the line of
code following the function or method call in the calling program.
- Line 21 - Set Another Breakpoint: After the debugger stops
at line 21, click in the grey margin immediately to the left of the code on line
22 to set another breakpoint.
- Line 21 - Step Out: Select Debug|Step Out. It appears that nothing happened. However, the
debugger actually completed one iteration of the "while" loop (from lines 21 to 51).
To see how this works, set another breakpoint at line 37, and Step Out again.
The debugger will stop at line 37. On the Debug Session tab, look at the data
assigned to the $record variable. Then Step Out, and notice that $record is no
longer displayed, and the debugger is back on line 21. Step Out again, and
look at the $record variable - it now contains data from the next record
in the input file.
- Line 37 - Stop the Debugger: Select Debug|Stop to stop
the Komodo debugger.
Perl Pointer: Did you notice that output wasn't written to
mailexport.xml after every iteration of the while loop? This is because Perl maintains an internal buffer
for writing to files. You can set the buffer to "autoflush" using the special Perl variable "$|".
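One way to turn on autoflush for a single output file is sketched below. It is an illustration only and is not part of parse.pl; the scratch file name is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch does not touch the tutorial files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/buffered.txt";   # hypothetical file name

open(my $out, '>', $path) or die "Can't write $path: $!";

# $| applies to the currently selected handle, so select $out,
# turn autoflush on, then restore the previous selection.
my $previous = select($out);
$| = 1;
select($previous);

print $out "written immediately\n";

# The file is still open, but the data has already reached the file.
my $size = -s $path;
print "Size while still open: $size bytes\n";

close($out) or die "Can't close $path: $!";
```

With a change like this in a program that writes inside a loop, each print reaches the file right away, so reloading the output file while the debugger is paused shows the records written so far. Writing without the buffer is usually slower, so autoflush is best reserved for debugging or for small files.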
ASPN, the ActiveState Programmer Network
ASPN, the ActiveState Programmer Network,
provides extensive resources for Perl programmers:
- Free downloads of ActivePerl, ActiveState's Perl distribution
- Searchable Perl documentation
- Trial versions of Perl tools, like the Perl Dev Kit and Visual Perl
- The Rx Cookbook, a collaborative library of regular expressions for Perl
Documentation
There is a wealth of documentation available for Perl. The first source for
language documentation is the Perl distribution installed on your system. To
access the documentation contained in the Perl distribution, use the following
commands:
- Open the Run Command dialog box (Tools|Run Command), and then type
perldoc perldoc. A description of
the "perldoc" command will be displayed on your screen. Perldoc is used to
navigate the documentation contained in your Perl distribution.
Tutorials and Reference Sites
There are many Perl tutorials and beginner Perl sites on the Internet, such as: