
Chapter 16

Command-Line Options



Today's lesson describes the options you can specify to control how your Perl program operates. These options provide many features, including those that perform the following tasks:

  • Check syntax
  • Print warnings
  • Use preprocessor commands
  • Edit files
  • Change the "end of input line" marker

Today's lesson begins with a description of how to supply options to your Perl program.

Specifying Options

There are two ways to supply options to a Perl program:

  • On the command line, when you enter the command that starts your Perl program
  • On the first line of your Perl program

The following sections describe these methods of supplying options.

Specifying Options on the Command Line

One way to specify options for a Perl program is to enter them on the command line when you enter the command that starts your program.

The syntax for specifying options on the command line is

perl options program

Here, program is the name of the Perl program you want to run, and options is the list of options you want to supply to the program.

For example, the following command runs the Perl program named test1 and passes it the options -s and -w. (You'll learn about these and other options later today.)

$ perl -s -w test1

Some options need to be specified along with a value. For example, the -0 option requires an integer to be passed with it:

$ perl -0 26 test1

Here, the integer 26 is associated with the option -0.

If you want, you can omit the space between the option and its associated value, as in the following:

$ perl -026 test1

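As before, this command associates 26 with the -0 option. In either case, the value associated with an option must always immediately follow the option. (Perl's own documentation gives this option as -0digits, with nothing between the option and its value, so the form without the space is the safer one to use.)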

NOTE
If an option does not require an associated value, you can put another option immediately after it without specifying an additional - character or space. For example, the commands perl -s -w test1 and perl -sw test1 are equivalent.
You can put an option that requires a value as part of a group of options, provided that it is last in the group. For example, the commands perl -s -w -026 test1 and perl -sw026 test1 are equivalent.

Specifying an Option in the Program

Another way to specify a command option is to include it as part of the header comment for the program. For example, suppose that the first line of your Perl program is this:

#!/usr/local/bin/perl -w

In this case, the -w option is automatically specified when you start the program.

Perl 4 enables you to specify only one option (or group of options) on the header comment line. This means that a header line that lists two separate options, such as #!/usr/local/bin/perl -w -s, generates an "unrecognized switch" error message.
Perl 5 enables as many switches as you like on the header comment line. However, some operating systems chop the header line after 32 characters, so be careful if you are planning to use a large number of switches.
NOTE
Options specified on the command line can override options specified in the header comment. For example, if your header comment specifies one option and you start your program with a perl command that specifies a different option, the program runs with the command-line option but not necessarily with the header option. Perl 5 is more forgiving here: it scans the #! line for switches when it parses the program, so the header options normally still take effect.

The -v Option: Printing the Perl Version Number

The -v option enables you to find out what version of Perl is running on your machine. When the Perl interpreter sees this option, it prints information on itself and then exits without running your program.

This means that if you supply a command such as the following, the file test1 is not executed:

$ perl -v test1

Here is sample output from the command:

This is perl, version 5.001
Unofficial patch level 1m
Copyright (c) 1987-1994, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5.0 source kit.

The only really useful things here, besides the copyright notice, are the version number of the Perl you are running (in this case, 5.001) and the patch level, which indicates how many repairs, or patches, have been made to this version. Here, the patch level is 1m.

No other options should be specified if you specify the -v option, because none of them would do anything in this case anyway.

The -c Option: Checking Your Syntax

The -c option tells the Perl interpreter to check whether your Perl program is correct without actually running it. If it is correct, the Perl interpreter prints the following message (in which filename is the name of your program) and then exits without executing your program:

filename syntax OK

If the Perl interpreter detects errors, it displays them just as it normally does. After printing the error messages, it prints the following message, in which filename is the name of your program:

filename had compilation errors

Again, there is no point in supplying other options if you specify the -c option because the Perl interpreter isn't actually running the program; the only exception is the -w option, which prints warnings. This option is described in the following section.
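As a rough illustration of the two outcomes described above, the following shell sketch runs -c on one correct and one broken program. The scratch directory and the file names good.pl and bad.pl are hypothetical, and the exact wording of the error text may vary between Perl versions.

```shell
# Hypothetical scratch files; the names are illustrative only.
tmpdir=$(mktemp -d)
printf 'print ("Hello\\n");\n' > "$tmpdir/good.pl"
printf 'print ("Hello\\n"\n' > "$tmpdir/bad.pl"    # closing parenthesis missing

# -c writes its verdict to standard error, so merge it into standard output.
good_report=$(perl -c "$tmpdir/good.pl" 2>&1)

# The broken program makes perl exit with a nonzero status; capture it.
bad_status=0
bad_report=$(perl -c "$tmpdir/bad.pl" 2>&1) || bad_status=$?

echo "$good_report"
echo "$bad_report"
echo "exit status for bad.pl: $bad_status"
rm -rf "$tmpdir"
```

Neither program is executed, so Hello never appears in the output.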

The -w Option: Printing Warnings

As you have seen on the preceding days, some mistakes are easy to make when you are writing a Perl program, such as accidentally typing the wrong variable name, or using == when you really mean to use eq. Because certain mistakes crop up frequently, the Perl interpreter provides an option that checks for them.

This option, the -w option, prints a warning every time the Perl interpreter sees something that might cause a problem. For example, if the interpreter sees the statement

$y = $x;

and hasn't seen $x before (which means that $x is undefined), it prints a warning message in the following form if you are running Perl 4:

Possible typo: "x" at filename line linenum.

Here, filename is the name of your Perl program, and linenum is the number of the line on which the interpreter has detected a potential problem.

If you are running Perl 5, the message is similar, but also includes the name of the current package:

Identifier "main::x" used only once: possible typo at filename line linenum.

For more information on packages, see Day 19, "Object-Oriented Programming in Perl."

The following sections provide a partial list of the potential problems detected by the -w option. (If you are running Perl 5, the -w option provides dozens of useful warnings. Consult the Perl manual pages for a complete list.)

NOTE
The -w option can be combined with the -c option to provide a means of checking your syntax for errors and problems before you actually run the program.

Checking for Possible Typos

As you have seen, a statement such as the following one leads to a warning message if $x has not been previously defined:

$y = $x;

The "possible typo" error message also appears in the following circumstances, among others:

  • If a variable is assigned to but is never used again
  • If a file variable is referred to without being specified in an open statement

Of course, the possible-typo message might flag lines that don't actually contain typos. Following are two of the most common situations in which a possible typo actually is correct code:

  • The Perl 4 interpreter sometimes confuses a print format specifier with a file variable and claims that the name of the print format specifier is a possible typo. For example, the statement
format BLANK =

.
    (which enables you to print a blank line on a formatted page) might generate the warning message
Possible typo: "BLANK" at file1 line 26.
    This warning message might appear even if the print format is actually used in the program, because it is specified by a statement such as
$~ = "BLANK";
    and the Perl interpreter doesn't realize that the string BLANK refers to the print format. The Perl 5 interpreter does not generate this warning message.
  • If you call a function that returns a list, and you need only an element of the list, one way to extract that single element is to assign the other elements to dummy variables. For example, if you want to retrieve just the group ID when you call getgrnam, you can do so as shown here:
($d1, $d2, $groupid) = getgrnam ($groupname);
    Here, the scalar variables $d1 and $d2 are dummy variables that hold the elements of the group file entry that you do not need. If (as is likely) $d1 and $d2 are not referred to again, the -w option treats $d1 and $d2 as possible typos.

Checking for Redefined Subroutines

One useful feature of the -w option is that it checks whether two subroutines of the same name have been defined in the program. (Normally, if the Perl interpreter sees two subroutines of the same name, it quietly replaces the first subroutine with the second one and carries on.)

If, for example, two subroutines named x are defined in a program, the -w option prints a message similar to the following one:

Subroutine x redefined at file1 line 46.

The line number specified is the line that starts the second subroutine.

When the -w option has detected this problem, you can decide which subroutine to rename or throw away.

Checking for Incorrect Comparison Operators

Another really helpful feature of the -w option is that it checks whether you are trying to compare a string using the == operator.

In a statement such as the following:

if ($x == "humbug") { ... }

the conditional expression

$x == "humbug"

is equivalent to the expression

$x == 0

because character strings that do not begin with a number are converted to 0 when used in a numeric context (a place where a number is expected). This is correct in Perl, but it is not likely to be what you want.

If the -w option is specified and the Perl interpreter sees a statement such as this one, it prints a message similar to the following if you are running Perl 4:

Possible use of == on string value at file1 line 26.

In Perl 5, the following warning is printed:

Argument "humbug" isn't numeric for numeric eq at file1 line 26.

In either case, this warning enables you to detect these incorrect == operators and replace them with eq operators, which compare strings.

The -w option doesn't detect the opposite problem, namely, using eq to compare a value with a number. In this case, the Perl interpreter converts the number to a string and performs a string comparison.
Because a number and its string equivalent usually mean the same thing, this normally doesn't cause a problem. Watch out, though, for octal numbers in string comparisons, such as a comparison of $x with 046 using eq. Here, the octal value 046 is converted to the number 38 before being converted to a string. If you really want to compare $x to the string 046, this code will not produce the results you expect.
Another thing to watch out for is this: In Perl 4, the -w option does not check for conditional expressions that contain an assignment, such as ($x = 0), because there are many cases in Perl in which the assignment operator belongs inside a conditional expression. You will have to manually check that you are not specifying = (assignment) when you really mean to use == (equality comparison).
Perl 5 flags a conditional that assigns a constant with a message of the following form:
Found = in conditional, should be == at filename line linenum.

The -e Option: Executing a Single-Line Program

The -e option enables you to execute a Perl program from your shell command line. For example, the command

$ perl -e "print ('Hello');"

prints the following string on your screen:

Hello

You can also specify multiple -e options. In this case, the Perl statements are executed left to right. For example, the command

$ perl -e "print ('Hello');" -e "print (' there');"

prints the following string on your screen:

Hello there

By itself, the -e option is not all that useful. It becomes useful, however, when you use it in conjunction with some of the other options you'll see in today's lesson.
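As a small sketch of how several -e options combine, the commands below build one program from two pieces and then repeat the attempt without the semicolons. The variable names greeting and status are illustrative; the example assumes a current Perl 5 interpreter.

```shell
# Two -e options are joined into a single program and run left to right.
greeting=$(perl -e 'print "Hello";' -e 'print " there\n";')
echo "$greeting"

# Without a semicolon after the first statement the joined program is
# invalid, so perl reports a syntax error and exits with a nonzero status.
status=0
perl -e 'print "Hello"' -e 'print " there\n"' 2>/dev/null || status=$?
echo "exit status without the semicolon: $status"
```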

You can leave off the closing semicolon in a Perl statement passed via the -e option, if you want to; for example, perl -e "print ('Hello')" works.
If you are supplying two or more -e options, however, the Perl interpreter strings them together and treats them as though they are a single Perl program. This means that a command such as perl -e "print ('Hello')" -e "print (' there')" generates an error because there must be a semicolon after the statement specified with the first -e option.

The -s Option: Supplying Your Own Command-Line Options

As you can see from this chapter, you can control the behavior of Perl by specifying various command-line options. You can control the behavior of your own Perl programs by specifying command-line options for them too. To do this, specify the -s option when you call the program.

Here's an example of a command that passes an option to a Perl program:

$ perl -s testfile -q

This command starts the Perl program testfile and passes it the -q option.

To be able to pass options to your program, you must specify the Perl -s option. The following command does not pass -q as an option:
$ perl testfile -q
In this case, -q is just an ordinary argument that is passed to your program and stored in the built-in array variable @ARGV.
The easiest way to remember to include -s is to specify it as part of your header comment:
#!/usr/local/bin/perl -s
This ensures that your program always will check for options. (Unless, of course, you override the option check by providing other Perl options on the command line when you invoke the program.)

If an option is specified when you invoke your Perl program, the scalar variable whose name is the same as the option is automatically set to 1 before program execution begins. For example, if a Perl program named testfile is called with the -q option, as in the following, the scalar variable $q is automatically set to 1:

$ perl -s testfile -q

You then can use this variable in a conditional expression to test whether the option has been set.

NOTE
If -q is treated as an option, it does not appear in the system variable @ARGV. A command-line argument either sets an option or is added to @ARGV.

Options can be longer than a single character. For example, the following command sets the value of the scalar variable $potato to 1:

$ perl -s testfile -potato

You also can set an option to a value other than 1 by specifying = and the desired value on the command line:

$ perl -s testfile -potato="hot"

This line sets the value of $potato to hot.

Listing 16.1 is a simple example of a program that uses command-line options to control its behavior. This program prints information about the user currently logged in.


Listing 16.1. An example of a program that uses command-line options.
1:  #!/usr/local/bin/perl -s
2:
3:  # This program prints information as specified by
4:  # the following options:
5:  # -u: print numeric user ID
6:  # -U: print user ID (name)
7:  # -g: print group ID
8:  # -G: print group name
9:  # -d: print home directory
10: # -s: print login shell
11: # -all: print everything (overrides other options)
12:
13: $u = $U = $g = $G = $d = $s = 1 if ($all);
14: $whoami = `whoami`;
15: chop ($whoami);
16: ($name, $d1, $userid, $groupid, $d2, $d3, $d4,
17:         $homedir, $shell) = getpwnam ($whoami);
18: print ("user id: $userid\n") if ($u);
19: print ("user name: $name\n") if ($U);
20: print ("group id: $groupid\n") if ($g);
21: if ($G) {
22:         ($groupname) = getgrgid ($groupid);
23:         print ("group name: $groupname\n");
24: }
25: print ("home directory: $homedir\n") if ($d);
26: print ("login shell: $shell\n") if ($s);

$ program16_1 -U -d
user name: dave
home directory: /ag1/dave
$

The header comment in line 1 specifies that the -s option is to be automatically specified when this Perl program is invoked. This ensures that options can always be passed to this program (unless, of course, you override the option on the command line, as described earlier).

The comments in lines 3-11 provide information on what options the program supports. This information is useful when someone is reading or modifying the program because there is no other way to tell which scalar variables are used to test options.

The -all option indicates that the program is to print everything; if this option is specified, the scalar variable $all is set to 1. To cut down on the number of comparisons later, line 13 checks whether $all is 1; if it is, the other scalar variables corresponding to command-line options are set to 1. This technique ensures that the following commands are equivalent (assuming that your program is named program16_1):

$ program16_1 -all
$ program16_1 -u -U -g -G -d -s

The scalar variables listed in line 13 can be assigned to, even though they correspond to possible command-line options, because they behave just like other Perl scalar variables.

Lines 14-17 provide the raw material for the various print operations in this program. To start, when the Perl interpreter sees the string `whoami` (enclosed in backquotes), it calls the system command whoami, which returns the name of the user running the program. This name is then passed to getpwnam, which searches the password file and retrieves the entry for this particular user.

Line 18 checks whether the -u option has been specified. To do this, it checks whether $u has a nonzero value. If it does, the user ID is printed. (The user ID is also printed if -all has been specified because line 13 sets $u to a nonzero value in this case.)

Similarly, line 19 prints the user name if -U has been specified, line 20 prints the group ID if -g has been specified, line 25 prints the home directory if -d has been specified, and line 26 prints the filename of the login shell if -s has been specified.

Lines 21-24 check whether to print the group name. If -G has been specified, $G is nonzero, and line 22 calls getgrgid to retrieve the group name.

NOTE
Because command-line options can change the initial values of scalar variables, it is a good idea to always assign a value to a scalar variable before you use it. Consider a program that counts from 0 to 9 with a while loop but never assigns a starting value to its counter variable.
This program normally prints the numbers from 0 to 9 because the counter is assumed to have an initial value of 0. However, if this program is called with an option of the same name as the counter, the initial value of the counter becomes something other than 0, and the program behaves differently.
If you assign 0 to the counter before the loop, the program always prints the numbers 0 to 9 regardless of what options are specified on the command line.

The -s Option and Other Command-Line Arguments

You can supply both options and command-line arguments to your program (provided that you supply the -s option to Perl). These are the rules that the Perl interpreter follows:

  • Any arguments immediately following the program name that start with a - are assumed to be options.
  • Any argument that does not start with a - is assumed to be an ordinary argument and not an option.
  • When the Perl interpreter sees an argument that is not an option, all subsequent arguments are also treated as ordinary arguments, not options, even if they start with a -.

This means, for example, that the following command treats -w as an option to testfile, and foo and -e as ordinary arguments:

$ perl -s testfile -w foo -e

The special argument -- also indicates "end of options." For example, the following command treats -w as an option and -e as an ordinary argument. The -- is thrown away.

$ perl -s testfile -w -- -e

The -P Option: Using the C Preprocessor

The C preprocessor is a program that takes code written in the C programming language and searches for special preprocessor statements. In Perl, the -P option enables you to use this preprocessor with your Perl program. (The -P option was removed in Perl 5.12, so this technique applies only to older interpreters.)

$ perl -P myprog

Here, the Perl program myprog is first run through the C preprocessor. The resulting output is then passed to the Perl interpreter for execution.

NOTE
Perl provides no way to just run the C preprocessor on a Perl program. To do this, you'll need a C compiler that provides an option that specifies "preprocessor only."
Refer to the documentation for your C compiler for details about how to do this.

The following sections describe some of the most commonly used C preprocessor commands.

The C Preprocessor: A Quick Overview

C preprocessor statements always employ the following syntax:

#command value

Each C preprocessor statement starts with a # character. command is the preprocessor operation to perform, and value is the (optional) value associated with this operation.

Macro Substitution: The #define Operator

The most common preprocessor statement is #define. This statement tells the preprocessor to replace every occurrence of a particular character string with a specified value.

The syntax for #define is

#define macro value

This statement replaces all occurrences of the character string macro with the value specified by value. This operation is known as macro substitution. macro can contain letters, digits, or underscores.

The value specified in a #define statement can be any character string or number. For example, the following statement replaces all occurrences of USERNAME with the string "dave" (including the quotation marks):

#define USERNAME "dave"

This statement replaces EXPRESSION with the string (14+6), including the parentheses:

#define EXPRESSION (14+6)
NOTE
When you are using #define with a value that is an expression, it is usually a good idea to enclose the value in parentheses. For example, consider the following Perl statement:
$result = EXPRESSION * 5;
If your preprocessor command is
#define EXPRESSION 14 + 6
the resulting Perl statement becomes
$result = 14 + 6 * 5;
which assigns 44 to $result (because the multiplication is performed first). If you enclose the preprocessor expression in parentheses, as in
#define EXPRESSION (14 + 6)
the statement becomes
$result = (14 + 6) * 5;
which yields the result 100, which is likely what you want.
Also, you always should enclose any parameters (described in the following section) in parentheses, for the same reason.

Passing Arguments Using #define

You can specify one or more parameters with your #define statement. This capability enables you to treat the preprocessor command like a simple function that accepts arguments. For example, the following preprocessor statement takes a specified value and uses it as an exponent:

#define POWEROFTWO(val) (2 ** (val))

In the Perl statement

$result = POWEROFTWO(1.3 + 2.6) + 4;

the preprocessor substitutes the expression 1.3 + 2.6 for val and produces this:

$result = (2 ** (1.3 + 2.6)) + 4;

You can supply more than one parameter with a #define statement. For example, consider the following statement (note that there must be no space between the macro name and the opening parenthesis):

#define EXPONENT(base, exp) ((base) ** (exp))

Now, the statement

$result = EXPONENT(4, 11);

yields the following result after preprocessing:

$result = ((4) ** (11));

The Perl interpreter ignores the extra parentheses.

TIP
By convention, macros defined using #define normally use all uppercase letters (plus occasional digits and underscores). This makes it easier to distinguish macros from other variable names or character strings.

Listing 16.2 is an example of a Perl program that uses a #define statement to perform macro substitution. This listing is just Listing 15.4 with the preprocessor statement added.


Listing 16.2. A program that uses a #define statement.
1:  #!/usr/local/bin/perl -P
2:
3:  #define AF_INET 2
4:  print ("Enter an Internet address:\n");
5:  $machine = <STDIN>;
6:  $machine =~ s/^\s+|\s+$//g;
7:  @addrbytes = split (/\./, $machine);
8:  $packaddr = pack ("C4", @addrbytes);
9:  if (!(($name, $altnames, $addrtype, $len, @addrlist) =
10:         gethostbyaddr ($packaddr, AF_INET))) {
11:         die ("Address $machine not found.\n");
12: }
13: print ("Principal name: $name\n");
14: if ($altnames ne "") {
15:         print ("Alternative names:\n");
16:         @altlist = split (/\s+/, $altnames);
17:         for ($i = 0; $i < @altlist; $i++) {
18:                 print ("\t$altlist[$i]\n");
19:         }
20: }

$ program16_2
Enter an Internet address:
128.174.5.59
Principal name: ux1.cso.uiuc.edu
$

Line 3 defines the macro AF_INET and assigns it the value 2. When the C preprocessor sees AF_INET in line 10, it replaces it with 2, which is the value of AF_INET on the current machine (as specified in the system header files).

If this program is moved to a machine that defines a different value for AF_INET, all you need to do to get this program to work is change line 3 to use the value on that machine.

Using Macros in #define Statements

You can use a previously defined macro as the value in another #define statement. The following is an example:

#define FIRST 1
#define SECOND FIRST
$result = 43 + SECOND;

Here, the macro FIRST is defined to be equivalent to the value 1, and SECOND is defined to be equivalent to FIRST. This means that the statement following the macro definitions is equivalent to the following statement:

$result = 43 + 1;

Conditional Execution Using #ifdef and #endif

The #ifdef and #endif statements control whether a given group of statements is to be included as part of your program.

The syntax for the #ifdef and #endif statements is

#ifdef macro
code
#endif

Here, macro is any character string that can appear in a #define statement. code is one or more lines of your Perl program.

When the C preprocessor sees an #ifdef statement, it checks whether the macro has been defined using the #define statement. If it has, the code specified by code is included as part of the program. If it has not, the code specified by code is skipped.

NOTE
The code enclosed by #ifdef and #endif does not have to be a complete Perl statement. For example, it is legal to split an assignment so that the value being assigned is enclosed by #ifdef, #else, and #endif; the variable is then assigned one value if the macro is defined and another value if it's not.
Be careful, though: If you abuse #ifdef, the resulting program might become difficult to read.

The #ifndef and #else Statements

The #ifndef and #else statements provide additional control over when parts of your program are to be executed.

The #ifndef statement enables you to define code that is to be executed when a particular macro is not defined.

The syntax for #ifndef is the same as for #ifdef:

#ifndef macro
code
#endif

For example:

#ifndef MYMACRO
$result = 26;
#endif

The assignment is performed only if MYMACRO has not appeared in a #define statement.

The #else statement enables you to specify code to be executed if a macro is defined and an alternative to choose if the macro is not defined. For example:

#ifdef MYMACRO
$result = 47;
#else
print ("Hello, world!\n");
#endif

Here, if MYMACRO has been defined by a #define statement, the following statement is executed:

$result = 47;

If MYMACRO has not been defined, the following statement is executed:

print ("Hello, world!\n");

You can use #else with #ifndef, as in the following:

#ifndef MYMACRO
print ("Hello, world!\n");
#else
$result = 47;
#endif

This code is equivalent to the #ifdef-#else-#endif sequence shown earlier in this section.

The #if Statement

The #if statement enables you to specify that certain lines of your program are to be included only if the expression included with the statement is nonzero.

The syntax for the #if statement is

#if expr
code
#endif

Here, expr is the expression to be evaluated, and code is the code to be executed if expr is nonzero.

For example, the following statement is executed only if the expression 14 + 3 is nonzero (which it always is, of course):

#if 14 + 3
$result = 26;
#endif

You can use a macro definition as part of an #if statement. If the macro is defined as a nonzero value, it has that value in the expression; if it is not defined, it has the value zero. Consider the following example:

#if MACRO1 || MACRO2
$result = 47;
#endif

When the preprocessor sees the #if statement, it evaluates the expression MACRO1 || MACRO2. This expression has a nonzero value if either MACRO1 or MACRO2 is nonzero. Therefore, the following statement is executed if either MACRO1 or MACRO2 is defined with a nonzero value:

$result = 47;

The #if statement provides a quick way to remove lines of code from your program temporarily:

#if 0
$result = 46;
print ("This line is not printed right now.\n");
#endif

Here, the expression included with the #if statement is always zero, which means that the statements between #if and #endif are always skipped.

You can use #else with #if, as in the following example:

#if MACRO1 || MACRO2
print ("MACRO1 or MACRO2 is defined.\n");
#else
print ("MACRO1 and MACRO2 are not defined.\n");
#endif

This code includes the first print statement if MACRO1 or MACRO2 has been defined with a nonzero value using #define, and it includes the second print statement if neither has been defined.

You cannot use the ** (exponentiation) operator in an #if statement because ** is not supported in the C programming language.

Nesting Conditional Execution Statements

You can put one #ifdef-#else-#endif construct inside another. For example:

#ifdef MACRO1
#ifdef MACRO2
print ("MACRO1 yes, MACRO2 yes\n");
#else
print ("MACRO1 yes, MACRO2 no\n");
#endif
#else
#ifdef MACRO2
print ("MACRO1 no, MACRO2 yes\n");
#else
print ("MACRO1 no, MACRO2 no\n");
#endif
#endif

You also can put an #ifndef-#else-#endif construct or an #if-#else-#endif construct inside an #ifdef-#else-#endif construct, or vice versa. The only restriction is that the inner construct must be completely contained in one part of the outer construct.

Including Other Files Using #include

Another preprocessor command that is quite useful is the #include command. This command tells the C preprocessor to include the contents of the specified file as part of the program.

The syntax for the #include command is

#include filename

filename is the name of the file to be included.

For example, the following command includes the contents of myincfile.h as part of the program:

#include <myincfile.h>

When an #include statement is found in a Perl program, the C preprocessor searches for the file myincfile.h in the current directory and in a default include directory. (The -I option, described in the following section, enables you to search in other directories.) To instruct the C preprocessor to search the current directory first, enclose the filename in double quotation marks rather than angle brackets.

#include "myincfile.h"

This command starts the search for myincfile.h in the current directory.

You can specify an entire pathname in an #include statement, as in the following example:

#include "/u/dave/myincfile.h"

This command retrieves the contents of /u/dave/myincfile.h and adds them to the program.

NOTE
Perl also enables you to include other files as part of a program using the require statement. For more information on require, refer to
Day 19, "Object-Oriented Programming in Perl."

The -I Option: Searching for C Include Files

You use the -I option with the -P option. It enables you to specify where to look for include files to be processed by the C preprocessor. For example:

perl -P -I /u/dave/myincdir testfile

This command tells the Perl interpreter to search the directory /u/dave/myincdir for include files (as well as the default directories).

To specify multiple directories to search, repeat the -I option:

perl -P -I /u/dave/dir1 -I /u/dave/dir2 testfile

This command searches in both /u/dave/dir1 and /u/dave/dir2.

NOTE
The directories specified in the -I option also are added to the system variable @INC. This technique ensures that the require function can search in the same directories as the C preprocessor.
For more information on @INC, refer to Day 17, "System Variables." For more information on require, refer to Day 19.

The -n Option: Operating on Multiple Files

One of the most common tasks in Perl programs and in UNIX commands is to read the contents of several input files one line at a time and process each input line as it is read. In these programs and commands, the names of the input files are supplied on the command line. A simple example is the UNIX command cat:

$ cat file1 file2 file3 ...

This command reads one line of input at a time and writes it to the standard output file.

In Perl, one way to read the contents of several input files, one line at a time, is to enclose the <> operator in a while loop:

while ($line = <>) {
        # process $line in here
}

Another method is to specify the -n option. This option takes your program and executes it once for each line of input in each of the files specified on the command line.

Listing 16.3 is a simple example of a program that uses the -n option. It puts asterisks around each input line and then prints it.


Listing 16.3. A simple program that uses the -n option.
1:  #!/usr/local/bin/perl -n
2:
3:  # input line is stored in the system variable $_
4:  $line = $_;
5:  chop ($line);
6:  printf ("* %-52s *\n", $line);

$ program16_3
* This test file has only one line in it.              *
$

The -n option encloses the program shown here in an invisible loop. Each time the program is executed, the next line of input from one of the input files is read and is stored in the system variable $_. Line 4 takes this line and copies it into another scalar variable, $line; line 5 then removes the last character (the trailing newline character) from this line.

Line 6 uses printf to write the input line to the standard output file. Because printf is formatting the input, the asterisks all appear in the same columns (column 1 and column 56) on your screen.

NOTE
The previous program is equivalent to a Perl program that does not use the -n option and instead encloses lines 4-6 in a while (<>) loop.

The -n and -e options work well together. For example, the following command is equivalent to the cat command:

$ perl -n -e 'print $_;' file1 file2 file3

The argument supplied with the -e option is a one-line Perl program, enclosed here in single quotation marks so that the shell does not try to expand $_ itself. Because the -n option executes the program once for each input line and reads each input line into the system variable $_, the statement

print $_;

prints each input line in turn, which is exactly what the cat command does. (Note that the parentheses that normally enclose the argument passed to print have been omitted in this case.)

The previous command can be made even simpler:

$ perl -n -e "print" file1 file2 file3

By default, if no argument is supplied, print assumes that it is to print the contents of $_. And, if the program consists of a single statement, there is no need to include the closing semicolon.

The pattern matching and substitution operators also operate on $_ by default. For example, the following statement examines the contents of $_ and searches for a digit:

$found = /[0-9]/;

This default behavior makes it easy to include a search or a substitution in a single-line command. For example:

$ perl -n -e "print if /[0-9]/" file1 file2 file3

This command reads each line of the files file1, file2, and file3. If an input line contains a digit, it is printed.

NOTE
Several other functions use $_ as the default scalar variable to operate on, which makes those functions ideal for use with the -n and -e options. A full list of these functions is provided in the description of the $_ system variable, which is contained in Day 17.
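To make the default-variable behavior concrete, here is a small sketch. The sub names lines_with_digits and line_lengths are hypothetical; the point is only that the pattern match, chomp, and length all fall back to $_ when no argument is given.

```perl
use strict;
use warnings;

# Return the lines that contain a digit. Inside grep, each element is
# aliased to $_, so the bare pattern match tests $_ just as it does
# under -n.
sub lines_with_digits {
    return grep { /[0-9]/ } @_;
}

# Return the length of each line without its newline. chomp and length
# both default to $_ when called with no argument.
sub line_lengths {
    my @lengths;
    for (my @copy = @_) {
        chomp;
        push @lengths, length;
    }
    return @lengths;
}

print lines_with_digits("no digits here\n", "room 101\n");
print join(",", line_lengths("abc\n", "hello\n")), "\n";
```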

The -p Option: Operating on Files and Printing

The -p option is similar to the -n option: it reads each line of its input files in turn. However, the -p option also prints each line it reads.

This means, for example, that you can simulate the behavior of the UNIX cat command with the following command:

$ perl -p -e ";" file1 file2 file3

Here, the ; is a Perl program consisting of one statement that does nothing.

The -p option is designed for use with the -i option, described in the following section.

NOTE
If both the -p and the -n options are specified, the -n option is ignored.
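As a rough sketch of what -p does on your behalf, the following code runs a piece of code once per line and then prints $_ in a continue block, so the line is printed even if the code skips ahead. The sub name run_like_p and the in-memory file handles are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Emulate "perl -p": run $code once per input line with the line in $_,
# then print $_ whether or not $code changed it.
sub run_like_p {
    my ($code, $in, $out) = @_;
    local $_;
    while (<$in>) {
        $code->();
    }
    continue {
        print {$out} $_ or die "cannot write output: $!\n";
    }
}

my $input  = "abc one\nabc two\n";   # assumed sample text
my $output = '';
open(my $in,  '<', \$input)  or die "cannot read sample: $!";
open(my $out, '>', \$output) or die "cannot write sample: $!";
run_like_p(sub { s/abc/def/g }, $in, $out);
close($out);
print $output;
```

Passing an empty sub instead of the substitution copies the input unchanged, which is the cat simulation described above.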

The -i Option: Editing Files

As you have seen, the -p and -n options read lines from the files specified on the command line. The -i option, when used with the -p option, takes the input lines being read and writes them back out to the files from which they came. This process enables you to edit files using commands similar to those used in the UNIX sed command.

For example, consider the following command:

$ perl -p -i -e "s/abc/def/g;" file1 file2 file3

This command contains a one-line Perl program that examines the scalar variable $_ and changes all occurrences of abc into def. (Recall that the substitution operator operates on $_ if the =~ operator is not specified.) The -p option ensures that $_ is assigned each line of each input file in turn and that the program is executed once for each input line. Thus, this command changes all occurrences of abc in the files file1, file2, and file3 to def.

Do not use the -i option with the -n option unless you know what you're doing. The following command also changes all occurrences of abc to def, but it doesn't write out the input lines after it changes them:

$ perl -n -i -e "s/abc/def/g;" file1 file2 file3

Because the -i option specifies that the input files are to be edited, the result is that the contents of file1, file2, and file3 are completely destroyed.

The -i option also works on programs that do not use the -p option but do contain the <> operator inside a loop. For example, consider the following command:

$ perl -i prog1 file1 file2 file3

In this case, the Perl interpreter copies the first file, file1, to a temporary file and opens the temporary file for reading. Then, it opens file1 for writing and sets the default output file (the file used by calls to print, write, and printf) to be file1.

After the program finishes reading the temporary file to which file1 was copied, it then copies file2 to a temporary file, opens it for reading, opens file2 for writing, and sets the default output file to be file2. This process continues until the program runs out of input files.
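The process just described can be sketched by hand for a single file. This is an illustration of the idea, not the interpreter's actual implementation: the sub name edit_in_place, the .tmp suffix, and the scratch file are all assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Edit $file "in place": move the original aside, read the copy, and
# write the edited lines back under the original name.
sub edit_in_place {
    my ($file, $edit) = @_;
    my $copy = "$file.tmp";                  # hypothetical temporary name
    rename($file, $copy)      or die "rename $file: $!";
    open(my $in,  '<', $copy) or die "open $copy: $!";
    open(my $out, '>', $file) or die "open $file: $!";
    my $previous = select($out);             # $out becomes the default output file
    while (my $line = <$in>) {
        my $edited = $edit->($line);
        print $edited;                       # goes to $file, not the screen
    }
    select($previous);                       # restore the old default
    close($in);
    close($out)   or die "close $file: $!";
    unlink($copy) or die "unlink $copy: $!";
}

# Demonstration on a scratch file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file1";
open(my $fh, '>', $file) or die "create $file: $!";
print {$fh} "abc abc\nxyz\n";
close($fh);

edit_in_place($file, sub { my ($l) = @_; $l =~ s/abc/def/g; return $l });

open($fh, '<', $file) or die "reopen $file: $!";
print <$fh>;
close($fh);
```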

Listing 16.4 is a simple example of a program that edits using the -i option and the <> operator. This program evaluates any arithmetic expressions (containing integers) it sees on a single line and replaces them with their results.


Listing 16.4. A program that edits files using the -i option.
1:  #!/usr/local/bin/perl -i
2:
3:  while ($line = <>) {
4:          while ($line =~
5:          s#\d+\s*[-+*/]\s*\d+(\s*[-+*/]\s*\d+)*#<x>#) {
6:                  eval ("\$result = $&;");
7:                  $line =~ s/<x>/$result/;
8:          }
9:          print ($line);
10: }

This program produces no output because output is written to the files specified on the command line.

The <> operator at the beginning of the while loop (line 3) reads a line at a time from the input file or files. Each line is searched using the pattern shown in line 5. This pattern matches any substring containing the following elements (in the order given):

  1. One or more digits
  2. Zero or more spaces
  3. A *, +, -, or / character
  4. Zero or more spaces
  5. One or more digits
  6. Zero or more of the preceding four subpatterns (which matches the last part of expressions such as 4 + 7 - 3)

This pattern is replaced by a placeholder substring, <x>.

Lines 6 and 7 are executed once for each pattern matched in the input line. The matched pattern, an arithmetic expression, is automatically stored in the system variable $&; line 6 substitutes this expression into a character string and passes this character string to the function eval. The call to eval creates a subprogram that evaluates the expression and returns the result in the scalar variable $result. Line 7 replaces the placeholder, <x>, with the result returned in $result.

When all the arithmetic expressions have been evaluated and substituted for, the inner while loop terminates, and line 9 calls print. Because the -i option has been set, the line is written back to the original input file from which it came.
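The matching and evaluation steps can also be tried out on a plain string, without touching any files. The sketch below is an alternative formulation rather than the listing itself: it uses a single substitution with the e modifier instead of a placeholder, and the sub name evaluate_expressions is hypothetical.

```perl
use strict;
use warnings;

# Replace each integer arithmetic expression in $line with its value.
sub evaluate_expressions {
    my ($line) = @_;
    $line =~ s{(\d+(?:\s*[-+*/]\s*\d+)+)}{
        my $expression = $1;
        my $result = eval $expression;   # only digits, spaces, and - + * / reach eval
        defined $result ? $result : $expression;
    }ge;
    return $line;
}

print evaluate_expressions("total: 11 + 4 items, 2 * 3 * 7 boxes\n");
```

An expression that cannot be evaluated, such as a division by zero, is left in the line unchanged because eval returns an undefined value in that case.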

NOTE
Even though you do not know the name of the file variable that represents the file being edited, you can still set the default output file variable to some other file and change it back later.
To perform this task, recall that the select function returns the file variable associated with the current default file:
After the second select call has been performed, the default output file is, once again, the file being edited.
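The code for this note is not reproduced here. A minimal sketch of the technique, assuming an in-memory log file and the hypothetical variable names $log and $editfile, might look like this:

```perl
use strict;
use warnings;

my $log_text = '';
open(my $log, '>', \$log_text) or die "cannot open in-memory log: $!";

my $editfile = select($log);   # first call: switch to $log, remember the old default
print "progress note\n";       # written to $log
select($editfile);             # second call: restore the original default

close($log);
print "log captured: $log_text";
```

In a program run with -i, the value saved by the first select call would be the file being edited, so the second call sends later output back to that file.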

Backing Up Input Files Using the -i Option

By default, the -i option overwrites the existing input files. If you wish, you can save a copy of the original input file or files before overwriting them. To do this, specify a file extension with the -i option:

$ perl -i.old prog1 file1 file2 file3

Here, the .old file extension specified with the -i option tells the Perl interpreter to copy file1 to file1.old before overwriting it. Similarly, the interpreter copies file2 to file2.old, and file3 to file3.old. Note that the extension must follow -i immediately, with no space in between.

The file extension specified with the -i option can be any character string. By convention, file extensions usually begin with a period; this convention makes it easier for you to spot them when you list the files in your directory.

TIP
If you are using the -i option with a program you are not familiar with, it is a good idea to specify a file extension. Doing so ensures that your files are not damaged if the program does not work the way you expect.
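The same backup behavior can be requested from inside a program by setting the system variable $^I, which holds the extension given with -i. The following sketch assumes a scratch file and a hypothetical sub named replace_with_backup.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Apply s/abc/def/g to each named file, keeping the originals as FILE.old.
sub replace_with_backup {
    my @files = @_;
    local ($^I, @ARGV) = ('.old', @files);   # same effect as -i.old on the command line
    while (<>) {
        s/abc/def/g;
        print;                               # goes to the file being edited
    }
}

# Demonstration on a scratch file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file1";
open(my $fh, '>', $file) or die "create $file: $!";
print {$fh} "abc\n";
close($fh);

replace_with_backup($file);
print "backup kept\n" if -e "$file.old";
```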

The -a Option: Splitting Lines

The -a option is used with the -n or -p option. If the -a option is set, each input line that is read is automatically split into a list of "words" (sequences of characters that are not white space); this list of words is stored in a special system array variable named @F.

For example, if your input file contains the line

This is a test.

and if a program that is called with the -a option reads this line, the array @F contains the list

("This", "is", "a", "test.")

The -a option is useful for extracting information from files. Suppose that your input files contain records of the form

company_name quantity_ordered total_cost

such as, for example,

JOHN H. SMITH 10 47.32

Listing 16.5 shows how you can use the -a option to easily produce a program that extracts the quantity and total cost fields from these files.


Listing 16.5. An example of the -a option.
1:  #!/usr/local/bin/perl
2:
3:  # This program is called with the -a and -n options.
4:  while ($F[0] =~ /[^\d.]/) {
5:          shift (@F);
6:          next LINE if (!defined($F[0]));
7:  }
8:  print ("$F[0] $F[1]\n");

$ perl -a -n program16_5
10 47.32
106 11.54
$

Because the program is called with the -a option, the array variable @F contains a list, each element of which is a word from the current input line.

Because the company name in the input file might consist of more than one word (such as JOHN H. SMITH), the while loop in lines 4-7 is needed to get rid of everything that isn't a quantity field or a total cost field. After these fields have been eliminated, line 8 can print the useful fields. (Line 6 uses the label LINE, which the -n option attaches to its invisible loop, so that next moves on to the next input line rather than restarting the inner loop.)

Note that this program just skips over any nonstandard input lines.
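The splitting that -a performs can be written out explicitly, which makes the logic easy to test on a single string. In this sketch the sub name quantity_and_cost is hypothetical, and the explicit split stands in for what -a does automatically.

```perl
use strict;
use warnings;

# Return "quantity total_cost" for one record, or undef if the line
# does not contain both fields.
sub quantity_and_cost {
    my ($line) = @_;
    my @F = split(' ', $line);                 # what -a does for each input line
    shift @F while @F && $F[0] =~ /[^\d.]/;    # discard the words of the company name
    return @F >= 2 ? "$F[0] $F[1]" : undef;
}

my $fields = quantity_and_cost("JOHN H. SMITH 10 47.32\n");
print "$fields\n" if defined $fields;
```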

The -F Option: Specifying the Split Pattern

The -F option, defined only in Perl 5, is designed to be used in conjunction with the -a option, and specifies the pattern to use when you split input lines into words. For example, suppose Listing 16.5 is called as follows:

$ perl -a -n -F:: program16_5

In this case, the words in the input file are assumed to be separated by a pair of colons, which means that the program is expecting to read lines such as the following:

JOHN H. SMITH::10::47.32

NOTE
The -F option ignores opening and closing slashes if they are present because it interprets them as pattern delimiters. This means that the following program invocations are identical:

$ perl -a -n -F:: program16_5
$ perl -a -n -F/::/ program16_5
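As a small illustration of the split pattern, the following sketch splits a record on a separator given either with or without surrounding slashes. The sub name split_fields is an assumption, and the slash handling only imitates the behavior described in the note.

```perl
use strict;
use warnings;

# Split $line on $separator, accepting the separator with or without
# surrounding slashes.
sub split_fields {
    my ($line, $separator) = @_;
    chomp $line;
    $separator =~ s{^/(.*)/$}{$1};     # drop optional pattern delimiters
    return split(/$separator/, $line);
}

print join(" | ", split_fields("JOHN H. SMITH::10::47.32\n", "::")), "\n";
print join(" | ", split_fields("JOHN H. SMITH::10::47.32\n", "/::/")), "\n";
```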

The -0 Option: Specifying Input End-of-Line

In all the programs you have seen so far, when the Perl interpreter reads a line from an input file or from the keyboard, it reads until it sees a newline character. You can tell Perl that you want the "end-of-line" input character to be something other than the newline character by specifying the -0 option. (The 0 here is the digit zero, not the letter O.)

With the -0 option, you specify which character is to be the end-of-line character for your input file by providing its ASCII representation in base 8 (octal). For example, the command

$ perl -0040 prog1 infile

calls the Perl program named prog1 and specifies that it is to use the space character (ASCII 32, or 40 octal) as the end-of-line character when it reads the input file infile (or any other input file). The octal value must follow -0 immediately, with no space in between.

This means, for example, that if this program reads an input file containing the following:

Test input.
Here's another line.

it will read a total of four input lines:

  • The first input line consists of the word Test.
  • The second input line consists of input., followed by a newline character, followed by Here's.
  • The third input line consists of the word another.
  • The fourth input line consists of the word line., followed by a newline character.
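You can check this breakdown by setting the input end-of-line variable $/ yourself, which is what -0 does behind the scenes. The sketch below assumes a hypothetical sub named read_records and reads the sample text from memory. Each record still carries its terminating space, just as an ordinary input line carries its newline.

```perl
use strict;
use warnings;

# Read $text as a file whose "lines" end with $terminator.
sub read_records {
    my ($text, $terminator) = @_;
    local $/ = $terminator;
    open(my $in, '<', \$text) or die "cannot open in-memory file: $!";
    my @records = <$in>;
    close($in);
    return @records;
}

my @records = read_records("Test input.\nHere's another line.\n", chr(oct("040")));
for my $record (@records) {
    (my $shown = $record) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
```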

The -0 option provides a quick way to read an input file one word at a time, assuming that each line ends with at least one blank character. (If it doesn't, you can quickly write a Perl program that uses the -i and -l options to add a space to the end of each line in each file.) Listing 16.6 is an example of a program that uses -0 to read an input file one word at a time.


Listing 16.6. A program that uses the -0 option.
1:  #!/usr/local/bin/perl -0040
2:
3:  while ($line = <>) {
4:          $line =~ s/\n//g;
5:          next if ($line eq "");
6:          print ("$line\n");
7:  }

$ program16_6 file1
This
line
contains
five
words.
$

The header comment (line 1) specifies that the -0 option is to be used and that the space character is to become the end-of-line character. (Recall that you do not need a space between an option and the value associated with an option.) This means that line 3 reads from the input file until it sees a blank space.

Not everything read by line 3 is a word, of course. There are two types of lines that are not particularly useful that the program must check for:

  • Empty lines, which are generated when the input file contains two consecutive spaces
  • Lines containing the newline character (remember, the newline character is no longer an end-of-line character, so now it actually appears in input lines)

Line 4 checks whether any newline characters are contained in the current input line. The substitution in this line is a global substitution, because an input line can contain two or more newline characters. (This occurs when an input file contains a blank line.)

After all the newline characters have been eliminated, line 5 checks whether the resulting input line is empty. If it is, the program continues with the next input line. If the resulting input line is not empty, the input line must be a useful word, and line 6 prints it.

NOTE
If you specify the value 00 (octal zero) with the -0 option, the Perl interpreter reads until it sees two newline characters. This enables you to read an entire paragraph at a time.
If you specify no value with the -0 option, the null character (ASCII 0) is assumed.

The -l Option: Specifying Output End-of-Line

The -l option enables you to specify an output end-of-line character for use in print statements.

Like the -0 option, the -l option accepts a base-8 (octal) integer that indicates the ASCII representation of the character you want to use.

When the -l option is specified, the Perl interpreter does two things:

  • If the -n or -p option is specified, each input line read in from the standard input file has its last character (the line terminator) removed. (The Perl interpreter takes this action because it assumes that you want to replace the old end-of-line character with the one specified by the -l option.)
  • When you call the print function, the output written by print will be immediately followed by the character specified by the -l option.

If you do not specify a value with the -l option, the Perl interpreter uses the character specified by the -0 option, if it is defined. If -0 has not been specified, the end-of-line character is defined to be the newline character.

If you are using both the -l and the -0 option and you do not provide a value with the -l option, the order of the options becomes significant because the options are processed from left to right.
If the -l option appears first, the output end-of-line character is set to the newline character. If the -0 option appears first, the output end-of-line character (set by -l) becomes the same as the input end-of-line character (set by -0).

Listing 16.7 is a simple example of a program that uses -l.


Listing 16.7. A program that uses the -l option.
1:  #!/usr/local/bin/perl -l012
2:
3:  print ("Hello!");
4:  print ("This is a very simple test program!");

$ program16_7
Hello!
This is a very simple test program!
$

The -l option in the header comment in line 1 sets the output line character to the newline character (ASCII 10, or 12 octal). This means that every print statement in the program will have a newline character added to it. As a consequence, the output from lines 3 and 4 appears on separate lines.

NOTE
You can also control the input and output end-of-line characters by using the system variables $/ and $\. For a description of these system variables, refer to Day 17.
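As a brief sketch of the output side, the following code sets $\ by hand and captures what print writes. The sub name print_with_terminator and the in-memory output file are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Print each message followed by $terminator and return what was written.
sub print_with_terminator {
    my ($terminator, @messages) = @_;
    my $captured = '';
    open(my $out, '>', \$captured) or die "cannot open in-memory file: $!";
    {
        local $\ = $terminator;            # the variable that -l sets for you
        print {$out} $_ for @messages;
    }
    close($out);
    return $captured;
}

print print_with_terminator(chr(oct("012")), "Hello!", "This is a very simple test program!");
```

Because $\ is set with local inside a block, the final print in the example is not affected by it.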

The -x Option: Extracting a Program from a Message

The -x option enables you to process a Perl program that appears in the middle of a file (such as a file containing an electronic mail message, which usually contains some mail routing information). When the -x option is specified, the Perl interpreter ignores every line in the program until it sees a header comment (a comment beginning with the #! characters).

If you are using Perl 5, the header comment must also contain the word "perl".

After the Perl interpreter sees the header comment, it then processes the program as usual until one of the following three conditions occurs:

  • The bottom of the program file is reached.
  • The program file contains a line consisting of just the Ctrl+D or Ctrl+Z character.
  • The program file contains a line consisting of the following statement (by itself):
__END__

If the Perl interpreter reads one of the end-of-program lines (the second and third conditions listed previously), it ignores everything appearing after that line in the file.

Listing 16.8 is a simple example of a program that works if run with the -x option.


Listing 16.8. A Perl program contained in a file.
1:  Here is a Perl program that appears in the middle
2:  of a file.
3:  The stuff up here is junk, and the Perl interpreter
4:  will ignore it.
5:  The next line is the start of the actual program.
6:  #!/usr/local/bin/perl
7:
8:  print ("Hello, world!\n");
9:  __END__
10: This line is also ignored, because it is not part
11: of the program.

$ perl -x program16_8
Hello, world!
$

If this program is started with the -x option, the Perl interpreter skips over everything until it sees line 6. (Needless to say, if you try to run this program without specifying the -x option, the Perl interpreter will complain.) Line 8 then prints the message Hello, world!.

Line 9 is the special end-of-program line. When the Perl interpreter sees this line, it skips the rest of the program.
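The skipping rules can be imitated in a few lines, which may help when you want to see which part of a message the interpreter would treat as the program. This is a sketch of the rules as described above, not the interpreter's own code; the sub name extract_program is hypothetical, and the Ctrl+D and Ctrl+Z cases are left out.

```perl
use strict;
use warnings;

# Return the lines from the #! header comment up to (not including)
# the __END__ line.
sub extract_program {
    my @lines = @_;
    my (@program, $started);
    for my $line (@lines) {
        if (!$started) {
            next unless $line =~ /^#!.*perl/;   # skip the leading junk
            $started = 1;
        }
        last if $line =~ /^__END__$/;           # ignore everything after this line
        push @program, $line;
    }
    return @program;
}

my @message = (
    "The stuff up here is junk.\n",
    "#!/usr/local/bin/perl\n",
    "print (\"Hello, world!\\n\");\n",
    "__END__\n",
    "This line is also ignored.\n",
);
print extract_program(@message);
```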

NOTE
Of course, you can't specify the -x option in the header comment itself because the Perl interpreter has to know in advance that the program contains lines that must be skipped.

Miscellaneous Options

The following sections describe some of the more exotic options you can pass to the Perl interpreter. You are not likely to need any of these options unless you are doing something unusual (and you really know what you are doing).

The -u Option

The -u option tells the Perl interpreter to generate a core dump file. This file can then be examined and manipulated.

The -U Option

The -U option tells the Perl interpreter to enable you to perform "unsafe" operations in your program. (Basically, you'll know that an operation is considered unsafe when the Perl interpreter doesn't let you perform it without specifying the -U option!)

The -S Option

The -S option tells the Perl interpreter that your program might be contained in any of the directories specified by your PATH environment variable. The Perl interpreter checks each of these directories in turn, in the order in which they are specified, to see whether your program is located there. (This is the normal behavior of the shell for commands in the UNIX environment.)

NOTE
You need to use -S only if you are running your Perl program by invoking the perl command explicitly and passing the program name to it as an argument.
If you are running the program by typing its name as a command,
your shell (normally) treats it like any other command and searches the directories specified in your PATH environment variable even if you don't specify the -S option.

The -D Option

The -D option sets the Perl interpreter's internal debugging flags. This option is specified with an integer value.

For details on this option, refer to the online manual page for Perl.

NOTE
The internal debugging flags specified by -D have nothing to do with the Perl debugger, which is specified by the -d option.
The debugging flags specified by -D provide information on how Perl itself works, not on how your program works.

The -T Option: Writing Secure Programs

The -T option specifies that data obtained from the outside world cannot be used in any command that modifies your file system. This feature enables you to write secure programs for system administration tasks.

This option is only available in Perl 5. If you are running Perl 4, use a special version of Perl named taintperl. For details on taintperl, see the online documentation supplied with your Perl distribution.

The -d Option: Using the Perl Debugger

One final option that is quite useful is -d. This option tells the Perl interpreter to run your program using the Perl debugger. For a complete description of the Perl debugger and how to use it, refer to Day 21, "The Perl Debugger."

NOTE
If you are specifying the -d option, you still can use other options.

Summary

Today you learned how to specify options when you run your Perl programs. An option is a dash followed by a single letter, and optionally followed by a value to be associated with the option. Options lacking associated values can be grouped together.

You can specify options in two ways: on the command line and in the header comment. Only one option or group of options can be supplied in the header comment.

Available options include those that list the Perl version number, check your syntax, display warnings, allow single-line programs on the command line, invoke the C preprocessor, automatically read from the input files, and edit files in place.

Q&A

Q:Why can you specify only one option in the header comment?
A:This is a restriction imposed by the UNIX operating system.
Q:Why does -v display the Perl version number without running the program?
A:This option enables you to check whether the version of Perl you are running is capable of running your program. If an old copy of Perl is running on your machine, your program might not work properly.
Q:What options enable me to write a program that edits every line of a file?
A:Use the -i (edit in place) and -p (print each line) options. (These options are often used with the -e option to perform an editing command similar to those used by the UNIX sed command.)
Q:I have a program that needs to run on two or more different machines. Is there a way of writing the program that ensures that I don't have to change the program each time I change machines?
A:Here's how to carry out this task:
  1. On each machine, define a file that is to be used to store system-dependent constants. Give the file the same name on each machine. The location of the file doesn't matter as long as it's a different directory name on each type of machine.
  2. In each of these files, use #define to define one constant for each type of machine you run. For example, if you are running this program on UNIX 4.3BSD and System V machines, you could define one constant for 4.3BSD and another for System V.
  3. After you have defined the constants, set the value of each constant to 0, except for the one corresponding to the machine on which you are running. For example, on your 4.3BSD machines, set the 4.3BSD constant to 1, and set all the other constants to 0.
  4. Add a #include statement that names this file to your program.
  5. In your program, use #if and #endif to enclose any system-dependent information. For example, if a group of statements is to be executed only on 4.3BSD machines, enclose the statements in an #if and #endif pair that tests the 4.3BSD constant.
  6. When you run your program, use the -P option to specify C preprocessing, and use the -I option to tell the Perl interpreter which directory to search for this machine's copy of the file. For example, if you are running your program on a 4.3BSD machine, pass the directory that holds the 4.3BSD copy of the file to -I when you start your program.
Q:Why does the -p option override the -n option?
A:The -p option tells the Perl interpreter that you want to print each input line that you read, and the -n option tells it that you don't want to do so. These options basically contradict one another. -p overrides -n because -p is safer; if you really want -n, you can throw away the output from -p. If you really want -p and get -n, you won't get the output you want.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try and understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What do the following options do?
  2. What happens when -l and -0 are both specified, and
    a.     -l appears first?
    b.     -0 appears first?
  3. Why do the -i and -n options destroy input files when included together?
  4. How does the C preprocessor distinguish between preprocessor commands and Perl comments?
  5. How does the Perl interpreter distinguish options for the interpreter from options for the program itself?

Exercises

  1. Write a program that replaces all the newline characters in the file with colons. Use only command-line options to do this.
  2. Write a one-line program that prints only the lines containing the word .
  3. Write a one-line program that prints the second word of each input line.
  4. Write a program that prints if you pass the switch to it and that prints if you pass the switch.
  5. Write a one-line program that converts all lowercase letters to uppercase.
  6. BUG BUSTER: What is wrong with this command line?
  7. BUG BUSTER: What is wrong with this command line?


Option and Configuration Processing Made Easy

Jul 12, 2007 by Jon Allen

When you first fire up your editor and start writing a program, it’s tempting to hardcode any settings or configuration so you can focus on the real task of getting the thing working. But as soon as you have users, even if the user is only yourself, you can bet there will be things they want to choose for themselves.

A search on CPAN reveals almost 200 different modules dedicated to option processing and handling configuration files. By anyone’s standards that’s quite a lot, certainly too many to evaluate each one.

Luckily, you already have a great module right in front of you for handling options given on the command line: Getopt::Long, which is a core module included as standard with Perl. This lets you use the standard double-dash style of option names.

Using Getopt::Long

When your program runs, any command-line arguments will be in the @ARGV array. Getopt::Long exports a function, GetOptions(), which processes @ARGV to do something useful with these arguments, such as set variables or run blocks of code. To allow specific option names, pass a list of option specifiers in the call to GetOptions() together with references to the variables in which you want the option values to be stored.

As an example, consider code that defines two Boolean options. The call to GetOptions() will then assign the value 1 to the variable associated with each option if the relevant option is present on the command line.

When GetOptions() has finished processing options, any remaining arguments will remain in @ARGV for your script to handle (for example, specified filenames). If you call your script with both options followed by three filenames, then after GetOptions() has been called the @ARGV array will contain just the three filenames.
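The original listing is not reproduced here, so the following is a minimal sketch with assumed option names (--run and --verbose) and a hypothetical wrapper sub, parse_flags. The wrapper sets @ARGV locally so that the example can be run without typing a command line.

```perl
use strict;
use warnings;
use Getopt::Long;

# Parse the given argument list and report what was found.
sub parse_flags {
    local @ARGV = @_;                 # stand-in for the real command line
    my ($run, $verbose);
    GetOptions(
        'run'     => \$run,           # set to 1 if --run is present
        'verbose' => \$verbose,       # set to 1 if --verbose is present
    ) or die "could not parse options\n";
    return ($run, $verbose, [@ARGV]); # whatever is left over stays in @ARGV
}

my ($run, $verbose, $rest) = parse_flags(qw(--run --verbose file1 file2 file3));
print "run=$run verbose=$verbose remaining=@$rest\n";
```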

Types of Command-Line Options

The option specifier provided to GetOptions() controls not only the option name, but also the option type. Getopt::Long gives a lot of flexibility in the types of option you can use. It supports Boolean switches, incremental switches, options with single values, options with multiple values, and even options with hash values.

Some of the most common specifiers are:

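So, to create an option that requires a string value, give GetOptions() a specifier that consists of the option name followed by =s.

The value is required. If the user gives the option without a value, then the call to GetOptions() will print an appropriate error message and return false, so your program can stop at that point.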
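The table of specifiers and the example calls are missing from this copy. As a hedged sketch, the code below uses four specifier forms documented for Getopt::Long with assumed option names, and captures the warning that is issued when a required value is missing. The sub name parse_typed is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long;

# Parse the given argument list; return success flag, values, and messages.
sub parse_typed {
    local @ARGV = @_;
    my %opt = (verbose => 0);
    my @errors;
    local $SIG{__WARN__} = sub { push @errors, @_ };  # capture error messages
    my $ok = GetOptions(
        'name=s'   => \$opt{name},      # string value required
        'count=i'  => \$opt{count},     # integer value required
        'verbose+' => \$opt{verbose},   # incremental switch
        'color!'   => \$opt{color},     # negatable switch: --color or --no-color
    );
    return ($ok, \%opt, \@errors);
}

my ($ok, $opt) = parse_typed(qw(--name JJ --count 3 --verbose --verbose --no-color));
print "name=$opt->{name} count=$opt->{count} verbose=$opt->{verbose} color=$opt->{color}\n";

my ($failed_ok, undef, $errors) = parse_typed('--name');
print "error: $errors->[0]" unless $failed_ok;
```

A common way to use the return value in a script is to write the call as GetOptions(...) or die, so that the program stops after the error message has been printed.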

Options with Multiple Values

The option specifier consists of four components: the option name; data type (Boolean, string, integer, etc.); whether to expect a single value, a list, or a hash; and the minimum and maximum number of values to accept. To require a list of string values, build up the option specifier from the option name, the =s type, the @ marker for a list, and a repeat count such as {1,} for one or more values.

Invoking the script with several values after the option will then store a reference to an array of those values in the option's variable.

Giving a hash value to an option is very similar. Replace @ with % and on the command line give arguments as key=value pairs. The option's variable will then hold a reference to a hash built from those pairs.
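Here is a short sketch of both forms, using assumed option names and a hypothetical sub, parse_collections. It passes an array and a hash directly as destinations, which is one of the documented ways to collect multiple values; the repeat count {1,} lets one option take several values in a row.

```perl
use strict;
use warnings;
use Getopt::Long;

# Parse list-valued and hash-valued options from the given arguments.
sub parse_collections {
    local @ARGV = @_;
    my (@names, %defines);
    GetOptions(
        'name=s{1,}' => \@names,     # one or more string values after --name
        'define=s'   => \%defines,   # key=value pairs, one per --define
    ) or die "could not parse options\n";
    return (\@names, \%defines);
}

my ($names, $defines) =
    parse_collections(qw(--define os=linux --define arch=arm --name Alice Bob));
print "names: @$names\n";
print "$_=$defines->{$_}\n" for sort keys %$defines;
```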

Storing Options in a Hash

By passing a hash reference as the first argument to GetOptions(), you can store the complete set of option values in a hash instead of defining a separate variable for each one.

Option names will be hash keys, so you can refer to each value by its option name. If an option is not present on the command line, then the corresponding hash key will not be present.
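A minimal sketch of hash storage follows, again with assumed option names and a hypothetical wrapper sub named parse_into_hash.

```perl
use strict;
use warnings;
use Getopt::Long;

# Collect all option values in one hash.
sub parse_into_hash {
    local @ARGV = @_;
    my %options;
    GetOptions(\%options, 'run', 'name=s')
        or die "could not parse options\n";
    return \%options;
}

my $options = parse_into_hash(qw(--name JJ));
print "name is $options->{name}\n";
print "run was not given\n" unless exists $options->{run};
```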

Options that Invoke Subroutines

A nice feature of Getopt::Long is that, as an alternative to simply setting a variable when an option is found, you can tell the module to run any code of your choosing. Instead of giving a variable reference to store the option value, pass either a subroutine reference or an anonymous code reference. This will then be executed if the relevant option is found.

When used in this way, Getopt::Long also passes the option name and value as arguments to the subroutine.

You can still include code references in the call to GetOptions() even if you use a hash to store the option values.
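The following sketch combines hash storage with a code reference. The option names and the sub name parse_with_handler are assumptions; the handler simply records the name and value it was called with.

```perl
use strict;
use warnings;
use Getopt::Long;

# Store --name in a hash and run a handler whenever --greet is found.
sub parse_with_handler {
    local @ARGV = @_;
    my (%options, @seen);
    GetOptions(
        \%options,
        'name=s',
        'greet=s' => sub {
            my ($option, $value) = @_;     # option name and its value
            push @seen, "$option=$value";
        },
    ) or die "could not parse options\n";
    return (\%options, \@seen);
}

my ($stored, $seen) = parse_with_handler(qw(--name JJ --greet hello));
print "stored name: $stored->{name}\n";
print "handler saw: @$seen\n";
```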

Dashes or Underscores?

If you need to have option names that contain multiple words, such as a setting for “Source directory,” you have a few different ways to write them:

To give a better user experience, Getopt::Long allows option aliases to allow either format. Define an alias by using the pipe character (|) in the option specifier.

Note that if you're storing the option values in a hash, the first option name in the specifier will be the hash key, even if your user gave an alias on the command line.

If you have a lot of options, it might be helpful to generate the aliases using a function.

Running a script written this way with each format in turn shows that they are all valid.

Additionally, Getopt::Long is case-insensitive by default (for option names, not values), so your users can also use differently capitalized spellings of the option names.
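As a sketch of alias generation, the helper below turns a dashed option name into a specifier that also accepts the underscored and run-together spellings. The sub names aliases and parse_source_dir and the option name source-directory are chosen for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long;

# Build "a-b|a_b|ab" from "a-b".
sub aliases {
    my ($name) = @_;
    (my $underscored = $name) =~ tr/-/_/;
    (my $joined      = $name) =~ tr/-//d;
    return join('|', $name, $underscored, $joined);
}

# Parse the arguments and return the value stored under the first name.
sub parse_source_dir {
    local @ARGV = @_;
    my %options;
    GetOptions(\%options, aliases('source-directory') . '=s')
        or die "could not parse options\n";
    return $options{'source-directory'};
}

for my $form (qw(--source-directory --source_directory --sourcedirectory --Source-Directory)) {
    print "$form -> ", parse_source_dir($form, '/var/log'), "\n";
}
```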

Configuration Files

The next stage on from command-line options is to let your users save their settings into config files. After all, if your program expands to have numerous options it’s going to be a real pain to type them in every time.

When it comes to the format of a configuration file, there are a lot of choices, such as XML, INI files, and the Apache httpd.conf format. However, all of these formats share a couple of problems. First, your users now have two things to learn: the command-line options and the configuration file syntax. Second, even though many CPAN modules are available to parse the various config file formats, you still must write the code in your program to interact with your chosen module’s API to set whatever variables you use internally to store user settings.

Getopt::ArgvFile to the Rescue

Fortunately, someone out there in CPAN-land has the answer (you can always count on the Perl community to come up with innovative solutions). Getopt::ArgvFile tackles both of these problems, simplifying the file format and the programming interface in one fell swoop.

To start with, the file format used by Getopt::ArgvFile is extremely easy for users to understand. Config settings are stored in a plain text file that holds exactly the same directives that a user would type on the command line. Instead of typing the same options on every run, your user can put them in a config file and then run the script without arguments for instant user gratification with no steep learning curve.

Now to the clever part. Getopt::ArgvFile itself doesn't actually care about the contents of the config file. Instead, it makes it appear to your program that all the settings were actually options typed on the command line, the processing of which you've already covered with Getopt::Long. As well as saving your users time by not making them learn a new syntax, you've also saved yourself time by not needing to code against a different API.

The most straightforward method of using Getopt::ArgvFile involves simply including the module in a use statement:

A program called myscript that contains this code will search the user's home directory (whatever the environment variable HOME is set to) for a config file called .myscript and extract the contents ready for processing by GetOptions().

Here’s a complete example:

Save this as hello, then run the script with and without a command-line option:

Now, create a settings file called .hello in your home directory containing the option. Remember to double quote the value if you want to include spaces.

Running the script without any arguments on the command line will show that it loaded the config file, but you can also override the saved settings by giving the option on the command line as normal.
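The example script is missing from this copy, and Getopt::ArgvFile is not a core module, so the sketch below imitates the idea using only core modules instead of calling the module itself. It reads options from a config file, places them in front of the command-line arguments, and lets GetOptions() process both. The sub names, the --name option, and the scratch file are all assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Getopt::Long;
use Text::ParseWords qw(shellwords);

# Return the options stored in $path, split the way a shell would split them.
sub options_from_file {
    my ($path) = @_;
    return () unless -r $path;
    open(my $fh, '<', $path) or die "open $path: $!";
    my @lines = grep { !/^\s*(#|$)/ } <$fh>;   # skip comments and blank lines
    close($fh);
    chomp @lines;
    return shellwords(@lines);
}

# Build a greeting from the config file plus any command-line arguments.
sub greeting {
    my ($config, @command_line) = @_;
    # File options go first, so options on the command line override them.
    local @ARGV = (options_from_file($config), @command_line);
    my $name = 'world';
    GetOptions('name=s' => \$name) or die "could not parse options\n";
    return "Hello, $name!";
}

my $dir    = tempdir(CLEANUP => 1);
my $config = "$dir/.hello";                    # stand-in for a file in the home directory
open(my $fh, '>', $config) or die "create $config: $!";
print {$fh} qq{--name "Jon Allen"\n};
close($fh);

print greeting($config), "\n";
print greeting($config, '--name', 'JJ'), "\n";
```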

Advanced Usage

In many cases the default behaviour invoked by loading the module will be all you need, but Getopt::ArgvFile can also cater to more specific requirements.

User-Specified Config Files

Suppose your users want to save different sets of options and specify which one to use when they run your program. This is possible using the @ directive on the command line:

Note that there's no extra programming required to use this feature; handling @ options is native to Getopt::ArgvFile.

Changing the Default Config Filename or Location

Depending on your target audience, the naming convention offered by Getopt::ArgvFile for config files might not be appropriate. Using a dotfile (.myscript) will render your user's config file invisible in his file manager or when listing files at the command prompt, so you may wish to use a name like myscript.conf instead.

Again, it may also be helpful to allow for default configuration files to appear somewhere other than the user’s home directory, for example, if you need to allow system-wide configuration.

A further consideration here is PAR, the tool for creating standalone executables from Perl programs. PAR lets you include data files as well as Perl code, so you can bundle a default settings file with your program, which will then be available to your script at run time.

I mentioned earlier that Getopt::ArgvFile can load arbitrary config files if the filename appears with the @ directive on the command line. Essentially, what the module does when loaded in the straightforward way is to prepend @ARGV with an @ directive that names the default config file, then resolve all @ directives, leaving @ARGV with the contents of the files. This means that running the script as:

is basically equivalent to writing:

To load other config files, Getopt::ArgvFile supports disabling the automatic @ARGV processing and triggering it later. With a little manipulation of @ARGV first, you can make:

equivalent to:

which will load the set of config files in the correct priority order.
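To show the priority idea without depending on the module, here is a hedged sketch that resolves @ directives by hand: each argument of the form @filename is replaced by the options stored in that file, so files named earlier are overridden by files named later and by options typed afterwards. The sub name expand_at_directives, the file names, and the --name option are assumptions, and this is not the module's own implementation.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Getopt::Long;
use Text::ParseWords qw(shellwords);

# Replace each "@file" argument with the options stored in that file.
sub expand_at_directives {
    my @expanded;
    for my $arg (@_) {
        if ($arg =~ /^\@(.+)$/ && -r $1) {
            my $path = $1;
            open(my $fh, '<', $path) or die "open $path: $!";
            chomp(my @lines = <$fh>);
            close($fh);
            # Files may themselves contain @ directives in this sketch.
            push @expanded, expand_at_directives(shellwords(@lines));
        }
        else {
            push @expanded, $arg;
        }
    }
    return @expanded;
}

my $dir = tempdir(CLEANUP => 1);
for my $pair (["system.conf", "--name system"], ["user.conf", "--name user"]) {
    open(my $fh, '>', "$dir/$pair->[0]") or die "create $pair->[0]: $!";
    print {$fh} "$pair->[1]\n";
    close($fh);
}

# System-wide file first, then the user's file: the later value wins.
local @ARGV = expand_at_directives("\@$dir/system.conf", "\@$dir/user.conf");
my $name;
GetOptions('name=s' => \$name) or die "could not parse options\n";
print "name=$name\n";
```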

All you need to do to enable this feature is change the use statement to read:

Loading the module in this way tells Getopt::ArgvFile to export the function argvFile(), which your program needs to call to process the @ directives, and also prevents any automated @ARGV processing from occurring.

Here’s an example that first loads a config file from the application bundle (if packaged by PAR) and then from the directory containing the application binary:

You can also use this technique together with File::HomeDir to access the user's application data directory in a cross-platform manner, so that the location of the config file conforms to the conventions set by the user's operating system.

Summary

Getopt::Long provides an easy to use, extensible system for processing command-line options. With the addition of Getopt::ArgvFile, you can seamlessly handle configuration files with almost no extra coding. Together, these modules should be first on your list when writing scripts that need any amount of configuration.
