As well as identifying regular expressions, Perl can make substitutions based on those matches. This is done with the s function, which is designed to mimic the way substitution is done in the vi text editor. Once again the match operator =~ is used, and once again, if it is omitted, the substitution is applied to the $_ variable.
To replace an occurrence of london by London in the string $sentence we use the expression
$sentence =~ s/london/London/
and to do the same thing with the $_ variable just
s/london/London/
Notice that the two regular expressions (london and London) are surrounded by a total of three slashes. The result of this expression is the number of substitutions made, so it is either 0 (false) or 1 (true) in this case.
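As a minimal sketch of the return value described above, the following uses made-up sample sentences (they are not from the original text) and stores the result of each substitution in a variable so it can be tested.

```perl
use strict;
use warnings;

# Sample text invented for illustration.
my $sentence = "I live in london, near london bridge";

# Without any option, only the first occurrence is replaced.
my $changed = ($sentence =~ s/london/London/);

print "$sentence\n";
print "Substitution made\n" if $changed;

# When nothing matches, the result is false and the string is unchanged.
my $other = "I live in paris";
my $none  = ($other =~ s/london/London/);
print "No substitution made\n" unless $none;
```

Note that the second london in the first sentence is left alone, because only one substitution is attempted.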
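This replaces only the first occurrence of the string. To replace every occurrence, follow the last slash with a g (for "global"):
s/london/London/g
which of course works on the $_ variable. Again the expression returns the number of substitutions made, which is 0 (false) or something greater than 0 (true).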
If we want to also replace occurrences of lOndon, lonDON, LoNDoN and so on then we could use
s/[Ll][Oo][Nn][Dd][Oo][Nn]/London/g
but an easier way is to use the i option (for "ignore case"). The expression
s/london/London/gi
will make a global substitution ignoring case. The i option is also used in the basic /.../ regular expression match.
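To illustrate the g and i options together, here is a small sketch on an invented string. It keeps the number of substitutions in a variable, here called $count, which is a name chosen for this example.

```perl
use strict;
use warnings;

# Sample text invented for illustration.
$_ = "london, LONDON and lOnDoN";

# g replaces every match, i ignores case;
# the result is the number of replacements made.
my $count = s/london/London/gi;

print "$_\n";
print "$count substitutions\n";
```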
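Parts of a pattern enclosed in parentheses are remembered: within the same regular expression or substitution they are available as \1,...,\9, and in the rest of the program as $1,...,$9. For example,
$_ = "Lord Whopper of Fibbing";
s/([A-Z])/:\1:/g;
print "$_\n";
will replace each upper case letter by that letter surrounded by colons. It will print :L:ord :W:hopper of :F:ibbing. The variables $1,...,$9 are read-only variables; you cannot alter them yourself.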
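The same substitution can also be written with $1 in the replacement part, which is the form Perl itself suggests when warnings are enabled. The sketch below does that on a copy of the string (the variable names $name, $marked and $first_capital are chosen for this example) and then reads $1 after an ordinary match.

```perl
use strict;
use warnings;

my $name = "Lord Whopper of Fibbing";

# Copy $name into $marked, then substitute on the copy.
# $1 in the replacement refers to the text captured by ([A-Z]).
(my $marked = $name) =~ s/([A-Z])/:$1:/g;
print "$marked\n";

# After a successful match, $1 holds the first captured text.
my $first_capital = "";
if ($name =~ /([A-Z])/) {
    $first_capital = $1;
}
print "First capital: $first_capital\n";
```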
As another example, the test
if (/(\b.+\b) \1/)
{
	print "Found $1 repeated\n";
}
will identify any words repeated. Each \b represents a word boundary and the .+ matches any non-empty string, so \b.+\b matches anything between two word boundaries. This is then remembered by the parentheses and stored as \1 for regular expressions and as $1 for the rest of the program.
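As a rough sketch of this test in use, the following runs the same pattern over two invented lines and collects whatever $1 holds for each line that matches. The array @found is a name introduced for this example.

```perl
use strict;
use warnings;

my @found;

# Sample lines invented for illustration.
for my $line ("Paris in the the spring", "No repeats here") {
    # \1 must match exactly the text captured by the parentheses.
    if ($line =~ /(\b.+\b) \1/) {
        push @found, $1;
        print "Found $1 repeated\n";
    }
}
```

Only the first line contains a repetition, so only one entry is collected.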
The following swaps the first and last characters of a line in the $_ variable:
s/^(.)(.*)(.)$/\3\2\1/
The ^ and $ match the beginning and end of the line. The \1 code stores the first character; the \2 code stores everything else up to the last character, which is stored in the \3 code. Then the whole line is replaced with \1 and \3 swapped round.
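Here is a small sketch of the swap on an invented string. It writes the replacement with $3$2$1, which behaves the same as \3\2\1 here and avoids a warning under use warnings. It also shows an assumption worth checking: a one-character line cannot match, because the pattern needs a separate first and last character.

```perl
use strict;
use warnings;

# Sample text invented for illustration.
$_ = "abcde";
s/^(.)(.*)(.)$/$3$2$1/;
print "$_\n";

# A single character has no distinct first and last character,
# so the pattern fails and the string is left alone.
my $single  = "x";
my $swapped = ($single =~ s/^(.)(.*)(.)$/$3$2$1/);
print "Nothing to swap in '$single'\n" unless $swapped;
```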
After a match, you can use the special read-only variables $` and $& and $' to find what was matched before, during and after the search. So after
$_ = "Lord Whopper of Fibbing";
/pp/;
all of the following are true. (Remember that eq is the string-equality test.)
$` eq "Lord Who";
$& eq "pp";
$' eq "er of Fibbing";
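The three variables can be checked directly with a short sketch like the one below. It copies them into ordinary variables straight after the match (the names $before, $matched and $after are chosen for this example), since a later successful match would overwrite them.

```perl
use strict;
use warnings;

$_ = "Lord Whopper of Fibbing";

my ($before, $matched, $after) = ("", "", "");
if (/pp/) {
    # Save the text before, during and after the match.
    ($before, $matched, $after) = ($`, $&, $');
}

print "[$before][$matched][$after]\n";
```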
Finally, on the subject of remembering patterns, it's worth knowing that variables are interpolated inside the slashes of a match or a substitution. So
$search = "the";
s/$search/xxx/g;
will replace every occurrence of the with xxx. If you want to replace every occurrence of there then you cannot do s/$searchre/xxx/ because this will be interpolated as the variable $searchre. Instead you should put the variable name in curly braces so that the code becomes
$search = "the"; s/${search}re/xxx/;
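The difference between the two forms can be seen side by side in the following sketch, which applies each substitution to its own copy of an invented sentence. The variable names $text, $all and $there are chosen for this example.

```perl
use strict;
use warnings;

my $search = "the";

# Sample text invented for illustration.
my $text = "the cat sat over there, then left";

# $search is interpolated, so this replaces every "the".
(my $all = $text) =~ s/$search/xxx/g;

# The braces end the variable name, so the pattern is "there".
(my $there = $text) =~ s/${search}re/xxx/;

print "$all\n";
print "$there\n";
```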
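The tr function translates character by character. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence, and returns the number of characters translated.
$sentence =~ tr/abc/edf/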
Most of the special RE codes do not apply in the tr function. For example, the statement here counts the number of asterisks in the $sentence variable and stores that in the $count variable.
$count = ($sentence =~ tr/*/*/);
However, the dash is still used to mean "between". This statement converts $_ to upper case.
tr/a-z/A-Z/;
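The three uses of tr shown above can be tried together in a short sketch. The sample strings and the variable names $shout, $word and $translated are invented for this example.

```perl
use strict;
use warnings;

# Counting: each * is replaced by itself, so the string is
# unchanged and the result is the number of asterisks.
my $sentence = "a * b ** c";
my $count = ($sentence =~ tr/*/*/);
print "$count asterisks\n";

# Ranges: the dash means "between", so this upper-cases a to z.
my $shout = "perl is fun";
$shout =~ tr/a-z/A-Z/;
print "$shout\n";

# Character-by-character translation: a->e, b->d, c->f.
my $word = "cabbage";
(my $translated = $word) =~ tr/abc/edf/;
print "$translated\n";
```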
023 Amp, James Wa(tt), Bob Transformer, etc. These pion(ee)rs conducted many
Try to get it so that all pairs of letters are in parentheses, not just the first pair on each line.
For a slightly more interesting program you might like to try the following. Suppose your program is called countlines. Then you would call it with
./countlines
However, if you call it with several arguments, as in
./countlines first second etc
then those arguments are stored in the array @ARGV. In the above example $ARGV[0] is first, $ARGV[1] is second and $ARGV[2] is etc. Modify your program so that it accepts one argument and counts only those lines containing that string. It should also put occurrences of this string in parentheses. So
./countlines the
will output something like this line among others:
019 But (the) greatest Electrical Pioneer of (the)m all was Thomas Edison, who