[GLLUG] Shell Scripting Question

Matt Graham danceswithcrows@usa.net
Wed, 12 Mar 2003 08:54:13 -0500


On Wednesday 12 March 2003 07:34, after a long battle with technology, 
Mike Rambo wrote:
> "Melson, Paul" wrote:
> > cat logfile |sed -e 's/|/\`/4' |sed -e 's/|/\`/3' |sed -e
> > 's/|/\`/2' |sed -e 's/|/\`/1' |sed -e 's/|/\|/g' | awk -F\`
> > '{print("<tr><td>",$2,"</td><td>",$5,"</td></tr>")}'
>
> Now there (sed) is a potential subject for a GLLUG presentation!
> Everytime I've looked at sed my eyes glaze over and my brain
> segfaults.

Paul didn't even use any of the *complicated* parts of sed.  Let's 
dissect this example:

sed -e 's/|/\`/4' \
    -e 's/|/\`/3' \
    -e 's/|/\`/2' \
    -e 's/|/\`/1' \
    -e 's/|/\|/g'

Each -e argument is a sed command, and each of these commands contains 
a regular expression.  Regular expressions are complicated, but these 
look more complicated than they really are.  The first one, 
's/|/\`/4' , means:  Substitute the 4th occurrence of | on each line of 
the input with ` .  s means substitute, or 'find and replace'.  The 
part between the first and second '/' is the thing to find.  The part 
between the second and third '/' is the replacement.  The ` is preceded 
with a \ because ` is a special character to the shell (inside single 
quotes the \ is not strictly needed).  Finally, the things after the 
last / are special flags:  a number N for 'only apply this to the Nth 
occurrence of the pattern', 'g' for 'apply this to every occurrence of 
the pattern', 'i' for case-insensitive matching (a GNU sed extension), 
and more besides.
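
To make the numbered flag concrete, here is a small sketch.  The sample 
line and its field layout are made up for illustration; they are not 
from Paul's actual logfile.  It also shows why the substitutions run 
from 4 down to 1:  each one removes a | from the line, so counting 
downward keeps the remaining occurrence numbers stable.

```shell
# Hypothetical input line (an assumption, not the real log format).
line='a|b|c|d|e|f'

# Replace only the 4th | on the line with a backtick.
fourth=$(printf '%s\n' "$line" | sed -e 's/|/`/4')
printf '%s\n' "$fourth"

# Replace the first four | characters, highest number first, so that
# earlier substitutions do not renumber the matches still to come.
first_four=$(printf '%s\n' "$line" |
    sed -e 's/|/`/4' -e 's/|/`/3' -e 's/|/`/2' -e 's/|/`/1')
printf '%s\n' "$first_four"
```

The fifth | is left alone, which is what lets awk later split on ` 
while any remaining | characters stay inside the last field.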

The number of things you can do with regular expressions is absolutely 
amazing.  Simple:

s/bob/fred/      (replace first 'bob' with 'fred')
s/bob/fred/g     (replace every 'bob' with 'fred')
s/bob/fred/gi    (replace every 'bob','BOB','BoB','bOb'... with 'fred')
s/bob.*bill/fred/   (replace the first string on a line that matches 
'bob' plus an arbitrary number of any characters up to the string 
'bill' with 'fred'.  This would replace 'bob joe bill' but not 
'bob tom'.  In a regular expression, '.*' is like the shell glob '*' 
(matches any number of characters, including none), while '.' is like 
the shell glob character '?' (matches one character).)
s/^/>/         (replace the beginning of a line with '>'.  This really 
adds a '>' on to the beginning of a line.)
s/$/./      (replace the end of a line with '.'.  This really adds a '.' 
to the end of a line.)
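
The simple expressions above can be tried directly from a shell.  This 
is only a sketch with made-up sample strings; the 'gi' variant is left 
out because the i flag is not available in every sed.

```shell
sample='bob and Bob met bob'

# First match only, then every match on the line.
first=$(printf '%s\n' "$sample" | sed 's/bob/fred/')
every=$(printf '%s\n' "$sample" | sed 's/bob/fred/g')

# 'bob', anything in between, then 'bill'.
span=$(printf '%s\n' 'bob joe bill' | sed 's/bob.*bill/fred/')
nospan=$(printf '%s\n' 'bob tom' | sed 's/bob.*bill/fred/')

# Anchors: ^ is the start of the line, $ is the end.
quoted=$(printf '%s\n' 'hello' | sed 's/^/>/')
ended=$(printf '%s\n' 'hello' | sed 's/$/./')

printf '%s\n' "$first" "$every" "$span" "$nospan" "$quoted" "$ended"
```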

More complex:

s/(\d\d)-(\d\d)-(\d\d\d\d)/\3-\1-\2/

Converts a date in American format, like 06-26-1976, to a date in ISO 
standard format, like 1976-06-26 .  Lots of new things here.  The '\d' 
matches any digit [0-9].  Each set of ()s creates a 'group'.  The 
regular expression engine stores whatever matched each group in 
internal variables, so you can use them later.  

So, if we feed this expression "06-26-1976", "06" matches the first 
group (\d\d).  This is stored in variable 1.  "26" matches the second 
(\d\d), and is stored in variable 2.  "1976" matches (\d\d\d\d) and is 
stored in variable 3.

Then, in the second ("replace") part of the regular expression, we 
replace whatever we "found" with the contents of variable 3 ("\3"), 
then a dash, then the contents of variable 1, then another dash, then 
the contents of variable 2.  End result:  1976-06-26 .

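NOTE:  The above regular expression follows Perl syntax.  In sed's 
basic syntax, grouping is written \( and \) rather than ( and ), unless 
your sed has an extended-regex option such as -E.  sed also does not 
understand \d, so use [0-9] instead.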
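
For reference, here is roughly how the date conversion might look in 
sed itself.  This is a sketch, assuming a sed that accepts -E for 
extended regular expressions (GNU and BSD sed both do); the second 
form uses only basic syntax with escaped parentheses.  In both, [0-9] 
stands in for Perl's \d.

```shell
us_date='06-26-1976'

# Extended syntax: plain ( ) for groups, {n} for repetition counts.
iso_ere=$(printf '%s\n' "$us_date" |
    sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/')

# Basic syntax: groups are written \( and \).
iso_bre=$(printf '%s\n' "$us_date" |
    sed 's/\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9][0-9][0-9]\)/\3-\1-\2/')

printf '%s\n' "$iso_ere" "$iso_bre"
```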

I hope this was useful; if I have made egregious errors, I'm sure 
someone will point them out shortly.

-- 
   Three disks for /usr/bin under the Sun
   Seven for the workers paging through their E-mails
   Nine disks for hackers getting programs to run
   One for the Sys Admin when the system fails
      One RAID to rule them all, One RAID to bind them
      One RAID to hold the files and in the darkness grind them
   In the land of Server where the Unix lies....
There is no Darkness in Eternity/But only Light too dim for us to see