[GLLUG] Shell Scripting Question
Matt Graham
danceswithcrows@usa.net
Wed, 12 Mar 2003 08:54:13 -0500
On Wednesday 12 March 2003 07:34, after a long battle with technology,
Mike Rambo wrote:
> "Melson, Paul" wrote:
> > cat logfile |sed -e 's/|/\`/4' |sed -e 's/|/\`/3' |sed -e
> > 's/|/\`/2' |sed -e 's/|/\`/1' |sed -e 's/|/\|/g' | awk -F\`
> > '{print("<tr><td>",$2,"</td><td>",$5,"</td></tr>")}'
>
> Now there (sed) is a potential subject for a GLLUG presentation!
> Every time I've looked at sed my eyes glaze over and my brain
> segfaults.
Paul didn't even use any of the *complicated* parts of sed. Let's
dissect this example:
sed -e 's/|/\`/4' \
    -e 's/|/\`/3' \
    -e 's/|/\`/2' \
    -e 's/|/\`/1' \
    -e 's/|/\|/g'
Each -e argument is a sed command, and each of these commands contains
a regular expression. Regular expressions are complicated, but these
look more complicated than they really are. The first one,
's/|/\`/4' , means: Substitute the 4th occurrence of | on each input
line with ` . s means substitute, or 'find and replace'. The part
between the first and second '/' is the thing to find. The part
between the second and third '/' is the replacement. The ` is
preceded with a \ so that it is taken literally; inside single quotes
the shell does not treat ` specially, so the \ is a precaution rather
than a requirement. Finally, the things after the last / are flags:
a number N for 'only apply this to the Nth occurrence of the pattern
on the line', 'g' for 'apply this to every occurrence of the pattern',
'i' for case-insensitive matching (a GNU sed extension), and more
besides.
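To see the flags in action, here is a small sketch you can paste into a shell. The sample line is made up for illustration (it is not from Paul's logfile), and I use ':' as the replacement instead of ` so the quoting stays out of the way.

```shell
# Made-up sample record with six |-separated fields.
line='a|b|c|d|e|f'

# A numeric flag replaces only the Nth match on each line.
second=$(printf '%s\n' "$line" | sed -e 's/|/:/2')
echo "$second"        # a|b:c|d|e|f

# The g flag replaces every match on each line.
all=$(printf '%s\n' "$line" | sed -e 's/|/:/g')
echo "$all"           # a:b:c:d:e:f

# Counting down (4, 3, 2, 1) keeps the numbering stable: each substitution
# leaves the earlier |s untouched, so the next count still points at the
# separator you meant. Counting up would shift the numbering after each step.
first_four=$(printf '%s\n' "$line" |
    sed -e 's/|/:/4' -e 's/|/:/3' -e 's/|/:/2' -e 's/|/:/1')
echo "$first_four"    # a:b:c:d:e|f
```

This is presumably why Paul's pipeline runs the substitutions in the order 4, 3, 2, 1.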
The number of things you can do with regular expressions is absolutely
amazing. Simple:
s/bob/fred/ (replace first 'bob' with 'fred')
s/bob/fred/g (replace every 'bob' with 'fred')
s/bob/fred/gi (replace every 'bob','BOB','BoB','bOb'... with 'fred')
s/bob.*bill/fred/ (replace the first string that matches 'bob' plus an
arbitrary number of any characters up to the string 'bill' with 'fred'.
This would replace 'bob joe bill' but not 'bob tom'. The '.*' is
greedy, so the match runs to the last 'bill' on the line. In a regular
expression, '.*' is like the shell glob '*' (matches any number of
characters), while '.' is like the shell glob character '?' (matches
one character).)
s/^/>/ (replace the beginning of a line with '>'. This really
adds a '>' on to the beginning of a line.)
s/$/./ (replace the end of a line with '.'. This really adds a '.'
to the end of a line.)
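The simple examples above can be tried directly. This is only a sketch with made-up input lines; I have left out the 'i' flag because it is a GNU extension and may not work in every sed.

```shell
# Made-up input line for illustration.
s='bob and Bob and bob'

# Without g, only the first match on the line is replaced.
first=$(printf '%s\n' "$s" | sed -e 's/bob/fred/')
echo "$first"     # fred and Bob and bob

# With g, every (case-sensitive) match is replaced.
every=$(printf '%s\n' "$s" | sed -e 's/bob/fred/g')
echo "$every"     # fred and Bob and fred

# .* is greedy: the match runs to the LAST 'bill' on the line.
greedy=$(printf '%s\n' 'bob joe bill and bill too' | sed -e 's/bob.*bill/fred/')
echo "$greedy"    # fred too

# ^ and $ match positions, not characters, so these only add text.
wrapped=$(printf '%s\n' 'hello' | sed -e 's/^/>/' -e 's/$/./')
echo "$wrapped"   # >hello.
```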
More complex:
s/(\d\d)-(\d\d)-(\d\d\d\d)/\3-\1-\2/
Converts a date in American format, like 06-26-1976, to a date in ISO
standard format, like 1976-06-26 . Lots of new things here. The '\d'
matches any digit [0-9]. Each set of ()s creates a 'group'. The
regular expression engine stores whatever matched each group in
internal variables, so you can use them later.
So, if we feed this expression "06-26-1976", "06" matches the first
group (\d\d). This is stored in variable 1. "26" matches the second
(\d\d), and is stored in variable 2. "1976" matches (\d\d\d\d) and is
stored in variable 3.
Then, in the second ("replace") part of the regular expression, we
replace whatever we "found" with the contents of variable 3 ("\3"),
then a dash, then the contents of variable 1, then another dash, then
the contents of variable 2. End result: 1976-06-26 .
NOTE: The above regular expression follows Perl syntax. Plain sed
requires \ before each ( and ) to get the grouping right, and it does
not understand \d; use [0-9] instead.
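For anyone who wants to run the date conversion through sed itself, here is a sketch of how the expression might be written. The first form uses basic regular expressions, which any sed should accept. The second assumes a sed that supports the -E option for extended regular expressions (GNU and BSD sed both do, as far as I know).

```shell
date_us='06-26-1976'

# Basic regular expressions (plain sed): groups are \( \), digits are [0-9].
iso_bre=$(printf '%s\n' "$date_us" |
    sed -e 's/\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9][0-9][0-9]\)/\3-\1-\2/')
echo "$iso_bre"   # 1976-06-26

# Extended regular expressions (sed -E, an assumption about your sed):
# bare ( ) create groups and {n} repeats the preceding item n times.
iso_ere=$(printf '%s\n' "$date_us" |
    sed -E -e 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/')
echo "$iso_ere"   # 1976-06-26
```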
I hope this was useful; if I have made egregious errors, I'm sure
someone will point them out shortly.
--
Three disks for /usr/bin under the Sun
Seven for the workers paging through their E-mails
Nine disks for hackers getting programs to run
One for the Sys Admin when the system fails
One RAID to rule them all, One RAID to bind them
One RAID to hold the files and in the darkness grind them
In the land of Server where the Unix lies....
There is no Darkness in Eternity/But only Light too dim for us to see