[GLLUG] tr

David Lee Lambert lamber45@egr.msu.edu
Mon, 30 Sep 2002 21:19:37 -0400 (EDT)


On Mon, 30 Sep 2002, Matt Graham wrote:

> tr doesn't do regexp matching the way sed and perl do.  tr in this case
> is matching either '\' or 'n' and replacing either one with ' '.  This
> is not what you want.  I think you want this:
>
> perl -pe 's#\\n# #g' < dumpfile.txt > newfile.txt
>
> Perl handles "weird" characters like 0x0A and 0x0D in regexps in a more
> consistent way than sed does.  However, if this is a really large file
> (over a few hundred M) that doesn't have any newlines in it at all,

Actually, the perl manpages recommend reading in an entire file at once,
for speed, under some conditions. It sounds like Paul's program just
stuck the '\n' sequences inside data fields, in which case replacing them
with spaces is right; if they mark the ends of records instead, he'd be
better off trying something like

  perl -pe 's.\\n.\n.g' < dumpfile.txt > newfile.txt

Now, it's true that I don't quite know how perl does s///g internally,
and it might be O(N^2) in the number of characters in a line. If that is
the case, a 100 MB file with no real newlines (so perl -p sees it as one
line) and a '\n' every 80 characters could take a couple of months to
process. However, it should be possible to do this in a single pass
through the file.

I ought to go look at the Perl RE compiler and see how it works.

-- 
DLL
http://www.cse.msu.edu/~lamber45/