databases, optimization and open source

Mark Szidik szidikm@mlc.lib.mi.us
Thu, 22 Mar 2001 16:02:49 -0500 (EST)


I admin a commercial product here that happens to use postgreSQL for its
authorization database.  The package comes with compiled java
application (no source supplied) to load data into postgres.  I have no
idea on how it was loading the data, but it was clear that they were
not using transactions.  It took about 8 hours to load 44K records!

I was so disgusted with the wait that I got the table layout and wrote
my own Python script to format the data into SQL insert
statements.  I setup transactions for every 1000 records.  My latest
load of 100K records took only 2 minutes 17 seconds!

That is one heck of a perfomance inprovement.  I also found out that
they created no indexes on the main table, so I added my own.  Now user
lookups are quite a bit faster.

I am lucky that I could get around their scripts, its rare when this
happens with commercial software.  I think this is yet another reason to
use open-source products:  If something doesn't work in an open-source
package, you can always change it.

-Mark