databases, optimization and open source
Mark Szidik
szidikm@mlc.lib.mi.us
Thu, 22 Mar 2001 16:02:49 -0500 (EST)
I admin a commercial product here that happens to use postgreSQL for its
authorization database. The package comes with compiled java
application (no source supplied) to load data into postgres. I have no
idea on how it was loading the data, but it was clear that they were
not using transactions. It took about 8 hours to load 44K records!
I was so disgusted with the wait that I got the table layout and wrote
my own Python script to format the data into SQL insert
statements. I setup transactions for every 1000 records. My latest
load of 100K records took only 2 minutes 17 seconds!
That is one heck of a perfomance inprovement. I also found out that
they created no indexes on the main table, so I added my own. Now user
lookups are quite a bit faster.
I am lucky that I could get around their scripts, its rare when this
happens with commercial software. I think this is yet another reason to
use open-source products: If something doesn't work in an open-source
package, you can always change it.
-Mark