October 04, 2005

Wow, that is expensive

I've been working on $CONSULTING_GIG (ex-$WORK_GIG)'s compiler today, digging into a performance issue that it's got. It doesn't show in too many places -- most of the forms are really snappy -- but when it does... ouch.

The problem, in this case, is one of constants.

Now, constants are one of those facts of life, and that's fine. I rather like them, and when given a bit of a chance there's all sorts of interesting things one can do with a program in a compiler. Other than some simple constant folding, I didn't do any of them, mostly for time reasons.

The way I implemented constants was pretty straightforward, since that's easiest to debug. Because all the binary operations had to be overloaded, and I didn't want an explosion of overload functions, the compiler generates PMCs of the appropriate language type for the constants, something that is certainly quite common -- just because the source says:

a + 4

you don't necessarily want a low-level integer for that 4, since you may have to allow people to overload operations in the integer class and have that affect integer constants. (You can do this in ruby, for example) This is no big deal.

What I'd done is taught the compiler to emit the appropriate code to create a new PMC and give it a value before each use, so the above statement would read in PIR something like:

I16 = find_type "Int32"
P16 = new I16
P16 = 4
P17 = global 'a'
P18 = new I16
P18 = P16 + P17

and go on from there. No big deal. Except... my victim program of the moment was taking something like two seconds a line item to pull up and display data, which was nasty. Yeah, sure, that was accessing a Postgres database hosted on a windows box over an 802.11b wireless link, but still, nasty. (It was 5 seconds a line item before I optimized the generated SQL -- Postgres has a hard time with "field LIKE $1" and index usage, since the query optimizer gets the query to optimize but not the value of the placeholder variables, so things get messy, but that's a separate issue)

Since nothing big and obvious was taking all the time, it was time to start in on the hopeful stuff and see where we went. In this case, constant creation seemed an obvious spot to look. Besides the cost of creating all the new PMCs every time, there's the disposal cost to allow for as well, since all those temp values need to be cleaned up by the garbage collector. Which it does, and reasonably well, but it's a lot faster to not do something than to do it.

The worry was that creation would be cheap enough that it was less than the cost of looking the variables up in the global namespace, which is a definite worry as parrot's hashing code has been sub-swell in the past. Still, it only took a day to make the changes and test everything out to make sure it worked.

Net result? Reusing constants rather than recreating them cut the time the victim program took to load up its line items from 17 seconds down to 11. Not bad, all things considered, and definitely surprising. I'd expected to shave off a second or so, maybe, which still would've been worth the effort, but not 6. That was nice.

So, net thing learned? Pre-generating constants is often worth it. (Now I'd love to find out what I could shave off the run time by changing all those by-name lookups to by-offset lookups...)

Posted by Dan at October 4, 2005 09:09 PM | TrackBack (0)

So...what was the solution to the Postgres problem? :)

Posted by: John at October 4, 2005 11:51 PM

Fixing the postgres problem was kind of fun.

Postgres can't do any data-based optimization with LIKE clauses if they use placeholders, so:

foo LIKE $1

generally won't use an index as part of its query plan. (It might some times, I haven't looked, so I can't say never. Just never for me) Unfortunately this language has a rule where if you're doing certain types of lookups, string fields are considered wildcarded if the string is shorter than the field in the database. That is, if the DB field foo is four characters, and I'm looking up with a three character string, it needs to be a wildcard lookup, and hence use LIKE.

The language also mandates that empty fields (strings that are all spaces, or numeric fields that are zero) don't count when doing a query, so the DB access library already processes the SQL before sending it on, to remove any unneeded terms in the WHERE clause. What I did was modify this logic to also check the the length of the data in the placeholder array, so that if the placeholder data was the same length as the field in the database the "foo LIKE $1" clause was changed to a "foo = $1" clause.

Postgres likes equality a lot better than it does LIKEs, so that got index usage and much better speed. Queries that were taking 3.3 seconds with the un-preprocessed SQL took 0.03 seconds with the pre-processed SQL. (It's a big table, and even with caching those full table scans take some time)

And hey, who am I to pass up a two order of magnitude increase in speed, right? :)

This language's database interface library does a lot of SQL mangling on the fly. At some point the cost of processing the SQL will outweigh the gains from processing, but it's definitely worth it for wins like this. (Even with the pain that is text processing in assembly language)

Posted by: Dan at October 5, 2005 09:37 AM

"That is, if the DB field foo is four characters, and I'm looking up with a three character string,"

You can add casts for domains. I such cases
by coercing foo= ? will result in calling
the correct cast which can add spaces
- if that is your thang :-)

Posted by: Jacqui Caren at October 24, 2005 06:00 AM

http://www.urban-armory.com/wwwboard/messages/5173.html complimentwhosewondered

Posted by: abilities at April 17, 2006 11:03 AM

http://erotic.redclouds.com/medic/messages/38633.html complimentwhosewondered

Posted by: snifter at June 1, 2006 02:11 AM