Friday, November 7, 2008

Destructive String Operations Suck

Some of you may know that I started maintaining a 20+ year old assembler sometime in March this year. Sadly enough, most of my time so far has been spent on refactoring. The reason is simple: I often can't fix a bug or add a feature the users want because I run into some impenetrable wall of code that nobody in their right mind could possibly want to grok. :-(

So I bite the bullet for everybody else, try to understand the code without going nuts, and then rewrite it at least somewhat more cleanly. Drudgery for sure, but not without merit since I keep learning quite a few things in the process. Yes, I still learn new things about programming! Nobody is ever really done learning this stuff, even when you teach it year in and year out.

In my recent refactoring travels, I have finally recognized The One Big Truth about string operations: They should never be destructive. Not ever!

Case in point, I had a cute little function char *strlower(char*) that converted a string to lower case destructively, mostly because it's simpler to write the code that way. Of course, as I started putting more and more const into the code as part of my refactoring, I kept having to work around this function in various ways. Today it finally got too annoying, so the function lives no more. 8-)
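To make the annoyance concrete, here is a rough sketch of what such an in-place function and one of the workarounds might look like. The post doesn't show the original code, so everything here is a guess for illustration: the names strlower_inplace and equals_ignoring_case, and the 64-byte buffer, are all made up.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical reconstruction: lower-case s in place and return it. */
char *strlower_inplace(char *s)
{
    for (char *p = s; *p != '\0'; p++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        *p = (char)tolower((unsigned char)*p);
    }
    return s;
}

/*
 * Hypothetical caller that only has a const string. It cannot pass
 * name to strlower_inplace() directly, so it copies into a local
 * buffer first and converts the copy.
 */
int equals_ignoring_case(const char *name, const char *lower)
{
    char tmp[64]; /* arbitrary size chosen for this sketch */

    if (strlen(name) >= sizeof tmp) {
        return 0; /* too long for the buffer; treated as "no match" here */
    }
    strcpy(tmp, name);
    return strcmp(strlower_inplace(tmp), lower) == 0;
}
```

Every const caller ends up with some variation of that copy-then-convert dance, which is exactly the kind of contortion that gets old fast.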

The new function is size_t strlower(char *dst, const char *src, size_t size) instead, somewhat obviously inspired by strlcpy and strlcat of BSD fame. Granted, the rest of the code is now full of local buffers, but that's better than having to contort myself around a destructive string operation.
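For the curious, a minimal sketch of a function with that signature follows. This is not the actual code from the assembler; it assumes the strlcpy conventions carry over unchanged: at most size - 1 characters are stored, the result is always NUL-terminated when size is nonzero, and the return value is the length of src so the caller can detect truncation.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy src into dst, converting to lower case along the way.
 * Assumed strlcpy-style semantics: stores at most size - 1
 * characters, always terminates dst when size > 0, and returns
 * the length of src. A result >= size means dst was truncated.
 */
size_t strlower(char *dst, const char *src, size_t size)
{
    size_t i = 0;

    /* Convert and copy while there is still room for the terminator. */
    if (size > 0) {
        for (; i < size - 1 && src[i] != '\0'; i++) {
            dst[i] = (char)tolower((unsigned char)src[i]);
        }
        dst[i] = '\0';
    }

    /* Keep counting whatever did not fit, so truncation is detectable. */
    while (src[i] != '\0') {
        i++;
    }
    return i;
}
```

A caller would declare a local buffer, call strlower(buf, name, sizeof buf), and compare the result against sizeof buf to see whether the name was cut short. The source string is never touched, so it can stay const all the way up the call chain.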

Of course, now that I've made the changes and written this blog entry, I have the weird feeling that this is going to bite me again later. Nothing like making a really embarrassing mistake to learn something about programming, eh? :-)

Update 2011/10/09: Sad but true: I've not been able to keep up my maintenance work as well as I would have liked. Also, I've learned something more important in the meantime: You'd better decide between maintaining messy code and rewriting messy code when you embark on a project like this. Today, if I were to go back to this project, I'd run two concurrent branches in a git repository: one with fixes to the old code, and one with conservative refactorings that don't change the deep structure of the product. Oh well, live and learn.