Not bloomy enough

| No TrackBacks

Apparently, I use Mac OS X’s system-wide spellchecker more than others (or misspell things more), since I’ve noticed a bunch of false positives it’s suggested to me and it doesn’t seem like others have written about it. Examples:

(Never mind the irony of not knowing how to spell “knowledgeable” and posting to the world my ability to make myself “embarrassed.”)

I know that most spellcheckers are implemented with Bloom filters. They’re a really cool data structure that can test for membership in a set with the benefit that you don’t actually have to store the element in the Bloom filter (among other benefits). The main downside is that you aren’t guaranteed to be correct; there can be false positives (times when you say an element is in the set when it actually isn’t).

It seems like that’s what’s happening with Mac OS X here—too many false positives. My guess is that its suggestion algorithm looks something like this:

if word not in spelling_bloom_filter
    for each suggestion_word with edit distance = 1 from word
        if suggestion_word in spelling_bloom_filter
            add suggestion_word to suggestions

This would result in tons of words being thrown at the Bloom filter, which, if nearing capacity for the number of hash functions it has in it, could result in false positives happening relatively frequently. Does anyone have a better idea of what’s going on?

The way Apple could fix this is slimming down their list of words to test. Perhaps use a smarter suggestion algorithm that takes into account the keyboard layout of the user? Ideas?


Update: Fixed a spelling mistake in my rant on spellcheckers. <sigh>

No TrackBacks

TrackBack URL: http://tr.ashcan.org/mt/mt-tb.cgi/39