Date: Mar. 2nd, 2012 04:26 pm (UTC)
From: logomancer
While Google's algorithms are Sooper Top Sekrit, my educated guess is that they use a modified version of Bayesian filtering, paired with a nearest-neighbor search to catch simple letter transpositions. When someone types a word with few results into the search bar, Google checks whether there's a close word with more results, and asks whether that's what the person meant. If they click on the suggested link, that's a yes, and the probability of the two being linked together goes up. If enough of this happens, the substitution starts happening automatically. Of course, if they click on a different link, that's a no, and the filter takes that into account too. In the end, it's all statistical analysis and a massive amount of storage, which is how machine learning has progressed for years now.
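
To make that guess concrete, here is a toy Python sketch of the loop described above. It is emphatically not Google's actual method, which isn't public. Everything in it is an assumption for illustration: the `SuggestionModel` class, the `auto_threshold` and `min_votes` parameters, the made-up result counts, and the choice of optimal string alignment distance (edit distance that counts an adjacent transposition as one step) as the "near-neighbor" measure. The probability is a simple smoothed yes/no ratio standing in for the Bayesian part.

```python
from collections import defaultdict


def osa_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent letters costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "teh" -> "the".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


class SuggestionModel:
    """Hypothetical model: suggest a close, more popular word; learn from clicks."""

    def __init__(self, result_counts, auto_threshold=0.8, min_votes=5):
        self.result_counts = result_counts    # word -> number of results
        self.auto_threshold = auto_threshold  # assumed cutoff for auto-substitution
        self.min_votes = min_votes            # assumed minimum evidence
        self.yes = defaultdict(int)
        self.no = defaultdict(int)

    def candidate(self, query):
        """Return the most popular word within distance 1 that beats the query."""
        own = self.result_counts.get(query, 0)
        near = [
            w for w, n in self.result_counts.items()
            if w != query and n > own and osa_distance(query, w) <= 1
        ]
        return max(near, key=self.result_counts.get, default=None)

    def record_click(self, query, suggestion, accepted):
        """A click on the suggestion is a yes; a click elsewhere is a no."""
        counts = self.yes if accepted else self.no
        counts[(query, suggestion)] += 1

    def probability(self, query, suggestion):
        """Smoothed estimate that the user meant `suggestion` (starts at 0.5)."""
        y, n = self.yes[(query, suggestion)], self.no[(query, suggestion)]
        return (y + 1) / (y + n + 2)

    def should_autocorrect(self, query, suggestion):
        votes = self.yes[(query, suggestion)] + self.no[(query, suggestion)]
        return (votes >= self.min_votes
                and self.probability(query, suggestion) >= self.auto_threshold)


# Made-up result counts, purely for demonstration.
model = SuggestionModel({"recieve": 12, "receive": 90000, "relieve": 4000})
guess = model.candidate("recieve")
for accepted in [True] * 9 + [False]:
    model.record_click("recieve", guess, accepted)
print(guess, model.should_autocorrect("recieve", guess))
```

The real thing would obviously need an index rather than a scan over every known word, which is where the massive storage comes in.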

Naming systems in computer databases are pretty damned inflexible, really, and not just for people who case their name differently; a lot of computer systems assume you have a Western-style name, with a given name, a middle name, and a surname, and maybe a title and a suffix. Spanish- and Portuguese-style names, for instance, with more than one given name and more than one surname, don't fit properly into most database structures. Arabic names are similarly problematic, with one given name and one surname, but multiple patronymics. As database structures tend to endure unless there's something seriously wrong with them, I anticipate that this problem will last for quite a while, sadly.
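
For what a less rigid structure might look like, here is a minimal Python sketch. The `PersonName` class and all of its field names are hypothetical, and the example names are invented. The idea is just to store the name exactly as the person writes it (casing included) and let every component be a list of any length instead of a fixed first/middle/last trio.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    # Stored verbatim: the system never re-cases or reorders this.
    display: str
    # Each component may hold zero, one, or many entries.
    given: List[str] = field(default_factory=list)
    patronymics: List[str] = field(default_factory=list)
    surnames: List[str] = field(default_factory=list)
    title: Optional[str] = None
    suffix: Optional[str] = None

    def sort_key(self) -> str:
        """Case-insensitive key: all surnames if present, else given names."""
        parts = self.surnames or self.given
        return " ".join(parts).casefold()


# Invented examples, one per case mentioned above.
two_surnames = PersonName(
    display="María José García López",
    given=["María", "José"],
    surnames=["García", "López"],
)
with_patronymics = PersonName(
    display="Ahmad ibn Khalid ibn Yusuf al-Masri",
    given=["Ahmad"],
    patronymics=["ibn Khalid", "ibn Yusuf"],
    surnames=["al-Masri"],
)
lowercase_by_choice = PersonName(display="arwen evenstar", given=["arwen"],
                                 surnames=["evenstar"])

for person in (two_surnames, with_patronymics, lowercase_by_choice):
    print(person.display, "->", person.sort_key())
```

Even this is still an assumption-laden shape; the safest single field is probably the verbatim `display` string.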
