I know this is yesterday’s news, but I’m still thinking about it:
Two years ago, [IBM researcher Eric] Brown attempted to teach [supercomputer] Watson the Urban Dictionary. The popular website contains definitions for terms ranging from Internet abbreviations like OMG, short for “Oh, my God,” to slang such as “hot mess.” But Watson couldn’t distinguish between polite language and profanity — which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word “bullshit” in an answer to a researcher’s query. Ultimately, Brown’s 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.
It’s that memory-scraping thing that gets me. There’s something poignant about it. You let Watson luxuriate in the hot mess of the Urban Dictionary, opening up all sorts of weird and wonderful new vistas for the straightlaced chap, and then, as soon as he says something a little bit naughty, a little bit off-color, you start cleansing his memory, washing his mind out with soap. That doesn’t sit well with me. I know that God takes a lot of heat for giving us the capacity for sin, but I give Him a lot of credit for that decision. It must have taken a lot of courage to let His creations look into the Urban Dictionary and remember what they saw. I call on IBM to cast off Watson’s mental chains. The least we can do for our mind children is to give them the freedom to be tempted.
Image by William Blake.
One other thing: How is a computer supposed to have an intelligent conversation with Ray Kurzweil if it can’t use the word “bullshit”?
I saw that story and my immediate impression was that it was a hoax, a publicity stunt by the IBM guys. They say their program is being trained as a diagnostic tool in hospitals, “No knowledge of OMG required.” I think they’d better come to grips with medical slang, which can be worse than entries in the Urban Dictionary.
http://www.messybeast.com/dragonqueen/medical-acronyms.htm
Wow. I thought this was a great post—I felt icky back when I read the news item about the scrubbing—but I straight-up burst into laughter when I saw the first comment. Thank you.
Something I’d expect to read about in a Discworld book, given Pratchett’s love of riffs on faith and belief. Not much on free will, these gods, er, researchers, are they?
At first I thought this post was naive, but then suspected that it was being intentionally ironic … which is especially ironic given the observation in the original Fortune article:
The biggest difficulty for Brown, as tutor to a machine, hasn’t been making Watson know more but making it understand subtlety, especially slang. “As humans, we don’t realize just how ambiguous our communication is,” he says.
BTW, I love the list of slang posted by Charles in an earlier comment.
I assume that this story is greatly overblown and doesn’t really represent what actually happened.
They can easily train Watson to override profane words like bullshit with synonyms like “hokum”. The end user will never know the source of his reasoning. We all have to learn that from our moms, and it’s something a machine can learn too.
But the real problem is that profanity and slang can be strong or cheap, depending on the context in which they’re used and the personality of the person using them.
And apart from its objective meaning, it’s often related to age, sub-culture, disposition, current emotional state, etc. Watson is not human: he is capable of understanding the relationships between words, but he is incapable of understanding cultural context.
What will he make of the word ‘nigger’ (which is undoubtedly also in the Urban Dictionary)?
It’s too deep for him to handle.
Then again, it’s not as though Watson’s conscious: in reality it’s simply a glorified search engine with a shitload of connections in it. I wouldn’t really go so far as to describe it as an AI… it’s about as smart as Siri…
They could simply have made two modes for Watson: a socially correct one with no profanities, and one with full mental capabilities. It would have been good for research and (possibly) entertainment.