Monthly Archives: January 2013

Boffins bless Scrabble point system

[Image: Spock]

For those who have been following the Scrabble tile-value donnybrook, there is some important and possibly definitive new data to report. Over the weekend, deep in the bowels of Cornell’s physics department, green-eyeshaders conducted a full Monte Carlo analysis of the Scrabble point system, using a computer to model 10,000,000 possible letter racks.

The upshot:  “it would be completely reasonable to keep the tile point values as they are.”

Excellent. Now, about the placement of those triple-word-score squares …
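
For anyone curious what “modeling 10,000,000 possible letter racks” might involve, here is a minimal Monte Carlo sketch in Python. The tile counts and point values are the standard Scrabble set; the statistic being averaged (a rack’s summed face value), the sample size, and the random seed are my own illustrative choices, not a reconstruction of the Cornell analysis.

```python
import random

# Standard English Scrabble tile distribution (100 tiles; '?' is the blank).
TILE_COUNTS = {
    'A': 9, 'B': 2, 'C': 2, 'D': 4, 'E': 12, 'F': 2, 'G': 3, 'H': 2,
    'I': 9, 'J': 1, 'K': 1, 'L': 4, 'M': 2, 'N': 6, 'O': 8, 'P': 2,
    'Q': 1, 'R': 6, 'S': 4, 'T': 6, 'U': 4, 'V': 2, 'W': 2, 'X': 1,
    'Y': 2, 'Z': 1, '?': 2,
}

# Current tile point values (the blank is worth 0).
TILE_POINTS = {
    'A': 1, 'B': 3, 'C': 3, 'D': 2, 'E': 1, 'F': 4, 'G': 2, 'H': 4,
    'I': 1, 'J': 8, 'K': 5, 'L': 1, 'M': 3, 'N': 1, 'O': 1, 'P': 3,
    'Q': 10, 'R': 1, 'S': 1, 'T': 1, 'U': 1, 'V': 4, 'W': 4, 'X': 8,
    'Y': 4, 'Z': 10, '?': 0,
}

BAG = [tile for tile, n in TILE_COUNTS.items() for _ in range(n)]

def mean_rack_value(num_racks, rack_size=7, seed=0):
    """Draw random racks from a full bag and average their summed face values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_racks):
        rack = rng.sample(BAG, rack_size)  # 7 tiles, drawn without replacement
        total += sum(TILE_POINTS[t] for t in rack)
    return total / num_racks

if __name__ == "__main__":
    # A smaller sample than the reported 10,000,000 runs in a few seconds.
    print(f"average face value of a random rack: {mean_rack_value(100_000):.2f}")
```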

Facebook’s polluted graph

[Image: kids with a puppy]

Cute kids. Cute puppy. Cute extortion scheme.

The Heisenberg principle (more precisely, the observer effect) states: the act of observing alters the reality being observed. The Carr principle (which I came up with this morning while eating breakfast) states: the act of searching alters the reality being searched.

The first web search engines based their results either on recommendations submitted by surfers or on the text of web pages. But as soon as a lot of folks started searching, these signals became corrupted. Site owners, seeing the commercial value of high search rankings, started to game the system. They flooded the recommendation systems with self-serving recommendations and they loaded their pages up with junk text written to push the pages up higher in the text-based rankings. (Remember those long stretches of repeated phrases tacked on to the ends of pages?)

Google came up with the more sophisticated idea of using links as signals of page quality. It worked great for a while. But then it spawned an entire “search engine optimization” industry bent on gaming the link system. The corruption of links forced Google to start tracking all sorts of other signals in hopes of staying ahead of the SEO goons and their corporate patrons.
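
The underlying idea, treating links as votes of quality, is simple enough to sketch. Below is a toy, PageRank-style ranking in Python; the damping factor, iteration count, and miniature “web” are illustrative assumptions, not Google’s actual algorithm.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Toy link-based ranking: a page matters if pages that matter link to it.
    `links` maps each page to the pages it links out to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # a page with no outlinks shares evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {
        "home":  ["about", "blog"],
        "about": ["home"],
        "blog":  ["home", "about"],
        "spam":  ["home"],   # nobody links to "spam", so it ends up ranked last
    }
    for page, score in sorted(rank_by_links(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page:>5}  {score:.3f}")
```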

Now, Facebook has introduced what it calls Graph Search. One of the main signals that Facebook is using to rank results is, not surprisingly, the “Likes” that it tracks via the Like buttons and other links it has spread across the web like so many dandelion seeds. Search companies in the past usually tried to choose uncorrupted signals as the criteria for their rankings. They wanted to give good, objective results in order to attract users. The corruption of the signals came later, after it became clear that the search results had commercial value. Facebook is taking a different tack. It’s starting with a signal—Likes—that is already corrupted, that in fact has always been corrupted. People routinely Like a thing not because they actually like it, not because they have (to use a favorite Facebook word) any real affiliation with it, but because they’ve been, in one way or another, bribed to Like it.

Like us on Facebook to download our new single! Like us on Facebook to get 10% off your next purchase! Like us on Facebook to get a chapter of our new e-book for free! Like us on Facebook to enter our sweepstakes! Like us on Facebook so our dad will give us a puppy!

Facebook never wanted Likes to be objective indicators of real affection, or even a vague feeling of fondness. The Like button was designed as a marketing tool, as Steve Cheney explains:

Early on FB made the case to brands that they must have fans… together with the ad agencies they convinced the Cokes of the world to spend money to be competitive (hey Pepsi is here too). Then, FB promised, something miraculous would happen.  Your friends would see in their news feed you liked Coke! So… FB convinced big advertisers to spend huge sums on CPA-like ad units whose sole purpose was to acquire fans. Ad agencies dedicated creative, planning and strategy resources to get the Cokes and American Expresses of the world to pay to have users click—almost 100% of the time because the user was promised some sweepstake or contest.

Even the normally decorous New Yorker got in on the act:

[Image: screenshot from The New Yorker’s Facebook page]

If you can’t read that, it says: “You must like The New Yorker to read the full text.” And some 17,000 Facebookers dutifully clicked the Like button. Jonathan Franzen must have been thrilled to see his essay used as a worm to bait a rusty Facebook hook.

It might seem kind of strange for a company to build a search engine — a pretty costly undertaking — using criteria that it knows to be debased, to be anything but objective. But to Facebook, it’s business as usual. Here’s the difference between Google and Facebook: Larry Page recognized that commercial corruption was a threat to his ideal. For Mark Zuckerberg, commercial corruption is the ideal.

Now, to be fair, corruption is not the same as absolute corruption. A corrupted search engine can still be immensely useful, as Google shows us every day. And a lot of Facebook Likes are actually likes. To be even fairer, Likes are not the only signal that is determining Graph Search’s results, and some of the other signals are probably, at the moment, purer indicators of affiliation and relevance. But you can bet a million Likes that the SEOers are already hard at work deciphering all those signals and their weightings in hopes of gaming the system. And they will succeed. If you see “social” as an antidote or counterweight to “commercial” on the web, the arrival of Graph Search should make your hair stand on end.

And, yes, the kids got their puppy.

Eternal sunshine of the spotless AI

[Image: illustration by William Blake]

I know this is yesterday’s news, but I’m still thinking about it:

Two years ago, [IBM researcher Eric] Brown attempted to teach [supercomputer] Watson the Urban Dictionary. The popular website contains definitions for terms ranging from Internet abbreviations like OMG, short for “Oh, my God,” to slang such as “hot mess.” But Watson couldn’t distinguish between polite language and profanity — which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word “bullshit” in an answer to a researcher’s query. Ultimately, Brown’s 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.

It’s that memory-scraping thing that gets me. There’s something poignant about it. You let Watson luxuriate in the hot mess of the Urban Dictionary, opening up all sorts of weird and wonderful new vistas for the straitlaced chap, and then, as soon as he says something a little bit naughty, a little bit off-color, you start cleansing his memory, washing his mind out with soap. That doesn’t sit well with me. I know that God takes a lot of heat for giving us the capacity for sin, but I give Him a lot of credit for that decision. It must have taken a lot of courage to let His creations look into the Urban Dictionary and remember what they saw. I call on IBM to cast off Watson’s mental chains. The least we can do for our mind children is to give them the freedom to be tempted.

Image by William Blake.

Me tweet pretty one day


Four years ago, readers of this blog were treated to the following news flash:

The University of Phoenix, having pioneered web-based learning and built one of the largest “virtual campuses” in Second Life, is now looking to become the dominant higher-education institution on Twitter. The biggest for-profit university in the world, UoP will roll out this fall a curriculum of courses delivered almost entirely through the microblogging service, according to an article in the new issue of Rolling Stone (not yet posted online). The first set of courses will be in the school’s Business and Management, Technology, and Human Services programs and will allow students to earn “certificates.” But the school plans to rapidly expand the slate of Twitter courses, according to dean of faculty Robert Stanton, and will within three years “offer full degree programs across all our disciplines.” Stanton tells Rolling Stone that Twitter, as a “near-universal, bidirectional communication system,” offers a “powerful pedagogical platform ideally suited to the mobile, fast-paced lives of many of our students.”

I posted that on April Fools’ Day, and of course none of it was true. It was a joke, and a pretty silly one at that, though the U of P folks did feel compelled to issue a formal denial: “University of Phoenix is not going to deliver courses via Twitter. With the limited characters you can post on Twitter, this wouldn’t be a feasible platform for a robust and quality academic curriculum.” No, there would be no MOOTCs — massive open online Twitter courses. (It’s pronounced “moot-sea,” in case you’re wondering.)

But what begins as farce sometimes ends as tragedy. A new study, titled “Major Memory for Microblogs” and published in this month’s edition of Memory & Cognition, reports on the results of experiments that show that the “trivial ephemera” that people share through Twitter and Facebook are actually much more memorable than, say, “sentences from books.” We have a “remarkable memory for microblogs,” the research indicates, because brief “status updates” are closer to “the largely spontaneous and natural emanations of the human mind” than are the more complex and carefully composed sentences found in literature. Because they resemble snippets of conversation, tweets and such-like seem to be “mind-ready formats.”

One of the researchers, Christine Harris of the University of California at San Diego, explains: “Our findings might not seem so surprising when one considers how important both memory and the social world have been for survival over humans’ ancestral history. We learn about rewards and threats from others. So it makes sense that our minds would be tuned to be particularly attentive to the activities and thoughts of people and to remember the information conveyed by them.”

Another of the researchers, Nicholas Christenfeld, also of UC-SD, draws a larger conclusion. Pointing out that our minds did not “evolve to process carefully edited and polished text” (cavemen’s tastes ran more to The Daily Grunt than The New Yorker), he says, “One could view the past five thousand years of painstaking, careful writing as the anomaly. Modern technologies allow written language to return more closely to the casual, personal style of pre-literate communication. And this is the style that resonates, and is remembered.”

Now, one might see in all of this a very good reason to celebrate the development of “painstaking, careful writing.” After all, it allowed us to escape our minds’ evolutionary bias for the simple social grunt and helped us to expand our capacity for expressing and comprehending more subtle, more eloquent, more complicated thoughts. Did we have to work harder, cognitively speaking, to understand and remember those more complex thoughts? Of course we did. I mean: duh.

The lesson might be: Let’s place an even bigger stress on “sentences from books,” particularly in education, in order to ensure that, in an age characterized by the mass consumption of updates, tweets, and snippets, we maintain our capacity for more sophisticated thinking, writing, reading, and, yes, remembering. Surely, we wouldn’t want to throw out five thousand years of cognitive gains — however “anomalous” they may be — and allow ourselves to drift back to “pre-literate communication.” But that’s not the conclusion the scholars come to. A third member of the research team, Laura Mickes, from the University of Warwick, says, “Writing that is easy and quick to generate is also easy to remember – the more casual and unedited, the more ‘mind-ready’ it is. Knowing this could help in the design of better educational tools.”

Better?

Although the researchers raise the specter of “textbooks written as tweets,” Mickes isn’t ready to go that far. “Of course,” she says, “we’re not suggesting textbooks written entirely in tweets, nor should editors be rendered useless.” That’s reassuring. If technological progress leads to intellectual or cultural regress, it’s not progress.

Fixing Scrabble


Joshua Lewis has run the numbers on letters, and he’s discovered that Scrabble’s point system is statistically suspect. Indeed, he suggests, it may be downright suboptimal. Lewis, a scientist and entrepreneur, developed a software program, called Valett, for “determining the appropriate letter valuations in word games.” The program’s algorithm “analyzes the corpus of a game’s legal plays and provides point values for the letters in the game based on a desired weighting of their frequency, frequency by length and the entropy of their transition probabilities.” He ran Scrabble’s tile values through the algorithm and discovered all sorts of anomalies, which he attributes to historical changes in the set of words that can be played legally in the game.
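
Valett’s own code isn’t reproduced here, but a stripped-down illustration of its core move, valuing letters by how rarely they appear in the corpus of legal words, might look like the Python sketch below. It considers raw letter frequency only; the scaling, the rounding, and the toy word list are my own assumptions, and Valett itself additionally weighs frequency by word length and the entropy of transition probabilities.

```python
import math
from collections import Counter

def frequency_based_values(words, min_value=1, max_value=10):
    """Assign letter values inversely related to how often each letter
    appears in the legal-word corpus: scarce letters earn more points."""
    counts = Counter(letter for word in words for letter in word.upper())
    total = sum(counts.values())
    # Rarity score: negative log of relative frequency.
    rarity = {letter: -math.log(n / total) for letter, n in counts.items()}
    lo, hi = min(rarity.values()), max(rarity.values())
    # Rescale onto the familiar 1-to-10 point range and round to whole points.
    return {
        letter: round(min_value + (r - lo) / (hi - lo) * (max_value - min_value))
        for letter, r in sorted(rarity.items())
    }

if __name__ == "__main__":
    # In practice you would feed in the full legal word list, not a toy sample.
    sample = ["quiz", "jazz", "exit", "train", "stone", "ratio", "queen", "zebra"]
    for letter, value in frequency_based_values(sample).items():
        print(letter, value)
```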

The big problem seems to be that Q, which Lewis terms an “outlier” (I can attest, anecdotally, to the truth of that description), is substantially undervalued. To bring the game up to statistical snuff, the values of many other letters would, as a result, need to be “compressed.” As the BBC explains:

According to Lewis’s system, X (worth eight points in the current game) is worth only five points and Z (worth 10 points now) is worth six points. Other letter values change too, but less radically. For example, U (one point currently) is worth two in the new version, G (two points) becomes three and M (three points) becomes two.

But Lewis’s plan is founded on a misunderstanding. By accounting for recent changes in word frequency and transition probability entropy, he seems to believe that he can return the Scrabble scoring system to a statistically pristine state. But that pristine state, that Eden where all the numbers line up, never existed. The points system was a kludge from the get-go, as the analytically minded have long known. The game’s inventor, Alfred Butts, “calculated a value for each tile by measuring how frequently each letter appeared on the front page of the New York Times.” Explains John Chew, of the North American Scrabble Players Association, “Butts had a selection bias in favour of printed newspaper English which many people have suggested ought to be rectified.” But changing the system at this point, Chew says, with considerable understatement, would inspire “catastrophic outrage.”

It would also make the game less fun, because it would make it more difficult for novices to occasionally beat veteran players. The scoring system’s lack of statistical rigor, it turns out, has the unintended but entirely welcome effect of adding a little extra dash of luck to the game, as Lewis himself points out. The apparent weakness is a hidden strength.

Let the statistically impure thoughts of Alfred Butts serve as a lesson to us all about the dangers of our current fixation on the analysis of large data sets. Armed with a fast computer, a wonky algorithm, and a whole lot of Big Data, a geek will begin to see problems everywhere in our messy human world. And by correcting every statistical anomaly or inefficiency, he’ll not only clean up the messiness, he’ll remove the fun. To a statistician, a blank tile has no value. The rest of us know better.

Photo by openfly.

Ghosts in the library

[Image: artist’s rendering of the planned bookless library]

That’s an artist’s rendering of what promises to be the first bookless public library in the country. Slated to open later this year in Bexar County, Texas, it’s the brainchild of a county judge and book collector named Nelson Wolff, who says he had a vision of an all-digital library while reading Walter Isaacson’s biography of Steve Jobs. The building is being designed to resemble an Apple Store, aseptic and brightly lit, with long ranks of iMacs and an info-barista manning the reference desk-cum-genius bar. The patrons, as the artist’s rendering indicates, will be wraiths.

The searchers

[Image: still from John Ford’s “The Searchers”]

When we talk about “searching” these days, we’re almost always talking about using Google to find something online. That’s quite a twist for a word that has long carried existential connotations, that has been bound up in our sense of what it means to be conscious and alive. We don’t just search for car keys or missing socks. We search for truth and meaning, for love, for transcendence, for peace, for ourselves. To be human is to be a searcher.

In its highest form, a search has no well-defined object. It’s open-ended, an act of exploration that takes us out into the world, beyond the self, in order to know the world, and the self, more fully. T. S. Eliot expressed this sense of searching in his famously eloquent lines from “Little Gidding”:

We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.

Google searches have always been more cut and dried, keyed as they are to particular words or phrases. But in its original conception, the Google search engine did transport us into a messy and confusing world—the world of the web—with the intent of helping us make some sense of it. It pushed us outward, away from ourselves. It was a means of exploration. That’s much less the case now. Google’s conception of searching has changed markedly since those early days, and that means our own idea of what it means to search is changing as well.

Google’s goal is no longer to read the web. It’s to read us. Ray Kurzweil, the inventor and AI speculator, recently joined the company as its director of research. His general focus will be on machine learning and natural language processing. But his particular concern, as he said in a recent interview, will entail reconfiguring the company’s search engine to focus not outwardly on the world but inwardly on the user:

“I envision some years from now that the majority of search queries will be answered without you actually asking. It’ll just know this is something that you’re going to want to see.” While it may take some years to develop this technology, Kurzweil added that he personally thinks it will be embedded into what Google offers currently, rather than as a stand-alone product necessarily.

This has actually been Google’s great aspiration for a while now. We’ve already begun to see its consequences in the customized search results the company serves up by tracking and analyzing our behavior. But such “personalization” is only the start. Back in 2006, Eric Schmidt, then the company’s CEO, said that Google’s “ultimate product” would be a service that would “tell me what I should be typing.” It would give you an answer before you asked a question, obviating the need for searching entirely. This service is beginning to take shape, at least embryonically, in the form of Google Now, which delivers useful information, through your smartphone, before you ask for it. Kurzweil’s brief is to accelerate the development of personalized, preemptive information delivery: search without searching.

In its new design, Google’s search engine doesn’t push us outward; it turns us inward. It gives us information that fits the behavior and needs and biases we have displayed in the past, as meticulously interpreted by Google’s algorithms. Because it reinforces the existing state of the self rather than challenging it, it subverts the act of searching. We find out little about anything, least of all ourselves, through self-absorption.

A few more lines of poetry seem in order. These are from the start of Robert Frost’s poem “The Most of It”:

He thought he kept the universe alone;
For all the voice in answer he could wake
Was but the mocking echo of his own
From some tree-hidden cliff across the lake.
Some morning from the boulder-broken beach
He would cry out on life, that what it wants
Is not its own love back in copy speech,
But counter-love, original response.

I’m far from understanding the mysteries of this poem. As with all of Frost’s greatest lyrics, there is no bottom to it. To read it is to be humbled. But one thing it’s about is the attitude we take toward the world. To be turned inward, to listen to speech that is only a copy, or reflection, of our own speech, is to keep the universe alone. To free ourselves from that prison — the prison we now call personalization — we need to voyage outward to discover “counter-love,” to hear “original response.” As Frost understood, a true search is as dangerous as it is essential. It’s about breaking the shackles of the self, not tightening them.

There was a time, back when Larry Page and Sergey Brin were young and naive and idealistic, that Google spoke to us with the voice of original response. Now, what Google seeks to give us is copy speech, our own voice returned to us.

UPDATE: A version of this post aired as a commentary on the January 15 edition of public radio’s Marketplace program.

Photo from John Ford’s “The Searchers.”