The “human-operated nodes” of Amazon’s Mechanical Turk may soon have competition, at least when it comes to identifying objects in photographs. Researchers at the University of California, San Diego are making progress in developing a machine-learning approach that enables computers to automatically interpret photographs and other images, reports Technology Review.
As described in a paper that appears in the latest issue of IEEE Transactions on Pattern Analysis and Machine Intelligence, the system, called Supervised Multiclass Labeling (SML), combines semantic, or text, labels that describe an image’s contents with a statistical analysis of the image. A computer is first trained to recognize an object – a tree, say – by being shown many images containing the object that have been labeled, or tagged, with the description “tree” by people. The computer learns to make an association between the tag and a statistical, pixel-level analysis of the image. It learns, in effect, to spot a tree, regardless of where the tree happens to appear in a given image.
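To make the training step concrete: the paper’s actual method models class-conditional distributions of localized image features, but as a loose, simplified sketch, the idea can be illustrated with a generic multiclass classifier over crude color-histogram features. The helper names and the scikit-learn-based pipeline below are illustrative assumptions, not the researchers’ implementation.

```python
# Illustrative sketch of the training step described above: each human-tagged image
# is reduced to a fixed-length feature vector (a simple color histogram here, standing
# in for the richer pixel-level statistics the researchers use), and a multiclass
# classifier learns the association between those features and the supplied tags.
import numpy as np
from sklearn.linear_model import LogisticRegression

def color_histogram(image, bins=8):
    """Turn an H x W x 3 RGB array into a per-channel histogram feature vector."""
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins,
                               range=(0, 256), density=True)
        feats.append(hist)
    return np.concatenate(feats)

def train_labeler(tagged_images):
    """tagged_images: hypothetical list of (rgb_array, tag) pairs, e.g. (pixels, "tree")."""
    X = np.stack([color_histogram(img) for img, _ in tagged_images])
    y = np.array([tag for _, tag in tagged_images])
    model = LogisticRegression(max_iter=1000)  # one class per semantic label
    return model.fit(X, y)
```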
Having been seeded with intelligence, the computer can then begin to interpret images on its own, applying probabilities to what it “sees” (e.g., “there is an 80% probability that this picture contains a tree”). As it interprets more and more images, the computer becomes smarter and the tags it applies to images become more accurate. The computer-generated tags can then be used as the basis for an automated image-search service.
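In the same hedged spirit, here is a sketch of the annotation-and-search step: the trained classifier (from the previous sketch) assigns probability-ranked tags to an unlabeled image, and those tags can be folded into a simple index for an image-search service. The function names and the toy index structure are assumptions for illustration only.

```python
# Assign probability-ranked tags to a new image -- the "80% probability that this
# picture contains a tree" behavior described above.  `model` is a classifier
# trained as in the previous sketch (hypothetical pipeline, not the actual SML system).
def annotate(model, image, top_k=5):
    probs = model.predict_proba([color_histogram(image)])[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]  # e.g. [("tree", 0.80), ("sky", 0.12), ...]

# Toy search index: map each predicted tag to the images it describes, with scores,
# so a query like "tree" can return the most probable matches.
def build_index(model, images):
    """images: hypothetical list of (filename, rgb_array) pairs."""
    index = {}
    for name, image in images:
        for tag, prob in annotate(model, image):
            index.setdefault(tag, []).append((name, prob))
    return index
```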
As shown in the example below, the labels a trained computer applies to images bear a disconcertingly strong resemblance to the tags that people give:
In fact, according to the researchers – Nuno Vasconcelos, Gustavo Carneiro, and Antoni Chan of UCSD and Pedro Moreno of Google – the tags generated by the machines can be more precise than those assigned by people because people tend to be less rigorous and more subjective than computers. People’s tags contain a lot of noise, as do the searches that are based on them. The authors write:
When compared with previous approaches, SML has the advantage of combining classification and retrieval optimality with 1) scalability in database and vocabulary sizes, 2) ability to produce a natural ordering for semantic labels at annotation time, and 3) implementation with algorithms that are conceptually simple and do not require prior semantic image segmentation. We have also presented the results of an extensive experimental evaluation, under various previously proposed experimental protocols, which demonstrated superior performance with respect to a sizable number of state-of-the-art methods, for both semantic labeling and retrieval.
Tests of the SML system at Google “indicate that the system can be used on large image collections,” according to Chan. In a brief video, Vasconcelos explains the system’s workings and says that the technique can be applied to other machine-learning challenges, such as teaching computers to understand sounds or read text. Give computers a little intelligence, and there’s just no stopping them.
Ahem!
Care to revisit this post from last June now? To what end, indeed?