The Yin and Yang of Augmented Reality


One annoying thing about my BlackBerry Curve 8900 smartphone is the fact that I have to buy a $20 3 GB media card in order to shoot video from my phone. While I was making the purchase at Batteries Plus in Fort Lauderdale, the sales agent confirmed that his latest Droid came with a 16 GB card pre-loaded. He smiled as he handed me the sales receipt.

Curious, I asked him what was currently his favorite Droid app. Somewhat predictably, he replied that Goggles by Google was his current go-to application.

The Yang

Google Goggles was first announced in 2009 with much fanfare as part of the increasing hype around augmented reality, or, more specifically, visual photo-recognition search.

Still in its infancy, this technology allows a Droid user to snap a photograph of any landmark, product, logo, or whatever, and boom: you're presented with Google search results for that image.

Face recognition, while theoretically possible, has been put on hold by Google as it works out a future privacy policy. In the meantime, just about everything else is open season.

The agent demonstrated this capability by taking a photograph of my media card bar code with his Droid, and sure enough some results were returned.

Unfortunately, they were not the right results. Undeterred, he brought out a box of Coco Pops cereal (don't ask!) and sure enough this time it worked. From what I gather, the technology currently has an accuracy rate of roughly 75%.

Certainly it was cool, but I asked the sales agent whether he was actually using it in any real-world sense.

He replied that he was currently in the market for a new home, and that he would snap each house he liked with Goggles, which would bring up the online real estate listing so he could save it for future review. Now, that I can relate to!

According to Google this is how the technology works:

We first send the user’s image to Google’s datacenters
We then create signatures of objects in the image using computer vision algorithms
We then compare those signatures against all other known items in our image recognition databases
We then figure out how many matches exist
We then return one or more search results, based on available metadata and ranking signals
We do all of this in just a few seconds
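Conceptually, that signature-and-compare pipeline can be sketched in a few lines of Python. This is only an illustrative toy, assuming a difference-hash-style signature over tiny grayscale pixel grids; the function names, the toy "database," and the distance threshold are all my own invention, and Google's actual computer vision algorithms are far more sophisticated:

```python
from typing import Dict, List, Tuple

def signature(pixels: List[List[int]]) -> List[int]:
    """Reduce a grayscale pixel grid to a compact binary signature:
    1 if a pixel is brighter than its right-hand neighbor, else 0."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(a: List[int], b: List[int]) -> int:
    """Count the positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))

def search(query: List[List[int]],
           database: Dict[str, List[List[int]]],
           max_distance: int = 3) -> List[Tuple[str, int]]:
    """Compare the query's signature against every known item and
    return matches ranked by similarity (lowest distance first)."""
    q = signature(query)
    scored = [(name, hamming(q, signature(img)))
              for name, img in database.items()]
    return sorted((m for m in scored if m[1] <= max_distance),
                  key=lambda m: m[1])

# Toy "known items": 3x4 grayscale grids standing in for indexed photos.
db = {
    "cereal box": [[10, 200, 10, 200], [10, 200, 10, 200], [10, 200, 10, 200]],
    "media card": [[200, 10, 200, 10], [200, 10, 200, 10], [200, 10, 200, 10]],
}
# A slightly noisy snapshot of the cereal box still hashes identically.
snapshot = [[12, 198, 11, 199], [9, 201, 10, 200], [10, 200, 12, 197]]
print(search(snapshot, db))  # → [('cereal box', 0)]
```

The `max_distance` cutoff is where the roughly-75%-accuracy trade-off lives: set it too tight and a noisy photo of the right product returns nothing; set it too loose and you get the wrong book.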

Typical Google ingenuity, which relies on the power of the algorithm, or “Yang,” to create meaning from pixelated signatures.

The Yin

The “Yin” is the more traditional approach of relying upon humans — sometimes thousands of them — to make sense of your question or “image” and then supply the result.

This combination of “crowdsourcing” and “Human Intelligence,” coupled with technology, is the hallmark of rival apps such as “Amazon Remembers,” which relies on an invisible army of signed-in Amazon users to identify your images.

[See crowdsourcing article for further background on this subject]

Amazon calls them Mechanical Turks and claims these online worker bees can still do some things better than computers (think Google), such as identifying objects in a photo or video, performing data de-duplication, transcribing audio recordings, or researching data details.
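The human side of the pipeline can be sketched too. The toy below assumes the same image-identification question is fanned out to several workers and a label is accepted only when a clear majority agrees; the `identify` function and its quorum threshold are hypothetical illustrations, not Amazon's actual API:

```python
from collections import Counter
from typing import List, Optional

def identify(labels: List[str], quorum: float = 0.5) -> Optional[str]:
    """Aggregate answers from many human workers: return the most
    common label only if its share of the votes exceeds the quorum,
    otherwise report no confident match."""
    if not labels:
        return None
    top, votes = Counter(labels).most_common(1)[0]
    return top if votes / len(labels) > quorum else None

# Three hypothetical workers look at the same book-cover photo.
answers = ["The Talent Code", "The Talent Code", "Talent Is Overrated"]
print(identify(answers))  # → "The Talent Code" (2 of 3 agree)
```

A split vote returns `None` rather than a guess, which is one way a crowdsourced system can trade speed for the kind of wrong-but-confident answer described below.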

Fortunately, unlike Google Labs, Amazon was kind enough to release its “experimental” toy on the Blackberry App Exchange for me to test.

I was itching to compare it against Google Goggles, not only in terms of results but also speed. I quickly downloaded the Amazon app, which was basically a souped-up mobile search browser sweetened by the “Remembers” photo tab.

An Easy Kill?

Since I am an avid reader, I decided to give Amazon an easy “kill.” At 2:27 p.m. EST on Sunday, July 25, 2010, I snapped a shot of the front cover of “The Talent Code” by Daniel Coyle using the Amazon app and anxiously awaited the results.

Sure enough, at 3:34 p.m. EST I was alerted by email of a “Match.” I logged into the app and clicked the product details to view the outcome. Unfortunately, just as Goggles had come up short for the sales agent and my media card, so did Amazon.

The Mechanical Turk had decided that the image was actually a book by Geoff Colvin called “Talent Is Overrated”.

I decided to give Amazon a chance to redeem itself. I turned “The Talent Code” over and snapped the ISBN on the back cover. Fifteen minutes later the result came back: “The Bar Code Book: Fifth Edition – A Comprehensive Guide to Reading, Printing, Specifying and Evaluating” by Roger Palmer.

Oh well, I guess the collective crowdsourcing brain may not be firing on all cylinders on a Sunday afternoon.

Eventually Amazon got its act together when it successfully identified a picture of my Apple Magic Mouse which was retailing on the Amazon store for $83.

It’s curious that both Goggles and Amazon fluffed what should be the easiest image search of all: a bar code scan.

Both Amazon and Google have gone to great lengths to stress the experimental nature of their respective Yin and Yang approaches.

Ironically, the real power may lie in a blended approach which incorporates the best of Yin and Yang.

