Although Google's non-search engine products, such as its Google Apps Web-hosted collaboration and communication software suite, get a lot of attention, Google's search technology and its companion ad system and network still generate most of the company's revenue.
At last week's Web 2.0 Summit, IDG News Service caught up with Marissa Mayer, Google's vice president of Search Products and User Experience, to chat about video search, semantic versus keyword search, Google's universal search effort and the challenge of indexing the "deep web."
What follows is an edited transcript of the interview:
IDGNS: There are different technology approaches to video search. Blinkx, for example, maintains it does it better than Google because it indexes the text of what is said in videos with speech recognition technology. Where is Google with video search today?
Mayer: Google Video has had an interesting evolution. When we first launched it, it was based on closed captions, so literally a transcription of the program, but interestingly you couldn't play video. So we changed it so that you could play video and now we're searching the meta content. That said, one of the future elements of what's likely to happen in search is around speech recognition.
You may have heard about our directory assistance service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model that we can use for all kinds of different things, including video search.
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG411 is about that: Getting a bunch of different speech samples so when you call up or we're trying to get the voice out of video, we can do it with high accuracy.
IDGNS: What about non-speech content in videos - the action in the clip?
Mayer: That's going to be particularly hard, given that most of Google's approaches are based on text right now. So we really do need the text, which is why our inclination is to build a great speech-to-text model and pull the text out. That said, there are a lot of instances where there's humour, context, things that happen in frame that don't necessarily have words, and for that we're going to have to rely on the community to do things like tagging.
There is some very early research happening around recognizing faces in videos, recognizing particular objects, understanding that hey, there's a ball in the frame right now, but it's very early and not at all ready to be deployed in a commercial application.
IDGNS: Some people criticize Google because its query-analysis technology is mostly based on keywords, as opposed to understanding natural language, like full sentences.
Mayer: Right now Google is really good with keywords and that's a limitation we think the search engine should be able to overcome with time. People should be able to ask questions and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions - not about what words will appear on the page but more like "what is this about?" A lot of people will turn to things like the semantic web as a possible answer to that. But what we're seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they're done through brute force.
When you type "GM" into Google, we know it's "General Motors." If you type in "GM foods" we answer with "genetically modified foods." Because we're processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn't really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component.
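The idea that context from a large volume of data can make an acronym lookup seem "smart" can be illustrated with a minimal Python sketch. It is not Google's method: the toy query log, the function names and the simple vote counting are all assumptions made for illustration. It counts which expansion has co-occurred with each context word and picks the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical toy log: (query, expansion the searcher turned out to mean).
TOY_LOG = [
    ("gm", "General Motors"),
    ("gm cars", "General Motors"),
    ("gm trucks", "General Motors"),
    ("gm stock", "General Motors"),
    ("gm foods", "genetically modified"),
    ("gm foods safety", "genetically modified"),
    ("gm crops", "genetically modified"),
]

def build_counts(log, acronym="gm"):
    """Count how often each context word appears with each expansion."""
    by_word = defaultdict(Counter)  # context word -> expansion -> count
    overall = Counter()             # expansion -> count across all queries
    for query, expansion in log:
        overall[expansion] += 1
        for word in query.lower().split():
            if word != acronym:
                by_word[word][expansion] += 1
    return by_word, overall

def expand(query, by_word, overall, acronym="gm"):
    """Pick the expansion with the most co-occurrence votes for this query."""
    votes = Counter()
    for word in query.lower().split():
        if word != acronym:
            votes.update(by_word.get(word, {}))
    # With no usable context, fall back to the most common expansion overall.
    ranking = votes if votes else overall
    return ranking.most_common(1)[0][0]

BY_WORD, OVERALL = build_counts(TOY_LOG)
print(expand("GM", BY_WORD, OVERALL))
print(expand("GM foods", BY_WORD, OVERALL))
```

Nothing in the sketch understands what "GM" means; the answer changes only because the counts differ, which is the brute-force effect described above on a very small scale.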
IDGNS: Where is the universal search effort at Google?
Mayer: It is early stage and we're working on more radical things now. The team launched [universal search] in May. Books, images, news, video and local information have now been blended in [with general Web search]. The team is now devoting its time and energy to three different pieces. They're working really hard to internationalize universal and bring it to all of our different countries and languages, because it's English-only and largely in the US. They're working on bringing in [other vertical engines] like blog search, patents, scholar. And they're also looking at how to do more radical things in terms of overall ranking and relevance and user presentation, the user interface piece.
The reason why universal search was such a big change for us was that there were three [key] pieces [to adapt]. We had to change the entire infrastructure to make the thing cost effective. Then there's the ranking piece: Now that you have all these results, how do you order them? And the final piece was the user interface.
Now the infrastructure is in place and the engineers can finally get to have fun thinking about what they can do in terms of relevance and ranking, and user interface. With that third [user interface] piece, we're doing a lot of experimentation, building a bunch of interesting prototypes of how universal search could play out this year, or two or three years out.
IDGNS: Is the ultimate goal to fold all these vertical tabs of news, image, video, book search and so on into the general web search query box?
Mayer: We want people to think of the search box as one query box. That said, we do acknowledge that there are times when you know you want an image or a news story, so obviously we'll still have links to those independent indices. But you shouldn't have to use those if you're not an expert and you don't know what's there [in all our specialty search engines]. We'd like all of those [secondary indices] to be integrated into the main [web] search engine.
IDGNS: What's Google's take on all the so-called "deep web" content search engines can't get to for a variety of technical reasons?
Mayer: The issue on "deep web" content is that it's usually in databases and [web] crawling isn't a great way of getting at a database. So we've been doing things like Google Base. Most databases allow people to do an XML feed off of them so you can do an XML output of your database and you can upload that database to Google and Google Base.
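As a rough illustration of what "an XML output of your database" can look like, here is a minimal Python sketch using only the standard library. The table, the column names and the `items`/`item` element names are hypothetical and do not follow the actual Google Base feed schema, which the interview does not describe.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table):
    """Dump every row of `table` as an <item> element under <items>.

    The table name is interpolated directly into the SQL, so it must come
    from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element("items")
    for row in cursor:
        item = ET.SubElement(root, "item")
        for name, value in zip(columns, row):
            # One child element per column; NULL becomes empty text.
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

def demo():
    """Build a small in-memory database and return its XML dump."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (id INTEGER, title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [(1, "Used bike", 80.0), (2, "Desk & chair", 45.5)],
    )
    xml_text = rows_to_xml(conn, "listings")
    conn.close()
    return xml_text

if __name__ == "__main__":
    print(demo())
```

The point of a feed like this is that the records are handed over directly instead of being discovered by a crawler following links, which is why it suits database-backed content.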
IDGNS: Are you making progress with that approach?
Mayer: Yes, literally hundreds of millions of items have been uploaded into Google Base. So we're making progress indexing that data, but we're not doing a good enough job surfacing that data in the search results. So we have it, and if you go to Google Base you can find it but it's hard to figure out, from the universal search aspect, when it should be blended into the main search results.
IDGNS: From an engineer's point of view, is working on search engine technology still "cool" and professionally stimulating?
Mayer: Search is still very much an unsolved problem. We're like six years into what's going to become a 500-year discipline. It's still very compelling and we have people joining Google every day who are really excited to work on search.
IDGNS: Three or four years ago, people were saying 'Google is great, it's a cash cow, but for people to switch to another search engine, there's no cost, in terms of effort or inconvenience.' But it turns out this hasn't happened. What's Google's view into this issue?
Mayer: There are two non-obvious outcomes around the stickiness question. One is that it's true: If someone builds a better search, users will probably move over to that competing engine. That keeps our employees, especially our engineers, really well motivated to make sure no one has better search. We have to prove to our users every day that we still have the best search.
The other non-obvious observation is that search is much stickier than most people realize. Intuitively, you know you can go to other search engines but people have a source they trust and it takes them a long time to feel like they're not giving anything up [by using another search engine].
When you find a long lost friend or information about a medical condition that is hard to diagnose or [you experience] that "I'm feeling lucky" moment when this amazing website turns up first on Google, each of those moments stays in your mind. There's an attachment, a trust there that this tool really brought me what I needed. So the switching cost for search is a lot higher [than people think].
IDGNS: Where do the carriers come in for mobile search? I guess you can bypass them if the user has a mobile browser. Or else you can choose to partner with the carriers for better integration of your services.
Mayer: Certainly mobile web browsers do work. That said, the deeper integration is much more advantageous. We've seen that right away with the Google Maps service being embedded into the iPhone.
That said, downloadable applications like Google Maps for Mobile or Gmail for Mobile often do sidestep some of the limitations of the particular software that may come preinstalled with the phone. That's a good strategy to help people who have capable phones use those products. So you can partner with the carriers to get it embedded in the phone when it ships or you can have an application that people can download later.
IDGNS: I imagine there are all sorts of business dealings and arrangements that need to be worked out.
Mayer: The mobile space is very complicated. We've got some very successful mobile partnerships in a number of countries. So we've been working with [telecom] partners but we've also been pursuing an alternate downloadable-application path.