First of all, congratulations to the Yahoo! “Rebel Alliance” for taking this first step in what might be a disruptive move to compete against the Don’t-Be-Evil Empire! (h/t Dave McClure, who coined this phrase). Now you will have to Use The Force and go much further to power an industry of distinctive Alternative Search Engines, like Uptake. (Disclosure: we are not using BOSS, but we plan to evaluate BOSS Custom as a source of backfill results.)

BOSS: one of Yahoo!’s last hopes

Photo courtesy: Revell.de

But it took more than an X-Wing to destroy the Death Star. It Took the Force.

Here are the salient features of BOSS and where we think it has to go further to truly power innovative new search experiences.

Three levels, but only BOSS Custom has real potential for a highly differentiated service offering.

There are three levels to the BOSS program, according to SearchEngineWatch:

  • self-service API
  • BOSS University for academics
  • BOSS Custom, designed for companies with their own ranking and/or presentation methodologies or, alternatively, companies with proprietary data that can serve as an additional signal that factors into relevancy.

I’ll go over all the aspects of the BOSS program below, and then come back to BOSS Custom as evidence that Yahoo! just might Use The Force. But the basic features look like a free version of Google Custom Search Engine.

Four primary functions of search are addressed by BOSS

TechCrunch highlights these 4 functions of search that BOSS provides as a service:

This is a good framework. Let’s start with indexing and crawling.

Crawling and Indexing: Not the real barrier to vertical search

According to Yahoo!’s Bill Michels, as reported by ReadWriteWeb, “niche search engines often aren’t very good because they have access to a very limited index of content. It’s expensive to index the whole web.” This is just wrong, in my opinion. Indexing the whole web may be expensive, but that isn’t why niche search engines haven’t been successful.

Lowering the cost of crawling can be good for startups, of course. However, with cloud services and vendors leveraging low cost computing and crawling, crawling is already becoming a commodity. In building Uptake, “buying the web wholesale” is the least of our worries especially because we are focused on one vertical.

We have used a stealth mode vendor to crawl the entire Web and build the most comprehensive structured database of travel products, at least 30% larger than any other source, and it didn’t take us $300 million to do that. According to SearchEngineLand, there is a vendor called CommonCrawl who aims to provide a complete index of the Web on a white label basis. So “buying the web wholesale” is possible today.

Building a repository of documents is only the beginning. The key is extracting meaning from the documents (and the relationships between the documents) to power your ranking algorithm.

Ranking: Some new ability, but still built on top of a black box

Building a ranking model for a specific purpose is probably the most difficult task for a search startup. BOSS can help jumpstart a new company’s effort by providing an acceptable but not differentiated result.

Ranking: Add your own search signals so you can re-rank results

Proprietary signals are needed to deliver better precision through better ranking matched against user intent for any given query. BOSS breaks new ground by allowing you to add and blend your own search signals into Yahoo!’s black box. Example from SearchEngineLand:

Me.dium is the example highlighted by Yahoo!. Silicon Alley Insider points out that Me.dium is adding social signals to Yahoo!’s ranking and calling it “Social Search,” which ironically is “using a name Yahoo! has already attached to a failed product.” VentureBeat has a more positive spin on the Me.dium demo application, although Dan Kaplan concludes:

The question that hangs over Yahoo, BOSS, and Me.dium is whether or not any search player will really be able to change user behavior and get people to consistently use something other than Google; the results would probably have to be a noticeable leap forward, and even then, it would be hard to break Googling habits.

Dan is absolutely right, and BOSS will have to go a lot further to deliver a true “leap forward.”

Ability to blend results:

In addition to re-ranking, Yahoo! BOSS also allows you to blend results. This has potential for much more interesting SERPs that are more tailored for a specific vertical or search intent state.
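
As a rough illustration of what blending could look like on the developer's side, here is a minimal Python sketch that interleaves web results with results from a proprietary vertical source and drops duplicate URLs. The result dictionaries, their field names, and the `blend` function are assumptions made for illustration; they are not the BOSS API or its response format.

```python
from itertools import zip_longest


def blend(web_results, vertical_results, limit=10):
    """Interleave two ranked lists, vertical result first, skipping duplicate URLs."""
    blended, seen = [], set()
    for pair in zip_longest(vertical_results, web_results):
        for result in pair:
            # zip_longest pads the shorter list with None.
            if result is None or result["url"] in seen:
                continue
            seen.add(result["url"])
            blended.append(result)
    return blended[:limit]


# Hypothetical data: neither list comes from a real API call.
web = [
    {"url": "http://example.com/paris-guide", "title": "Paris travel guide"},
    {"url": "http://example.com/hotel-a", "title": "Hotel A official site"},
]
vertical = [
    {"url": "http://example.com/hotel-a", "title": "Hotel A: 120 reviews"},
    {"url": "http://example.com/hotel-b", "title": "Hotel B: 85 reviews"},
]

for r in blend(web, vertical):
    print(r["title"])
```

Putting the vertical result first in each pair is a design choice for this sketch: when both sources return the same URL, the richer proprietary entry wins.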

Blend: Mashup Framework

As an added plus, Yahoo! is providing the BOSS Mashup Framework. According to the Yahoo! Search Blog: “We’re releasing a Python library and UI templates that allow developers to easily mashup BOSS search results with other public data source.”

Blend: Web, news, and image search availability at launch

This seems like table stakes. Vanessa Fox at SearchEngineLand points out that, at first glance, the API looks similar to the Google Custom Search API and Microsoft’s Live Search API.

Query handling

This is also a big challenge for new search startups. Yahoo!’s service provides for this as part of the overall service. But as far as I can tell there is no way to insert one’s own query parsing into Yahoo!’s so as to affect the search results. UPDATE from Huanjin Chen: I guess a developer can always use his/her own GUI to get the user query and parse it and then form a Yahoo query. If so, the developer can insert his/her own query handling.
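
To make Huanjin's suggestion concrete, here is a minimal Python sketch of client-side query handling: the developer parses the raw user query, keeps structured constraints locally, and sends only a rewritten keyword string to the backend. The synonym table, the price pattern, and the `rewrite_query` function are hypothetical; nothing here calls or mirrors the actual Yahoo! API.

```python
import re

# Matches phrases such as "under $200" or "under 200".
PRICE_RE = re.compile(r"\bunder \$?(\d+)\b", re.IGNORECASE)
# Hypothetical synonym table standing in for a vertical-specific ontology.
SYNONYMS = {"cheap": "budget", "inn": "hotel"}


def rewrite_query(raw):
    """Split a user query into backend keywords and locally applied filters."""
    filters = {}
    match = PRICE_RE.search(raw)
    if match:
        filters["max_price"] = int(match.group(1))
        raw = PRICE_RE.sub("", raw)
    words = [SYNONYMS.get(w, w) for w in raw.lower().split()]
    return " ".join(words), filters


keywords, filters = rewrite_query("Cheap inn Paris under $200")
print(keywords)  # budget hotel paris
print(filters)   # {'max_price': 200}
```

The keyword string would go to the search backend, while the filters would be applied to the developer's own structured data after results come back.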

Presentation: Total flexibility on presentation

According to the Yahoo! Search Blog: “Freedom to present search results using any user interface paradigm, without Yahoo! branding or attribution requirements.” This means no attribution required! But this is a red herring because…

Business Model: Advertising strings attached

According to GigaOm, you have to use Yahoo! Search Advertising. And Om makes the point:

Notably, they are asking startups to sign up for their search monetization system — the very same system that is going to use Google to drum up ads. That isn’t a very confidence-inspiring move. And if this monetization tool was so great, Yahoo wouldn’t be in the kind of trouble it’s in.

Indexing of the Semantic Web not included.

Marshall Kirkpatrick of ReadWriteWeb asked if indexing of the semantic web would be included, and they said no. But this is no big deal, because semantic tags and microformats have yet to be adopted in a big way by the most important sites, which would rather focus on traditional Search Engine Optimization (SEO) for Google than use immature tools to expose semantic meaning.

Business Model: Unlimited queries

Unlimited queries make the pricing business-model friendly, although you have to sign up for their advertising platform.

BOSS Custom: Now you’re talking about Using The Force

Everything above was just for the Jedi Apprentice. But if you are going to confront the Dark Side, you need BOSS Custom. Here’s what they provide and why it is critical (and may or may not be far enough):

Near real-time indexing of public or proprietary content

Real-time indexing is appropriate for time-sensitive information like news and blogs. This could be an advantage over the wholesale crawling and indexing methods offered by other private-label providers.

Blending training datasets to produce advanced, customized ranking models that scale to the Web

This is probably one of the most interesting aspects of BOSS Custom. It suggests that a training dataset and our own proprietary signals can be integrated into Yahoo!’s existing ranking models. So the real question is: how much of that ranking model will be opened up for tuning by the search startup?

Federating web and proprietary content in a single search display

This can already be achieved using Google Custom Search Engine in a rudimentary way. The larger issue is that most proprietary content is not unstructured (or semi-structured) Web documents but structured database objects with attributes. How does BOSS Custom make it easy to integrate the right Web pages with the right products, without really understanding what entities are on those Web pages?

Integrating query suggestions (Search Assist technology)

Query parsing services are one of the most interesting aspects of BOSS Custom. Search Assist is very well done already, and it would be even more valuable if it could be integrated with our custom, industry-specific ontology.

Leveraging highly trained query and document categorizers

Also very interesting, if indeed this is being opened up to developers. Our own query categorizers are still fairly early stage, and if we could understand how Yahoo!’s query categorizers work and could integrate our own understanding of natural language into them, this could be a real accelerator for us in parsing queries better.

Structured search (range queries, refinement)

This sounds interesting, but I’m sceptical that the horizontal approach to refinement will provide for a truly differentiated experience unless the specific needs of that customer, as captured in an ontology, can be integrated into the refinement controls provided for by BOSS Custom.

UPDATE: Here are some additional thoughts from our China search team and Huanjin Chen, our search architect:

Structured search and query/document categorizers can be useful tools for driving improved relevance for niche search engines. There are many ways of making search results more relevant. One is to understand the user intention better and refine the search, which structured search addresses. Another way is to build the ontology/taxonomy/directory to classify queries and documents, which can benefit from the categorizer.
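
As a toy illustration of the taxonomy idea, the following Python sketch classifies a query (or a document snippet) by counting term overlap with each category. The taxonomy contents and the `categorize` function are assumptions for illustration only; a production categorizer, Yahoo!'s included, would presumably be trained rather than hand-written.

```python
# Hypothetical travel taxonomy; a real one would be far larger and curated.
TAXONOMY = {
    "lodging": {"hotel", "hotels", "resort", "inn", "hostel"},
    "dining": {"restaurant", "restaurants", "cafe", "bistro"},
    "activities": {"museum", "tour", "hike", "beach"},
}


def categorize(text):
    """Return the category whose terms overlap most with the text, or None."""
    tokens = set(text.lower().split())
    scores = {cat: len(tokens & terms) for cat, terms in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the text falls outside the taxonomy.
    return best if scores[best] > 0 else None


print(categorize("family resort hotel near the beach"))  # lodging
print(categorize("weather tomorrow"))                    # None
```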

Unlimited queries are essential to building a commercial search engine. Google’s search API might be more structured and flexible than Yahoo’s, but it only allows 50K queries per day. Potentially, a search engine could use both. One could build an ontology using Google’s search API, and then use Yahoo BOSS for user search results.

Yahoo also does not allow access to their ranking signals. Agreed, this is a huge limitation, but it is understandable. The Yahoo ranking order is still useful. At minimum, one could use it as one of the relevance factors. If one can query Yahoo search engine in different ways, then one can kind of guess their ranking signals. The drawback is that multiple queries may be a performance concern.
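
Using the returned order as one relevance factor might look like the Python sketch below, which converts each result's position into a prior between 0 and 1 and mixes it with a proprietary score. The `rerank` function, the linear weighting, and the example scores are all assumptions for illustration, not anything BOSS prescribes.

```python
def rerank(results, own_scores, weight=0.5):
    """Mix the backend's rank order with a proprietary score in [0, 1]."""
    n = len(results)
    scored = []
    for position, url in enumerate(results):
        # Only the order is visible, so turn position into a prior in (0, 1].
        rank_prior = 1.0 - position / n
        own = own_scores.get(url, 0.0)
        scored.append((weight * rank_prior + (1 - weight) * own, url))
    # Highest combined score first; the sort is stable, so ties keep backend order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical inputs: backend order plus our own signal for two of the URLs.
backend_order = ["a.example", "b.example", "c.example"]
signals = {"c.example": 1.0, "a.example": 0.2}
print(rerank(backend_order, signals))
```

With `weight=1.0` the backend order is returned unchanged, which gives an easy baseline for measuring whether the proprietary signal helps.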

It is true that crawling is not a barrier to building a vertical search engine. However, first, BOSS still makes it easier for an under-funded startup to get started; second, Yahoo data helps with time-sensitive content; third, some startups may want to try a semi-horizontal market, such as content for kids or shopping, which is more costly to crawl.

New search startups have to take a different approach to differentiate from Google and Yahoo! How will BOSS support this?

Huanjin: In general, I think this is a very important event for all search startups. It opens new opportunities. Google has pushed search relevance to the limit of what the current approach can achieve. To significantly improve search relevance, we have to take drastically different approaches, i.e., not solely keyword matching, not solely statistical signals, and not treating every user the same. My take on the direction of search technology is:

(1) Meaning based (semantic and/or syntactic).

(2) Context aware. The same word means different things in different contexts.

(3) Must be able to treat different users differently.

(4) Opinion based.

(5) Leverage human knowledge bases.

The BOSS Custom roadmap needs to envision supporting these innovative approaches, because that is the only way that new startups can give people a reason to leave a horizontal search engine like Google and adopt an alternative.

UPDATE:  Andrew Chen’s take is totally on target

Andrew Chen makes the point that Yahoo! BOSS just allows search mashups, and what Yahoo! really needs to do is open up their search and network traffic:

Andrew:  “The extreme approach – well not even that extreme these days, given Facebook – would be to let developers build extensions to the search engine that actually run on top of the *.yahoo.com domain. They can provide an API, do app approvals, and direct only small bits of traffic to each app to test them out – then ramp up the ones that perform better than anything else. There are difficult pieces necessary to make this work, but if done well, it has the potential to change the search game by letting developers target small groups of queries the way that advertisers have been able to.”

But when Marshall Kirkpatrick asked about this, Yahoo! responded “oh, that’s a different department.”  Will Yahoo! bring their disparate fiefdoms together into one integrated strategy to truly Use the Force?!

And much, much more…

This will be interesting. BOSS and BOSS Custom are a bold, dynamic response to the ever-increasing hegemony of the Don’t Be Evil Empire! Competition is good and we wish Yahoo! the best of luck. Our advice is to Use The Force and go far enough to equip players like us to create experiences that are highly differentiated from general search.

Death Star here we come!

Source: Wikipedia