Tag: AltSearchEngines

Yahoo! BOSS is a step in the right direction, but only BOSS Custom goes far enough for radical search innovation

First of all, congratulations to Yahoo!’s “Rebel Alliance” for taking this first step on what might be a disruptive move to compete against the Don’t-Be-Evil Empire! (h/t Dave McClure, who coined this phrase). Now you will have to Use The Force and go much further to power an industry of distinctive Alternative Search Engines, like Uptake. (Disclosure: we are not using BOSS but plan to evaluate BOSS Custom for providing backfill results.)

BOSS: one of Yahoo!’s last hopes

Photo courtesy: Revell.de

But it took more than an X-Wing to destroy the Death Star. It Took the Force.

Here are the salient features of BOSS and where we think it has to go further to truly power innovative new search experiences.

Three levels, but only BOSS Custom has real potential for a highly differentiated service offering.

There are three levels to the BOSS program, according to SearchEngineWatch:

  • self-service API
  • BOSS University for academics
  • BOSS Custom, designed for companies with their own ranking and/or presentation methodologies. Alternatively, companies with proprietary data that can serve as an additional signal that factors into relevancy.

I’ll go over all the aspects of the BOSS program below, and then come back to BOSS Custom as evidence that Yahoo! just might Use The Force. But the basic features look like a free version of Google Custom Search Engine.

Four primary functions of search are addressed by BOSS

TechCrunch highlights these 4 functions of search that BOSS provides as service:

This is a good framework. Let’s start with indexing and crawling.

Crawling and Indexing: Not the real barrier to vertical search

According to Yahoo!’s Bill Michels, as reported by ReadWriteWeb, “niche search engines often aren’t very good because they have access to a very limited index of content. It’s expensive to index the whole web.” This is just wrong, in my opinion. It may be expensive, but that isn’t why niche search engines haven’t been successful.

Lowering the cost of crawling can be good for startups, of course. However, with cloud services and vendors leveraging low cost computing and crawling, crawling is already becoming a commodity. In building Uptake, “buying the web wholesale” is the least of our worries especially because we are focused on one vertical.

We have used a stealth mode vendor to crawl the entire Web and build the most comprehensive structured database of travel products, at least 30% larger than any other source, and it didn’t take us $300 million to do that. According to SearchEngineLand, there is a vendor called CommonCrawl who aims to provide a complete index of the Web on a white label basis. So “buying the web wholesale” is possible today.

Building a repository of documents is only the beginning. The key is extracting meaning from the documents (and the relationships between the documents) to power your ranking algorithm.

Ranking: Some new ability, but still built on top of a black box

Building a ranking model for a specific purpose is probably the most difficult task for a search startup. BOSS can help jumpstart a new company’s effort by providing an acceptable but not differentiated result.

Ranking: Add your own search signals so you can re-rank results

Proprietary signals are needed to deliver better precision through better ranking matched against user intent for any given query. BOSS breaks new ground by allowing you to add and blend your own search signals into Yahoo!’s black box. Example from SearchEngineLand:

Me.dium is the example highlighted by Yahoo!. Silicon Alley Insider points out that Me.dium is adding social signals to Yahoo! ranking and calling it “Social Search,” which ironically is “using a name Yahoo! has already attached to a failed product.” VentureBeat has a more positive spin on the Me.dium demo application, although Dan Kaplan concludes:

The question that hangs over Yahoo, BOSS, and Me.dium is whether or not any search player will really be able to change user behavior and get people to consistently use something other than Google; the results would probably have to be a noticeable leap forward, and even then, it would be hard to break Googling habits.

Dan is absolutely right, and BOSS will have to go a lot further to deliver a true “leap forward.”

Ability to blend results:

In addition to re-ranking, Yahoo! BOSS also allows you to blend results. This has potential for much more interesting SERPs that are more tailored for a specific vertical or search intent state.
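To make the idea of blending concrete, here is a minimal sketch in Python. It does not use the BOSS API or the Mashup Framework; the result dictionaries, the `blend` function, and the example URLs are all hypothetical. It simply interleaves several already-ranked result lists round-robin and drops duplicate URLs, which is one simple way a vertical engine might mix web, news, and image results on one page.

```python
from itertools import zip_longest


def blend(*result_lists, limit=10):
    """Interleave ranked result lists round-robin, skipping duplicate URLs.

    Each result is assumed to be a dict with at least a "url" key
    (a hypothetical shape, not the format of any real search API).
    """
    seen = set()
    blended = []
    # zip_longest pads the shorter lists with None so no results are lost.
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is None or item["url"] in seen:
                continue
            seen.add(item["url"])
            blended.append(item)
    return blended[:limit]


# Made-up example data for three result types.
web = [
    {"url": "http://a.example/hotel", "kind": "web"},
    {"url": "http://b.example/review", "kind": "web"},
]
news = [
    {"url": "http://n.example/story", "kind": "news"},
    {"url": "http://a.example/hotel", "kind": "news"},  # duplicate of a web hit
]
images = [{"url": "http://i.example/photo.jpg", "kind": "image"}]

page = blend(web, news, images)
blended_urls = [result["url"] for result in page]
```

A real SERP would likely weight the sources by vertical or by search intent rather than giving each an equal turn; round-robin is just the simplest policy to show.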

Blend: Mashup Framework

As an added plus, Yahoo! is providing the BOSS Mashup Framework. According to Yahoo! Search Blog: “We’re releasing a Python library and UI templates that allow developers to easily mashup BOSS search results with other public data source”

Blend: Web, news, and image search availability at launch

This seems like table stakes. Vanessa Fox at SearchEngineLand points out that at first glance, the API looks similar to Google’s Custom Search API and Microsoft’s Live Search API.

Query handling

This is also a big challenge for new search startups. Yahoo!’s service provides for this as part of the overall service. But as far as I can tell there is no way to insert one’s own query parsing into Yahoo!’s so as to affect the search results. UPDATE from Huanjin Chen: I guess a developer can always use his/her own GUI to get the user query and parse it and then form a Yahoo query. If so, the developer can insert his/her own query handling.
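Huanjin’s suggestion can be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the tiny `KNOWN_PLACES` gazetteer, the `parse_query` and `build_backend_query` helpers, and the choice to quote the place name as a phrase. No real Yahoo! call is made; the output is just the query string the developer would then send to whatever backend they use.

```python
import re

# Hypothetical, tiny gazetteer; a real vertical engine would use a far larger one.
KNOWN_PLACES = {"san francisco", "new york", "orlando", "chicago"}

# Connector words that carry no meaning once the place has been extracted.
CONNECTORS = {"in", "near", "at"}


def parse_query(raw):
    """Split a raw user query into a recognized place (if any) and other terms."""
    text = re.sub(r"\s+", " ", raw.strip().lower())
    place = None
    # Try longer place names first so a longer name wins over a shorter one.
    for candidate in sorted(KNOWN_PLACES, key=len, reverse=True):
        pattern = r"\b" + re.escape(candidate) + r"\b"
        if re.search(pattern, text):
            place = candidate
            text = re.sub(pattern, " ", text)
            break
    terms = [term for term in text.split() if term not in CONNECTORS]
    return {"place": place, "terms": terms}


def build_backend_query(parsed):
    """Form the query string to hand to the backend search service."""
    parts = list(parsed["terms"])
    if parsed["place"]:
        # Quote the place so the backend treats it as a phrase (an assumption).
        parts.append('"%s"' % parsed["place"])
    return " ".join(parts)


parsed = parse_query("Cheap hotels in San Francisco")
backend_query = build_backend_query(parsed)
```

The point is only that the parsing step sits in the developer’s own code, in front of the search service, so the developer controls what the backend actually sees.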

Presentation: Total flexibility on presentation

According to Yahoo! Search Blog: “Freedom to present search results using any user interface paradigm, without Yahoo! branding or attribution requirements.” This means no attribution required! But this is a red herring because…

Business Model: Advertising strings attached

According to GigaOm, you have to use Yahoo! Search Advertising. And Om makes the point:

Notably, they are asking startups to sign up for their search monetization system — the very same system that is going to use Google to drum up ads. That isn’t a very confidence-inspiring move. And if this monetization tool was so great, Yahoo wouldn’t be in the kind of trouble it’s in.

Indexing of the Semantic Web not included.

Marshall Kirkpatrick of ReadWriteWeb asked if the indexing of the semantic web would be included, and they said no. But this is no big deal because semantic tags and microformats have yet to be adopted in a huge way by the most important sites, which would rather focus on traditional Search Engine Optimization (SEO) for Google than use immature tools to expose semantic meaning.

Business Model: Unlimited queries

Business model friendly pricing, although you have to sign up for their advertising platform.

BOSS Custom: Now you’re talking about Using The Force

Everything above was just for the Jedi Apprentice. But if you are going to confront the Dark Side, you need BOSS Custom. Here’s what they provide and why it is critical (and may or may not be far enough):

Near real-time indexing of public or proprietary content

Real-time indexing is appropriate for time-sensitive information like news and blogs. This could be an advantage over other wholesale crawling and indexing methods that other private label providers are providing.

Blending training datasets to produce advanced, customized ranking models that scale to the Web

This is probably one of the most interesting aspects of BOSS Custom. This suggests that a training dataset and our own proprietary signals can be integrated into Yahoo!’s existing ranking models. So the real question is: how much of that ranking model will be opened up for tuning by the search startup?

Federating web and proprietary content in a single search display

This can already be achieved using Google Custom Search Engine in a rudimentary way. The larger issue is that most proprietary content in structured databases is not just unstructured (or semi-structured) Web documents but more structured database objects with attributes. How does BOSS Custom make it easy to integrate the right Web pages with the right products, without really understanding what entities are on those Web pages?

Integrating query suggestions (Search Assist technology)

Query parsing services are one of the most interesting aspects of BOSS Custom. Search Assist is very well done already, and it would be even more valuable if it could be integrated with our custom, industry-specific ontology.

Leveraging highly trained query and document categorizers

Also very interesting, if indeed this is being opened up to developers. Our own query categorizers are still fairly early stage, and if we could understand how Yahoo!’s query categorizers work and could integrate our own understanding of natural language into that categorizer, this could be a real accelerator for us in parsing queries better.

Structured search (range queries, refinement)

This sounds interesting, but I’m sceptical that the horizontal approach to refinement will provide for a truly differentiated experience unless the specific needs of that customer, as captured in an ontology, can be integrated into the refinement controls provided for by BOSS Custom.
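As a rough illustration of what range queries and refinement look like once results carry structured attributes, here is a small Python sketch. The `Listing` fields, the category labels (standing in for nodes of a vertical-specific ontology), and the `refine` function are all hypothetical and have nothing to do with how BOSS Custom implements this.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    category: str  # stands in for a node in a vertical-specific ontology
    price: float
    rating: float


def refine(listings, category=None, price_range=None, min_rating=None):
    """Filter listings by category, an inclusive price range, and a rating floor.

    price_range is a (low, high) tuple; either bound may be None for "open".
    """
    low, high = price_range if price_range else (None, None)
    matches = []
    for item in listings:
        if category is not None and item.category != category:
            continue
        if low is not None and item.price < low:
            continue
        if high is not None and item.price > high:
            continue
        if min_rating is not None and item.rating < min_rating:
            continue
        matches.append(item)
    return matches


# Made-up listings for illustration.
listings = [
    Listing("Harbor Inn", "boutique hotel", 180.0, 4.5),
    Listing("Budget Stay", "motel", 80.0, 3.2),
    Listing("Grand Plaza", "boutique hotel", 320.0, 4.8),
]

mid_priced_boutiques = refine(
    listings, category="boutique hotel", price_range=(100, 250)
)
well_rated = refine(listings, min_rating=4.0)
```

The generic part (ranges and thresholds) is easy for a horizontal provider to offer. The differentiating part is the `category` vocabulary, which is exactly where a customer’s own ontology would need to plug in.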

UPDATE: Here are some additional thoughts from our China search team and Huanjin Chen, our search architect:

Structured search and query/document categorizers can be useful tools for driving improved relevance for niche search engines. There are many ways of making search results more relevant. One is to understand the user intention better and refine the search, which structured search addresses. Another way is to build the ontology/taxonomy/directory to classify queries and documents, which can benefit from the categorizer.

Unlimited queries are essential to build a commercial search engine. Google’s search API might be more structured and flexible than Yahoo’s, but it only allows 50K queries per day. Potentially, a search engine could use both. One could build an ontology using Google’s search API, and then use Yahoo BOSS for user search results.

Yahoo also does not allow access to their ranking signals. Agreed, this is a huge limitation, but it is understandable. The Yahoo ranking order is still useful. At minimum, one could use it as one of the relevance factors. If one can query Yahoo search engine in different ways, then one can kind of guess their ranking signals. The drawback is that multiple queries may be a performance concern.
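One way to treat the engine’s ranking order as just one relevance factor is sketched below in Python. This is only an assumption about how such a combination might be done: the reciprocal-rank score, the `weight` parameter, and the `own_scores` dictionary (a proprietary signal normalized to the range 0 to 1) are all hypothetical choices, not anything Yahoo! provides.

```python
def rerank(engine_results, own_scores, weight=0.5):
    """Re-order results by mixing the engine's order with a proprietary score.

    engine_results: URLs in the order the engine returned them.
    own_scores: dict mapping URL to a proprietary score in [0, 1];
                URLs missing from the dict are treated as 0.0.
    weight: share of the final score given to the proprietary signal.
    """
    scored = []
    for position, url in enumerate(engine_results, start=1):
        # The engine's signals are hidden, so its position is the only
        # input available; reciprocal rank turns it into a score.
        engine_score = 1.0 / position
        own_score = own_scores.get(url, 0.0)
        combined = (1 - weight) * engine_score + weight * own_score
        scored.append((combined, url))
    # Python's sort is stable, so ties keep the engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


engine_order = ["a", "b", "c"]
proprietary = {"c": 1.0, "b": 0.2}

reranked = rerank(engine_order, proprietary, weight=0.5)
unchanged = rerank(engine_order, proprietary, weight=0.0)
```

With `weight=0.0` the engine’s order is returned untouched, and raising the weight lets the proprietary signal pull results up or down, all without knowing anything about the ranking signals behind the black box.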

It is true that crawling is not a barrier to building a vertical search engine. However, first, BOSS still makes it easier for an under-funded startup to get started; second, Yahoo data helps with time-sensitive content; third, some startups may want to try a semi-horizontal market, such as content for kids or shopping, which is more costly to crawl.

New search startups have to take a different approach to differentiate from Google and Yahoo! How will BOSS support this?

Huanjin: In general, I think this is a very important event for all search startups. It opens new opportunities. Google has pushed search relevance to the limit that the current approach can achieve. To significantly improve search relevance, we have to take drastically different approaches, i.e., not solely by keyword matching, not solely by statistical signals, and not treating every user the same. My take on the direction of search technology is:

(1) Meaning based (semantic and/or syntactic).

(2) Context aware. The same word means different things in different contexts.

(3) Must be able to treat different users differently.

(4) Opinion based.

(5) Leverage human knowledge base.

The BOSS Custom roadmap needs to envision supporting these innovative approaches, because that is the only way that new startups can create a reason for people to leave their horizontal search engine like Google and adopt an alternative.

UPDATE:  Andrew Chen’s take is totally on target

Andrew Chen makes the point that Yahoo! BOSS just allows search mashups, and what Yahoo! really needs to do is open up their search and network traffic:

Andrew:  “The extreme approach – well not even that extreme these days, given Facebook – would be to let developers build extensions to the search engine that actually run on top of the *.yahoo.com domain. They can provide an API, do app approvals, and direct only small bits of traffic to each app to test them out – then ramp up the ones that perform better than anything else. There are difficult pieces necessary to make this work, but if done well, it has the potential to change the search game by letting developers target small groups of queries the way that advertisers have been able to.”

But when Marshall Kirkpatrick asked about this, Yahoo! responded “oh, that’s a different department.”  Will Yahoo! bring their disparate fiefdoms together into one integrated strategy to truly Use the Force?!

And much, much more…

This will be interesting. BOSS and BOSS Custom are a bold, dynamic approach to the ever-increasing hegemony of the Don’t Be Evil Empire! Competition is good and we wish Yahoo! the best of luck. Our advice is to Use The Force and go far enough to equip players like us to create experiences that are highly differentiated from general search.

Death Star here we come!

Source: Wikipedia

AltSearchEngines post: Alts Living in a Google World

I just guest-posted this over at AltSearchEngines.com, so I thought I’d share this with the UpTake travel and search industry blog readers too. Enjoy!

UpTake.com: Alts Living in a Google World

Judging from the intellectually stimulating discussion I had with 30+ alternative search engines at the recent AltSearchEngines-sponsored meet-up in San Francisco, there is no question that a renaissance of innovation is coming from the Alts. Many founders of Alts seem to be motivated by the idea that “they can do it better than Google.” This is a great motivation during the stealth-R&D phase. But when it comes time to go to market and get site traffic, we at UpTake believe the Alts should follow this maxim, inspired by Madonna’s classic “Material Girl”:

Madonna Like a Virgin

“Living in a Google World

Some Alts kiss me, some Alts hug me
I think they’re o.k.
If they don’t give me proper traffic
I just walk away”

link to album image: http://www.madonna.com/bin/galImg/siteFiles/4820374586.

So Alts, don’t kiss and hug me with your advanced technology and buzzwords. Just deliver the goods: traffic!

Four Tips on how you can better live in a Google World

We at UpTake (formerly known as Kango), know that we are living in a Google world. As an AltSearchEngine, that means you need to play by the rules Google has set for the game, if you want to be found, and you want to compete. Here are four tips on how you can get more traffic in a Google World:

TIP ONE: Focus on crafting rich “search engine results pages” (SERPs) that look like category pages, not SERPs.

Does Google index Yahoo! and MSN SERP pages? Enough said. They have been crystal clear on this point: they don’t want to index your SERP pages either! So the solution is to provide rich, crawlable landing pages that don’t look like SERP pages. Here’s how we did it for San Francisco Hotels and Things to Do in New York. In addition to the typical search engine “blue links”, we added images, copy, and other useful information for our users. Focus on looking like Amazon or another e-commerce player that has successfully indexed pages in Google.

TIP TWO: Provide a browseable catalog that is organized in some sort of semantically logical fashion, so that other crawlers can crawl your site!

Chances are your search engine doesn’t create easily crawlable pages. This is not unique to search engines; it is also the problem of most dynamically generated Websites like e-commerce sites. Solve this problem by creating an accessible “browse-tree” [sitemap] of your pages, categorized in a semantically logical fashion. For example, we organize by states like Florida and New York, and cities like Orlando and Chicago. We also created category pages like Lodging and Things to Do. Don’t do a laundry list of alphabetically organized deep searches. Instead, look to e-commerce sites in your vertical to see how to organize your browse tree. Hint: just using a sitemap.xml is not enough!
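The browse tree described above can be sketched in a few lines of Python. The listing records, the `slug` helper, and the `/state/city/category` URL scheme are assumptions for illustration only, not UpTake’s actual site structure.

```python
from collections import defaultdict


def slug(text):
    """Turn a display name such as "Things to Do" into "things-to-do"."""
    return "-".join(text.lower().split())


def build_browse_tree(listings):
    """Group listing names under state, then city, then category."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for item in listings:
        tree[item["state"]][item["city"]][item["category"]].append(item["name"])
    return tree


def browse_paths(tree):
    """List one crawlable path per state, city, and category page."""
    paths = []
    for state in sorted(tree):
        paths.append("/%s" % slug(state))
        for city in sorted(tree[state]):
            paths.append("/%s/%s" % (slug(state), slug(city)))
            for category in sorted(tree[state][city]):
                paths.append(
                    "/%s/%s/%s" % (slug(state), slug(city), slug(category))
                )
    return paths


# Made-up listings for illustration.
listings = [
    {"state": "Florida", "city": "Orlando", "category": "Lodging", "name": "Lake Resort"},
    {"state": "Florida", "city": "Orlando", "category": "Things to Do", "name": "Theme Park"},
    {"state": "New York", "city": "New York", "category": "Lodging", "name": "Midtown Hotel"},
]

paths = browse_paths(build_browse_tree(listings))
```

Each path corresponds to a landing page that links down to its children, so a crawler can reach every category page by following links from the top of the tree rather than by issuing searches.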

TIP THREE: Get lots of links to your site! Be willing to talk about things that are interesting but not focused on your search engine.

We set out to create a great travel search application. But then we discovered that in order to rank in Google you need lots of inbound links! One of our founders applied his snarky sense of humor toward this with a satire, “what if Google had to design for Google”. Then it got Sphunn. Then it got Dugg 4822 times! Then Battelle mentioned it. This blog post was the most popular and most linked-to post in the company’s history.

TIP FOUR: Have original content.

One way to have original content is simple: blogging. It may be strange to think that a search engine should have a blog, but it should. A blog is an excellent way of putting your personality on the web and attracting new customers through more traditional methods: subscription and word of mouth. Also, Google will not crawl pages of content that are not original. If you are just displaying web content from other Websites (just like Google), Google will not want to show more intermediate navigational pages beyond its own SERPs. It wants to take users directly to the rich content they seek. Therefore, you must also create rich content that addresses the keywords people are using, so that you rank in Google search results and attract them to your search engine.

Summary

To be successful as an Alt these days, not only do you need a great search experience with unique technology, but you must also pursue a lot of other traffic strategies, not really related to building that search experience, to attract customers. Google has defined the rules. Our policy at UpTake is to learn them, love them, and give ourselves a great shot at success by living well in the Google world!
