Creeping Inevitability: Search Engines Index Tweets

The Fail Whale

Microsoft’s search engine Bing has struck a deal with Face­book and the hot micro-messaging ser­vice Twit­ter, a brash attempt to add real-time web updates to its search results in order to make Google look like a lum­ber­ing dinosaur.

While we’re still digest­ing the news of Bing adding Twit­ter to its search engine, Google has some news of their own: they’re about to do the same exact thing.

Searching Twitter traffic initially seems kind of odd.  Then, after a bit of thought, it starts to sound a bit better.  Finally, after a lot of reflection, it starts to venture into the realm of the bizarre.  Here’s a quick summary of the stages of acceptance:

1 — Odd­ity:  Why would you want to add Twit­ter traf­fic to a search engine?  Twit­ter is all about quick thoughts between you and your ten thou­sand clos­est friends.  Search engines are sup­posed to be good at answer­ing ques­tions, and most ques­tions require more than 140 char­ac­ters to answer com­pletely.  (I’ll blithely ignore the ques­tion of whether most peo­ple doing searches actu­ally care about com­plete answers.)  On first blush, it doesn’t seem like a great match.

2 — Sense:  So why would Google and Microsoft be inter­ested in index­ing tweets, then?  Assum­ing it’s more than just a PR chasing-buzzwords stunt (not nec­es­sar­ily a good assump­tion, but…) how would index­ing tweets add value to the core of their search busi­nesses?  Well, if you treat tweets more as meta­data than search data, it starts to make more sense.  A good per­cent­age of tweets con­tain links, and once you fil­ter out spam you’re left with a lot of links that have been determined–by actual humans!–to be inter­est­ing.  Machine intel­li­gence is great, but humans are still bet­ter at fig­ur­ing out which pages are worth­while and which aren’t…this is the idea behind Mahalo and the like.  If Google and Microsoft can mine that data to improve the qual­ity of their hits, index­ing tweets sud­denly makes more sense.

3 — Non­sense:  But if that’s the goal, why make the tweets them­selves search­able?  That’s going back to treat­ing the tweets as actual data again, which seems ques­tion­able at best.  If peo­ple start to see their tweets show up in search engines that will change the way Twit­ter is used.  Right now it’s treated as an ephemeral medium; incor­rectly in the­ory, since Twit­ter is already search­able, but given how well Twitter’s search engine works, it might actu­ally be true.  With sto­ries pop­ping up all over the place remind­ing peo­ple to be cau­tious about what they put on Face­book or MySpace, does Twit­ter really want to be included in the list of ser­vices to fear?  I’m sure there are good reasons–probably money–for Twit­ter to get involved in this deal, but it’s not with­out risk.

via Bing Partners With Twitter and Facebook for Real-Time Search and BREAKING: Google Announces Search Deal With Twitter.

Digestion: Rethinking the Long Tail Theory

Photo by Amanda Gyllenhaal

There’s a bit of discussion right now about a working paper from Serguei Netessine and Tom F. Tan at Wharton that questions how solid the Long Tail effect really is.  A lot of the criticism seems to come down to some definitions:

Ander­son is also author of The Long Tail: Why the Future of Busi­ness Is Sell­ing Less of More. The key dif­fer­ence between the opin­ion of the book and the study by Whar­ton researchers is how they define “hits” and “niches.” In the book, Ander­son focuses on the def­i­n­i­tion of hits in absolute terms such as the top 10 or top 1,000 prod­ucts, while Netes­sine and Tan argue that, to take grow­ing prod­uct vari­ety into account, one has to define pop­u­lar­ity in rel­a­tive terms, such as the top 1% or top 10% of prod­ucts, to prop­erly assess the pres­ence or absence of the Long Tail.

The question of absolute v. relative definitions can obviously be looked at either way, but it seems to me that the real question is not how many total products are available (relative) but how many products are available that would not be were Netflix not shooting for the niches.  That is, if we define a hit as the top 1% and 3000 movies are stocked by a standard brick-and-mortar company that isn’t capable of the logistics of being a Long Tail business, then the top 30 movies are the hits across the entire industry.  For a meaningful comparison between standard and Long Tail businesses, you’d have to recognize that expanding inventories are the premise of the Long Tail, and one of the things it sets out to examine, rather than folding that expansion into the definition of hits and niches.  So I guess I have to agree with Anderson on that one.
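To make the two definitions concrete, here is a minimal Python sketch.  The 1/rank demand curve, the catalog sizes, and all function names are assumptions chosen purely for illustration; none of them come from the Wharton paper or from Netflix’s data.  It only shows how the number of "hits" stays fixed under an absolute definition and grows with the catalog under a relative one.

```python
def hypothetical_demand(catalog_size):
    """Assumed demand curve: the title ranked r gets weight 1/r."""
    return [1.0 / rank for rank in range(1, catalog_size + 1)]


def hit_share(demand, hit_count):
    """Fraction of total demand captured by the top `hit_count` titles."""
    return sum(demand[:hit_count]) / sum(demand)


def absolute_hits(catalog_size, top_n=30):
    """Absolute definition: the hits are always the top N titles."""
    return min(top_n, catalog_size)


def relative_hits(catalog_size, percent=1):
    """Relative definition: the hits are the top `percent` percent of titles."""
    return max(1, catalog_size * percent // 100)


# A 3000-title store versus a catalog ten times larger.
for size in (3000, 30000):
    demand = hypothetical_demand(size)
    abs_n = absolute_hits(size)
    rel_n = relative_hits(size)
    print(
        f"catalog={size}: absolute hits={abs_n} "
        f"(share {hit_share(demand, abs_n):.2f}), "
        f"relative hits={rel_n} "
        f"(share {hit_share(demand, rel_n):.2f})"
    )
```

Under these assumptions the two definitions agree at 3000 titles (30 hits either way) and diverge as the catalog grows: the absolute definition still counts 30 hits, while the relative one counts 300, so the measured share of demand going to "hits" depends heavily on which definition is used.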

Of course, this definitional question doesn’t change some of the very good points that the paper brings up about how the Long Tail effect is being used now.  The most important one to me is how critical recommendation systems are to a Long Tail business.  All those niche products are just overhead if consumers don’t know they’re there.  Netflix is obviously aware of the problem, given that the data used in this study was released by Netflix as part of a million-dollar contest to improve their recommendation system.  Based on my own experience as a Netflix customer, I have to say improvement is sorely needed–though I might question whether the issue is the recommendation system itself or the horribly non-browsable interface Netflix uses.  (Well, really interfaces plural, since a large part of the problem is how they bounce back and forth between different looks depending on how you get to the data…but that’s a different discussion.)

It makes me wonder how useful social recommendations actually are for Netflix.  I don’t use that system myself, and it wouldn’t be visible in the data used in this study, which consisted only of ratings, but it seems like improvements to the social tools used by Netflix would provide a far superior recommendation system to the algorithms developed in the competition.  For me, the issue is the lack of control that Netflix gives its customers.  For instance, I can’t choose in any granular way which of the movies I’ve rated or rented will be visible to which friends.  There’s also no official integration between the closed “Netflix friends” community and other social networks, at least that I can find on Netflix’s site.  That integration alone would be incredibly valuable; the idea of social networking is to make the person the center of knowledge, not the network, and Netflix’s friends feature doesn’t allow that.

via Rethink­ing the Long Tail The­ory: How to Define ‘Hits’ and ‘Niches’ — Knowledge@Wharton.

Passing Through: Linkbait Your Blog

Scan the tabloid rack for head­lines that make you want to shout, “Hey Martha, come see!” Try to cre­ate the same “must share this” effect in your own head­lines. Really, who can resist “fem­bots and the geeks who love them”?

via Linkbait Your Blog — Wired How-To Wiki.

Oh dear.  This is an excel­lent the­ory.  It’s also exactly the wrong way to think about it.  Maybe the tabloid rack isn’t the best place to look…

I’m not sure why, but the idea that this came out of a wiki post seems wrong to me.  Is tabloid really the direc­tion we want to go with social media?