Cloud computing & song scrobbling on Last.fm
I must admit, I'm extremely late to this whole Last.fm thing, but then again, I'm not sure I've missed much. From what I can tell, Last.fm is a music discovery site that collects and maintains your "musical footstream" while letting you share that information with the community. The more you produce, participate, and listen within the social network, the more you are rewarded with new music, friends, and community relationships. So in a nutshell, it's a music social network built on a beefed-up, Pandora-like jukebox. I can't exactly say that I use all the social networking tools (especially considering how much time I already spend on Facebook), but I appreciate the effort. Truthfully, the only thing I really care to do is scrobble. For those of you unfamiliar with the verb, to scrobble means to automatically add the tracks you play to your Last.fm profile using a piece of software called a scrobbler. In business terms, scrobbling turns a user's listening history into a commodity, and Last.fm's public API lets other companies, users, and developers partner with Last.fm and build on that data.
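To make the verb a bit more concrete, here is a minimal sketch of what a scrobbler does on the client side. Everything in it (the `Scrobble` record, the `Scrobbler` class, the method names) is made up for illustration and is not Last.fm's actual client or API; it only shows the idea of recording plays automatically and handing them off in batches.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scrobble:
    """One listening event: who played what, and when (Unix seconds)."""
    user: str
    artist: str
    track: str
    played_at: int


@dataclass
class Scrobbler:
    """Hypothetical client-side scrobbler that queues plays for one user."""
    user: str
    queue: list = field(default_factory=list)

    def on_track_played(self, artist, track, played_at=None):
        # Record the play automatically; the listener never does this by hand.
        if played_at is None:
            played_at = int(time.time())
        scrobble = Scrobble(self.user, artist, track, played_at)
        self.queue.append(scrobble)
        return scrobble

    def flush(self):
        # A real scrobbler would send the batch to the service here (assumption);
        # this sketch just returns the queued plays and empties the queue.
        batch, self.queue = self.queue, []
        return batch
```

A media player plugin would call `on_track_played` whenever a song plays and `flush` every so often, which is roughly how a listening history ends up on a profile without you lifting a finger.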
That said, what I appreciate most about Last.fm is how they overcome the technical challenge of collecting, harvesting, and computing data from scrobbling users. Just think about it: how in the hell does Last.fm collect 40 million scrobbles per day? Can you just imagine the kind of computational power you need to collect all that listening data without crashing? It kinda reminds me of the time I worked on a high-profile WordPress blog that went down in flames after 100,000 people tried to view it in a single hour. LOL. Those were the days, when your company used 2 servers max to host its site.
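For a sense of scale, here is a quick back-of-the-envelope calculation that turns those two numbers into per-second rates. These are plain averages; I'm assuming traffic is spread evenly, which it never is, so real peaks would be higher.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def average_per_second(events, period_seconds):
    """Average event rate, assuming events are spread evenly over the period."""
    return events / period_seconds


# 40 million scrobbles in a day
scrobble_rate = average_per_second(40_000_000, SECONDS_PER_DAY)

# 100,000 page views in a single hour
blog_rate = average_per_second(100_000, SECONDS_PER_HOUR)

print(f"Scrobbles: about {scrobble_rate:.0f} per second")  # about 463
print(f"Blog views: about {blog_rate:.0f} per second")     # about 28
```

So the blog that melted was handling roughly 28 requests per second, while the scrobble stream averages more than sixteen times that, all day, every day.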
Well apparently, the way Last.fm manages 40 million unique users/mo and 800 scrobbles per second is through cloud computing. Cloud computing, at this point, shouldn't be a surprise to anyone, but what I find fascinating is how the team is mixing and matching a variety of technologies, including Hadoop and Dumbo, to provide site stats and metrics, charts, reporting, neighbors, recommendations, indexing, evaluation, and data insights.
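To give a rough idea of what a chart job looks like in this world: Dumbo lets you write Hadoop jobs as plain Python mapper and reducer functions. The sketch below follows that style but runs entirely in-process, with a tiny stand-in for the cluster, so it needs neither Hadoop nor Dumbo installed. The log format (`user<TAB>artist<TAB>track`) and the `run_local` helper are my own assumptions for illustration, not Last.fm's actual data or code.

```python
from collections import defaultdict


def mapper(_key, line):
    # Assumed log line format: "user<TAB>artist<TAB>track"
    _user, artist, _track = line.rstrip("\n").split("\t")
    # Emit one count per play, keyed by artist.
    yield artist, 1


def reducer(artist, counts):
    # Sum all the counts that were grouped under this artist.
    yield artist, sum(counts)


def run_local(lines):
    """Tiny in-process stand-in for Hadoop: map, group by key, reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper(offset, line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


log = [
    "alice\tRadiohead\tReckoner",
    "bob\tRadiohead\tNude",
    "alice\tPortishead\tRoads",
]
chart = run_local(log)
print(chart)  # {'Portishead': 1, 'Radiohead': 2}
```

On a real cluster the grouping step happens across many machines, which is the whole point: the mapper and reducer stay this small while the framework spreads the work over however many servers the data needs.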
So what does this mean? It means that for the web to keep supporting the current trend of free services built on user data, technology has to change its trajectory once again and look to new distributed systems to meet the demand. On the bright side, technologies like Hadoop are also being used to generate pretty art.