A Google of One: Yes and No

A Google of One

It’s just another phase in the continued commoditization of infrastructure. It started a decade or two ago with the OS and has been followed in quick succession by the web and J2EE stacks.

While not free, cloud computing offerings by the likes of Amazon.com have definitely opened the realm of possibility to the average developer.

Hadoop is just the next logical progression in all of this, an implementation of something Google pioneered and has been running for 4 or 5 years now. Continued contributions to the project from Yahoo! (assuming they continue to open-source the majority of their improvements) and others can only strengthen its appeal.

In the case of Google, they’ve been very successful because of what they’ve been able to build on top of their infrastructure. They have long realized the importance of well-architected distributed systems, and their developers see those benefits today. As developers, it’s amazing how simple things are when you’re able to think about them serially and leave the distribution to the framework. Google and Yahoo! understand that; so should we.
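To make the "think serially" point concrete, here is a minimal sketch of the map/reduce style in Python. It is only an illustration: Hadoop itself is written in Java, and the names below (`map_words`, `reduce_counts`, `run_job`) are hypothetical, not part of Hadoop's API. The toy driver runs everything in one process, standing in for the framework that would normally spread the work across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Serial map step: emit (word, 1) for every word in a single line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Serial reduce step: sum the counts collected for a single word."""
    return word, sum(counts)


def run_job(lines: Iterable[str], mapper, reducer) -> dict[str, int]:
    """Toy in-process driver (an assumption for illustration only).

    A real framework would partition the input, run mappers and reducers
    on many machines, and handle failures. Here we just group by key.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["the quick fox", "The lazy dog"], map_words, reduce_counts))
```

Neither `map_words` nor `reduce_counts` knows anything about machines, partitions, or retries. They are plain serial functions over one line or one key, which is the simplicity the paragraph above is getting at.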

The resources that used to be precious are now freely available. So then what becomes the bottleneck? What becomes the skill that is valued?

Two things:

* Implementation. The ability to leverage all the pieces together effectively.

* Business Acumen. By this, I mean having a smart business plan. If you are going to pull this off, you are going to need to cover the cost of the cloud computing and the initial servers. Yet if you can come up with a way to make enough money off of each individual using the service, it could potentially be self-sustaining. Google and Yahoo! could even help you compete against them by supplying you with revenue from ads.

I’ll add Availability of useful and interesting data to this list. You can have all the processing power in the world but if it’s not directed towards solving interesting problems, you’ll have a very small audience.

RIP: Netscape (Mosaic/Navigator/Communicator/Browser)

Nearly 14 years after the once mighty browser made its first desktop appearance as Mosaic Netscape 0.9, its disappearance comes as little surprise. Although Netscape accounted for more than 80 per cent of the browser market in 1995, the arrival of Microsoft’s Internet Explorer in the same year brought stiff competition and surpassed Netscape within three years.

Looks like 9.0.0.6 will be the last release. I must say that’s a long way from the 0.9/1.0 releases I spent my teenage years supporting at the local ISP.

Large-scale Hadoop @ Yahoo! Search

Jeremy Zawodny mentions a very large-scale Hadoop deployment over at Yahoo! Search.

It makes sense given their previous commitment to and investment in the project, but it’s also cool to start seeing some significant migrations to the framework.

Over on the Yahoo! Hadoop blog, you can read about how the webmap team in Yahoo! Search is using the Apache Hadoop distributed computing framework. They’re using over 10,000 CPU cores to build the map and processing a ton of data to do so. They end up using over 5 petabytes of raw disk storage, eventually outputting over 300 terabytes of compressed data that’s used to power every single search.

Another interesting quote from Eric Baldeschwieler (Senior Director, Grid Computing):

This process is not new (see the AltaVista connectivity server). What is new is the use of Hadoop. Hadoop has allowed us to run the identical processing we ran pre-Hadoop on the same cluster in 66% of the time our previous system took. It does that while simplifying administration. Further we believe that as we continue to scale up Hadoop, we will be able to scale up our production jobs as needed to larger cluster sizes.

Pretty impressive.

As part of this announcement, Jeremy has posted an interview he did with a couple of the webmap and grid computing people. The video feed seems quite slow right now so you’ll have to be patient.

Update: The video feed seems much better now. Check it out.