Amazon's announcement of a cloud-based data mining and analysis service, using the Hadoop implementation of MapReduce, potentially opens advanced business intelligence (BI) activities to many more businesses and organizations. It's an excellent example of just how much cloud computing can change the world.
In essence, the service, Amazon Elastic MapReduce, if it works as advertised, abstracts away the complexity and cost of massively parallel programming and processing, so that non-computer scientists (you know, business types) can examine and query huge data sets.
Think of it as having your own tuned supercomputer that you can plug gigantic data sets into and ask questions that will determine the course of your business for the next decade. Oh, and you can pay for the pleasure on a credit card.
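Here is what the service takes off your hands. This is a minimal, single-machine sketch of the MapReduce programming model in Python, using the classic word-count example. It is not Hadoop's or Amazon's actual API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical. On Elastic MapReduce, the framework would spread the map and reduce steps across many machines and handle the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit a (word, 1) pair for every word in one input line.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key; a real cluster framework does this
    # between the map and reduce phases, moving data across machines.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: collapse all counts for one word into a total.
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    # Run map, shuffle and reduce in sequence on a single machine.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `run_job(["the cat", "the dog"])` returns `{"the": 2, "cat": 1, "dog": 1}`. The point of the model is that the mapper and reducer are simple, independent functions. That independence is what lets a service run them in parallel over huge data sets without the analyst writing any distributed-systems code.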
This high-end BI value has pretty much been the sole purview of large, skilled and deep-pocketed enterprises. But plenty of others (researchers, government agencies, academics, small to medium enterprises, venture capitalists and the like) would benefit hugely from sussing out important trends and findings in the growing reams of raw data generated by modern businesses and societies. Talk about metadata on steroids! Here's another way to use social networks, folks.
For more on the business implications of MapReduce and advanced BI, take a look at a podcast I recently moderated. For the more technical aspects of what MapReduce-oriented computing means, there's a second podcast discussion.
Given the intriguing price points Amazon is offering, this service could be a game-changer. It will likely force other cloud providers to follow suit, which will make advanced BI services more available and affordable for more kinds of tasks. I can even imagine communities of like-minded users sharing query formulations and search templates for myriad investigations. A whole third-party BI consulting and services industry could crop up virtually overnight.
It will be interesting to see if Business Intelligence 2.0 types of analysis can also be brought to the service, through third parties or even outright products that leverage the cloud BI services in the background.
Their pitch: We can bring what Google does for the Web to your entire universe of data. For any of your users. Oh, and we can bring other useful and available data sets into the mix, too. And you can afford this. Your executives can figure out how to use it directly. No lab coats required.
Governments and legislators in particular, who have access to huge stores of publicly financed data, could significantly drop the cost of providing data and analysis services to the masses. As I understand it, the federal and state governments are a bit better at creating data than at leveraging it in near real time. For example, the once-a-decade census data takes almost 10 years to get published. This could help that a lot.
Part of the challenge will be getting to the data and making the largest sets (sometimes at petabyte scale) available to a service like Amazon's. The garbage-in, garbage-out rule still applies. And moving and managing these large sets is not trivial.
What's more, trust remains a hurdle. For sensitive data, the handling and security of the bits need to be managed. But if a sales force trusts its daily grind to Salesforce.com, perhaps other sensitive data has a place on someone else's cloud fabric, too.
For those who can get access to good data on matters of importance to them, and perhaps do unique joins against other data sets, this cloud-based BI development could be a boon. Things that were never possible at any price are now doable.
With Amazon's move, the hard part of important BI tasks shifts away from cost inhibitors and the pain of infrastructure access, and toward data access, data quality and query-development skills, where it belongs.
Particularly in this economy, taking the risk out of weighty business and market decisions, at an affordable cost on someone else's cloud fabric, is a no-brainer.