Tuesday, June 9, 2009

Greenplum speeds creation of 'self-service' data warehouses with Enterprise Data Cloud release

Greenplum has charged headlong into cloud computing with this week's announcement of its Enterprise Data Cloud (EDC) Initiative, which aims to bring "self-service" provisioning to data warehousing and business analytics.

The San Mateo, Calif. company, which provides large-scale data processing and data analytics, says its new initiative, as well as the general availability of Greenplum Database 3.3, improves on costly and inflexible solutions that have dominated the market for decades. [Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.]

Greenplum's goal: To foster speedy creation of vast data warehouses by non-IT personnel in either public or private cloud configurations. The value of data warehouses and the business intelligence (BI) payoffs they provide are clear. And Greenplum is correct in identifying that creating warehouses from disparate data sources has been difficult, expensive and labor-intensive.

At the heart of the EDC initiative is a software-based platform that enables enterprises to create and manage any number of data warehouses and data marts that can be deployed across a common pool of physical, virtual, or public cloud infrastructures.

The key building blocks of the platform include:
  • Self-service provisioning: providing analysts and database administrators (DBAs) the ability to provision new data warehouses and data marts in minutes with a single click.

  • Massive scale and elastic expansion: the ability to load, store, and manage data at petabyte scale, and dynamically expand the size of the system without system downtime.

  • Highly optimized parallel database core: a parallel database that is optimized for business intelligence (BI) and analytics and that is linearly scalable.

Greenplum Database 3.3, the latest version of the company's flagship database software, adds a wide range of capabilities to streamline management and enhance performance. The enhancements aimed at DBAs and IT professionals include:
  • Online system expansion: the ability to add servers to a database system and expand across the new servers while the system is online and responding to queries. Each additional server adds storage capacity, query performance and loading performance to the system.

  • pgAdmin III administration console: an enhanced version of pgAdmin III, which is the most popular and feature-rich open-source administration and development platform for PostgreSQL.

  • Scalability-optimized management commands: a range of enhancements to management commands, including starting and stopping the database, analyzing tables, and reintegrating failed nodes into the system. These enhancements are designed to improve performance and scalability on very large systems.

Database 3.3 is supported on server hardware from a range of vendors including HP, Dell, Sun and IBM. The software is also supported for such non-production uses as development and evaluation on Mac OS X 10.5, Red Hat Enterprise Linux 5.2 or higher (32-bit) and CentOS Linux 5.2 or higher (32-bit).

As part of the EDC initiative, Greenplum is assembling an ecosystem of customers and partners who embrace this new approach and are collaborating with Greenplum to create new technologies and standards that leverage the capabilities of the EDC platform. Early participants deploying EDC platforms on Greenplum Database include Fox Interactive Media/MySpace, Zions Bancorporation and Future Group.

I think that BI vendors will want to join in allowing Greenplum, among others, to refine and advance the notion of data warehouse "middleware" layers. This takes a burden off IT, which can focus on providing virtualized resource pools in which to deploy solutions such as Greenplum's.

As commodity hardware is used to undergird these virtualized on-premises clouds, the total costs contract. And, as we've seen with Amazon, Rackspace and others, moving data to third-party clouds offers other potentially compelling cost advantages, even as the scale issues of moving data around and the security concerns are being addressed.

The automated warehouse layer approach benefits the BI vendors, as their tools and analytics engines can leverage the coalesced cloud-based data that Greenplum provides. The more and better the data, the better the BI. Cloud providers, too, may examine Greenplum with an eye to providing data warehouse instances "as a service," a value-added data service opportunity to expand general cloud services.

And, of course, the biggest winners are the business analysts and business managers -- at enterprises as well as SMBs -- who can finally get the insights from massive data pools that they long for, and at a price they can realistically consider.

There will be a growing symbiotic relationship between cloud computing and such data warehousing solutions as Greenplum's Enterprise Data Cloud. The more data that is housed in accessible clouds, the greater the need to access, manage and provision additional data for analysis payoffs.

And the more tools there are for leveraging cloud-based data, the more value there will be in moving data to clouds ... and so on. The chicken-and-egg relationship is clearly under way, with solutions providers like Greenplum offering a needed catalyst to the ramp-up process.