Monday, September 29, 2008

Greenplum pushes envelope with MapReduce and parallelism enhancements to its extreme-scale data offering

Greenplum has delivered on its promise to wrap MapReduce into the newest version of its data solutions. The announcement from the data warehousing and analytics supplier comes to a fast-changing landscape, given last week's HP-Oracle Exadata announcements.

It seems that data infrastructure vendors are rushing to the realization that older database architectures have hit a wall in terms of scale and performance. The general solution favors exploiting parallelism to the hilt and aligning database and logic functions in close proximity, while also exploiting MapReduce approaches to provide super-scale data delivery and analytics performance.

Greenplum's Database 3.2 takes on all three, but makes signigficant headway in embedding the MapReduce parallel-processing data-analysis technique pioneered by Google. The capability is accompanied by new tooling to extend the reach of using the technology. The result is Web-scale analytics and performance for enterprises and carriers -- or cloud compute data models for the masses. [Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.]

The newest offering from the San Mateo, Calif.-based Greenplum provides users new capabilities for analytics, as well as in-database compression, and programmable parallel analytic tools.

With the new functionality, users can combine SQL queries and MapReduce programs into unified tasks executed in parallel across thousands of cores. The in-database compression, Greenplum says, can increase performance and reduce storage requirements dramatically.

The programmable analytics allow mathematicians and statisticians to use the statistical language R or build custom functions using linear algebra and machine learning primitives and run them in parallel directly against the database.

Greenplum's massively parallel, shared-nothing architecture fully utilizes each core, with linear scalability to thousands of processors. This means that Greenplum's open source-powered database software can scale to support the demands of petabyte data warehousing. The company's standards-based approach enables customers to build high-performance data warehousing systems on low-cost commodity hardware.

Database 3.2 offers a new GUI and infrastructure for monitoring database performance and usage. These seamlessly gather, store, and present comprehensive details about database usage and current and historical queries internals, down to the iterator level, making this ideal for profiling queries and managing system utilization.

Now that HP and Oracle have taken the plunge and integrated hardware and software, we can expect that other hardware makers will be seeking software partners. Obviously IBM has DB2, Sun Microsystems has MySQL, but Dell, Hitachi, EDS and a slew of other hardware and storage providers may need to respond to the HP-Oracle challenge.

On Greenplum's blog, Ben Werther, director, Professional Services & Product Management at Greenplum, says: "Oracle has been getting beat badly in the high-end warehousing space ... Once you cut through the marketing, this is really about swapping out EMC storage for HP commodity gear, taking money from EMC's pocket and putting it in Oracle's."

It will also be interesting to watch as bedfellows and evaluated from Microsoft/DatAllegro, what happens with Ingres, whether Sun with MySQL can enter this higher end data performance echelon. This could mean that players like Greenplum and Aster Data Systems get some calling cards from a variety of suitors. The Sun-Greenplum match-up makes sense at a variety of levels.

Stay tuned. This market is clearly heating up.