EMC Corp. unveiled its own distribution of the open-source analytics software Apache Hadoop on Tuesday, timing the announcement to coincide with the annual conference its security division, RSA, holds in San Francisco. The solution integrates Hadoop with database technology from EMC's Greenplum division.
Hadoop is a popular software framework for performing analytics on large data sets using clusters of servers – usually low-cost commodity servers. EMC made the announcement on the same day Intel Corp. announced its Hadoop distribution.
According to EMC, many consumer-focused Internet companies have already adopted Hadoop for analytics, building applications that are tightly coupled with the data available to them and taking advantage of its benefits. Enterprises are on their way to adopting the same model, the vendor believes.
Scott Yara, senior VP of products at EMC's Greenplum division, said Hadoop was a “big deal” for enterprises, holding “transformational” possibilities.
“Marrying the extraordinary capabilities of the Greenplum technology—essentially the Greenplum crown jewels—with this amazing open-source phenomenon has been no small feat,” he said. “But we're ‘all in' — investing in a manner that no other company is — to help catapult Hadoop into wide-scale adoption.”
EMC's distribution is called “Pivotal HD”. It natively integrates the vendor's Greenplum parallel-processing database with Hadoop, providing a SQL interface and SQL processing.
The technology that delivers this integration is called “HAWQ”. It puts a parallel SQL database on top of the Hadoop Distributed File System.
The SQL capabilities are a key aspect of the solution. They give it the ability to support standard query interfaces and to be used by SQL-trained analysts.
EMC bought Greenplum in 2010.