Big data player MapR has made the full production version of Apache Drill available, increasing the flexibility and ease of use of its Hadoop distribution.
Drill allows analytics on data sets that contain their own schema, making self-service analytics possible. MapR has been offering an early version of the framework, which is being developed under the auspices of the Apache Software Foundation, since September 2014, and has now delivered the production-ready 1.0 version.
No schema required
Hadoop distributes processing to the data, so large amounts can be processed at once, but it needs a framework, and the ecosystem contains plenty of options: “Apache Drill competes with Hive, Impala [the framework from MapR’s rival Cloudera], and SparkSQL,” MapR marketing manager Jack Norris told DatacenterDynamics. “But the others need schemas set up. Drill can look at things which have embedded schemas like JSON [JavaScript Object Notation], CSV or Parquet.”
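To illustrate what querying self-describing data can look like, here is a minimal Python sketch that discovers the fields of JSON records at read time and then projects columns without any schema declared up front. It is not Drill's implementation or API; the sample records and the helper names `discover_schema` and `select` are hypothetical and exist only for this example.

```python
import json

# Hypothetical sample data: newline-delimited JSON whose fields vary per record.
RAW = """\
{"user": "alice", "clicks": 3}
{"user": "bob", "clicks": 5, "country": "UK"}
"""


def discover_schema(lines):
    """Collect each field name and the Python type names observed for it."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def select(lines, columns):
    """Project the requested columns; a field missing from a record yields None."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


lines = RAW.splitlines()
schema = discover_schema(lines)            # fields are learned from the data itself
rows = select(lines, ["user", "country"])  # no table definition was needed
```

The point of the sketch is that the structure comes from the records themselves, so a new field such as `country` can appear in the data without anyone first updating a central schema.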
The Drill framework is an open source implementation modeled on the Dremel system developed within Google. The framework does not depend on Hadoop, and can be used outside of the popular big data ecosystems. It also distributes processing, so it works well at scale, with a design goal of scaling to 10,000 servers or more and processing petabytes of data and trillions of records in seconds.
The framework is designed to handle Internet of Things (IoT) data, or streams of web clicks, and includes governance so it can be used for multi-tenant data lakes, or inside enterprises.
“We offer secure access to the same underlying file, so different groups see different data,” explained Norris. For instance, data scientists might see only anonymized data in bulk. Granting permission at the access level is different from other approaches, he said, and makes it easy to ensure privacy and prevent the spread of duplicated datasets.
“The availability of Apache Drill in the MapR Distribution is a major milestone for the SQL-on-Hadoop project, which is significant in delivering real-time insights from complex data formats without requiring any data preparation,” said Matt Aslett, research director, data platforms and analytics, 451 Research.
Among the partners praising the move, MicroStrategy’s CTO Tim Lang said: “With a minimal learning curve, Drill opens up more complex data sets to the end user who can instantly visualize and analyze new information using our advanced capabilities.”