Version 12c of Oracle’s venerable database first launched in July 2013, with the intention of competing against the rising tide of in-memory databases, most notably SAP’s HANA. It launched again the following September, with the intention of really competing against HANA, this time promising a new database engine architecture that would meld the columnar storage of big data systems with the tuple- or row-oriented architecture of relational databases.
If a Saturn V rocket has three stages, then after Tuesday, Oracle’s rendition of the Saturn V has no more launches left.
At an occasion the company described as a “launch event,” Oracle CEO Larry Ellison re-introduced a bewildered group of spectators to version 12c, this time with a ‘real’ in-memory option, and this time with the promise of “orders of magnitude” greater performance than even Oracle’s own previous incarnation.
And this time, it will be given full general availability in July, Ellison said.
“If you read all the university papers, all the research institutes, all the discussions of in-memory databases,” remarked Ellison, “you’ll see that they were primarily designed to increase the speed of queries, to increase the speed of analytics and report writing. And by one, two, sometimes three orders of magnitude... The first goal: Run queries 100x faster, to deliver real-time analytics. People think that’s the goal of in-memory databases. Well, that’s where most in-memory databases begin and end, that first goal. That was not our only goal. We wanted to do that, but we did not want to compromise, slow down our transaction processing portion of the database. In fact, as long as we’re speeding up our queries, why don’t we speed up our OLTP at the same time?”
Going off the Cloud
The original definition of in-memory database architecture is the ability to store, query, retrieve, and process data entirely from a pool of memory. Ellison has a notorious love/hate relationship with the cloud; in fact, cloud architectures can cease to exist for Ellison when it suits him. Tuesday, the cloud was off again, as Ellison actively avoided uttering the word. He chose instead to focus on Exadata servers, whose various blades’ memory can be pooled together to run 12c’s relaunched in-memory architecture.
“We’ve had this software out in the hands of our customers for quite some time,” Ellison admitted, “and the results they’re getting are quite stunning... We’ve got a variety of quotes from customers who’ve been using this technology for a long time, and they’re getting one, two, three orders of magnitude performance improvement.”
When in-memory databases first arrived on the scene just a few years ago, and some traditional analytical applications were accelerated by several orders of magnitude, the response from traditional data warehouse vendors and Oracle partners was that the results were “uncorroborated”, and that the claims themselves were “deliberately disingenuous and intended to obfuscate.” Since that time, these tremendous speed gains have become facts of life, and along with partners including Teradata, Oracle has had to scramble.
But Oracle’s strategy was not to copy SAP, or to simply produce an in-memory component and call it an “accelerator.” In fact, it may not be technically accurate to call 12c an in-memory database since it does not actually store the entire database in memory. Tuesday, Ellison described this fact as a feature.
“This thing works in scale-out,” he said. “Now, there are other in-memory databases. But they don’t scale out very well, or at all. And we had to make [a decision] and we said this thing had to be completely transparent with all your applications.”
Exadata clusters are composed of multiple blades, each with its own memory, Ellison noted. “The way our in-memory database has to work is we have to have part of the in-memory database on this machine, part on this machine, part on this machine. We have to be able to partition the column store, if you will, and spread it across the different rack nodes, different real application cluster nodes, and it all has to work. Just throw a switch, and it needs to run faster. So you can take advantage of the memory of lots of inexpensive machines.”
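What throwing that switch might look like in practice: a minimal sketch, assuming a hypothetical sales table and the INMEMORY clause Oracle documents for the option, whose DISTRIBUTE and DUPLICATE settings govern how the column store is spread and mirrored across cluster nodes.

```sql
-- Hypothetical table name; INMEMORY, DISTRIBUTE, and DUPLICATE are the
-- option's documented clauses for RAC deployments.
ALTER TABLE sales INMEMORY
  DISTRIBUTE BY ROWID RANGE  -- partition the column store across cluster nodes
  DUPLICATE;                 -- mirror populated columns on a second node (engineered systems)
```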
DRAM costs money, explained Ellison, and storage is relatively cheap. “Oracle’s in-memory database doesn’t require the entire database be in memory. It’s smart. It keeps the active part in memory. If there are inactive pieces of the database, they don’t have to be in memory. They can be in flash. If they’re really inactive, they can be on disk. So it’s got a memory hierarchy that it automatically manages. So the Oracle in-memory option is economical, in that it doesn’t require you to buy DRAM to hold every last byte of your data in your in-memory database.”
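A hedged sketch of how that hierarchy is declared, again with hypothetical table names: the administrator sizes the column store and assigns population priorities, and the engine decides what actually occupies DRAM.

```sql
-- Carve the in-memory column store out of system memory (takes effect on restart).
ALTER SYSTEM SET INMEMORY_SIZE = 100G SCOPE=SPFILE;

-- Hypothetical tables: priorities steer what gets populated into DRAM first,
-- while NO INMEMORY leaves cold data to the flash and disk tiers.
ALTER TABLE orders         INMEMORY PRIORITY CRITICAL;  -- active data, populated immediately
ALTER TABLE orders_history INMEMORY PRIORITY LOW;       -- populated only as space allows
ALTER TABLE orders_archive NO INMEMORY;                 -- stays on flash or disk
```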
In-memory databases to this point have dispensed with the batched, cyclical transaction processing around which relational databases have typically been built. As a result, those SAP HANA applications that truly realize five or six orders of magnitude of speed gains have been re-architected for the new system. Re-architecture is typically not something that enterprises undertake willingly, or even slightly begrudgingly.
Oracle clearly perceived this as a potential weakness on HANA’s part, and Tuesday, Ellison seized upon the opportunity to exploit it.
With Oracle’s new engine architecture, said the CEO, no database functionality or application needs to be redesigned whatsoever to take full advantage of the new 12c’s speed gains. This is because 12c stores data in both tuple and columnar form simultaneously, then leverages the columnar data cache (which 12c does not vigorously log, since it is not the transactional copy) as a stand-in for the indexes the tabular system would otherwise require. In fact, Ellison promises the new cache can replace between 3 and 20 simultaneous indexes that traditional RDBMS models would typically generate, with a scheme from the world of big data that is already as much as 20x faster at retrieval by itself.
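A minimal sketch of that dual-format arrangement, with hypothetical table and index names throughout: the column store is enabled as a cache alongside the logged row store, an analytic index it replaces is dropped, and the same table then serves both an analytic query and an OLTP write.

```sql
-- Hypothetical names. The row store remains the logged, transactional copy;
-- the in-memory column store is a queryable cache built alongside it.
ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY;

-- One of the analytic indexes the columnar cache is meant to make redundant.
DROP INDEX sales_region_ix;

-- Analytic scan: the optimizer can answer this from the column store.
SELECT region, SUM(amount) FROM sales GROUP BY region;

-- OLTP write: hits the row store as always, with one less index to maintain.
INSERT INTO sales (sale_id, region, amount) VALUES (101, 'EMEA', 4200.00);
```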
“You’re just going to operate your business differently when you can get this kind of information,” Ellison said. “You’re just going to ask more frequent questions, more complicated questions. You’re going to do much more optimization. It’s not just, this one thing runs faster, but you’re going to change business processes as a result of having this information so quickly... You’re going to become a real-time enterprise.”