How Project Mercury is eBay’s Big Data play

Phoenix data center designed for unprecedented volume of data

7 March 2012 by Yevgeniy Sverdlik - DatacenterDynamics

Inside one of the modular data centers on the roof of Project Mercury. Image courtesy of eBay.

Note: this is Part 1 of our two-part online series on Project Mercury.

eBay no longer throws away any data its web properties generate. This data is mostly information about its users and their online behavior.

One purpose of eBay’s Project Mercury, the company’s brand-new data center in Phoenix, Arizona, was to advance its data center consolidation project. Another major goal was to provide enough power density and scalability to handle the amount of data the company now stores and processes, and the speed at which that volume of data grows.

A search engine in a container

In Phoenix, eBay has deployed a new search engine and a massive data analytics platform unlike anything its infrastructure and data-analytics teams have dealt with before.

eBay’s senior director of global foundation services Dean Nelson says work on the search engine, whose name the company is hesitant to make public, started in 2011. It runs on a high-performance computing cluster, half of which is housed in a Dell data center module on the roof of Project Mercury. The Dell Modular Data Center (MDC) is the module that famously posted one of the industry’s lowest PUE numbers on one of the Arizona desert’s hottest days of the year.


The older search engine is going to be replicated, and new functionality is going to be added to help you find that obscure R2-D2 action figure faster and more easily. Both search and front-end functions are being expanded. The new engine will come online on all-new equipment, running in parallel with the existing one, Nelson says.

HP PODs for Hadoop

Across the roof from the Dell box are two white-and-blue data center containers by HP. The PODs contain infrastructure to support eBay’s leap into the Brave New World of Big Data. “We have gone all-in on big data, because data is absolutely king,” Nelson says. The company chose Hadoop for its data-analytics platform, and the server nodes and local storage that run it occupy about 80% of the two HP containers’ capacity. The rest of the capacity is used by additional HPC nodes.

There are two 24-petabyte Hadoop clusters on the roof of Project Mercury. Thanks to these clusters and the largest data-warehouse expansion in eBay’s history, the company’s data capacity grew by 500% over six months.

So why so much data?

The explosion in the amount of data eBay now has access to was caused by mobile computing and social networking, says Bob Page, eBay’s VP of data analytics platform. “It’s not just ‘What did someone buy?’, but ‘What did they bid on it?’ It’s also ‘Where were they at the time?’ It’s also ‘Who influenced them within their social circle?’ All that data is amassed.”

Page came to eBay about two years ago. At the time, the company was already best in class at warehousing transaction data, he says. It had not yet invested in analytics of users’ behavior, however, and Page’s background was in behavioral analytics.

eBay had access to behavioral data but did not know which parts of it were useful or how to take advantage of them. As a result, the philosophy the company adopted soon after Page’s arrival was that no data would be thrown away at all. “We don’t know what’s going to be useful at some point in the future,” Page explains. At some point, the amount of data amassed grows large enough that analyzing it as a whole becomes useful. This tipping point is hard to determine until a company has actually amassed that much data.

Big Data’s infrastructure challenge

The big infrastructure challenges are storing all this data cost-efficiently while making sure none of it is lost. Page’s team re-evaluated the way eBay used to buy technology. Instead of following the common practice of paying top dollar to store all data on the best equipment money could buy, the team developed the three-tiered storage strategy it uses today.

One of the three data systems is a large enterprise data warehouse that stores all the transaction data and the additional user data about transactions, such as user locations and the devices used.

In addition to data about transactions that happened, eBay analysts needed data about what people did on the site before and after a transaction, or what they did on the site even if they did not end up buying anything. This behavioral data turned out to be far larger in volume than transaction data alone had ever been.

The two other warehousing systems handle the behavioral data. One of them is Hadoop, which Page describes as a whole new way of treating data. Instead of storing data on disk in one place and moving pieces of it, as they are needed, to a separate processing system, Hadoop brings the processing to where the data sits, removing the need for time-consuming data transfers.
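To make the idea of bringing processing to the data concrete, here is a minimal Python sketch of data-local task placement. The block names, node names and the `schedule_data_local` function are hypothetical. It is a simplified greedy assignment, not Hadoop’s actual scheduler, which also weighs factors such as rack locality. It assumes each block is stored on three nodes, matching HDFS’s default replication factor.

```python
# Hypothetical layout: which worker nodes hold a replica of each data block.
# Three replicas per block is assumed, mirroring HDFS's default replication.
block_replicas = {
    "block-0": ["node-a", "node-b", "node-c"],
    "block-1": ["node-b", "node-c", "node-d"],
    "block-2": ["node-a", "node-d", "node-e"],
    "block-3": ["node-c", "node-e", "node-a"],
}


def schedule_data_local(block_replicas, free_slots):
    """Assign one processing task per block, preferring a node that already
    stores the block, so the block never has to cross the network.

    free_slots maps node name -> number of tasks the node can still accept.
    Returns (assignments, remote_blocks): assignments maps block -> node,
    remote_blocks lists blocks that had to run on a non-local node.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignments = {}
    remote_blocks = []
    for block, replicas in block_replicas.items():
        # First choice: a node that holds a replica and has a free slot.
        local = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if local is not None:
            assignments[block] = local
            slots[local] -= 1
            continue
        # Fallback: any node with a free slot; the block must be transferred.
        fallback = next((n for n, s in slots.items() if s > 0), None)
        if fallback is None:
            raise RuntimeError("no free slots left for " + block)
        assignments[block] = fallback
        slots[fallback] -= 1
        remote_blocks.append(block)
    return assignments, remote_blocks


if __name__ == "__main__":
    placed, remote = schedule_data_local(
        block_replicas,
        {"node-a": 1, "node-b": 1, "node-c": 1, "node-d": 1, "node-e": 1},
    )
    print(placed)
    print("blocks transferred:", remote)
```

With one free slot on each of the five nodes, every block is processed on a node that already stores it, so nothing is transferred. Only when the nodes holding a block are busy does the sketch fall back to moving data, which is exactly the cost the locality-first approach tries to avoid.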

The first test run of a Hadoop cluster consisted of 400 nodes. It delivered a positive enough ROI that Page’s team immediately decided to expand it. Two 24-petabyte clusters were deployed in 2011. Each node in a cluster has twelve 2TB drives. The team expects to double the clusters in size sometime in 2012.
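A quick back-of-the-envelope check of these figures can be written as a Python sketch. It assumes the 24 petabytes refer to raw disk capacity in decimal units (the article does not say whether the figure is raw or usable). For the usable-capacity estimate it assumes HDFS’s default three-way replication; eBay’s actual configuration is not stated.

```python
DRIVES_PER_NODE = 12  # from the article: twelve drives per node
DRIVE_TB = 2          # from the article: 2TB drives
CLUSTER_PB = 24       # from the article: 24-petabyte clusters
TB_PER_PB = 1000      # assumption: decimal units, as drive capacities are quoted


def raw_tb_per_node():
    # Raw disk per node: 12 x 2TB.
    return DRIVES_PER_NODE * DRIVE_TB


def nodes_for_cluster(cluster_pb=CLUSTER_PB):
    # Assumption: cluster_pb is raw capacity. Ceiling division gives the
    # number of nodes needed to reach it.
    total_tb = cluster_pb * TB_PER_PB
    return -(-total_tb // raw_tb_per_node())


def usable_pb(cluster_pb=CLUSTER_PB, replication=3):
    # Assumption: every block is stored `replication` times, so usable
    # space is raw space divided by the replication factor.
    return cluster_pb / replication


if __name__ == "__main__":
    print("raw TB per node:", raw_tb_per_node())
    print("nodes per 24PB cluster:", nodes_for_cluster())
    print("nodes per cluster after doubling:", nodes_for_cluster(2 * CLUSTER_PB))
    print("usable PB per cluster at 3x replication:", usable_pb())
```

Under those assumptions, each node holds 24TB of raw disk, a 24-petabyte cluster works out to roughly 1,000 nodes, and doubling the clusters in 2012 would mean about 2,000 nodes each. If the 24PB figure were instead usable capacity, the node counts would be correspondingly higher.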

The clusters are currently intended to store behavioral data collected over nine quarters. “Is that enough?” Page asks. “It’s not clear. Maybe we’ll need more.”

The third data warehousing system at the site is called Singularity. eBay built this 18-petabyte system together with Teradata. This system is also going to double in size in 2012. The big goal of Project Mercury was to build a data center that could support this rate of growth.

The more you know…

With this amount of data and capacity in place, other kinds of data become useful too. This is the exponential nature of data analytics: the more data you have, the more useful data beyond what you already have becomes. These other types include competitive data, data about the web and foreign exchange rates, to name a few. None of them has traditionally been stored in data warehouses, but today they all have value to a company like eBay.

Read Part 2 of the series, which examines demand drivers for data center capacity at eBay.

Read more on Project Mercury in the latest issue of DatacenterDynamics FOCUS magazine.
