Back in 2020, we began publicizing the findings of our paper for the IEEE Transactions on Sustainable Computing, one of the largest peer-reviewed journals in the sector. Called “Optimizing Server Refresh Cycles: The Case for Circular Economy With an Aging Moore's Law”, it made the case for optimizing server estates at the machine level by benchmarking new and older generation servers. It demonstrated that simply replacing machines with the latest model, and assuming this would halve energy bills, no longer made sense. The findings could and should have had a transformational impact on how IT equipment is purchased, configured, and optimized.

How did the IT and data center community react to this news: with a bang or a whimper? Frankly, it barely reacted at all. We gave keynote speeches at Net Zero events, tech summits, and international conferences, and each time the response was “So what?” Despite mounting evidence that wholesale refreshes with the latest generation of equipment no longer doubled efficiency, the sector carried on with business as usual.

For another view, read the 2022 perspective from co-author Rabih Bashroush of the Uptime Institute.

Reading the industry press over recent months, it seems that attitude may well have to change. Chip manufacturer AMD and tech giants Google and Nvidia have all internally accepted the decline of Moore’s Law for years. The organizations have independently redesigned hardware and application loads as a result.

Sea change

In July 2022, AMD announced the development of new chip technology to boost server performance and efficiency. Moving away from large monolithic chips, the company switched to smaller chips, or chiplets. In doing so, it has been able to increase efficiency by 60 percent over the last four to five years while tripling performance. This compares to Intel’s traditionally designed chips, which have gained 40 percent in performance and stalled on efficiency. The interesting part is why AMD chose to do this.

As outlined in our paper, CPU architecture enjoyed sharp gains in performance per Watt when it began using multi-core technology. Eight cores in a processor meant eight times the data it could handle at the same time. As the number of cores doubled with each successive generation, so did the processing power of the chip. We saw large benefits from this until 2015. However, the chips themselves do not get any bigger, so there is a tipping point where manufacturers either run out of space for new cores or the cores run out of space for data. Looking at CPU efficiency trends after 2015 shows only incremental improvements in performance and efficiency at maximum load, and decreased efficiency in low-power mode.
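To make the plateau concrete, here is a minimal, illustrative sketch in Python. The generation labels, relative performance figures, and power draws are assumptions chosen to mirror the trend described above, not measured benchmark data.

```python
# Illustrative only: hypothetical figures mirroring the trend described above
# (strong per-generation gains up to ~2015, incremental gains afterwards).

generations = [
    # (label, relative peak performance, typical power draw in Watts) -- assumed
    ("2011,  8 cores", 1.0, 200),
    ("2013, 16 cores", 2.0, 220),
    ("2015, 32 cores", 4.0, 250),
    ("2018, 40 cores", 4.8, 270),
    ("2021, 48 cores", 5.5, 290),
]

baseline_ppw = generations[0][1] / generations[0][2]

for label, perf, watts in generations:
    perf_per_watt = perf / watts
    print(f"{label}: perf/Watt vs 2011 baseline = {perf_per_watt / baseline_ppw:.2f}x")
```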

AMD’s response has been to rework the layout of its chips. Whereas rival Intel’s latest generation contains 40 cores that connect to external processors and peripherals, AMD spreads the processing across nine chiplets: eight containing eight cores each, and one acting as a connectivity hub. In doing this, AMD has made a trade-off that optimizes efficiency for cloud and HPC applications at the cost of scale-up applications such as database processing. Rather than designing one chip for all workloads, the design favors one usage in one particular context.

AMD has now become a great choice for cloud and IT services such as desktop virtualization and transaction processing. Looking at performance data across multiple makes and models of servers, it is clear why. Our company, Interact, specializes in hardware recommendations for medium and large data centers and enterprises. We analyze the estate using a machine learning application, measure the performance per Watt and total performance capability of the different machines, and make specific recommendations for energy and cost savings over a given period. We regularly see AMD-based systems delivering the highest performance and performance per Watt. However, we only found out recently that Google, which makes its own chips, has gone on a similar journey in-house.
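A minimal sketch of the kind of comparison described above (not Interact's actual tooling): rank candidate machines by performance per Watt at a typical utilization level. The server models, throughput figures, and power numbers are placeholders standing in for measured benchmark data.

```python
from dataclasses import dataclass

@dataclass
class Server:
    model: str
    peak_ops: float    # throughput at 100% load (placeholder units)
    idle_watts: float
    peak_watts: float

def watts_at(server: Server, utilization: float) -> float:
    """Assume power scales linearly between idle and peak -- a simplification."""
    return server.idle_watts + utilization * (server.peak_watts - server.idle_watts)

def perf_per_watt(server: Server, utilization: float) -> float:
    return (server.peak_ops * utilization) / watts_at(server, utilization)

# Placeholder fleet; real figures would come from measured benchmark data.
fleet = [
    Server("VendorA 2016 model", 1.2e6, 90, 300),
    Server("VendorB 2022 model", 3.0e6, 60, 340),
]

for s in sorted(fleet, key=lambda s: perf_per_watt(s, 0.4), reverse=True):
    print(f"{s.model}: {perf_per_watt(s, 0.4):,.0f} ops/Watt at 40% load")
```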

Application specific hardware

It recently became public that Google also noticed the decline in Moore’s Law in 2015 and likewise redesigned its chip architecture. In April 2021, industry press outlets CNET, Ars Technica, and DCD covered the story, based on a scientific paper presented at ASPLOS, which revealed that YouTube had begun making a specialized video chip after realizing that, as a result of the slowdown, applications would be less optimal and therefore less profitable.

The specific function it required (namely transcoding, to minimize storage for video) was not well served by off-the-shelf CPUs. So YouTube decided to build its own chips, following the Bitcoin miner model of using ASICs (application-specific integrated circuits) to create hardware that was the most efficient for the task at hand. YouTube started using the VCUs (video coding units) in 2018.

When eventually announced, the chips were able to deliver 20 to 33 times the performance of traditional hardware for their specific function.

The chips were able to do this because they specialized in delivering video streams to all types of devices. The full uploaded file plays on a monitor or TV, a smaller version is needed for a laptop, and an even smaller one for a smartphone. Whereas before the data center would have needed three different machines to host this transmission, the VCU could do it with one. Google also developed another ASIC, the tensor processing unit (TPU), for artificial intelligence applications.

Enterprise data centers

Google can make wholesale changes because it has huge resources and good oversight of the workloads its data centers manage. Smaller operators, who may not own the servers in their facility or control the workloads they run, need to adopt a different approach. With the processor landscape becoming more complex, configuring and specifying machines according to a specific workload is of the utmost importance. Trade-offs might be required, but if you know what you want to do, you can pick the hardware to match. The "one size fits all" approach to procurement no longer makes sense, and the options available for making the most of your hardware are vast.

The challenge is to accurately assess the efficiency of machines, and then get to the more granular level of measuring performance for workload clusters. Very old machines, while a huge energy draw for the compute power they deliver, can be replaced with later generations to achieve the performance and energy efficiency gains needed. Upgrades and reconfiguration can also play a part in achieving the same performance for a fraction of the energy and cost.
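As a back-of-envelope illustration of that trade-off, the sketch below compares the energy cost of keeping several aging servers against consolidating onto one newer machine over a planning period. Every input (power draws, consolidation ratio, electricity price) is an assumption to be replaced with measured data from a real estate.

```python
HOURS_PER_YEAR = 8760

def energy_cost(avg_watts: float, years: float, price_per_kwh: float) -> float:
    """Energy cost over the period, assuming a constant average power draw."""
    kwh = avg_watts * HOURS_PER_YEAR * years / 1000
    return kwh * price_per_kwh

# Assumed inputs -- replace with measured figures for a real estate.
old_server_watts = 350   # average draw of one aging machine
new_server_watts = 180   # average draw of a consolidated replacement
consolidation = 3        # one new machine assumed to absorb three old workloads
price = 0.25             # electricity price per kWh
years = 3

keep = energy_cost(old_server_watts * consolidation, years, price)
refresh = energy_cost(new_server_watts, years, price)
print(f"Keep {consolidation} old servers for {years} years: ${keep:,.0f}")
print(f"Consolidate onto one newer server: ${refresh:,.0f}")
```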

Many enterprise data centers do not own or run their own servers, so efficiency is seen as someone else’s problem. There is also a tendency to go with what you know and to assume that the laws that held true ten years ago are just as true today. However, this represents a tremendous missed opportunity.

Business as usual results in huge contributions to the growing e-waste stream (53 million tonnes worldwide at the latest count). It also depletes our dwindling reserves of rare materials such as cobalt, lithium, and tantalum.
