Slack versus saturation

Effective infrastructure performance management and cost control are two sides of the same coin. On one side, under-provisioning data center resources can lead to unforeseen performance problems; on the other, over-provisioning wastes resources and undermines IT efficiency.

The outcome that large enterprises and managed service providers seek is service assurance: provisioning resources at the right cost while striking an optimal balance between performance excellence and capacity utilization. Consider some of the major challenges that cloud administrators in the modern data center constantly face:

It’s a balancing act…

Source: Thinkstock / eelnosiva

  • Fast resolution of performance contentions, referred to as performance healing
  • Avoiding or minimizing resource saturation, referred to as capacity planning
  • Reducing or minimizing resource under-utilization, referred to as efficiency optimization

It takes time…

The objective of a software-defined data center is to meet service level agreements (SLAs) for application quality by minimizing capacity saturation while maximizing resource utilization.

Effective capacity planning involves determining whether or not to add resource capacity to avoid SLA-reducing performance contentions. Meanwhile, effective efficiency optimization is about determining whether and how to balance resource usage to avoid SLA-reducing performance contentions. Both of these proactive activities are typically performed regularly, usually daily. If they are done perfectly, there is no need for reactive performance healing, but doing them perfectly is extremely time-consuming and demands automation.
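To make the distinction concrete, here is a minimal sketch of what an automated daily check might look like. It is an illustration only: the function name, the host names and the 70 percent saturation threshold are assumptions made for this example, not values taken from any particular tool.

```python
from statistics import mean


def daily_capacity_check(host_utilization, saturation=0.70):
    """Suggest an action for one resource pool.

    host_utilization maps a host name to its utilization (0.0 to 1.0).
    The saturation threshold is a hypothetical default.
    """
    hot_hosts = [h for h, u in host_utilization.items() if u >= saturation]
    if not hot_hosts:
        # Nothing is saturated, so no proactive step is needed today.
        return "no action"
    if mean(host_utilization.values()) >= saturation:
        # The pool as a whole is saturated: moving load around cannot help.
        return "add capacity"
    # Some hosts are hot but the pool has slack: shift load instead of buying.
    return "rebalance"


print(daily_capacity_check({"host-a": 0.90, "host-b": 0.20}))  # rebalance
print(daily_capacity_check({"host-a": 0.90, "host-b": 0.80}))  # add capacity
```

A real implementation would look at trends over time and at several resource types rather than a single snapshot, but the two questions it answers are the ones described above.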

Running hot?

Theoretically, the same holds true for converged IT infrastructures. When an organization pays thousands of dollars for IT processing capabilities, it makes intuitive sense that management would want shared compute, network and storage resources running at clock speed. But when it comes to ‘running hot’, IT infrastructure efficiency is a best practice that is not always embraced.

According to a June 2015 study by sustainability consultancy Anthesis Group and Stanford University research fellow Jonathan Koomey, business and enterprise data center IT equipment utilization “rarely exceeds six percent.” Additionally, current data from the Uptime Institute reveal that in the US, “up to 30 percent of the country’s 12 million servers are actually ‘comatose’ – abandoned by application owners and users but still racked and running, wasting energy and placing ongoing demands on data center facility power and capacity.” The Anthesis study used data from TSO Logic spanning an installed base of 4,000 physical servers and found that 30 percent of these servers proved to be “using energy while doing nothing.”

Businesses that care about server efficiency and converged infrastructure ROI keep a steady eye on their resource utilization statistics. There is always the temptation to push resource consumption into the red and get the most bang for the invested buck. But the danger of maximizing utilization is obvious to anyone who has ever experienced the logjam of running a client system with 100 percent CPU or memory utilization. Typically, the ‘red line’ starts far before 100 percent utilization. In fact, IBM recently showcased a new server capable of holding 70 percent utilization without any performance impact. The obvious path around such stalling is to add more resources, and that leads to a second inevitable truth: ultimate performance carries incremental, hidden and unpredictable costs.
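Keeping an eye on utilization statistics can start with something as simple as sorting servers into bands. The sketch below is illustrative only: the server names and sample figures are made up, and both thresholds are assumptions. Low utilization alone does not prove a server has been abandoned, so the lowest band is labelled as a candidate for review.

```python
from collections import Counter


def classify_server(avg_utilization, idle_below=0.02, red_line=0.70):
    """Place one server in a band based on average utilization (0.0 to 1.0).

    idle_below and red_line are hypothetical thresholds for illustration.
    """
    if avg_utilization < idle_below:
        return "comatose candidate"
    if avg_utilization >= red_line:
        return "running hot"
    return "in range"


def summarize_fleet(fleet):
    """Count how many servers fall into each band."""
    return Counter(classify_server(u) for u in fleet.values())


# Made-up sample data: server name -> average utilization.
fleet = {"web01": 0.45, "db01": 0.82, "old01": 0.01, "old02": 0.0}
print(summarize_fleet(fleet))
```

In practice the red line should be set from observed performance degradation rather than picked in advance, which is the subject of the strategies later in this article.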

Take a realistic view

Data center managers can already deliver core IT operational metrics to business-side managers in a language that makes sense. This can be done by continuously illustrating a realistic view into available capacity against overall efficiency; acceptable performance thresholds between running hot and wasted ‘headroom’; and a greater degree of granularity in terms of ROI from virtualization over maintaining legacy infrastructures.

But while most data center managers operate on a fixed budget, they are generally vulnerable to substantially over-provisioning hardware and SI consulting services because they lack visibility into the complexities of the contemporary data center. A more progressive view envisions a new generation of infrastructure performance management tools that not only deliver 360-degree visibility, so that all data center components are factored into the equation, but also provide real-world operational metrics by which to calculate capacity against efficiency. Longer term, this also supports more informed IT investment decisions for a right-sized physical infrastructure, along with a non-biased, cross-silo analytics repository upon which to base service assurance delivery.

Efficiency is not about maxing out utilization, nor is it about achieving the highest possible MIPS, IOPS or any other standard metric. Technically, efficiency is about the ratio of useful work performed to the energy expended. What can you do to optimize your converged-infrastructure efficiency?

Here are five strategies you can start implementing right now to bring your organization much closer to optimal efficiency and long-term cost savings:

  1. Link capacity management with infrastructure performance
    Clearly, there are many variables to weigh when seeking to balance capacity and performance. Advanced service assurance analytics tools can inform these decisions with dashboard analysis that assesses all available resources in the infrastructure and advises IT managers about suitable capacity and performance options. And because measurement models focus on performance degradation, it is possible to pinpoint the levels at which capacity saturation yields performance loss.
  2. Identify workload type and infrastructure performance requirements
    Public cloud providers are particularly sensitive about efficiency. Consider the workload models of a Google Mail or Microsoft OneDrive infrastructure. Ultra-fast responsiveness for a budget-oriented or even free app isn’t nearly as important as keeping infrastructure performance at modest, ‘good enough’ levels while prioritizing the lowest possible back-end cost of operation. Private clouds often support revenue-generating operations, so there is a higher emphasis on performance and responsiveness, even if it means sacrificing efficiency.
  3. Determine how much public cloud belongs in your mix 
    The benefits of public cloud infrastructure are well known, headlined by cost savings and greater control over infrastructure performance management, not to mention cloud compute resources that are cheaper than ever. However, shared infrastructure almost always carries an inherent performance penalty, and the more critical the workload placed on those shared resources, the higher the risk of performance limitations, cloud sprawl and governance issues.
  4. Begin with end-user quality of experience (QoE) 
    Increasingly, the starting metric for infrastructure performance is end-user QoE. IT operations may assume that running at 70 percent capacity is acceptable, but if QoE reports start trickling in at 50 percent and become a torrent at 60-70 percent, then you need to know which fine-grain metrics to troubleshoot and stabilise.
  5. Establish a baseline, then extrapolate service assurance
    Gauging the ultimate capacity of a set of hybrid-cloud infrastructure resources can be very difficult if testing begins when that infrastructure is already under dynamic overload. The best way to obtain a solid, dependable baseline against which future assessments can be made is to start with a standard configuration running a base platform. After adding applications and VMs, deploy a real-time, live monitoring tool to beta-test how utilization characteristics change as load increases all the way up to current, ‘production-level’ utilization. With this in hand, you should be able to extrapolate the gap from present utilization to your threshold target.
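The baseline-and-extrapolate idea in the last strategy, together with the search for the point at which saturation starts to cost performance, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the measurements are invented, response time stands in for end-user experience, and the rule that "degraded" means 1.5 times the baseline response time is a hypothetical tolerance, not an industry standard.

```python
def find_saturation_point(samples, tolerance=1.5):
    """Return the lowest utilization at which response time degrades.

    samples is a list of (utilization, response_ms) pairs from a load test.
    The baseline is the response time at the lowest measured utilization;
    a sample counts as degraded when it exceeds baseline * tolerance.
    Returns None if no sample is degraded.
    """
    ordered = sorted(samples)
    baseline_ms = ordered[0][1]
    for utilization, response_ms in ordered:
        if response_ms > baseline_ms * tolerance:
            return utilization
    return None


def headroom(current_utilization, threshold):
    """Gap between present utilization and the threshold target."""
    return threshold - current_utilization


# Invented load-test data: (utilization, response time in ms).
samples = [(0.10, 20), (0.30, 21), (0.50, 24), (0.60, 35), (0.70, 80)]
threshold = find_saturation_point(samples)  # 0.6 for this made-up data
print(threshold, headroom(0.45, threshold))
```

With data like this, the threshold target sits well below the 70 percent that IT operations might have assumed was acceptable, which is exactly the kind of gap that end-user experience reports tend to expose.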

Atchison Frazer is CMO of service assurance analytics firm Xangati 
