1E Chief Technologist Mark Blackburn discusses a new metric from The Green Grid for looking at server efficiency that he says can greatly improve data center efficiency
14 March 2012 by Mark Blackburn - 1E
The data center industry still needs to understand how efficiently the compute resources of individual servers are being utilized. If a metric around this were put in place, managers would be better able to understand which servers need to be decommissioned, switched off or virtualized, improving overall efficiency.
When looked at from the perspective of individual microprocessor instructions, every instruction processed obviously has some functionality and could be considered useful. This is how most computer systems treat work — as a percentage of time doing anything other than idling — referred to as utilization.
The success of server virtualization initiatives in improving efficiency has mainly been measured by the increase in overall processor utilization. CPU utilization, however, only shows how much work is being done, not whether it is useful. The Green Grid has developed a mechanism for determining the proportion of work that provides primary services — server compute efficiency (ScE) — which can be aggregated to determine the compute efficiency of the entire data center: data center compute efficiency (DCcE).
A server is usually commissioned to provide one or more specific services. These are referred to as the primary services for that server. Secondary, tertiary and similar services may be optional or required to provide a primary service, but if the primary service did not exist there would be no reason for the secondary and tertiary services to continue running. Therefore, if the server is doing anything other than the work for which it was commissioned, that work is not a primary service. It is possible to work out how much of an individual server’s utilization is dedicated to the primary service and, from this, ascertain the proportion of primary services that the server is supporting.
There are several problems with this:
• Not all servers have well-defined processes performing primary services.
• In large data centers, tracking and recording which processes provide primary services on which servers would require a lot of administrative overhead.
• The method does not take into account the proportion of underlying operating system activity.
In any well-run data center, the processes providing secondary, tertiary and other services usually form a relatively small set of well-known maintenance and monitoring tasks, including antivirus, backup, drivers and indexing. The set of secondary/tertiary services in use across the data center is relatively small compared with the set of primary services, and those secondary/tertiary services are likely to be common across servers with different application types and even different operating system versions. Therefore, an easier mechanism to ascertain the proportion of utilization attributable to primary services is to take all utilization and then subtract from it the utilization known to be from secondary, tertiary, and other services.
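As a sketch of this subtraction approach (the process names and utilization figures below are illustrative, not taken from The Green Grid's specification):

```python
# Estimate the CPU utilization attributable to primary services by
# subtracting the utilization of the known secondary/tertiary processes
# (antivirus, backup, indexing, etc.) from total utilization.

# Hypothetical per-process CPU utilization sample for one server, in percent.
process_cpu = {
    "app_server": 22.0,    # the service the server was commissioned for
    "antivirus": 3.0,      # secondary
    "backup_agent": 1.5,   # secondary
    "indexing": 0.5,       # secondary
}

# The small, well-known maintenance/monitoring set, common across servers.
SECONDARY_SERVICES = {"antivirus", "backup_agent", "indexing"}

total_cpu = sum(process_cpu.values())
secondary_cpu = sum(v for k, v in process_cpu.items() if k in SECONDARY_SERVICES)
primary_cpu = total_cpu - secondary_cpu

print(f"total={total_cpu}%  secondary={secondary_cpu}%  primary={primary_cpu}%")
```

The point of the subtraction is that only the short secondary/tertiary list has to be maintained; everything left over is attributed to primary services without cataloguing every application process.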
For the purposes of the ScE metric, the following methods are employed:
• Primary services CPU: Average CPU utilization attributable to primary services — total average CPU utilization minus average CPU utilization from secondary and tertiary services.
• Primary services I/O: Input/Output (I/O) can be used the same way as CPU to determine primary services activity.
• Incoming requests: For specific services, session-based protocols such as TCP ensure that a request reaches the appropriate server and that the response goes to, and can be understood by, the correct client. If any process of a primary service receives an incoming session-based connection request, it can be assumed that primary service activities are being performed.
• Interactive logons: Terminal server-type applications may not register under the methods above because they may create very little CPU activity or I/O and because incoming remote-access sessions can be long-lived.
For each method, measurements should be taken at a regular interval. The server being measured can be deemed as having provided active primary services during that period of time if any of the following criteria are met:
• The CPU utilization attributable to primary services is above a designated threshold.
• The amount of I/O attributable to primary services (total I/O minus I/O from secondary and tertiary services) is above a particular threshold; experience has shown 500Kb/sec to be a good threshold.
• A primary services process has received an incoming session-based connection request.
• There has been an interactive logon to the server.
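The four criteria above can be pulled together into a per-sample check. A minimal sketch follows; the field names are my own, and the CPU threshold is an assumed value, since the article leaves it unspecified (the I/O threshold follows the 500Kb/sec figure given above):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    primary_cpu_pct: float    # CPU utilization attributable to primary services
    primary_io_kb_s: float    # I/O attributable to primary services, Kb/sec
    incoming_requests: int    # session-based connection requests to primary processes
    interactive_logons: int   # interactive logons during the interval

CPU_THRESHOLD_PCT = 5.0      # assumed; the article does not specify a value
IO_THRESHOLD_KB_S = 500.0    # the article's suggested I/O threshold

def provided_primary_services(s: Sample) -> bool:
    """A server is deemed active in a sample if ANY criterion is met."""
    return (
        s.primary_cpu_pct > CPU_THRESHOLD_PCT
        or s.primary_io_kb_s > IO_THRESHOLD_KB_S
        or s.incoming_requests > 0
        or s.interactive_logons > 0
    )

idle = provided_primary_services(Sample(1.0, 20.0, 0, 0))      # no criterion met
active = provided_primary_services(Sample(1.0, 20.0, 3, 0))    # request received
print(idle, active)
```

Because the criteria are combined with "or", a lightly loaded terminal server still counts as active in any interval with a logon, which is exactly why the fourth criterion exists.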
If none of the above are met, the server can be considered as not having provided primary services during that time sample. The ScE percentage over any time period is therefore calculated by dividing the number of samples in which the server was found to be providing primary services by the total number of samples taken over that period, then multiplying by 100.
For a given data center, DCcE is calculated by averaging the ScE values from all servers during the same time period. The secondary and tertiary services factored out before calculating DCcE will vary from data center to data center, so the metric should not be used to compare different data centers with each other. DCcE is designed to allow server and data center operators to discover where inefficiencies lie within a specific data center and then to address them to increase efficiency over time.
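The aggregation step is just a mean over the fleet; the ScE values below are illustrative:

```python
def dcce_percent(sce_values):
    """DCcE: the average of per-server ScE values over the same period."""
    return sum(sce_values) / len(sce_values)

# Hypothetical fleet of four servers with ScE values for the same period.
print(dcce_percent([25.0, 80.0, 5.0, 50.0]))  # 40.0
```

A low DCcE alongside a few very low individual ScE values is the signal the metric is designed to surface: candidates for decommissioning, switch-off or virtualization.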
This article first appeared in FOCUS 20.