So, you’ve got yourself a shiny new state-of-the-art monitoring toolset. It has all the bells and whistles, and can even correlate events across systems. You may now be tempted to kick back and smile contentedly as you picture yourself as the new corporate lookout, viewing every system and process in your organisation through a single pane of glass. This, however, would be jumping the gun.
Buying an expensive tennis racket doesn’t mean you’re ready for Wimbledon, and the same is true of monitoring. The right tool means very little in the hands of someone who can’t use it. Understanding when and how to create alerts that are meaningful, actionable and valuable to you and your business can lift your data center monitoring from perfunctory to effective.
You should perform proper testing before you start creating new monitors and alerts. Here are some handy tips to keep in mind:
- When testing, ratchet the scope down as small as you can and expand it slowly. What you really want to test first is the trigger condition: when an alert will be sent, or triggered.
- Learn to use reverse thresholds. You want to avoid spiking the test machines repeatedly, so instead of waiting for CPU>90%, test with the condition reversed. CPU<90% will trigger much more reliably.
- Log, log, and log again. If your tool supports its own logging, turn it on. Insert as many updates as possible in your alert actions, from “I’m beginning step X now” to “I just completed step X.” It may seem overly verbose, but when you are stuck trying to figure out where the process is dying in the midst of the 17-step flow, you’ll be glad you did.
- Taste your own medicine. Make sure you receive the alerts yourself and don’t be tempted to offload them to the production team until the quantity and quality of the messages is something YOU would be comfortable receiving.
- Avoid email. Sending test alerts via email simply adds delays and puts pressure on your infrastructure. Instead, send messages to a local log file, to the display, and so on.
- Collaborate. Your colleagues will have to live with the results of your new monitor or alert, so you should agree on everything from base function to message formatting.
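Two of the tips above, reverse thresholds and verbose step logging, can be sketched in a few lines of Python. This is a minimal illustration rather than the syntax of any particular monitoring product: `should_trigger`, `run_alert_action`, the step names, and the 12% reading are all hypothetical, and the "steps" do nothing.

```python
import operator

# All names here are hypothetical, for illustration only.
COMPARATORS = {">": operator.gt, "<": operator.lt}


def should_trigger(cpu_percent, comparator, threshold):
    """Return True when the reading satisfies the trigger condition."""
    return COMPARATORS[comparator](cpu_percent, threshold)


def run_alert_action(steps, log=print):
    """Run each (name, callable) step, logging before and after it."""
    for number, (name, step) in enumerate(steps, start=1):
        log(f"Beginning step {number}: {name}")
        step()
        log(f"Completed step {number}: {name}")


if __name__ == "__main__":
    idle_reading = 12.0  # made-up reading from a quiet test machine

    # The production rule (CPU > 90) stays silent on an idle machine.
    print(should_trigger(idle_reading, ">", 90))

    # The reversed rule (CPU < 90) fires without spiking the CPU,
    # so the alert action itself can be exercised.
    if should_trigger(idle_reading, "<", 90):
        run_alert_action([
            ("format message", lambda: None),
            ("write to local log", lambda: None),
        ])
```

Passing a different `log` callable, such as one that appends to a local file, keeps test output off email while preserving the step-by-step trail.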
Now that you know how to test, it’s time to talk about what to test and, more specifically, where to start.
It makes sense to focus on the parts of the environment that deliver the greatest rewards for the least amount of effort. This could mean listening to other teams describe the IT issues they’re facing, which can help you determine whether those issues are driven by system failures. Then, you can establish sophisticated alerts to save the day.
While it might sound counterintuitive, it’s also important that you don’t plan too far ahead. Once you get the ball rolling and begin to see some success, more teams will seek you out to ask for your help, and you may find yourself suffering from an embarrassment of opportunities.
The bottom line
Before we get carried away with the possibilities of our shiny new monitoring tool, it’s important to remember the actual purpose of monitoring improvements. Our goal as IT professionals, as ever, is to save our business money.
Part of the challenge of good data center monitoring is the time it takes to develop and test it, and then to convince business leaders that this time is valuable and contributes to the bottom line.
So, how do you show the number crunchers proof of your monitoring success? One option is to look back to the bad old days, which means keeping data on ticket counts from before your new monitoring and alerts were in place. By showing how high the ticket numbers were before the new system, and how low they are post-implementation, you can demonstrate how much time your new monitoring and alerts are saving. And number crunchers know better than anyone that time is money.
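As a rough sketch of that before-and-after comparison, the arithmetic might look like the following. The function name, the per-ticket handling time, the hourly cost, and every figure in the example are assumptions made up for illustration; real numbers would come from your own ticketing data.

```python
def ticket_savings(tickets_before, tickets_after, minutes_per_ticket, hourly_cost):
    """Estimate time and money saved from a drop in ticket volume.

    Assumes every ticket takes the same average handling time,
    which is a simplification.
    """
    tickets_avoided = tickets_before - tickets_after
    hours_saved = tickets_avoided * minutes_per_ticket / 60
    return {
        "tickets_avoided": tickets_avoided,
        "hours_saved": hours_saved,
        "cost_saved": hours_saved * hourly_cost,
    }


if __name__ == "__main__":
    # Hypothetical monthly figures: 400 tickets before, 250 after,
    # 30 minutes per ticket, 50 currency units per staff hour.
    print(ticket_savings(400, 250, minutes_per_ticket=30, hourly_cost=50))
```

With these invented inputs, 150 fewer tickets works out to 75 staff hours a month, which is the kind of figure that translates directly into money.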
Proof is in the pudding
In my experience, there are three drivers that help executives make decisions: increasing revenue, reducing cost, and avoiding risk. If the project, initiative, or software you are proposing doesn’t tick any of these boxes, you may as well give it up now.
Thankfully, effective monitoring fulfils at least two of these requirements. It reduces costs by allowing IT professionals to identify and remediate issues sooner, saving the company from expensive downtime.
Monitoring also helps avoid risk. A comprehensive monitoring toolset that offers insightful data will help IT professionals take a proactive stance against potential failures. Instead of waiting for downtime to happen and remediating after the fact, effective monitoring allows us to get on the front foot and stop issues before they take root.
Monitoring can be both an IT professional’s best friend and a key differentiator for businesses looking to reduce risk and cost. However, it isn’t a quick fix, and in order to reap the rich rewards that monitoring brings, you must first learn to use it correctly.
Leon Adato is Head Geek (and technical product marketing manager) at SolarWinds.