The failure to maintain the code of critical software infrastructure, also known as ‘technical debt’, is an insidious problem across the industry. It can be caused by a number of factors, ranging from discontinued platforms and out-of-date code libraries to staff turnover and budget restrictions. Sometimes it is intentional, because a company chooses not to prioritize resourcing software maintenance. Other times it is unintentional and only discovered when the worst happens.

However the technical debt is accrued, it will only grow until it is addressed, and the longer it is left, the bigger the challenge of fixing it. To learn more about the hidden challenges of technical debt and how to avoid it in the first place, DCD spoke to Craig Compiano, CEO of Modius, a specialist in DCIM solutions for data centers, who has been warning of its dangers for years.

“Debt embedded in the software is like compounding interest on a loan – the technical debt will accrue over time and be more costly to remediate the longer it stays in the code base,” he warns. “And just like a bank loan, if you don't make monthly payments, it just keeps getting bigger and bigger, and then the challenge to pay off that loan, particularly when it's been accruing interest for five years, is enormous compared to making monthly payments.”
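As a rough illustration of the loan analogy, the sketch below compares leaving a debt to compound untouched with paying it down in equal monthly instalments. The figures (100 units of rework, 2 percent monthly growth, a five-year horizon) and the function names are assumptions chosen purely for illustration; they are not numbers from Compiano or Modius, and real remediation costs will not follow a neat formula.

```python
def deferred_cost(principal: float, monthly_rate: float, months: int) -> float:
    """Cost of clearing the debt in one go after it has compounded untouched."""
    return principal * (1 + monthly_rate) ** months


def paid_down_cost(principal: float, monthly_rate: float, months: int) -> float:
    """Total spent when the same debt is cleared with equal monthly payments.

    Uses the standard loan amortization formula for the fixed payment.
    """
    payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)
    return payment * months


if __name__ == "__main__":
    # Hypothetical inputs: 100 units of rework, growing 2% a month, over 5 years.
    principal, rate, months = 100.0, 0.02, 60
    print(f"Left for five years: {deferred_cost(principal, rate, months):.0f} units")
    print(f"Paid down monthly:   {paid_down_cost(principal, rate, months):.0f} units")
```

Under these assumed figures, the deferred bill comes to roughly 328 units against roughly 173 for steady monthly payments, which is the gap the analogy describes.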

Easy to do, harder to fix

If this is the first time you’ve thought about the concept of technical debt, it is likely that your business has accrued some. Compiano describes it as “normal rather than rare”, in part because it’s so easy to slip into:

“Most software is developed with a business outcome driven by time-to-market or budgetary constraints. Often, software teams will take some shortcut, either deliberate or accidental, and leave behind some amount of technical debt in their deliverable. Then it's a question of how the organization can remediate the debt, which requires reworking the code.”

By now, you may be wondering why technical debt isn’t being spoken about in the same hushed tones as other hot-button industry issues, such as AI and sustainability. This isn’t lost on Compiano:

“What's ironic is that data centers have a very high aversion to risk, and yet they incur substantial risk every day with the problem of out-of-date software tools or self-developed and maintained software. You'd think that they would be adopting new or better technologies in all areas of their operations because it's all inextricably linked together. It's a system of components – the UPS, the chillers, the people, and the software.”

'George has left the building...'

A common cause of technical debt is when a business chooses to build and maintain a bespoke system, or continues to use a system that is no longer supported by the original vendor. Many infrastructures end up as a patchwork of different, semi-integrated systems, making it even harder to keep track of where the technical debt is, and how to prioritize fixing it.

Simply saying, “Yes, but it works” is not the solution, because one day, it won’t, as Compiano has seen regularly:

“The risk factor is something that is not fully appreciated. Often what we hear is something like ‘We got this software with our hardware’, or ‘We got this software because one of our fellows that used to work here built a tool that we use today, and so it doesn't cost us anything – so if we were to spend money on new software, we would have to spend funds that we don't otherwise care to spend because we've already got a tool.’”

This, in turn, can be fed by under-resourcing, or staff turnover: “We're forever hearing that ‘George has left the building’ – a company has tools that worked because George was there – he built it and maintained it. He left. Now, how do you keep this thing up? If George didn't build it with documentation and coding standards, and he iterated on it over time, all of that knowledge went out the door when he left.”

This argument speaks to two of the main discussion points in the data center industry today – AI and the skills shortage. Both are having an impact on software development, as well as the data center:

“There’s a very real problem with all of the explosive growth in AI, creating demand for software engineers to build bigger platforms and more complex applications that use and leverage all of this newfound technology, and there is a finite number of skilled software engineers. So who's going to maintain these bespoke systems? The people aren't there.”

Just as the problem can hinge on one individual, technical debt can be accrued at a corporate level. If a company changes hands, the new management may have a different philosophy, different priorities, and in extreme cases could even bring in their own people to maintain a codebase that “just worked” because of the people maintaining it. That could apply to the data center operator, or the software vendor who designed the codebase.

For that reason, it’s easy to assume that choosing a commercial off-the-shelf (COTS) package means you’re covered, but technical debt can take many forms, and they are often out of immediate sight:

“Technical debt isn't just the code that was created. It's all aspects and components that make a deliverable work day-to-day. You use an open-source library that works with a version of Java, and then, as Java is upgraded over time, that library doesn't work with the current version, so you are either using an out-of-date version of Java or a previous version of Windows – either way, you have to undertake the project to upgrade or replace that library.”

Embedded security flaws

Security is another growing source of technical debt: “There's the embedded risk that cyber security has not been addressed properly in the product at inception, and it's pretty hard to retrofit security into your architecture.”

Compiano underlines the importance of due diligence when choosing a software vendor, for just this reason: “Technical debt probably exists in every piece of code that exists in the marketplace. When you select a software vendor to provide you with that functionality, a company is shifting that risk from themselves to the vendor. If the vendor is doing a credible job at maintaining their code base and providing updates and features, they are likely managing technical debt internally and will own the risk of reducing and managing it.”

We ask Compiano what he thinks are the important factors to consider when evaluating a software vendor, to avoid falling into technical debt later: “If you're going to have a piece of software that is running your mission-critical operations, you want to make sure that it's coming from a provider who's going to guarantee maintenance of that software through SLAs that will give you some comfort that you're going to have a tool that's going to last you for 10 years,” he says.

He illustrates the point by adding: “We’re forever hearing stories about businesses still looking for COBOL developers. Those mainframe solution horror stories never die.”

Leading by example

Modius has been building software solutions for data centers for 20 years, including its flagship DCIM product, OpenData. We ask Compiano how his company avoids falling into technical debt, especially given the mission-critical nature of the Modius offering. He answers with a combination of pride and logic:

“We have built up a team that follows best practices around the design, development, and testing of code. When we structure our commercial terms with customers, they either buy a perpetual license including ongoing software maintenance, bound by SLAs, or pay an annual ‘term license’ fee, and we use that revenue to maintain the code.”

It’s a responsibility that Compiano takes extremely seriously: “We have a liability to each of those customers to continue to support the product. If we're not supporting the code, they probably won't want to re-up the contract. If it begins to lag in features or reliability, then it won't renew. So if you create the economic incentive with a software provider like Modius to continue to be a partner over the long haul, they will maintain the code.”

Using a company specializing in software yields better results than attempting to create and service a bespoke solution, because, put simply, that’s all they do. More importantly, it is significantly cheaper than attempting to properly resource maintenance in-house. Compiano tells us:

“You can do a little bit every month with a team of dedicated people who try to maintain it, but if you don't commit the resources, then it's a lot harder to get to that point where it's time to ‘pay off the loan’, so to speak, and now you're looking at likely a major rework, or a potential failure, because that software may just stop working, and there's no quick way to fix it. I would argue that if someone is not in the business of developing software, they shouldn't try to do it halfway.”

Compiano has another analogy to illustrate his point. In this case it’s the myriad of other functions of any business that no-one would dream of servicing with software built from scratch. After all, reinventing Windows would be like reinventing the wheel:

“No company today would go off to build their general ledger. The finance department would probably license some hosted software or license some enterprise version of a general ledger system, a payables system, and payroll. Nobody would go off today and code a payroll system when there are innumerable COTS solutions or SaaS solutions for payroll. Yet, people in the data center space still try to build their own, which is ironic, because it's the center of the universe for our digital economy, and yet people still try to build their own as opposed to finding a COTS provider for that software.”

The machine learning advantage

A DCIM system like Modius’s OpenData can also offer functionality that would be impossible to code and maintain in a bespoke system or ‘patchwork’ solution. We ask Compiano about some of the functional advantages of making the switch:

“Machine learning (ML) capabilities are in everyday use for so many things, but taking that technology and bringing it into a DCIM is a harder road – to be able to do that in a way that an end user, not a data scientist, can leverage them, but that's what we have done. Now we have embedded ML that is exercisable by a typical end user, so that they can detect anomalies.”

But as he goes on to say, the possibilities of using ML don’t stop there:

“Operators can optimize the performance of their data center infrastructure. They can forecast performance and maybe enter into things like preventative maintenance regimens as opposed to scheduled maintenance”, adding, “It would be virtually impossible for most self-developed platforms because there's too much disparate technology to assume someone is going to have all those skills and the right architecture to implement it.”
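To make the idea of anomaly detection concrete, here is a minimal sketch that flags sensor readings sitting unusually far from the average. It uses a simple z-score test rather than a trained ML model, and it is not a description of how OpenData works; the function name, the threshold, and the temperature readings are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings: list[float], threshold: float = 2.0) -> list[tuple[int, float]]:
    """Return (index, value) pairs whose z-score exceeds the threshold.

    A z-score is the distance from the mean measured in standard deviations.
    """
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation; needs >= 2 readings
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, value)
        for i, value in enumerate(readings)
        if abs(value - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    # Hypothetical inlet temperatures in degrees Celsius, with one spike.
    temps = [22.1, 22.3, 21.9, 22.0, 22.2, 22.1, 30.5, 22.0, 21.8, 22.2]
    print(find_anomalies(temps))
```

With these made-up readings, only the 30.5 spike at index 6 is flagged. A production system would need to cope with seasonality, sensor drift, and multiple correlated signals, which is part of the difficulty Compiano describes.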

Finally, we ask Compiano about the bottom line. It’s one thing to see the advantages of a properly maintained, technical debt-free solution, but another to get buy-in from a manager who can only see that the current infrastructure “just works”. The potential to save money, however, might swing the pendulum.

He tells us: “I would argue it's always cheaper, in the end, to outsource your purchase to others who specialize. If you are comprehensive in capturing cost elements, you'll find it's always greater than the cost of our licensing fees, because we're doing this day in, day out, with many customers. It's not a one-off that a single user would have to calculate for their internal purposes.”

The moral of the story is never to turn a blind eye to technical debt, because even if accrued intentionally, eventually the loan has to be repaid, and in the case of a data center, the bailiff can take the form of downtime.

Don't let technical debt jeopardize your operations. To learn more about how Modius can help manage your data center's technical debt and optimize your infrastructure with cutting-edge DCIM solutions, click here or contact them at [email protected].