The digital infrastructure on which our world now depends can at times be surprisingly fragile. Sharks, anchors, or unfriendly nations can cut submarine cables. Construction work can sever fiber-optic cables buried in the streets. And extreme weather, power cuts, equipment failures, or even fires can put data centers out of action.

But what about satellites? GPS has become integral to daily life, weather and observation satellites provide a range of information services to businesses, and now we’re beginning to see a number of commercial companies provide broadband and 5G connectivity from orbit. Are the satellites we depend on as robust as we need them to be?

This article appeared in Issue 40 of the DCD>Magazine.

In space no one can hear you stream

Here on terra firma, data center resiliency is relatively easy to measure. Is there an ample supply of water, power, and connectivity? Is the area likely to see floods or other extreme weather? Does it have redundant sources and routes of power and connectivity for backup?

Likewise, cables can be shielded and buried, cell towers built sturdy and guarded. But in space there’s no one to give you an Uptime Tier-rating. The good news is, despite the harsh and unpredictable conditions of space, satellites are usually well-engineered and highly redundant machines designed to keep the elements at bay and survive the bumpy ride into space.

The cost to build and launch a large satellite runs into the tens, if not hundreds, of millions of dollars per launch, and a launch can take months to prepare. As a result, the multi-ton satellites flown to Geostationary Earth Orbit (GEO), 35,786 kilometers (22,236 miles) above the Earth, are routinely built with multiple layers of redundancy on key systems and payloads, and are rigorously tested.

“Satellites are reliable in the sense that they get strapped into a rocket and blasted into space through several Gs of acceleration and a ton of heat, noise, and vibration, and then operate in a vacuum with significant temperature shifts as they go from sunlight into the shadow back into sunlight, and radiation,” says Dr. Brian Weeden, director of program planning, Secure World Foundation. “In that sense, they are pretty durable.”

Assuming a satellite survives the launch and calls home without any troubles, it faces a constant battle for survival out in the harshness of space. Even Earth satellites in low orbits can see temperature swings of minus 50°C (-58°F) to plus 50°C (122°F) every 90 minutes, which can have a big effect on the equipment onboard, as can the lack of air.

“Materials that you thought were quite solid can actually have some liquid or gaseous components which can leave into the vacuum of space, changing the properties of the material and causing it to shrink or become brittle,” says Andy Vick, head of disruptive technology at RAL Space.

Space weather is another major contributor to satellite failures. Many of these bus-sized, multi-ton satellites are out in GEO, tens of thousands of miles from Earth, where there is little atmospheric protection from extreme conditions and large amounts of radiation. And the void can be surprisingly active and unpredictable when it comes to weather.

X-rays, ultraviolet rays, radiation, and geomagnetic storms can all wreak havoc on board: components can be damaged by high currents discharging into the satellite, or by high-energy particles that penetrate it. Space dust (literally tiny particles of rock) can hit the satellites, turn into plasma, and damage equipment.

Sun outages, where the satellite passes directly in front of the Sun as seen from the ground station, don't harm the satellite. However, the Sun's interference swamps the signal from the satellite, causing a loss of data. These outages affect the signals from geostationary satellites and can last for around ten minutes a day around the equinoxes, but they are predictable.

The University of Reading recently recorded the first ‘space hurricane’ which it described as a ‘1,000km-wide swirling mass of plasma raining electrons several hundred kilometers above the North Pole.’

The most notorious space weather event was the Carrington Event, a solar flare in 1859 that caused auroras as far south as the Caribbean, woke people in the night thinking it was morning, and caused telegraph lines to fail. Smaller events in 1989 caused blackouts and communication failures. A Carrington-level event today would cause worldwide electronics failures, and could wipe out all the satellite networks of the world if action wasn’t taken ahead of time.

In a disaster report, space insurance consortium Atrium warned that a single anomalously large proton flare, or a number of flares in quick succession, from our Sun could result in a loss of power to all satellites in geosynchronous orbit and cost billions of dollars to fix.

Dr. Holger Krag, head of the Space Safety Programme Office for the European Space Agency, tells DCD there is little that can be done to protect satellites from the impact of a solar flare beyond turning off key electrical systems ahead of time. But the unpredictable nature of the sun can make this a difficult task.

To better predict coronal ejections from the Sun and provide more notice about potential space weather events, the ESA has planned a mission called Lagrange. Its spacecraft will be positioned at "Lagrange points," where the gravity of the Earth and Sun balance, providing stable locations to observe the Sun’s activity a few days ahead of the Earth’s position.

“From [the L5] position, it can see the surface of the sun that would turn towards Earth three days later. We can see an advanced view of the activity area on the sun as it’s rotating around its axis towards the Earth,” says Dr. Krag. “At the same time, you can have a side view on the line between Earth and Sun so it can see coronal mass ejection traveling from the Sun to the Earth and it can measure the velocity of the ejections."

The Lagrange mission is expected to fly in 2027, and Dr. Krag says: "It will give us a much more reliable forecast."

Outages. In. SPAAACE.

Before satellites launch, they go through a rigorous testing regime that can see them placed into climate chambers to simulate the super cold and hot vacuum of space, as well as vibration and shock tests to see how machines cope with the rigors of launch and booster separation en route to orbit. Satellites are built on the assumption they will never be touched again, so operators want to make sure their investments are built to last.

“The vibration environment, the acoustic vibrations of the supersonic airflow over the fairing and things like that are quite extreme,” says RAL's Vick. “We have got the ability to put the package satellite in front of what is basically Deep Purple's 1980s speaker stack: a stack of speakers about three stories high. You completely surround the satellite and you blast it with sine waves and simulate the kind of acoustic blast that the thing will get on its launch.”

The fact that satellites are untouchable once up in orbit also means as much redundancy and backup capability as possible has to be put into each satellite.

“The systems are built to be resilient and operate autonomously,” says Kevin Bell, VP of space program operations at the Aerospace Corporation, “and have several different kinds of fault management systems built into them; either to self-repair and recover or to go into a safe mode where a human can come in and figure out what happened and recover them.

“You've got to design for the entire mission life upfront; there isn't somebody who can watch them 24x7, or go and repair them when they do break, and you can't refuel them or put new parts on either.”

Atrium says nearly $11 billion in insurance claims was paid out to the space industry in the 20 years leading up to 2014. The most common points of failure were communications payloads, attitude and orbit control systems (AOCS, essentially the navigation and maneuvering systems) and computers, power systems, and data handling components.

Similarly, a 2005 study of 156 satellite failures found that AOCS (including gyroscopes, momentum wheels, and thrusters) and power systems were responsible for more than half of failures, with mechanical failures around solar panels and short circuits of electrical systems also common issues. Over 40 percent of all failures happen within the first year of in-orbit activities. Space phenomena were directly involved in 17 percent of all failures.

“You get a lot of early lifetime failures up to the first year after launch, then nothing for a while, and then you get a spike at the end of the lifecycle,” says Dr. Weeden of SWF. A large satellite may have to go through an unfurling process once it disengages from a rocket, and then realign itself, before finally calling home.

“That deployment stage can be where there's a fair number of problems. If that initial contact doesn't happen, and the satellite never orients its solar panels to the sun, it runs out of battery and dies.”

Accidents can occur before the machines make it to orbit, and sometimes before they even make it to the launchpad. In 2009, the rocket carrying Nasa’s Orbiting Carbon Observatory (OCO) satellite failed to shed its payload fairing, and the whole assembly crashed into the ocean 17 minutes after lift-off. In 2003, the 1.4-ton NOAA-19 satellite needed $135 million worth of repairs after Lockheed Martin employees dropped it on the floor during manufacturing.

Reliability and testing have improved over the years, and satellites are now less over-engineered as we learn about what actually causes satellites to fail once out in orbit.

“[In the past] they weren't looking at what happened to a previous satellite, because they didn't know,” explains Vick. “They would simply have tried to shield everything because they didn't know what was most susceptible to radiation. By being able to simulate things in the lab, including using facilities like ISS to simulate radiation, we've become more aware of what matters more to what's really happening. We're now focusing on the things that actually need to be shielded.”

We are also slowly starting to open up the possibility of repairing, refueling, and potentially upgrading existing satellites even after years in orbit. Northrop Grumman’s Mission Extension Vehicle is the first satellite that can service other satellites and extend their lifespan. MEV-1 completed its first docking to a client satellite, Intelsat IS-901, in February 2020, to keep the satellite operational for a further five years, while MEV-2 is due to dock with the Intelsat IS-10-02 satellite in early 2021. Nasa is working on a similar in-orbit service satellite as part of the agency’s OSAM-1/Restore-L project.

Satellite failures are bad for Earth and Space

Though relatively rare, in-orbit failures do happen. Despite a successful launch in December 2020, SiriusXM's new seven-ton SXM-7 satellite, built by Maxar to provide digital radio to consumers, failed during in-orbit testing. SXM-7, along with the SXM-8 satellite due to launch later this year, was meant to replace the Boeing-built XM-3 and XM-4, which were launched in 2005 and 2006 and are now approaching the end of their lives. Though its engine was able to move it to the right orbit, some of SXM-7's payload failed and the satellite has since been classed as a “total loss.” The company is making a $225 million insurance claim, and will launch SXM-8 later than planned.

In 2019, the six-ton Intelsat 29e satellite failed after a fuel leak. Its propulsion system experienced damage that caused a leak of the on-board propellant, disrupting service to the satellite’s customers. A second anomaly occurred during recovery efforts after which it was judged lost. Launched in 2016, it served just three of its planned 15 years.

That same year, almost the whole Galileo network – Europe’s equivalent to the US GPS – went down. It was later recovered, but two of the network’s 26 satellites have had to be retired early due to on-board issues, and at time of writing a further two have been ‘temporarily down’ for over a month.

Satellites generally remain in service for between seven and 10 years in low Earth orbit (LEO), below 2,000 km (1,200 mi), and more than 15 in GEO. Aside from the loss of service and the impact unexpected failures have on Earth, severely damaged or failed satellites create risks for the operational satellites in close proximity.

At best, large failed satellites are multi-ton hunks of metal traveling at thousands of miles per hour in uncontrolled trajectories that could collide with functioning satellites and interfere with signals. If laden with fuel – whether in the form of propellant or energy in batteries – they become potential weapons of destruction. Astrophysicist Jonathan McDowell of the Harvard-Smithsonian Center for Astrophysics described the failed Intelsat 29e as “a floating bomb in GEO” given it was now slightly off track on its planned orbit and could potentially cross paths with other GEO satellites in the future.

At higher orbits, satellites are larger and move somewhat slower, which means they can survive impacts with small pieces of debris. But at lower orbits even tiny pieces of debris can be highly destructive. In 2016, a fleck of paint was enough to damage a window on the International Space Station. The ISS does have metal shielding – panels of layered thin metal sheets akin to Kevlar vests – to protect it from larger pieces, but this isn’t practical or possible for most satellites due to cost, weight, and size restrictions, leaving most to either make evasive manoeuvres or cross their fingers and hope for a near miss.

New small satellite mega-constellations

Where once space was purely the domain of military, government, and large telecoms companies, a new fleet of commercial startups is sending up huge numbers of small satellites, which are changing the industry. There are currently around 3,000 operational satellites in orbit, but that number is increasing rapidly, with a massive potential impact on the future of the industry.

Over 1,000 satellites were launched in 2020 alone, the vast majority of them coming from commercial actors looking to deploy huge numbers of small satellites. It’s not uncommon to see rockets fired into space now launching more than 100 satellites at a time. SpaceX’s Starlink is the biggest player amongst the new wave of space satellites. Elon Musk’s company has launched over 1,000 satellites since 2019 to provide high-speed broadband Internet connectivity and has permission from the FCC to launch more than 40,000 into LEO. These satellites weigh around 260 kg (570 lb) and are about the size of a large table and generally operate from below 550 km (341 mi) altitude.

But Starlink is just one of a growing number of companies looking to fill the skies and provide connectivity from LEO. Amazon’s Project Kuiper will see the company invest $10 billion to launch 3,000 satellites over this decade. Though it has scaled back plans since emerging from bankruptcy, the UK’s OneWeb still plans to have almost 650 satellites in orbit by June 2022 with a second generation of sats arriving in 2024-25. Its satellites are smaller than Starlink’s at just 150 kg, but orbit at an altitude of 1,200 km (750 mi).

There are other commercial players: Planet has launched over 350 of its 4 kg Dove cubesats since 2013, and currently has more than 200 in operation. Kleos Space plans to launch up to 20 clusters of smallsats to offer maritime intelligence to commercial and defense companies. Californian company Swarm is planning to build a space-based Internet of Things (IoT) for uses such as vehicle tracking, logistics, water, and resource monitoring. It plans to have 150 satellites up by the end of 2021. In February LyteLoop raised $40 million for its vision of in-orbit data storage on up to 300 satellites weighing 250 kg each.

Even the US Defense Advanced Research Projects Agency (DARPA) is looking to get in on the act and turn satellites to military use. Project Blackjack is investigating how LEO smallsats can supplement and/or replace the US’ GEO satellites for activities such as surveillance.

One space mainframe or a cloud of satellites

The arrival of these constellations means the industry is seeing a divergence. There are huge, highly resilient individual machines in high orbits, and large swarms of small and breakable machines in low orbits that, while individually fragile, create a more resilient overall system because there can be tens or even hundreds of failover points.

“If you have a geosynchronous satellite, the critical system redundancy might be threefold,” says Bell. “Now I've got 1,000 fold, it makes the system much more resilient and reliable from the failure standpoint.”

SWF’s Weeden likens the change to the switch from mainframe computing to distributed servers in the data center industry.

“You go from a few very large, very expensive, very powerful things to a more distributed set of satellites. Maybe each one individually is not quite as powerful but you've got dozens to hundreds or thousands of them, which is a different kind of resilience,” he says.

“The bigger ones are more resilient on an individual basis. We’re seeing a shift towards individual satellites that are probably less resilient, but a system that is more resilient on the whole. If you've got one satellite and it fails you're screwed. If you've got 100, and five of them fail, you're probably okay.”
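Weeden's comparison can be put into rough numbers. The sketch below is illustrative only: it assumes every satellite fails independently with the same probability, and the 95 percent survival figure and the "90 of 100" service threshold are hypothetical values chosen for the example, not figures from any operator.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """Probability that at least k of n satellites are working,
    assuming each works independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, for illustration only.
p_working = 0.95

single = prob_at_least(1, 1, p_working)    # one large satellite, no spare
fleet = prob_at_least(100, 90, p_working)  # service survives with 90 of 100

print(f"single satellite still in service: {single:.3f}")
print(f"90-of-100 fleet still in service:  {fleet:.3f}")
```

Under these assumptions the fleet is more likely to stay in service than the single satellite, even though each unit is no more reliable. The independence assumption carries the whole result: a fault shared by every unit would break it.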

While having a thousand points of failure could create greater systemic resilience, any fleet-wide design flaws could potentially have massive effects if thousands of machines in orbit suddenly all suffer the same defect and cease to function.

“One of the major risks is just making sure you don't have a systemic design issue,” says Aerospace's Bell. “You certainly want to make sure you've rooted out a design problem that is across all 1,000 satellites that could cause them to fail prematurely.”

In the 1990s, a number of Boeing 601 satellites were found to have a design flaw in their spacecraft control processor (SCP) where a tin-plated relay formed crystalline ‘whiskers’ that could cause an electrical short. Though each satellite contained two SCPs, there were cases of both SCPs failing. At least eight 601s have seen SCP failures and four of them were lost, including the Galaxy IV communications satellite, which caused 80 percent of pager services in the US to go down. Similar issues in thousands of satellites could be catastrophic.

“It's lurking design problems that suddenly appear [that worry me],” says McDowell. “If there were a generic design flaw lurking that pops up, years down the road in one of these mega constellation designs that could be very bad. You could end up with high failure rates.”

Moving fast and breaking things in orbit

Even when generic design flaws are ruled out and system resilience is increased, questions remain over how many of those individual satellites might fail. LEO satellites have some natural protection from some of the worst space weather thanks to their lower orbits, but their smaller size means they are generally less protected from any adverse weather effects they do see, and the speed of orbit means they would be unlikely to survive any collisions with debris or other satellites.

With so many new players in space, some of them lacking the manufacturing nous of the incumbents, there is an increased possibility of onboard failures.

Christopher Jackson, director of Acuitas Reliability, has previously said around 35 percent of small satellites fail to complete their mission, with almost 20 percent being Dead on Arrival (DOA). Smallsats often suffer from design, manufacturing, and testing flaws, he claimed, and their builders often fail to conduct proper analysis after failures.

While that might be the case with many cubesats, the new commercial companies are taking a highly iterative approach to developing their satellites, and failure rates are dropping quickly. In a series of tweets last year, Harvard’s McDowell noted how SpaceX went from a 13 percent failure rate with its V0.9 prototypes, to a 3 percent failure rate with its first V1 sats, to just 0.2 percent after that.

“They really improved reliability, about halfway through last year,” McDowell tells DCD. “The more recent ones have had almost no failures.”

While that failure rate is good, in a constellation of thousands that could still create added space debris risks if they are not de-orbited properly.

“The sort of failure rates you can tolerate in a constellation of 100 satellites, you can't really tolerate in a constellation of 30,000,” he adds.
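The arithmetic behind that point is simple: the expected number of dead satellites is the failure rate multiplied by the size of the fleet. The sketch below uses the three failure rates McDowell cites for Starlink and the two constellation sizes from his quote; treating each rate as a flat, fleet-wide probability is a simplifying assumption, and the function name is purely illustrative.

```python
def expected_failures(n_satellites: int, failure_rate: float) -> float:
    """Expected number of failed satellites for a given fleet size and rate."""
    return n_satellites * failure_rate


# Failure rates quoted in the article; labels are descriptive only.
failure_rates = {
    "V0.9 prototypes": 0.13,
    "first V1 satellites": 0.03,
    "later V1 satellites": 0.002,
}
constellation_sizes = [100, 30_000]

for label, rate in failure_rates.items():
    for size in constellation_sizes:
        dead = expected_failures(size, rate)
        print(f"{label:<20} fleet of {size:>6,}: about {dead:,.1f} failed")
```

At 0.2 percent, a fleet of 100 would expect a fraction of one failure, while a fleet of 30,000 would expect around 60 derelict satellites.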

What the industry is missing, argues RAL Space’s Vick, is standards that are applicable to this new breed of smaller and cheaper satellite.

“At the high level, ESA’s ECSS and Nasa’s NTSS are very prescriptive; they're very engineered, and they do provide an ultimate best solution,” he says. “But they are not necessarily affordable, and the problem is that there is no cheaper alternative to those standards at this point in time and I think there does need to be.”

Without a middle or lower standard for smaller operators, Vick says commercial companies are left with nothing to tell them which standards they could be working to that aren't the most costly and engineered option.

“There are attempts to provide those standards, but it’s difficult to produce them because there will always be people who will be trying to argue that we shouldn't allow anybody to do anything less than the best.”

SWF’s Weeden agrees on the need for more standards: “Just like we have different rules and standards for semi-trucks and bicycles and station wagons, we probably need them for satellites as well, but so far we really haven't done the science to figure out what those different rules should be.”

Small sats teach incumbents about risk

Tight regulation around slotting, combined with the harsh conditions and the costs of getting there, means GEO and higher Earth orbits will remain the domain of large satellite and incumbent operators, at least for the foreseeable future. But it remains to be seen whether the two sectors continue side by side, or we end up with a general move towards smaller units built at scale across the industry.

“We're at a pivot point right now where space needs to be more agile, and we don't have a production footing to do that,” says Aerospace Corp’s Bell.

The massive fleets of new small satellites provide the chance to apply mass production techniques that previously weren’t applicable to the small-scale manufacturing of large buses.

“Right now we're on a cycle of about every five or ten years, which doesn't allow you to keep pace with technology,” he adds. “We don't have a way to turn out a new vehicle with upgrades once a year and a block upgrade or a brand new model every four or five years like the automotive or phone industry.”

“Something like GPS is putting out two to three satellites a year, which is by no means a production run. If it costs more, then you want to make sure it works, so you end up in a spiral where you're spending more money to put in the redundancy, put in the fault management, and test it to make sure it works on the ground perfectly.”

The arrival of ambitious startups has also seen large incumbents forced to act. As part of its Lightspeed constellation, Telesat plans to launch almost 300 satellites weighing 700 kilos each, which will provide high-speed broadband by 2023. The rapid iteration, smaller units, and large numbers of machines mean the new commercial players can improve technology more quickly, improve testing capability, and glean more reliability data from a larger pool of sources, which can provide new learnings for the incumbents.

“The new players have effectively scaled for production,” Bell adds. “They're able to evolve because of quantity and the amount of industrial base, it's huge compared to the kinds of quantity and scale we have. They’re trying to look at what it takes to build production lines where you can stabilize the production line and build large unit counts, and they've actually been able to spend more time optimizing testing.”

Smaller satellites can be tested more easily; they no longer need cranes and high bays, but can be pushed around on a wheeled cart by a person, which can massively simplify assembly, integration, and test. And once in space, companies can glean more information about what causes failures.

“Thousands of units are now giving you a statistical sample of parts reliability,” says Bell. “You can monitor and can start to get a feel for the environment and how bad the environment is even embedded inside the spacecraft.”
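To see why unit count matters for reliability data, consider how tightly a failure rate can be pinned down from a given number of units. The sketch below uses the Wilson score interval, a standard way to put a confidence range on an observed proportion. The failure counts are hypothetical, and the method is offered as one reasonable illustration rather than as what any operator actually does.

```python
from math import sqrt


def wilson_interval(failures: int, units: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a failure proportion."""
    if units <= 0:
        raise ValueError("units must be positive")
    p_hat = failures / units
    denom = 1 + z * z / units
    centre = (p_hat + z * z / (2 * units)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / units + z * z / (4 * units * units)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical data: the same observed 3 percent failure rate, two fleet sizes.
for failures, units in [(3, 100), (90, 3_000)]:
    low, high = wilson_interval(failures, units)
    print(f"{failures} failures in {units:,} units: "
          f"rate between {low:.1%} and {high:.1%}")
```

With the same observed rate, the larger sample gives a much narrower range, which is the statistical advantage Bell describes.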

The higher risk appetite and ‘test and re-flight policy’ is closer to how the software industry operates than the traditional space industry, according to RAL Space’s Vick, but has positive effects throughout the sector.

“I think that's good for all of us because it does mean we're getting new ideas put into practice in space far quicker and with less direct investment from government, so that's good for us,” he says.

“Those highly engineered satellites can't really afford to trial new technologies and new methods for the first time. But once those technologies are proven in the new space environment they can find their way into the bigger and more highly engineered satellites, so the older style satellites are actually benefiting from what's happening in technology.”

The fact these commercial companies are willing to take risks and fail on some iterations of satellites marks a change from the more traditional companies, which are reluctant to accept the larger costs of failure, and the political ramifications if Government/military agencies are on board.

“Somebody like Elon Musk and Starlink, he's obviously answerable to his shareholders but they won't be too worried about the political fallout of it going wrong,” says Vick. “Whereas that's not necessarily true for a big government-led mission, the risk appetite is much lower in big multi-agency, multi-country development.”