Kyoto University lost 77TB of supercomputer storage data after a flawed HPE software update to its backup program began deleting too much data.
“A bug in the backup program of the storage system caused an accident in which some files in /LARGE0 were lost. We have stopped processing the problem, but we may have lost nearly 100TB of files, and we are investigating the extent of the impact,” the university said in a notice last month.
It later confirmed that some 34 million files from 14 research groups, totaling 77TB, were deleted by the faulty backup program between December 14 and 16.
“Due to a defect in the program that backs up the storage of the supercomputer system, the supercomputer system became large. An accident occurred in which some data of the capacity storage was deleted unintentionally. We sincerely apologize for the inconvenience caused,” the university said in a follow-up post. “We will continue to work to prevent recurrence so that such a situation will not occur again in the future.”
The university said backups are currently on pause, with plans to resume in January once the issue is resolved. The university also planned to improve its backup architecture to prevent such data loss incidents in the future.
The incident was seemingly caused by an update from HPE to its backup program. A letter from HPE Japan to Kyoto University details that HPE accepts blame for the incident.
“We believe that this file loss is 100% our responsibility,” read the letter. It suggests the incident was caused by an updated backup script that was meant to delete old log files but deleted more than it should have. The issue was seemingly discovered by an affected user, who alerted the university, and the rogue program was then stopped.
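As an illustration of the kind of safeguard that limits what a log-cleanup job can touch, here is a minimal Python sketch. It is not HPE's script, whose contents are not described in the letter as quoted here. The function names, the `allowed_root` parameter, and the `*.log` pattern are all hypothetical. The idea is simply to refuse an empty or out-of-bounds target directory and to default to a dry run.

```python
import time
from pathlib import Path


def find_stale_logs(log_dir, allowed_root, max_age_days, now=None):
    """Return *.log files directly inside log_dir older than max_age_days.

    Hypothetical helper: raises ValueError instead of scanning when the
    target is empty, is the root itself, or lies outside allowed_root.
    """
    # An empty or unset value must never silently widen the search.
    if not log_dir:
        raise ValueError("log_dir is empty; refusing to scan")
    root = Path(allowed_root).resolve()
    target = Path(log_dir).resolve()
    # Only strict subdirectories of the allowed root are acceptable.
    if root not in target.parents:
        raise ValueError(f"{target} is not a subdirectory of {root}")
    if not target.is_dir():
        raise ValueError(f"{target} is not a directory")
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    # Match log files only, and never descend into subdirectories.
    return sorted(
        p for p in target.glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge_stale_logs(log_dir, allowed_root, max_age_days, dry_run=True):
    """Delete stale logs; with dry_run=True only report what would go."""
    stale = find_stale_logs(log_dir, allowed_root, max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `purge_stale_logs` with its default `dry_run=True` returns the list of candidate files without removing anything, so an operator can review the list before rerunning with `dry_run=False`.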
In a January 4 update, the university said that about 28TB of data, comprising 25 million files, cannot be recovered because no backups exist.
The Kyoto University supercomputer fleet consists of three systems: Camphor 2, a 5.48 petaflops Cray XC40; Laurel 2, a 1.03 petaflops Cray CS400 2820XT; and Cinnamon 2, a 42.4 teraflops Cray CS400 4840X.