The growth of big data has allowed us to do some miraculous things. With enough fuel from big data, machine learning can predict property crimes, detect fraud, and even anticipate future weather events. Arguably the most incredible uses of these algorithms, though, may be in healthcare.

Healthcare IT (image: Pixabay / rawpixel)

Life-saving predictions

By offering a second perspective, data-driven machine learning algorithms can provide critical feedback and aid in early detection. For example, researchers in China have begun using a machine learning algorithm that predicts the likelihood of a patient waking from a coma. In several instances, the algorithm correctly predicted that patients would wake when the doctors on staff had predicted otherwise. While there were undoubtedly times when the algorithm was wrong and the doctors were right, it is nonetheless accurate enough to provide invaluable second opinions for clinical diagnoses.

Healthcare image data is especially valuable for machine learning. As just one example, when fed thousands of retinal photographs, a deep learning algorithm showed over 90 percent specificity and sensitivity for diabetic retinopathy, measured against evaluations by ophthalmologists. In dermatology, an algorithm fed 130,000 patient images was about as accurate as dermatologists in identifying skin cancer. While it’s undoubtedly helpful to provide an accurate second opinion that isn’t subject to human error, these algorithms are providing a service that doctors already provide: diagnostics. This raises the question: are there any areas where machine learning provides aid that doctors cannot?
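For readers unfamiliar with the two metrics quoted for the retinopathy algorithm, the short Python sketch below shows how sensitivity and specificity are usually computed from a set of predictions. The labels are made up for illustration and are not taken from any of the studies mentioned; the function name is likewise just an assumption for this example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, flagged as sick
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged as sick
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of sick patients the algorithm catches
    specificity = tn / (tn + fp)  # share of healthy patients it correctly clears
    return sensitivity, specificity


# Illustrative data only: 1 = specialist found the disease, 0 = did not.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A figure above 90 percent on both measures means the algorithm rarely misses a case that the specialists found and rarely raises a false alarm on a healthy patient.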

Big data and precision medicine

The emerging science of precision medicine uses patient data to provide personalized healthcare insights where doctors cannot. In the past few years, it’s been shown that genes and their variable expression (epigenetics) and the microbiome (gut bacteria) play a significant role in health. As just one example, some drugs may be completely ineffective or even counterproductive for patients with particular genes. With a comprehensive genomic screening, drugs could be tailored to the patient’s unique record to provide improved outcomes.

Unfortunately, with the brief amount of time that doctors have with each patient, they simply do not have the bandwidth for this level of personalized care. Most Americans have spent enough time in waiting rooms to know that doctors can barely handle their volume of patients as it is. How would this system fare with a more comprehensive patient record when it’s estimated that a complete one would contain roughly six terabytes? While a doctor may not be able to utilize this vast volume of data, researchers have already shown that machine learning algorithms can quickly predict optimal drug therapies from patient genomic profiles.

Adoption and challenges

With so much potential data for each patient, it stands to reason that healthcare systems must become cloud-based to enable mainstream usage of these incredible technologies. However, CIOs and other healthcare executives will have to address many difficulties to make this a reality. Many machine learning algorithms are only as good as the data they are given, and pulling together large sets of healthcare data for them is a tall order.

Healthcare systems are notoriously distributed, with offices often suffering delays as patient records are forwarded from office to office. It is common for even a patient’s primary care provider not to have a complete electronic health record (EHR).

While it is certainly possible to create accessible data sets that enable the benefits of precision medicine, with several in use already, the goal of widespread adoption is complicated by a lack of awareness and by stringent HIPAA rules, which restrict data sharing for good reason. Few files are as sensitive as a medical record. If every healthcare practice begins using the full EHR for precision medicine, there will be substantially more opportunities for disastrous data leaks.

Every data center holding patient records will need to be secured and decommissioned perfectly, without fail, every single time. Despite the great leaps we have made in network security, healthcare data leaks have fairly consistently increased year over year.

Thankfully, patient records used to train the algorithms can be anonymized, at the cost of a marginal and shrinking loss of data, which at the very least reduces the risk downstream of the initial database. The risk of leaks upstream of the initial publication remains, however, and many would argue that publishing even anonymized patient data is a violation of privacy. These, among other logistical issues, will need to be addressed as we strive towards a more symbiotic relationship between technology and healthcare.
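To make the anonymization trade-off concrete, here is a minimal Python sketch of one common approach: drop direct identifiers, replace the patient ID with a keyed hash, and coarsen the age into a band. The field names, the record, and the key are all hypothetical, and this is only an illustration of the idea, not a complete or HIPAA-compliant de-identification procedure.

```python
import hashlib
import hmac

# Hypothetical field names for data that identifies a person directly.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Replace an ID with a keyed hash so records can be linked but not traced."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with identifiers removed or generalized."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize_id(str(record["patient_id"]), secret_key)
    cleaned["age_band"] = age_band(cleaned.pop("age"))  # exact age is discarded
    return cleaned


# Illustrative record and key only; a real key would be stored securely.
record = {
    "patient_id": "A-1001",
    "name": "Jane Doe",
    "phone": "555-0100",
    "age": 47,
    "diagnosis": "diabetic retinopathy",
}
result = deidentify(record, secret_key=b"example-key-not-for-real-use")
print(result)
```

The age band is an example of the data loss mentioned above: the algorithm still learns that the patient is in their forties, but not that she is 47. The upstream risk also remains, since the original record and the key still exist somewhere and must be protected.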

In the next issue of the DCD Magazine, the DCD editorial team dive deeper into the topic of machine learning in science and healthcare, with a look at the Department of Energy's efforts to cure certain forms of cancer under the CANDLE initiative.