Experts are predicting that in just a few years, 60 percent of the data used for the development of AI and analytics projects will be synthetically generated. Ultimately, synthetic data technology aims to simulate real-world scenarios to train AI systems virtually. This enables AI and machine learning (ML) developers to generate vast amounts of perfectly labeled data on demand. This emerging technology has the potential to reshape and transform several industries. The use cases for synthetic data are broad, ranging from safer autonomous vehicles to more capable robotics and new AR/VR applications.
Like any other emerging technology in the early stages of adoption, synthetic data must overcome hurdles to reach widespread use.
A new report from Synthesis AI highlights how technology executives recognize the importance and potential of synthetic data and how their biggest obstacles might be their own companies. According to the report, synthetic data needs three key elements to realize more adoption: more organizational knowledge, trust that synthetic data is as good as “real data,” and more proof from industry use cases.
Increasing organizational knowledge
One of the barriers keeping synthetic data from broader adoption is organizational knowledge of current technologies. Despite recognizing the importance of synthetic data for training AI models, only half (51 percent) of technology executive respondents knew of state-of-the-art synthetic data approaches, indicating a critical knowledge gap.
Additionally, 67 percent of technology executives in the report agree that their organization lacks the knowledge and understanding needed to implement synthetic data.
It’s up to technology executives to be agents of change within organizations that could benefit from using synthetic data. Additionally, employees at all levels can educate themselves on how synthetic data is implemented in other industries and use cases. Recent academic papers, industry case studies, and the first book on synthetic data are rich sources of information for organizations looking to come up to speed.
Help wanted: vertical and use case examples
An essential step in organizations gaining more understanding of synthetic data is further industry adoption and use cases. With 67 percent of technology executives agreeing that users in their industry will not accept synthetic data until they see the benefits for themselves, it’s clear technology leaders have their eyes on others to demonstrate the efficacy of synthetic data.
The good news is that leading AI companies are beginning to speak openly about the benefits of synthetic data in their development processes. Recently, Tesla showed off its simulation platform for training autonomous vehicles, and Microsoft Research published a study on using synthetic faces to build improved facial models. Other leading companies, like Google, Apple, and Facebook, have internal synthetic data teams working on a broad set of new models.
Synthetic data vs. ‘real world’ data
Capturing and preparing real-world data for model training is a long and tedious process. Deploying the necessary hardware can be costly and, in the case of complicated computer vision systems like autonomous vehicles, robotics, or satellite imagery, prohibitively expensive. Once the data is captured, human annotation is used to label important features. This process is prone to error, and humans are limited in their ability to label key information, such as the 3D positions required for many applications.
Synthetic data, on the other hand, is computer-generated image data that models the real world. Technologies from the visual effects industry are coupled with generative neural networks to create vast amounts of photorealistic, automatically labeled image data. This leads to higher-quality data at a fraction of the cost and time of real-world data.
Traditional data also presents a growing issue surrounding ethical use and privacy. The use of real-world data is only becoming more restrictive as each country establishes its own compliance laws around data collection, data storage, and more.
Despite the complexities of using traditional data, 46 percent of technology executives have concerns that models built with synthetic data are not as good as those built with real-world data. The synthetic-to-real data gap is a valid concern. Still, new techniques like domain adaptation and mixed training have helped researchers and companies produce high-performing models with synthetic data. Microsoft’s recent work with synthetic facial data demonstrated the ability to build better-performing models with only synthetic data. Several months before the Microsoft work, Synthesis AI demonstrated the value of synthetic data in three different facial models. In time, more and more case studies will emerge, and companies will have the confidence to invest in synthetic data capabilities.
Understanding, adapting, and seeing results from new and emerging technologies takes time and patience. Education and implementation do not happen overnight. None of the essential elements outlined above can carry synthetic data forward alone. More use cases and industry adoption will follow when practitioners understand that synthetic data is as good as, or even better than, real-world data. Widespread organizational knowledge will be achieved as more use case examples emerge, and vice versa. Each element, working together, is essential for furthering the adoption of synthetic data.