The FAIR data principles were first set out in 2016 by a consortium of leading scientists and organizations in the journal Scientific Data. Their goal was to ensure that scientific data sets could be found and used by machines, with minimal human intervention. FAIR stands for Findable, Accessible, Interoperable and Reusable.
Six years later, how can these principles help to create collaborative, data-sharing ecosystems up and down supply chains and in complex operational landscapes such as transport, and enable servitization models in multi-party ownership scenarios such as utilities? Over the same period, we have seen the evolution of the digital twin.
While digital twins mean different things to different people, the widely accepted interpretation is a digital data twin: a virtual representation of something in the real world.
Single pane of glass
One of the first uses of digital twins was to make a ‘single pane of glass’ abstraction of an asset – which could be a thing, a person or a place. Data for assets is inevitably stored in incompatible ways in disparate systems for multiple, unrelated purposes. So, by focusing access to all this data through the digital twin, you create a single point where authorized people, applications - and even other twins - can go to find out about the asset, its current state and even subscribe to its updates. A big advantage of this model is that the owner of the asset stays in control of what goes in the pane of glass and who’s allowed to look at it.
This approach works for purpose-built applications that have knowledge about a twin’s data built into their logic. For example, they may know that the dimensions of the asset are recorded in the twin as centimetres and the weight in kilograms. But this means that the interpretation of the data must be hard-coded by developers; it is then fixed to that type of twin, and it’s difficult for the parties with whom you want to share the data to understand it.
This only partially addresses the problem of multiple sources, and it has moved the interpretation of the data into application logic. Now imagine instead that the application is data-centric, in that it reacts to data and metadata – data about data. In our example, the weight data would ‘say’ it was in kilograms and the application would use this to interpret the data and respond accordingly.
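The data-centric idea can be sketched in a few lines of Python. The field names, unit codes and conversion factors below are illustrative assumptions, not part of any twin standard: the point is simply that the value carries its own unit, so one function can read data from twins that record weight differently.

```python
# A minimal sketch of data-centric handling: each value carries metadata
# describing its unit, so the application interprets rather than assumes.
# Field names and unit codes are invented for this illustration.

# Assumed conversion factors to a base unit (kilograms)
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def weight_in_kg(datum: dict) -> float:
    """Read a self-describing weight value and normalize it to kilograms."""
    unit = datum["unit"]  # the data 'says' what unit it is in
    if unit not in TO_KG:
        raise ValueError(f"unknown unit: {unit}")
    return datum["value"] * TO_KG[unit]

# Two twins record weight differently; the metadata lets one function read both
twin_a = {"property": "weight", "value": 120.0, "unit": "kg"}
twin_b = {"property": "weight", "value": 264.0, "unit": "lb"}

print(weight_in_kg(twin_a))  # 120.0
print(weight_in_kg(twin_b))
```

A purpose-built application would have baked “weight is in kilograms” into its logic; here that knowledge lives in the data itself.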
Now imagine how the ‘browser’ model that we use every day would work with digital twin data. You could search for twins and allow the application to react to metadata and display their data in the best way. You could show data from more than one twin and compare them. You could write code snippets to do this automatically. You could even take data from multiple twins, run synthesising algorithms on it and publish the results as more twins.
But this is only possible if digital twins are made FAIR.
Why twins must be FAIR
‘Search’ implies that the twins are made findable by their creator. ‘Choose’ implies that the twins are accessible - if I’m authorized. ‘React’ and ‘compare’ imply that the data received is understandable and hence interoperable. Code snippets, synthesis and algorithms all imply that the data can be used for reasons other than its original purpose - hence reusable.
In the internet world, HTML is used to ‘mark up’ data to tell browsers how to render it. For example, tags like &lt;table&gt;, &lt;tr&gt; and &lt;td&gt; tell the browser that this is tabular data. The FAIR principles don’t stipulate what method is to be used to specify metadata, but they do demand that: “(Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation”.
There aren’t too many of these languages. RDF (Resource Description Framework) and the wider Semantic Web technologies are the de facto standards. But the digital twin browser application we imagined is not the end of the story. The originators of the FAIR principles had autonomous machine interoperability as one of their goals.
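To make the idea concrete, here is a small fragment in Turtle, a common text serialization of RDF, describing our weight example. The twin URI and the `ex:` vocabulary are invented for illustration; a real deployment would use shared, published ontologies so that any party can look up what the terms mean.

```turtle
@prefix ex:  <http://example.org/twin/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:asset-42
    ex:property "weight" ;
    ex:value    "120.0"^^xsd:decimal ;
    ex:unit     ex:kilogram .
```

Because the unit is expressed as data rather than buried in application code, any RDF-aware consumer - human, application or another twin - can interpret the value without prior agreement with the publisher.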
If we apply this thinking to digital twins in an ecosystem, twins’ agents – the behavioural part of a twin – could search for twins near or related to them, interact with their data and then maybe drop the connection when they’ve moved on. For example, the twin of a train could search for nearby twins of pollen count data as it was moving, because pollen clogs the filters when the engine is running. Twins of engines on the train would know when they were running and update their metadata to reflect that they have been affected. Service engineers can look at the twins of the engines in the train to see when the filters need to be changed.
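The train scenario above can be sketched as follows. The registry, the `Twin` class, the distance-based search and the pollen threshold are all invented for this example; a real twin platform would provide its own search, authorization and subscription APIs. The sketch shows the FAIR steps in order: find nearby twins, access their data, interpret it, and reuse it for a purpose its publisher never anticipated.

```python
# An illustrative sketch of a twin's agent reusing another twin's data.
# All names, values and thresholds here are assumptions for the example.

from dataclasses import dataclass, field

@dataclass
class Twin:
    twin_id: str
    location: tuple                      # (x, y), simplified to a flat plane
    properties: dict = field(default_factory=dict)

def find_nearby(registry: list, location: tuple, radius: float) -> list:
    """Findable: search the registry for twins within `radius` of a point."""
    return [t for t in registry
            if (t.location[0] - location[0]) ** 2
             + (t.location[1] - location[1]) ** 2 <= radius ** 2]

# Pollen-count twins published by a third party (values are made up)
registry = [
    Twin("pollen-east", (10, 0), {"pollen_grains_per_m3": 900}),
    Twin("pollen-west", (90, 0), {"pollen_grains_per_m3": 120}),
]

train = Twin("train-7", (12, 0), {"engines_running": True})

# Interoperable + reusable: the train's agent reads pollen data published
# for another purpose and updates its own metadata so that service
# engineers can later see how exposed the filters have been.
POLLEN_THRESHOLD = 500
for pollen in find_nearby(registry, train.location, radius=20):
    if (train.properties["engines_running"]
            and pollen.properties["pollen_grains_per_m3"] > POLLEN_THRESHOLD):
        train.properties["filter_exposure_events"] = \
            train.properties.get("filter_exposure_events", 0) + 1

print(train.properties["filter_exposure_events"])  # 1
```

When the train moves on, the agent simply stops finding those pollen twins in its search radius and the connection lapses - no long-lived integration between the two data owners is needed.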
FAIR data principles build on the foundations of each other. You can’t reuse if you can’t interoperate; you can’t interoperate if the data is inaccessible; you can’t try to access data if you can’t find it.
Saying that the FAIR principles are for scientific datasets is like saying Amazon is for books. Six years ago, the originators of the FAIR principles must have had a good idea that their principles applied to many things - including digital twin ecosystems. The best principles work like that. They give you yardsticks and guidance but don’t limit where you apply them.