Data Integrity in Blockchain and DLTs
Updated: Mar 6
Blockchain and Distributed Ledger Technology (DLT) are only as good as the data fed into them. As the saying goes, garbage in, garbage out.
When weighing the benefits of integrating blockchain into your industrial or enterprise solution, keep data integrity top of mind. This ensures that correct data is stored immutably and can therefore be analyzed with confidence when issues arise (such as a product recall).
The following explores key requirements for data integrity, establishes common definitions, and offers several methods to increase data integrity.
The following information has been paraphrased or taken directly from a fantastic report by the World Economic Forum titled "Redesigning Trust: Blockchain Deployment Toolkit".
What are the key requirements for data integrity?
Data Origin Integrity
- Blockchain alone does not ensure data integrity
- The accuracy and reliability of data must be preserved from the point of creation to the point of usage on the blockchain
- A common point where problems can occur is the point of submission to the blockchain
- Third-party integrations, called oracles, are often used to submit real-world information to the blockchain
- The entity submitting the information (the oracle) is often the same entity as the one that provides the data
- The oracle must be trusted
- This problem of ensuring the accuracy and correctness of data at the time it is submitted to the blockchain is widely referred to in the blockchain industry as the “Oracle Problem”
Digital Twin Integrity
- Often we represent real-world objects such as materials and products on the blockchain
- This digital representation is referred to as the real-world object’s ‘digital twin’
- It should then be questioned whether the data attached to the digital twin presents an accurate view of the physical object, and whether the link between the physical object and its digital twin may have been compromised
Common Data Flow
The following is a common data flow from a device to a blockchain.
Creation/Cleaning: Measurements are made, and raw data is produced.
Storage/Gateway: The data is stored somewhere.
Oracle: The oracle connects the data gateway to the blockchain.
Blockchain node network: The transaction undergoes the consensus process, gets stored in a block and is eventually confirmed on the blockchain network.
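As a rough illustration of the four stages above, the sketch below fingerprints a record when it is created and has the oracle re-check that fingerprint before submitting. This is only one possible way to catch a benign fault between creation and submission, not a method taken from the report. The record fields, the `fingerprint` and `oracle_submit` helpers, and the in-memory `ledger` list standing in for the blockchain are all hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a SHA-256 hash of the record in a canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def oracle_submit(entry: dict, ledger: list) -> bool:
    """Oracle stage: submit only if the record still matches its fingerprint."""
    if fingerprint(entry["record"]) != entry["sha256"]:
        return False  # data changed somewhere between creation and submission
    ledger.append(entry)  # stand-in for sending a transaction to the network
    return True


# Creation/Cleaning: the device produces a record and fingerprints it.
record = {"device": "sensor-17", "temp_c": 5.1, "ts": 1700000000}
created = {"record": record, "sha256": fingerprint(record)}

# Storage/Gateway: the record is held until the oracle picks it up.
gateway_store = [created]

# Oracle -> Blockchain node network (hypothetical in-memory ledger).
ledger = []
ok = oracle_submit(gateway_store[0], ledger)

# Simulated benign fault: a value is corrupted while in storage.
corrupted = {"record": {**record, "temp_c": 51.0}, "sha256": created["sha256"]}
rejected = not oracle_submit(corrupted, ledger)
```

Note that a check like this only shows the data was not altered after creation. It says nothing about whether the original measurement was correct.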
What can compromise data integrity when data is sent to the blockchain?
There are two main types of faults: benign and malicious. Benign faults are unintentional and are by far the most common. Malicious faults are caused by bad actors such as hackers and are less common. Only benign faults are examined here, since they are the ones most often encountered in industrial and enterprise settings.
How can benign faults be reduced or prevented?
Proper system design, maintenance, management, and business practice will prevent the vast majority of benign faults.
How do you increase your data integrity? The following are effective methods.
Cross-validation: One approach is to “cross-validate” inputs, meaning that each input submitted is corroborated with nearby inputs. For example, if all of the temperature sensors deployed in a grid-like fashion across a large food storage facility report that the current temperature is around 5°C, with the exception of a single sensor reporting that the temperature is 30°C, it is plausible that the single sensor is malfunctioning, and its input can be automatically thrown out.
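The temperature-sensor example can be sketched in a few lines. This is a minimal illustration that assumes the median of all readings is a fair reference value and that a fixed tolerance (`max_deviation`, an arbitrary 5 degrees here) separates plausible readings from faulty ones. The sensor names and the `cross_validate` helper are hypothetical.

```python
from statistics import median


def cross_validate(readings: dict, max_deviation: float = 5.0):
    """Split readings into (accepted, rejected) by distance from the median."""
    centre = median(readings.values())
    accepted = {}
    rejected = {}
    for sensor_id, value in readings.items():
        if abs(value - centre) <= max_deviation:
            accepted[sensor_id] = value
        else:
            rejected[sensor_id] = value  # likely a malfunctioning sensor
    return accepted, rejected


# Four sensors agree on roughly 5 degrees C; one reports 30 degrees C.
readings = {"s1": 5.1, "s2": 4.9, "s3": 5.0, "s4": 30.0, "s5": 5.2}
accepted, rejected = cross_validate(readings)
print(rejected)  # {'s4': 30.0}
```

The median is used instead of the mean so that a single extreme reading does not drag the reference value toward itself. A real deployment would likely compare each sensor only with its physical neighbours and tune the tolerance to the facility.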
Provably Honest Protocols: Another option is to integrate cryptographic protocols and special hardware that allow oracles supplying data inputs to include a corresponding “proof” that the data they are submitting is exactly the data they received from the data source. When the data and proof are received by the blockchain, they are checked against each other, and the data is thrown out if the proof is invalid.
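The "data plus proof" check can be illustrated with a deliberately simplified stand-in. The sketch below uses an HMAC over the payload, which assumes the data source and the verifier share a secret key. It is not how TLSNotary or TEE attestation work. Production systems would more likely rely on public-key signatures or hardware attestation. The key and the `attach_proof` and `accept_if_valid` helpers are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple

# Assumption: a secret shared between the data source and the verifier.
SHARED_KEY = b"demo-key-not-for-production"


def attach_proof(payload: bytes, key: bytes = SHARED_KEY) -> Tuple[bytes, str]:
    """Data-source side: return the payload together with an HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def accept_if_valid(payload: bytes, tag: str,
                    key: bytes = SHARED_KEY) -> Optional[bytes]:
    """Verifier side: keep the payload only if the tag matches, else discard."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload if hmac.compare_digest(expected, tag) else None


payload, tag = attach_proof(b'{"sensor": "s1", "temp_c": 5.1}')

# Data relayed unchanged by the oracle is accepted.
untampered = accept_if_valid(payload, tag)

# Data altered in transit no longer matches its proof and is thrown out.
tampered = accept_if_valid(b'{"sensor": "s1", "temp_c": 30.0}', tag)
```

Here `untampered` holds the original payload and `tampered` is `None`, mirroring the rule that data is discarded when its proof is invalid.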
Hypertext Transfer Protocol Secure (HTTPS): For requests made over the internet, one of the best options is TLSNotary, which modifies the internet HTTPS protocol to allow any computer to produce a proof that a particular web page appeared in its browser.
Trusted Execution Environment (TEE): For requests that primarily require some computation to be completed off-chain, one of the leading technologies that can produce a similar proof of correctness is the TEE, such as Intel Software Guard Extensions (SGX). Essentially, Intel chips that support this technology include a special component, completely isolated from the other components in the computer, called the Trusted Execution Environment. Other components in the computer can’t read the memory inside the TEE, nor can they see the inputs or outputs of the TEE’s computations, since all of that data is encrypted while in transit. The TEE can then be used to run highly sensitive code that computes over highly sensitive data, with a strong guarantee that the code ran correctly and without leakage of confidential information to any third party or even to the computer that the TEE resides on. TEEs excel at providing strong data integrity guarantees in highly adversarial environments.
As future solutions merge the real and digital worlds, the benefits of blockchain will naturally come up. When creating or integrating blockchain solutions into your process, be sure to account for data integrity. Blockchain and DLTs are fantastic at immutably storing data and showing when it has been changed, but they have no way of accounting for incorrect data unless that capability has been engineered into the solution.
The above tools can be used to help create a more secure and accurate system.
Ask Ascend Data Tech how we can help you create your own solution. We offer development and consulting, and would simply love to talk about blockchain or DLTs with you!
Also, be sure to follow us on Twitter @AscendDataTech and LinkedIn https://www.linkedin.com/company/ascend-data-tech
Thanks, and happy integrating!