News Release

DNA palette code for time-series archival data storage

Peer-Reviewed Publication

Science China Press

Encoding scheme of DNA Palette

Overview of the DNA Palette encoding scheme: (a) Raw data: Brain MRI data. (b) Illustration of the DNA Palette encoding process: The DNA Palette code establishes a bijection between binary sequences and sets of oligos.

Credit: ©Science China Press

This study was led by Dr. Yingjin Yuan (Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China) and Dr. Xiaoguang Tong (Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China). The research team focused on storing brain magnetic resonance imaging (MRI) data from clinical patients and proposed a coding scheme that enables efficient and reliable storage of digital information in DNA.

Brain MRI is highly valuable in clinical diagnosis, surgical planning, and treatment evaluation due to its non-invasive and high-precision nature. Throughout a patient's treatment journey, ongoing follow-ups and comparative analysis of historical images are crucial for detecting subtle yet significant changes in the patient's condition, thereby supporting personalized and precise medical interventions. For some patients, these datasets require stable storage for several decades or more. However, the large volume and extended retention periods of such medical data challenge current storage technologies. Similar time-series archival characteristics are observed in other scientific datasets, such as meteorological observations and planetary exploration data. These datasets, generated from continuous monitoring of specific subjects or regions, also require long-term, stable storage solutions.

DNA is considered a promising medium for addressing these data storage challenges. However, DNA-based data storage systems face several technical obstacles, including burst errors during DNA synthesis and sequencing, as well as the disordered spatial arrangement of the oligonucleotides (oligos). As a result, stored data are read back with multiple error types, in random order, and at high error rates, all of which directly reduce the accuracy and reliability of DNA-based data storage.

The research team proposed a coding scheme called DNA Palette. Tailored to the characteristics of time-series archival data, the DNA Palette code uses unordered combinations of index-free oligos to construct a bijective mapping between binary information and sets of oligos. Results from in vitro storage experiments using clinical brain MRI data, along with large-scale simulation tests conducted on public MRI datasets (over 30,000 files, 10 GB), planetary science datasets, and meteorological datasets, demonstrate that the DNA Palette code offers high net information density and broad applicability, and that it can recover data even at low sequencing coverage.
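
To make the idea of mapping binary data to an unordered, index-free set of oligos more concrete, the short Python sketch below uses the classical combinatorial number system, which gives a bijection between the integers 0 to C(n, k) - 1 and the k-element subsets of n candidates. It is an illustration only and is not the published DNA Palette algorithm: the palette of all 3-nucleotide strings, the subset size k = 4, and the function names are hypothetical choices made for this example. Because the decoder works on a set, the order in which the oligos are read back does not matter.

```python
from itertools import product
from math import comb


def rank_to_subset(rank, n, k):
    """Map an integer in [0, C(n, k)) to a sorted k-subset of range(n)."""
    if not 0 <= rank < comb(n, k):
        raise ValueError("rank out of range")
    subset = []
    for i in range(k, 0, -1):
        # Greedy step: pick the largest c with C(c, i) <= remaining rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        subset.append(c)
        rank -= comb(c, i)
    return sorted(subset)


def subset_to_rank(subset):
    """Inverse of rank_to_subset; the input order is irrelevant."""
    return sum(comb(c, i) for i, c in enumerate(sorted(subset), start=1))


def capacity_bits(n, k):
    """Number of whole bits that one k-subset of n candidates can carry."""
    return comb(n, k).bit_length() - 1


def encode(bits, palette, k):
    """Encode a fixed-length bit string as an unordered set of oligos."""
    if len(bits) != capacity_bits(len(palette), k):
        raise ValueError("bit string has the wrong length for this palette")
    indices = rank_to_subset(int(bits, 2), len(palette), k)
    return {palette[i] for i in indices}


def decode(oligos, palette, k):
    """Recover the bit string from the oligos, read in any order."""
    position = {oligo: i for i, oligo in enumerate(palette)}
    rank = subset_to_rank(position[o] for o in oligos)
    return format(rank, f"0{capacity_bits(len(palette), k)}b")


# Hypothetical toy palette: all 64 DNA strings of length 3 (assumption).
palette = ["".join(p) for p in product("ACGT", repeat=3)]
message = "1011001110001111010"  # 19 bits, the capacity for n = 64, k = 4
pool = encode(message, palette, 4)
recovered = decode(reversed(sorted(pool)), palette, 4)
```

In this toy setting, four index-free oligos drawn from 64 candidates carry 19 bits, and no per-oligo address is needed to restore the original order. A practical scheme would also have to handle synthesis and sequencing errors, which this sketch does not attempt.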

