Abstract
Recently, the demand for digital data storage has been growing rapidly, and conventional storage media cannot satisfy this huge demand. Fortunately, advances in biotechnology have made it possible in recent years to store digital data in deoxyribonucleic acid (DNA). Because of its attractive features (e.g., high storage density, long-term durability, and stability), DNA has been regarded as a potential alternative medium for storing massive amounts of digital data in the future. Nevertheless, reading and writing digital data in DNA requires a series of extremely time-consuming processes (i.e., DNA sequencing and DNA synthesis, respectively). Of these two, writing is the predominant cost of a DNA data storage system. Therefore, to enable efficient DNA storage, this article proposes an index management scheme that reduces the number of accesses to DNA storage. Additionally, this article introduces a new DNA data encoding format with VERA (Version Editing Recovery Approach) that reduces the total number of bits written when data are inserted and deleted. To the best of our knowledge, this is the first work to provide a complete data management solution for DNA storage. According to the experimental results, the proposed design with VERA can reduce the cost by 77% and improve the performance by 71% compared with append-only methods.
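The abstract argues that, because synthesis dominates cost, recording only the edit is cheaper than writing a whole new version. The sketch below is a toy model of that argument under stated assumptions. It is not VERA or the article's encoding format. It assumes a plain 2-bit-per-nucleotide mapping (A=00, C=01, G=10, T=11), takes writing cost to be proportional to the number of nucleotides synthesized, and ignores addressing, primers, error correction, and biochemical constraints. The names `insert_record`, `delete_record`, `apply_record`, and the 9-byte header layout are hypothetical.

```python
import struct

# Assumed 2-bit-per-nucleotide mapping; illustrative only.
BASES = "ACGT"
# Hypothetical edit-record header: opcode (1 byte), byte offset (4), length (4).
HEADER = struct.Struct(">BII")
OP_INSERT, OP_DELETE = 1, 2


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Invert encode(): every four nucleotides become one byte."""
    v = [BASES.index(base) for base in strand]
    return bytes((v[i] << 6) | (v[i + 1] << 4) | (v[i + 2] << 2) | v[i + 3]
                 for i in range(0, len(v), 4))


def insert_record(offset: int, payload: bytes) -> str:
    """Encode an 'insert payload at offset' edit instead of a full new version."""
    return encode(HEADER.pack(OP_INSERT, offset, len(payload)) + payload)


def delete_record(offset: int, length: int) -> str:
    """Encode a 'delete length bytes at offset' edit; it carries no payload."""
    return encode(HEADER.pack(OP_DELETE, offset, length))


def apply_record(old: bytes, record: str) -> bytes:
    """Recover the new version by replaying one edit record on the old version."""
    raw = decode(record)
    op, offset, length = HEADER.unpack(raw[:HEADER.size])
    if op == OP_INSERT:
        payload = raw[HEADER.size:HEADER.size + length]
        return old[:offset] + payload + old[offset:]
    if op == OP_DELETE:
        return old[:offset] + old[offset + length:]
    raise ValueError(f"unknown opcode {op}")


old = b"x" * 1000
new = old[:500] + b"hello" + old[500:]
full_copy = encode(new)              # append-only baseline: synthesize the whole new version
edit = insert_record(500, b"hello")  # edit-record approach: synthesize only the change
print(len(full_copy), len(edit))     # 4020 56
```

In this toy model the edit record grows with the size of the change rather than the size of the object. The price is that reading a version requires replaying its records, a trade-off the sketch does not model.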
How to Enable Index Scheme for Reducing the Writing Cost of DNA Storage on Insertion and Deletion