SWARAM: Portable Energy and Cost Efficient Embedded System for Genomic Processing

Published: 08 October 2019

Abstract

Treating patients with high-quality precision medicine requires a thorough understanding of each patient's genetic composition. Ideally, the unique variations in an individual's genome are identified in order to specify the necessary treatment. The variant calling workflow is a pipeline of tools that integrates state-of-the-art software for alignment, sorting, and variant calling on whole genome sequencing (WGS) data. This pipeline identifies the unique variations in an individual's genome relative to a reference genome. Currently, such a workflow is implemented on high-performance computers (with additional GPUs or FPGAs) or on cloud computers. Such systems are large and costly, and they rely on the internet for genome data transfer, which makes them unusable in remote locations without internet connectivity. Processing the data in a different facility also raises privacy concerns.

To overcome these limitations, we present in this paper, for the first time, a cost-efficient, offline, scalable, portable, and energy-efficient computing system named SWARAM for variant calling workflow processing. The system uses a novel architecture and algorithms that match reads against partial reference genomes, so that it can work within the smaller memory sizes typically available in tiny processing systems. Extensive tests on a standard benchmark dataset (the NA12878 Illumina Platinum Genome) confirm that the time SWARAM takes to transfer the data and complete the variant calling workflow is competitive with that of a 32-core Intel Xeon server at similar accuracy, while SWARAM costs less than a fifth as much and consumes less than 40% of the energy of the server system. The original scripts and code we developed for executing the variant calling workflow on SWARAM are available in the associated GitHub repository: https://github.com/Rammohanty/swaram.
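The abstract does not describe how matching against partial reference genomes works, so the following is only a toy sketch of the general idea, not SWARAM's algorithm. It assumes the reference is cut into overlapping chunks, that only one chunk is examined at a time, and that the best hit across chunks is kept. The function names (`split_reference`, `best_hit_in_chunk`, `align_read`) are hypothetical, and brute-force ungapped Hamming matching stands in for a real indexed aligner.

```python
from typing import List, Optional, Tuple

Hit = Tuple[int, int]  # (mismatch count, position in the full reference)


def split_reference(reference: str, chunk_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Cut the reference into overlapping chunks, returned as (offset, sequence).

    Assumption: overlap >= read_length - 1, so that every read-length window
    of the reference lies entirely inside at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(reference), step):
        chunks.append((start, reference[start:start + chunk_size]))
        if start + chunk_size >= len(reference):
            break  # this chunk already reaches the end of the reference
    return chunks


def best_hit_in_chunk(read: str, offset: int, chunk: str) -> Optional[Hit]:
    """Best ungapped placement of the read inside a single chunk."""
    best: Optional[Hit] = None
    for i in range(len(chunk) - len(read) + 1):
        window = chunk[i:i + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        candidate = (mismatches, offset + i)
        if best is None or candidate < best:
            best = candidate
    return best  # None if the chunk is shorter than the read


def align_read(read: str, chunks: List[Tuple[int, str]]) -> Optional[Hit]:
    """Visit one chunk at a time and keep the overall best hit."""
    best: Optional[Hit] = None
    for offset, chunk in chunks:
        hit = best_hit_in_chunk(read, offset, chunk)
        if hit is not None and (best is None or hit < best):
            best = hit
    return best


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGATCGATTACA"
    read = "GCTTAA"  # crosses the end of the first chunk
    chunks = split_reference(reference, chunk_size=12, overlap=len(read) - 1)
    print(align_read(read, chunks))  # (0, 10)
```

In this sketch the peak working set is one chunk rather than the whole reference, which is the memory trade-off the abstract alludes to; the cost is that every read is compared against every chunk.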
