Hadoop on Named Data Networking: Experience and Results

Published: 13 June 2017

Abstract

The Named Data Networking (NDN) architecture retrieves content by name rather than by connecting to specific hosts. It offers benefits such as highly efficient and resilient content distribution, which are well suited to data-intensive distributed computing. This paper presents our experience in modifying Apache Hadoop, a popular MapReduce framework, to operate on an NDN network. Through this first-of-its-kind implementation, we demonstrate the feasibility of running an existing, large, and complex piece of distributed software commonly seen in data centers over NDN. We show advantages, such as simplified network code and reduced network traffic, that are beneficial in a data center environment. NDN also faces challenges, which the community is addressing, that can be magnified under data center traffic. Through detailed evaluation, we show a 16% reduction in overall data transmission between Hadoop nodes when writing data with default replication settings. Preliminary results also show promise for in-network caching of repeated reads in distributed applications. Overall performance is currently slower under NDN, however, and we identify challenges and opportunities for further NDN improvements.
