Abstract
In this article, we present LWDLS, a lightweight data location service designed for Exascale storage systems (storage systems on the order of 10^18 bytes) and geo-distributed storage systems (large storage systems spread across physically separate locations). LWDLS provides a search-based data location solution and allows data to be placed, moved, and replicated freely. LWDLS introduces probe and prune protocols that reduce topology mismatch, and a heuristic flooding search algorithm (HFS) that achieves higher search efficiency than pure flooding search while offering comparable search speed and coverage. LWDLS is lightweight and scalable: it incurs low overhead, achieves high search efficiency, keeps no global state, and avoids periodic messages. LWDLS is fully distributed and can be used both in nondeterministic storage systems and in deterministic storage systems for cases where search is needed. Extensive simulations modeling large-scale High Performance Computing (HPC) storage environments provide representative performance outcomes. Performance is evaluated with metrics including search scope, search efficiency, and average neighbor distance. The results show that LWDLS locates data efficiently, with a low cost of state maintenance, in arbitrary network environments. Through these simulations, we demonstrate the effectiveness of the protocols and the search algorithm of LWDLS.
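For readers unfamiliar with the baseline that HFS is compared against, the following is a minimal sketch of TTL-limited pure flooding search over an overlay graph. It is not the LWDLS or HFS algorithm; the function name `flood_search`, the adjacency-list graph representation, and the efficiency ratio computed at the end are illustrative assumptions and may differ from the definitions used in the article.

```python
from collections import deque


def flood_search(graph, source, holders, ttl):
    """TTL-limited pure flooding (hypothetical helper, not from the article).

    graph:   dict mapping each node to a list of its overlay neighbors
    source:  node that issues the query
    holders: set of nodes that store the requested data
    ttl:     maximum number of hops a query may travel
    Returns (found, visited, messages).
    """
    visited = {source}
    found = {source} & holders
    messages = 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        node, parent, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this node does not forward
        for neighbor in graph[node]:
            if neighbor == parent:
                continue  # never send the query back to its sender
            messages += 1  # every forwarded copy costs one message
            if neighbor in visited:
                continue  # duplicate delivery: counted, then dropped
            visited.add(neighbor)
            if neighbor in holders:
                found.add(neighbor)
            frontier.append((neighbor, node, hops_left - 1))
    return found, visited, messages


# Small example overlay (assumed topology, for illustration only).
EXAMPLE_GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}

found, visited, messages = flood_search(EXAMPLE_GRAPH, 0, {4}, ttl=3)
scope = len(visited)                       # nodes reached by the query
efficiency = (len(visited) - 1) / messages  # assumed: new nodes per message
print(found, scope, messages, efficiency)
```

In this sketch, redundant deliveries to already-visited nodes are what drive the message count up without widening the search scope, which is the kind of waste a heuristic variant of flooding aims to reduce.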
A Lightweight Data Location Service for Nondeterministic Exascale Storage Systems