Abstract
We describe the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired resources as a topology of interconnected groups with required intragroup, intergroup, and per-node characteristics, along with the utility that the application derives from specified ranges of metric values. This design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics, and allows the system to rank acceptable configurations based on their quality for that application.
Rather than evaluating a single implementation of SWORD, we explore a variety of architectural designs that deliver the required functionality in a scalable and highly available manner. We discuss the trade-offs of using a centralized architecture as compared to a fully decentralized design to perform wide-area resource discovery. In summary, we find that a centralized architecture based on 4-node server cluster sites at network-peering facilities outperforms a decentralized DHT-based resource discovery infrastructure with respect to query latency for all but the smallest number of sites. Although a centralized architecture shows significant promise in stable environments, our decentralized implementation has acceptable performance and also benefits from the DHT's self-healing properties in more volatile environments. We evaluate the advantages and disadvantages of centralized and decentralized resource discovery architectures on 1000 hosts in emulation and on approximately 200 PlanetLab nodes spread across the Internet.
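To make the query model in the abstract more concrete, the following Python sketch shows one way a request for a single group could be expressed and ranked. It is an illustration only, not SWORD's query language or matching algorithm: the names `UtilityRange`, `node_utility`, and `rank_groups`, the metric names `free_cpu` and `free_mem_mb`, and all sample values are assumptions introduced here. The sketch covers per-node requirements and one intragroup constraint (pairwise latency); intergroup constraints are omitted, and candidate groups are enumerated by brute force, which would not scale to a real deployment.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class UtilityRange:
    """Utility the application derives when a metric lies in [low, high]."""
    low: float
    high: float
    utility: float


def node_utility(metrics, requirements):
    """Sum per-metric utilities; return None if any metric is unacceptable."""
    total = 0.0
    for name, ranges in requirements.items():
        value = metrics.get(name)
        # The first matching range wins, so list preferred ranges first.
        match = next((r for r in ranges
                      if value is not None and r.low <= value <= r.high), None)
        if match is None:
            return None
        total += match.utility
    return total


def rank_groups(nodes, latency_ms, requirements, group_size, max_latency_ms):
    """Return acceptable groups as (utility, members), best first."""
    scored = {}
    for name, metrics in nodes.items():
        utility = node_utility(metrics, requirements)
        if utility is not None:
            scored[name] = utility
    ranked = []
    for group in combinations(sorted(scored), group_size):
        # Intragroup constraint: every pair must be within the latency bound.
        if all(latency_ms[frozenset(pair)] <= max_latency_ms
               for pair in combinations(group, 2)):
            ranked.append((sum(scored[n] for n in group), group))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Hypothetical sample data (not taken from the paper).
nodes = {
    "a": {"free_cpu": 0.9, "free_mem_mb": 512},
    "b": {"free_cpu": 0.5, "free_mem_mb": 256},
    "c": {"free_cpu": 0.2, "free_mem_mb": 1024},   # too little CPU
    "d": {"free_cpu": 0.8, "free_mem_mb": 64},     # too little memory
    "e": {"free_cpu": 0.75, "free_mem_mb": 2048},
}
requirements = {
    "free_cpu": [UtilityRange(0.7, 1.0, 1.0), UtilityRange(0.3, 0.7, 0.5)],
    "free_mem_mb": [UtilityRange(1024, float("inf"), 1.0),
                    UtilityRange(128, 1024, 0.5)],
}
latency_ms = {
    frozenset(("a", "b")): 20,
    frozenset(("a", "e")): 150,
    frozenset(("b", "e")): 30,
}

ranked = rank_groups(nodes, latency_ms, requirements,
                     group_size=2, max_latency_ms=50)
print(ranked)
```

With this sample data, nodes `c` and `d` fail a per-node requirement, the pair (`a`, `e`) violates the latency bound, and the group (`b`, `e`) ranks ahead of (`a`, `b`) because its members fall into higher-utility ranges.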