Abstract
Continuous processing of a streaming graph maintains an approximate result of the iterative computation on a recent version of the graph. Upon a user query, the accurate result on the current graph can be quickly computed by feeding the approximate result to the iterative computation, a form of incremental computation that corrects the (small amount of) error in the approximate result. Despite the effectiveness of this approach in processing growing graphs, it is generally not applicable when edge deletions are present: existing approximations can lead to either incorrect results (e.g., monotonic computations terminate at an incorrect minimum/maximum) or poor performance (e.g., with approximations, convergence takes longer than performing the computation from scratch).
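The failure mode for monotonic computations can be illustrated with a small sketch. It assumes single-source shortest paths (SSSP) with positive edge weights as the monotonic algorithm; the graph, the weights, and the function names `relax_to_fixpoint` and `from_scratch` are hypothetical and are not taken from the paper. Reusing the old values as the starting point works when an edge is added, but after a deletion the computation stays stuck at a value that is too small.

```python
import math

def relax_to_fixpoint(edges, dist):
    """Monotonic SSSP relaxation: a vertex value can only decrease."""
    dist = dict(dist)
    changed = True
    while changed:
        changed = False
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist

def from_scratch(edges, vertices, source):
    """Start from the initial values (infinity everywhere except the source)."""
    init = {v: math.inf for v in vertices}
    init[source] = 0
    return relax_to_fixpoint(edges, init)

# Hypothetical three-vertex graph: edge (u, v) -> weight.
vertices = ["s", "a", "b"]
edges = {("s", "a"): 1, ("a", "b"): 1, ("s", "b"): 5}
approx = from_scratch(edges, vertices, "s")  # values kept as the approximation

# Edge addition (a shorter s -> b edge): resuming from approx is correct.
grown = {**edges, ("s", "b"): 1}
after_add = relax_to_fixpoint(grown, approx)

# Edge deletion (a -> b removed): resuming from approx leaves b at its old,
# now unreachable, minimum because relaxation never raises a value.
shrunk = {e: w for e, w in edges.items() if e != ("a", "b")}
after_del = relax_to_fixpoint(shrunk, approx)
exact = from_scratch(shrunk, vertices, "s")
```

Here `after_add` matches a from-scratch run on the grown graph, while `after_del["b"]` remains 2 even though the only remaining path to `b` has length 5.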
This paper presents KickStarter, a runtime technique that can trim the approximate values for a subset of vertices impacted by the deleted edges. The trimmed approximation is both safe and profitable, enabling the computation to produce correct results and converge quickly. KickStarter works for a class of monotonic graph algorithms and can be readily incorporated in any existing streaming graph system. Our experiments with four streaming algorithms on five large graphs demonstrate that trimming not only produces correct results but also accelerates these algorithms by 8.5x to 23.7x.
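The idea of trimming values for impacted vertices can be sketched as follows. This is a simplified illustration and not KickStarter's actual algorithm: it assumes SSSP with positive edge weights, records for each vertex the in-neighbor that supplied its current value (a hypothetical `parent` map), and conservatively resets every vertex whose value transitively depended on the deleted edge back to its initial value before resuming the computation. The abstract does not describe how trimmed values are chosen, so the reset to infinity is an assumption made only for this sketch.

```python
import math

def relax(edges, dist, parent):
    """Monotonic SSSP relaxation that also records which neighbor gave each value."""
    changed = True
    while changed:
        changed = False
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
                changed = True

def sssp(edges, vertices, source):
    dist = {v: math.inf for v in vertices}
    parent = {v: None for v in vertices}
    dist[source] = 0
    relax(edges, dist, parent)
    return dist, parent

def delete_edge_with_trimming(edges, dist, parent, edge):
    """Delete `edge`, reset only the values that depended on it, then resume."""
    u, v = edge
    edges = {e: w for e, w in edges.items() if e != edge}
    dist, parent = dict(dist), dict(parent)
    if parent[v] == u:
        # Collect v and every vertex whose value was derived through v.
        impacted = {v}
        grew = True
        while grew:
            grew = False
            for x, p in parent.items():
                if p in impacted and x not in impacted:
                    impacted.add(x)
                    grew = True
        for x in impacted:
            dist[x] = math.inf  # conservative reset (assumption of this sketch)
            parent[x] = None
    relax(edges, dist, parent)  # unaffected vertices keep their old values
    return edges, dist, parent

# Hypothetical four-vertex graph: edge (u, v) -> weight.
vertices = ["s", "a", "b", "c"]
edges = {("s", "a"): 1, ("a", "b"): 1, ("s", "b"): 5, ("b", "c"): 1}
dist, parent = sssp(edges, vertices, "s")

new_edges, trimmed, _ = delete_edge_with_trimming(edges, dist, parent, ("a", "b"))
exact, _ = sssp(new_edges, vertices, "s")
```

In this example only `b` and `c` are reset; `s` and `a` keep their earlier values, and the resumed computation agrees with a from-scratch run on the graph without the deleted edge.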
KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems