Abstract
Time-evolving web and social network graphs are modeled as a set of pages or individuals (nodes) and the arcs (links or relationships) between them, which change over time. As these networks have grown in popularity, their graphs have become massive in their number of nodes, arcs, and lifetimes. However, these graphs are extremely sparse throughout their lifetimes. For example, Facebook is estimated to have over a billion nodes, yet at any point in time it has far fewer than 0.001% of all possible relationships. Stored with underlying representations such as a series of adjacency matrices or adjacency lists, these large sparse graphs may not fit in most main memories.
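The scale of the problem can be made concrete with a rough calculation. The sketch below is only illustrative: it assumes exactly one billion nodes, one bit per adjacency-matrix cell, a directed graph in which all n² ordered pairs count as possible arcs, and an arc list that stores two 4-byte node identifiers per arc. None of these figures come from the experiments.

```python
# Back-of-the-envelope storage estimate for ONE snapshot of the graph.
# All constants are illustrative assumptions, not measured values.
NUM_NODES = 10**9                       # "over a billion" nodes, rounded down

possible_arcs = NUM_NODES * NUM_NODES   # directed pairs, self-loops included
matrix_bytes = possible_arcs // 8       # uncompressed matrix, one bit per cell

# 0.001% equals 1/100000, so this is an upper bound on the arcs present.
max_arcs = possible_arcs // 100_000
arc_list_bytes = max_arcs * 2 * 4       # two 32-bit node ids per arc

print(f"adjacency matrix: {matrix_bytes // 10**15} PB")
print(f"arc list (upper bound): {arc_list_bytes // 10**12} TB")
```

Under these assumptions a single uncompressed adjacency matrix needs about 125 PB, and even a plain arc list at the 0.001% bound needs up to 80 TB, before multiplying by the number of frames in the graph's lifetime.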
We propose a compressed data structure that keeps a compressed binary tree for each row of each adjacency matrix of the time-evolving graph. We never explicitly construct the adjacency matrices; our algorithms build the structure directly from the time-evolving arc list. The structure supports directed and undirected graphs, faster arc and neighborhood queries, and streaming operations: arcs and frames can be added and removed directly in the compressed structure. We use publicly available network datasets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our technique performs as well as or better than our benchmarks on all datasets in terms of compressed size and other vital metrics.
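The idea of one binary tree per matrix row can be illustrated with a small, uncompressed sketch. This is not the proposed structure: the class names `RowTree` and `TemporalGraph` are hypothetical, the trees are held as plain Python lists rather than in a compressed encoding, and the only point is to show how a row tree can be built from an arc list of `(frame, source, target)` triples without materializing any adjacency matrix, and how arcs and frames can be added or removed in place.

```python
class RowTree:
    """Sparse binary tree over the columns [0, 2**height) of one matrix row.

    A subtree exists only if its column range holds at least one arc.
    Internal nodes are [left, right] lists; a present leaf is True.
    """

    def __init__(self, height):
        self.height = height
        self.root = None

    def add(self, col):
        if self.root is None:
            self.root = [None, None]
        node = self.root
        for level in range(self.height - 1, 0, -1):
            bit = (col >> level) & 1
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]
        node[col & 1] = True

    def remove(self, col):
        self.root = self._remove(self.root, col, self.height - 1)

    def _remove(self, node, col, level):
        if node is None:
            return None
        bit = (col >> level) & 1
        if level == 0:
            node[bit] = None
        else:
            node[bit] = self._remove(node[bit], col, level - 1)
        # Prune subtrees that no longer contain any arc.
        return node if (node[0] is not None or node[1] is not None) else None

    def contains(self, col):
        node = self.root
        for level in range(self.height - 1, -1, -1):
            if node is None:
                return False
            node = node[(col >> level) & 1]
        return node is True

    def neighbors(self):
        def walk(node, prefix, level):
            if node is None:
                return
            for bit in (0, 1):
                if node[bit] is None:
                    continue
                col = (prefix << 1) | bit
                if level == 0:
                    yield col
                else:
                    yield from walk(node[bit], col, level - 1)
        return list(walk(self.root, 0, self.height - 1))


class TemporalGraph:
    """One RowTree per (frame, source node), built straight from an arc list."""

    def __init__(self, num_nodes):
        self.height = max(1, (num_nodes - 1).bit_length())
        self.rows = {}  # (frame, source) -> RowTree

    @classmethod
    def from_arcs(cls, num_nodes, arcs):
        graph = cls(num_nodes)
        for frame, u, v in arcs:
            graph.add_arc(frame, u, v)
        return graph

    def add_arc(self, frame, u, v):
        row = self.rows.get((frame, u))
        if row is None:
            row = self.rows[(frame, u)] = RowTree(self.height)
        row.add(v)

    def remove_arc(self, frame, u, v):
        row = self.rows.get((frame, u))
        if row is not None:
            row.remove(v)
            if row.root is None:
                del self.rows[(frame, u)]

    def remove_frame(self, frame):
        for key in [k for k in self.rows if k[0] == frame]:
            del self.rows[key]

    def has_arc(self, frame, u, v):
        row = self.rows.get((frame, u))
        return row is not None and row.contains(v)

    def neighbors(self, frame, u):
        row = self.rows.get((frame, u))
        return row.neighbors() if row is not None else []


graph = TemporalGraph.from_arcs(8, [(0, 1, 2), (0, 1, 5), (1, 1, 2)])
print(graph.has_arc(0, 1, 5), graph.neighbors(0, 1))
```

In this sketch an undirected graph could be handled by inserting both `(u, v)` and `(v, u)`; that is an assumption of the illustration, and the compressed structure may treat undirected arcs differently.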
Index Terms
Queryable Compression on Time-evolving Web and Social Networks with Streaming