Abstract
Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks.
We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range-sums. We describe algorithms for these functions that are efficient both in theory and practice.
As examples of the use of the interface and the performance of PAM we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially the code achieves performance that matches or exceeds existing libraries designed specially for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading.
Supplemental Material
Available for Download
PAM (Parallel Augmented Maps) is a parallel C++ library implementing the interface for augmented maps [1]. It is designed for maintaining an ordered map data structure while efficiently answering range-based and related queries. In the experiments we use the interface in four examples: augmented-sums, interval-queries, 2d range-queries, and an inverted index. The released code includes the examples and scripts for running the specific experiments reported in the paper. It is also designed so it is easy to try in many other scenarios (different sizes, different numbers of cores, and other operations described in the paper, but not reported in the experiments).
- Georgy Adelson-Velsky and E. M. Landis. 1962. An Algorithm for the Organization of Information. Proc. of the USSR Academy of Sciences 145 (1962), 263--266. In Russian, English translation by Myron J. Ricci in Soviet Doklady, 3:1259--1263, 1962.Google Scholar
- Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, and Abhinandan Nath. 2016. Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures. In Proc. ACM SIGMOD-SIGACT-SIGAI Symp. on Principles of Database Systems (PODS). 429--440. Google Scholar
Digital Library
- Yaroslav Akhremtsev and Peter Sanders. 2015. Fast Parallel Operations on Search Trees. arXiv preprint arXiv:1510.05433 (2015).Google Scholar
- Ahmed M. Aly, Hazem Elmeleegy, Yan Qi, and Walid Aref. 2016. Kangaroo: Workload-Aware Processing of Range Data and Range Queries in Hadoop. In Proc. ACM International Conference on Web Search and Data Mining (WSDM). ACM, New York, NY, USA, 397--406. Google Scholar
Digital Library
- Flurry analytics. -. (-). https://developer.yahoo.com/flurry/docs/analytics/Google Scholar
- Antonio Barbuzzi, Pietro Michiardi, Ernst Biersack, and Gennaro Boggia. 2010. Parallel Bulk Insertion for Large-scale Analytics Applications. In Proc. International Workshop on Large Scale Distributed Systems and Middleware (LADIS). Google Scholar
Digital Library
- Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Guy Golan-Gueta, Eshcar Hillel, Idit Keidar, and Moshe Sulamy. 2017. KiWi: A Key-Value Map for Scalable Real-Time Analytics. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). 357--369. Google Scholar
Digital Library
- Rudolf Bayer. 1972. Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms. Acta Informatica 1 (1972), 290--306. Google Scholar
Digital Library
- Jon Louis Bentley. 1979. Decomposable searching problems. Information processing letters 8, 5 (1979), 244--251.Google Scholar
- Juan Besa and Yadran Eterovic. 2013. A concurrent red-black tree. J. Parallel Distrib. Comput. 73, 4 (2013), 434--449. Google Scholar
Digital Library
- Guy E Blelloch, Daniel Ferizovic, and Yihan Sun. 2016. Just Join for Parallel Ordered Sets. In Proc. of the ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 253--264. Google Scholar
Digital Library
- Guy E. Blelloch and Margaret Reid-Miller. 1998. Fast Set Operations Using Treaps. In Proc. ACM Symp. on Parallel Algorithms and Architectures (SPAA). 16--26. Google Scholar
Digital Library
- Nathan Grasso Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun. 2010. A practical concurrent binary search tree. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 257--268. Google Scholar
Digital Library
- Trevor Brown and Hillel Avni. 2012. Range Queries in Non-blocking k-ary Search Trees. Springer Berlin Heidelberg, Berlin, Heidelberg, 31--45.Google Scholar
- Trevor Brown, Faith Ellen, and Eric Ruppert. 2014. A general technique for non-blocking trees. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 329--342. Google Scholar
Digital Library
- Chee Yong Chan and Yannis E. Ioannidis. 1999. Hierarchical Prefix Cubes for Range-Sum Queries. In Proc. International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 675--686. Google Scholar
Digital Library
- Rosetta Code.-. Inverted index. (-). https://rosettacode.org/wiki/Inverted_indexGoogle Scholar
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 2001. Introduction to Algorithms (second edition). MIT Press and McGraw-Hill. Google Scholar
Digital Library
- Costin S. 2012. C# based interval tree. https://code.google.com/archive/p/intervaltree. (2012).Google Scholar
- Bolin Ding and Arnd Christian König. 2011. Fast set intersection in memory. Proc. of the VLDB Endowment 4, 4 (2011), 255--266. Google Scholar
Digital Library
- Dana Drachsler, Martin T. Vechev, and Eran Yahav. 2014. Practical concurrent binary search trees via logical ordering. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 343--356. Google Scholar
Digital Library
- James R Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. 1986. Making data structures persistent. In Proc. ACM Symp. on Theory of computing (STOC). 109--121. Google Scholar
Digital Library
- Herbert Edelsbrunner. 1980. Dynamic Rectangle Intersrction Searching. Technical Report Institute for Technical Processing Report 47. Technical University of Graz, Austria.Google Scholar
- Stephan Erb, Moritz Kobitzsch, and Peter Sanders. 2014. Parallel bi-objective shortest paths using weight-balanced b-trees with bulk updates. In Experimental Algorithms. Springer, 111--122. Google Scholar
Digital Library
- Preparata Franco and Michael Ian Preparata Shamos. 1985. Computational geometry: an introduction. Springer Science & Business Media. Google Scholar
Digital Library
- Leonor Frias and Johannes Singler. 2007. Parallelization of Bulk Operations for STL Dictionaries. In Euro-Par 2007 Workshops: Parallel Processing, HPPC 2007, UNICORE Summit 2007, and VHPC 2007. 49--58. Google Scholar
Digital Library
- Hong Gao and Jian-Zhong Li. 2005. Parallel Data Cube Storage Structure for Range Sum Queries and Dynamic Updates. J. Comput. Sci. Technol. 20, 3 (May 2005), 345--356. Google Scholar
Digital Library
- Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1, 1 (01 Mar 1997), 29--53. Google Scholar
Digital Library
- Antonin Guttman. 1984. R-trees: A Dynamic Index Structure for Spatial Searching. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, USA, 47--57. Google Scholar
Digital Library
- Chaim-Leib Halbet and Konstantin Tretyakov. 2015. Python based interval tree. https://github.com/chaimleib/intervaltree. (2015).Google Scholar
- Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, and Ramakrishnan Srikant. 1997. Range Queries in OLAP Data Cubes. In Proc. ACM International Conference on Management of Data (SIGMOD). ACM, New York, NY, USA, 73--88. Google Scholar
Digital Library
- Paul Hudak. 1986. A Semantic Model of Reference Counting and Its Abstraction (Detailed Summary). In Proc. of the 1986 ACM Conference on LISP and Functional Programming (LFP '86). 351--363. Google Scholar
Digital Library
- Hiroshi Inoue, Moriyoshi Ohara, and Kenjiro Taura. 2014. Faster set intersection with SIMD instructions by reducing branch mispredictions. Proc. of the VLDB Endowment 8, 3 (2014), 293--304. Google Scholar
Digital Library
- Jinwoong Kim, Sul-Gi Kim, and Beomseok Nam. 2013. Parallel multi-dimensional range query processing with R-trees on GPU. J. Parallel and Distrib. Comput. 73, 8 (2013), 1195 -- 1207. Google Scholar
Digital Library
- Hans-Peter Kriegel, Marco Pötke, and Thomas Seidl. 2000. Managing Intervals Efficiently in Object-Relational Databases. In Proc. International Conference on Very Large Data Bases (VLDB). 407--418. Google Scholar
Digital Library
- H. T. Kung and Philip L. Lehman. 1980. Concurrent Manipulation of Binary Search Trees. ACM Trans. Database Syst. 5, 3 (1980), 354--382. Google Scholar
Digital Library
- Kim S. Larsen. 2000. AVL Trees with Relaxed Balance. J. Comput. Syst. Sci. 61, 3 (2000), 508--522. Google Scholar
Digital Library
- Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 38--49. Google Scholar
Digital Library
- Charles E. Leiserson. 2010. The Cilk++ concurrency platform. The Journal of Supercomputing 51, 3 (2010), 244--257. Google Scholar
Digital Library
- Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for New Hardware Platforms. In Proc. IEEE International Conference on Data Engineering (ICDE). 302--313. Google Scholar
Digital Library
- LevelDB. -. (-). leveldb.orgGoogle Scholar
- Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache craftiness for fast multicore key-value storage. In Proc. of the 7th ACM European Conference on Computer Systems. ACM, 183--196. Google Scholar
Digital Library
- Mark De Berg Mark, Mark Van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. 2000. Computational Geometry. Springer.Google Scholar
- Edward M. McCreight. 1985. Priority Search Trees. SIAM J. Comput. 14, 2 (1985), 257--276.Google Scholar
Digital Library
- David R Musser, Gillmer J Derge, and Atul Saini. 2009. STL tutorial and reference guide: C+ + programming with the standard template library. Addison-Wesley Professional. Google Scholar
Digital Library
- Aravind Natarajan and Neeraj Mittal. 2014. Fast concurrent lock-free binary search trees. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 317--328. Google Scholar
Digital Library
- Gabriele Neyer. 2017. dD Range and Segment Trees. In CGAL User and Reference Manual (4.10 ed.). CGAL Editorial Board. http://doc.cgal.org/4.10/Manual/packages.html#PkgRangeSegmentTreesDSummaryGoogle Scholar
- Jürg Nievergelt and Edward M. Reingold. 1973. Binary Search Trees of Bounded Balance. SIAM J. Comput. 2, 1 (1973), 33--43.Google Scholar
Digital Library
- Chris Okasaki. 1999. Purely functional data structures. Cambridge University Press. Google Scholar
Digital Library
- Oracle. -. Oracle NoSQL. (-). https://www.oracle.com/database/nosql/index.htmlGoogle Scholar
- Mark H Overmars. 1996. Designing the computational geometry algorithms library CGAL. In Applied computational geometry towards geometric engineering. Springer, 53--58. Google Scholar
Digital Library
- Heejin Park and Kunsoo Park. 2001. Parallel algorithms for red-black trees. Theoretical Computer Science 262, 1 (2001), 415--435. Google Scholar
Digital Library
- Wolfgang J. Paul, Uzi Vishkin, and Hubert Wagener. 1983. Parallel Dictionaries in 2-3 Trees. In Proc. International Colloq. on Automata, Languages and Programming (ICALP). 597--609. Google Scholar
Digital Library
- Chuck Pheatt. 2008. Intel® threading building blocks. Journal of Computing Sciences in Colleges 23, 4 (2008), 298--298. Google Scholar
Digital Library
- Aleksandar Prokopec, Nathan Grasso Bronson, Phil Bagwell, and Martin Odersky. 2012. Concurrent Tries with Efficient Non-blocking Snapshots. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). Google Scholar
Digital Library
- Anand Rajaraman and Jeffrey David Ullman. 2011. Mining of Massive Datasets:. Cambridge University Press. Google Scholar
Digital Library
- RocksDB. -. (-). rockdb.orgGoogle Scholar
- HananSamet. 1990. The design and analysis of spatial data structures. Vol. 199. Addison-Wesley Reading, MA. Google Scholar
Digital Library
- Raimund Seidel and Celcilia R. Aragon. 1996. Randomized Search Trees. Algorithmica 16 (1996), 464--497.Google Scholar
Cross Ref
- Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. 2013. F1: A Distributed SQL Database That Scales. Proc. VLDB Endow. 6, 11 (Aug. 2013), 1068--1079. Google Scholar
Digital Library
- Johannes Singler, Peter Sanders, and Felix Putze. 2007. MCSTL: The Multi-core Standard Template Library. In Proc. European Conf. on Parallel Processing (Euro-Par). 682--694. Google Scholar
Digital Library
- TBB -. Intel® Threading Building Blocks 2.0 for Open Source. (-). https://www.threadingbuildingblocks.org/.Google Scholar
- Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G. Andersen. {n. d.}. Building A Bw-Tree Takes More Than Just Buzz Words {Experiments and Analyses} (Unpublished). ({n. d.}).Google Scholar
- Wikimedia Foundation. 2016. Wikepedia:Database download. https://en.wikipedia.org/wiki/Wikipedia:Database_download. (2016).Google Scholar
- Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In Proc. of the 2016 International Conference on Management of Data. ACM, 1567--1581. Google Scholar
Digital Library
- Justin Zobel and Alistair Moffat. 2006. Inverted Files for Text Search Engines. ACM Comput. Surv. 38, 2, Article 6 (July 2006). Google Scholar
Digital Library
Index Terms
PAM: parallel augmented maps
Recommendations
PAM: parallel augmented maps
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingOrdered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical ...
A system-theory approach to decompose CPM signals into PAM waveforms
In 1986, Laurent showed that binary continuous phase modulated (CPM) signals can be decomposed into pulse amplitude modulated (PAM) waveforms. This result, later extended to multilevel CPM signals by Mengali and Morelli, has attracted much attention, ...
Decomposition of M-ary CPM signals into PAM waveforms
It is widely known that minimum shift keying (MSK) may be seen as a PAM signaling scheme and that the same is true, albeit approximately, with MSK-like modulations. It is also known (perhaps not so widely) that any binary continuous phase-modulated (CPM)...







Comments