Abstract
We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called Mold. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated Mold on several real-world kernels and found that in most cases Mold generated the desired MapReduce program, even for codes with complex indirect updates.
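The pipeline the abstract describes (imperative loop, then a fold, then a group-by based MapReduce form) can be illustrated with a small sketch. This is not Mold's output or its rewrite rules. It is a hand-written Python analogue of a loop with indirect updates (word counting), and the names `count_imperative`, `count_fold`, and `count_mapreduce` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from functools import reduce


def count_imperative(words):
    """Sequential loop with an indirect update: counts[w] depends on earlier iterations."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


def count_fold(words):
    """The same loop expressed as a fold: the loop body becomes a pure step function."""
    def step(acc, w):
        # Build a new accumulator instead of mutating the old one.
        return {**acc, w: acc.get(w, 0) + 1}
    return reduce(step, words, {})


def count_mapreduce(words):
    """Map each word to (key, 1), group by key, then reduce each group independently."""
    pairs = map(lambda w: (w, 1), words)
    groups = defaultdict(list)
    for key, value in pairs:          # group-by stage (a shuffle in a real runtime)
        groups[key].append(value)
    # Each key's reduction is independent of the others, so groups could run in parallel.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


words = ["a", "b", "a", "c", "b", "a"]
print(count_imperative(words))
print(count_fold(words))
print(count_mapreduce(words))
```

The point of the last form is that the loop-carried dependency on `counts[w]` is confined to a single key, so grouping by key exposes parallelism across keys. In a runtime such as Spark the grouping and per-key reduction would be performed by the framework rather than by the explicit loop used here.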
Published as "Translating imperative code to MapReduce" in OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications.