ABSTRACT
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given.
We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm.
Discriminators generalize binary comparison functions: they operate on n arguments at a time, but do not expose more information than the underlying equivalence or ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type) should expose their equality or standard ordering relation, respectively, as discriminators. Having only a binary equality test on a type requires Θ(n²) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time; a discriminator accomplishes this in linear time. Likewise, having only a (constant-time) comparison function requires Θ(n log n) time to sort a list of n elements; a discriminator can do this in linear time.
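The idea of a discriminator can be illustrated with a minimal sketch. It is written in Python for brevity and is not the paper's Haskell/GADT formulation. The names `disc_nat`, `disc_pair`, and `sort_with` are hypothetical, and the base case assumes integer keys in a small fixed range. A discriminator here takes key-value pairs and returns the values grouped by key, in ascending key order, without ever comparing two keys pairwise.

```python
from typing import Callable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# Illustrative type: a discriminator maps key-value pairs to groups of values.
Disc = Callable[[List[Tuple[K, V]]], List[List[V]]]


def disc_nat(bound: int) -> Disc:
    """Discriminator for integer keys in range(bound) (an assumed base case).

    One bucketing pass: O(n + bound) time, no pairwise key comparisons.
    """
    def disc(pairs):
        buckets = [[] for _ in range(bound)]
        for key, value in pairs:
            buckets[key].append(value)       # stable: input order is preserved
        return [b for b in buckets if b]     # non-empty groups, ascending keys
    return disc


def disc_pair(d1: Disc, d2: Disc) -> Disc:
    """Combine two discriminators into one for pair keys, ordered lexicographically."""
    def disc(pairs):
        # Discriminate on the first component, carrying (k2, value) along,
        # then refine each resulting group on the second component.
        groups = d1([(k1, (k2, v)) for (k1, k2), v in pairs])
        return [block for group in groups for block in d2(group)]
    return disc


def sort_with(disc: Disc, xs: list) -> list:
    """Sort by using each element as both key and value, then flattening."""
    return [x for group in disc([(x, x) for x in xs]) for x in group]


# All values sharing a key are found in a single pass.
print(disc_nat(4)([(3, "c"), (1, "a"), (3, "d"), (1, "b")]))
# [['a', 'b'], ['c', 'd']]

print(sort_with(disc_nat(10), [5, 2, 9, 2]))
# [2, 2, 5, 9]

by_pair = disc_pair(disc_nat(3), disc_nat(3))
print(by_pair([((2, 1), "x"), ((0, 2), "y"), ((2, 0), "z"), ((0, 2), "w")]))
# [['y', 'w'], ['z'], ['x']]
```

This naive version allocates `bound` buckets on every call, so it is linear only when the key range is treated as a constant. The generic construction described in the abstract is more careful about this.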
Generic discrimination: sorting and paritioning unshared data in linear time. ICFP '08.