DOI: 10.1145/1411204.1411220 · ICFP conference proceedings · research article

Generic discrimination: sorting and partitioning unshared data in linear time

Published: 20 September 2008

ABSTRACT

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given.
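To make the idea of building discriminators by induction on an order description concrete, here is a minimal Python sketch. It is not the paper's Haskell formulation: the names `disc_nat`, `disc_map`, and `disc_pair` are hypothetical, and only three order constructions (bounded naturals, mapping through a function, lexicographic pairs) are covered. The naive bucket step costs O(n + bound) per call, so this sketch does not by itself achieve the worst-case linear bound claimed in the paper when it is applied to many small groups.

```python
from typing import Any, Callable, List, Tuple

# A discriminator maps key-value pairs to groups of values: values whose keys
# are equivalent share a group, and groups appear in ascending key order.
Disc = Callable[[List[Tuple[Any, Any]]], List[List[Any]]]


def disc_nat(bound: int) -> Disc:
    """Bucket discriminator for integer keys in range(bound), standard order."""
    def disc(pairs):
        buckets = [[] for _ in range(bound)]
        for key, value in pairs:
            buckets[key].append(value)  # one pass, stable within a bucket
        return [b for b in buckets if b]  # drop empty buckets, keep key order
    return disc


def disc_map(f: Callable[[Any], Any], inner: Disc) -> Disc:
    """Order keys by their image f(key) under the inner order."""
    return lambda pairs: inner([(f(k), v) for k, v in pairs])


def disc_pair(first: Disc, second: Disc) -> Disc:
    """Lexicographic order on pairs: first component, then second."""
    def disc(pairs):
        # Discriminate on the first component, carrying the rest along.
        groups = first([(k1, (k2, v)) for (k1, k2), v in pairs])
        # Refine each resulting group on the second component.
        return [g for group in groups for g in second(group)]
    return disc


# Partitioning: values grouped by equivalent keys, groups in key order.
records = [((2, 1), "c"), ((1, 3), "a"), ((2, 1), "d"), ((1, 5), "b")]
groups = disc_pair(disc_nat(4), disc_nat(8))(records)

# Sorting: use each element as its own key and concatenate the groups.
xs = [3, 1, 2, 1]
sorted_xs = [v for g in disc_nat(4)([(x, x) for x in xs]) for v in g]

# Derived order: discriminate strings by their length.
by_length = disc_map(len, disc_nat(5))([("bb", "x"), ("a", "y"), ("cc", "z")])
```

In this sketch, sorting and partitioning are both read off the same result: the groups themselves give the partition, and their concatenation gives the sorted sequence.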

We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm.

Discriminators generalize binary comparison functions: they operate on n arguments at a time, but expose no more information than the underlying equivalence or ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type) should expose their equality or standard ordering relation, respectively, as discriminators. Having only a binary equality test on a type requires Θ(n²) time to find all occurrences of each element in a list of length n, even if the equality test takes only constant time; a discriminator accomplishes this in linear time. Likewise, having only a (constant-time) comparison function requires Θ(n log n) time to sort a list of n elements; a discriminator can do this in linear time.
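The gap between a binary equality test and a discriminator can be illustrated with a small Python sketch. The functions `partition_with_eq` and `partition_with_disc` are hypothetical names, and small integers stand in for keys of a primitive type; the bucket-based discriminator here is an assumption chosen for simplicity, not the paper's implementation.

```python
def partition_with_eq(items, eq):
    """Group equal items using only a binary equality test.

    Each item is compared against one representative per existing group,
    so n pairwise-distinct items cost n * (n - 1) / 2 calls to eq.
    """
    groups = []
    for item in items:
        for group in groups:
            if eq(group[0], item):
                group.append(item)
                break
        else:
            groups.append([item])  # no existing group matched
    return groups


def partition_with_disc(items, bound):
    """Group equal items with a bucket discriminator for ints in range(bound).

    One pass over the items plus one pass over the bucket table.
    """
    buckets = [[] for _ in range(bound)]
    for item in items:
        buckets[item].append(item)
    return [b for b in buckets if b]


counter = {"calls": 0}

def counting_eq(a, b):
    """Equality test that records how often it is invoked."""
    counter["calls"] += 1
    return a == b


items = list(range(100))  # 100 pairwise-distinct keys, the worst case for eq
by_eq = partition_with_eq(items, counting_eq)
by_disc = partition_with_disc(items, 100)
eq_calls = counter["calls"]  # 100 * 99 / 2 = 4950 equality tests
```

Both functions produce the same partition on this input, but the equality-based version needs 4950 tests for 100 distinct keys, while the bucket version touches each item once.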


