Abstract
We survey the current techniques for coping with the problem of string matching that allows errors. This problem is becoming increasingly relevant in many fast-growing areas, such as information retrieval and computational biology. We focus on online searching and mostly on edit distance (the minimum number of character insertions, deletions, and substitutions needed to transform one string into another). We explain the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments comparing the performance of the different algorithms and identify the best choices. We conclude with some directions for future work and open problems.
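To make the problem concrete, here is a minimal Python sketch of the classical column-wise dynamic-programming approach to online search under edit distance, usually credited to Sellers. It reports every text position where some substring ending there lies within edit distance k of the pattern. The function name `approx_search` and its interface are illustrative assumptions, not code from the survey. The sketch runs in O(mn) time and O(m) space for a pattern of length m and a text of length n. The more sophisticated algorithms the survey covers aim to improve on this baseline.

```python
from typing import List


def approx_search(pattern: str, text: str, k: int) -> List[int]:
    """Return 0-based end positions j in `text` such that some substring
    ending at j matches `pattern` with edit distance <= k.

    Hypothetical helper for illustration. col[i] holds the minimum edit
    distance between pattern[:i] and some suffix of the text read so far.
    """
    m = len(pattern)
    # Before reading any text, pattern[:i] costs i edits (delete all i chars).
    col = list(range(m + 1))
    matches = []
    for j, tc in enumerate(text):
        prev_diag = col[0]  # old col[i-1], used as the diagonal neighbor
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # old col[i]; becomes the diagonal for row i+1
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # text character tc left unmatched
                col[i - 1] + 1,    # pattern character pattern[i-1] left unmatched
            )
            prev_diag = old
        if col[m] <= k:
            matches.append(j)
    return matches
```

For example, `approx_search("abc", "xxabcxx", 1)` reports end positions 3, 4 and 5. These correspond to the occurrences "ab", "abc" and "abcx", each within one edit of the pattern.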