Abstract
In this work, we study the problem of testing subsequence-freeness. For a given subsequence (word) w = w_1 … w_k, a sequence (text) T = t_1 … t_n is said to contain w if there exist indices 1 ≤ i_1 < … < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. While a large majority of the research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the “standard” sample-based model (i.e., under the uniform distribution), the algorithm is given samples (i, t_i), where each index i is drawn independently and uniformly at random. The algorithm should distinguish between the case that T is w-free and the case that T is ε-far from being w-free (i.e., more than an ε-fraction of its symbols must be modified to make it w-free). Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that O((k² log k)/ε) samples suffice for this testing task. We obtain the following results.
– The number of samples sufficient for one-sided error sample-based testing (under the uniform distribution) is O(k/ε). This upper bound builds on a characterization that we present for the distance of a text T from w-freeness in terms of the maximum number of copies of w in T, where these copies should obey certain restrictions.
– We prove a matching lower bound, which holds for every word w. This implies that the above upper bound is tight.
– The same upper bound holds in the more general distribution-free sample-based model. In this model, the algorithm receives samples (i, t_i), where i is distributed according to an arbitrary distribution p (and the distance from w-freeness is measured with respect to p).
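
The definitions above can be made concrete with a short sketch. The Python below is an illustration only, not the authors' algorithm as stated in the paper: it assumes the natural one-sided tester that draws index samples, orders them, and rejects only when the sampled symbols themselves contain w. The function names, the `weights` parameter (standing in for the distribution p), and the choice of `num_samples` are hypothetical; the abstract gives the bound O(k/ε) but no constant.

```python
import random
from typing import Optional, Sequence


def contains_subsequence(text: Sequence[str], word: Sequence[str]) -> bool:
    """Return True iff `word` occurs in `text` as a (not necessarily
    contiguous) subsequence, using a greedy left-to-right scan."""
    if not word:
        return True
    j = 0  # number of symbols of `word` matched so far
    for symbol in text:
        if symbol == word[j]:
            j += 1
            if j == len(word):
                return True
    return False


def sample_based_test(
    text: Sequence[str],
    word: Sequence[str],
    num_samples: int,
    weights: Optional[Sequence[float]] = None,  # hypothetical stand-in for p
    rng: Optional[random.Random] = None,
) -> bool:
    """Sketch of a one-sided sample-based tester.

    Draws `num_samples` indices (uniformly if `weights` is None, otherwise
    proportionally to `weights`), keeps the sampled symbols in index order,
    and accepts (returns True) unless they contain `word`.
    """
    rng = rng or random.Random()
    indices = rng.choices(range(len(text)), weights=weights, k=num_samples)
    # Each position may be used at most once, in increasing index order.
    sampled = [text[i] for i in sorted(set(indices))]
    return not contains_subsequence(sampled, word)
```

The error is one-sided by construction: any subsequence of a w-free text is itself w-free, so a w-free text is never rejected. How many samples are needed to reject an ε-far text with constant probability is the question the results above answer.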
Optimal Distribution-Free Sample-Based Testing of Subsequence-Freeness with One-Sided Error