Abstract
The publication of user data for statistical analysis and research can be highly beneficial for both academic and commercial uses, such as statistical research and recommendation systems. To preserve user privacy when such data are published, many databases employ anonymization techniques, applied either to the query results or to the data itself. In this article, we examine and analyze the privacy offered by the query-set-size control method for aggregate queries over data structures representing various topologies. We focus on the mathematical queries of minimum, maximum, median, and average, and identify query types that may be used to extract hidden information. We prove that some combinations of these queries maintain a measurable level of privacy even across multiple queries. We offer a privacy probability measure indicating the probability that an attacker obtains information defined as sensitive by utilizing legitimate queries over such a system. Our results are mathematically proven and backed by simulations using vehicular network data based on the TAPASCologne project.
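The query-set-size control method analyzed in the article can be illustrated with a minimal sketch: an aggregate query is answered only when the set of matching records is neither too small nor too large. The threshold `K`, the complement-size check, and the toy speed data below are illustrative assumptions for exposition, not values or code from the article.

```python
import statistics

# Query-set-size control: answer an aggregate query only if the set of
# records matching its predicate contains at least K records and, to
# block complement attacks, at most N - K records. K = 2 and the toy
# speed data are assumed values for illustration only.
K = 2

def restricted_query(records, predicate, aggregate):
    matches = [r for r in records if predicate(r)]
    if not (K <= len(matches) <= len(records) - K):
        return None  # refused: answering could expose individual records
    return aggregate(matches)

speeds = [42.0, 55.5, 38.2, 61.0, 47.3, 50.1]  # e.g. vehicle speeds
avg = restricted_query(speeds, lambda s: s > 50, statistics.mean)   # 3 matches: answered
leak = restricted_query(speeds, lambda s: s > 60, statistics.mean)  # 1 match: refused
```

The upper bound `N - K` matters because a query matching all but one record would otherwise let an attacker recover that record by comparing against an unrestricted total; the article's attack analysis concerns exactly such combinations of legitimate queries.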
Privacy Analysis of Query-Set-Size Control