Abstract
We introduce the problem of protecting confidential business information when search engine query logs and data derived from them are published. We define business confidentiality as the protection of nonpublic data belonging to institutions, such as companies and people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we frame the problem within the field of privacy-preserving data mining. We characterize the adversaries who may be interested in disclosing confidential Web site data and the attack strategies they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We perform an experimental evaluation to estimate the utility that remains in the log after our anonymization techniques are applied. Our experimental results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.
Index Terms
Privacy-preserving query log mining for business confidentiality protection