Abstract
In the Internet era, the data being collected on consumers like us are growing exponentially, and attacks on our privacy are becoming a real threat. To better ensure our privacy, it is safer to let the data owner control the data to be uploaded to the network rather than taking chances with data servers or third parties. To this end, we propose compressive privacy, a privacy-preserving technique that enables the data creator to compress data via collaborative learning, so that the compressed data uploaded onto the Internet will be useful only for the intended utility and cannot easily be diverted to malicious applications.
For data in a high-dimensional feature vector space, a common approach to data compression is dimension reduction or, equivalently, subspace projection. The most prominent tool is principal component analysis (PCA). For unsupervised learning, PCA best recovers the original data for a given reduced dimensionality. In a supervised learning environment, however, it is more effective to adopt a supervised variant of PCA, known as discriminant component analysis (DCA), which maximizes discriminant capability.
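As a concrete illustration of subspace projection, the following is a minimal numpy sketch of PCA-based compression and reconstruction; the random data and the choice of k = 3 components are purely illustrative.

```python
# Minimal sketch of PCA as subspace projection (unsupervised case).
# The data and reduced dimensionality k are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 samples, 10-dim features
Xc = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort descending by variance

k = 3                                   # reduced dimensionality
U = eigvecs[:, order[:k]]               # top-k principal directions
Z = Xc @ U                              # compressed representation (200 x 3)

# Best rank-k linear reconstruction of the centered data
X_hat = Z @ U.T
print(Z.shape, X_hat.shape)
```

Projecting onto the top-k eigenvectors of the covariance matrix minimizes the reconstruction error among all rank-k linear projections, which is why PCA is the natural unsupervised baseline here.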
DCA subspace analysis comprises two distinct subspaces. The signal-subspace components of DCA are associated with the discriminant distance/power (related to classification effectiveness), whereas the noise-subspace components of DCA are tightly coupled with recoverability and/or privacy protection. This article presents three DCA-related data compression methods useful for privacy-preserving applications:
—Utility-driven DCA: Because the rank of the signal subspace is limited by the number of classes, DCA can effectively support classification using a relatively small dimensionality (i.e., high compression).
—Desensitized PCA: Incorporating a signal-subspace ridge into DCA yields a variant especially effective for extracting privacy-preserving components. In this case, the eigenvalues of the noise subspace become insensitive to the privacy labels and are ordered according to their corresponding component powers.
—Desensitized K-means/SOM: Since the revelation of the K-means or SOM cluster structure could leak sensitive information, it is safer to perform K-means or SOM clustering on a desensitized PCA subspace.
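The discriminant projection underlying these methods can be sketched as follows. This is a minimal, illustrative approximation only: a ridge-regularized Fisher-style generalized eigenproblem on toy two-class data, followed by K-means on the compressed representation. The ridge parameter rho, the toy data, and the simple Lloyd iteration are all assumptions for illustration, not the article's exact DCA formulation.

```python
# Illustrative sketch: ridge-regularized discriminant projection
# (in the spirit of DCA), then K-means clustering on the compressed data.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
# Toy two-class data in 5 dimensions; the signal subspace has rank C - 1 = 1
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)),
               rng.normal(2.0, 1.0, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

mu = X.mean(axis=0)
Sw = np.zeros((5, 5))                    # within-class scatter
Sb = np.zeros((5, 5))                    # between-class scatter
for c in (0, 1):
    Xclass = X[y == c]
    d = Xclass.mean(axis=0) - mu
    Sw += (Xclass - Xclass.mean(axis=0)).T @ (Xclass - Xclass.mean(axis=0))
    Sb += len(Xclass) * np.outer(d, d)

rho = 1e-3                               # illustrative ridge for stability
w, V = eigh(Sb, Sw + rho * np.eye(5))    # generalized eigenproblem
signal = V[:, [np.argmax(w)]]            # top discriminant direction (5 x 1)

Z = (X - mu) @ signal                    # compressed 1-D representation

# K-means (2 clusters) on the compressed data via a few Lloyd iterations
centers = Z[[0, -1]]                     # init with one sample per class
for _ in range(10):
    labels = np.argmin(np.abs(Z - centers.T), axis=1)
    centers = np.array([Z[labels == k].mean(axis=0) for k in (0, 1)])

print(Z.shape, np.bincount(labels))
```

Note how the compressed representation is one-dimensional here because, with two classes, the signal subspace has rank at most one; clustering on such a projection, rather than on the raw features, is what limits how much sensitive structure the cluster assignments can reveal.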
Index Terms
Collaborative PCA/DCA Learning Methods for Compressive Privacy