Abstract
In this article, we propose a novel multimedia sensor fusion approach based on heterogeneous sensors for biometric access control applications. The proposed fusion technique uses multiple acoustic and visual sensors to extract dominant biometric cues and combines them with nondominant cues. We evaluate the proposed fusion protocol, together with a novel cascaded authentication approach, on a 3D stereovision database. Performance and robustness improve significantly as more cues are fused, with equal error rates (EER) of 42.9% (audio only), 32% (audio + 3D face + 2D lip features), 15% (audio + 3D face + 2D eye features), and 7.3% (audio + 3D face + 2D lip + 2D eye-eyebrow features).
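The results above are reported as equal error rates: the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The abstract does not state the fusion rule. The sketch below therefore assumes a weighted-sum score-level fusion of min-max-normalized scores, which is a common baseline and not necessarily the authors' method. The modality names, weights, and scores are hypothetical toy values and are not data from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one modality's raw scores to [0, 1] so modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case: all scores identical
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(
    modality_scores: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Fuse per-trial scores from several modalities with a normalized weighted sum.

    Assumes every modality scores the same trials in the same order and that
    higher scores mean "more likely the claimed identity".
    """
    normalized = {m: min_max_normalize(s) for m, s in modality_scores.items()}
    total_weight = sum(weights[m] for m in normalized)
    n_trials = len(next(iter(normalized.values())))
    return [
        sum(weights[m] * normalized[m][i] for m in normalized) / total_weight
        for i in range(n_trials)
    ]


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Approximate the EER by trying every observed score as the threshold.

    FAR(t) = fraction of impostor scores >= t (impostors wrongly accepted)
    FRR(t) = fraction of genuine scores  <  t (clients wrongly rejected)
    Returns (EER estimate, threshold) at the t where |FAR - FRR| is smallest.
    """
    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


def split_by_label(scores: Sequence[float], labels: Sequence[int]):
    """Separate scores into genuine (label 1) and impostor (label 0) lists."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    return genuine, impostor


# Hypothetical toy trials: 1 = genuine identity claim, 0 = impostor claim.
labels = [1, 1, 1, 0, 0, 0]
scores = {
    "audio": [0.6, 0.4, 0.7, 0.5, 0.3, 0.65],
    "face_3d": [0.9, 0.8, 0.85, 0.2, 0.4, 0.3],
}
# Illustrative weights only; they are not taken from the paper.
weights = {"audio": 0.3, "face_3d": 0.7}

audio_eer, _ = equal_error_rate(*split_by_label(scores["audio"], labels))
fused = weighted_sum_fusion(scores, weights)
fused_eer, _ = equal_error_rate(*split_by_label(fused, labels))
print(f"audio-only EER: {audio_eer:.3f}, fused EER: {fused_eer:.3f}")
```

On these made-up scores, the audio channel alone gives an EER of 1/3, while the fused scores separate genuine and impostor claims completely. The example only shows how EER figures like those in the abstract are computed and read; it does not reproduce them. The threshold sweep reports the midpoint of FAR and FRR because, with a finite number of trials, the two step curves rarely meet exactly at an observed score.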
Supplemental Material
Online appendix to article 26, "Multimedia sensor fusion for retrieving identity in biometric access control systems."