Journal Information-measuring and Control Systems, No. 4, 2009
Article in issue:
Research of multimodal human-computer interaction by an information enquiry kiosk
Authors:
A. L. Ronzhin, A. A. Karpov
Abstract:
The paper presents the architecture and a laboratory prototype of MIDAS (Multimodal Interactive-Dialogue Automaton for Self-Service), an automatic information enquiry system with a multimodal user interface, and describes studies and analysis of the interaction between users and the developed device. A microphone array is used in the multimodal kiosk for distant recognition of Russian speech commands. It makes it possible to localize the source of the useful speech signal and to suppress the influence of external acoustic noise on speech recognition accuracy. Miniature video cameras and corresponding optical-flow processing methods detect and track the position of a user inside the kiosk's working zone. The information system offers a multimodal user interface for obtaining information about the employees and departments of SPIIRAS, as well as map-based information on the streets of St. Petersburg. The main hardware and software modules of the kiosk are: (1) video processing based on computer vision to detect the position of the user's body, face, and some facial features; (2) a speaker-independent system for automatic recognition of continuous Russian speech that uses a microphone array to suppress acoustic noise and to localize the source of the useful voice signal during distant speech recording; (3) a module for audio-visual Russian speech synthesis used to implement a virtual character (avatar); (4) an interactive graphical user interface based on a touchscreen; (5) a dialogue model and a dialogue manager that include a database of the application domain and a system for dialogue strategy control. Combining user-friendly computer interfaces with speech technologies and virtual talking characters makes it possible to create effective and natural interfaces in which human comfort plays the main role.
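As an illustration of how a microphone pair can localize a speech source, the sketch below estimates the time difference of arrival by plain time-domain cross-correlation and converts it to a far-field arrival angle. This is not the method used in MIDAS, which the abstract does not specify; the function names, the two-microphone geometry, the sampling rate, and the 343 m/s speed of sound are all assumptions made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(sig, ref, max_lag):
    """Return the lag (in samples) at which `sig` best matches `ref`.

    A positive lag means `sig` arrives later than `ref`.
    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag, fs, mic_distance):
    """Convert a sample lag to a far-field arrival angle in degrees.

    Assumes a plane wave and two microphones `mic_distance` metres apart;
    0 degrees is broadside (source equidistant from both microphones).
    """
    s = lag / fs * SPEED_OF_SOUND / mic_distance
    s = max(-1.0, min(1.0, s))  # clip so measurement noise cannot break asin
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    random.seed(0)
    ref = [random.gauss(0.0, 1.0) for _ in range(400)]  # synthetic "speech"
    sig = [0.0] * 3 + ref[:-3]                          # same signal, 3 samples later
    lag = estimate_delay_samples(sig, ref, max_lag=10)
    print(lag, arrival_angle_deg(lag, fs=16000, mic_distance=0.2))
```

With more than two microphones, pairwise delays of this kind are typically combined to obtain a position rather than a single angle, and frequency-domain weighting is often preferred in noisy rooms.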
The multimodal information kiosk is used to study natural and ergonomic communication between a user and a machine, taking diverse ways of communication into account. The collected experimental data were analyzed to investigate the cognitive and behavioral characteristics of users and to optimize the multimodal interfaces offered to clients. Experiments with potential users were carried out to quantitatively evaluate users' operation speed with various contact-based and hands-free means of information input, following the methodology of the ISO 9241-9 standard, which is based on Fitts' law. In addition, the accuracy of spatial sound source localization was estimated for the developed microphone array and several methods of spatial-spectral analysis of the speech signal. Cognitive experiments on communication between users and the multimodal automaton showed that most potential users are willing to interact with an automatic system through natural multimodal or speech interaction, and that they decline the customary mode of operation based on a touchscreen and mechanical manipulators, which is unnatural for human-to-human conversation.
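For readers unfamiliar with Fitts' law evaluations, the following minimal sketch computes throughput in the effective-width formulation commonly used with ISO 9241-9 style pointing tasks: W_e = 4.133 * SD of the selection endpoints, ID_e = log2(D / W_e + 1), and throughput = ID_e / mean movement time. The function names and the sample numbers are hypothetical and are not data from the paper.

```python
import math
import statistics


def effective_width(deviations):
    """Effective target width: 4.133 times the SD of endpoint deviations."""
    return 4.133 * statistics.stdev(deviations)


def throughput(distance, deviations, movement_times):
    """Throughput in bits per second for one distance/width condition.

    distance        -- movement distance to the target (same unit as deviations)
    deviations      -- signed endpoint errors along the movement axis
    movement_times  -- per-trial movement times in seconds
    """
    w_e = effective_width(deviations)
    id_e = math.log2(distance / w_e + 1.0)  # effective index of difficulty, bits
    return id_e / statistics.mean(movement_times)


if __name__ == "__main__":
    # Hypothetical trials: 100-unit movements, small endpoint scatter.
    print(throughput(100.0, [-2, -1, 0, 1, 2], [0.7, 0.8, 0.9]))
```

Computing throughput per input modality in this way gives a single speed-accuracy figure that can be compared across contact-based and hands-free input devices.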
Pages: 22