Speech Applications based on websites – a feasibility assessment

Abstract

The principle of basing speech applications on websites (“the principle”) is usually frowned upon by both IT and speech experts. It is often dismissed as “screen scraping”, a label that reflects a limited understanding of the technology involved. The opportunities it offers are also frequently underestimated, and their considerable potential goes unrecognized.

This presentation discusses the pros and cons of the principle by weighing the benefits of value-added applications against technological possibilities and constraints. The fundamental advantage of the principle is that, in today's web-centric world, a standardized web interface can serve as the single source for all communication channels. This approach enables fast and efficient creation and life-cycle management of rapidly evolving content and service concepts.

Because several commercial websites already offer it, having web page text read aloud in the browser by speech synthesis is a familiar phenomenon to many people. Speech input offers at least as much potential. Some areas and solutions that already benefit, or could benefit, from both are:

- Designing and prototyping speech applications for self-service.
- Integrating multi-channel applications for computers and mobile devices.
- Powerful multi-channel solutions, e.g. employee and customer satisfaction feedback and ICT helpdesks.
- Multi-modal use of computers and mobile devices, e.g. for general hands-free operation or to make them more accessible to people with specific impairments.

The success of deployed applications depends strongly on the capabilities and constraints of speech technology. Some factors cannot be resolved by the application developer and require a fundamental approach:

- Dealing with incomplete or irrelevant information.
- Dealing with “real” natural language and the adoption of foreign words.
- Dealing with background noise, background voices and environmental acoustics (speech recognition only).