Little by little, we are learning how to do a better job in building natural language analyzers and generators. Our tool kit is slowly growing -- adding, in particular, in the last few years, better tools for learning language patterns from corpora. Of course, our tools are still quite primitive; when we look back at this time in later years we will be amazed at how much we didn't understand about natural language.
But we are also learning to make better use of the tools we do have. We are coming to a better appreciation of how relatively simple tools -- morphological analyzers, name recognizers, phrase parsers, to name a few -- can be remarkably effective in particular applications. And from this appreciation has flowed a steadily increasing stream of natural language applications.
It is this growing stream that we come here this week to nurture and reflect on. The Conferences on Applied Natural Language Processing are intended to highlight the ways in which natural language processing can be applied to real tasks. With the help of the program committee and other colleagues, we have made a particular effort this year to broaden the range of applications which are presented. Conferences above all are about exchanging ideas, and by stretching the range of the conference we hope to expose people to problems, to techniques, and to applications they might not have seen before. We have also provided an extensive program of demonstrations, ranging from early research prototypes to more mature commercial systems; there is nothing like a live demo to crystallize the problems and accomplishments in our field.
In running an applied conference we are faced forever with the question of what is an "applied" paper. We have chosen to answer that question in an inclusive fashion, including several sessions which address basic technologies such as morphology, parsing, and sense disambiguation, which underlie many of our applications. As we build new applications, we are aware of how shortcomings in these basic technologies affect our design, so it is important to bring together people working on the technologies with those working on the applications.
CommandTalk: a spoken-language interface for battlefield simulations
CommandTalk is a spoken-language interface to battlefield simulations that allows the use of ordinary spoken English to create forces and control measures, assign missions to forces, modify missions during execution, and control simulation system ...
Natural language in four spatial interfaces
We describe our experiences building spoken language interfaces to four demonstration applications all involving 2- or 3-D spatial displays or gestural interactions: an air combat command and control simulation, an immersive VR tactical scenario viewer, ...
High performance segmentation of spontaneous speech using part of speech and trigger word information
We describe and experimentally evaluate an efficient method for automatically determining small clause boundaries in spontaneous speech. Our method applies an artificial neural network to information about part of speech and trigger words. We find that ...
A maximum entropy approach to identifying sentence boundaries
We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training ...
QuickSet: multimodal interaction for simulation set-up and control
Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen, Josh Clow
This paper presents a novel multimodal system applied to the setup and control of distributed interactive simulations. We have developed the QuickSet prototype, a pen/voice system running on a hand-held PC, communicating through a distributed agent ...
Natural language dialogue service for appointment scheduling agents
Appointment scheduling is a problem faced daily by many individuals and organizations. Cooperating agent systems have been developed to partially automate this task. In order to extend the circle of participants as far as possible we advocate the use of ...
Insights into the dialogue processing of VERBMOBIL
We present the dialogue module of the speech-to-speech translation system VERBMOBIL. We follow the approach that the solution to dialogue processing in a mediating scenario can not depend on a single constrained processing tool, but on a combination of ...
An evaluation of strategies for selective utterance verification for spoken natural language dialog
As with human-human interaction, spoken human-computer dialog will contain situations where there is miscommunication. In experimental trials consisting of eight different users, 141 problem-solving dialogs, and 2840 user utterances, the Circuit Fix-It ...
Name pronunciation in German text-to-speech synthesis
We describe the name analysis and pronunciation component in the German version of the Bell Labs multilingual text-to-speech system. We concentrate on street names because they encompass interesting aspects of geographical and personal names. The system ...
Applying repair processing in Chinese homophone disambiguation
Repair processing plays an important role in spoken language processing systems. This paper proposes a method for correcting Chinese repetition repairs and demonstrates the effects of repair processing in Chinese homophone disambiguation. The ...
A non-projective dependency parser
We describe a practical parser for unrestricted dependencies. The parser creates links between words and names the links according to their syntactic functions. We first describe the older Constraint Grammar parser where many of the ideas come from. ...
Incremental finite-state parsing
This paper describes a new finite-state shallow parser. It merges constructive and reductionist approaches within a highly modular architecture. Syntactic information is added at the sentence level in an incremental way, depending on the contextual ...
Developing a hybrid NP parser
We describe the use of energy function optimisation in very shallow syntactic parsing. The approach can use linguistic rules and corpus-based statistics, so the strengths of both linguistic and statistical approaches to NLP can be combined in a single ...
An annotation scheme for free word order languages
We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features ...
The domain dependence of parsing
A major concern in corpus based approaches is that the applicability of the acquired knowledge may be limited by some feature of the corpus, in particular, the notion of text 'domain'. In order to examine the domain dependence of parsing, in this paper, ...
Automatic acquisition of two-level morphological rules
We describe and experimentally evaluate a complete method for the automatic acquisition of two-level rules for morphological analyzers/generators. The input to the system is sets of source-target word pairs, where the target is an inflected form of the ...
Probabilistic and rule-based tagger of an inflective language: a comparison
We present results of probabilistic tagging of Czech texts in order to show how these techniques work for one of the highly morphologically ambiguous inflective languages. After description of the tag system used, we show the results of four experiments ...
CSeg&Tag1.0: a practical word segmenter and POS tagger for Chinese texts
Chinese word segmentation and POS tagging are two key techniques in many applications in Chinese information processing. Great efforts have been paid to the research in the last decade, but unfortunately, no practical system with high performance for ...
The NLP role in animated conversation for CALL
Language learning is a relatively new application for natural language processing (NLP) and for intelligent tutoring and learning environments (ITLEs). NLP has a crucial role to play in foreign language ITLEs, whether they are designed for explicit or ...
Reading more into foreign languages
GLOSSER is designed to support reading and learning to read in a foreign language. There are four language pairs currently supported by GLOSSER: English-Bulgarian, English-Estonian, English-Hungarian and French-Dutch. The program is operational on UNIX ...
Large-scale acquisition of LCS-based lexicons for foreign language tutoring
We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS ...
A prototype of a grammar checker for Czech
This paper describes the implementation of a prototype of a grammar based grammar checker for Czech and the basic ideas behind this implementation. The demo is implemented as an independent program cooperating with Microsoft Word. The grammar checker ...
Techniques for accelerating a grammar-checker
The paper describes several possibilities of using finite-state automata as means for speeding up the performance of a grammar-and-parsing-based (as opposed to pattern-matching-based) grammar-checker able to detect errors from a predefined set. The ...
EasyEnglish: a tool for improving document quality
We describe the authoring tool, EasyEnglish, which is part of IBM's internal SGML editing environment, Information Development Workbench. EasyEnglish helps writers produce clearer and simpler English by pointing out ambiguity and complexity as well as ...
Contextual spelling correction using latent semantic analysis
Contextual spelling errors are defined as the use of an incorrect, though valid, word in a particular sentence or context. Traditional spelling checkers flag misspelled words, but they do not typically attempt to identify words that are used incorrectly ...
An automatic scoring system for advanced placement biology essays
This paper describes a prototype for automatically scoring College Board Advanced Placement (AP) Biology essays. The scoring technique used in this study was based on a previous method used to score sentence-length responses (Burstein et al., 1996). ...
Dutch sublanguage semantic tagging combined with mark-up technology
In this paper, we want to show how the morphological component of an existing NLP-system for Dutch (Dutch Medical Language Processor - DMLP) has been extended in order to produce output that is compatible with the language independent modules of the LSP-...
A statistical profile of the Named Entity task
In this paper we present a statistical profile of the Named Entity task, a specific information extraction task for which corpora in several languages are available. Using the results of the statistical analysis, we propose an algorithm for lower bound ...
Nymble: a high-performance learning name-finder
This paper presents a statistical, learned approach to finding names and other nonrecursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model. We present our justification for the problem ...
Disambiguation of proper names in text
Identifying the occurrences of proper names in text and the entities they refer to can be a difficult task because of the many-to-many mapping between names and their referents. We analyze the types of ambiguity --- structural and semantic --- that make ...


