DOI: 10.1145/355214
IRAL '00: Proceedings of the fifth international workshop on Information retrieval with Asian languages
ACM 2000 Proceeding
  • Chairmen:
  • Kam-Fai Wong,
  • Dik L. Lee,
  • Jong-Hyeok Lee
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
IRAL00: 5th International Workshop on Information Retrieval with Asian Languages, Hong Kong, China, 30 September 2000 - 1 October 2000
ISBN:
978-1-58113-300-4
Published:
01 November 2000
Sponsors:
SIGWEB, ACM Hong Kong Chapter, SIGIR, SIGLINK

Article
Free
Improving automatic Chinese text categorization by error correction

In this paper, we use the misclassified news in the training data as feedback to improve classification accuracy. We isolate the misclassified news from the news of the original classes to form new subclasses, and modify the Rocchio linear classifier by ...

Article
Free
Web page classification based on k-nearest neighbor approach

Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web. In this paper, we propose a Web page classifier based on an adaptation of the k-Nearest Neighbor (k-NN) approach. To improve the performance of k-NN ...

Article
Free
Combining multiple sources for short query translation in Chinese-English cross-language information retrieval

In this paper, we examine various factors that affect the retrieval performance of Chinese-English cross-language retrieval. The factors include segmentation dictionary coverage, segmentation algorithm, transfer dictionary coverage, transfer dictionary ...

Article
Free
Query term disambiguation for Web cross-language information retrieval using a search engine

With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing CLIR approaches based on query translation require parallel corpora or comparable corpora for the disambiguation of ...

Article
Free
Explorative multilingual text retrieval based on fuzzy multilingual keyword classification

This paper proposes an explorative approach to multilingual text retrieval (MLTR) based on fuzzy multilingual keyword classification. The approach applies fuzzy clustering to obtain a classification of multilingual keywords by concepts. A multilingual ...

Article
Free
MIETTA — a framework for uniform and multilingual access to structured database and Web information

We describe a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new crosslingual framework. The framework integrates ...

Article
Free
Hybrid term indexing for different IR models

Retrieval effectiveness depends on how terms are extracted and indexed. For Chinese text (and others like Japanese and Korean), there are no spaces to delimit words. Indexing using hybrid terms (i.e., words and bigrams) was able to achieve the best ...

Article
Free
PM-based indexing for Chinese text retrieval

This paper focuses on introducing a novel PM indexing schema for Chinese text retrieval. Unlike in Western languages, there is no delimiter between words in Chinese texts. The indexing is based either on the characters or on the segmented ...

Article
Free
An efficient accessing technique of Chinese characters using Boshiamy Chinese input system

In this paper, a new efficient technique for Chinese character retrieval is proposed. This technique designs a minimal perfect hashing function based on the Chinese remainder theorem for a simple and widely used Chinese input system called Boshiamy ...

Article
Free
Improvement of vector space information retrieval model based on supervised learning

This paper proposes a method to improve the retrieval performance of the vector space model (VSM) by utilizing user-supplied information about those documents that are relevant to the query in question. In addition to the user's relevance feedback ...

Article
Free
Character cluster based Thai information retrieval

Some languages, including Thai, Japanese and Chinese, do not have explicit word boundaries. This causes the problem of word boundary ambiguity, which decreases the accuracy of information retrieval. This paper proposes a new technique so-called ...

Article
Free
Japanese probabilistic information retrieval using location and category information

Robertson's 2-Poisson information retrieval model does not use location and category information. We constructed a framework using location and category information in a 2-Poisson model. We submitted two systems based on this framework to the IREX ...

Article
Free
Query expansion using phonetic confusions for Chinese spoken document retrieval

This paper presents a method of query expansion based on phonetic confusions for retrieving spoken documents using text queries. This method is applied to a Chinese spoken document retrieval task. A series of experiments have been carried out for ...

Article
Free
A first step towards flexible local feedback for ad hoc retrieval

Local feedback for ad hoc retrieval typically hurts performance for about one-third of the search requests while improving the average performance. Our objective is to make it more reliable by estimating the optimal number of assumed-relevant documents ...

Article
Free
Information extraction for Thai documents

An increasing amount of electronically available information is stored in Asian language documents, which makes Information Retrieval (IR) and Information Extraction (IE) for these languages important for a large number of users. Analysis and extraction ...

Article
Free
Korean text summarization using an aggregate similarity

In this paper, each document is represented by a weighted graph called a text relationship map. In the graph, each node represents a vector of nouns in a sentence, an undirected link connects two nodes if two sentences are semantically related, and a ...

Article
Free
Research on a faster algorithm for pattern matching

Based on a deep analysis of the Boyer-Moore algorithm and the Quick Search algorithm, we propose a faster algorithm for single pattern matching that utilizes continuous skips over the text. This idea enables its high performance because of the large shift on ...

Article
Free
Dynamic programming: a method for taking advantage of technical terminology in Japanese documents

We introduce a new similarity measure based on dynamic programming, intended for technical terms such as "machine translation system", which are quite common in technical writing. We compare our proposal with systems which use standard IDF cosine ...

Article
Free
Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval

The use of English words in Korean text, with or without phonetic translation, has recently been growing rapidly. To make matters worse, the Korean transliterations of an English word may vary widely. The mixed use of English words and their various ...

Article
Free
On the use of words and n-grams for Chinese information retrieval

In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as indexes. Words and n-grams have been used as indexes in several previous studies, which showed that both kinds of indexes ...

Article
Free
Content-based language models for spoken document retrieval

Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multimedia collections in the near future. This paper presents a novel concept of applying content-based language models to ...

Article
Free
Structural analysis of cooking preparation steps in Japanese

We propose a method to create process flow graphs automatically from textbooks for cooking programs. This is realized by understanding context by narrowing down the domain to cooking, and making use of domain specific constraints and knowledge. Since it ...

Article
Free
Topic detection and tracking in English and Chinese

Topic Detection and Tracking (TDT) refers to automatic techniques for discovering, threading, and retrieving topically related material in streams of data. Newswire and broadcast news are the canonical sources. In 1999, TDT research was extended from ...

Article
Free
Exploiting a Chinese-English bilingual wordlist for English-Chinese cross language information retrieval

We investigated using the LDC English/Chinese bilingual wordlists for English-Chinese cross language retrieval. It is shown that the Chinese-to-English wordlist can be considered as both a phrase and word dictionary, and is preferable to the English-to-...

Article
Free
MT-based Japanese-English cross-language IR experiments using the TREC test collections

This paper evaluates the effectiveness of MT-based Japanese-English CLIR using a subcollection of the TREC test collections and two bilingual researchers to separately translate the TREC requests into Japanese. Our main findings are as follows: (1) With ...

Article
Free
Construction of a Chinese-English WordNet and its application to CLIR

This paper integrates five linguistic resources, including Cilin, a Chinese-English dictionary, ASBC corpus, SemCor, and WordNet, to construct a Chinese-English WordNet. The result is employed in Chinese-English information retrieval. Under TREC-6 text ...

Article
Free
Comparison of word-based and syllable-based retrieval for Tibetan (poster session)

Tibetan retrieval based on automatically segmented words is compared with the use of overlapping syllable n-grams using a known-item retrieval evaluation. The optimal span of fixed-length n-grams is found to be 2 syllables, and indexing words is found ...

Article
Free
Effect of dependency relationships and ordered co-occurrence of words on Japanese information retrieval (poster session)

We propose two Japanese information retrieval methods that enhance retrieval effectiveness using relationships between words. One is a method using dependency relationships between words in a sentence, and the other is a method using the ordered co-...

Article
Free
Automatic text summarization based on relevance feedback with query splitting (poster session)

This paper describes a method of text summarization using a query expansion technique. Generally, summarization systems using query expansion have the problem that the feedback query becomes biased during the query expansion process. We can alleviate this ...

Article
Free
Automatic recommendation of hot topics in discussion-type newsgroups (poster session)

We are developing an intelligent network news reader intended to help people use discussion-type network news more effectively. It is called HISHO and will help users find whole threads in Japanese discussions that are relevant to the users' ...

Contributors
  • Chinese University of Hong Kong
  • Hong Kong University of Science and Technology
  • Pohang University of Science and Technology
