Open Access

Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language

Published: 17 June 2023


Abstract

Named entity recognition in the Indonesian language has significantly developed in recent years. However, it still lacks standardized publicly available corpora; a small dataset is available but suffers from inconsistent annotations. Therefore, we re-annotated the dataset to improve its consistency and benefit the community. Our re-annotation led to better training results from an effective baseline model consisting of bidirectional long short-term memory and conditional random fields. To fully utilize the limited available data, we utilized better contextualization and transferred external knowledge by exploiting monolingual and multilingual pre-trained language models, such as IndoBERT and XLM-RoBERTa. In addition to the general improvement from the language models, we observed that the monolingual model is more sensitive, while the multilingual ones show advantages in rich morphological knowledge. We also applied cross-lingual transfer learning to utilize high-resource corpora in other languages. We adopted English, Spanish, Dutch, and German as the source languages for the target Indonesian language and found that Dutch plays a special role in the data transfer method due to morphological similarity attributable to historical reasons.


1 INTRODUCTION

The Indonesian language is a low-resource language for natural language processing (NLP) due to the lack of extensive public corpora, especially for the named entity recognition (NER) task. Neural models used in high-resource NER tasks generalize fairly well and have achieved impressive accuracy [56], yet results on low-resource NER tasks remain unsatisfactory [22, 24, 51], limiting new studies for this language. The data deficiency problem is twofold. (1) The quality and availability of the corpora, such as annotation consistency: some datasets were created from DBpedia and Wikipedia text but are automatically generated [4, 16], and the others are not publicly available. (2) The datasets are limited in size: for example, the conversational text datasets from chatbots [27] and Twitter [52] comprise fewer than a thousand sentences.

An open dataset with human annotation for Indonesian NER, published by Syaifudin and Nurwidyantoro (hereafter referred to as S&N (2016)) [43] in the news domain, is available with approximately 2,000 sentences. However, there are inconsistencies in the dataset. We found that the organization entity is more ambiguous than the location and person entities. One example of the inconsistency is that some tokens were mistagged as an organization; the term “DPP” (“party’s representative council” in English) is frequently classified as an organization even when it does not refer to one. Our first goal is to offer a standardized, low-noise dataset that facilitates meaningful training and testing in the Indonesian language by enhancing the existing one. Training on noisy data yields a poor model and leads to mispredictions. Therefore, we contribute a more reliable public Indonesian NER dataset by improving the annotation quality.

Lample et al. [29] have shown that bidirectional long short-term memory (BiLSTM) with a conditional random field (CRF) as the decoder is a simple yet effective method to handle a sequence labeling problem such as NER [3]. Prior Indonesian NER works adopted fastText [52], convolutional neural networks (CNNs) [16], or part-of-speech (POS) tags [17] as the feature representation and benefit from BiLSTM. These methods suffer from context disambiguation of the entities in a sentence. In contrast, we applied various pre-trained language models (PLMs), both monolingual and multilingual, as the input representation of BiLSTM-CRF to exploit the contextual knowledge of the dataset and tackle this problem. We investigate how those PLMs benefit Indonesian NER, because monolingual PLMs have more knowledge of language-specific features, while multilingual PLMs learn shared knowledge from richer languages [12]. We include the transformer-based language model known as bidirectional encoder representations from transformers (BERT), which obtains word representations contextually from the sentence. We compare four models in the experiment: two multilingual transformer-based models (multilingual BERT (mBERT) [12] and XLM-RoBERTa (XLM-R) [11]) and two monolingual BERT models for the Indonesian language (IndoBERT [51] and IndoLEM [24]).

To further reduce the dependency on a gold standard annotated dataset, we demonstrate unsupervised cross-lingual transfer learning to leverage the knowledge from the high- to the low-resource languages. We examine single-source transfer in English and multi-source cross-lingual transfer in English, Spanish, Dutch, and German, to the target language, Indonesian. All cross-lingual transfer approaches show competitive results for the NER task in the Indonesian language without relying on any Indonesian supervision.

Our contributions in this study are as follows:

(1)

We re-annotate a human-annotated Indonesian NER dataset to improve its consistency and make the dataset publicly available1 [22]. Our analysis illustrates that the re-annotation improves NER performance.

(2)

Through an empirical experiment, we demonstrate that topping frozen PLMs with a sequence-labeling architecture is beneficial for training in a low-resource scenario owing to its rich contextual representation.

(3)

We perform cross-lingual transfer learning from several high-resource languages, such as English, Spanish, Dutch, and German, to the Indonesian language and present competitive results compared with the supervised method.

This article is organized into the following sections. Section 2 presents some works related to our study. Section 3 details our re-annotation process and the dataset we used for the experiments. Section 4 introduces the overall framework of our NER models and the experiment settings. Section 5 explains the results of each experiment, and Section 6 discusses and analyzes corresponding error examples from each result. Finally, Section 7 concludes this work and mentions its future opportunities. For Sections 3–6, the contents are divided into two items, supervised monolingual and unsupervised cross-lingual transfer NER. The former focuses on the impact of the dataset re-annotation, while the latter focuses on cross-lingual transfer without Indonesian data during the training stage.


2 RELATED WORK

We divide the prior studies into three parts: a brief introduction of the NER dataset in the Indonesian language, supervised approaches for monolingual NER, and unsupervised cross-lingual transfer learning approaches.

2.1 NER Dataset in the Indonesian Language

Building a sizeable human-annotated dataset is costly and time-consuming. Some public NER datasets in the Indonesian language are available but have limited size or inconsistent annotation. The largest is automatically labeled, with roughly 20K sentences [4], whereas the human-annotated ones contain about 2K sentences each [24, 51]. One of the existing manually labeled datasets (S&N 2016) contains inconsistent annotation and has an error rate of approximately 30% [24].

An early work on the same dataset was a quotation identification task [43]. The dataset was constructed from three Indonesian online news sites, namely Kompas,2 Tempo,3 and TribunNews.4 The dataset mainly covers topics in politics, society, and economics. In that task, an NER system served as a preprocessing step for quotation identification, so the authors built a human-annotated NER dataset to train the NER model. In our study, we focus on the NER task and re-annotate the dataset to avoid a low-performance model that learns from noisy data.

2.2 Monolingual NER for the Indonesian Language

NER methods for major languages such as English have evolved from early models to modern ones. State-of-the-art systems achieve strong results by adopting entity-aware transformer-based models and by stacking several contextual embeddings as the input representation of an LSTM model [50, 56]. Fine-tuning a BERT-based model has also demonstrated significant results for various downstream tasks, including NER [11]. However, its performance on the NER task was less effective than that of BiLSTM-CRF over a sequence of contextual word embeddings [12]. BiLSTM-CRF is an older, yet simple and robust, model proposed by Lample et al. [29] that benefits the sequence labeling structure of an NER task owing to its forward and backward networks, which have access to past and future input features. The superior performance of English NER is highly adaptable to low-resource languages, such as the Indonesian language, without ignoring their unique characteristics and the limited available resources.

Early Indonesian NER models adopted a rule-based approach with supervision from a contextual dictionary, morphological features, and POS to perform the NER task. Budi and Bressan examined an association rule mining approach with a thorough explanation of the characteristics of the Indonesian language for NER and co-reference resolution tasks [8]. Statistical machine learning approaches were also investigated, such as support vector machines [28] and CRF with a gazetteer and POS information [34]. To increase the training data size, Leonandya et al. [31] examined a semi-supervised learning model to automatically tag the unlabeled data.

Recent Indonesian NER has adopted a neural network approach with fewer hand-crafted features. BiLSTM [18, 29] has been widely used with various input representation methods. Gunawan et al. [16] applied CNNs for word n-gram representation, and Hoesen and Purwarianti [17] pre-trained word embeddings with POS tags. In exploring the out-of-vocabulary (OOV) problem in conversational text, Kurniawan and Louvan [27] employed BiLSTM-CRF without including a pre-trained word representation, and Wintaka et al. [52] used an Indonesian fastText pre-trained model as the feature representation. In a work similar to ours, Leonandya and Ikhwantri [32] investigated the impact of language model pre-training on the NER task. However, their conversational text data is private, and thus their study is not replicable. The latest Indonesian NER works exploiting PLMs include IndoBERT [51] and IndoLEM [24]; both are Indonesian PLMs evaluated over a set of NLP tasks and datasets in the Indonesian language, including NER.

2.3 Cross-lingual Transfer Learning in NER

Cross-lingual transfer learning exploits the knowledge available in high-resource languages to alleviate the lack of available datasets in low-resource NLP tasks [11]. Various studies have been conducted in many NLP tasks, such as neural machine translation (NMT) [21], grammar error correction [57], and POS tagging [13]. Generally, a cross-lingual transfer can be performed in two ways: (1) data transfer [20, 42, 55] and (2) model transfer-based methods [53, 54]. However, prior cross-lingual transfer efforts for Indonesian NER have mainly explored the latter.

The idea of data transfer is to translate a source dataset in a high-resource language into a low-resource target language or vice versa. The translated pseudo-data are fed into a task-specific method, so it highly depends on the translation and the alignment quality. Xie et al. [55] translated the source data by aligning vector spaces of the source’s and the target’s pre-trained model to produce a word-to-word translation. This method simplifies the label projection step owing to the absence of word order changes, yet it often mismatches the grammatical structure of the target language. Off-the-shelf machine translation systems are also helpful for data transfer by performing entity matching based on orthographic and phonetic similarity between the two languages [20], or using the back-translation method, in which the target language is translated to the source language and inference is performed using the state-of-the-art model in the source language [42]. However, taking advantage of machine translation systems requires well-developed word-alignment tools to project the labels due to word order differences between languages.

Instead of translation and alignment, model transfer relies on language-independent features such as multilingual word representation. A straightforward method in model transfer is direct transfer, where a model trained in the source language directly offers predictions in the target language [36, 54]. Wu et al. [53] performed an indirect transfer using teacher–student learning, where a model in the source language is used as a teacher model to train a student model in the target language. In addition to single-source transfer, previous studies showed promising results in conducting cross-lingual transfer from multi-source languages without any parallel corpora [33, 48]. Rahimi et al. [39] examined few-shot learning for NER and thoroughly investigated mistakes and language-specific transfer errors in 41 languages.

Few cross-lingual transfer studies have been explored for the Indonesian language. Prior studies primarily investigated parsing or POS tagging tasks for which there are already large corpora for the Indonesian dataset, such as Universal Dependency Treebank5 [1, 26]. Ikhwantri [19] adopted cross-character embedding between the English and Indonesian languages and fine-tuned an English PLM to the NER task in the Indonesian language. An interesting result from Rahimi et al. [39] is their finding that Italian gives the best transfer to Indonesian in a direct model transfer method, compared to English as the most common single source language, and Malay as the most similar language. In our study, we compare several data transfer-based and model transfer-based models and analyze how the scenarios impact our low-resource settings of Indonesian NER.


3 DATASETS AND RE-ANNOTATION

This section describes the dataset inconsistencies and re-annotation of the existing available Indonesian NER datasets, as well as NER datasets in four different languages as the source transfer for unsupervised cross-lingual transfer learning. We follow the inside–outside–beginning (IOB) tagging by Tjong Kim Sang et al. [45] as a standard NER dataset format.
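In the IOB scheme, the first token of an entity takes a “B-” prefix and the remaining tokens take “I-”. As a minimal sketch, the helper below (our own illustration, not part of any toolkit) converts token-index entity spans into IOB tags:

```python
def spans_to_iob(tokens, spans):
    """Convert (start, end, type) entity spans to IOB tags.

    `spans` uses token indices, end-exclusive, e.g. (1, 3, "PER")
    covers tokens[1] and tokens[2].
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # continuation tokens
    return tags

tokens = ["Presiden", "Joko", "Widodo", "di", "Istana", "Bogor"]
spans = [(1, 3, "PER"), (4, 6, "LOC")]
print(spans_to_iob(tokens, spans))
# ['O', 'B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']
```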

3.1 Monolingual Indonesian NER

Resources for Indonesian NER have grown significantly, although they remain limited in size. We present the existing NER datasets used in our extended experiments. All datasets are in IOB format, the same format as our dataset. The details of the datasets are as follows.

  • NERP [51]: An NER dataset that consists of Indonesian news collected from several websites, having 8,400 sentences in total, split into 6,720, 840, and 840 of sentences for training, development, and test sets, respectively. It has five entities, namely, location, person, product or brand name, event name, and food and beverage.

  • NERGrit [51]: A dataset from the Grit-ID repository with 2,090 sentences, split into 1,672, 209, and 209 for training, development, and test sets, respectively. It has three entities: location, organization, and person.

  • NERUI [24]: This dataset consists of 2,125 sentences annotated by students from an NLP class at the University of Indonesia in 2016. It covers three entities: location, organization, and person.

  • NERUGM [24]: An NER dataset from Gadjah Mada University, extracted from online news articles. It has 2,343 sentences and includes five entities: location, organization, person, time, and quantity. This is the dataset from S&N (2016), with a different split due to the absence of a development set in the original dataset.

3.1.1 Inconsistency of the Existing Datasets.

The dataset by S&N (2016)6,7 leads to low accuracy, with an error rate of approximately 30% [24]. This is caused by annotation inconsistency in the dataset, such as entity tokens mistagged as non-entities and vice versa. The errors mainly occur for the organization (ORG) entities and non-entities (O), as summarized in Table 2(b). We show examples of inconsistent annotation of person and organization entities in Table 1.

Sentence 1: President Joko Widodo met the chairman of Gerindra Prabowo Subianto at Istana Bogor

| Indonesian | Presiden | Joko | Widodo | bertemu | Ketua | Umum | Gerindra | Prabowo | Subianto | di | Istana | Bogor |
| English translation | President | Joko | Widodo | met | chairman | general | Gerindra | Prabowo | Subianto | at | palace | Bogor |
| S&N (2016) | O | B-PER | I-PER | O | B-PER | I-PER | I-PER | I-PER | I-PER | O | B-LOC | I-LOC |
| Ours | O | B-PER | I-PER | O | O | O | B-ORG | B-PER | I-PER | O | B-LOC | I-LOC |

Sentence 2: Politician PDI Perjuangan Guruh Soekarnoputra visited the chairman of Gerindra Party Suhardi

| Indonesian | Politikus | PDI | Perjuangan | Guruh | Soekarnoputra | menjenguk | Ketua | Umum | Partai | Gerindra | Suhardi |
| English translation | Politician | PDI | Perjuangan | Guruh | Soekarnoputra | visited | chairman | general | Party | Gerindra | Suhardi |
| S&N (2016) | O | B-ORG | I-ORG | B-PER | I-PER | O | O | O | O | O | B-PER |
| Ours | O | B-ORG | I-ORG | B-PER | I-PER | O | O | O | B-ORG | I-ORG | B-PER |

Sentence 3: Vice chairman of Gerindra Party Edy Prabowo stated that his party would not be hurt

| Indonesian | Wakil | Ketua | Umum | Partai | Gerindra | Edy | Prabowo | menyatakan | partainya | tak | akan | sakit hati |
| English translation | Vice | chairman | general | Party | Gerindra | Edy | Prabowo | stated | his party | not | would | hurt |
| S&N (2016) | O | O | O | B-ORG | I-ORG | B-PER | I-PER | O | O | O | O | O |
| Ours | O | O | O | B-ORG | I-ORG | B-PER | I-PER | O | O | O | O | O |
  • The red tokens indicate the difference after re-tagging. The blue tokens represent consistent annotation between S&N (2016) and ours. Tag prefix “B” indicates the entity’s first word, whereas “I” indicates the remaining parts of the entity.

Table 1. Examples of Tags Before (S&N (2016)) and After (Ours) Re-tagging


The three sentences in Table 1 share the same pattern, [title][organization][person], where only the [organization] and [person] tokens should be tagged as entities. However, in the first sentence, all the tokens “Ketua Umum Gerindra Prabowo Subianto” are mistagged as a single person’s name, although they span three different entities. In the second sentence, the red tokens indicate inconsistent annotation: “Suhardi” (a person’s name) is the only token of the phrase “Ketua Umum Partai Gerindra Suhardi” labeled as an entity by S&N (2016), while “Partai Gerindra” is not labeled as an organization. The blue tokens earlier in the same sentence present the correct annotation: “Politikus” is not labeled as an entity, but “PDI Perjuangan” and “Guruh Soekarnoputra” are labeled as organization and person, respectively. The third sentence shows the correct tag sequence: “Wakil Ketua Umum” [title], “Partai Gerindra” [organization], and “Edy Prabowo” [person]. These examples illustrate distinct annotations for the same pattern. We checked through the annotations and noticed that this type of mistagging occurs several times.

3.1.2 Dataset Re-annotation.

To fix the inconsistency in the existing dataset, we re-annotated it and provide a more standardized, publicly available NER dataset for the Indonesian language. The re-annotation was done manually by three native speakers. Of the five entity types included in S&N (2016), we kept the three common NER entities: location, organization, and person. We excluded the time and quantity entities because they are often written numerically, and a robust NER model can readily distinguish them. We thus focused on the ambiguous entities in the Indonesian language, which a model might not correctly recognize, and compared only the three entities in both datasets to make the results reasonably comparable. Regarding the dataset split, we used the same test set as S&N (2016) and randomly sampled instances from the training set to form a development set, as presented in Table 2(a).


Table 2. Data Statistics and Confusion Matrix of Our Re-annotation from S&N (2016)

3.1.3 Our Annotation Guideline.

To help the annotators agree on each entity, we set the following guidelines, describing the differences among ambiguous entities such as organizations based on the characteristics of the Indonesian language defined by Budi and Bressan [8] in their rule-based NER study.

  • Location (LOC): This indicates the name of a location where activities or events occur semantically. A location preposition usually comes before such entities, namely “di” (at), “ke” (to), or “dari” (from). Specific location names, such as a country or city name (e.g., Indonesia in “Indonesia is one of the largest countries”), when not used contextually as a location would not be annotated as a location. Conversely, an organization name (e.g., university or office) is sometimes used as a location name when the sentence refers to its building or location. In this case, we annotate the entity as a location name.

  • ORG: This indicates an organization’s name. The name of the organization is usually an official institution that is legally registered. For instance, “Universitas Indonesia” (a university), “Komisi Nasional Hak Asasi Manusia” (a national institution), and “Perserikatan Bangsa-Bangsa” (an intergovernmental organization).

  • Person (PER): This identifies a person’s name. Any form of the person’s name—full, nickname, or abbreviation—is annotated as one name. For example, “Abu Rizal Bakrie” is the full name of a person, who may also be mentioned as “Ical” (nickname) or “ARB” (abbreviation). A person’s title, such as “Pak” (Mr.) in “Pak Rizal” (Mr. Rizal), is not included in the person’s name; it is annotated as “Pak [Rizal]B-PER,” not as “[Pak]B-PER [Rizal]I-PER.”

  • An organization or person name that is sometimes written in full may, at other times, be written in its abbreviated form. When both forms appear, the annotation will be separated into two entities. For example, the sentence, “Universitas Gadjah Mada (UGM) berlokasi di Yogyakarta.” (Gadjah Mada University (UGM) is located in Yogyakarta) is annotated as follows:

    [Universitas]B-ORG [Gadjah]I-ORG [Mada]I-ORG ([UGM]B-ORG) berlokasi di [Yogyakarta]B-LOC

Regarding ambiguous entities, we also assign tags according to the word’s semantic meaning. For example, an organization name tends to be confused with a location name because of a preceding preposition or the dual meaning of the name as either the organization itself or the organization’s office where an activity happens. We used Fleiss’ kappa [14] to calculate the inter-annotator agreement of the three annotators and obtained a score of 0.92, which shows good reliability [5].
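Fleiss’ kappa can be computed directly from a subjects-by-categories count matrix. The following is a minimal from-scratch sketch of the statistic (libraries such as statsmodels also provide an implementation):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of annotators who assigned subject i to
    category j; every row must sum to the same rater count n.
    Assumes not all ratings fall in a single category (P_e < 1).
    """
    N = len(ratings)                  # number of subjects
    n = sum(ratings[0])               # raters per subject
    k = len(ratings[0])               # number of categories
    # Mean per-subject agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Chance agreement P_e from the marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement among 3 annotators on 3 subjects -> kappa = 1
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))
```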

We present the label changes of our annotation in Table 2(b). The number of location entities was reduced by approximately 20%, and the number of non-entity tokens decreased by almost 500. In contrast, the organization and person entities increased after the re-annotation. The table shows that S&N (2016) does not correctly label most of the organization entities: 701 organization tokens were labeled as non-entities.
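A label-change tally like the confusion matrix in Table 2(b) can be reproduced by counting token-level label pairs between the two annotations; a sketch with hypothetical tags:

```python
from collections import Counter

def tag_confusion(old_tags, new_tags):
    """Count token-level label changes between two annotations of the
    same corpus, with IOB prefixes stripped (B-ORG / I-ORG -> ORG)."""
    strip = lambda t: t.split("-")[-1]
    return Counter(
        (strip(o), strip(n)) for o, n in zip(old_tags, new_tags)
    )

old = ["O", "B-PER", "I-PER", "O", "O"]
new = ["O", "B-ORG", "B-PER", "O", "B-ORG"]
print(tag_confusion(old, new))
```

Each key is an (old label, new label) pair, so off-diagonal entries such as ('O', 'ORG') directly correspond to re-tagged tokens.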

3.2 Cross-lingual Transfer Dataset Sources

We investigate two approaches to unsupervised cross-lingual transfer learning for Indonesian NER; the methods are further explained in Section 4.2.

The first approach is single-source transfer, with English as the source language. We use the CoNLL-2003 dataset [47] as the source NER dataset. For the data transfer method using (c) parallel corpora in Section 4.2, we did not use an NER dataset from the source language. Instead, we labeled the source side of parallel English–Indonesian (EN-ID) corpora, as shown in Table 3. They consist of approximately 90K pairs of parallel sentences. Because sentences with no entities make the pseudo-data noisier, we removed them and obtained approximately 45K sentences for the pseudo-training data. For the NMT model we built to translate the English NER dataset into the Indonesian language, we trained on the IWSLT 2016 TED Talks dataset [10]. To examine whether a larger noisy dataset from Wikipedia would improve the performance of the model transfer method, we randomly took approximately 30K sentences from an Indonesian Wikipedia dump and used this unlabeled dataset as additional training data.

| Parallel Corpora | # Words | # Sentences | Source |
| PANL BPPT [9] | 1,041,756 | 24,024 | PAN Localization Project by BPPT |
| Asian Language Treebank (ALT) [40] | 879,217 | 20,106 | ALT Project - NICT |
| Global Voices v2018 [44] | 504,842 | 16,042 | OPUS GlobalVoices |
| SMERU | 1,102,434 | 26,966 | EN-ID Bilingual Corpus8 |
| AusAid | 129,579 | 3,112 | |
| BBC | 15,783 | 468 | |

Table 3. Statistics of the English–Indonesian (EN-ID) Parallel Corpora for the Parallel Corpora Scenario

The second approach is multi-source transfer, which extends the scope of the source languages in examining cross-lingual transfer. Table 4 summarizes the data statistics of the source languages’ NER datasets. We chose the languages with the most available NER datasets after English, so we adopted English (EN), Spanish (ES), Dutch (NL), and German (DE) as the source languages. The English dataset is CoNLL-2003 [47], the Spanish and Dutch NER datasets are from the CoNLL-2002 benchmark [46], and GermEval 2014 [6] is used as the German NER dataset. All datasets have more than the three standard entity types (location, organization, and person). Therefore, we omitted entity types other than these three to perform comparable transfer learning to our Indonesian NER dataset.

| Data Split | EN Sentence | EN Entity | ES Sentence | ES Entity | NL Sentence | NL Entity | DE Sentence | DE Entity |
| Train | 14,987 | 23,499 | 8,323 | 18,798 | 15,806 | 13,344 | 24,000 | 21,215 |
| Development | 3,466 | 5,942 | 1,915 | 4,351 | 2,895 | 2,616 | 2,200 | 1,790 |
| Test | 3,684 | 5,648 | 1,517 | 3,558 | 5,195 | 3,941 | 5,100 | 4,495 |
| Total | 22,137 | 35,089 | 11,755 | 26,707 | 23,896 | 19,901 | 31,300 | 27,500 |
  • English is from CoNLL-2003 [47], Spanish and Dutch are from CoNLL-2002 [46], and German is from GermEval 2014 [6].

Table 4. Data Statistics of Our Source Languages’ NER Datasets


All the training data for the Indonesian NER models in these approaches utilize the pseudo-labeled data obtained by transferring the knowledge from the source language’s dataset. Our gold standard Indonesian NER development and test datasets are used to evaluate the models.


4 OVERALL NER FRAMEWORK

Our work involved supervised monolingual NER and unsupervised cross-lingual transfer learning for NER. Our supervised monolingual NER is a vanilla BiLSTM-CRF model with BERT-based feature representation using the gold standard training data in the Indonesian language. For the cross-lingual transfer learning, all of the pseudo-data obtained from the cross-lingual transfer step is fed into the BiLSTM-CRF model, as in the monolingual NER problem, because they are already in the target language. Finally, we tested all of the monolingual and cross-lingual models on the test set of our re-annotated data to compare each model equally.

4.1 Supervised Monolingual NER

We employed two approaches to building the monolingual NER model: (1) fine-tuning the BERT-based models on the NER task and (2) BiLSTM-CRF. The IOB prefixes turn NER into a common sequence labeling task, such as POS tagging; applying BiLSTM-CRF to such tasks was introduced by Huang et al. [18]. BiLSTM-CRF is the previous state-of-the-art approach for English NER [3, 29] and was adopted in early Indonesian NER studies using deep learning. This method resulted in relatively good performance compared with rule-based and earlier machine learning approaches [17, 27, 52]. Wintaka et al. [52] applied BiLSTM-CRF using fastText [7] as the input representation. To further explore the use of PLMs, we also experimented with monolingual and multilingual BERT-based models as the input representation of BiLSTM-CRF. For both approaches, we used IndoBERT [51] and IndoLEM [24] as the monolingual models, and mBERT [12] and XLM-R [11] as the multilingual models. We did not apply the first approach to the fastText pre-trained model, because fine-tuning only works for BERT-based models.

In the next section, we use BiLSTM-CRF instead of fine-tuning the BERT-based models for our cross-lingual transfer learning, simply because our comparison of both approaches shows that BiLSTM-CRF generally performs better for Indonesian NER.

4.2 Unsupervised Cross-lingual Transfer Learning

We applied two methods of cross-lingual transfer learning, data transfer and model transfer, as illustrated in Figure 1. The details of the source languages and datasets are explained in the previous section. We performed fully unsupervised learning of the Indonesian NER model, where all of the pseudo-training data were obtained by transferring knowledge from other languages. We only used the test set of our gold Indonesian NER dataset to evaluate the models.


Fig. 1. Overview of our four cross-lingual transfer NER methods. We obtain an NER pseudo-dataset in the target language from all of the methods and use the data as an input for the BiLSTM-CRF model, as in the supervised monolingual NER.

4.2.1 Data Transfer.

Data transfer-based methods mainly capture knowledge from the dataset of the source languages as the source transfer. The simplest way to do so is by translating the data. Due to the limited size of our gold NER dataset in the Indonesian language, translation enlarges the dataset volume available for our models. We adopted three methods to implement the data transfer, further explained as follows.

(a) Vector-based transfer: This method translates sentences in an unsupervised way. It relies heavily on the source and target languages’ monolingual word embeddings, projecting them into a shared embedding space [30]. By aligning the vectors of both word embeddings, two words in different languages can be identified as a word pair via a nearest-neighbors search and share one annotated label. We translated the source languages’ NER datasets into the Indonesian language and directly copied the labels without a word alignment tool, since vector-based transfer performs a word-to-word translation of the language pair [55], as shown in Figure 1.
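As a sketch of this word-to-word translation step (with toy, hand-made vectors standing in for real aligned embeddings), each source word is mapped to the target word whose vector is nearest by cosine similarity, and its label is copied unchanged:

```python
import math

def nearest_neighbor_translate(word, src_emb, tgt_emb):
    """Word-to-word translation in a shared embedding space: pick the
    target word whose vector is closest (cosine) to the source word's
    vector. Toy vectors here; a real system would use embeddings
    aligned into one space beforehand."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    v = src_emb[word]
    return max(tgt_emb, key=lambda w: cos(v, tgt_emb[w]))

# Hypothetical 2-d vectors already projected into a shared space
src_emb = {"president": [0.9, 0.1], "palace": [0.1, 0.9]}
tgt_emb = {"presiden": [0.88, 0.12], "istana": [0.15, 0.85]}
print(nearest_neighbor_translate("president", src_emb, tgt_emb))
```

Because the translation is one word to one word, the source token’s IOB tag transfers directly to the translated token with no alignment step.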

(b) NMT transfer: Instead of performing a word-to-word translation with fixed alignment, NMT transfer requires an NMT model and a good word alignment tool to translate the data [49]. We trained an NMT model to translate the NER dataset from the source languages into the Indonesian language. Instead of mapping the labels using attention as in previous works, we projected the labels using a word alignment tool. In training the NMT model, we use the Fairseq PyTorch toolkit9 [35] with default hyperparameters of the transformer-based model implementation. To align the words from the source to the translated target language for projecting the labels, we used the word alignment tool Eflomal10 [58], which has better performance than its predecessors. Both methods translate the CoNLL-2003 English NER dataset [47] into the Indonesian language and project each token label to build pseudo-data in the target language.

(c) Parallel corpora: We exploit the high availability of parallel EN-ID corpora by labeling the source side of the parallel data and aligning the word pairs to project the labels, as in (b). Because Stanza is not available for the Indonesian language, we labeled the English source side of the parallel data presented in Table 3 using the off-the-shelf NER tool Stanza11 [38]. Once we obtained the annotations for the source side, we projected them onto the Indonesian target side. We again used Eflomal [58] to project the labels from the source to the target side and obtained the pseudo-data for Indonesian NER.
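The label projection step shared by (b) and (c) can be sketched as follows, assuming the aligner outputs (source index, target index) token pairs:

```python
def project_labels(src_tags, alignment, tgt_len):
    """Project IOB tags from source tokens onto target tokens through
    (src_index, tgt_index) alignment pairs. Unaligned target tokens
    fall back to 'O'. A real pipeline would also repair broken IOB
    sequences produced by word-order changes."""
    tgt_tags = ["O"] * tgt_len
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]
    return tgt_tags

# "Joko Widodo visited Bogor" -> "Joko Widodo mengunjungi Bogor"
src_tags = ["B-PER", "I-PER", "O", "B-LOC"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(project_labels(src_tags, alignment, 4))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```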

4.2.2 Model Transfer.

The model transfer is a cross-lingual transfer method that builds a shared model on the source language in a supervised way and tests the model directly on the target language [53, 54]. It is highly dependent on language-independent features of pre-trained models, such as multilingual word representations [11, 12, 53].

(d) Teacher–student model: Instead of directly testing the model on the target language, we trained a teacher model on the source language NER dataset and used it to predict NER labels for unlabeled data in the target language. The labeled dataset obtained from the teacher model’s predictions is then adopted as pseudo-labeled data to train a student model in the target language, which produces the final NER predictions. Specifically, the student model works the same way as the monolingual NER model used in the data transfer method once the NER pseudo-data in the target language are obtained. We used our IDNER-News-2K corpus, with its labels removed, and Indonesian Wikipedia dumps as the unlabeled data for the teacher model.
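The pipeline above can be summarized in a toy sketch. The "teacher" here is a deliberately trivial token-memorizing tagger standing in for the real cross-lingual model; only the pseudo-labeling flow, not the model itself, reflects the method.

```python
from collections import Counter, defaultdict

def train_teacher(labeled_src):
    """Toy 'training': memorize each token's majority label.
    A stand-in for training a real NER model on the source language."""
    counts = defaultdict(Counter)
    for tokens, labels in labeled_src:
        for t, l in zip(tokens, labels):
            counts[t][l] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}

def pseudo_label(teacher, unlabeled_tgt):
    """Teacher predicts labels for unlabeled target-language sentences."""
    return [(toks, [teacher.get(t, "O") for t in toks]) for toks in unlabeled_tgt]

labeled_src = [(["Jakarta", "is", "big"], ["B-LOC", "O", "O"])]
teacher = train_teacher(labeled_src)
pseudo = pseudo_label(teacher, [["Jakarta", "indah"]])
# pseudo == [(["Jakarta", "indah"], ["B-LOC", "O"])]
# The student is then trained on `pseudo` exactly like a monolingual NER model.
```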

4.3 Experiment Settings

All models were evaluated using the standard NER metrics: precision, recall, and F1 score. These metrics follow the exact-match evaluation of the CoNLL-2003 shared task [47].
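Exact-match evaluation scores an entity as correct only if both its span and its type agree with the gold annotation. The following is a minimal sketch of that computation over IOB tags; in practice a library such as seqeval or the CoNLL evaluation script is used, and the lenient handling of a span opened by "I-" is our own simplification.

```python
def extract_entities(labels):
    """Collect (start, end, type) spans from IOB tags."""
    ents, start, etype = set(), None, None
    for i, tag in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if start is not None and (tag == "O" or tag.startswith("B-") or
                                  (tag.startswith("I-") and tag[2:] != etype)):
            ents.add((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:  # lenient: I- opens a span
            start, etype = i, tag[2:]
    return ents

def prf(gold, pred):
    """Entity-level exact-match precision, recall, and F1."""
    g, p = extract_entities(gold), extract_entities(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = ["B-PER", "I-PER", "O", "B-ORG"]
pred = ["B-PER", "I-PER", "O", "B-LOC"]  # wrong type counts as a full miss
print(prf(gold, pred))  # (0.5, 0.5, 0.5)
```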

CRF. We included an implementation using CRF as a traditional NER method to show the impact of the re-annotation process without any assistance from word embeddings or PLMs. We used the spaCy CRFsuite toolkit12 and followed its basic feature-template configuration: lowercase, title-case, and uppercase features for the preceding and following tokens, and lowercase, title-case, uppercase, and digit features for the token itself. We set both the L1 and L2 regularization coefficients to 0.1.
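A feature template of this kind can be sketched as a function that maps each token position to a feature dictionary, as commonly fed to CRFsuite-style trainers. The exact feature names below are illustrative, not the toolkit's actual template keys.

```python
def token_features(tokens, i):
    """Case-shape features for the current, preceding, and following tokens,
    plus a digit feature for the current token (illustrative template)."""
    t = tokens[i]
    feats = {
        "lower": t.lower(),
        "istitle": t.istitle(),
        "isupper": t.isupper(),
        "isdigit": t.isdigit(),
    }
    if i > 0:
        p = tokens[i - 1]
        feats.update({"-1:lower": p.lower(), "-1:istitle": p.istitle(),
                      "-1:isupper": p.isupper()})
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        n = tokens[i + 1]
        feats.update({"+1:lower": n.lower(), "+1:istitle": n.istitle(),
                      "+1:isupper": n.isupper()})
    else:
        feats["EOS"] = True  # end of sentence
    return feats

sent = ["Kementerian", "Hukum"]
print(token_features(sent, 0)["istitle"])  # True
```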

BiLSTM-CRF. We adopted the BiLSTM-CRF implementation of FlairNLP13 [2]. We ran the experiment five times for each model and averaged the scores to ensure consistency, and we report the standard deviation of the overall F1 score to show the variation across runs. The dataset format follows Sang’s IOB format [45] with three entity types: LOC, ORG, and PER. The monolingual NER with BiLSTM-CRF is implemented with different input representations, namely fastText [15], IndoBERT [51], IndoLEM [24], mBERT [12], and XLM-R [11], with the following parameter settings: a mini-batch size of 32, one BiLSTM hidden layer, 256 BiLSTM hidden units, a dropout of 0.5, and a learning rate of 0.1. The framework implements early stopping; thus, we set a maximum of 200 epochs, and training stops once the model converges. When freezing the pre-trained model’s word representation as input for the BiLSTM-CRF, we averaged the subword vectors of each word.
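The subword-averaging step can be sketched as follows. The subword split and the vector values are hypothetical; real PLM vectors have hundreds of dimensions.

```python
def word_vector_from_subwords(subword_vectors):
    """Average the subword vectors to obtain one frozen vector per word,
    as done when feeding PLM representations into the BiLSTM-CRF."""
    dim = len(subword_vectors[0])
    return [sum(v[d] for v in subword_vectors) / len(subword_vectors)
            for d in range(dim)]

# Hypothetical pieces for one word, e.g. "Kementerian" -> ["Ke", "##menteri", "##an"]
pieces = [[1.0, 3.0], [2.0, 0.0], [3.0, 3.0]]
print(word_vector_from_subwords(pieces))  # [2.0, 2.0]
```

As discussed later in Section 5, this averaging is also a source of bias for multilingual models, whose tokenizers split words into many shared subwords.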

Fine-tuning BERT-based models. We fine-tuned the BERT-based PLMs (IndoBERT, mBERT, and XLM-RoBERTa), setting the batch size to 32, the learning rate to one of [1e-5, 3e-5, 5e-5, 7e-5], five epochs for each model, and Adam as the optimizer [23]. As with the BiLSTM-CRF method, we ran each model five times with different seeds and averaged the scores.

Scenarios for the cross-lingual transfer. We investigated single-source and multi-source configurations for the transfer learning methods introduced in Section 3.2. For the single-source setting, all data and model transfer methods are applied. However, for the multi-source setting, we implemented only vector-based data transfer (a) and teacher–student learning (d), owing to the limited parallel data available between Spanish, Dutch, or German and Indonesian.

We implemented the BiLSTM-CRF approach with XLM-RoBERTa [11] for both settings, instead of fine-tuning the pre-trained language model for the model transfer, as in Wu et al. [53]. Our study in the supervised monolingual setting shows that the BiLSTM-CRF approach generally performed better than fine-tuning the transformer-based pre-trained model. We chose XLM-RoBERTa because IndoBERT does not cover characters beyond the basic Latin set used in Indonesian, whereas XLM-RoBERTa was pre-trained on many languages, including Spanish and German, which use characters from the Latin-1 Supplement. When creating the pseudo-data using the teacher models in the multi-source setting, we followed a majority voting scheme [37]. Five scenarios were used for the multi-source approach: (1) all four languages (EN-ES-NL-DE); the rest form an ablation study in a leave-one-out manner: (2) no German (EN-ES-NL); (3) no Dutch (EN-ES-DE); (4) no Spanish (EN-NL-DE); and (5) no English (ES-NL-DE).
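Majority voting over the single-source teachers' predictions can be sketched per token as follows. The tie-breaking rule (falling back to the outside tag "O") is our own assumption; the paper cites [37] for the voting scheme without detailing tie handling.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-token label predictions from several single-source teachers.

    predictions: one label sequence per source language, all over the same
    target sentence. Ties fall back to "O" (an assumption, not from [37]).
    """
    voted = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            voted.append("O")
        else:
            voted.append(counts[0][0])
    return voted

teachers = [
    ["B-PER", "O", "B-ORG"],  # e.g., English teacher
    ["B-PER", "O", "B-ORG"],  # e.g., Dutch teacher
    ["B-PER", "O", "O"],      # e.g., German teacher
]
print(majority_vote(teachers))  # ['B-PER', 'O', 'B-ORG']
```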


5 RESULTS

Tables 5–7 summarize our experimental results for supervised monolingual NER with our re-annotated dataset, and Tables 8 and 9 present our results on the unsupervised cross-lingual transfer methods. Generally speaking, our re-annotation exhibits superior performance compared with the annotation of S&N (2016) when trained using the baseline model. Our monolingual and multilingual word embedding experiments yielded the best results when using IndoBERT as a feature representation for the BiLSTM-CRF architecture. The cross-lingual setting with a shared word embedding representation also shows results competitive with our baseline in the monolingual scenario.

| Model | Train | Test | Overall (P / R / F) | LOC (P / R / F) | ORG (P / R / F) | PER (P / R / F) |
|---|---|---|---|---|---|---|
| CRF | S&N | S&N | 64.53 / 67.07 / 65.78 | 68.85 / 86.60 / 76.71 | 44.29 / 74.42 / 55.53 | 79.60 / 60.43 / 68.70 |
| CRF | Ours | S&N | 62.50 / 73.47 / 67.54 | 75.68 / 86.60 / 80.77 | 40.36 / 90.12 / 55.76 | 81.57 / 64.73 / 72.18 |
| CRF | S&N | Ours | 80.92 / 60.35 / 69.14 | 75.41 / 89.32 / 81.78 | 76.84 / 54.61 / 63.85 | 86.12 / 59.03 / 70.05 |
| CRF | Ours | Ours | 87.25 / 73.90 / 80.02 | 82.88 / 89.32 / 85.98 | 85.64 / 81.80 / 83.67 | 90.24 / 64.66 / 75.34 |
| BiLSTM-CRF + fastText (baseline) | S&N | S&N | 69.20 / 84.55 / 76.11 | 74.82 / 85.78 / 79.92 | 45.54 / 83.02 / 58.82 | 83.60 / 84.86 / 84.23 |
| BiLSTM-CRF + fastText (baseline) | Ours | S&N | 66.39 / 88.28 / 75.79 | 79.05 / 85.57 / 82.18 | 39.64 / 88.95 / 54.84 | 84.95 / 88.60 / 86.74 |
| BiLSTM-CRF + fastText (baseline) | S&N | Ours | 89.27 / 80.06 / 84.41 | 80.70 / 85.19 / 82.88 | 88.12 / 71.21 / 78.77 | 92.13 / 85.91 / 88.91 |
| BiLSTM-CRF + fastText (baseline) | Ours | Ours | 92.23 / 89.52 / 90.85 | 89.02 / 88.52 / 88.76 | 89.13 / 88.13 / 88.63 | 95.50 / 90.83 / 93.10 |

  • The bold scores show the best score for both models when tested on our test set, and the underlined scores present the best score when tested on the S&N (2016) test set.

Table 5. Traditional Technique and Baseline Model Performance Comparison of S&N (2016) and Our Annotation (Ours)


5.1 Supervised Monolingual Indonesian NER

Annotation performance. We present a comparison of both annotations’ performance on a traditional technique (CRF) and our baseline model (BiLSTM-CRF) in Table 5. To investigate both models’ performance in the same setting, we cross-tested each model on both test sets, as shown in the table. Using the baseline model, the model trained on the S&N (2016) data shows different scores on the two test sets: an F1 score of 76.11 when tested on S&N (2016) and 84.41 when tested on our re-annotation (Ours). Given the same training data, the lower score on S&N (2016) and the higher score on Ours indicate that inconsistent labels can harm the evaluation of a model’s performance. Entity-level F1 scores confirm this phenomenon through a jump of approximately 20 points on the organization entity; correspondingly, our re-annotation confusion matrix in Table 2(b) shows that the largest number of corrected labels went from O to ORG. We believe that consistent annotation, especially on the test set, allows the evaluation to reflect actual model performance and avoids spurious false-negative cases.

Additionally, training the model on Ours increases the scores across all entities, with a relatively high overall F1 score of 90.85. This demonstrates that inconsistency in the dataset can cause low prediction scores and that our re-annotation improved model performance.

Monolingual vs. multilingual pre-trained models. We present the results of using pre-trained monolingual and multilingual BERT models for our Indonesian NER task in Table 6. Using IndoBERT as a feature representation for the BiLSTM-CRF architecture yields the best score of 94.90. Both ways of exploiting IndoBERT, as a feature representation and via fine-tuning, yield very high organization scores. IndoBERT used the Indo4B dataset in its pre-training, which contains Indonesian news corpora, the same source as our NER dataset [51]. Therefore, the rich Indonesian vocabulary covered by IndoBERT, the domain similarity between Indo4B and our dataset, and the sequence-based architecture of BiLSTM-CRF together fit the Indonesian NER task well. We performed a two-tailed statistical significance test at the 95% confidence level for all of the compared models. Most of the models are statistically different from the baseline, marked with ‡, except for the fine-tuned mBERT model. Regarding the comparison against the best model (BiLSTM-CRF with IndoBERT), all of the compared models are significantly different, marked with *.

| Input Representation | Model | P | R | F | LOC F1 | ORG F1 | PER F1 |
|---|---|---|---|---|---|---|---|
| IndoBERT | Fine-tune | 91.66 | 94.64 | 93.13 ± 0.31 ‡* | 83.84 | 91.81 | 96.83 |
| mBERT | Fine-tune | 89.54 | 92.33 | 90.91 ± 0.51 * | 86.73 | 87.78 | 94.11 |
| XLM-R | Fine-tune | 91.60 | 94.76 | 93.15 ± 0.23 ‡* | 85.59 | 90.46 | 97.41 |
| fastText (baseline) | BiLSTM-CRF | 92.23 | 89.52 | 90.85 ± 0.08 * | 88.76 | 88.63 | 93.10 |
| IndoBERT | BiLSTM-CRF | 94.93 | 94.87 | 94.90 ± 0.31 ‡ | 89.84 | 92.99 | 97.49 |
| IndoLEM | BiLSTM-CRF | 93.96 | 93.94 | 93.95 ± 0.40 ‡* | 91.87 | 91.21 | 96.51 |
| mBERT | BiLSTM-CRF | 90.11 | 89.80 | 89.95 ± 0.05 ‡* | 87.30 | 86.54 | 93.20 |
| XLM-R | BiLSTM-CRF | 91.12 | 94.05 | 92.56 ± 0.54 ‡* | 89.01 | 88.95 | 96.25 |
  • mBERT and XLM-R are multilingual PLMs; the others are monolingual Indonesian PLMs. We performed a two-tailed test at the 95% confidence level for each model pair comparison. The mark ‡ indicates a model that is significantly different from the baseline, while * indicates a model that differs significantly from the best model (IndoBERT + BiLSTM-CRF).

Table 6. NER Model Performance for the Contextual Embedding Experiment on Our Re-annotated Dataset


In the multilingual setting, XLM-R performs better than mBERT. Richer and larger datasets give more information to a neural network model, and XLM-R is pre-trained on large-scale unsupervised multilingual data [11, 12]. The XLM-R model can work better in an NER task because entity names, particularly organizations, sometimes originate from English or other languages [51]. However, most of the entity names in our dataset are in Indonesian; thus, IndoBERT is well suited to our vocabulary. Furthermore, mBERT and XLM-R apply similar tokenization, splitting a longer token into more common subwords [25]. Each subword has its own vector, and averaging these vectors when freezing the word representation for the BiLSTM-CRF input can bias the final vector of each word. Multilingual models are trained on many languages, so the subword representations are shared with other languages as well.

Other Indonesian NER datasets. Experimenting with the models using only our re-annotated dataset may seem biased, so we include other recently published Indonesian NER datasets from IndoNLU [51] and IndoLEM [24] to offer a fair testbed for the models. For these additional datasets, we only ran the BiLSTM-CRF model, following its success on our dataset. The summary of our experiments on the four datasets is shown in Table 7. The CodaLab row presents the high competition scores for the two competition datasets, and the fine-tuned BERT-based models and IndoLEM rows represent model performance from previous works. Our BiLSTM-CRF with BERT-based feature representations clearly outperformed the previous methods on all datasets, as well as the high scores from the CodaLab competition. However, each dataset has its preferred feature representation due to differing dataset characteristics. IndoBERT-LARGE has a larger model size than the other pre-trained models, which aligns with the fact that NERP is four times larger than the other datasets; this result also matches the fine-tuning results from the previous work. Regarding NERGrit, we found that some of its vocabulary is in English, making it reasonable that the multilingual models performed better than the monolingual ones. In general, the results related to feature representation are difficult to summarize, yet we showed that BiLSTM-CRF with pre-trained models as the feature representation yielded better performance than fine-tuning the same pre-trained models.

| Features / Models | NERP | NERGrit | NERUI | NERUGM |
|---|---|---|---|---|
| CodaLab | 77.20 | 77.70 | | |
| Fine-tune BERT-based models [51] | 79.25 | 79.09 | | |
| Fine-tune IndoLEM [24] | | | 90.1 | 74.9 |
| BiLSTM-CRF + fastText (baseline) | 74.60 ± 0.67 | 69.01 ± 1.47 | 86.02 ± 2.00 | 76.64 ± 0.48 |
| BiLSTM-CRF + IndoBERT-BASE | 77.56 ± 0.31 | 78.44 ± 0.59 | 91.83 ± 0.44 | 79.84 ± 0.49 |
| BiLSTM-CRF + IndoBERT-LARGE | 79.70 ± 0.31 | 77.96 ± 0.86 | 93.84 ± 0.26 | 80.08 ± 1.56 |
| BiLSTM-CRF + IndoLEM | 78.19 ± 0.52 | 74.47 ± 0.71 | 91.68 ± 0.42 | 81.02 ± 0.98 |
| BiLSTM-CRF + mBERT | 75.61 ± 2.18 | 79.50 ± 0.76 | 90.42 ± 0.44 | 81.01 ± 0.80 |
| BiLSTM-CRF + XLM-R | 77.33 ± 0.72 | 78.57 ± 0.47 | 91.97 ± 0.25 | 81.84 ± 0.34 |
  • NERP and NERGrit are datasets from IndoNLU [51], and they masked the test data for a competition in CodaLab. The scores from IndoNLU for NERP and NERGrit are obtained by fine-tuning a pre-trained IndoBERTLARGE and XLM-R model, respectively. Thus, we also include IndoBERTLARGE in the experiment as a fair comparison to the previous work. We underline the scores using the same pre-trained models applied in the previous works with the fine-tuning method.

Table 7. BiLSTM-CRF Model Performance with Various BERT-based Feature Representations on the Additional Datasets: NERP, NERGrit, NERUI, and NERUGM


5.2 Unsupervised Cross-lingual Transfer Learning

Single-source. Table 8 displays the results of the single-source cross-lingual transfer from English as the source language to Indonesian as the target language. Rows 1–9 show the data transfer results, with vector-based transfer in rows 1–3, NMT in rows 4–6, and parallel corpora in rows 7–9. Both NMT and parallel corpora used Eflomal to align the word translations. The last four rows are the teacher–student model, comparing our IDNER-News-2K dataset and Indonesian Wikipedia dumps as the unlabeled training data, and XLM-RoBERTa and multilingual BERT as the transformer pre-trained models.

| Transfer Method | Training Data | Feature | P | R | F |
|---|---|---|---|---|---|
| Vector-based Transfer | CoNLL-2003 | IndoBERT | 90.96 | 87.80 | 89.35 ± 0.44 |
| | | XLM-RoBERTa | 88.40 | 86.47 | 87.42 ± 0.70 |
| | | mBERT | 84.17 | 81.28 | 82.69 ± 0.51 |
| NMT-based with Eflomal | CoNLL-2003 | IndoBERT | 88.62 | 85.02 | 86.78 ± 0.51 |
| | | XLM-RoBERTa | 87.09 | 79.37 | 83.05 ± 0.86 |
| | | mBERT | 82.91 | 75.92 | 79.25 ± 2.50 |
| Parallel Corpora with Eflomal | Global News Parallel Data | IndoBERT | 84.68 | 80.02 | 82.28 ± 0.76 |
| | | XLM-RoBERTa | 81.68 | 75.77 | 78.61 ± 0.65 |
| | | mBERT | 75.98 | 66.26 | 70.79 ± 0.59 |
| Teacher-Student Learning | IDNER-News-2K (ours) | XLM-RoBERTa | 81.79 | 87.95 | 84.76 ± 0.20 |
| | | mBERT | 74.00 | 81.12 | 77.40 ± 0.37 |
| | IDWiki-dumps | XLM-RoBERTa | 82.08 | 86.12 | 84.05 ± 0.16 |
| | | mBERT | 74.22 | 79.79 | 76.90 ± 0.30 |

Table 8. Single-source Cross-lingual Transfer (from English) Comparing Data and Model Transfer-based Methods using the BiLSTM-CRF

BiLSTM-CRF with IndoBERT always yields the best performance in each data transfer scenario. Translating a gold NER dataset from the source language using vector-based transfer achieved the best score. This occurred because vector-based transfer performs a word-to-word translation in which the word order is unchanged, unlike in NMT or parallel corpora. Meanwhile, when comparing NMT and parallel corpora, onto which we projected the labels based on their word alignment, Table 8 demonstrates that fine-tuning performs better on the parallel corpora, whereas BiLSTM-CRF performs better on the data translated from the CoNLL-2003 English NER dataset.

Multi-source. We summarize the results of the multi-source cross-lingual transfer experiment in Table 9. Overall, vector-based transfer outperforms the teacher–student learning method with Spanish, Dutch, and German as the source languages. In the vector-based transfer scenario, Dutch gives the largest gain when transferred to Indonesian, and the model without English as a source language performs best. Interestingly, BiLSTM-CRF + XLM-RoBERTa with Spanish, Dutch, and German achieves better results than English alone in the single-source setting (Table 8). Although teacher–student learning did not perform as well as the vector-based approach, it still achieves competitive results. In contrast to vector-based transfer, German gives the most significant gain there: the lowest score in teacher–student learning comes from the model without the German NER dataset as a source (No German).

| Method | Languages | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Vector-based Transfer | EN (single-source) | 88.40 | 86.47 | 87.42 ± 0.70 |
| | All four languages | 89.19 | 86.83 | 88.00 ± 0.43 |
| | No German | 88.08 | 89.46 | 88.76 ± 0.64 |
| | No Dutch | 89.78 | 86.69 | 88.21 ± 0.29 |
| | No Spanish | 89.65 | 87.42 | 88.52 ± 0.28 |
| | No English | 90.70 | 87.69 | 89.17 ± 0.43 |
| Teacher-Student Learning | EN (single-source) | 83.76 | 88.50 | 86.07 ± 0.52 |
| | All four languages | 85.09 | 86.66 | 85.86 ± 0.18 |
| | No German | 82.79 | 86.22 | 84.47 ± 0.27 |
| | No Dutch | 85.14 | 86.66 | 85.89 ± 0.50 |
| | No Spanish | 85.32 | 87.26 | 86.28 ± 0.29 |
| | No English | 84.84 | 85.00 | 84.92 ± 0.27 |
  • All four languages (EN-ES-NL-DE) means that the source languages are English, Spanish, Dutch, and German. The remaining values are the ablation for each source language (e.g., No German is equal to EN-ES-NL). All models were trained on BiLSTM-CRF with XLM-RoBERTa for the input representation.

Table 9. Multi-source Cross-lingual Transfer Results from English (EN), Spanish (ES), Dutch (NL), and German (DE) as the Source Languages



6 DISCUSSION

In this section, we present some example errors of the model trained on the S&N (2016) annotation compared to ours. Moreover, we clarify the effect of monolingual and multilingual information on the prediction results.

Re-annotation. Sentence 1 of Table 10 demonstrates the annotation errors of S&N (2016) and its predictions in the case of recognizing a person’s name. The words “Ketua Umum Gerindra” were labeled as part of a person’s name, whereas “Ketua Umum” is a person’s title. When predicting the tokens, the S&N (2016) model correctly spots “Prabowo Subianto” as a person’s name but misses “Gerindra” as an organization name. In contrast, our model appropriately leaves “Ketua Umum” untagged and correctly recognizes both “Gerindra” and “Prabowo Subianto”.

Sentence 1: Joko Widodo met Gerindra’s Chairman Prabowo Subianto

| Indonesian | English gloss | S&N (2016) annotation | S&N (2016) fastText | Our annotation | Our fastText |
|---|---|---|---|---|---|
| Joko | Joko | B-PER | B-PER | B-PER | B-PER |
| Widodo | Widodo | I-PER | I-PER | I-PER | I-PER |
| bertemu | met | O | O | O | O |
| Ketua | chairman | B-PER | O | O | O |
| Umum | general | I-PER | O | O | O |
| Gerindra | Gerindra | I-PER | O | B-ORG | B-ORG |
| Prabowo | Prabowo | I-PER | B-PER | B-PER | B-PER |
| Subianto | Subianto | I-PER | I-PER | I-PER | I-PER |

Sentence 2: Required by the Ministry of Law and Human Rights

| Indonesian | English gloss | S&N (2016) annotation | S&N (2016) fastText | Our annotation | Our fastText |
|---|---|---|---|---|---|
| disyaratkan | required | O | O | O | O |
| oleh | by | O | O | O | O |
| Kementerian | ministry | O | B-ORG | B-ORG | B-ORG |
| Hukum | law | O | I-ORG | I-ORG | I-ORG |
| dan | and | O | I-ORG | I-ORG | I-ORG |
| Hak | right | O | I-ORG | I-ORG | I-ORG |
| Asasi | basic | O | I-ORG | I-ORG | I-ORG |
| Manusia | human | O | I-ORG | I-ORG | I-ORG |
  • The tokens in red indicate the incorrect, and those in blue indicate the correct one.

Table 10. Examples of Errors in Prediction Comparing S&N (2016) and Our Annotation When Trained Using Baseline BiLSTM-CRF Model


In Sentence 2, the S&N (2016) annotation did not tag the words “Kementerian Hukum dan Hak Asasi Manusia” (“The Ministry of Law and Human Rights” in English) as an organization name, yet both models predicted all of the tokens accurately. These errors exhibit how inconsistency in a labeled dataset impacts the inference of a model and worsens its score. Both sentences resulted in false-positive cases in the evaluation step: when the model prediction was correct but the annotation was incorrect, the prediction was still counted as a false prediction.

Model prediction in monolingual and multilingual settings. To investigate the impact of monolingual and multilingual pre-trained models, we thoroughly analyzed the predictions of each model in Table 6. Table 11 presents the number of errors for each model. During the error analysis, we counted only the cases in which the model falsely predicted the entity, and we ignored prefix differences in the labels unless an entity did not begin with “B-”. Aligned with our findings during re-annotation, the organization entity has the highest number of errors. The second-largest problem occurred for the person entity, where the models could not distinguish non-person names. The fastText model has a moderate number of errors on the organization and person entities, mainly because of the out-of-vocabulary (OOV) problem, where the tokens were absent from the training and development sets.


Table 11. Number of Errors in Prediction Results of Each Model from Table 6

Most of the errors in the O→ORG case for IndoBERT (BiLSTM-CRF: 25 cases; fine-tuning: 28 cases) occurred because the models are highly context-sensitive: they recognized tokens with an underlying organizational meaning as organization entities (e.g., tokens starting with the word “lembaga” (institution) or “gerakan” (movement)), even though these tokens did not mention any official organization name. Meanwhile, the multilingual models’ errors occurred because the tokens begin with an uppercase letter (Xxx) or are fully uppercase (XXX). This phenomenon illustrates that multilingual models are good at learning the internal word structure of a language. In the ORG→O case, most errors occurred in mBERT because the organization tokens do not start with an uppercase letter, as is also the case for XLM-R. For IndoBERT, this type of error occurred because of the OOV problem.

Regarding O→PER, it appeared for a reason similar to O→ORG in IndoBERT: the model falsely recognized tokens that mention a person’s title or refer to a person. It contextually recognized the tokens as a person’s position, even though no person’s name was mentioned. The last error type occurs when the model predicts entities whose first token lacks the prefix “B-”. This happened mainly in the fine-tuned models, which do not have a CRF decoder. We hypothesize that it occurred because fine-tuning uses only a softmax classification layer, which determines the final output from the probability score without considering the previous tag, as a CRF does. Therefore, in some instances, the model may lose critical information, such as the beginning or the middle of an entity.
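The difference can be illustrated with a toy decoding example. The probabilities below are hypothetical, and the constrained decoder is a deliberately minimal stand-in for the transition scores a CRF layer learns, not the actual Viterbi decoding used by the models.

```python
def greedy_decode(scores):
    """Independent per-token argmax, as in fine-tuning with a softmax head."""
    return [max(s, key=s.get) for s in scores]

def constrained_decode(scores):
    """Greedy decode that forbids I-PER unless the previous tag is B-PER/I-PER,
    mimicking the transition knowledge a CRF decoder would apply."""
    out, prev = [], "O"
    for s in scores:
        allowed = {l: v for l, v in s.items()
                   if not (l == "I-PER" and prev not in ("B-PER", "I-PER"))}
        prev = max(allowed, key=allowed.get)
        out.append(prev)
    return out

# Hypothetical per-token label probabilities for two tokens.
scores = [{"O": 0.5, "B-PER": 0.3, "I-PER": 0.2},
          {"O": 0.1, "B-PER": 0.2, "I-PER": 0.7}]
print(greedy_decode(scores))       # ['O', 'I-PER']  (invalid: I- without B-)
print(constrained_decode(scores))  # ['O', 'B-PER']
```

The softmax head happily emits “I-PER” with no preceding “B-PER”, exactly the prefix error observed in the fine-tuned models.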

Contribution of different source languages to each transfer method. Our experiments showed that vector-based transfer performed well in the multi-source setting, with Dutch giving the largest gain to the model. Almost 6,000 Indonesian words are borrowed from Dutch, owing to the two nations’ long shared history [41]. The vector-based transfer method depends significantly on the monolingual models of the source and target languages, and these models are fastText-based, in which the word vectors are built from word morphology. Dutch and Indonesian share similar morphological patterns and the Latin alphabet: consonant-vowel patterns appear frequently within words in both languages, in contrast to German and English, where words often contain three or more consecutive consonants. Therefore, similarity in word morphology benefits cross-lingual transfer using the vector-based method.

Regarding model transfer, German contributes most to teacher–student learning. It has the largest dataset, approximately 31K sentences, whereas the other source languages have fewer than 20K. Considering that model transfer relies on cross-lingual representations, among the source languages in this experiment, English, Indonesian, and German have the largest amounts of data in XLM-RoBERTa’s pre-training. We conducted an extended experiment to investigate the contributions of dataset size and the multilingual PLMs to the German teacher model in the model transfer method, performed in the following two ways. (1) Changing the input representation for BiLSTM-CRF: we employed the monolingual fastText and BERT model of each language, and mBERT as a comparison to XLM-RoBERTa. (2) Reducing the training set size of the source languages to the smallest, approximately 8K sentences.

We summarize the F1 scores of the models in Figure 2. The downsampled BERT and fastText (monolingual) models show that the No German models perform better after the training data reduction, which means that the larger amount of data contributes to the superior performance of German in the multi-source model transfer. However, the English models improve merely by replacing the multilingual model with the monolingual one. The No English models perform worst in the ablation, showing that the large corpora used to pre-train English models help the monolingual models form subwords that are beneficial to Indonesian.

Fig. 2.

Fig. 2. Ablation of the input representation and source dataset size to investigate the contribution of German in the multi-source model transfer. All four languages: the source languages are English, Spanish, Dutch, and German. The remainder (No X) are the ablation for each source language. BERT and FastText Input Representation: Instead of using multilingual PLMs when training the teacher and student models, we used the monolingual model of the corresponding language. Downsampled to 8K: We reduced the training set size of the source language following the least (Spanish) for approximately 8K sentences.

In general, both vector-based and model transfer methods in the multi-source scenario improve the performance of cross-lingual transfer learning compared to the single-source scenario. Using multiple languages as the source transfer demonstrates a practical approach for the Indonesian NER task without ignoring the transfer methods and the source–target languages’ similarities.


7 CONCLUSIONS AND FUTURE WORK

We built a more consistent Indonesian NER dataset by re-annotating a previously inconsistent dataset and made it publicly available for further use to the research community. Our annotation resulted in an F1 score of 90.85 with the baseline, with fastText as the input representation. We also compared the use of monolingual and multilingual BERT-based pre-trained models to obtain a more robust model for tackling word ambiguity problems in the Indonesian NER task. We found that the sequence model architecture of BiLSTM-CRF combined with the monolingual IndoBERT pre-trained model yielded a very high F1 score of 94.90. We demonstrated the robustness of the model by examining four other Indonesian NER datasets, namely, NERP, NERGrit, NERUI, and NERUGM. We showed that combining the BiLSTM-CRF model with a BERT-based input representation surpassed the fine-tuning methods for all datasets. However, the BERT-based pre-trained model’s choice varies depending on each dataset’s unique characteristics.

In addition, we showed that single- and multi-source cross-lingual transfer from high-resource languages gives promising results with the data transfer method, projecting the entity labels via vector-based transfer. Interestingly, we found that Dutch provides results competitive with English, owing to the morphologically and phonetically shared vocabulary of Dutch and Indonesian. Our cross-lingual transfer experiments show that multilingual transfer learning can be an alternative for low-resource languages and can be implemented at a lower cost.

Our work is currently limited to the general domain and specific to the NER task. However, specific domain NER and cross-lingual transfer knowledge in other tasks for the Indonesian language are currently less explored. For future work, we aim to investigate the contribution of each language source to the Indonesian NER task. Taking advantage of the cross-lingual transfer method to increase the gold dataset’s size may also help enhance the model’s performance. Moreover, pre-training an entity-aware language model to increase the robustness of the model to solve the entity ambiguity problem for the Indonesian language would be worth exploring.

Footnotes

  1. https://github.com/khairunnisaor/idner-news-2k/
  2. https://www.kompas.com/
  3. https://www.tempo.co/
  4. https://www.tribunnews.com/
  5. https://universaldependencies.org/
  6. https://github.com/yusufsyaifudin/Indonesia-ner
  7. We use the term “S&N (2016)” for the dataset from the original authors, which we re-annotated, and “NERUGM” for the split published by Koto et al. [24] that is used in our extended experiments.
  8. https://github.com/desmond86/Indonesian-English-Bilingual-Corpus
  9. https://github.com/pytorch/fairseq
  10. https://github.com/robertostling/eflomal
  11. https://stanfordnlp.github.io/stanza/index.html
  12. https://github.com/talmago/spacy_crfsuite
  13. https://github.com/flairNLP/flair

REFERENCES

  [1] Ahmad Wasi Uddin, Zhang Zhisong, Ma Xuezhe, Chang Kai-Wei, and Peng Nanyun. 2019. Cross-lingual dependency parsing with unlabeled auxiliary languages. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL’19). 372–382.
  [2] Akbik Alan, Bergmann Tanja, Blythe Duncan, Rasul Kashif, Schweter Stefan, and Vollgraf Roland. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 54–59.
  [3] Akbik Alan, Blythe Duncan, and Vollgraf Roland. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics. 1638–1649.
  [4] Alfina Ika, Manurung Ruli, and Fanany Mohamad I.. 2016. DBPedia entities expansion in automatically building dataset for Indonesian NER. In Proceedings of the International Conference on Advanced Computer Science and Information Systems (ICACSIS’16). 335–340.
  [5] Artstein Ron and Poesio Massimo. 2008. Survey article: Inter-coder agreement for computational linguistics. Comput. Linguist. 34, 4 (2008), 555–596.
  [6] Benikova Darina, Biemann Chris, and Reznicek Marc. 2014. NoSta-D named entity annotation for German: Guidelines and dataset. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14). 2524–2531.
  [7] Bojanowski Piotr, Grave Edouard, Joulin Armand, and Mikolov Tomas. 2017. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5 (2017), 135–146.
  [8] Budi Indra and Bressan Stéphane. 2007. Application of association rules mining to named entity recognition and co-reference resolution for the Indonesian language. Int. J. Bus. Intell. Data Min. 2, 4 (2007), 426–446.
  [9] Budiono, Riza Hammam, and Hakim Chairil. 2009. Resource report: Building parallel text corpora for multi-domain translation system. In Proceedings of the 7th Workshop on Asian Language Resources (ALR7). 92–95.
  [10] Cettolo M., Niehues Jan, Stüker Sebastian, Bentivogli L., Cattoni R., and Federico Marcello. 2016. The IWSLT 2016 evaluation campaign. In Proceedings of the International Workshop on Spoken Language Translation. 1–14.
  [11] Conneau Alexis, Khandelwal Kartikay, Goyal Naman, Chaudhary Vishrav, Wenzek Guillaume, Guzmán Francisco, Grave Edouard, Ott Myle, Zettlemoyer Luke, and Stoyanov Veselin. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8440–8451.
  [12] Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.
  [13] Eskander Ramy, Muresan Smaranda, and Collins Michael. 2020. Unsupervised cross-lingual part-of-speech tagging for truly low-resource scenarios. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 4820–4831.
  [14] Fleiss Joseph L.. 1971. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 5 (1971), 378–382.
  [15] Grave Edouard, Bojanowski Piotr, Gupta Prakhar, Joulin Armand, and Mikolov Tomas. 2018. Learning word vectors for 157 languages. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18). 3483–3487. https://www.aclweb.org/anthology/L18-1550/.
  [16] Gunawan William, Suhartono Derwin, Purnomo Fredy, and Ongko Andrew. 2018. Named-entity recognition for Indonesian language using bidirectional LSTM-CNNs. In Proceedings of the 3rd International Conference on Computer Science and Computational Intelligence (ICCSCI’18): Empowering Smart Technology in Digital Era for a Better Life, Vol. 135. 425–432.
  [17] Hoesen Devin and Purwarianti Ayu. 2018. Investigating Bi-LSTM and CRF with POS tag embedding for Indonesian named entity tagger. In Proceedings of the International Conference on Asian Language Processing (IALP’18). 35–38.
  [18] Huang Zhiheng, Xu Wei, and Yu Kai. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991 [cs.CL]. Retrieved from https://arxiv.org/abs/1508.01991.
  [19] Ikhwantri Fariz. 2019. Cross-lingual transfer for distantly supervised and low-resources Indonesian NER. arXiv:1907.11158. Retrieved from http://arxiv.org/abs/1907.11158.
  [20] Jain Alankar, Paranjape Bhargavi, and Lipton Zachary C.. 2019. Entity projection via machine translation for cross-lingual NER. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 1083–1092.
  [21] Ji Baijun, Zhang Zhirui, Duan Xiangyu, Zhang Min, Chen Boxing, and Luo Weihua. 2020. Cross-lingual pre-training based transfer for zero-shot neural machine translation. Proceedings of the AAAI Conference on Artificial Intelligence 34, 01 (2020), 115–122.
  [22] Khairunnisa Siti Oryza, Imankulova Aizhan, and Komachi Mamoru. 2020. Towards a standardized dataset on Indonesian named entity recognition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop. 64–71.
  [23] Kingma Diederik P. and Ba Jimmy. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
  [24] Koto Fajri, Rahimi Afshin, Lau Jey Han, and Baldwin Timothy. 2020. IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics. 757–770.
  [25] Kudo Taku. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 66–75.
  [26] Kurniawan Kemal, Frermann Lea, Schulz Philip, and Cohn Trevor. 2021. PPT: Parsimonious parser transfer for unsupervised cross-lingual adaptation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2907–2918.
  [27] Kurniawan Kemal and Louvan Samuel. 2018. Empirical evaluation of character-based model on neural named-entity recognition in Indonesian conversational texts. In Proceedings of the EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text. 85–92.
  [28] Kuspriyanto K., Santoso O. S., Widyantoro D. H., Sastramihardja H., Muludi K., and Maimunah Siti. 2010. Performance evaluation of SVM-based information extraction using τ margin values. Int. J. Electr. Eng. Inf. 2 (2010), 256–265.
  [29] Lample Guillaume, Ballesteros Miguel, Subramanian Sandeep, Kawakami Kazuya, and Dyer Chris. 2016. Neural architectures for named entity recognition. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 260–270.
  [30] Lample Guillaume, Conneau Alexis, Denoyer Ludovic, and Ranzato Marc’Aurelio. 2018. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations. 1–14.
  [31] Leonandya Rezka Aufar, Distiawan Bayu, and Praptono Nursidik Heru. 2015. A semi-supervised algorithm for Indonesian named entity recognition. In Proceedings of the 3rd International Symposium on Computational and Business Intelligence (ISCBI’15). 45–50.
  [32] Leonandya Rezka A. and Ikhwantri Fariz. 2019. Pretrained language model transfer on neural named entity recognition in Indonesian conversational texts. In Proceedings of the 33rd Pacific Asia Conference on Language, Information, and Computation. 104–113. https://arxiv.org/abs/1902.07938.
  [33] Mayhew Stephen, Tsai Chen-Tse, and Roth Dan. 2017. Cheap translation for cross-lingual named entity recognition. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2536–2545.
  [34] Muhammad Fawwaz and Khodra Masayu Leylia. 2015. Event information extraction from Indonesian tweets using conditional random field. In Proceedings of the 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA’15). 1–6.
  [35] Ott Myle, Edunov Sergey, Baevski Alexei, Fan Angela, Gross Sam, Ng Nathan, Grangier David, and Auli Michael. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 48–53. https://aclanthology.org/N19-4009.
  [36] Pires Telmo, Schlinger Eva, and Garrette Dan. 2019. How multilingual is multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 4996–5001.
  [37] Plank Barbara and Agić Željko. 2018. Distant supervision from disparate sources for low-resource part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 614–620. https://aclanthology.org/D18-1061.
  [38] Qi Peng, Zhang Yuhao, Zhang Yuhui, Bolton Jason, and Manning Christopher D.. 2020. Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 101–108.
  [39] Rahimi Afshin, Li Yuan, and Cohn Trevor. 2019. Massively multilingual transfer for NER. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 151–164. https://aclanthology.org/P19-1015.
  [40] Riza Hammam, Purwoadi Michael, Gunarso, Uliniansyah Teduh, Ti Aw Ai, Aljunied Sharifah Mahani, Mai Luong Chi, Thang Vu Tat, Thai Nguyen Phuong, Chea Vichet, Sun Rapid, Sam Sethserey, Seng Sopheap, Soe Khin Mar, Nwet Khin Thandar, Utiyama Masao, and Ding Chenchen. 2016. Introduction of the Asian language treebank. In Proceedings of the Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA’16). 1–6.
  [41] Sneddon James. 2003. The Indonesian Language: Its History and Role in Modern Society.
  [42] Sun Linghao, Yi Huixiong, and Chen Huanhuan. 2019. Back attention knowledge transfer for low-resource named entity recognition. arXiv:1906.01183. Retrieved from http://arxiv.org/abs/1906.01183.
  [43] Syaifudin Yusuf and Nurwidyantoro Arif. 2016. Quotations identification from Indonesian online news using rule-based method. In Proceedings of the International Seminar on Intelligent Technology and Its Applications (ISITIA’16). 187–194.
  [44] Tiedemann Jörg. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), 2214–2218. http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
  [45] Tjong Kim Sang Erik F.. 2000. Text chunking by system combination. In Proceedings of the 4th Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop. 151–153.
  [46] Tjong Kim Sang Erik F.. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In Proceedings of the 6th Conference on Natural Language Learning 2002 (CoNLL’02). 1–4.
  [47] Tjong Kim Sang Erik F. and De Meulder Fien. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003. 142–147. https://www.aclweb.org/anthology/W03-0419.
  [48] Tsai Chen-Tse, Mayhew Stephen, and Roth Dan. 2016. Cross-lingual named entity recognition via Wikification. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, 219–228.
  [49] van der Goot Rob, Sharaf Ibrahim, Imankulova Aizhan, Üstün Ahmet, Stepanović Marija, Ramponi Alan, Khairunnisa Siti Oryza, Komachi Mamoru, and Plank Barbara. 2021. From masked language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2479–2497.
  [50] Wang Xinyu, Jiang Yong, Bach Nguyen, Wang Tao, Huang Zhongqiang, Huang Fei, and Tu Kewei. 2021. Automated concatenation of embeddings for structured prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2643–2660.
  [51] Wilie Bryan, Vincentio Karissa, Winata Genta Indra, Cahyawijaya Samuel, Li Xiaohong, Lim Zhi Yuan, Soleman Sidik, Mahendra Rahmad, Fung Pascale, Bahar Syafri, and Purwarianti Ayu. 2020. IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. 843–857.
  [52] Wintaka Deni C., Bijaksana Moch A., and Asror Ibnu. 2019. Named-entity recognition on Indonesian tweets using bidirectional LSTM-CRF. In Proceedings of the 4th International Conference on Computer Science and Computational Intelligence (ICCSCI’19): Enabling Collaboration to Escalate Impact of Research Results for Society, Vol. 157. 221–228.
  [53] Wu Qianhui, Lin Zijia, Karlsson Börje, Lou Jian-Guang, and Huang Biqing. 2020. Single-/multi-source cross-lingual NER via teacher-student learning on unlabeled data in target language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 6505–6514.
  [54] Wu Shijie and Dredze Mark. 2019. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 833–844.
  [55] Xie Jiateng, Yang Zhilin, Neubig Graham, Smith Noah A., and Carbonell Jaime. 2018. Neural cross-lingual named entity recognition with minimal resources. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 369–379.
  [56] Yamada Ikuya, Asai Akari, Shindo Hiroyuki, Takeda Hideaki, and Matsumoto Yuji. 2020. LUKE: Deep contextualized entity representations with entity-aware self-attention. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). 6442–6454.
  [57] Yamashita Ikumi, Katsumata Satoru, Kaneko Masahiro, Imankulova Aizhan, and Komachi Mamoru. 2020. Cross-lingual transfer learning for grammatical error correction. In Proceedings of the 28th International Conference on Computational Linguistics. 4704–4715.
  [58] Östling Robert and Tiedemann Jörg. 2016. Efficient word alignment with Markov Chain Monte Carlo. Prague Bull. Math. Linguist. 106, 1 (October 2016), 125–146.


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 6
June 2023
635 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3604597

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 4 April 2022
• Revised: 25 February 2023
• Accepted: 8 April 2023
• Online AM: 18 April 2023
• Published: 17 June 2023