Research Article
Open Access

The Comparison of Language Models with a Novel Text Filtering Approach for Turkish Sentiment Analysis

Published: 27 December 2022

Abstract

Today, with the development of the internet, comments can be made on many topics on web platforms. Analyzing the data of these comments is essential for companies and data scientists. There are many methods for analyzing such data. Recently, language models have been used in many studies for sentiment analysis and text classification. In this study, Turkish sentiment analysis is performed using language models on hotel and movie review datasets. Language models were chosen because they are rarely used in the Turkish literature. Pre-trained BERT, ALBERT, ELECTRA, and DistilBERT models for the Turkish language are trained and tested with these datasets. In addition, a text filtering method, which removes words that can express the opposite sentiment in positively or negatively labeled text, is proposed for sentiment analysis. The datasets obtained by this method are also retrained with the language models and the accuracy values of the resulting models are measured. The results of this study are compared with previous studies using the same datasets. The analysis shows that the language models achieve state-of-the-art accuracy values compared to previous studies. The best performance was achieved by training the ELECTRA language model on data processed with the proposed text filtering method.

1 INTRODUCTION

In recent years, social media platforms and websites have become places where users and customers freely express opinions about products, services, and platforms. Companies and organizations want feedback from these thoughts and comments, but the sheer volume of data makes manual analysis difficult. In the age of big data, sentiment analysis (SA) has become one of the most popular areas of natural language processing (NLP), as it enables the emotional tendencies of this data to be investigated with artificial intelligence technology. SA is a type of text classification that draws on NLP, machine learning, data mining, information retrieval, and other research areas [35]. SA aims to measure the polarity of people’s thoughts or interpretations and is treated as a text classification problem [5].

SA is generally applied per language (Turkish, English, etc.) and per domain (movies, hotels, etc.). Domain matters when terms are analyzed: the term “small” has negative polarity for room size in the hotel domain, but positive polarity for battery size in the camera domain [9]. SA is a method of determining the polarity of a text in the form of unstructured data. The polarity, or sentiment, of the text can be positive, neutral, or negative. Today, SA is used as a decision-making tool in many fields such as marketing, politics, social events, and even finance [12].

SA is closely related to NLP, so it is heavily language dependent. Most SA research is done in English; however, research is needed in other languages, and there are very few studies for Turkish [8]. Since Turkish is an agglutinative language, NLP operations on it are complicated. For example, parsing the word “yapamayacaktır” into “yap-a-ma-y-acak-tır” is quite involved. However, despite this complexity, the suffixes at the end of words also carry information that helps interpret them [20].

Many methods and models, such as language models (LMs) and machine learning, are used in SA. LMs built with deep learning methods are one example. LMs analyze the stems of texts to provide a basis for word predictions. LMs are divided into two types: one-directional and bidirectional. Given a token array as input, a one-directional LM factorizes the array and assigns a probability to it. A bidirectional LM, on the other hand, assigns a probability to the input array by using the left and right context of each word along with the input array and position [21]. LMs can be used pre-trained. Pre-trained LMs are neural networks trained on a large text corpus. These models can be fine-tuned for other target tasks. In particular, LMs like BERT are used in different natural language tasks, including text classification and question answering [13]. In this study, bidirectional LMs, namely BERT, DistilBERT, ALBERT, and ELECTRA, were used to perform Turkish SA. While many studies use LMs for English SA, there are very few studies for Turkish SA. LMs were used here to contribute to the literature on the Turkish language and to analyze the use of LMs in Turkish SA. In addition, a text filtering method (TFM) that can be applied to datasets for SA is proposed. With TFM, the top “k” (5, 10, 15) most frequent words, which can express opposition or neutrality, in positively and negatively labeled texts are detected. Then, these words are removed from the texts with the opposite label. The contributions of this research can be summarized as follows:

Considering the Turkish literature, this is the first study comparing four LMs for Turkish SA.

TFM has been proposed so that it can eliminate some opposing words found in positive and negative labeled texts for SA.

The accuracy values of LMs with or without TFM are compared with previous studies and the positive effect of LMs has been proven by experiments.

The remainder of the article is structured as follows. Literature research on LMs for SA and Turkish SA is explained under the “Related Work” section. Section 3 describes the datasets, libraries used for preprocessing, LMs, and the proposed TFM. Experiments and results for datasets and LMs are analyzed in Section 4. Finally, the conclusions of this research and future works are discussed in the last section.

2 RELATED WORK

In the literature, the uses of LMs for text classification and Turkish SA studies are given under separate subheadings. Few studies are using LMs for Turkish SA. Therefore, studies on SA for different languages have also been described in the literature.

2.1 Turkish Sentiment Analysis

Ozyurt and Akcayol [20] proposed a topic model–based method called Sentence Segment LDA for aspect-based SA. In the experimental results, they showed that the proposed method was quite successful in extracting product aspects. Yildirim et al. [36] applied machine translation to Turkish texts, translated them into English, and performed SA on these texts. They used machine learning methods on hotel and movie review datasets and showed that the translation step increased accuracy. Shehu et al. [26] used deep learning algorithms for SA of Turkish tweets. They proposed three data augmentation techniques (Shift, Shuffle, and Hybrid) to improve data diversity. In their analysis, deep learning outperformed machine learning methods. Çoban et al. [7] applied SA to Turkish Facebook data. They analyzed the success of deep learning and machine learning methods and stated that deep learning methods were more successful. Ciftci and Apaydin [5] applied Turkish SA using long short-term memory (LSTM) methods on a dataset obtained from shopping and movie websites. They showed that recurrent neural network (RNN)–based approaches improve classification accuracy. Yurtalan et al. [37] proposed a linguistically appropriate dictionary-based polarity determination and calculation approach for sentence-level SA. They tested the proposed system on different Twitter datasets and showed that it was more successful than the word-based SA systems previously developed for Turkish. Erşahin et al. [11] presented a hybrid approach combining dictionary-based and machine learning–based approaches. They showed that their methodology increases SA success on Turkish hotel, movie, and tweet datasets. Ucan et al. [31] proposed an automatic translation method to create a Turkish sentiment dictionary. The proposed method is independent of language and domain. Accordingly, they produced three sentiment dictionaries for the Turkish language from SentiWordNet. In the results obtained from the three Turkish dictionaries, the translation approach performed well on positive terms and gave more reliable results for them than for negative ones. Catal and Nangir [4] investigated the possible benefits of multiple classifier systems for Turkish SA and proposed a new classification technique. Experimental results show that their multiple classifier system increases success. Uysal et al. [32] proposed a tool called SentiMedia to automatically classify the polarity of Turkish product reviews, which takes the language features of texts into account to measure and summarize customer satisfaction. They measured the success of the proposed tool with machine learning methods and achieved a high accuracy value.

2.2 Language Models for Sentiment Analysis

Using comments on social media platforms, Othan et al. [19] predicted the direction of stocks on the Turkish stock market (BIST100). The CNN, RNN, and LSTM methods and the BERT language model were used for classification. Their analysis showed that using the Turkish BERT model increased success. González-Carvajal and Garrido-Merchán [14] presented a comparison of BERT and classical NLP approaches. They tested the behavior of BERT against traditional machine learning methods on IMDB reviews, hotel reviews, and news analysis. Their experiments demonstrated the superiority of BERT. Siğirci et al. [27] performed SA on Turkish comments collected from Google Play. They measured the success of the Turkish BERT model on data with two and five classes and showed that the BERT model achieved high success with a small amount of data. Sousa et al. [29] used the BERT model to perform SA of news articles. They fine-tuned a BERT model on a dataset of stock market articles and achieved an F-score of 72.5%. Acikalin et al. [2] used the multilingual BERT model for Turkish SA. They analyzed movie and hotel review datasets through English translation and achieved high accuracy. Singh et al. [28] performed SA on tweets using the BERT model to understand people’s mental states regarding COVID. They analyzed the success of the model on two datasets and obtained an accuracy value of 94%. Li et al. [17] performed SA of investors’ stock market reviews. First, they extracted sentiment values from the information published by investors with the BERT model. These sentiment values were then weighted to calculate a sentiment indicator. They showed that the BERT model gave better results than both LSTM and SVM methods. Guven [15] applied sentiment analysis to Turkish tweets. Machine learning methods and the Turkish BERT model were used for classification. The evaluations showed that the BERT model is more successful than machine learning methods. Farha and Magdy [1] evaluated the performance of pre-trained Arabic ELECTRA, ALBERT, and BERT LMs on Arabic SA. They demonstrated that ELECTRA is one of the best-performing models. Büyüköz et al. [3] tested the ELMo and DistilBERT models on socio-political and local English news. They showed that DistilBERT transfers general semantic information better than ELMo. Pipalia et al. [22] analyzed the success of pre-trained LMs such as BERT, DistilBERT, and XLNet in SA. The XLNet model achieved the most successful result on the IMDB review dataset. Pota et al. [23] proposed a different approach for Twitter SA. First, they converted tweet jargon into plain text with procedures applicable to different languages. Then, they classified the resulting tweets using the BERT model. Tokgoz et al. [30] used pre-trained Turkish BERT and DistilBERT language models for Turkish news classification. They performed analysis using different tokenization methods and showed that the DistilBERT model was more successful than BERT. Van Thin et al. [33] used supervised learning methods to compare the performance of task approaches based on deep learning frameworks and on the BERT architecture trained on the Vietnamese language. They showed that the multitask approach based on the BERT architecture is more successful than neural network architectures and single-task approaches.

3 METHODOLOGY

In this study, certain processes are applied sequentially. The stages of the study are shown in Figure 1. First, preprocessing is performed on the hotel and movie datasets. Then, these datasets are trained with pre-trained LMs. The trained models are evaluated with test data and the accuracy values of the LMs are obtained. In addition, the accuracy values of these models are measured by retraining the LMs after applying the TFM to the hotel and movie datasets.

Fig. 1. The stages of this study.

3.1 Datasets Description

Turkish hotel and movie review datasets1 created by Hacettepe University for SA are used. Movie reviews were obtained from beyazperde.com and hotel reviews from otelpuan.com. All movie reviews were rated by their authors according to the stars on the site: one or two stars are labeled as negative and four or five stars as positive. Hotel reviews were rated from 0 to 100; a score of 0 to 40 was taken as negative and a score of 80 to 100 as positive. These score ranges were determined as a result of the evaluation of the sentences by experts. The distribution of the datasets is shown in Figure 2. Positive and negative labels are evenly split between the training and test sets for both datasets [31]. In Figure 3, the datasets are visualized with word clouds.

Fig. 2. Distribution of the datasets.

Fig. 3. Word clouds for each dataset (Commonly used words: güzel (beautiful), otel (hotel), oda (room), tatil (holiday), yemek (eat), kalmak (stay), berbat (awful), iyi (good), kötü (bad), temiz (clean), havuz (pool), izlemek (watch), değil (not), film (movie), etc.).

3.2 Libraries

Libraries belonging to Python and Java are used to remove stopwords in the texts, correct misspelled words, and lemmatize words. The analyses of these processes are explained in detail under Section 4.1.

3.2.1 Removing Stopwords.

The stopwords in the NLTK2 library are used for the Turkish language. Since the word list in the library is limited, the stopword list has been expanded.

3.2.2 Text Spelling Correction.

Some NLP tasks require normalization as a preprocessing step before the actual algorithms are applied to the text. Normalization helps achieve better results, especially for social media and forum texts, chat, and messaging or bot applications. It can be used to correct misspelled words or informal speech in noisy texts. Zemberek3 uses various heuristics, lookup tables, and LMs for text normalization. First, words are collected from a clean and a noisy corpus using morphological analysis. With some heuristics and LMs, some words are split in two. Correct, incorrect, and possibly incorrect sets are generated from the corpus. For each noisy word in a sentence, candidates are collected from the lookup tables, informal and ASCII-matching morphological analysis, and a spelling checker. Running the Viterbi algorithm over the candidate words with LM scoring yields the most probable sequence of normalized words.
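The lookup-and-score idea can be illustrated with a pure-Python toy (this is not Zemberek's actual implementation; the candidate table and unigram counts below are invented for illustration, and a real normalizer would run Viterbi over the whole sentence with a proper LM):

```python
from collections import Counter

# Toy lookup table of noisy form -> candidate corrections, and toy
# unigram counts standing in for the LM. Zemberek builds both from
# clean/noisy corpora; these values are invented for illustration.
CANDIDATES = {"cok": ["çok"], "guzel": ["güzel", "guzel"], "tatıl": ["tatil"]}
UNIGRAMS = Counter({"çok": 300, "güzel": 120, "tatil": 80, "guzel": 2})

def normalize(sentence: str) -> str:
    out = []
    for word in sentence.split():
        cands = CANDIDATES.get(word, [word])
        # Keep the candidate the unigram model scores highest; Zemberek
        # instead scores full candidate sequences with Viterbi decoding.
        out.append(max(cands, key=lambda w: UNIGRAMS.get(w, 0)))
    return " ".join(out)
```

For instance, `normalize("cok guzel tatıl")` picks "güzel" over the noisy "guzel" because the clean form has a far higher corpus count.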

3.2.3 Lemmatization.

Zeyrek4 is a Python morphological analyzer and lemmatizer for the Turkish language. It can perform morphological analysis of Turkish text and return all possible parses and all possible base forms of each word by splitting words into morphemes. This library is used for lemmatization in this study.

3.3 Language Models

3.3.1 BERT.

The BERT model is defined as a Transformer-based bidirectional encoder representation. BERT produces contextual, bidirectional word representations. BERT is trained with a new objective, the “masked language model” (MLM): the MLM randomly masks some tokens in the input, and the model must predict the original masked word based only on its context [10].
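The MLM input corruption can be sketched as follows (a simplified stand-alone sketch of the BERT recipe, in which a token selected for prediction becomes [MASK] 80% of the time, a random vocabulary token 10% of the time, and stays unchanged 10% of the time; the function name and defaults are ours):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=42):
    """BERT-style MLM corruption. Each token is selected for prediction
    with probability mask_prob. Returns the corrupted token list and a
    label list holding the original token where a prediction is
    required and None elsewhere."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)           # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # random replacement
            else:
                corrupted.append(tok)    # kept unchanged on purpose
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels
```

The model is then trained to predict the labeled positions from the corrupted input, forcing it to use both the left and right context.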

BERT’s training comprises pre-training and fine-tuning stages. In pre-training, the model is trained with unlabeled data on different pre-training tasks. In the fine-tuning phase, the model is first initialized with the pre-trained parameters; all parameters are then fine-tuned using labeled data from downstream tasks [10].

In terms of size, the BERT model has Base and Large variants. The Base model is trained with fewer parameters and layers, while the Large model uses more of both. The model also has multilingual support: the multilingual BERT vocabulary covers 104 languages, including Turkish. The multilingual BERT model is based on WordPiece tokenization, which handles unknown words when representing word vectors [18].
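The way WordPiece handles unknown words, splitting them into known subword pieces, can be sketched with a greedy longest-match-first toy (the tiny vocabulary below is invented for illustration; the real multilingual BERT vocabulary is far larger):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword split in the style of
    WordPiece. Continuation pieces carry the '##' prefix; the function
    falls back to [UNK] when no piece matches."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub          # mark as a continuation piece
            if sub in vocab:
                piece = sub
                break
            end -= 1                      # try a shorter match
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces
```

This is how an agglutinative word like "otellerde" (in the hotels) can be represented even if only its stem and suffixes are in the vocabulary.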

3.3.2 DistilBERT.

DistilBERT was developed from the BERT model. It is smaller and faster than BERT while pre-trained on the same corpus. It is pre-trained on raw text only, with inputs and labels generated automatically from the texts [24]. The model is pre-trained with three objectives:

Distillation loss: the model is trained to produce the same output probabilities as the BERT teacher model.

MLM: as in the BERT model, some of the words in the input sentence are randomly masked.

Cosine embedding loss: the model is also trained to generate hidden states as close as possible to those of BERT.

3.3.3 ELECTRA.

ELECTRA pre-trains Transformer networks with less computation than BERT. It has been applied to Transformer [34] text encoders. ELECTRA models learn to distinguish “real” input tokens from “fake” ones produced by another neural network. The purpose of this model is to train a text encoder to distinguish input tokens from high-quality negative samples produced by a small generator network. Compared to masked language modeling, it is more computationally efficient and performs better on downstream tasks [6].

3.3.4 ALBERT.

Due to the memory limit and communication overhead problems, the ALBERT architecture was developed with far fewer parameters than the BERT architecture. ALBERT includes two parameter reduction techniques that remove major obstacles to scaling pre-trained models. The first is factorized embedding parameterization: the large vocabulary embedding matrix is decomposed into two smaller matrices, separating the size of the hidden layers from the size of the word embeddings. The second is cross-layer parameter sharing, which prevents the parameter count from growing with the depth of the network [16].

3.4 Text Filtering Method

TFM is proposed to remove some of the words that introduce the opposite sentiment into the positively and negatively labeled data in the datasets, and thereby to increase the accuracy value. First, the preprocessed positively and negatively labeled texts in the training set are separated. The counts of all words in the positively and negatively labeled texts are calculated separately. The top “k” most frequently used words in positively labeled texts are removed from negatively labeled texts; likewise, the top “k” most frequently used words in negatively labeled texts are removed from positively labeled texts. The aim is to remove the words that introduce the opposite sentiment into positive and negative sentences. Neutral words that don’t express sentiment can also appear among these top “k” words; such words are often common to both top-“k” lists and are thus removed from the entire dataset. Removing these words reduces the amount of data. The “k” value was set to 5, 10, and 15, respectively. Since TFM removes neutral and opposite-sentiment words from the texts and reduces the amount of data, the accuracy value is predicted to increase. The pseudocode of TFM is shown in Figure 4.

Fig. 4. The pseudocode of TFM.
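A minimal pure-Python sketch of this procedure (function names are ours; it mirrors the steps described above rather than the authors' exact code):

```python
from collections import Counter

def top_k_words(texts, k):
    """The k most frequent words across a list of texts."""
    counts = Counter(word for text in texts for word in text.split())
    return {word for word, _ in counts.most_common(k)}

def text_filter(pos_texts, neg_texts, k=10):
    """TFM sketch: delete each class's top-k most frequent words from
    the texts of the opposite class. Words frequent in both classes
    (likely neutral) thereby end up removed from the whole dataset."""
    pos_top = top_k_words(pos_texts, k)
    neg_top = top_k_words(neg_texts, k)
    strip = lambda text, drop: " ".join(w for w in text.split() if w not in drop)
    return ([strip(t, neg_top) for t in pos_texts],
            [strip(t, pos_top) for t in neg_texts])
```

For example, if "güzel" (beautiful) dominates the positive texts, it is stripped from any negative review that uses it (e.g. "güzel değil", not beautiful), removing the misleading opposite-sentiment cue.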

4 EXPERIMENTS AND RESULTS

4.1 Datasets Preparation

Before the datasets are trained on LMs, the texts are first preprocessed. In the preprocessing stage, punctuation marks removal, text conversion to lowercase, text spelling correction, and removal of stopwords are applied for each dataset. Then, lemmatization is performed for the remaining words in the texts and this text document is given to the LMs for training each dataset.
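A simplified sketch of this preprocessing pipeline (the stopword list below is a small illustrative excerpt of the expanded list used in the study, and the Zemberek spelling correction and Zeyrek lemmatization steps are omitted):

```python
import string

# A small excerpt of a Turkish stopword list; the study uses the NLTK
# list extended by hand, so this subset is for illustration only.
STOPWORDS = {"ve", "bir", "bu", "çok", "için", "de", "da"}

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords."""
    # Plain str.lower() maps 'I' to 'i'; in Turkish, dotless 'I'
    # should become 'ı' and dotted 'İ' should become 'i'.
    text = text.replace("I", "ı").replace("İ", "i").lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in text.split() if w not in STOPWORDS)
```

Applied to the Table 2 hotel example, `preprocess("Çok güzel bir tatil!")` yields "güzel tatil".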

In most social media texts, there are issues such as missing letters or words written with repeated letters. Therefore, it is important to perform spelling correction before the text document is given to any model. An example of spelling correction applied with Zemberek is given in Table 1. As a result of this process, different misspelled forms of the same word were corrected to a single form.

Table 1. Spelling Correction Process in Texts (TR: Turkish, EN: English)

Hotel, first version:
TR: Cok guzel ve eglencesi bol bır tatıl gecırdım butun departmanlara bu guzel tatıl ıcın tessekkur edıyorum
EN: I had a veryy nicee and fun holidey, I would like to thanq all departments for this beuatiful holidey.

Hotel, after spelling correction:
TR: Çok güzel ve eğlencesi bol bir tatil geçirdim bütün departmanlara bu güzel tatil için teşekkür ediyorum
EN: I had a very nice and fun holiday, I would like to thank all departments for this beautiful holiday.

Movie, first version:
TR: süpper bir film gidin izleyinn süperrr
EN: supper a movie go wattch it, superrrr

Movie, after spelling correction:
TR: süper bir film gidin izleyin, süper
EN: super a movie go watch it, super

After the spelling correction process, stopwords that don’t express a specific meaning were removed from the texts. An example of stopword removal is shown in Table 2. Thus, the volume of data used in the training phase decreased.

Table 2. Removal of Stopwords (TR: Turkish, EN: English)

Hotel, normal text:
TR: Çok güzel ve eğlencesi bol bir tatil geçirdim bütün departmanlara bu güzel tatil için teşekkür ediyorum
EN: I had a very nice and fun holiday, I would like to thank all departments for this beautiful holiday.

Hotel, after stopword removal:
TR: güzel eğlencesi bol tatil geçirdim departmanlara güzel tatil teşekkür ediyorum
EN: nice fun holiday thank departments beautiful holiday

Movie, normal text:
TR: süper bir film gidin izleyin, süper
EN: super a movie go watch it, super

Movie, after stopword removal:
TR: süper film gidin izleyin süper
EN: super movie go watch super

To represent words uniformly, lemmatization was applied to the texts in the datasets. Words carrying the same stem with different suffixes were reduced to a single form as a result of lemmatization. Thus, a more accurate result is expected for text classification. An example of lemmatization is given in Table 3.

Table 3. Lemmatization Process in Texts (TR: Turkish, EN: English)

Hotel, normal text:
TR: güzel eğlencesi bol tatil geçirdim departmanlara güzel tatil teşekkür ediyorum
EN: nice fun holiday thank departments beautiful holiday

Hotel, after lemmatization:
TR: güzel eğlence bol tatil geç departman güzel tatil teşekkür et
EN: nice fun holiday thank department beautiful holiday

Movie, normal text:
TR: süper film gidin izleyin süper
EN: super movie go watch super

Movie, after lemmatization:
TR: süper film git izle süper
EN: super movie go watch super

In addition, the proposed TFM was applied to the hotel and movie datasets. With TFM, words with a negative meaning in positively labeled texts and words with a positive meaning in negatively labeled texts were removed. At the same time, neutral words among the top “k” most frequent words for each label, which wouldn’t affect the classification, were also removed. The removed top “k” most frequent words after applying TFM to the hotel and movie datasets are shown in Table 4. The table shows that the neutral words “otel (hotel), yemek (eat), kalmak (stay), oda (room)” for the hotel dataset and “film (movie), izlemek (watch), sinema (cinema), sahne (scene)” for the movie dataset are common to the positively and negatively labeled texts, so these words were deleted from the entire dataset (other words in Table 4: berbat (awful), kötü (bad), değil (not), güzel (beautiful), iyi (good), memnun (glad), tavsiye (advice), harika (wonderful), mükemmel (perfect), etc.).

Table 4. The Removed Top 15 Words According to Train Sets by TFM (word counts in parentheses)

Hotel, negative: otel (7,781), yemek (3,314), oda (2,680), tatil (1,949), berbat (1,770), kötü (1,528), değil (1,504), kalmak (1,475), havuz (1,376), para (1,241), temiz (1,231), iyi (1,044), güzel (947), personel (944), rezalet (901)

Hotel, positive: güzel (2,130), iyi (1,187), otel (1,061), memnun (918), kalmak (849), yemek (774), tesis (731), tavsiye (633), hizmet (524), personel (497), oda (466), temiz (425), havuz (362), animasyon (284), tatil (283)

Movie, negative: film (29,967), izlemek (4,722), kötü (3,090), değil (3,069), oyun (2,477), konu (2,184), sahne (2,107), sinema (1,757), senaryo (1,667), anlamak (1,636), iz (1,487), beğenmek (1,435), puan (1,314), arkadaş (1,271), hayal (1,166)

Movie, positive: film (26,039), izlemek (5,677), iyi (4,897), güzel (4,357), oyun (2,540), iz (2,436), sinema (1,742), harika (1,716), mükemmel (1,492), sahne (1,411), değil (1,391), anlamak (1,270), konu (1,248), hayat (1,136), süper (1,101)

As a result of preprocessing and applying TFM to the hotel and movie datasets, the amount of data to be used decreased considerably, which also saves training time. In addition, classification becomes more accurate after different forms of the same word are converted into a single form. The change in the term counts of the datasets after these processes is shown in Table 5. The table shows that after preprocessing, the total word count decreased by approximately 47% and the unique word count by approximately 86% for the hotel and movie datasets. The TFM process further decreased the total word count by approximately 11.5% for the hotel and 18.8% for the movie dataset.

Table 5. Change of Word Counts in Datasets After Processes

                                           Hotel      Movie
Total word count                           916,250    2,007,114
Total word count after preprocessing       488,356    1,062,197
Unique word count                          91,458     199,038
Unique word count after preprocessing      11,097     27,506
Total word count after TFM (k = 5)         450,097    902,943
Unique word count after TFM (k = 5)        11,096     27,503
Total word count after TFM (k = 10)        432,785    864,698
Unique word count after TFM (k = 10)       11,094     27,499
Total word count after TFM (k = 15)        405,032    836,205
Unique word count after TFM (k = 15)       11,088     27,494
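As a quick sanity check, the reduction percentages quoted in the text can be recomputed from the Table 5 counts (using the k = 10 rows for TFM):

```python
# Word counts taken from Table 5 (hotel, movie), k = 10 for the TFM rows.
total         = {"hotel": 916_250, "movie": 2_007_114}
after_pre     = {"hotel": 488_356, "movie": 1_062_197}
after_tfm_k10 = {"hotel": 432_785, "movie":   864_698}

for d in ("hotel", "movie"):
    pre_drop = 100 * (total[d] - after_pre[d]) / total[d]
    tfm_drop = 100 * (after_pre[d] - after_tfm_k10[d]) / after_pre[d]
    print(f"{d}: preprocessing -{pre_drop:.1f}%, TFM(k=10) -{tfm_drop:.1f}%")
```

This prints reductions of roughly 46.7%/47.1% from preprocessing and 11.4%/18.6% from TFM, in line with the approximate figures quoted above.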

The statistics of words expressing the opposite sentiment among the removed words were analyzed. As a result of applying TFM, approximately 56,000 words were removed for the hotel and 190,000 words for the movie dataset. These words include “iyi (good), güzel (beautiful), harika (wonderful),” and so on with a positive meaning, and “kötü (bad), değil (not), berbat (terrible)” with a negative meaning. The statistics of words removed from oppositely labeled texts are shown in Figure 5. For the hotel and movie datasets, respectively, 11.6% and 7.4% of the words deleted from the opposite label prevent misclassification.

Fig. 5. Statistics of words expressing opposite sentiment (k = 10).

4.2 Analysis and Comparison of Language Models

BERT, DistilBERT, ELECTRA, and ALBERT LMs were used to analyze sentiment on the datasets. Turkish versions of these LMs are available: BERT-Multilingual,5 BERT-Tr,6 DistilBERT-Tr,7 ALBERT-Tr,8 and ELECTRA-Tr.9 Details of the Turkish language models are shown in Table 6. The models were trained with the Turkish hotel and movie corpora.

Table 6. The Details of the Turkish LMs

BERT-Multilingual: The model is pre-trained with Wikipedia articles in 102 languages. The entire Wikipedia dump for each language was taken as training data. However, the size of Wikipedia varies widely between languages.

BERT-Tr: Training was carried out only with a Turkish corpus. The model was trained with a filtered and sentence-segmented version of the Turkish OSCAR corpus (35 GB), which is a Wikipedia dump.

DistilBERT-Tr: Since it uses knowledge distillation in the pre-training phase, its size is quite small compared to the BERT model. The model was trained on the original Turkish training data (7 GB) using the cased version of BERTurk [25].

ALBERT-Tr: It has far fewer parameters than the BERT architecture and implements two parameter reduction techniques. A 200-GB dataset consisting of texts such as online blogs, free e-books, newspapers, Twitter, articles, Wikipedia, and so on was used for training.

ELECTRA-Tr: It uses less computation than BERT in pre-training. Instead of BERT's MLM objective, it trains a text encoder that distinguishes input tokens. The model was trained with the Turkish OSCAR corpus (35 GB), which is a Wikipedia dump.

In this study, new LMs were trained using the preprocessed datasets and these pre-trained models. The training phase was performed with Google Colab's GPU (Tesla K80 GPU, 12.89 GB RAM, 65 GB disk). The accuracy values on the test sets were measured with the newly trained LMs. The accuracy values obtained by all LMs on the datasets, along with the file sizes of the LMs, are given in Table 7. The table shows that BERT-Tr for the hotel dataset and ELECTRA-Tr for the movie dataset were the most successful models.

Table 7. The Accuracy Values and File Sizes of the Trained LMs on the Datasets Without TFM

Model                   Hotel Accuracy (%)   Movie Accuracy (%)   Model Size
New BERT-Multilingual   78.46                80.88                653.8 MB
New BERT-Tr             90.18                90.47                432.2 MB
New DistilBERT-Tr       89.27                89.72                266 MB
New ALBERT-Tr           89.65                89.30                46.6 MB
New ELECTRA-Tr          89.29                90.67                432 MB

Then, the effect of the proposed TFM on the accuracy values of these LMs was analyzed. The datasets passed through this filtering technique were used to retrain all LMs with the same parameters for each “k” value (5, 10, 15). The accuracy values on the test sets for the retrained LMs and the file sizes of the LMs are shown in Table 8. TFM increased the accuracy values, as it removes opposite-sentiment and neutral words from positively and negatively labeled texts.

Table 8. The Accuracy Values and File Sizes of Retrained LMs with TFM

                             Hotel Accuracy (%)       Movie Accuracy (%)       Model Size (Hotel, Movie)
Model                        k=5    k=10   k=15       k=5    k=10   k=15
Retrained BERT-Multilingual  94.76  95.88  93.65      82.82  84.69  83.30      653.8, 653.8 MB
Retrained BERT-Tr            98.01  97.86  96.36      92.04  92.11  90.96      432.2, 432.2 MB
Retrained DistilBERT-Tr      97.74  97.67  95.12      91.34  92.00  90.89      266, 266 MB
Retrained ALBERT-Tr          97.10  96.63  94.12      91.75  91.71  90.43      46.6, 46.6 MB
Retrained ELECTRA-Tr         97.74  98.38  95.31      92.08  92.21  91.48      394.2, 432 MB

The accuracy chart of the LMs obtained after preprocessing and after utilizing TFM is shown in Figure 6. The figure shows only the results for k = 10, where TFM gives the best results on the LMs. For the hotel dataset, the new BERT-Tr model was the most successful with 90.18%, while the retrained ELECTRA-Tr model achieved an accuracy value of 98.38% with TFM. For the movie dataset, the new ELECTRA-Tr was the most successful with 90.67%, and the accuracy reached 92.21% with the retrained ELECTRA-Tr after utilizing TFM. The figure shows that the accuracy values of all trained LMs10 increased with TFM. When all the models are compared, BERT-Multilingual was the least successful on both datasets. The reason is that this model is trained on articles in 102 languages rather than only Turkish; since it handles data in many languages, its accuracy values are considerably lower than those of the other models, which were trained only on Turkish. ELECTRA-Tr, ALBERT-Tr, and DistilBERT-Tr are derived from BERT by changing the structure and parameter counts of the BERT models. Therefore, the accuracy values of these models are close to one another.

Fig. 6. The accuracy (%) chart of all models (TFM for k = 10).

Many studies have used the hotel and movie datasets, so the effect of these LMs was compared with previous studies. The accuracy values of previous studies and of our LMs are shown in Table 9. Among the trained LMs, only the most successful ones are included in this table. Compared to the literature, the trained LMs combined with TFM have been quite successful. The highest accuracy was achieved at 98.38% for the hotel and 92.21% for the movie dataset with TFM for k = 10.

| Study | Hotel Accuracy (%) | Movie Accuracy (%) |
|---|---|---|
| Ucan et al. [31] | 80.70 | 84.60 |
| Erşahin et al. [11] | 91.96 | 86.31 |
| Yildirim et al. [36] | 86.00 | 83.00 |
| New BERT-Tr (Hotel), ELECTRA-Tr (Movie) | 90.18 | 90.67 |
| Retrained BERT-Tr (Hotel), ELECTRA-Tr (Movie) with TFM k = 5 | 98.01 | 92.08 |
| Retrained ELECTRA-Tr with TFM k = 10 | 98.38 | 92.21 |
| Retrained BERT-Tr (Hotel), ELECTRA-Tr (Movie) with TFM k = 15 | 96.36 | 90.67 |

Table 9. Comparison of Previous Studies and Our LMs

Skip 5DISCUSSION Section

5 DISCUSSION

First, four LMs were analyzed for the SA of Turkish hotel and movie reviews. This contributes to the literature, as no previous study has compared the BERT, ALBERT, DistilBERT, and ELECTRA models for Turkish. When the accuracy values of the LMs are compared with those of previous studies, good results are obtained for both the hotel and movie datasets. Since LMs are built with deep learning methods and capture the structure of the language, they give more successful results than the earlier approaches.

On the other hand, the proposed TFM filters the datasets according to the most frequent words in the positive- and negative-labeled texts. Among the top k most frequent words of each class (k = 5, 10, 15), words expressing the opposite sentiment, as well as neutral words, are deleted from the texts of the other class. As a result, the amount of data decreased by approximately 11.5% for the hotel dataset and 18.8% for the movie dataset. When the LMs were retrained on the datasets produced by TFM with k = 10, the accuracy values increased on both datasets and reached state-of-the-art performance. Compared with previous studies, applying LMs with TFM increased accuracy by 6.5% for the hotel dataset and 5.75% for the movie dataset. The accuracy chart of previous studies and the LMs is shown in Figure 7. In the TFM analysis, the top 5, 10, and 15 words were removed from the datasets, respectively. Examining the accuracy for these values of k shows why k = 5 is less successful than k = 10: with only five words, the opposite-sentiment words cannot be completely deleted from the datasets. For k = 15, the accuracy is lower than for k = 10 because many neutral words are also deleted. Therefore, the most appropriate value of k should be determined empirically.
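As a sketch of how such a frequency-based filter might work, the following minimal Python example removes, from each class, the top-k words of the opposite class that are not also top-k words of the class itself. This is an illustration under stated assumptions, not the paper's exact implementation: the whitespace tokenization, the `Counter`-based frequency counting, the deletion rule, and the toy reviews are all invented for the example.

```python
from collections import Counter

def top_k_words(texts, k):
    """The k most frequent whitespace tokens over a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w for w, _ in counts.most_common(k)}

def remove_words(texts, drop):
    """Delete every occurrence of the words in `drop` from each text."""
    return [" ".join(w for w in t.split() if w.lower() not in drop)
            for t in texts]

def text_filter(pos_texts, neg_texts, k):
    """Remove from each class the top-k words of the opposite class,
    unless they are also top-k words of the class itself."""
    pos_top = top_k_words(pos_texts, k)
    neg_top = top_k_words(neg_texts, k)
    return (remove_words(pos_texts, neg_top - pos_top),
            remove_words(neg_texts, pos_top - neg_top))

# Invented toy reviews: one positive review mentions "kirli" ("dirty").
pos = ["oda temiz", "oda temiz guzel", "manzara guzel ama biraz kirli"]
neg = ["oda kirli", "oda kirli kotu", "servis kotu"]
new_pos, new_neg = text_filter(pos, neg, k=3)
print(new_pos)  # frequent negative-class words are gone from positive texts
```

In the paper, the datasets shrunk in this way are then used to retrain the LMs; here the filter strips "kirli" from the last positive review while leaving class-neutral words intact.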

Fig. 7. The accuracy chart of trained LMs with previous studies.

When the file sizes of the LMs are examined, ALBERT and DistilBERT are smaller than the other LMs. Their sizes are small because parameter reduction and knowledge distillation are applied in these models. Although small, their accuracy is quite close to that of the other LMs, so ALBERT and DistilBERT are worth considering when time performance matters to the user. The ELECTRA model, which requires less computation thanks to its structural changes, is the most successful in terms of accuracy.

Skip 6CONCLUSION AND FUTURE WORKS Section

6 CONCLUSION AND FUTURE WORKS

In this study, the effect of LMs on SA was analyzed. The LMs trained only after preprocessing achieved accuracy above 90% on both the hotel and movie datasets. Compared to previous studies, the new ELECTRA-Tr model was the most successful on the movie dataset, while on the hotel dataset the new BERT-Tr model achieved accuracy close to that of previous studies. The hotel and movie datasets were then updated with the proposed TFM to eliminate opposite-sentiment words in the positive- and negative-labeled texts. Retraining the LMs with TFM increased their accuracy considerably: the increase was between 6% and 17.5% for the hotel dataset and between 1.6% and 3.8% for the movie dataset. This shows that these texts contain many opposite-sentiment words, and the larger increase on the hotel dataset indicates that it contains more of them. The ELECTRA-Tr model retrained after applying TFM was the most successful, with an accuracy of 98.38% on the hotel dataset and 92.21% on the movie dataset.

In terms of contribution to the literature, the trained LMs and TFM were compared with previous studies. The results show that LMs and TFM can increase the accuracy of SA: compared with previous studies, applying LMs with TFM increased accuracy by 6.5% for the hotel dataset and 5.75% for the movie dataset. It would therefore be advantageous to use LMs and TFM in SA studies.

As future work contributing to the Turkish language, we plan to use LMs in other tasks such as the analysis of Turkish tweets, spam detection, sarcasm detection, and news headline classification. TFM is also intended to be applied to SA of other text types, such as tweets and comments. In addition, applying LMs to areas such as question answering, text generation, text summarization, and machine translation would benefit the Turkish language.


REFERENCES

[1] Farha Ibrahim Abu and Magdy Walid. 2021. Benchmarking transformer-based language models for Arabic sentiment and sarcasm detection. In Proceedings of the 6th Arabic Natural Language Processing Workshop. 21–31. https://www.aclweb.org/anthology/2021.wanlp-1.3
[2] Acikalin Utku Umur, Bardak Benan, and Kutlu Mucahid. 2020. Turkish sentiment analysis using BERT. In Proceedings of the 2020 28th Signal Processing and Communications Applications Conference (SIU'20).
[3] Büyüköz Berfu, Hürriyetoğlu Ali, and Özgür Arzucan. 2020. Analyzing ELMo and DistilBERT on socio-political news classification. In Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020 (May 2020), 9–18. https://www.aclweb.org/anthology/2020.aespen-1.4
[4] Catal Cagatay and Nangir Mehmet. 2017. A sentiment classification model based on multiple classifiers. Applied Soft Computing Journal 50 (2017), 135–141.
[5] Ciftci Basri and Apaydin Mehmet Serkan. 2019. A deep learning approach to sentiment analysis in Turkish. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP'18).
[6] Clark Kevin, Luong Minh-Thang, Le Quoc V., and Manning Christopher D. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv:2003.10555. http://arxiv.org/abs/2003.10555
[7] Çoban Önder, Özel Selma Ayşe, and Inan Ali. 2021. Deep learning-based sentiment analysis of Facebook data: The case of Turkish users. Computer Journal 64, 3 (2021), 473–499.
[8] Dehkharghani Rahim, Saygin Yucel, Yanikoglu Berrin, and Oflazer Kemal. 2016. SentiTurkNet: A Turkish polarity lexicon for sentiment analysis. Language Resources and Evaluation 50, 3 (2016), 667–685.
[9] Dehkharghani Rahim, Yanikoglu Berrin, Saygin Yucel, and Oflazer Kemal. 2017. Sentiment analysis in Turkish at different granularity levels. Natural Language Engineering 23, 4 (2017), 535–559.
[10] Devlin Jacob, Chang Ming Wei, Lee Kenton, and Toutanova Kristina. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Vol. 1. 4171–4186. arXiv:1810.04805
[11] Erşahin Buket, Aktaş Özlem, Kilinç Deniz, and Erşahin Mustafa. 2019. A hybrid sentiment analysis method for Turkish. Turkish Journal of Electrical Engineering and Computer Sciences 27, 3 (2019), 1780–1793.
[12] Esichaikul Vatcharaporn and Phumdontree Chawisa. 2018. Sentiment analysis of Thai financial news. In Proceedings of the 2018 2nd International Conference on Software and E-Business (ICSEB'18). Association for Computing Machinery, New York, NY, 39–43.
[13] Gao Luyu, Dai Zhuyun, and Callan Jamie. 2020. Understanding BERT rankers under distillation. In Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR'20). Association for Computing Machinery, New York, NY, 149–152.
[14] González-Carvajal Santiago and Garrido-Merchán Eduardo C. 2020. Comparing BERT against traditional machine learning text classification. arXiv:2005.13012. http://arxiv.org/abs/2005.13012
[15] Guven Zekeriya Anil. 2021. Comparison of BERT models and machine learning methods for sentiment analysis on Turkish tweets. In Proceedings of the 2021 6th International Conference on Computer Science and Engineering (UBMK'21). 98–101.
[16] Lan Zhenzhong, Chen Mingda, Goodman Sebastian, Gimpel Kevin, Sharma Piyush, and Soricut Radu. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942. http://arxiv.org/abs/1909.11942
[17] Li Menggang, Li Wenrui, Wang Fang, Jia Xiaojun, and Rui Guangwei. 2021. Applying BERT to analyze investor sentiment in stock market. Neural Computing and Applications 33, 10 (2021), 4663–4676.
[18] Mao Cunli, Man Zhibo, Yu Zhengtao, Gao Shengxiang, Wang Zhenhan, and Wang Hongbin. 2021. A neural joint model with BERT for Burmese syllable segmentation, word segmentation, and POS tagging. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 4 (May 2021), Article 54, 23 pages.
[19] Othan Derya, Kilimci Zeynep Hilal, and Uysal Mitat. 2019. Financial sentiment analysis for predicting direction of stocks using bidirectional encoder representations from transformers (BERT) and deep learning models. In International Conference on Innovative and Intelligent Technologies, Istanbul, Turkey.
[20] Ozyurt Baris and Akcayol M. Ali. 2021. A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: SS-LDA. Expert Systems with Applications 168 (2021).
[21] Petroni Fabio, Rocktäschel Tim, Lewis Patrick, Bakhtin Anton, Wu Yuxiang, Miller Alexander H., and Riedel Sebastian. 2019. Language models as knowledge bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2463–2473. arXiv:1909.01066
[22] Pipalia Keval, Bhadja Rahul, and Shukla Madhu. 2020. Comparative analysis of different transformer based architectures used in sentiment analysis. In Proceedings of the 2020 9th International Conference on System Modeling and Advancement in Research Trends (SMART'20). 411–415.
[23] Pota Marco, Ventura Mirko, Catelli Rosario, and Esposito Massimo. 2021. An effective BERT-based pipeline for Twitter sentiment analysis: A case study in Italian. Sensors 21, 1 (2021), 1–21.
[24] Sanh Victor, Debut Lysandre, Chaumond Julien, and Wolf Thomas. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv:1910.01108. http://arxiv.org/abs/1910.01108
[25] Schweter Stefan. 2020. BERTurk—BERT Models for Turkish.
[26] Shehu Harisu Abdullahi, Sharif Md Haidar, Sharif Md Haris Uddin, Datta Ripon, Tokat Sezai, Uyaver Sahin, Kusetogullari Huseyin, and Ramadan Rabie A. 2021. Deep sentiment analysis: A case study on stemmed Turkish Twitter data. IEEE Access 9 (2021), 56836–56854.
[27] Siğirci İbrahim Onur, Özgür Hakan, Oluk Abdullah, Uz Harun, Çetiner Emrah, Oktay Hande Uzun, and Erdemir Kaan. 2020. Sentiment analysis of Turkish reviews on Google Play store. In Proceedings of the 2020 5th International Conference on Computer Science and Engineering (UBMK'20). IEEE, 314–315.
[28] Singh Mrityunjay, Jakhar Amit Kumar, and Pandey Shivam. 2021. Sentiment analysis on the impact of coronavirus in social life using the BERT model. Social Network Analysis and Mining 11, 1 (2021).
[29] Sousa Matheus Gomes, Sakiyama Kenzo, Rodrigues Lucas De Souza, Moraes Pedro Henrique, Fernandes Eraldo Rezende, and Matsubara Edson Takashi. 2019. BERT for stock market sentiment analysis. In Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI'19). 1597–1601.
[30] Tokgoz Meltem, Turhan Fatmanur, Bolucu Necva, and Can Burcu. 2021. Tuning language representation models for classification of Turkish news. In 2021 International Symposium on Electrical, Electronics and Information Engineering. 402–407.
[31] Ucan Alaettin, Naderalvojoud Behzad, Sezer Ebru Akcapinar, and Sever Hayri. 2017. SentiWordNet for new language: Automatic translation approach. In Proceedings of the 12th International Conference on Signal Image Technology and Internet-Based Systems (SITIS'16). 308–315.
[32] Uysal Elif, Yumusak Semih, Oztoprak Kasim, and Dogdu Erdogan. 2017. Sentiment analysis for the social media: A case study for Turkish general elections. In Proceedings of the SouthEast Conference (ACM SE'17). Association for Computing Machinery, New York, NY, 215–218.
[33] Thin Dang Van, Nguyen Ngan Luu-Thuy, Truong Tri Minh, Le Lac Si, and Vo Duy Tin. 2021. Two new large corpora for Vietnamese aspect-based sentiment analysis at sentence level. ACM Transactions on Asian and Low-Resource Language Information Processing 20, 4 (May 2021), Article 62, 22 pages.
[34] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Łukasz, and Polosukhin Illia. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30. 5999–6009. arXiv:1706.03762
[35] Xu Guixian, Meng Yueting, Qiu Xiaoyu, Yu Ziheng, and Wu Xu. 2019. Sentiment analysis of comment texts based on BiLSTM. IEEE Access 7 (2019), 51522–51532.
[36] Yildirim Mustafa, Okay Feyza Yildirim, and Ozdemir Suat. 2020. Sentiment analysis for Turkish unstructured data by machine translation. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data 2020). 4811–4817.
[37] Yurtalan Gökhan, Koyuncu Murat, and Turhan Çiğdem. 2019. A polarity calculation approach for lexicon-based Turkish sentiment analysis. Turkish Journal of Electrical Engineering and Computer Sciences 27, 2 (2019), 1325–1339.


Published in ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2 (February 2023), 624 pages. ISSN 2375-4699; EISSN 2375-4702. DOI:10.1145/3572719


Publisher: Association for Computing Machinery, New York, NY, United States.

Publication history: Received 16 June 2021; revised 2 June 2022; accepted 9 August 2022; online AM 17 August 2022; published 27 December 2022. Published in TALLIP Volume 22, Issue 2.
