Adaptive Learning of User's Interests

Today, users are confronted with a constant overflow of information. Alternative concepts are urgently needed to tackle this problem. This paper presents a new adaptive learning algorithm for managing the interests of users. The method utilizes a fresh approach to reinforcement learning. Furthermore, a framework is constructed, and the initial outcomes of the experiments are presented and discussed. Lastly, an intelligent webcrawler with a user-friendly interface is introduced, which can propose further literature based on a user model.


INTRODUCTION
One of the consequences of the digital transformation is that users are exposed to a constantly expanding overflow of all kinds of information. Thus, context-based filtering of information based on interest is becoming more important than simply finding information.
For example, the so-called medical dilemma is well known: newly graduated doctors immediately find themselves immersed in large hospitals or medical practices. While still inexperienced, they are confronted with a huge number of symptoms, many of which cannot always be pinned down to a specific illness as seen in textbook examples or demonstrations. Furthermore, some diseases have very similar symptoms, which are hard to distinguish or are related to rarer diseases unfamiliar to an inexperienced doctor. Medical recommender systems can help such physicians to significantly reduce the risk of mistakes and incorrect treatments.
More broadly, the role of traditional libraries is rapidly changing. Libraries once were places to collect, catalog and maintain tremendous collections of unique literary and scholastic materials on paper. Today they instead manage literature in multimedia formats and offer an array of services via different net-based interfaces. Nevertheless, indexes and search procedures remain primarily based on keywords [7, 16].
Thus the next step of the digital transformation may entirely change the role of libraries. While classic libraries tried to build up complete or at least comprehensive collections of the world's literature, today's online users are already flooded with unrated information in quantities impossible to manage by hand. Therefore, it becomes more and more important to filter out relevant information depending on a respective user's interests.
Thus, libraries must be converted into entities which filter the correct and necessary information for their users, along with predicting what requests may come next (information literacy) [15].
To model user interests, an intelligent webcrawler must meet the following criteria:
• learn about the users' interests;
• recognize its current knowledge;
• daily filter out a small set of the most fitting texts for a user, such that it can be handled in the available time;
• allow the user to evaluate the output of the system and thereby support its enduring learning process, with the possibility to adapt to changing needs of the user;
• adapt learning rates to the specific user's situation and avoid off-line learning phases.
In the following sections, such a system will be introduced. First, the new conceptual approach is presented, for which an innovative kind of (discrete) reinforcement learning is applied. Based on this, a corresponding system architecture is derived, and the experimental results are given and discussed.

CONCEPTUAL APPROACH
Co-occurrences and TRCs
A co-occurrence graph represents a network of relationships and frequencies of co-occurrence between various terms in a given dataset. Co-occurrence graphs are useful in natural language processing to uncover patterns of association and connectivity between terms. Text-Representing Centroids (TRCs) are used to capture the main topics or concepts of a given set of documents. TRCs are generated by analyzing the content and identifying the most common or representative words and phrases [9].
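To make these two notions concrete, the following Python sketch builds a sentence-level co-occurrence graph and picks a representative term. Function names are ours, and the term selection here (highest total co-occurrence weight) is a deliberate simplification, not the exact distance-based TRC method of [9]:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Build an undirected co-occurrence graph: nodes are terms,
    edge weights count how often two terms share a sentence."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def centroid_term(graph):
    """Pick the term with the highest total co-occurrence weight
    as a simplified stand-in for the text-representing centroid."""
    return max(graph, key=lambda t: sum(graph[t].values()))
```

In this sketch, a document handed to the crawler would be tokenized into sentences, folded into the user's graph, and summarized by its centroid term.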

Reinforcement learning of interests
Reinforcement Learning (RL) can be utilized to learn user interests by creating a feedback loop in which an agent takes actions to interact with a user and receives rewards or penalties based on the user's feedback [5]. Over time, the agent builds a model of the user's interests and can adapt gradually to shifting user preferences. Therefore, the agent must continuously update its internal representation of the user's knowledge and current interests. Explicit or implicit user feedback, e.g. via a click on a button, can be used to adapt the agent's user representation.

Proposed Learning Architecture
The major concept is based on the use of co-occurrence graphs rather than neural network based (deep) learning processes, in order to avoid offline phases. It has been inspired by similar processes in the human brain [1]. Related approaches using co-occurrence graphs have already been made in text translation [4] and in different NLP tools, e.g. [3].
Initially, it is assumed that the user presents texts of interest to the system, which uses those texts to establish a local, user-dependent co-occurrence graph G = (V, E) as described in the literature (e.g. in [13]). For subsequent evaluation procedures, an interest index i with i: V → [−1, +1] is introduced, such that G = (V, E, i(V)) and every word w_j ∈ V will later be assigned a user's interest evaluation i(w_j) by our method. Initially, all evaluation values i(w_j) are set to zero (i.e. obtain a neutral evaluation).
In an iterative process, the system presents new documents D_j to the user, usually a small, manageable set of K documents. Every document can be categorized using the technique of text-representing centroids (TRC), introduced in [9] and applied several times, for instance in [10, 13, 14]. Hereby, each document D_j is represented by the word determined by its TRC, i.e. TRC(D_j). By giving one to five stars as an evaluation of each presented document D_j, the user may express their interest in the respective text, corresponding to a numerical evaluation e(D_j) of −1, −0.5, 0, +0.5 and +1, respectively.
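The star-to-evaluation mapping described above is a simple linear rescaling of the five-step scale onto [−1, +1]. A minimal sketch (the function name is ours):

```python
def star_to_evaluation(stars: int) -> float:
    """Map a 1-5 star rating onto the evaluation scale [-1, +1]:
    1 star -> -1.0, 3 stars -> 0.0 (neutral), 5 stars -> +1.0."""
    if not 1 <= stars <= 5:
        raise ValueError("rating must be between 1 and 5 stars")
    return (stars - 3) / 2.0
```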
Subsequently, after the user has read and evaluated a document D_j, the value given to e(D_j) may be used to update the evaluation, and therefore the interest, assigned to all documents categorized by TRC(D_j).
To allow an adaptation process, not just a simple average value but a weighted sliding average shall be used. It has to consider the most recently processed documents with a higher weight and give lower attention (i.e. a lower weight) to documents read further in the past.
Therefore, a new weighted sliding average value for i(TRC(D_j), t) can be calculated from the previous value i(TRC(D_j), t − 1) by

i(TRC(D_j), t) = (1 − λ) · i(TRC(D_j), t − 1) + λ · e(D_j),    (1)

wherein the learning rate λ = 1/N weights the newest evaluation, and N refers to the overall number of (weighted) values within the sliding average i(TRC(D_j), t). (Note that for t < N all evaluation values are assumed to be neutral, i.e. set to 0.) It is obvious that through the successive weighting of older values by factors of (1 − λ), a fading history of past evaluations is considered in i(TRC(D_j), t). With this pre-knowledge, the work of an intelligent document crawler is described by the following algorithm (see also Fig. 1): IF a document D_j has never been read before, THEN add all co-occurrences found in D_j to the graph. (A hash database shall be used for identifying read documents.) After a given time T, λ shall be adapted (i.e. reduced), in order to reduce the learning rate according to the number of overall read processes. It might be increased again if the user indicates an interest in a new knowledge area, or if too many evaluations are far away from words with high evaluations i(w_j). The procedure then repeats from the presentation of new documents.
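The weighted sliding average can be sketched as an exponential update, assuming a learning rate that weights the newest user evaluation while older values fade geometrically. This is a sketch under that assumption (names are ours), not necessarily the paper's exact formula:

```python
def update_interest(previous: float, evaluation: float, n: int) -> float:
    """Weighted sliding average over the last n (weighted) values:
    the newest evaluation enters with weight lam = 1/n, older
    values are successively down-weighted by (1 - lam) factors."""
    lam = 1.0 / n
    return (1.0 - lam) * previous + lam * evaluation
```

Reducing lam (i.e. increasing n) makes new ratings count less and stabilizes the index, which is exactly the learning-rate adaptation described above.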
Especially in the beginning, many documents might have an equal (low) score or evaluate to i(TRC(D_j), t) = 0. In this case, the selection can be done:
• preferring documents if one or more neighbours of their TRC(D_j) already have high values of i(TRC(D_j), t),
• such that many regions (clusters) of the co-occurrence graph are covered, or
• randomly.
Besides the successive learning process, the user may also immediately give the system more documents which represent his or her area(s) of interest. With a slight modification, unwanted information areas can be learned and recognized as well. These bulk learning processes can be done in the background while the system is still operational.
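One way to realize such a selection, mixing exploitation of high interest scores with a few random exploratory picks (an ε-greedy style choice), might look as follows. Function and parameter names are ours:

```python
import random

def select_documents(scored_docs, k, r, rng=random):
    """Pick k documents to present: the k-r highest-scored ones
    (exploitation) plus r drawn uniformly at random from the
    remainder (exploration / coverage of unexplored regions)."""
    ranked = sorted(scored_docs, key=lambda d: d[1], reverse=True)
    top = [doc for doc, _ in ranked[: k - r]]
    rest = [doc for doc, _ in ranked[k - r:]]
    return top + rng.sample(rest, min(r, len(rest)))
```

With r = 0 the crawler exploits only; a small positive r keeps presenting documents from graph regions that have not yet received ratings.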
The algorithm given above describes a new kind of reinforcement learning on the (discrete) co-occurrence graph. In particular, the learning process takes place during full operation of the system and adapts to the state of its user.

IMPLEMENTATION
As a proof of concept, the algorithm is implemented with the popular Apache Tomcat (http://tomcat.apache.org/) servlet container, a web server, the Hagen NLP Toolbox and the graph database Neo4j [2]. A PostgreSQL database is used for the administration of user data, user analytics and to store information about the web crawler document collection.
(6) User sets retrieval settings with K as the number of results, R random results and mandatory terms based on step 4
(7) Start new document retrieval process
(8) Order the graph terms by rating and frequency
(9) Retrieve mandatory files based on the terms from step 7
(10) Retrieve K documents matching the top terms of step 9
(11) Retrieve R documents that do not match the terms of step 10
(12) User rates documents with a rating of [−1, +1]
(13) A new rating is calculated based on equation 1
(14) Go to step 8
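Step (8), ordering the graph terms by rating and frequency, could be sketched as follows. The dictionary keys are our assumption for illustration, not the actual schema of the toolbox or the Neo4j store:

```python
def order_terms(terms):
    """Order graph terms by interest rating first, then by
    frequency, both descending, as in step (8) of the loop."""
    return sorted(terms, key=lambda t: (t["rating"], t["frequency"]), reverse=True)
```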

EXPERIMENTS AND DISCUSSION
As a test basis, a document corpus consisting of 11,680 different articles in 23 categories is used [12]. To model different user interests, the following two types of user are defined:
(1) Expert with a specific interest in exactly one category
(2) Generalist with a general interest in multiple non-overlapping categories
For each experiment, a user type, an initial co-occurrence graph and the input parameters K, R, λ of the algorithm are selected.
In addition, the number of documents considered in each iteration is fixed. This corresponds to the size of a set D of articles which the crawler loads, e.g. from the Internet. At random, 10 subsets of 2,000 texts each are selected from the document corpus.
Each experiment consists of 10 trials with 100 iterations each. Additionally, the initial co-occurrence graph consists of 200 randomly selected articles from the whole corpus.
Therefore, all terms for the calculation of the TRC must be contained as nodes in the user graph. The resulting key term does not have to be part of the document. Since each user has their own co-occurrence graph from which the TRCs are calculated, it is to be expected that the documents all have different centroid terms. Thus, a comparison is made as to which type of graph provides the more comprehensive results during indexing: a graph with many nodes on a specific topic, or a graph with fewer nodes but a wider range of topics.
To further understand the user's interests using reinforcement learning, an evaluation system that allows a user to provide feedback on their interests is necessary. For this purpose, interesting documents are returned to the user, who can rate them. Based on the assigned centroid terms from the first test, the initial document retrieval is validated.
The following set of experiments was performed, and the results were validated with various statistical tests, e.g. the Shapiro-Wilk test and the two-sample t-test:
(1) Convergence: Compare the set D of all downloadable documents against a subset D' ⊂ D, which is determined by the ratings of the user. This question clarifies whether it is necessary to download all accessible texts. The results indicate a good prediction of the user ratings by the indices of interest. Thus the subset should be sufficient, because it contains at least K − R texts with maximum ratings. The changes of λ suggested by Mingkhwan after a certain time are verified by the following experiment: after 100 iterations with a constant λ, λ is reduced by 50%. Since a reduced λ implies a lower impact of new valuations, it is assumed that the indices of interest stabilize and thus higher valuations can be achieved. Statistical analysis of the results shows that if the learning rate is decreased after some time, the intelligent crawler yields at best equally good results compared to when the learning rate remains unchanged, see Fig. 4 and Fig. 5.

CONCLUSION
The digital revolution confronts us with an overwhelming volume of information, which is often irrelevant in the context of the current informational needs. Therefore, systems are needed that filter and select literature for the user [6, 11]. This paper presented the concept of an adaptive learning algorithm for user interests, which adapts and improves its recommendations over time. After discussing the mathematical foundation of the approach, an experimental framework was presented. The framework is based on the well-known Apache Tomcat servlet container, which provides easy deployment. Internally, it makes use of a graph-based text analysis toolbox [8]. The graph database Neo4j was used for the persistent storage and retrieval of terms. The database additionally stores links between documents as well as their semantic relations. Information about a user and their current interests is stored in a PostgreSQL database. The results indicate that the proposed algorithm is capable of modelling users' interests quite well. As shown experimentally, the recommendations of the algorithm outperform an approach with random selection of documents. The algorithm achieves significantly better ratings in different user scenarios, and adapts its recommendations over time. Furthermore, the question of how large the set of reachable documents for recommendations should be was examined. Our results clearly favor the ε-greedy strategy employed by the algorithm of Mingkhwan et al. The results indicate that the applied averaging procedure to adjust the indices of interest allows a better estimation of the expected feedback. On the other hand, a high learning rate at the beginning of the deployment can lead to good results more quickly.

Figure 1: Learning Process of the Intelligent Crawler

Figure 3: Application Document Retrieval after Learning for User Music

(2) Greedy Strategy: Test the influence of a changing R-parameter. The choice of R = 3 realizes an ε-greedy strategy, and for R = 0 the algorithm is based mainly on exploitation. The intelligent crawler delivers at least as good results if the results do not include randomly selected documents.
(3) Parameter λ: Test the impact of the quality of the presented documents for varying λ. The experiments show that for tiny values of λ, a reevaluation does not contribute to the index of interest. On the other hand, as λ approaches its maximum, the current value of the index of interest is lost. A random selection of documents provides at least as good results as the intelligent crawler if the latter only considers the last preceding rating in the interest index of the TRC.
(4) User model Generalist: Test the open-minded user, who gets presented documents of various categories. The intelligent crawler does not deliver better results than a purely random selection.

Figure 4: Mean values x̄ and standard deviations s of 10 experiments