Rating Inference for Custom Trips from Enriched GPS Traces using Random Forests

Trip planning services are employed extensively by users to compute paths between locations for many different use cases, including commuting to work, transportation of goods, and itinerary planning for tourists. In many scenarios, such as planning for a hiking trip, running training, or mountain cycling, it is desirable to provide users with personalized trips according to their preferences. Existing route planning systems for mountain activities recommend user-posted trips, along with ratings w.r.t. the route's difficulty, condition, or enjoyment it provides. However, users often want to define a specific trip by choosing the segments/trails they want to follow. Existing systems do not provide a rating for such trips, thus suffering from the cold-start problem. Also, the efforts to automatically infer such a rating have been limited. In this paper, we study the problem of inferring ratings for custom trips. We propose a machine-learning framework that encodes various rated trip features and employs random forest classifiers to infer ratings. We conduct feature engineering to encode information regarding a) trip location, b) trip elevation profile, c) closeness to points of interest, and d) closeness to locations of geotagged photos. Finally, we present the results of an ablation study on two real-world data sets and five different ratings. We evaluate the efficiency of our proposed approach and the effect each feature has on the rating inference accuracy.


INTRODUCTION
Global Positioning System (GPS) technology has revolutionized how we navigate, document, and engage with our surroundings [22]. GPS-enabled devices allow us to pinpoint our location and generate a large amount of data that traces our movements along trips. Due to this increase in the availability of movement data, the analysis of such data has attracted a lot of attention recently [5,15,22,25]. While traditional GPS data capture only spatial coordinates and timestamps, enriched GPS traces incorporate additional data, such as contextual data and environmental factors. This augmentation transforms raw GPS data into a testbed for innovative applications across different domains. Together with user ratings and feedback, enriched GPS traces provide a unique research opportunity for personalized trip planning in tourism [21].
Custom trips are designed to cater to travelers' specific desires and preferences for personalized tourism experiences. Some tourism applications, therefore, compute trips and itineraries that align with users' unique interests, such as cultural exploration or sports [11], that visit specific areas [20], that avoid crowded places [18], or that are easy to navigate [19]. However, as users usually define such trips on the fly, other users have yet to rate them, and the trips thus suffer from the cold-start problem. For example, consider a user-defined hiking route from A to B along Lake Constance, as shown in Figure 1. The user may want to know the ratings for different features of this trip related to the scenery or the route's difficulty. Since this is the first time the route appears in the system, no rating can be shown to the user. As a result, the rating inference of custom trips has emerged as an important feature in tourism applications and location-based services.
On the one hand, custom trips may offer users individual experiences based on trip features and user preferences. On the other hand, given the complexity of user ratings on different aspects of experienced trips [24], it is challenging to compute high-quality ratings for custom trips. Machine learning can be used to infer ratings for a custom trip based on its profile [13]. For example, travel service providers may leverage machine learning algorithms to generate recommendations tailored to each user's profile, such as personalized hiking trips. Also, machine learning models can help travel service providers continuously improve their services by analyzing traveler feedback and rating data [8]. Insights derived from machine learning can lead to high-quality itinerary recommendations and overall trip satisfaction. However, due to the heterogeneous features of a trip, it is still challenging to feed the machine-learning algorithm with optimal trip representations.
This paper, therefore, aims to determine which trip representation serves best as input to the machine learning algorithm and achieves the highest accuracy for rating inference. We use Random Forest as the algorithm for rating inference since previous works [2] have concluded that Random Forest performs best in several similar scenarios. The trip representations we discuss may also be used as input for other algorithms. However, determining the optimal machine learning model for the task at hand is out of the scope of this paper.
The rest of the paper is organized as follows. Section 2 reviews the literature related to rating inference and trip recommendations. Section 3 describes the process of feature engineering and model training for inferring the ratings of custom trips. In order to validate the proposed approach, in Section 4 we present the results of our experiments on data sets that contain hiking and cycling trips. Finally, Section 5 concludes the paper and outlines promising directions for future research.

RELATED WORK
Enriched GPS traces contain multi-criteria ratings for trips. In contrast to traditional single-criteria rating systems, where items are typically ranked based on a single dimension, such as user satisfaction rating, multi-criteria ratings consider different factors simultaneously. For example, in a trip recommender system, one hiking route would have various attributes for ratings, such as difficulty, landscape, or quality of experience, reflecting the multidimensional nature of user preferences. If the ratings can be inferred, multi-criteria ratings in rich GPS traces may offer a refined approach to cater to user preferences [9].
Rating inference can be achieved by collaborative filtering, similarity measures, machine learning, deep learning, etc. For example, Chondrogiannis and Ge [6] solve the problem of inferring ratings for outdoor activity trajectories using similarity measures. They proposed an approach to infer ratings by considering the overlap among the trajectories in a data set. In their work, the trip traces are first mapped to street network data, where the overlaps of the unrated trajectory with rated ones can be computed. The rating is then determined by the weighted average of the ratings of the overlapping trajectories. The weight depends on the overlapping trajectories' length and the overlapping segments' cumulative length. In contrast to our work, this approach cannot infer ratings for trips that have no overlap with already rated ones.
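To make the overlap-based idea concrete, the following is a minimal sketch of a weighted-average rating. The weight used here (overlap length divided by the length of the rated trajectory) is an assumption chosen for illustration and is not necessarily the weighting used in [6]; the function name and the input layout are hypothetical as well.

```python
from typing import Optional


def overlap_weighted_rating(
    overlaps: list[tuple[float, float, float]],
) -> Optional[float]:
    """Weighted average of the ratings of overlapping rated trajectories.

    Each entry is (rating, overlap_length, trajectory_length).
    ASSUMPTION: weight = overlap_length / trajectory_length; the exact
    weighting of the referenced approach may differ.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rating, overlap_len, traj_len in overlaps:
        if overlap_len <= 0 or traj_len <= 0:
            continue  # no usable overlap with this rated trajectory
        weight = overlap_len / traj_len
        weighted_sum += weight * rating
        weight_total += weight
    if weight_total == 0:
        return None  # no overlap at all: no rating can be inferred
    return weighted_sum / weight_total
```

The `None` branch mirrors the limitation noted above: without any overlapping rated trajectory, a similarity-based scheme of this kind has nothing to average.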
The rating inference approach in our work is more related to content-based filtering [1], which leverages item features to infer the ratings for other items or user preferences. Content-based filtering is particularly effective in mitigating the cold-start problem, where user-based collaborative filtering struggles to provide recommendations for new users or items. This approach can make proper recommendations based on the content attributes even with limited user interaction.
One of the challenges in content-based filtering is how to construct encodings, which depends on the complexity of the input data. If the inputs are ratings, then the ratings can be directly used as encodings for, e.g., cosine similarity. Often, however, the inputs are a bag of words, such as item descriptions. This case is usually handled with word encodings. For example, some existing works [16,17] consider different item features as a bag of words and apply Word2Vec to construct encodings. However, encoding construction is difficult for enriched GPS traces, as enriched data are neither natural language nor uni-dimensional ratings.
Rating inference results are usually further used in trip recommendations. Trip recommendation can be seen as a trip ranking problem based on a user's location and preferences [10]. Different factors can be taken into account when ranking the relevant trips. For example, Wang et al. [23] consider varying traffic data with neural networks to compute the real-time fastest route. Carusotto et al. [3] consider social media profiles to recommend personalized trips. Other learning approaches have also been used for trip recommendations. For example, Chen et al. [4] proposed to use reinforcement learning to offer diverse trip recommendations in real-time user context. Most learning-based methods focus on how to optimize the sequence of points of interest and construct the trip. In contrast, this paper focuses on inferring the ratings of custom trips that are defined by the users. Thus, apart from trip recommendations, rating inference in this paper can be considered a second opinion for custom trips defined by users.

TRIP RATING INFERENCE

Problem Setting
A GPS trace $T$ is a sequence $\langle p_1, \ldots, p_n \rangle$ where each $p_i$ is a position represented by geographical coordinates, i.e., longitude and latitude. A trip is represented by a triple $t = \{T_t, X_t, r_t\}$, where $T_t$ is the GPS trace associated with trip $t$, $X_t$ is a collection of data in different forms related to $t$ (e.g., min-max altitude, closeness to points-of-interest, related social media posts, etc.), and $r_t$ is a rating assigned to $t$ by the user who created it. If the value of $r_t$ is set, then we call $t$ a rated trip. Otherwise, $t$ is called an unrated trip. Given a set of rated trips $\mathcal{R} = \{t_1, \ldots, t_{|\mathcal{R}|}\}$ and an unrated query trip $t_q$, our aim is to infer a rating for $t_q$ by taking into account the similarity of $t_q$ with the trips in $\mathcal{R}$.
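The trip triple defined above maps naturally onto a small record type. The sketch below is one possible representation; the class name, field names, and the helper `split_rated` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trip:
    # GPS trace: sequence of (longitude, latitude) positions
    trace: list[tuple[float, float]]
    # Additional data related to the trip (e.g., altitudes, POI distances)
    extra: dict = field(default_factory=dict)
    # Rating assigned by the creator; None marks an unrated trip
    rating: Optional[int] = None

    @property
    def is_rated(self) -> bool:
        return self.rating is not None


def split_rated(trips: list[Trip]) -> tuple[list[Trip], list[Trip]]:
    """Separate rated trips (training data) from unrated query trips."""
    rated = [t for t in trips if t.is_rated]
    unrated = [t for t in trips if not t.is_rated]
    return rated, unrated
```

Representing the missing rating as `None` keeps a legitimate rating value from being confused with "not rated".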

Data Preparation and Enrichment
The first step of the data preparation phase is the extraction of the elevation profile for each given trip. Since we consider a set of GPS traces as input, we determine the altitude for each location in the trace. In Figure 2, we denote the GPS trace by $T$ and the altitude sequence by $A$. Note that $T$ and $A$ have the same length.
The second step of the data preparation phase is the enrichment of the GPS traces by combining them with information from different data sets. In this paper, we consider two types of additional information. First, we consider a set of points-of-interest (POIs) $P$. For each GPS trace $T$, we compute the shortest distance between any location $p \in T$ and the location of every POI $o \in P$. We maintain a distance vector for each GPS trace that stores the shortest distance between the trace and each POI. In Figure 2, this distance vector is denoted by $D_P$. Second, we consider a data set of geotagged images $I$. Similar to the POIs, for each trace, we compute the shortest distance between the trace and every image $m \in I$. In Figure 2, the vector of distances to the geotagged images is denoted by $D_I$.
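The distance vectors described above can be sketched as follows. This is a minimal illustration that assumes great-circle (haversine) distances and measures the distance from each POI or image to the closest recorded position of the trace, not to the line segments between positions; the source does not state which distance function is used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def distance_vector(
    trace: list[tuple[float, float]],
    points: list[tuple[float, float]],
) -> list[float]:
    """For every point (POI or image), the shortest distance to the trace.

    ASSUMPTION: only recorded trace positions are considered.
    """
    return [
        min(haversine_m(lon, lat, p_lon, p_lat) for lon, lat in trace)
        for p_lon, p_lat in points
    ]
```

This brute-force version costs one distance computation per trace position and point; for data sets with hundreds of thousands of POIs, a spatial index would likely be needed in practice.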

Feature Engineering
As shown in Figure 2 we employ encoders to encode four types of information: a) trip location, b) trip elevation, c) proximity to POIs, and d) proximity to places where users take pictures.In what follows, we elaborate on how we encode each type of information.

Location Encoder.
The main intuition behind using the location of the trips is that trips in the same area are expected to have similar ratings. To encode the location of each trip/trace, we first impose a $g \times g$ grid over the space defined by the minimum bounding rectangle of all traces. Then, we employ three different approaches to compute an encoding:
• We employ one-hot encoding [12], i.e., we create a binary vector of $g^2$ elements, where each bit indicates whether the given trip crosses the associated grid cell.
• We compute a topological order of the grid cells using the Z-order curve [14], assign an ID to each cell based on the Z-order, and determine for each trace the cells it crosses. Then, for each set of IDs, we create a vector that contains basic statistics, i.e., min, max, mean, and median values.
• Similar to the previous approach, but instead of computing basic statistics, we compute a histogram of $b$ buckets.
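The three location encodings listed above can be sketched as follows. This is an illustrative sketch under several assumptions: a cell counts as crossed only if a recorded position falls into it, the bounding rectangle is non-degenerate, cell IDs are obtained by bit interleaving of row and column indices, and the histogram uses equal-width buckets over the ID range. All function names are hypothetical.

```python
from statistics import mean, median


def crossed_cells(trace, bbox, g):
    """Set of (row, col) cells of a g x g grid that contain trace positions."""
    min_lon, min_lat, max_lon, max_lat = bbox
    cells = set()
    for lon, lat in trace:
        col = min(int((lon - min_lon) / (max_lon - min_lon) * g), g - 1)
        row = min(int((lat - min_lat) / (max_lat - min_lat) * g), g - 1)
        cells.add((row, col))
    return cells


def z_order(row, col, bits=16):
    """Z-order (Morton) ID: interleave the bits of col and row."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)
        z |= ((row >> i) & 1) << (2 * i + 1)
    return z


def one_hot(cells, g):
    """Binary vector of g*g elements, 1 for every crossed cell."""
    vec = [0] * (g * g)
    for row, col in cells:
        vec[row * g + col] = 1
    return vec


def grid_stats(cells):
    """Min, max, mean, and median of the Z-order IDs of the crossed cells."""
    ids = [z_order(r, c) for r, c in cells]
    return [min(ids), max(ids), mean(ids), median(ids)]


def grid_histogram(cells, g, b):
    """Histogram with b equal-width buckets over the Z-order ID range."""
    side = 1 << (g - 1).bit_length()  # next power of two >= g
    id_range = side * side            # Z-order IDs lie in [0, id_range)
    hist = [0] * b
    for r, c in cells:
        hist[z_order(r, c) * b // id_range] += 1
    return hist
```

For example, on a 4 × 4 grid over the rectangle (0, 0, 4, 4), a trace with positions in cells (0, 0), (0, 1), and (3, 3) yields the Z-order IDs 0, 1, and 15.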

Altitude Encoder.
To encode the altitude, we first consider the following basic statistics: the total ascent, the total descent, the minimum altitude, and the maximum altitude of the locations in the trip trace. Furthermore, we compute the standard deviation of the elevation profile, and we include it in the vector.
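A minimal sketch of these altitude statistics is given below. It assumes that the elevation profile is a list of altitudes in meters, one per trace position, and that the population standard deviation is meant; the source does not specify either detail, and the function name is hypothetical.

```python
from statistics import pstdev


def altitude_features(altitudes: list[float]) -> list[float]:
    """[total ascent, total descent, min, max, std] of an elevation profile."""
    ascent = 0.0
    descent = 0.0
    # Accumulate positive and negative altitude changes separately
    for prev, cur in zip(altitudes, altitudes[1:]):
        diff = cur - prev
        if diff > 0:
            ascent += diff
        else:
            descent -= diff
    # ASSUMPTION: population standard deviation of the profile
    return [ascent, descent, min(altitudes), max(altitudes), pstdev(altitudes)]
```

For the profile `[400, 450, 430, 500]`, the total ascent is 120 m and the total descent is 20 m.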

POI Distance Encoder.
For encoding the proximity to POIs, we consider two options. First, we use the vector described in the previous section as-is, i.e., a vector that contains the distances to all POIs in $P$. Second, we create a vector that contains only the distances to the $k$ POIs nearest to each trip. This approach gives a clearer picture of how closely each route passes by POIs.
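The second option can be sketched as a selection of the smallest entries of the full distance vector. The function name is hypothetical, and requiring at least as many POIs as requested neighbors is an assumption made to keep all encodings the same length.

```python
import heapq


def k_nearest_distances(distances: list[float], k: int) -> list[float]:
    """The k smallest trace-to-POI distances, in ascending order."""
    if k > len(distances):
        # ASSUMPTION: fixed-length encodings, so fewer than k POIs is an error
        raise ValueError("fewer POIs than requested neighbors")
    return heapq.nsmallest(k, distances)
```

Note that this encoding discards which POIs are nearest and keeps only how near they are.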

Geo-tagged Images Encoder.
We employ one-hot encoding again to encode the proximity to locations where users take pictures. For each trip/trace, we create a bit vector of size equal to the number of images in $I$, and we set each bit associated with an image to 1 if the minimum distance between the trace and the image location is below a predefined threshold, e.g., 20 meters.
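The thresholding step can be sketched in a few lines. The input is assumed to be the vector of minimum trace-to-image distances in meters; the function name and the default of 20 meters (taken from the example value above) are illustrative.

```python
def image_bits(min_distances_m: list[float], threshold_m: float = 20.0) -> list[int]:
    """One bit per image: 1 if the trace passes closer than the threshold."""
    # Strict comparison, matching "below a predefined threshold"
    return [1 if d < threshold_m else 0 for d in min_distances_m]
```

With the default threshold, the distances `[5.0, 20.0, 150.0]` are encoded as `[1, 0, 0]`.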

Model Training
We infer ratings using the Random Forest algorithm, an ensemble learning method for classification. Random Forest combines multiple decision trees. Each tree is trained on a different bootstrap sample and uses a random subset of features. The rating inference is then based on the collective decisions of these trees, which yields a robust ensemble result. Also, Random Forest can handle high-dimensional data with both categorical and numerical features.
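As an illustration of this training step, the sketch below fits a Random Forest classifier with scikit-learn. The source does not name the library or the hyperparameters, so both are assumptions; the feature matrix and the ratings are synthetic stand-ins for encoded trips.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for encoded trips: 200 trips, 5 features each
rng = np.random.default_rng(0)
X = rng.random((200, 5))
# Synthetic ratings (1 or 3) that depend on the first two features
y = np.where(X[:, 0] + X[:, 1] > 1.0, 3, 1)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# ASSUMPTION: illustrative hyperparameters, not those of the paper.
# bootstrap=True: each tree is trained on a bootstrap sample;
# max_features="sqrt": each split considers a random subset of features.
model = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", bootstrap=True, random_state=0
)
model.fit(X_train, y_train)

# Inferred ratings for the held-out trips (majority decision of the trees)
pred = model.predict(X_test)
```

Because ratings are treated as class labels, the classifier can only output rating values that occur in the training data.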

EXPERIMENTS

Experimental Setup
Data sets. In this section, we present a preliminary evaluation using trip data obtained from Outdooractive. The full data set contains a large list of rated trips, along with various types of information for each trip, including the GPS trace and the related activity. The data set also provides five different rated attributes for each trip, i.e., Condition, Difficulty, Technique, Quality of Experience, and Landscape. The ratings are integers within the interval [1,6], except for Difficulty, whose ratings are in [1,3]. We restricted
In addition, we include three more data sets that we use to enrich the trip data. To compute the elevation profile of each trip, we utilized the EU-DEM v1.1 [7] elevation data obtained from Copernicus. We also used a data set containing 400,000 POIs obtained from Wikipedia, of which we used a subset of 181,185 POIs located in the region of the Alps. Last, we used a data set of 50,000 geotagged images.
Methods Overview. In our experiments, we include the following approaches based on the encodings we presented in Section 3.3:
• One-hot Enc.: Location encoder that imposes a grid and uses one-hot encoding to indicate which cells are crossed by the trip trace.
• Grid Stats: Location encoder that imposes a grid, determines the Z-order of the grid cells, and computes basic statistics of the numerical cell IDs.
• Histogram: Location encoder that imposes a grid, determines the Z-order of the grid cells, and computes a histogram.
• Elevation: Altitude encoder using statistics computed over the elevation profile.
• POIs: POI Distance Encoder that computes a vector of distances to a predefined set of POIs.
• Img: Geo-tagged Images Encoder that computes distances to a predefined set of geotagged images and uses one-hot encoding to indicate which images have been taken from locations alongside the trip trace.
Evaluation Metric. We evaluate the inferred ratings by the Mean Absolute Error (MAE), a widely used evaluation metric in machine learning and regression analysis. MAE quantifies the accuracy of predictive models by measuring the average magnitude of errors between predicted and actual values. MAE is particularly valuable because of its simplicity and interpretability. The formula for MAE can be described as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
where $n$ is the number of targeted ratings in the data set, $\hat{y}_i$ is the predicted value for the rating, and $y_i$ is the actual (true) value for the rating. The MAE value represents the average magnitude of the errors of the model. A lower MAE indicates better predictive accuracy.
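The metric described above can be computed directly from its definition. The following short sketch uses made-up ratings for illustration only.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between predicted and actual ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative ratings: two exact predictions, one off by 1, one off by 2
print(mean_absolute_error([3, 5, 2, 6], [3, 4, 4, 6]))  # prints 0.75
```

An MAE of 0.75 means that the inferred ratings deviate from the true ratings by three quarters of a rating step on average.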

Experimental Results
In Figure 3, we report the results for all our methods for all ratings on both data sets. We observe that the three location encoders, i.e., One-hot, Grid Stats, and Histogram, achieve a MAE below 1.0 for all ratings apart from Condition. The results indicate that encoding the location of trips has limited predictive power for the rating of hiking trips. In contrast, encoding statistics about the elevation profile is useful for predicting Difficulty and the two ratings that are somewhat related to Difficulty, i.e., Condition and Technique. In fact, the MAE of Elevation for Difficulty is the lowest MAE overall.
With regard to POIs, we observe that its behavior is similar to that of the location encoders. Considering the distance to POIs seems to work more like a localization scheme than an approach that gives information about the overall experience. Last, Img also does not perform very well on rating prediction. While we expected that POIs and/or Img would perform better on ratings not directly related to Difficulty, i.e., Quality of Experience and Landscape, that does not seem to be the case.
Regarding the Cycling data set, we observe that all methods behave similarly to the Hiking data set for Condition and Difficulty. However, the situation differs for Technique, Quality of Experience, and Landscape. In particular, for Technique, all methods yield slightly higher MAE. For Quality of Experience and Landscape, we observe that the MAEs of One-hot, Histogram, and Img increase sharply. These approaches clearly fail to provide any meaningful rating prediction. Grid Stats and POIs yield the lowest MAE for these two ratings. Clearly, most of our techniques fail to properly capture the necessary traits that affect the Quality of Experience and Landscape ratings for cycling trips.

Ablation Study
We now report the results of an ablation study where we deep-dive into each group of methods and investigate how each parameter affects the predictive power of each method.
Location encoders. We begin by reporting the MAEs of location encoders varying the grid resolution in Figure 4. Regarding One-hot on the Hiking data set, we observe that, except for Quality of Experience, the grid resolution has little effect on the MAE of any rating. We observe a slight increase in Difficulty and Technique, while the MAE for Condition and Landscape does not change much. For Quality of Experience, though, we notice a local minimum for the 10 × 10 grid. In the Cycling data set, we observe a slight increase in the MAE for all ratings. Regarding Landscape, we notice a local minimum, but the overall MAE is very high, leading us to conclude that One-hot is not very effective for the Cycling data set.
Regarding Grid Stats, in the Hiking data set, we observe a small improvement with increasing grid resolution for Condition. The MAE for Difficulty remains steady, while we notice a slight increase in MAE with increasing grid resolution for Landscape and Technique. Similar to One-hot, we notice a local minimum for Quality of Experience. For the Cycling data set, we observe that the results are much better than the ones for One-hot. Grid resolution does not seem to affect the predictive power of the method much. The MAE remains unchanged for all ratings except Technique, where we observe a tiny increase in MAE with increasing grid resolution. Finally, regarding Histogram, in the Hiking data set, we observe that the MAE for Condition improves with increasing grid resolution, the MAE for Difficulty remains relatively unchanged, while the MAE for the rest of the ratings increases. For the Cycling data set, we observe similar behavior for all ratings except for Condition, where the improvement is negligible.
Altitude encoder. We now investigate the effect of basic statistics, trip length, and standard deviation of the elevation profile on the MAE of our Elevation encoder. We report our measurements in Figure 5. In both data sets, we observe that including the length and the standard deviation leads to an improvement in MAE. For Condition and Difficulty in the Hiking data set, we observe that trip length has the highest effect on the predictive power of our encoder. In contrast, we observe that the effect of length and standard deviation is similar in the Cycling data set. Nevertheless, as discussed in Section 4.2, Elevation demonstrates the most significant predictive power for Condition, Difficulty, and Technique. Including the length of the trip and the standard deviation in the training of the Random Forest classifier has a positive and, in many cases, significant effect.
POI encoder. As discussed in Section 4.2, the POI method demonstrates similar behavior to the location encoders, leading us to believe that maintaining a vector of distances to POIs works more like a localization method. That is because the classifier effectively treats the POIs as landmarks. As such, instead of keeping the distances to all landmarks, we evaluate variants of POI where we maintain only the top-$k$ distances to landmarks. The intuition behind this choice is that if a route is close to some POIs, the prediction of ratings related to aspects such as scenery and natural beauty, i.e., Quality of Experience and Landscape, would improve.
However, the results we report in Figure 6 show this is not the case. First, for both data sets, we observe that the MAE for Condition, Difficulty, and Technique is unaffected by the difference in the distance vectors. The MAE for Landscape in the Hiking data set shows the same behavior, while Quality of Experience gets worse for the smaller distance vectors and only improves if we include a very large number of POIs. For the Cycling data set, we notice this behavior for both Quality of Experience and Landscape, i.e., the MAE increases when we use small distance vectors. These results further support our hypothesis that POIs work more like a localization mechanism than a critical factor in determining a rating. In other words, the proximity of a trip to POIs does not seem to affect any of the user ratings.

Discussion
From the experimental results, we conclude that different encodings can be dynamically used to infer different ratings. In our current scenario, the ratings can be grouped into trip-oriented and user-oriented ratings. The trip-oriented ratings are focused on the intrinsic features of the trip. Thus, the encoding of trip profiles can offer higher-quality rating inferences. User-oriented ratings focus on how users feel about the trip and on user satisfaction. Our experiments show that elevation plays a crucial role in rating inference for trip-oriented ratings. However, none of the studied encodings demonstrated significant results in inferring user-oriented ratings. Different models can be trained to cater to different rating types. "One size fits all" encodings may lower the quality of multi-criteria rating inferences.
There are still some open challenges for this work: (1) The model may consider more contextual factors. For example, the context of a trip may include group dynamics, previous experiences, and cultural factors. Thus, incorporating a trip's context may help increase the inference accuracy. (2) Users would often like to know how the inference is made. In turn, users can be more confident in their trip decisions. Therefore, developing transparent and explainable models may increase user trust and satisfaction. (3) Including user feedback to enhance user engagement is also critical. User feedback can be used to improve the model training and provide continuous improvement for implementing trip recommendations.
Further, it would be interesting to fine-tune the trip profile encodings or combine the derived encodings for machine learning-based rating inference. In our experiment, we have encoded the trace, grid, elevation, POI distance, and photo-taken profiles. Each of the profile encodings can be further fine-tuned to increase the quality of rating inferences. Some encodings can be combined. For example, the trace profile and elevation profile can be combined to capture the trip features, and the POI-distance profile and photo-taken profile can be combined to capture user satisfaction better.

CONCLUSION
In this paper, we proposed a machine learning-based approach for custom trip rating inference. We have defined the problem based on enriched GPS trace data. Those data are then prepared and pre-processed with feature selection. Afterward, various trip encodings derived from location encoders, altitude encoders, and POI distance encoders are fed into the training process of machine learning. We have conducted experiments with different trip encodings for five multi-criteria rating inferences based on two real-world data sets. The experimental results have not only shown the applicability of the proposed rating inference framework but also indicated which trip encodings can be used to infer which rating and achieve the best accuracy. This has also indicated how to encode the trips for the recommender system and other machine-learning algorithms.
In the future, we plan to conduct a grid search for more possible trip encoding methods. Also, we will investigate the model selection in depth by considering deep learning methods in our rating inference framework, emphasizing explainability. Moreover, we plan to test the robustness of the proposed rating inference method with more data sets, although, as far as we know, only a limited number of trip-related data sets exist.

Figure 1: Custom trip defined from the route planner on www.outdooractive.com

Figure 3: Summary of MAE for all approaches.