Proiect De Diploma (1) [622627]

UNIVERSITY POLITEHNICA OF BUCHAREST
FACULTY OF AUTOMATIC CONTROL AND COMPUTERS
COMPUTER SCIENCE DEPARTMENT
HBFXComputer Science – Logo
Computer Science
Computer Science& Engineering
& EngineeringDepartment
Department
DIPLOMA PROJECT
Diploma Project Title (eg: Diploma project template)
Subtitle (eg: 2018 version)
Andrei Mihalea
Thesis advisor:
Prof. dr. ing. Andrei Ionescu
BUCHAREST
2018

CONTENTS
1 Introduction 1
1.1 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.4 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Obtained results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.6 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Motivation 4
3 Existent methods 5
3.1 Deviation detection in literature . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Disorientation detection by mining GPS trajectories for cognitively-impaired
elder [7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Activity Recognition and Abnormal Behaviour Detection with Recurrent Neural
Networks [5] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.4 Method comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5 Advantages, disadvantages and selected method . . . . . . . . . . . . . . . 11
3.5.1 The iBDD method [7] . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5.2 The RNN methods [5] . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5.3 Method selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Proposed solution 14
1

4.1 Application description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.1 Preprocessing module . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.2 Analysis module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.3 Similarity check module . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.4 Symbolize module . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.1.5 iBDD module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.6 Detection module . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Indoor version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.1 Room mapping module . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Work
ow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Implementation details 25
5.1 Module implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1.1 Data preprocess . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1.3 Similarity check . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.1.4 Symbolize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.5 Deviation detection . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Indoor version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.1 Room mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.2 Similarity check . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.3 Implementation problems and solutions . . . . . . . . . . . . . . . . . . . . 35
5.4 Programming languages and technologies used . . . . . . . . . . . . . . . . 36
6 Evaluare 37

6.1 Outdoor version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.1.1 Area Under ROC Curve metric . . . . . . . . . . . . . . . . . . . . 37
6.1.2 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.1.3 Trajectory visualization . . . . . . . . . . . . . . . . . . . . . . . . 39
6.1.4 Constructing the linear model . . . . . . . . . . . . . . . . . . . . . 40
6.2 Indoor version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.2.1 Indoor trajectory visualization . . . . . . . . . . . . . . . . . . . . . 45
6.2.2 Evaluating user's behavior . . . . . . . . . . . . . . . . . . . . . . . 46
7 Conclusions 49
Anexe 52
Appendix A Extrase de cod 53

SINOPSIS
Conform M. R. Hodges et al. [1], peste 75% din cazurile de dement , a sunt detectate c^ and
boala deja avanseaz a s ,i devine mai di cil de t ,inut sub control. Metodele existente ^ s ,i propun
detect ,iea degrad arii cognitive s ,i a dement ,ei prin folosirea unui istoric al traiectoriilor unor
utilizatori [7] s ,i g asirea unui comportament deviant baz andu-se pe un tipar al deplas arilor unei
persoane, sau prin detect ,ia activit at ,ilor pe care un individ le realizeaz a [5]. Aceste metode
propun clasi carea unei traiectorii ca ind deviat a sau nu. Metoda implementat a in aceast a
lucrare propune evident ,ierea unui tipar s ,i a unei evolut ,ii a unui utilizator, folosind un model
liniar, dar s ,i evident ,ierea traiectoriilor analizate prin gra ce comparative cu istoria utilizatorului.
ABSTRACT
According to M. R. Hodges et al. [1], over 75% of the dementia cases are detected when
the disease is already in an advanced stage and it's harder to keep under control. Existent
methods propose the detection of cognitive degradation and dementia by using a history of
the user's trajectories [7] and nding a deviating behavior based on the movement patterns
of a person, or by detecting the daily activities an individual makes [5]. These methods
propose the classi cation of a trajectory as deviating or not. The method in this paper
proposes evidentiating a pattern and an evolution of a given user, using a linear model, but
also evidentiating the analyzed trajectories using comparative plots with the user's history.

MULT ,UMIRI
(opt ,ional) Aici putet ,i introduce o sect ,iunea special a de mult ,umiri / acknowledgments.

1 INTRODUCTION
1.1 Context
As stated by M. R. Hodges et al. [1], almost 75% of the cases of early dementia go unnoticed
until the disease advances. Disorientation and wondering can be found between the symptoms
of individuals a ected by dementia.
The problem of analyzing o person's movement behavior and activity patterns is not widely
spread in the literature, mostly because there is hard to nd relevant data, speci c to in-
dividuals which present deviations from their usual patterns. Although, there are several
studies [7, 5] which developed methods for identifying abnormal behavior by using datasets
collected from random persons and modifying them into simulating the behavior of people
with dementia.
1.2 Problem description
The method we developed focuses on analyzing a trajectory of an user when a set of its
previous trajectories is available, in order to show an evolution in the user's behavior and in
their movement patterns. Beside what the state-of-art methods [7, 5] try to achieve, our
method does not focus only on indicating if a trajectory can be classi ed as wandering, and,
instead it tries to use more characteristics of such trajectories to show a tendency in the user's
movement evolution.
1.3 Objectives
The main objective of our solution is o ering an analysis about the evolution of an user's
movement behavior when we have a set of his GPS trajectories. This evolution should be given
by a score and the score variance over time can o er information about the behavior of the user.
1

Another objective is showing the deviation characteristics of a trajectory and o er relevant
plots and information about the distances of each deviating region of the trajectory, providing
an interface that can be used to visualize how a currently analyzed trajectory compares with
respect to the history of already existing trajectories.
1.4 Proposed solution
Our proposed solution analyses a given trajectory by using a set of previous similar trajectories
of the user. The method is based on the one described by Q. Lin et al. [7], and it is based
on working around the two de nitions a deviating trajectory can have, as stated by Q. Lin et
al. [7]. Therefore, a deviating trajectory can be de ned either by loops in the current path
the user follows or as points that do not appear in the history of user's trajectories or appear,
but in a very low ratio. The solution starts with the trajectory currently analyzed and then
computes the similar trajectories, which will be further used in the detection of the abnormal
movement behavior, by counting how many times the user have passed through a region and
comparing it to a chosen value.
1.5 Obtained results
The implemented method evaluates the evolution of a user's behavior, given by a score function
in
uenced by the characteristics of the deviating regions of a trajectory, such as distances and
how many times they appear. These results bring something new to the already existent
method, since they o er an over time analysis of the user's evolution and not only detect if
one of their trajectories can be classi ed or not as wandering.
1.6 Structure
This paper is divided in seven chapters. Chapter 1 represents the introduction which contain
the context of the work and the general aspects it proposes. In the Chapter 2 the motivation
and the reason of choosing this subject and why it presents an interest are presented, while
the Chapter 3 presents the state of the art methods and the technologies used in the next
2

chapters. Chapter 4 describes the solution we implemented, the modules it consists and the
how these modules are assembled together. Chapter 5 shows a detailed implementation of the
modules described in Chapter 4 and also the problems which appeared in the implementation.
The results and comparisons are placed in Chapter 6 and the conclusions and future work are
presented in Chapter 7.
3

2 MOTIVATION
The study of activity recognition and user's movement behavior has been researched and has
been related to the detection of the early stages of dementia in several studies [7, 5]. The
motivation behind this project is rstly based on the need of detecting the cases of dementia
in the earliest stages and secondly on the scienti c purposes in the eld of mobility pattern
mining.
Researching in this eld is even more dicult and challenging due to the lack of datasets, es-
pecially data from persons which has been su ering from dementia. Another diculty comes
from the inability to be sure that deviations in a person's movement patterns are an indi-
cator for cognitive degradation. Also the unpredictability of the human behavior can trick
such movement behavior analysis methods into interpreting wrong results and detecting an
abnormal behavior where everything the person did was intended and not part of a behavior
induced by confusion or other symptoms of early dementia.
Even though the diculties of this eld of research, the state of the art methods [7, 5] have
been proved accurate in detecting wandering behavior in datasets modi ed to simulate the
wandering behavior.
Furthermore, our method dives into the research eld of analyzing movement patterns and
its purpose is to extend the already existent methods, which only o er a classi cation of a
currently analyzed trajectory, and provide a more detailed analysis of such trajectories, along-
side with an evolution of the user, computed based on the characteristics of their movement
patterns and deviating trajectories.
4

3 EXISTENT METHODS
3.1 Deviation detection in literature
Even though it is not very developed, the studying the movement behavior and patterns
of users, searching for wandering characteristics has represented an interest in research and
there are several articles which focus on the subject, trying to explain and to develop di erent
methods of indicating when and how such behavior appears. Existent methods vary from
comparing a trajectory made from GPS points to a history of trajectories and mining the similar
trajectories of a chosen user, to machine learning methods which use datasets from movement
sensors. In this chapter, two di erent methods of detecting and analyzing wandering behavior
will be reviewed: Disorientation detection by mining GPS trajectories for cognitively-impaired
elder [7] and Activity Recognition and Abnormal Behaviour Detection with Recurrent Neural
Networks [5].
3.2 Disorientation detection by mining GPS trajectories for cognitively-
impaired elder [7]
As stated by Q. Lin et al. [7], there can be two possible de nitions of wandering behavior.
The rst de nition states that a point which is part of a looping trajectory can be considered
as wondering, since it may indicate that the user forgot his path traversed so far. The second
de nition implies that if a region of the current trajectory deviates from a model which is
based on the user's history of trajectories, then the presence of wandering behavior may be
a possibility. For the rst type of wandering behavior only the current trajectory is needed
for the analysis, while for the second type, a set of trajectories, representing the history of
the user's movement patterns will be necessary in order to decide if there are regions in the
trajectory which can be classi ed as deviating.
The method proposed by Q.Lin et al. [7] uses the dataset provided by Microsoft Research
5

Asia [2, 3, 4] for evaluation. This dataset consists of real-time GPS data collected from 182
users between April 2007 and August 2012. Between these 182 users, for 31 of them the data
has been labeled with the transportation method. For the evaluation of the method, 10 users
from 182 have been chosen.
As described by Q.Lin et al. [7], their method, isolation-based disorientation detection (iBDD)
can be divided in three separate methods: iBDD, symbolization and detection. The iBDD is
the core method and it calls both the symbolization and the detection methods in order to
detect if a given trajectory is deviated or not.
The iBDD method takes three parameters as inputs: a GPS trajectory of the user, the cell
matrix of the city and a set of symbolized trajectories, which are similar trajectories to the
currently analyzed one and are part of the user's history of movement patterns. The GPS
trajectory is a set of GPS points, represented by their latitude and longitude. A cell matrix
can be described as a map divided in square cells with equal sizes, where the dimension of
a cell can be chosen. The last input contains a set of symbolized trajectories, which are
obtained by traversing the initial GPS trajectories and mapping the points to a cell from the
cell matrix. The return value of the method is a label which will be set as one if the current
analyzed trajectory is classi ed as a deviating trajectory, or 0 otherwise. The method begins
by initializing a set of similar trajectories with the input parameter which gives the set of
symbolized trajectories, then traversing the GPS trajectories and mapping each point to the
cell matrix. If the cell of the current point is not the same with the cell the previous point was
placed in, the cell is passed to the detection method, for further analysis. The working set is
modi ed each time in the detection method, which also o ers the output label that indicates
if the trajectory is deviating.
The second method described and used by the iBDD is the symbolization method, which takes
two inputs: a point of the current trajectory and the cell matrix. The input point is mapped
to the cell matrix by searching the limits of the cell between which it is placed. The resulted
cell is added to the set of already traversed symbolized cells and if it is the same as the last
cell, a counter is increased. This counter is further used to determine if a there are noisy
points in the trajectory. This check is made either by comparing the distance between the
last two consecutive points with a threshold value or by comparing the total distance between
the points in the same cell with a value which depends of on the size of the cell. If these
distances are lower than their chosen threshold values, then the points are not noisy and the
6

process continues by appending the newly found cell to the nal set of cells, representing the
symbolized trajectory. Beside the cell, a label is also returned and it shows if the cell is a new
one, by comparing if it already is the last cell of the list of traversed cells. The purpose of
the symbolization method is to ease the computations, therefore, when analyzing a trajectory
or comparing two trajectories, instead of traversing each point, a much smaller set will have
to be traversed, since all the consecutive points that belong in a cell will only be counted as
one, which is represented by that cell.
The nal method is the one used for classifying if a given trajectory is deviating or not from
the user's model. This method receives as input the current cell of the traversal and a set
of similar trajectories to the one currently analyzed, a set representing the user's history of
movement patterns. Using the current trajectory, the cell and the set of similar trajectories,
a new subset is created, consisting of all the trajectories in the similar set, which contain
the current analyzed trajectory as part of them. After this subset is generated, the ratio
between the subset's length and the length of all the similar trajectories is calculated and then
compared to a chosen threshold value. If the ratio is lower than the threshold value, then
the cell is considered as part of a deviation in the trajectory. The next step is nding which
type of deviation was observed, a point deviating from the normal user's patterns or a looping
section of the trajectory. To identify the looping section, the current cell will be searched in
the set of cells traversed so far in the current trajectory and if it is found, it will be counted
as a looping cell. The traversed set of cells will be replaced with the last two cells, such that
a loop won't be counted twice if found. If the cell is not found in the trajectory traversed
so far, then it is considered as deviating from the user's model and another counter will be
increased. If the cell is either looping or deviating from the history, then the set of similar
trajectories will be reinitialized as the original set, otherwise it will take the value of the subset
previously calculated, which only contains the trajectories which has the current trajectory as
a sub-trajectory. If one of the previously mentioned counters is larger than three, then the
trajectory will be considered as one a ected by wandering behavior.
The authors state that the method can reach a detection rate of about 95% [7]. For evaluating
their method, they chose 10 users from the dataset and then the trajectories were altered, in
order to simulate the wandering behavior, since the original trajectories aren't collected from
people with dementia. After that, the trajectories were labeled as disorientation or not by
more volunteers.
7

3.3 Activity Recognition and Abnormal Behaviour Detection with
Recurrent Neural Networks [5]
In this article the solution is no longer based on mining GPS trajectories and uses a dataset
of sensor activity, collected by Van Kasteren [8] from three households over the course of a
month. The sensors indicate basic daily activities of a user. Three approaches were used for
representing the features of the dataset. The rst approach represents a sensor with a value
of 1 when it is activated and with 0 otherwise. In the second approach, the change of a sensor
state is represented with a value of one. For the nal approach of representing the features,
only the sensor that changed his state last time will have the value of one.
Similarly with the dataset provided by Microsoft Research Asia [2, 3, 4], used in evaluation in
the rst article, this dataset also doesn't contain data collected from people with dementia, so
altering it by appending or removing certain portions of data was also necessary. The authors
give two de nitions for abnormal behavior [7]: the rst one considers that activities which
are repeated or forgotten are a speci c sign that may indicate dementia, while the second
de nition is related to the sleep patterns of the users and the possible dehydration caused
by forgetting to drink water. To simulate the behavior of such individuals, more repeating
activities were added in the dataset and some of the usual basic activities, like eating or
drinking were removed, in order to represent the forgetting behavior.
Three di erent recurrent architectures are used for the detection of deviating behavior of
the users: the vanilla recurrent neural networks, the long short term memory recurrent neural
networks and the gated recurrent unit, proposed by Cho et al. [6]. For evaluating the proposed
solution and comparing them to di erent machine learning architectures, the authors used
the Keras Deep Learning library and the Theano library, beside the implementation o ered
by Van Kasteren et al. [8]. The recurrent neural architectures provided results that were
comparable with the already studied methods. All the three feature representations were
tested using di erent metrics, as shown by Van Kasteren et al. [8]. Concluding the study,
the recurrent architectures were very close to the previously implemented methods, like SVM,
Hidden Markov Models and Na  ve Bayes in all the compared metrics (recall, accuracy, precision
and f-measure). For some datasets and feature representation, they even outperformed the
already existent methods. The results for the third dataset are shown in the Figure 1 [5]
8

Figure 1: Comparison between methods
3.4 Method comparison
The two presented approaches for detecting abnormal behavior are di erent both in terms
of datasets used and in the approach they use for solving the problem. In Table 1 the
summary of both methods is presented, evidentiating the di erences between the two and
their performances.
Table 1: Method comparison
Article Algorithm summary Dataset description Results
9

Disorientation de-
tection by mining
GPS trajectories
for cognitively-
impaired elder [7]The algorithm receives a
GPS trajectory and based
on a set of historical tra-
jectories of the user, it de-
termines if the given tra-
jectory indicates wander-
ing behavior for the user.
The trajectory is symbol-
ized by mapping its points
to a cell matrix, then loop-
ing cells are searched in
the trajectory traversed so
far. Also, the trajectory
is compared to a set of
similar trajectories to de-
termine if there are devi-
ating cells from the user's
model.The dataset used is pro-
vided by Microsoft Re-
search Asia [2, 3, 4]
and contains GPS tra-
jectories recorded from
182 users. Between
these 182, 31 has also
labeled data mention-
ing the transportation
method. For the testing
of the method, 10 users
have been chosen for the
dataset.Authors claim that the
method reaches 95% de-
tection accuracy. The
label resulted as the out-
put of the algorithm is
compared to manually
labeled results for that
trajectory. The false
positive rate is less than
3% as stated by Q. Lin
et al. [7].
10

Activity Recogni-
tion and Abnormal
Behaviour De-
tection with
Recurrent Neural
Networks [5]Three categories (Vanilla,
LSTM, GRU) of recur-
rent neural networks were
used to create models from
the Wireless dataset. Af-
ter that, the trained mod-
els are used to predict
and detect abnormal be-
havior and deviations from
the basic activities pat-
terns. There were con-
sidered three representa-
tions for the features: bi-
nary, change-point and
last- red.Dataset is provided by
Van Kasteren et al. [8]
and comprises sensor
data which are triggered
by some basic day to day
activities.Overall, the RNN mod-
els were close to the
best performing state of
the art methods and
even better for the rst
household, when us-
ing a speci c feature
representation. The
RNN and state-of-the
art methods are very
close with respect to
the metrics used and for
some feature represen-
tations the RNN per-
form better.
3.5 Advantages, disadvantages and selected method
Both methods described in this chapter have their advantages and disadvantages, as summa-
rized in the next subsections.
3.5.1 The iBDD method [7]
Advantages:
Dataset is easy to understand and visualize
The method is not complicated and can work in real-time
Easy to customize
Disadvantages:
11

Eliminating noisy data may be dicult
Its performance is highly dependent on the choice of threshold values
Doesn't use the timestamps of the points
Knowing if a trajectory is deviating or not can't be always an indicator for wandering
3.5.2 The RNN methods [5]
Advantages:
Dataset can be easily altered to simulate wandering behavior
Can be further improved to deep neural networks
Disadvantages:
Dataset is harder to understand and follow
Similarly with the iBDD method [7], detecting if a trajectory is deviated or not can't
solely be an indicator for wandering behavior
3.5.3 Method selection
Analyzing the state of the art methods, their advantages and disadvantages, we've decided
to implement a method based on the one proposed by Q. Lin et al. [7], but not only using it
as a method of deciding whether or not an analyzed trajectory can be classi ed as wandering
behavior, and, instead creating a method that can also provide an analysis over the evolution of
an user's movement behavior. We considered that analyzing and classifying a single trajectory
of a given user is not enough to tell if the user's behavior deviates from the normal and if such
a deviation is always relevant, since the human behavior can be unpredictable, being hard to
tell if the abnormalities found are something the user intended to do or not. Furthermore, we
considered that an over time analysis of the user's movement pattern may be more relevant
than simply deciding if a trajectory is deviated, because it can indicate a tendency of change
in the behavior.
We chose the iBDD method [7] as a starting method since, beside counting the deviating
and looping cells, it is easy to modify it into providing more relevant characteristics of the
12

trajectory, with regard to the looping and deviating regions found, like their distance and the
ratio between their distance and the total distance of the trajectory. These characteristics
can be further used to obtain a score for each of the user's trajectories, a score which may
indicate an over time evolution for that user.
13

4 PROPOSED SOLUTION
4.1 Application description
The purpose of the developed application is analyzing the movement patterns of a given user,
detecting anomalies and unusual behavior compared to a history of trajectories and also being
able to compute and provide an analysis over the evolution in time of the user's movement
behavior.
The implemented solution is based on the solution proposed by Q. Lin et al., the iBDD
algorithm, which de nes wandering either as looping through the same places of a given
trajectory or as a deviation from the user's usual history of trajectories. The purpose of the
iBDD method is analyzing such a trajectory and detecting whether or not the trajectory can
be classi ed as wandering. The di erence is that our implementation doesn't o er a yes or
no answer if the trajectory is considered as wandering or not, it o ers a set of computed
characteristics of the trajectory, like the distance of each deviating part of the trajectory and
the number of loops in a path.
4.1.1 Preprocessing module
The rst module (preprocess) has the role of preprocessing the data and rearranging it for
an easier further use in the next modules. The database contains les and each of these
les consist of a set of geographic points (latitude and longitude). The points also have
an associated time stamp which will be used by the module in order to separate di erent
trajectories. Using the timestamps, during this module, an iteration over all the points of a
set will be done, searching for points whose di erence between their associated timestamps
is larger than a chosen threshold. Such points represent the separation between two di erent
trajectories, where the rst point found is the last point of a trajectory and the second point
is the rst from the next trajectory. The result after the rst module will be a database which
will contain a set of trajectories for every user. These trajectories will be stored separately,
14

for an easier access in the next modules.
Below are the steps taken by this module in order to separate the trajectories of a user in
di erent sets:
Iterate through the user's database and store the sets of trajectories
Iterate through each set and separate the trajectories by comparing the timestamps of
pairs of consecutive points
Store the separated trajectories into a new database
4.1.2 Analysis module
The main module is the analysis one and uses the preprocessed data for analyzing a single
trajectory from a selected user. The inputs of the module are two strings, representing the
selected user and the trajectory belonging to that user, which contains the GPS points to
be analyzed, and a
oat value, which is the size of a cell. The rst step in the process of
analysis is the elimination of the noisy points from the trajectories, which are points whose
GPS positions were badly recorded and are usually identi ed by an unrealistic distance between
it and adjacent points in the same trajectory. For detecting those noisy points, each trajectory
is traversed and the current point in the traversal is compared to the previous point. If the
distance between them is larger than a selected threshold, the second point is eliminated and
the traversal continues from the point after it. After the elimination of the noisy points, the
limits of the selected trajectory are calculated by iterating through each of its points and
comparing their coordinates with the current limits, initialized appropriately. These limits
represent four points: the most extreme north-east, north-west, south-east and south-west
points from the trajectory and form an initial bounding box around the trajectory. After the
initial bounding box is calculated, a second bounding box is created by extending the limits in
every direction by a given percentage. The extended bounding box is then used to calculate
which of the rest of the trajectories are similar to the one currently analyzed. In Figure 2 there
can be observed an initial bounding box for a given trajectory and the bounding box after the
expansion. To be noticed that both bounding boxes have the same center, the second one
being obtained by a symmetric expansion.
15

Figure 2: Bounding box for a given trajectory
The next purpose of this module is creating a cell matrix, based on the already calculated
bounding box. The cell matrix is a series of cells, each of them having a square size and
their length is given by the input parameter of the module. This cell matrix is constructed
by starting from the point with the lowest latitude and longitude, and adding the cell size
to it until the limits of the bounding box are reached. The cell size is measured in meters,
so a conversion to coordinates will be needed in order to be able to calculate each point in
the matrix. Equations 1 and 2 show how this conversion is made and how the new point
is calculated. The rows of the matrix are made from points with the same longitude and
increasing latitudes, while the columns have the same latitude and increasing longitudes. The
cell size is added to the point until it reaches the limits of the bounding box in both directions,
so, the cell matrix will contain pairs of latitudes and longitudes placed in the limits of the
bounding box, where the space between them has the dimension given by the input parameter
(the size of the cell).
lat2=lat1+ (distance=earth radius )(180=pi) (1)
lon 2=lon 1+ (distance=earth radius )(180=pi)=cos(lat=180) (2)
The last task made by this module is collecting the information resulting from the detection
process and compute the distances of each deviating portion of the given trajectory. In order
16

to do that, the trajectory is iterated again and when a point placed in a deviating cell is
found, the distance between this point and the previous point in the trajectory is computed
and added to a sum, initialized with 0. This process is repeated until all consecutive points
that are placed in deviating or looping cells are found, therefore, the process stops when the
current point in the iteration is the point where the trajectory exits a deviating region. The
trajectory traversal is continued, in order to nd all of the deviating regions.
After the points that are part of deviating or looping regions are found, the distance between
them is computed, resulting the distances of each of the deviating regions. These distances will
be used as inputs for a function which will give a tentative evaluation of the user's behavior.
Beside the distance of the deviating regions, also their number will be sent as input to the
function. This function will be used to create a linear model based on all the characteristics
of the trajectory, and giving a score as output. The evolution of an user will be evaluated
using the evolution of the score.
Summarizing the analysis module, it serves multiple purposes, following the next steps:
Eliminate the noisy points from the preprocessed set of trajectories
Compute the limits of the currently analyzed trajectory and create a bounding box based
on these points
Compute the cell matrix, which is a map of the bounding box divided in equal sized
cells of a given dimension
Iterate through the similar trajectories found by the similarity check module and sending
them to the iBDD module
Further analysis of the trajectory characteristics returned by the deviation detection
module
4.1.3 Similarity check module
The process of analyzing a given trajectory cannot be made solely based on the points of that
trajectory, therefore, a history of user trajectories which can be considered similar to the one
currently analyzed is needed, in order to compare it with the history of movement patterns
of that user. The similarity check module receives as input a bounding box, two trajectories,
an optional boolean parameter and it returns a boolean value. The bounding box is the one
calculated for the trajectory currently analyzed for wandering, which is the rst trajectory
17

given as input. The second input trajectory is the one that is compared for being similar with
the rst one. The boolean parameter represents an analysis option based on the time of the
day and if it is set on True, the analysis will also be based on the time, otherwise it will only
be based on the points in the second trajectory. For a trajectory to be similar with the one
currently analyzed, a given selected percentage of its points must be inside the bounding box,
so, the module will iterate through the points of the trajectory and count the ones placed
between the limits of the box, then calculates the ratio between the number of points placed
inside the box and the total number of points of the trajectory. If the analysis is also made
with respect to the time of the day, another similarity criteria is added, such that the second
trajectory needs to have over a selected percentage of its points in the same time interval as
the majority of the points of the rst trajectory, the one currently analyzed for wandering.
The main steps followed by this module, in order to check if two given trajectories are similar,
are:
Count the number of points from the second trajectory that are found inside the bound-
ing box
Compute the ratio between the number of points inside the box and the total number
of points in the trajectory
Compare the ratio to a chose threshold and if the ratio is larger than the threshold, the
trajectories are similar
Optional step: compare the time intervals of the two given trajectories
4.1.4 Symbolize module
Because a trajectory consists of hundreds of GPS points, analyzing each of them and compar-
isons between each other or between points from di erent trajectories may be computational
intensive and inecient, each point will be assigned to a cell in a matrix, such that, instead
of comparing the points of the trajectory individually for detecting the anomalies in the user's
movement pattern, the cells will be compared. The symbolizing module will be used for the
purpose of mapping each point to a cell and symbolizing the trajectory. A symbolized trajec-
tory is a set of cells assigned to the points in the trajectory, the nal result being a traversal of
matrix cells and not trajectory points. It receives as inputs a trajectory point and a cell matrix
and returns a cell in the matrix, which represent the cell in which the input point belongs. An
18

element in the cell matrix is represented as a pair of coordinates, the latitude and longitude.
The cell matrix will be traversed and the input point will be compared with four points from
the matrix, cell matrix i;j, cell matrix i;j+1, cell matrix i+1;jand cell matrix i+1;j+1, for each i
and j with i2f0..height(cell matrix)-1gand j2f0..width(cell matrix)-1g. When a point is
found such that its coordinates are between the four points of the cell matrix, the pair formed
by the corresponding i and j will be returned.
There are two main steps in the process of symbolizing a given trajectory:
Iterate through the trajectory
Map each point to the cell matrix, by comparing its coordinates with the extreme
coordinates of each cell
4.1.5 iBDD module
The iBDD module is the rst module described in Q. Lin et al [7]. As proposed by the
authors, the module takes as inputs the trajectory which has to be analyzed, a cell matrix and
a set of symbolized trajectories, which is a set of trajectories mapped to the cells in the cell
matrix. In the article, the output is a label, which is a boolean value, whose value is set on true
when the trajectory has deviations from previous movement patterns, therefore it is considered
wandering, and set on false, otherwise. We chose not to output a boolean label for the module,
since the dataset was unlabeled and it is also hard to decide if a trajectory can be classi ed
as wandering or not, solely based on the set of GPS trajectories. The implemented version
calculates several characteristics regarding the trajectory, like the number of repeated cells,
the distance of the trajectory in these cells, the distance of the trajectory in cells that weren't
visited before by any other trajectory in the working set and others. These characteristics can
be collected and used to analyze an evolution of an user's behavior. The current trajectory,
received as an input is traversed, point by point and symbolized by mapping each of the GPS
coordinates to a cell in the cell matrix. The result of this step is a symbolized trajectory,
which consists of a series of traversed cells. After that, the cells are traversed and given as
input, one by one to the detection module.
There are two main steps followed in the module:
Iterate through the set of similar trajectories and send them as inputs to the symbolize
19

module
Take each cell of the current analyzed trajectory and send it as input to the detection
module
4.1.6 Detection module
As explained by Q. Lin et al. [7], the detection module receives the current analyzed cell
from the symbolized trajectory and the working set, which is an auxiliary set of trajectories,
considered similar to the current one. In the article, the input cell is concatenated with the
rest of the trajectory traversed so far and then, according to the de nition, a new subset is
generated for the new trajectory, by searching if this trajectory is a sub-trajectory of any of the
trajectories in the set of similar trajectories. After that, the ratio between the length of the
subset and the length of the set of all similar trajectories is calculated and if it is lower than a
given threshold, the trajectory is considered as wandering and after that classi ed either as a
looping trajectory or a deviated one. If more than three cells are part of deviations, the module
classi es the trajectory as wandering and returns the label set on true. In our implementation,
the current cell, received as an input by the module is not concatenated with the trajectory
that was traversed so far, and, instead it is searched in the symbolized trajectories from the
set of similar trajectories and every time it is found a counter, initially set on 0 will increment
its value, then we compute the ratio between the number of times the cell was found in the
the set of similar trajectories and the length of this set, meaning the number of symbolized
trajectories found in it. If this ratio is smaller than a given threshold, the cell is considered as
a deviation from the users usual movement pattern and the cell is added to a list of deviating
cells. The looping cells are counted before calculating the ratio, by searching the current cell
in the trajectory that was traversed so far. If the cell is found in the symbolized trajectory, it
is counted as a looping cell and added to a separated list.
Identifying and computing the characteristics of a given trajectory are made in the detection
module, by following the next steps:
Take the current cell traversed in the trajectory and count how many times it appears
in the similar trajectories
Check if the cell appears in the symbolized trajectory traversed so far and if so, a loop
is found
20

Compute the ratio between the number of times the cell appears in the similar trajec-
tories and the length of the set of similar trajectories
If the ratio is smaller than a chosen threshold, the cell is considered as part of a deviating
trajectory
4.2 Indoor version
The indoor version is based on the implementation of the outdoor one, but has several modi -
cations. The rst di erence is made by the way how the bounding box is xed. In the outdoor
version it was calculated based on the extremities of the current trajectory, meanwhile, in the
indoor version it is set between 0 and the limits of the house. Also, coordinates will not be
used anymore and instead all the computations will be made in meters. Therefore, the cell
matrix will also be formed by pair of points which will represent the positions, in meters of a
series of points, where the point in the bottom left will be considered the origin of the system.
Additionally to the outdoor version, a new le will be used by the analysis module. This le
will contain on the rst line four values, which represent the limits of the house. In the le,
there are also mentioned the rooms of the house and their bounds. For an easier approach, a
room appears more times in the le and each apparition is a rectangle, represented by its four
corners in a bidimensional space, instead of having a single apparition and being represented
as a polygon with more than four edges, that would have made the process of mapping a
point over a room more dicult.
4.2.1 Room mapping module
A new module is added, so beside the mapping of a point to a cell from the cell matrix, the
points will also be mapped over the rooms of the house. The inputs of the module are a point
of the trajectory and a set which contains the set of rectangles for each room. An iteration
through every room is made, followed by one through every rectangle of each room. The
search stops when the rectangle that contains the input point between its limits is found, and
then, the room corresponding to the newly found rectangle is returned by the module. So,
in addition to the cell symbolization, a room symbolization will be used in order to detect
anomalies in the user's behavior.
21

Another di erence between the two versions is found in the module that checks the similarities
between two trajectories. Di erent approaches were tested for this module in the outdoor
version. Initially, all trajectories were considered similar, since the space of the house isn't
very large and taking a single trajectory and expanding its bounds can not be as relevant as
in the outdoor version, where trajectories could have signi cant di erences, being made in
di erent regions of the city. Before reaching a nal version, more methods were tested for this
module. The nal version has more criteria for the similarity between two trajectories. The
rooms where the trajectory begins and ends are considered, alongside the series of traversed
rooms and cells, which, for a trajectory we want to check, have to consist a given percentage
of the rooms or cells in the trajectory which represents the focus of our analysis.
Therefore, the main di erences between the two versions are:
For the indoor version, trajectories in the dataset are only represented as a point in a
bidimensional space
The computations are made in meters in the indoor version
The bounding box is no longer used for the indoor version
A new module is added, which has the purpose of mapping a point to a room in the
house
The similarity check module can be set up to search for similar sequences of room
traversals
4.3 Work
ow
In the next section there will be a detailed explanation of how the above modules are connected
and how a complete run of the assembled program would look like from the data preprocessing
part to the complete analysis of a trajectory. The Figure 3 shows the work
ow diagram.
As observed in the Figure 3, the rst module, preprocess, seen in the second column has the
purpose of converting the data from the initial dataset into something that is easier to use
by the further modules. Each of the database contains more les and also, each le contains
a set of GPS points which can determine one or more trajectories. Since the next modules
need separated trajectories, the preprocess module takes each user's database entry and every
le it stores and separates the trajectories, such that every di erent trajectory will be placed
22

Figure 3: The work
ow followed for analyzing a trajectory
in a le. The result after applying the module will be a new database which will contain the
newly created trajectory les, separated for each user. The names of the les for each user are
numbers from one to the total number of les in the database. The preprocessing part is only
used for the outdoor version, which uses the dataset made available by Microsoft Research
Asia [2, 3, 4].
After all trajectories are separated in di erent les, the next module, analysis, can begin. The
module takes an user and goes through all of the les in the database corresponding to the
selected user. All the trajectories are traversed and the noisy GPS points are eliminated from
them, since a noisy point could be easily mistaken with a deviating point of the trajectory.
After all the trajectories have their noisy points removed, an input parameter of the module
will select the trajectory which will be analyzed. After this trajectory is selected, as described
above, the module will calculate a bounding box using the extreme points from the trajectory,
then based on this bounding box and on an input parameter, it will construct a cell matrix, in
which each cell will have the size received by the parameter. After the bounding box and the
cell matrix are computed, all the trajectories are again iterated, in order to search the ones
similar with given input trajectory. For each trajectory of the speci ed user, the module calls
the similarity check method and records the similar trajectories in a list, for further use, in the
next deviation detection modules.
Two trajectories and a bounding box are received by the similarity check module from the
previous module, the analysis one. The rst one of them is the one currently analyzed trajec-
23

tory and the second one being a trajectory that will be compared with the rst one and with
the bounding box taken as parameter. If the second trajectory is considered similar to the
rst one, the previous module will add it to a list of similar trajectories which will form the set
of similar trajectories, used for the next module. All the relevant user history with respect to
the current trajectory will be kept in this set of trajectories and sent as an input to the iBDD
module.
The iBDD module receives as parameters from the analysis module the selected trajectory,
the cell matrix and the previously computed working set. The trajectory is traversed and each
of its points is given as an input to the symbolize module, alongside with the cell matrix.
After the iteration through the points of the trajectory is done, the symbolized trajectory will
be formed and traversed, cell by cell and used as input for the nal module, detection.
Each cell of the symbolized trajectory is compared with the symbolized trajectories from the
set of similar trajectories. The detection module has the purpose of nding looping and de-
viating cells in the trajectory and collect them into two separated lists. Using these lists, the
analysis module will then calculate the distances between all the points that are part of a
looping or deviating cell.
24

5 IMPLEMENTATION DETAILS
5.1 Module implementation
5.1.1 Data preprocess
The rst problem that appeared was the way data was represented and nding an easier
method of storing it and facilitating its further use. The outdoor dataset, provided by Microsoft
Research Asia [2, 3, 4] contains a user database, where each user have a set of trajectories,
stored in more les. Also, the database contains a label le which stores the date, starting and
ending times, but also the transportation mode for each trajectory of that user. The purpose
of the data preprocess module is to create a new database, containing each of the existing
users, where every one of them will have its own set of separate trajectories, delimited in a
single le each. As a nal result, each line from a le consists of the GPS coordinates of the
geographical point, the date and the time when that point was recorded. After the last point
in the trajectory the transportation mode will be appended at the end of the le.
The rst approach I have tried to implement was iterating through each line from the label
le, storing the date, the starting time, the ending time and the method of transportation
of the trajectory, then iterate through all the user's trajectories and search for the one that
starts and ends at the timestamps previously stored. This approach was faulty because the
starting and ending times in the labels le weren't exactly accurate for the given trajectory
and that could have lead to missing an important part of the points from that trajectory. For
the second approach I took into account that most of the GPS points in the trajectory are
recorded at a high sampling rate and decided to iterate through each le and search when
a point is recorded with a time di erence larger than a given value from the previous point
in the le. Therefore, if this time di erence is larger than the selected value, the point will
represent the beginning of a new trajectory and the previous one will be the ending point in
the last trajectory.
Multiple transportation methods were included in the dataset, but the relevant one for our
25

purpose is walking. The next step was iterating again through the labels le and then, for
each trajectory of the user it is checked if the time interval measured between the starting
and ending times in the label le is inside the time interval measured between the starting
and ending times of the trajectory which is currently veri ed. When a match is found, the
transportation method is added to the trajectory.
For storing the data inside the script, I used a dictionary for the user, where the key is the
indicator in the database corresponding to a given user and the value of a key is also a
dictionary which contains a list of trajectories, labels, starting and ending times and dates.
The result after running the script will be a new user database, containing a list of les for
each user, which represent di erent trajectories recorded from that user.
5.1.2 Analysis
The analysis module is the core of the project and has several functionalities, like reading
data, eliminating noisy and useless data, computing boundaries for trajectories and computing
several characteristics about a given trajectory.
The rst step taken in this module is reading all the trajectory les for the selected user. A
trajectory will be stored as a list of strings, where a string contains the GPS points and the
timestamps for each recorded point. These lists will be stored in a dictionary, where the key
is the name of the le, also representing the name of the trajectory and the value is the list
which represents the trajectory corresponding to that le.
After the trajectories are read and stored, the next step is eliminating the noisy points. In
this step, a new trajectory is initialized with the rst element in the initial trajectory. The
initial trajectory is then iterated from the second point and the distance between the current
point and the last point of the new trajectory is compared with a given threshold value. If the
distance is larger than the threshold value, then the point is not added to the new trajectory
and the iteration continues and if the distance is lower than the threshold, the point is added
to the new trajectory. Algorithm 1 shows how the process of eliminating noisy points works
for a single trajectory.
26

Algorithm 1 Noisy points elimination
Input:traj = [p1;p2;:::;p n]- a given trajectory of the user
Output:newtraj – the trajectory without noisy points
newtraj p1
forpoint intraj[p2;:::;p n]do
ifdistance(point ,newtraj[1])< then
newtraj newtraj[point
end if
end for
After the trajectories without the noisy points are calculated and stored, the initial bounding
box is calculated based on the input trajectory. The points from the trajectory are traversed
and the most extreme latitudes and longitudes are searched. In order to compute the ex-
tended bounding box, which will be used for searching the similar trajectories, two di erences
are computed, one between the two most extreme latitudes and one between the most ex-
treme longitudes. These di erences are then multiplied with a given value and then the value
resulted from the latitude is added to the east and west, expanding the box in width and the
value resulted from the longitude is added to the north and south, therefore the box will be
expanded in height. The expanded bounding box and the initial one will have the same center,
since its dimension is equally increased in both directions for latitude and both directions for
longitude. The bounding box is stored as a list with four values.
In Algorithm 2 is shown how the process of computing the bounding box for a given trajectory
works.
27

Algorithm 2 Compute bounding box
Input:traj = [p1;p2;:::;p n]- a given trajectory of the user
Output:bbox – the computed bounding box
bbox []
minlat;minlon 1
maxlat;maxlon 1
forpoint intraj do
minlat min(point:lat ,min lat)
minlon min(point:lon ,min lon)
maxlat max(point:lat ,max lat)
maxlon max(point:lon ,max lon)
end for
bbox [minlat;minlon;maxlat;maxlon]
The next step is computing the cell matrix, which will be used for the symbolization process.
This matrix is constructed based on the expanded bounding box and an input parameter which
gives the size of a cell. All the cells are equal-sized squares. The starting point of the cell
matrix is the south-west corner of the extended bounding box. The cell matrix is stored as
a list of lists, where each list represents a row. The row elements are pairs of latitude and
longitude coordinates for the limits of a cell. Beginning from the starting point mentioned
above, each row will be constructed by adding the cell size to the latitude of the last point,
until the latitude limit of the bounding box is reached. After this, the longitude value is
increased with the cell size and the latitude value is reseted to the value fo the rst point.
This process is repeated until all the rows are added to the matrix. A potential problem may
appear within the execution of this part if while computing the cell matrix, the extreme points
of the earth would be reached. In this case, the algorithm won't provide the desired result.
This process is described as shown in Algorithm 3.
28

Algorithm 3 Compute cell matrix
Input:bbox= [minlat;minlon;maxlat;maxlon],cellsize
Output:cellmatrix
cellsize []
lon minlon
whilelonmaxlondo
line []
lat minlat
whilelatmaxlatdo
line line[lat
lat lat+ (cellsize=earth radius )(180=pi)
end while
lon lon+ (distance=earth radius )(180=pi)=cos(lat=180)
cellmatrix cellmatrix[line
end while
After computing the bounding box and the cell matrix, the module iterates through the
trajectories and sends them as inputs to the similarity check module in order to nd which of
them will be used as further in the iBDD and detection modules. The detection module will
compute several characteristics relevant to the trajectory analyzed, like the distance of the
deviations, the distance of the looping parts of it and the number of deviating and looping
cells.
For computing the lengths of the deviating portions of the trajectory, we used three lists. The
rst one contains more lists, representing each deviating region of a trajectory. The second
one is used to store the current points in a deviating trajectory and will be reseted to the
empty list when the current deviating part ends. The third list will store the distances of each
of the deviating regions. The process starts with initializing all these lists with the empty list,
then the analyzed trajectory will be iterated and the points mapped to the cell matrix. If the
cell is one of the cells that after the detection module resulted as deviated, the point will be
appended to the second list. When a cell that isn't part of the deviating cells is found, the
current deviating trajectory will be appended to the rst list and then it will be reseted to
the empty list. Computing the distance of such a sub region of a given trajectory is made by
29

traversing the list from the second element and adding the distance between it and the next
point, until all the trajectory is traversed. For the outdoor, version, the distance between two
points is traversed with the haversine formula, shown in (3) and which can be found in the
haversine python library.
2rs
sin2(12
2) + cos(1)cos(2)sin2(12
2) (3)
where:
ris the radius of Earth
1,2are the latitudes of the two points in radians
1,2are the longitudes of the two points in radians
Algorithm 4 shows how the distance of di erent deviating portions of the trajectory is com-
puted. For the looping trajectories, the process is similar, but an additional check is made,
verifying for if the rst pass through a cell that is considered as looping was made, therefore,
only after the second pass the distance will be computed.
Algorithm 4 Compute the deviating portions
Input:traj = [p1;p2;:::;p n]- given trajectory
devcells – list of all deviating cells
cellmatrix – the cell matrix
Output:devregions – list of all deviating regions
devregions;current dev []
forpoint intraj do
cell map(point ,cellmatrix )
ifcell indevcells then
currentdev currentdev[point
else
ifcurrentdev notempty then
devcells devcells[currentdev
currentdev []
end if
end if
end for
30

5.1.3 Similarity check
As stated in the Section 4, the module's purpose is to check the similarity between two
trajectories received as input. The analysis module iterates through all the trajectories and
sends them as inputs to this module, alongside with the bounding box of the rst trajectory,
which is the one currently analyzed, so the rst parameter of the function will always be the
rst trajectory, represented as a list, and the bounding box will be the one computed for this
trajectory. The second parameter will be di erent each time the function is called, being given
by iterating through all the trajectories.
In this module, the trajectory is traversed point by point and it counts every point that is
placed inside the bounding box, by checking if its coordinates are between the limits of the
bounding box. The ratio between this number and the length of the trajectory (the number
of points) is compared to a chosen threshold and if the ratio is larger than the threshold, the
trajectories will be considered similar.
All the inputs are represented as lists. The trajectories have the same format, being lists
of strings and each of the strings stores all the information about a point. When only the
coordinates are needed, the string is split and the coordinates are converted to
oat values.
The bounding box is a list of 4 values, representing the limits of its coordinates.
Algorithm 5 shows how the process of checking the similarity between two trajectories works.
Algorithm 5 Trajectory similarity check
Input:bbox= [minlat;minlon;maxlat;maxlon],traj1,traj2
Output: a boolean value
nr 0
forpoint intraj2do
ifminlatpoint:latmaxlatandminlonpoint:lonmaxlonthen
nr nr+ 1
end if
end for
ifnr=len (traj2)>then
returnTrue
end if
returnFalse
31

5.1.4 Symbolize
This module uses two separate functions to symbolize a trajectory and map its points to the
cell matrix. The rst function takes as input a cell matrix and a point from a trajectory and
returns the indices of the cell the point is placed in. The point is a string which contains
the placement and timestamps where it was recorded. The cell matrix is a list containing the
rows of the matrix, which are also lists formed by pairs of
oat values representing latitudes
and longitudes.
Finding the cell the point is placed in is done by traversing the matrix and checking if the
point is placed between the current point of the cell matrix and its three neighbors. If the
(i;j)are the current indices of the traversal, the point which have to be symbolized is place
between the points in the cell matrix with at (i;j+ 1) ,(i+ 1;j)and(i+ 1;j+ 1) . The
search stops when the corners of the cell matrix where the point is placed between are found
and the (i;j)pair will be returned.
Algorithmalg:mapp shows the process of mapping a point to a cell in the cell matrix. For
symbolizing a whole trajectory, it is traversed, mapping each point from it and appending the
resulting cells in a list. If the cell will be the same with the last element in the list, it will be
ignored.
Algorithm 6 Mapping a point to the cell matrix
Input:cellmatrix ,point
Output:mappedpoint
numrows length(cellmatrix )
foriincellmatrix [0::numrows1]do
row cellmatrix [i]
numcols length(row)
forjinrow[0:::numcols1]do
ifcellmatrix [i][j]:latpoint:latcellmatrix [i][j+ 1]:lat
andcellmatrix [i][j]:lonpoint:loncellmatrix [i+ 1][j+ 1]:lon then
return (i,j)
end if
end for
end for
32

5.1.5 Deviation detection
The deviation detection module is the one that stores the deviating characteristics of a tra-
jectory. It counts the number of deviating and looping cells and stores them into separate
lists. A deviating trajectory is identi ed if the current analyzed cell is already part of the
list of already visited cells and a deviating cell is one where the ratio of similar trajectories
containing it is smaller than a chosen threshold.
The process starts by initializing two lists with the empty list. These lists will keep the de-
viating, respectively looping cells and will be declared as global variables for the sake of not
being added as function parameters and also outputs. The function's arguments are the sym-
bolized trajectory traversed so far, the cell of the current point in the trajectory and the set
of similar symbolized trajectories. The symbolized trajectory is represented as a list of pairs,
where each pair consists of the the row and column indices of the cell matrix. The set of
similar symbolized trajectories is a list of lists, where each one of them has the same for as
the current symbolized trajectory. After initializations, the current cell is appended to the
trajectory traversed so far and a new list is initialized with the empty list. This list will contain
all the trajectories from the similar trajectories that contain the current cell.
After that, a for loop will be used to iterate through the elements from the list of similar
trajectories and a check will be made to see if the cell is in each trajectory. Whenever the cell
is found in a similar trajectory, a counter initialized with 0 will increase its value. The rst case
of wandering was de ned as a deviating trajectory and for identifying it we compare the ratio
between the counter and the number of similar trajectories with a chosen threshold value. If
this ratio is smaller, then the cell is part of a deviating region of the trajectory. The second
de nition implies that a looping trajectory can be a sign of wandering behavior, therefore, a
search is made for such looping regions. So, if the cell is already part of the trajectory that
was traversed so far, it will be considered as a looping cell and will be appended to the list.
Algorithm 7 shows the process of nding the deviating and looping cells from a symbolized
trajectory, when the set of similar trajectories and the trajectory traversed so far are given. 
is a chosen threshold value.
33

Algorithm 7 Cell deviation detection
Input:st- symbolized trajectory traversed so far
cell – current cell in the traversal
TG- set of similar trajectories
Output:deviatingcells
st st[cell
Tsupp []
fortraj inTGdo
ifcell intraj then
Tsupp Tsupp[cell
end if
end for
ifcell inst[0:::len (st)1]or len(Tsupp)/len(TG)then
deviatingcells deviatingcells[cell
end if
returndeviatingcells
5.2 Indoor version
The main di erence between the indoor and the outdoor versions, is given by the usage of a
room mapping module in the case of the indoor version.
The implementation of the module starts by reading an additional le, speci c to the indoor
version, which contains the plan of the house. Each room has a number assigned and is
represented by one or more rectangles that can have equal or di erent sizes. Inside the
program each room will be represented as a key in a dictionary, where its value will be a list
containing all the di erent rectangles that compose that room.
5.2.1 Room mapping
Mapping a point to a room is similar to mapping it to a cell in the cell matrix. Each list from
the dictionary is traversed and the point's coordinates are compared with the corners of the
rectangles from the lists. When the list that has the point inside one of its triangles is found,
34

the key of the dictionary, representing the room will be returned.
5.2.2 Similarity check
Since the house can't be large, the trajectories will also be limited, so compared to the
outdoor version, considering a simpler implementation, all trajectories can be considered as
similar, such that no additional computations will be made. Although, the rooms traversed
are available, so two trajectories could be considered similar if they have the same origin and
destination or if they have a similar subset of visited rooms or if the sequence of rooms a
trajectory traverses is a subsequence of the other.
5.3 Implementation problems and solutions
The rst problem appeared during the implementation of the project was in the similarity check
module, where we used an inecient method for deciding if two trajectories are are similar.
This method was iterating through the rst trajectory, point by point and was calculating the
distance between the point and each point of the second trajectory. When a distance larger
than a selected threshold was found, the iteration was stopped and the trajectory similarity
hypothesis was rejected. If no distance was larger than the threshold, then the trajectories
were considered similar. For some user datasets, this method could take more than half an
hour to nish the check. The current version uses the bounding box method which is much
more ecient. The similarity check is made only by iterating each trajectory once.
Another problem which appeared in the early stages was in
icted by the presence of noisy
points. Since the similarity of two trajectories is in
uenced by the bounding box, a point place
outside of it will not be mapped to the cell matrix, because the matrix has the same limits with
the bounding box. This was a problem because the mapping function was returning a None
value which was further compared to other values and was leading to errors. Eliminating noisy
points from the trajectories and increasing the percentage of points that have to be placed
in the bounding box such that two trajectories are similar were the steps taken to solve the
problem.
35

5.4 Programming languages and technologies used
The main programming language used was Python, mostly because it's simple to use, intuitive
to read and write and o ers a lot of advantages, being easy to make plots, use lists and
dictionaries. Several additional libraries and technologies were used in order to solve speci c
tasks.
haversine library: used for computing the distances between two points represented by
their GPS coordinates
geopy library: also used for calculating the distance between two points (slower, but
more accurate than haversine)
plotly library: used for creating interactive plots
time library: used for computing time di erences between the timestamps of two points
ipywidgets library + jupyter notebook: used for implementing a graphical interface for
an easier visualization
For creating the linear model based on the evaluation function of a trajectory we used the R
language, because of the simplicity it provides for tasks like this and for creating diagnostics
plots. We also used it for di erent other 3D plots.
36

6 EVALUARE
The implemented version is based on the method proposed by Q. Lin et al. [7] and presents
more features like an indoor version and the analysis of the characteristics of the sub regions
that form a deviating trajectory.
6.1 Outdoor version
For evaluating the outdoor version, we used the same dataset used by Q. Lin et al. [7], which
is a dataset made public by Microsoft Research Asia [2, 3, 4]. This dataset contains sets of
trajectories for 183 users. A smaller version of the dataset, containing data for 31 users has
been used, since for this version the trajectories were labeled with the transportation method,
so it was easier to select the one where the user is walking.
6.1.1 Area Under ROC Curve metric
The evaluation metric used in Disorientation detection by mining GPS trajectories for cognitively-
impaired elder [7] is the Area Under ROC Curve. The value of the area can take values between
0 and 1 and the closer the value is to 1, the better the performance is.
Even if our proposed method is based on the one proposed by Q. Lin et al. [7], our output is
not a label indicating if the trajectory is deviating or not, since we considered that such an
output is not enough for being able to evaluate an user's behavior, so we can't compare the
results with the above mentioned metric.
6.1.2 Complexity analysis
Complexity wise, our method and the iBDD method [7] are similar, the complexity being
given by the number of cells in the currently analyzed trajectory multiplied by the number of
trajectories in the set of similar trajectories. For testing the execution time, we iterated more
37

an more times through the set of similar trajectories to see how much the length of this set
a ects the computation time.
Figure 4 shows the execution time of the algorithm vor di erent sizes of the trajectory set. As
observed, the execution time doesn't increase semni catively when the number of trajectories
in the set is increased.
Figure 4: CPU times in
uenced by the size of the trajectory set
Compared to the performance shown in Figure 5 [7], our method is almost six time faster than
the one proposed by Q. Lin et al. [7]. This di erence in speed of execution can be explained
in the process of symbolization, where our method doesn't store all the mapped cells, keeping
a cell only if the last cell stored is di erent from the one resulted in the current step, instead
of storing all the cells and marking the ones that are new. Because of this the traversals
can be done much faster, since the method will iterate by a number of times equal with the
number of cells, not points in the trajectory. Also, the process of checking the similarity of
two trajectories is very simple and fast, needing only a traversal of each of the trajectories in
the set.
Practically, in our implementation the most time consuming part will be the process of mapping
itself, since all the trajectories are traversed and for each point in the trajectory, a search in
the cell matrix will take place.
38

Figure 5: CPU times in
uenced by the size of the trajectory set
6.1.3 Trajectory visualization
One of the main features of our implemented method is the possibility of visualizing a trajectory
and the sub regions that are considered as deviating or looping. In Figure 6 the result of the
algorithm can be observed. The grid represents the cell matrix calculated by the analysis
module, while the boundaries of the plot are the same with the limits of the bounding box,
which was also computed in the analysis method. The trajectories colored in gray represent
the set of similar trajectories (the history of trajectories for the user), while the blue one is
the trajectory currently analyzed. The red subsection represents the deviating part and, as
seen in the gure, only a few trajectories contain the cells that the red section traverses. The
threshold for the ratio between the trajectories that have to contain the analyzed cell and
all the similar trajectories was set up to 0.25 when the experiment shown in the Figure 6 was
made. The dark orange section of the trajectory indicates a possible looping and as shown,
the trajectory leaves a cell then returns back to it again. Since the looping trajectory is veri ed
by searching the current cell in the set of already traversed cell, if we want to represent a
more accurate circular region, we can search the current cell in the set of traversed cell, but
without the last one or two cells. Doing this would eliminate the orange portion from the
category of looping regions. Also, if we want to make sure that a complete circle is made, we
can eliminate the last four cells from the trajectory traversed so far.
39

Figure 6: Outdoor trajectory visualization
6.1.4 Constructing the linear model
Since the implemented method is supposed to o er the analysis of the evolution for an user,
a linear model will be used to approximate this evolution, for an easy visualization. Several
characteristics of the a set of trajectories will be used as inputs, while a function, depending
on these characteristics will be a score. The way user's scores evolve over time can o er a
view of how their movement behavior has evolved.
For the experimental setup, we have chosen a random user and a trajectory belonging to
that user. After that, the chosen trajectory is analyzed and we obtain the the set of similar
trajectories. For this experiment, we also compared the distances of the two distances for
checking the similarity between them. We store the set of similar trajectories and use the rst
100 of them as a history, without analyzing them, and the next 100 will be analyzed using all
the trajectories appearing before them in the set of similar trajectories. The trajectories were
ordered by the timestamp they are starting at, such that a realistic analysis can be made.
For the rst experiment, the features used as inputs for the linear model were the distance
40

of deviating regions, the number of deviating regions, the distance of looping regions, the
number of looping regions, the total distance of the trajectory and the average degree (ratio
between the number similar trajectories that contain a cell and the total number of similar
trajectories), and the formula used was:
(devlendevnum +looplenloopnum +totdist0:1)avgdeg2=1000 (4)
where the terms are described above, in the same order.
In the Figure 7 the residuals versus tted plot for the linear model used. This plot is used as
a visual indicator for nding out if there are non-linear relationships between the residuals. A
horizontal line indicates that such relationship doesn't exist, however, the line is not horizontal,
so the residuals may be spread after a non-linear pattern, indicating that the model may not
be good.
Figure 7: First linear model diagnosis (a)
The second diagnosis plot for the linear model is the residuals versus leverage plot, shown in
Figure 8, used for showing if there are points that may have an in
uence over the regression
line. The points observed in the upper right corner and lower right corner are in
uential
points. For a good model, the Cook's distance line is barely visible in the plot.
41

Figure 8: First linear model diagnosis (b)
The second function used is described in the formula below:
(devlendevnum +looplenloopnum)=(devnum +loopnum + 1) (5)
As seen in Figure 9 the residuals still appear to be spread after a non-linear relationship, but
in Figure 10 we can observe that there are no in
uential cases against the regression line.
So, we considered that the described function for the cost may be a more adequate one for
describing how the user's movement patterns are evolving. Figure 11 shows how the score
evolves in
uenced by the the percentage of the looping and deviating regions of the trajectory.
As we can see, a higher percentage of both means a higher score. In the Formula 5 we have
used the number of loops and number of deviations for a tentative try to ponder the formula
into considering the type of wandering that was predominant.
42

Figure 9: Second linear model diagnosis (a)
Figure 10: Second linear model diagnosis (b)
43

Figure 11: Score in
uenced by the percentage of deviating and looping regions
After the model was created and evaluated, we used the obtained scores to plot the trend
of the user behavior, as shown in Figure 12. The plot shows a slowly ascending trend in the
scores.
Even though the analyzed user had a set of more than 2000 trajectories, analyzing each of
them was time consuming, so only 100 trajectories have been analyzed and labeled with a
score.
44

Figure 12: Score trend for the selected user
6.2 Indoor version
6.2.1 Indoor trajectory visualization
Visualizing the indoor data uses the same approach with the indoor version and can be seen in
the Figure 13. The colors have the same meaning as those explained for the indoor version. As
observed in the gure, a looping region of the trajectory may not be so relevant for evaluating
user's behavior, since many paths can contain as beginning and ending point the same room,
so these paths can have the same set of traversed cell for the two parts of the trajectory
(leaving the room and going to another place and coming back to the room).
45

Figure 13: Trajectory visualization for the indoor version
6.2.2 Evaluating user's behavior
We used the same score function from the outdoor version and tested what the tendency is
from our set of trajectories. Figure 14 shows how the score is in
uenced by the percentage of
the looping and deviating regions of the trajectory. We can observe that in these trajectories
there aren't many deviating regions and most of them are loops, since depending on the plan
of the house, a daily trajectory can usually reach all the places.
46

Figure 14: Scores in
uenced by deviation and looping percentages
In Figure 15 the evolution of the trajectory scores are observed. The dataset was smaller
compared to the one used for the outdoor version. For a more relevant description of the
evolution, a bigger dataset may be needed.
Even though looping regions in the trajectory are correctly detected, they may not be as
relevant as they are in the outdoor version, since loops are common in an indoor trajectory,
so on the future, a more restrictive algorithm should be apply. For example all the loops for
all the rooms could be measured, or the small loops which are also part from a larger loop.
Another idea would be splitting the trajectory and analyze its characteristics for every room it
passes through. Also the score function should be updated so all these factors will in
uence
it.
47

Figure 15: User score evolution
48

7 CONCLUSIONS
Our project have the purpose of extending the state of the art methods used in wandering
detection by providing the possibility to visualize the evolution of the user, instead of an
answer to whether a trajectory they traverse can be classi ed as deviating from their normal
behavior or not.
The objectives of the project are to provide a measure for evaluating the evolution of an user
and also provide the visual representations of how their trajectories compare to the rest of the
trajectories in the history.
Our implementation was based by the implementation made by Q. Lin et al. [7] and is divided
in several modules. The detection of the deviation characteristics of a trajectory is made
based on two de nitions of wandering behavior in a person's movement pattern. The rst
one refers to looping regions in the trajectory traversed by the user and the second one refers
to the regions of the trajectory that are rarely or never visited in the history of movement
patterns for that user.
The results show how a linear model can be created based on the characteristics of the
deviating regions of the trajectory and provides a score that can represent a measure of the
user's evolution over the set of trajectories.
To conclude, the main points evidentiated in the project are:
The state of the art methods focus on deciding if a trajectory is deviated, not on
evaluating the user's evolution
The lack of datasets for wandering behavior makes the evaluation harder to interpret
It is hard to tell if a deviating trajectory or a set of deviating trajectories are an indicator
for showing cognitive degradation or dementia
A model that evaluates the user's evolution can be more relevant than the result of a
single trajectory
Our implemented version proposes an approach where a model is created with the
purpose of evaluating the evolution of an user
49

The application provides a visual representation of a user's trajectory compared with
the history of similar trajectories
For the future work, the project can be extended by:
Simulate wandering behavior in the datasets already used
Add semantic places in the trajectories and explain the user's activity based on them
Try to explain when activities take place by using the timestamps of the points
Improve the indoor version and create statistics for the trajectory split for every room
50

BIBLIOGRAPHY
[1] M. r. hodges, k. kirsch, m. w. newman, and m. e. pollack. automatic assessment of
cognitive impairment through electronic observation of object usage. in pervasive, pages
192{209, 2010.
[2] Yu zheng, like liu, longhao wang, xing xie. learning transportation modes from raw gps
data for geographic application on the web, in proceedings of international conference on
world wild web (www 2008), beijing, china. acm press: 247-256.
[3] Yu zheng, quannan li, yukun chen, xing xie. understanding mobility based on gps data.
in proceedings of acm conference on ubiquitous computing (ubicomp 2008), seoul, korea.
acm press: 312{321.
[4] Yu zheng, yukun chen, quannan li, xing xie, wei-ying ma. understanding transportation
modes based on gps data for web applications. acm transaction on the web. volume 4,
issue 1, january, 2010. pp. 1-36.
[5] Damla Arifoglu and Hamid Bouchachia. Activity recognition and abnormal behaviour
detection with recurrent neural networks. 110:86{93, 12 2017.
[6] Kyunghyun Cho, Bart van Merri enboer, Dzmitry Bahdanau, and Y Bengio. On the prop-
erties of neural machine translation: Encoder-decoder approaches. 09 2014.
[7] Qiang Lin, Daqing Zhang, Kay Connelly, Hongbo Ni, Zhiwen Yu, and Xingshe Zhou.
Disorientation detection by mining gps trajectories for cognitively-impaired elders. 19, 01
2014.
[8] Tim van Kasteren, G B. Englebienne, and B Krose. Human activity recognition from
wireless sensor network data: Benchmark and software, 05 2011.
51

ANEXE
Anexele sunt opt ,ionale. Ce poate intra ^ n anexe:
Exemplu de s ,ier de con gurare sau compilare;
Un tabel mai mare de o jum atate pagin a;
O gura mai mare mai mare de jum atate pagin a;
O secvent , a de cod sursa mai mare de jum atate pagin a;
Un set de capturi de ecran (\screenshot"-uri);
Un exemplu de rulare a unor comenzi plus rezultatul (\output"-ul) acestora;
^In anexe intr a lucruri care ocup a mai mult de o pagin a ce ar ^ ntrerupe rul natural de
parcurgere al textului.
52

A EXTRASE DE COD
. . .
53

Similar Posts