Int. J. of Information and Decision Sciences, Vol. x, No. x, 201X
Query optimization in real-time spatial Big Data
Abstract: Nowadays, real-time spatial applications have become more and more
important. Such applications result in dynamic environments where data as well as
queries are continuously moving. As a result, there is a tremendous amount of
real-time spatial data generated every day. The growth of the data volume seems
to outpace the advance of our computing infrastructure, especially as users
expect to receive the results of each query within a short time period without
taking into account the load of the system. To solve this problem, several
optimization techniques are used. Thus, we propose, as a first contribution, a
novel data partitioning approach for real-time spatial Big data named VPA-
RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big data). This
contribution is an implementation of the Matching algorithm for traditional
vertical partitioning. Then, as a second contribution, we propose a new frequent
itemset mining approach which relaxes the notion of window size and proposes
a new algorithm named PrePost*-RTSBD. Thereafter, a simulation study is
shown to prove that our contributions can achieve a significant performance
improvement.
Keywords: Real-Time spatial Data; Transaction; Stream data; Feedback Control
Scheduling; Quality of Service; Data partitioning; Frequent itemset mining;
Simulation.
Copyright © 201X Inderscience Enterprises Ltd.

1 Introduction
In recent years, real-time spatial applications, like location-aware services and traffic
monitoring, have become more and more important. Such applications result in dynamic
environments where data as well as queries are continuously moving. As a result, there is
a tremendous amount of real-time spatial data generated every day. The growth of the data
volume seems to outpace the advance of our computing infrastructure. For instance, in
real-time spatial Big Data, users expect to receive the results of each query within a short
time period without taking into account the load of the system. But with a huge amount
of real-time spatial data generated, the system performance degrades rapidly especially in
overload situations. To solve this problem, we propose the use of data partitioning and
frequent itemset mining approaches as optimization techniques.
As a first contribution, we propose a novel data partitioning approach for real-time
spatial Big data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial
Big data). This contribution is an implementation of the Matching algorithm for traditional
vertical partitioning. We first find the optimal attribute sequence by the use of the Matching
algorithm. Then, we propose a new cost model used for database partitioning, for keeping
the data amount of each partition balanced and for providing parallel
execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-
time partitioning scheme and deals with stream data. It improves the performance of query
execution by maximizing the degree of parallel execution. This affects QoS (Quality Of
Service) improvement in real-time spatial Big Data especially with a huge volume of
stream data. The performance of our contribution is evaluated via simulation experiments.
The results show that the proposed algorithm is both efficient and scalable, and that it
outperforms comparable algorithms.
In real-time spatial Big Data, sliding window model and frequent itemset mining over
dynamic data are the most important problems in the context of data mining. Thus, several
mining methods have been proposed in the literature where sliding window model for
frequent itemset mining is a widely used model for data stream mining due to its emphasis
on recent data and its bounded memory requirement. These methods use the traditional
transaction-based sliding window model. This model keeps a fixed-size window based on
the transaction numbers because it assumes that the transactions arrive at a constant rate,
which is not suitable for real-world applications. Thus, the performance of this model
is degraded in terms of reflecting recent changes. Based on these observations, this paper
relaxes the notion of window size and proposes the use of a timestamp-based sliding
window model. In our proposed frequent itemset mining algorithm, support conditions are
used to differentiate frequent and infrequent patterns. Thereafter, a tree is developed to
incrementally maintain the essential information.
This paper is structured as follows: Section 2 discusses related works. Section 3
describes the real-time Spatial Big Data (RTSBD) model. Section 4 introduces the sliding
window model in RTSBD. Section 5 introduces our data partitioning approach. Then,
we present our frequent itemset mining algorithm in section 6. We present results of our
simulation experiments in Section 7. Finally, we conclude the paper and we give future
research directions.

2 Related works
In recent years, there has been a huge increase in the use of spatio-temporal applications
like location-aware services and traffic monitoring. Such applications result in dynamic
environments where data as well as queries are continuously moving. Several works
dedicated to continuous spatio-temporal query processing (e.g., see (Hadjieleftheriou et
al., 2003), (Iwerks et al., 2003) , (Kwon et al., 2002), (Lee et al., 2002) , (Mokbel et al.,
2004),(Mokbel et al., 2005), (Prabhakar et al., 2002), (Saltenis et al., 2000), (Tao et al.,
2002), (Tao et al., 2003), (Huang et al.,2013), (Bai et al., 2016)) rely mainly on the ability
of indexing and storing spatio-temporal data.
Today, the issue of real-time processing seems increasingly important. Research on real-
time spatial analysis has become a hot topic. This calls for new real-time spatio-temporal
query processing algorithms. Indeed, numerous research efforts are devoted to real-time
continuous spatio-temporal query processing. For instance, Mokbel et al., in (Mokbel et
al., 2004), proposed a scalable location-aware database server called PLACE (Pervasive
Location-Aware Computing Environments). PLACE extends data streaming management
systems to support real-time continuous query processing in spatio-temporal databases.
The PLACE continuous query processor deploys the idea of sliding-window queries from
traditional data streams and exploits a shared-execution paradigm. The main idea of this
approach is to collect similar queries in a query table. Then, these queries are evaluated
by joining moving objects and moving queries. In fact, PLACE is a good solution for
the real-time continuous query processing of spatio-temporal streams. But the amount
of data transmitted for updates is limited. Indeed, this approach does not take into
account the massive amount of heterogeneous real-time spatial data. Consequently, the
system performance degrades rapidly in overload situations.
To solve the problem associated with huge amounts of GIS data, Zhang et al., in (Zhang
et al., 2014b), proposed a real-time interactive analytics system for large spatio-temporal
data called OceanRT. OceanRT performs real-time analytics over big spatio-temporal data
from large mobile network operators. It proposes a novel computing architecture with
enhanced parallelism and aims for resource efficiency and high scalability. In order to
satisfy the real-time response time requirements, OceanRT employs remote direct memory
access (RDMA). But this only ensures a low query response time.
The authors, in (Kiran et al., 2015), proposed a Lambda architecture for cost-effective
batch and speed Big Data processing. This approach combines ideas from database
management and cloud computing. It is a data-processing architecture designed to handle
massive quantities of data by taking advantage of both batch and stream-processing
methods. This architecture allows users to optimize their responses by understanding which
parts of the data need online or batch processing. Thus, it is a good architecture to attempt
to balance latency, throughput, and fault-tolerance. But this work needs to be extended to
explore issues of data security.
Another interesting paper by Salmon et al. (Salmon et al., 2017) proposed a stream
framework for mobility analysis. They introduced the design principles of a holistic
approach combining real-time processing and archived data in real-time spatial Big Data.
In this approach, transactions arrive at varying frequencies. As the frequency increases
dramatically, the balance of these systems is jeopardized. During the periods of overload,
real-time spatial Big Data will potentially run out of resources and real-time transactions
will then miss their deadline in greater numbers.

Thus, we introduced, in (Hamdi et al., 2015), a specific management of the
heterogeneous real-time spatial data based on techniques of scheduling with feedback
control called FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time
Spatial Big Data). The main objectives of this architecture are the following: take into
account the heterogeneity of data, guarantee data freshness, reduce the deadline miss
ratio even in the presence of conflicts and, finally, satisfy the requirements of users by
improving the quality of service (QoS).
Table 1 summarizes our literature survey of key related works. The analysis of these
related works highlights their weak sides.
Table 1 Related works.

Criteria: (1) ACID, (2) online processing, (3) batch processing, (4) Big Data processing,
(5) monitoring of system resources, (6) query processing optimisation, (7) QoS enhancement.

System                  Criteria satisfied
PLACE                   2 of 7
OceanRT                 2 of 7
Lambda                  4 of 7
Salmon et al. (2017)    4 of 7
FCSA-RTSBD              6 of 7
As the fifth criterion in Table 1 shows, none of these systems controls system resources,
especially during a transient overload of an RTSBD system. Therefore, all jobs are discarded
temporarily. The main purpose of our work is precisely to find solutions to this problem.
3 System Overview
Real-time spatial Big Data (RTSBD) can be seen as a combination of Big Data, real-time
system and spatial database, as shown in figure 1. RTSBD inherits:

- the respect of temporal constraints from the real-time system;

- the management of huge amounts of heterogeneous data from different sources from
  Big Data;

- the spatial aspect of the data from the spatial database.
Applications using real-time spatial Big Data continuously receive a huge amount
of real-time spatial data from mobile objects (e.g., moving vehicles in road networks). The
streaming nature of real-time spatial data poses new challenges that require combining
real-time spatial Big Data and data stream management systems.
In this section, we give an overview of heterogeneous real-time spatial data model and
transaction model.

Figure 1 Real time spatial Big Data.
3.1 Heterogeneous real-time spatial data model
Real-time spatial data in RTSBD come from various sources. Thus, these data are maintained
under heterogeneous formats (text/video/…) and structures (structured/unstructured data).
In this context, we propose the use of an ETL (Extract-Transform-Load) process for the
integration of data. ETL proceeds as follows:

- Data extraction: extracts data from various data sources.

- Data transformation: transforms the data for storing it in the proper format or
  structure for the purposes of querying and analysis.

- Data loading: loads it into the final target.
Data in real-time spatial Big Data arrive as stream data. They are continuous,
unbounded and come at high speed. Thus, a real-time spatial data item has a deadline and
can change its location continuously. The validity of a real-time spatial data item is
given by Mokbel et al. in (Mokbel et al., 2005) as follows:

validity(d_i) = validS(d_i) × validT(d_i)    (1)

where validS(d_i) = 1 if d_i does not change its location and validS(d_i) = 0 otherwise, and
validT(d_i) = 1 if d_i has not passed its deadline (validT(d_i) = 1 if t_current < t_AVI, else
validT(d_i) = 0), where t_current denotes the current time of our system and t_AVI is defined
as follows:

t_AVI = timestamp + validity    (2)

d_i, as a real-time spatial data item, is valid when validity(d_i) = 1 (validS(d_i) = 1 and
validT(d_i) = 1).
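As an illustration, the validity test of equations (1) and (2) can be written as in the minimal Java sketch below; the class and field names (RTSpatialData, hasMoved, validityPeriod) are assumptions made for the example, not part of the original model.

```java
// Minimal sketch of the validity test of equations (1) and (2).
// Class and field names are illustrative assumptions.
public class RTSpatialData {
    long timestamp;      // time at which the value was captured
    long validityPeriod; // length of the absolute validity interval
    boolean hasMoved;    // true if the object changed its location

    // validS(d) = 1 if the data item did not change its location
    boolean validS() {
        return !hasMoved;
    }

    // validT(d) = 1 if t_current < t_AVI, with t_AVI = timestamp + validity (eq. 2)
    boolean validT(long tCurrent) {
        return tCurrent < timestamp + validityPeriod;
    }

    // validity(d) = validS(d) x validT(d) (eq. 1)
    boolean isValid(long tCurrent) {
        return validS() && validT(tCurrent);
    }
}
```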

3.2 Transaction model
Real-time spatial transactions can be classified into two classes: update transactions and
user transactions.
- Update transactions: update the location and the values of real-time spatial data in
  order to reflect the state of the real world. These transactions are periodic.

- User transactions (continuous queries): these transactions are aperiodic. They can
  be executed several times or continuously during a period, as required by the user.

In real-time streaming Big Data, as in (Safaei et al., 2016), transactions
are considered firm real-time transactions, where results provided after the deadline are
not profitable. These transactions are characterized by the ACID (Atomicity, Consistency,
Isolation, Durability) properties.
4 Sliding window model in Real-Time Spatial Big Data
Today, mining real-time spatial data streams has become a hot topic. Many applications
generate large amounts of spatial data streams in real-time, such as sensor data generated
from sensor networks, disaster management and traffic management. The streaming nature
of real-time spatial data poses new challenges, especially since these data streams are
continuous, unbounded and come at high speed. Due to these characteristics of data streams,
Chang et al., in (Chang et al., 2003), stated several requirements for data stream processing,
such as:
1. each data of the input data stream should be examined at most once.
2. memory usage in the process of mining data streams should be limited.
3. each data should be processed as early as possible.
4. newly generated data should be processed as fast as possible.
5. the outputs should be instantly available when requested.
6. the errors of outputs should be removed as much as possible.
Most of the recently proposed stream mining methods use the sliding window model. Li
et al., in (Li et al., 2014), differentiated two models of sliding window. The first model is
called the transaction-based sliding window model where the size of the sliding window is
based on the transaction numbers. The second model is called the timestamp-based sliding
window model where the sliding window covers a fixed number of timestamps rather than
a fixed number of transactions.
Traditional stream mining methods used the transaction-based sliding window model.
This model supposes that the transactions arrive at a constant frequency, which is clearly
not suited for most real world situations. Thus, the timestamp-based sliding window model
is more suitable for these situations.
Another challenge of sliding window model is determining window size. In the classic
approaches, the window size is fixed by the user. But the user has no prior knowledge
about the time and scale of changes within the data stream, especially with the
unpredictable changing nature of data streams. Thus, this model cannot reflect recent
changes and the performance of a system using this model is degraded.

Figure 2 Sliding window model.
As a solution, several works relaxed the notion of window size and proposed new
algorithms with a variable-size window, like (Deypir et al., 2012), (Chen et al., 2012) and
(Li et al., 2014). These algorithms are more suitable for observing recent changes. For

instance, in (Deypir et al., 2012), the window size is determined dynamically based on the
amount of concept change that occurs within the arriving data stream. In (Chen et al.,
2012), Chen et al. proposed a time decay model with a variable-size sliding window in
order to differentiate recent transactions from historic ones. The main problem of these two
algorithms is that the user is initially unable to fix the most adequate value for the size of
the sliding window.
Based on these observations, we propose the use of a timestamp-based sliding window
model with a variable-size window. Thus, we use the same sliding window model as in (Li et
al., 2014), as follows:

- As a first step, the sliding window covers a fixed number of timestamps, as shown in
  figure 2(a).

- As a second step, the obtained model is converted into a transaction-based sliding
  window, as shown in figure 2(b).

As shown in figure 2(a), the sliding window covers a fixed number of timestamps; we
assume that the window size is 4. The first sliding window contains 5 transactions, which
arrived in the time segment [T_0, T_4]. This model is then converted into a transaction-
based sliding window model, as shown in figure 2(b). The result is a varying-size sliding
window model.
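The two steps above can be illustrated by the following Java sketch, which keeps the transactions of the last k timestamps and exposes them as a transaction-based window of varying size; the class and method names are assumptions made for this example.

```java
import java.util.*;

// Sketch of the timestamp-based sliding window of (Li et al., 2014),
// converted into a transaction-based window. Names are illustrative.
class TimestampWindow {
    private final int windowSizeInTimestamps;                            // e.g., 4 timestamps
    private final TreeMap<Long, List<String>> byTime = new TreeMap<>();  // timestamp -> transactions

    TimestampWindow(int windowSizeInTimestamps) {
        this.windowSizeInTimestamps = windowSizeInTimestamps;
    }

    void add(long timestamp, String transaction) {
        byTime.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(transaction);
        // keep only the most recent timestamps; older ones slide out
        while (byTime.size() > windowSizeInTimestamps) {
            byTime.pollFirstEntry();
        }
    }

    // Step 2: the equivalent transaction-based window (its size varies over time).
    List<String> currentTransactions() {
        List<String> txs = new ArrayList<>();
        byTime.values().forEach(txs::addAll);
        return txs;
    }
}
```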
5 Real-time data stream partitioning over sliding window in real-time spatial
Big Data
5.1 Motivation and Context
The demand for real-time spatial data has been increasing recently. Nowadays, we are
talking about real-time spatial Big Data that processes a large amount of data (possibly
terabytes in size). As a result, the real-time spatial Big Data system can be overloaded
and many transactions may miss their deadlines because data retrieval processes are time
consuming. In order to speed up query processing, several works have proposed
optimization techniques such as data partitioning. Data partitioning is the fragmentation of a
logical database into distinct independent units (Phansalkar et al., 2016). It is applied in
large-scale databases to improve responsiveness, scalability and availability of data.
Several surveys on data partitioning algorithms classify them into horizontal and vertical
data partitioning methods:
- Horizontal partitioning (Ahirrao et al., 2015), (Curino et al., 2010), (Das et al., 2009),
  (Bernstein et al., 2011) divides a table into disjoint sets of rows. There are three
  techniques of horizontal partitioning based on the values of data sets (Round-Robin
  partition, Range partition and Hash partition). Range partitioning is the most popular
  approach, especially when new data is loaded periodically.

- Vertical partitioning (Cornell et al., 1990), (Navathe et al., 1984), (Navathe et al.,
  1989), (Shraddha et al., 2014), (Son et al., 2004), (Zhao et al., 2015) divides a table
  into vertical and disjoint sets of columns.
Both of these strategies (horizontal and vertical partitioning) have a significant impact
on the performance of database systems, especially with respect to responsiveness, storage
and processing cost. But they are still static: they can only deal with persistent and stable
workloads and they cannot partition an unknown database in real-time
spatial Big Data.
As a solution, Curino et al., in (Curino et al., 2010), proposed a new system called
Schism for database partitioning. This system builds a graph in which data records are nodes
and co-accessed records are connected by edges. This graph is then partitioned using
METIS (Karypis et al., 1995), which aims at minimizing the number of distributed
transactions. In fact, Schism has a significant impact on the performance of database
systems. But it cannot deal with a large volume of stream data or with large-scale
dynamic queries.
In (Liroz-Gistau et al., 2012), Liroz-Gistau et al. proposed a dynamic workload-
based partitioning algorithm for continuously growing databases. This algorithm uses the
affinity of newly arrived data to produce a mathematical model. This algorithm is quite
interesting for dynamic environments. But it is not able to produce a real-time result after
every query.
In (Jindal et al., 2011), Jindal et al. presented a new algorithm called O2P (One-
dimensional Online Partitioning) for partitioning online datasets. This algorithm
begins by computing the affinity between every pair of attributes (Hoffer et al., 1975),
(Navathe et al., 1984), (Comer et al., 1987), (Cornell et al., 1990). Then, it calculates the
cost of every possible split line. The best partitioning scheme aims at minimizing this
cost. The importance of this approach appears clearly. But it must know the table
structure in advance, which is not available in real-time spatial Big Data. Besides, it cannot
deal with stream queries and cannot produce a real-time result after every query.
To solve these two problems, Guo et al., in (Guo et al., 2016), proposed a
new system for stream partitioning called WSPS. WSPS can deal with stream data and
obtain a real-time partitioning scheme by using a dynamic data model. But it is used in
distributed environments where a query accesses attributes on different partitions and on
several nodes. This costs more resources, and transactions risk missing their deadlines
while waiting for validation.
5.2 A data partitioning approach for real-time Spatial Big Data
In this section we describe our contribution. We propose a novel data partitioning approach
for real-time spatial Big Data: the implementation of the Matching algorithm (Bhat et al.,
1976) for vertical partitioning. This algorithm uses the Hamming distance to produce clusters.
This approach is divided into three steps that are detailed in the following sections:

- Data model initialization;

- Implementation of the Matching algorithm;

- Data partitioning.
5.3 Data model initialization
Given a query workload W_t, which is the stream of queries seen till time t:
W_t = {q_0, q_1, q_2, ..., q_t}.

Step 1: Assuming that a query q accesses an attribute a, we begin by defining
the access function as follows:

Access(q, a) = 1 if q accesses a, 0 otherwise    (3)

Then, we define a matrix M. Rows of the matrix are the queries q_i (0 < i ≤ t) of the
workload W_t and columns are the attributes they access. Each element of the matrix is
M[i, j] = Access(q_i, a_j), where i ∈ [1, t], j ∈ [1, m] and m is the number of attributes
accessed by the t queries.
Let us consider an example. Suppose that we have five queries accessing six attributes:

q1: SELECT a FROM T WHERE a = 10;
q2: SELECT b, f FROM T WHERE b = f;
q3: SELECT c, d FROM T WHERE c ≥ d;
q4: SELECT f FROM T WHERE f ≥ 100;
q5: SELECT e FROM T;

In this case, W_t = {q1, q2, q3, q4, q5} and

         a  b  c  d  e  f
M =  q1  1  0  0  0  0  0
     q2  0  1  0  0  0  1
     q3  0  0  1  1  0  0
     q4  0  0  0  0  0  1
     q5  0  0  0  0  1  0
When the sliding window slides, some existing transactions are deleted from the
sliding window and some new transactions arrive. Thus, M is dynamically updated at every
window. If a new query accesses attributes that already exist in M, only a new row is
added at the end. If the query accesses new attributes that do not exist in M, a new row is
added at the end and new columns are added to the right of the matrix. If an existing
query is deleted from the sliding window, the row of this query and the attributes accessed
only by this query have to be deleted.
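A possible incremental maintenance of M over the sliding window is sketched below in Java; the representation (a map from each query to the set of attributes it accesses) and the names used are illustrative assumptions, not the paper's implementation.

```java
import java.util.*;

// Sketch of the dynamic maintenance of the access matrix M over the sliding window.
// M is kept as query -> set of accessed attributes; row/column order follows insertion.
class AccessMatrix {
    private final LinkedHashMap<String, Set<String>> rows = new LinkedHashMap<>(); // query -> attributes
    private final LinkedHashSet<String> attributes = new LinkedHashSet<>();        // column order

    // New query arrives in the window: add a row and any new attribute columns.
    void addQuery(String queryId, Collection<String> accessedAttributes) {
        rows.put(queryId, new LinkedHashSet<>(accessedAttributes));
        attributes.addAll(accessedAttributes);
    }

    // Query leaves the window: drop its row and the attributes accessed only by it.
    void removeQuery(String queryId) {
        Set<String> removed = rows.remove(queryId);
        if (removed == null) return;
        for (String a : removed) {
            boolean stillUsed = rows.values().stream().anyMatch(s -> s.contains(a));
            if (!stillUsed) attributes.remove(a);
        }
    }

    // M[i][j] = Access(q_i, a_j) as defined in equation (3).
    int[][] toMatrix() {
        int[][] m = new int[rows.size()][attributes.size()];
        int i = 0;
        for (Set<String> accessed : rows.values()) {
            int j = 0;
            for (String a : attributes) m[i][j++] = accessed.contains(a) ? 1 : 0;
            i++;
        }
        return m;
    }
}
```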
5.3.1 Implementation of Matching algorithm
This algorithm is developed to reorganize data and to identify clusters (Bhat et al., 1976).
We start by recalling the different steps of the Matching algorithm:

Step 1: From the t × m matrix M, compute the m × m array B = M^T M.

Step 2: Select one of the m rows of M^T M arbitrarily; set i = 1.

Step 3: Set j = i + 1.

Step 4: Try placing the j-th row in each of the (i + 1) possible positions. Compute the sum
Σ_{i=1}^{m-1} b_{i,i+1}.

Step 5: Set j = j + 1 and repeat Step 4 until j = m.

Step 6: Place the row k in the position where the maximum value of the sum is obtained,
i + 1 ≤ k ≤ m.

Step 7: Set i = i + 1 and repeat Steps 3-6 until i = m.
We use the same matrix M as in our previous example and we apply the different steps
of the Matching algorithm as follows:

B = M^T × M:

       q1 q2 q3 q4 q5             a  b  c  d  e  f              a  b  c  d  e  f
    a   1  0  0  0  0         q1  1  0  0  0  0  0          a   1  0  0  0  0  0
    b   0  1  0  0  0         q2  0  1  0  0  0  1          b   0  1  0  0  0  1
    c   0  0  1  0  0    ×    q3  0  0  1  1  0  0     =    c   0  0  1  1  0  0
    d   0  0  1  0  0         q4  0  0  0  0  0  1          d   0  0  1  1  0  0
    e   0  0  0  0  1         q5  0  0  0  0  1  0          e   0  0  0  0  1  0
    f   0  1  0  1  0                                       f   0  1  0  0  0  2

Initially, Σ_{i=1}^{5} b_{i,i+1} = 1.

The final reordering given by the application of the algorithm is:

            d  c  f  b  a  e
    B =  d  1  1  0  0  0  0
         c  1  1  0  0  0  0
         f  0  0  2  1  0  0
         b  0  0  1  1  0  0
         a  0  0  0  0  1  0
         e  0  0  0  0  0  1

with Σ_{i=1}^{5} b_{i,i+1} = 2.
The optimal attribute sequence is OaS = {d, c, f, b, a, e}. Every time a new query arrives,
the matrix M is recalculated, and the new OaS is dynamically created.
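The steps above can be summarised by the following Java sketch, which computes B = M^T M and then greedily inserts each attribute at the position that maximises the sum of consecutive bonds; it is a simplified reading of the Matching algorithm, and the helper names are assumptions.

```java
import java.util.*;

// Sketch of the Matching algorithm (Bhat and Haupt, 1976) used to derive the
// optimal attribute sequence OaS from the access matrix M. Names are illustrative.
class MatchingAlgorithm {

    // B = M^T * M : the attribute affinity matrix (m x m).
    static int[][] affinity(int[][] m) {
        int t = m.length, a = m[0].length;
        int[][] b = new int[a][a];
        for (int i = 0; i < a; i++)
            for (int j = 0; j < a; j++)
                for (int q = 0; q < t; q++) b[i][j] += m[q][i] * m[q][j];
        return b;
    }

    // Sum of bonds between consecutive attributes in the current ordering.
    static int bondSum(List<Integer> order, int[][] b) {
        int s = 0;
        for (int i = 0; i + 1 < order.size(); i++) s += b[order.get(i)][order.get(i + 1)];
        return s;
    }

    // Greedy reordering: insert each remaining attribute at the position
    // that maximises the bond sum, following steps 2-7 of the algorithm.
    static List<Integer> optimalSequence(int[][] b) {
        int mDim = b.length;
        List<Integer> order = new ArrayList<>();
        order.add(0);                                   // step 2: pick one row arbitrarily
        List<Integer> remaining = new ArrayList<>();
        for (int k = 1; k < mDim; k++) remaining.add(k);
        while (!remaining.isEmpty()) {
            int bestAttr = -1, bestPos = -1, bestSum = Integer.MIN_VALUE;
            for (int attr : remaining) {
                for (int pos = 0; pos <= order.size(); pos++) {
                    order.add(pos, attr);               // try this placement
                    int s = bondSum(order, b);
                    if (s > bestSum) { bestSum = s; bestAttr = attr; bestPos = pos; }
                    order.remove(pos);                  // undo the trial placement
                }
            }
            order.add(bestPos, bestAttr);
            remaining.remove(Integer.valueOf(bestAttr));
        }
        return order;                                   // attribute indices in OaS order
    }
}
```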
5.3.2 Data Partitioning
The main objective of the vertical partitioning approach in real-time spatial Big Data is
to improve the performance of query execution and the system throughput. The high
performance of query execution is related to minimizing the access cost of data partitions.
In particular, the frequency of accessing data on different partitions is a major factor
affecting the query execution cost. Thus, it is very important to minimize this frequency
to achieve high query execution performance.
The improvement of the system throughput can be achieved by maximizing the degree
of parallel execution. We can improve this degree if we can minimize the frequency of
interfered accesses between data queries.
As a result, we can define the cost model that reflects both objectives of vertical
partitioning mentioned above as follows:

Cost(q_i, P(W_t, OaS_t)) = | Σ_{L(q_i) ⊆ L'} (λ C(q_i) + I(q_i)) / |L'|  −  Σ_{L(q_i) ⊆ L−L'} (λ C(q_i) + I(q_i)) / |L−L'| |    (4)

where:

- P(W_t, OaS_t) is a partitioning scheme over the OaS of the workload W at time t;

- L(q_i) is the collection of attributes visited by the query q_i;

- a partition line splits the OaS into two sets L' and L − L';

- C(q_i) is the access number of q_i;

- I(q_i) is the interfered access number of q_i;

- λ is a proportional constant between C(q_i) and I(q_i), λ > 1.
Our objective is to minimize the execution cost.
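A minimal Java sketch of evaluating this cost for one candidate split line is given below. It reads the formula as the absolute difference between the weighted access counts of the two sides of the split, each normalised by the size of its attribute set; this reading, and all the names used, are assumptions of the example.

```java
import java.util.*;

// Sketch of evaluating the partitioning cost of equation (4) for one split line.
// The normalisation by |L'| and |L - L'| is an assumption of this reading,
// and both attribute sets are assumed to be non-empty.
class SplitCost {
    static double cost(List<Set<String>> queryAttrs,   // L(q_i) for each query
                       int[] accessCount,              // C(q_i)
                       int[] interferedCount,          // I(q_i)
                       Set<String> left,               // L'
                       Set<String> right,              // L - L'
                       double lambda) {                // lambda > 1
        double leftSum = 0, rightSum = 0;
        for (int i = 0; i < queryAttrs.size(); i++) {
            double w = lambda * accessCount[i] + interferedCount[i];
            if (left.containsAll(queryAttrs.get(i)))  leftSum += w;   // query fully served by L'
            if (right.containsAll(queryAttrs.get(i))) rightSum += w;  // query fully served by L - L'
        }
        return Math.abs(leftSum / left.size() - rightSum / right.size());
    }
}
```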

6 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big
Data using window sliding techniques
6.1 Motivation and Context
Li and Lee, in (Li and Lee, 2009), proposed the MFI-TransSW/MFI-TimeSW algorithms
for mining frequent itemsets within a sliding window. MFI-TransSW (Mining Frequent
Itemsets within a Transaction-sensitive Sliding Window) uses a transaction-sensitive
sliding window, in which the sliding window is composed of a fixed number of transactions,
whereas MFI-TimeSW (Mining Frequent Itemsets within a Time-sensitive Sliding
Window) uses a time-sensitive sliding window, in which the window consists of a fixed
number of time units. The basic idea of these two algorithms is as follows: as a first step,
the algorithm scans the initial sliding window. Then, it differentiates frequent itemsets from
infrequent items. When the sliding window slides, it retrieves the existing itemsets and
incrementally computes their new supports, which are then used for pruning infrequent
itemsets or generating new frequent itemsets. These algorithms have a significant impact
on the performance of database systems. But they cannot deal with a large volume of
stream data or with large-scale dynamic queries.
Memar et al., in (Memar et al., 2011), proposed another algorithm named MFI-CBSW
(Mining Frequent Itemsets within a Circular Block Sliding Window), which uses a blocked
bit-sequence representation of itemsets. In this algorithm, the sliding window is composed
of a sequence of blocks. Each block consists of a fixed number of recent transactions. When
the sliding window slides, the oldest block is deleted and a new block containing newly
generated transactions is added to the window. The main limitation of this algorithm is that
it has to construct a new tree after every new transaction, which is not suitable for a huge
amount of transactions and data.
To solve the problem associated with a huge amount of transactions, Deypir et al.,
in (Deypir et al., 2011), proposed a new algorithm named VSW (Variable Size sliding
Window frequent itemset mining). This algorithm introduces the concept change for
observing recent changes in the set of frequent itemsets over data streams. When a
concept change occurs, an old concept has to be removed and a new concept is appended.
The window size is initially given by the user. Then, it is determined dynamically based on
the amount of concept change. Thus, the amount of change in the set of frequent patterns
is continuously monitored. The window is reduced only if the new transactions change
the set of frequent itemsets. Otherwise, the window is expanded. The main problem of this
algorithm is that the user is unable to fix, initially, the most adequate value for the size of
the sliding window. Besides, removing stale information from the window costs a large
amount of memory. Thus, the performance of the system is degraded.
Chen et al., in (Chen et al., 2012), proposed a time decay model with a variable-
size sliding window. This model uses a novel time decay model to reflect the temporal
characteristics of streams. Then, a data structure called SWP-Tree (Sliding Window Pattern
Tree) is developed. As an FP-tree-like structure, SWP-Tree mines the frequent itemsets
in a varying-size sliding window. During the mining process, the decay factor is selected
dynamically and the size of the sliding window changes automatically. The main
problem of this model is that the time decay model cannot find a perfect value of the decay
factor, which can change at any time in real-world applications. Besides, at every SWP-Tree
update, the support of all itemsets has to be computed individually.

Li et al., in (Li et al., 2014), proposed an algorithm called FIMoTS (Frequent Itemset
Mining over Time-sensitive Streams). As a first step, this algorithm uses a timestamp-
based sliding window model, which is then converted into a transaction-based sliding
window. Then, an extended enumeration tree is developed. By the introduction of two types
of type transforming bounds (Type Transforming Upper Bound and Type Transforming
Lower Bound), the itemsets are classified into categories. According to its category,
the processing of an itemset can be deferred or ignored. The main drawback of this algorithm
is that an infrequent itemset whose processing was deferred can become frequent when the
sliding window slides. But in a real-time application, this itemset can be obsolete. Thus, this
algorithm has good merits, especially with a huge amount of stream data, but it is not
suitable for real-time applications.
In (Deng et al., 2015), Deng et al. proposed PrePost* as a high-performance algorithm
for mining frequent itemsets. This algorithm employs N-lists to represent itemsets and
directly discovers frequent itemsets using a set-enumeration search tree. The N-list is a
recently proposed data structure (Deng et al., 2015). It is very efficient for mining
frequent itemsets. Thus, PrePost* has shown good results on traditional databases. But it is
not suitable for a huge volume of stream data.
Table 2 Frequent itemset mining algorithms comparison.

Algorithm     Window size  Window model           Real-time   Big Data    Stream data  RTSBD
                                                  processing  processing  processing   processing
MFI-TransSW   Fixed        Transaction-sensitive  No          No          Yes          No
MFI-TimeSW    Fixed        Time-sensitive         No          No          Yes          No
MFI-CBSW      Variable     Transaction-sensitive  No          No          Yes          No
VSW           Variable     Transaction-sensitive  No          No          Yes          No
SWP           Variable     Transaction-sensitive  No          No          Yes          No
FIMoTS        Variable     Hybrid model           No          Yes         Yes          No
PrePost*      –            –                      Yes         No          No           No
As Table 2 shows, no existing algorithm is suitable for mining frequent
itemsets in real-time spatial Big Data. Thus, in our contribution, we aim to find a solution
by benefiting from the techniques of the previous algorithms, such as the hybrid model and
the concept of change in FIMoTS, which help this algorithm to manage a huge amount of
data, and the real-time processing in PrePost*.
6.2 Frequent Itemset Mining in Real-Time Spatial Big Data
In this section we describe our approach. We begin by introducing some
preliminaries.

6.2.1 Preliminaries
Given a set of distinct items I = {i_1, i_2, ..., i_m} and a time-ordered stream of transactions
DS = {T_1, T_2, ..., T_n}. Each transaction T_i has a unique identifier named Tid_i and contains
an itemset X (X ⊆ T_i), which is a subset of I.

Definition 1: An itemset is a subset of items of I. An itemset is characterized by the
number of items that constitute it. For example, an N-itemset is an itemset that contains N
items.

Definition 2: The support of X, sup(X) (where X is an item or an itemset), is the percentage
of transactions containing X. X is frequent if sup(X) ≥ s, where s is a relative minimum
support given by the user. For example, we assume that X is frequent if half of the
transactions in the sliding window contain it.
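The support test of Definition 2 can be written as in the short Java sketch below; the method names are illustrative assumptions.

```java
import java.util.*;

// Sketch of the support test of Definition 2: X is frequent if sup(X) >= s.
class Support {
    // Fraction of transactions in the window that contain all items of X.
    static double support(List<Set<String>> window, Set<String> x) {
        long containing = window.stream().filter(t -> t.containsAll(x)).count();
        return (double) containing / window.size();
    }

    static boolean isFrequent(List<Set<String>> window, Set<String> x, double minSupport) {
        return support(window, x) >= minSupport;
    }
}
```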
6.2.2 PrePost*-RTSBD: The proposed method
Our contribution is an improvement of the existing algorithm PrePost* (Deng et al.,
2015) for mining frequent itemsets from real-time spatial Big Data. The new algorithm is
called PrePost*-RTSBD.
This approach is divided into five steps that are detailed in the following sections:

- Scan DS to obtain the set of all frequent 1-itemsets, and build the Modified-PPC-tree;

- Scan the Modified-PPC-tree to generate the N-lists of frequent 1-itemsets;

- Scan the Modified-PPC-tree to find all frequent 2-itemsets;

- Discover all frequent k-itemsets (k > 2) by the use of the set-enumeration tree;

- Update the Modified-PPC-tree.
a- Initial Modified-PPC-tree

As a first step, we scan DS to obtain the set of frequent 1-itemsets by the use of the relative
minimum support s. Then, we build the Modified-PPC-tree, which is inspired by the PPC-
tree, as follows: we first generate a root node, which denotes an empty itemset, and then
generate n children nodes, which denote n distinct frequent 1-itemsets.
Each node is characterized by seven fields (a minimal node structure is sketched after this
list):

1. name: denotes which item this node represents;

2. count: denotes the number of transactions represented by the portion of the path
   reaching this node;

3. children: denotes all children of the node;

4. pre-order: denotes the pre-order rank of the node;

5. post-order: denotes the post-order rank of the node;

6. Type Transforming Upper Bound (Li et al., 2014): denotes the number of
   arriving transactions that may result in a change of the type of the data from
   infrequent to frequent or from frequent to infrequent;

7. Type Transforming Lower Bound (Li et al., 2014): denotes the number of
   leaving transactions that may result in a change of the type of the data from infrequent
   to frequent or from frequent to infrequent.
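A minimal Java representation of such a node, assuming only the seven fields above, could look as follows; the class name is illustrative.

```java
import java.util.*;

// Sketch of a Modified-PPC-tree node with the seven fields listed above.
// Field names transcribe the list; the class itself is illustrative.
class ModifiedPPCNode {
    String name;                          // item this node represents
    int count;                            // transactions on the path reaching this node
    List<ModifiedPPCNode> children = new ArrayList<>();
    int preOrder;                         // pre-order rank of the node
    int postOrder;                        // post-order rank of the node
    int typeTransformingUpperBound;       // arriving transactions that may flip the type
    int typeTransformingLowerBound;       // leaving transactions that may flip the type
}
```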
Let’s examine the following example.
Example 1: Consider the same data stream as in figure 3. DS contains five transactions,
as shown below:

Tid   items   ordered frequent items
1     abc     abc
2     bd      bd
3     ad      ad
4     ac      ac
5     bcd     bcd

item   count
a      3
b      3
c      3
d      3
Figure 3 Modified-PPC-tree.
When the relative minimum support is 1/2, a, b, c and d are frequent 1-itemsets. As shown
in figure 3, the node a with (1, 1) means that its pre-order is 1, its post-order is 1, its item
name is a, and its count is 3. The lb and ub arrays store the lower bound and the upper bound,
respectively. a, b, c and d have the same upper bound, 2, and the same lower bound, 2, which
means that only two new transactions arriving or two existing transactions leaving may
result in a change from frequent to infrequent for a, b, c and d.
b- Generation of the N-lists of frequent 1-itemsets

Similar to the PPC-tree, each node in the Modified-PPC-tree has a PP-code presented as
<(pre-order, post-order): count>. The N-list of a frequent 1-itemset is the sequence
of all the PP-codes of the nodes registering the item in the Modified-PPC-tree.

Example 2: The N-lists of the Modified-PPC-tree presented in example 1 are as follows:

a → <(1, 2): 3>
b → <(2, 1): 1>, <(6, 8): 2>
c → <(3, 0): 1>, <(4, 3): 1>, <(7, 6): 1>
d → <(5, 4): 1>, <(8, 5): 1>, <(9, 7): 1>
c- Discovery of all frequent 2-itemsets

The N-list of a frequent 2-itemset is generated by intersecting the N-lists of two frequent
1-itemsets. If i_1, a frequent 1-itemset with PP-code <(x_1, y_1): z_1>, is an ancestor of i_2,
a frequent 1-itemset with PP-code <(x_2, y_2): z_2>, then i_1 i_2 is a frequent 2-itemset with
PP-code <(x_1, y_1): z_2>. Then we check all nodes of the form <(x_1, y_1): z_2>
and <(x_1, y_1): z_3> and merge them to get a new node <(x_1, y_1): z_2 + z_3>.
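The following Java sketch illustrates this intersection; it uses the standard PPC-tree ancestor test (smaller pre-order and larger post-order) and a simple quadratic merge, whereas the original algorithm uses a more efficient linear merge. All names are illustrative.

```java
import java.util.*;

// Sketch of building the N-list of a 2-itemset i1 i2 by intersecting the
// N-lists of two frequent 1-itemsets, following the rule described above.
class NListIntersection {
    static class PPCode {
        int pre, post, count;
        PPCode(int pre, int post, int count) { this.pre = pre; this.post = post; this.count = count; }
    }

    // In a PPC-tree, a node is an ancestor of another iff its pre-order is smaller
    // and its post-order is larger.
    static boolean isAncestor(PPCode a, PPCode b) {
        return a.pre < b.pre && a.post > b.post;
    }

    static List<PPCode> intersect(List<PPCode> nList1, List<PPCode> nList2) {
        // keep the (pre, post) of the ancestor and accumulate the descendant's count
        LinkedHashMap<Long, PPCode> merged = new LinkedHashMap<>();
        for (PPCode c1 : nList1)
            for (PPCode c2 : nList2)
                if (isAncestor(c1, c2)) {
                    long key = ((long) c1.pre << 32) | c1.post;
                    merged.computeIfAbsent(key, k -> new PPCode(c1.pre, c1.post, 0)).count += c2.count;
                }
        return new ArrayList<>(merged.values());
    }
}
```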
d- Discovery of all frequent k-itemsets (k > 2)
Our proposed algorithm adopts a set-enumeration tree (Burdick et al., 2005) to represent
the search space.
Figure 4 A set-enumeration tree example.
A set-enumeration tree, as shown in figure 4, can be constructed as follows (a sketch is
given after the steps):

Step 1: create the root of the tree.

Step 2: add all frequent 1-itemsets as children of the root (suppose that the number
of frequent 1-itemsets is m and each child i has an order named order_i).

Step 3: for each child i of the root, add the m − order_i remaining frequent 1-itemsets as
its children.

Step 4: for each leaf node, repeat Step 3 until order_i = m.
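The construction of steps 1-4 can be sketched in Java as follows; the sketch expands the full tree for clarity, whereas in practice only frequent extensions are kept, and the names used are assumptions.

```java
import java.util.*;

// Sketch of building a set-enumeration tree over m ordered frequent 1-itemsets
// (steps 1-4 above). Node and field names are illustrative.
class SetEnumTree {
    static class Node {
        List<String> itemset;             // items on the path from the root
        List<Node> children = new ArrayList<>();
        Node(List<String> itemset) { this.itemset = itemset; }
    }

    // items must be given in their fixed order (order_1 .. order_m).
    static Node build(List<String> items) {
        Node root = new Node(Collections.emptyList());        // step 1
        expand(root, items, 0);                                // steps 2-4
        return root;
    }

    private static void expand(Node parent, List<String> items, int from) {
        // each child is extended only with the items that come after it in the order
        for (int i = from; i < items.size(); i++) {
            List<String> itemset = new ArrayList<>(parent.itemset);
            itemset.add(items.get(i));
            Node child = new Node(itemset);
            parent.children.add(child);
            expand(child, items, i + 1);                       // repeat until order_i = m
        }
    }
}
```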
e- Updating the Modified-PPC-tree

When the sliding window slides, some existing transactions are deleted from the sliding
window and some new transactions arrive. Thus, many itemsets may change their type from
frequent to infrequent, or vice versa, as follows:

- When m existing transactions are deleted from the sliding window, an itemset i
  changes its type if its type transforming lower bound is not larger than m.

- When n new transactions arrive, an itemset j changes its type if its type
  transforming upper bound is not larger than n.
If one of the nodes changes from frequent to infrequent, all its descendants are deleted. By
the use of the type transforming bounds, itemsets are classified into categories and only
those reaching a re-computed threshold need to be reclassified; the processing of the other
itemsets can be deferred or ignored. We employ a pruning strategy to remove infrequent
itemsets, which results in optimized data processing, as sketched below.
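The reclassification test described above can be illustrated by the following Java sketch, assuming that the two bounds give the minimum numbers of leaving and arriving transactions needed to flip an itemset's type; the names are illustrative.

```java
// Sketch of the type-change test applied when the sliding window slides:
// leavingM transactions are deleted and arrivingN transactions arrive.
class TypeChangeCheck {
    // Returns true if the itemset may change its type (frequent <-> infrequent)
    // and therefore has to be reclassified; otherwise its processing can be deferred.
    static boolean mustReclassify(int lowerBound, int upperBound, int leavingM, int arrivingN) {
        boolean byLeaving  = lowerBound <= leavingM;    // enough departures to flip the type
        boolean byArriving = upperBound <= arrivingN;   // enough arrivals to flip the type
        return byLeaving || byArriving;
    }
}
```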
Example 3: In this example, we use the transactions found in the second sliding window.

Tid   items   ordered frequent items
3     ad      ad
4     ac      ac
5     bcd     cd
6     acd     acd

item   count
a      3
b      1
c      3
d      3
e      1
f      1
Figure 5 Modified-PPC-tree.
As shown in figure 5, b becomes an infrequent item. Thus, all its descendants are
pruned. Once the Modified-PPC-tree is ready, the rest of the steps are the same as for the
first sliding window.

7 Simulation Results
7.1 Simulation model
In order to assess the performance of real-time analytics on Big Data, several new systems
have appeared. Well-known systems and prototypes include Hadoop Online, Storm,
Flume, Kafka and S4. But these systems lack the most important database properties,
ACID (Atomicity, Consistency, Isolation, Durability), and data warehousing; without the
ACID requirement in place within a given system, reliability is suspect. Databases with
ACID properties are guaranteed to achieve successful database transactions. Meanwhile,
we focus on interactive analytics in a data warehouse, rather than continuous analytics over
streams. Thus, we have implemented a simulator in Java which describes the architecture
FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time Spatial Big
Data) (Hamdi et al., 2015) with two extensions containing, respectively, a partitioning
approach implementing our algorithm VPA-RTSBD and a frequent itemset mining
approach implementing our algorithm PrePost*-RTSBD, as shown in figure 6.
Figure 6 Simulation model.
FCSA-RTSBD exploits the foundations of the architecture of Aurora (Bellavista et al.,
2009). It is a data-flow system that processes incoming streams according to the requirements
of the applications. It includes a set of operators for satisfying the stream processing
requirements. Each operator consumes input data, performs operations, and produces
results in a continuous manner. Among these operators, we can note windowed operators.
FCSA-RTSBD can process continual queries in real-time according to QoS
specifications.
In this model, a transaction T_i is associated with a deadline D_i, a period P_i, a start time R_i,
an end time E_i and an execution time estimation ETE_i. Update transactions arrive periodically
and the arrival of user transactions is defined using the Poisson distribution given by the
following formula:

F_X(t) = λ e^(−λt) if t > 0, 0 otherwise    (5)
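Assuming equation (5) describes exponentially distributed inter-arrival times with rate λ (the usual Poisson arrival model), user transaction arrival times can be generated as in the sketch below; the rate value and the class name are arbitrary examples.

```java
import java.util.Random;

// Sketch of generating user-transaction arrival times with exponentially
// distributed inter-arrival times (rate lambda), matching equation (5).
class PoissonArrivals {
    public static void main(String[] args) {
        double lambda = 10.0;                 // example arrival rate (transactions per second)
        Random rnd = new Random(42);
        double t = 0.0;
        for (int i = 0; i < 5; i++) {
            t += -Math.log(1.0 - rnd.nextDouble()) / lambda;   // inverse-transform sampling
            System.out.printf("user transaction %d arrives at t = %.3f s%n", i, t);
        }
    }
}
```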
T_i is continually evaluated over stream data belonging to a window whose size is defined
either by the period P_i or by the number of data items received most recently.
Real-time spatial transactions are scheduled according to the Earliest
Deadline First (EDF) algorithm. The transaction handler consists of a concurrency controller
(CC) using the algorithm SCC-2S-P-IC (Hamdi et al., 2017), a freshness
manager (FM) and a basic scheduler. A transaction can be aborted and restarted by the CC.
The freshness manager (FM) checks the freshness of real-time data before the initiation of
a user transaction. If the accessed data is currently stale, FM blocks the corresponding
transaction; it will be transferred from the block queue to the ready queue as soon as the
corresponding data is brought up to date.
Simulation results are measured by the monitor periodically. The miss ratio and precision
controllers compute the miss ratio and utilization control signals based on the obtained results.
7.2 Experiments
7.2.1 Effectiveness of the Data partitioning Approach for Real-Time Spatial Big Data
Experiment 1: Partition Time Evaluating
We compare the partition time of VPA-RTSBD with WSPS, O2P and Schism. We
use 5 workloads of size 10M (6,000 queries), 100M (60,000 queries), 500M (3 million
queries), 1G (6 million queries) and 2G (12 million queries) of TPC-DS (TPC-DS, 2014),
a decision support benchmark for comparing Big Data processing systems, containing 25
tables, 429 columns and 99 query templates.
Figure 7 Partition Time of TPC-DS.
By analyzing the result in figure 7, we can find, firstly, that when the workload size
increases, the partition time also increases for all algorithms. On the other hand, although
VPA-RTSBD keeps a query window, which means partitioning is done after every N
queries, whereas in WSPS partitioning is done after every query, VPA-RTSBD and
WSPS have the same computing complexity and the partition time of our approach is
significantly lower than that of WSPS.
Schism and O2P cannot deal with a large volume of stream data or with large-scale
dynamic queries, so they have the worst partition times.
Experiment 2: High-throughput Adaption
We use a workload of size 500M and we generate data at different rates (from 0.5 GB/s to
5 GB/s). The objective of this experiment is to evaluate the ability of high-throughput
adaption. The result is shown in figure 8.

Figure 8 High-throughput Control.
By analyzing the result, we can find that the rate of generating data affects the partition
time for both WSPS and VPA-RTSBD. But our approach has a better ability
to adapt to high throughput than WSPS. So the importance of VPA-RTSBD
appears clearly: it deals well with large-scale stream queries.
Experiment 3: Total running time evaluating
Figure 9 presents the total running time of our simulator on all 20 queries of the benchmark
TPC-DS with the dataset size fixed to 1 TB.

Figure 9 Evaluation on all queries using 1 TB data.

On all queries, FCSA-RTSBD with the partitioning approach outperforms FCSA-RTSBD
without the partitioning approach for all types of queries. The importance of our partitioning
approach appears clearly, because the partitioning algorithm improves responsiveness,
scalability and availability of data.
Experiment 4: Success ratio evaluating
Figure 10 Success ratio evaluating.
Figure 10 shows that if we increase the number of accepted transactions in the system,
the number of validated transactions also increases. Moreover, the number of valid
transactions (user and update) using our partitioning approach is the highest. This is
explained by the fact that our approach maximizes the degree of parallel execution. Thus,
this policy allows a large number of transactions to complete their execution before reaching
their deadlines.
7.2.2 Effectiveness of the Frequent Itemset Mining Approach over Real-Time
Spatial Big Data
Similar to the previous studies of frequent itemset mining, we use three real datasets for
evaluating the performance of our contribution:
- The MUSHROOM dataset lists nearly 120 characteristics of some mushroom
  species.

- The CONNECT dataset includes game state information.

- The KOSARAK dataset contains web click-stream data of an online news portal.
Table 3 shows the characteristics of these datasets:

Table 3 The characteristics of the datasets.

Dataset    Number of transactions   Number of items   Size
Mushroom   8,124                    120               0.83
Connect    67,557                   130               0.362
Kosarak    990,002                  36,841            12.2
Experiment 1: Comparison of runtime
Figure 11 shows the comparison of our algorithm with the FIMoTS, SWP-Tree and MFI-
CBSW algorithms.
Figure 11 Running time cost for window size.
By analyzing the result in Figure 11, we can find that when the window size
increases, the runtime decreases, which means that the increase of the data size may result
in a decreased number of frequent itemsets over the different datasets when the minimum
support is employed by the different algorithms. Besides, it appears clear that PrePost*-
RTSBD always outperforms the FIMoTS, SWP-Tree and MFI-CBSW algorithms for any
size of the sliding window. In fact, Deng et al., in (Deng et al., 2012), proved that N-
list intersection is more efficient than the construction of the conditional pattern base and
the conditional FP-tree. Thus, our PrePost*-RTSBD avoids the time-consuming process of
constructing the FP-tree by simple N-list intersection, contrary to SWP-Tree, which
is inspired by the FP-tree for mining frequent itemsets. In addition, the use of the type
transforming bounds facilitates the division of nodes dynamically into various categories,
which also reduces the processing time.
Figure 12 Running time cost for minimum support.

Figure 12 shows that the runtime cost of all algorithms decreases linearly with an
increase of the minimum support. Then, the results show that the runtime cost of PrePost*-
RTSBD is much lower than the runtime cost of the other algorithms, especially with high
values of minimum support, because FIMoTS, SWP-Tree and MFI-CBSW did not address
this bottleneck, contrary to PrePost*-RTSBD, which benefits from the feedback control
loop in FCSA-RTSBD that helps to stabilize the system.
Experiment 2: Comparison of memory consumption
Figure 13 Memory consumption.
As shown in figure 13, PrePost*-RTSBD consumes more memory than the other
algorithms. In fact, PrePost*-RTSBD has to maintain a Modified-PPC-tree, which is bigger
than the corresponding FP-tree, since the former contains additional information, namely
the code of each node and the type transforming bounds. Moreover, PrePost*-RTSBD
stores all 2-itemsets, which costs more memory because the number of 2-itemsets is usually
very large.

8 Conclusion
The demand for real-time spatial data has been increasing recently. Nowadays, we are
talking about real-time spatial Big Data that processes a large amount of data. In these
applications, it is desirable to execute transactions within their deadlines using valid real-time
spatial data. But the real-time spatial Big Data system can be overloaded and many transactions
may miss their deadlines, or the validity of real-time spatial data can be violated. Thus, our research has
twofold objectives: to avoid the overloading of the system and to improve performance in
real-time spatial Big data. This enhances the quality of service (QoS) in real-time spatial
Big Data.
As a first contribution, we have investigated the limitations of traditional partitioning
technologies. Then, we have proposed VPA-RTSBD, a novel approach to process
stream queries in real-time spatial Big Data. This contribution is an implementation of
the Matching algorithm for traditional vertical partitioning. It uses the Hamming distance to
produce clusters. VPA-RTSBD is divided into three steps: first, we automatically find
the optimal attribute sequence by the use of the Matching algorithm. Secondly, we keep the
data amount of each partition balanced by the use of a cost model. Finally, we
provide parallel execution guarantees for the most frequent queries.
Then, as a second contribution, we have proposed an efficient frequent itemset mining
algorithm over real-time spatial Big Data. We call this algorithm PrePost*-RTSBD. It
is an improvement of the original algorithm PrePost*. PrePost*-RTSBD employs N-lists
to represent itemsets and directly discovers frequent itemsets using a set-enumeration
search tree. Then, it uses the type transforming bounds to facilitate the division of nodes
dynamically into various categories.
A simulation study is shown to prove, on the one hand, that VPA-RTSBD achieves
good results in terms of success ratio, high-throughput adaption and total running time
compared to WSPS, O2P and Schism. The importance of our partitioning approach
appears clearly, because the partitioning algorithm improves responsiveness, scalability and
availability of data. This affects QoS improvement in real-time spatial Big Data, especially
with a huge number of data items and transactions. On the other hand, experiments prove
that PrePost*-RTSBD can achieve a significant performance improvement in terms of
runtime cost compared to FIMoTS, SWP-Tree and MFI-CBSW. But it consumes more
memory than the other algorithms.
As future work, we have to extend PrePost*-RTSBD in order to consume less memory,
especially since data is growing exponentially. Then, we have to find more policies for
QoS improvement in large-scale real-time spatial data. The most important requirement
for these data structures is the ability to provide fast access to large volumes
of data. Thus, we shall investigate new techniques for data indexing. Another future work
consists of relaxing the transaction real-time constraints (ACID) by allowing the loss of
some invocations.
References and Notes
Agrawal, S., Narasayya, V., Yang, B. (2004, June). Integrating vertical and horizontal partitioning
into automated physical database design. In Proceedings of the 2004 ACM SIGMOD international
conference on Management of data, pp. 359-370. ACM.

Ahirrao, S., Ingle, R. (2015). Scalable transactions in cloud data stores. Journal of Cloud Computing,
vol. 4, no. 1, 21.
Bhat, M. V., Haupt, A. (1976). An efficient clustering algorithm. IEEE Transactions on Systems,
Man, and Cybernetics, no. 1, pp. 61-64.
Bai, L., Lin, Z. and Xu, C. (2016) ’Spatiotemporal operations on spatiotemporal XML data using
XQuery’, in 2016 12th International Conference on Natural Computation, Fuzzy Systems and
Knowledge Discovery (ICNC-FSKD) , IEEE, August, pp.1278-1282.
Bellavista, P., Saracco, R. (2009). Telecommunication Systems and Technologies , Encyclopedia of
Life Support Systems (EOLSS), vol. 1.
Bernstein, P. A., Cseri, I., Dani, N., Ellis, N., Kalhan, A., Kakivaya, G., … Talius, T. (2011, April).
Adapting microsoft SQL server for cloud computing. In Data Engineering (ICDE), 2011 IEEE
27th International Conference on, pp. 1255-1263. IEEE.
Chang, J. H., Lee, W. S. (2003, August). Finding recent frequent itemsets adaptively over online
data streams. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 487-492. ACM.
Chen, H., Shu, L., Xia, J., Deng, Q. (2012). Mining frequent patterns in a varying-size sliding
window of online transactional data streams. Information Sciences, 215, 15-36.
Chu, W. W., Ieong, I. T. (1993). A transaction-based approach to vertical partitioning for relational
database systems. IEEE Transactions on Software Engineering, vol. 19, no. 8, pp. 804-812.
Coleman, J. R., Sullivan, A. L. (1996). A real-time computer application for the prediction of fire
spread across the Australian landscape. Simulation, vol. 67, no 4, pp. 230-240.
Cova, T. J. (1999). GIS in emergency management. Geographical information systems , vol. 2, pp.
845-858.
Comer, D. W., Philip, S. Y. (1987, February). A vertical partitioning algorithm for relational
databases. In Data Engineering, 1987 IEEE Third International Conference on, pp. 30-35. IEEE.
Cornell, D. W., Yu, P. S. (1990). An effective approach to vertical partitioning for physical design of
relational databases. IEEE Transactions on Software Engineering, vol. 16, no 2, pp. 248-258.
Curbera, F., et al., 2005. Web services platform architecture: SOAP, WSDL, WS-policy, WS-
addressing, WS-BPEL, WS-reliable messaging and more. Englewood Cliffs: Prentice Hall.
Curino, C., Jones, E., Zhang, Y., Madden, S. (2010). Schism: a workload-driven approach to database
replication and partitioning. Proceedings of the VLDB Endowment, vol. 3, no 1-2, pp. 48-57.
Das, S., El Abbadi, A., Agrawal, D. (2009). ElasTraS: An Elastic Transactional Data Store in the
Cloud. HotCloud, vol. 9, pp. 131-142.
Deng, Z., Wang, Z., Jiang, J. (2012). A new algorithm for fast mining frequent itemsets using N-lists.
Science China Information Sciences, vol. 55, no 9, pp. 2008-2030.
Deng, Z. H., Lv, S. L. (2015). PrePost+: An efficient N-lists-based algorithm for mining frequent
itemsets via Children-Parent Equivalence pruning. Expert Systems with Applications, vol. 42, no
13, pp. 5424-5432.
Deypir, M., Sadreddini, M. H., Hashemi, S. (2012). Towards a variable size sliding window model
for frequent itemset mining over data streams. Computers industrial engineering, vol. 63, no 1,
pp. 161-172.

Drummond, J. E. (1991). Determining and processing quality parameters in geographic information
systems (Doctoral dissertation, University of Newcastle upon Tyne).
Guidan, F., Shaohong, Y. (2013, January). A frequent itemsets mining algorithm based on matrix
in sliding window over data streams. In Intelligent System Design and Engineering Applications
(ISDEA), 2013 Third International Conference on, pp. 66-69. IEEE.
Guo, M., Kang, H. (2016, September). The Implementation of Database Partitioning Based on
Streaming Framework. In Web Information Systems and Applications Conference, 2016 13th, pp.
157-162. IEEE.
Hadjieleftheriou, M., Kollios, G., Gunopulos, D., Tsotras, V. (2003). On-line discovery of dense
areas in spatio-temporal databases. Advances in Spatial and Temporal Databases, pp. 306-324.
Hamdi, S., Bouazizi, E., Faiz, S. (2015, July). A new QoS Management Approach in real-time GIS
with heterogeneous real-time geospatial data using a feedback control scheduling. In Proceedings
of the 19th International Database Engineering and Applications Symposium, pp. 174-179. ACM.
Hamdi, S., Bouazizi, E., and Faiz, S. (2017, October). A Speculative Concurrency Control in Real-
Time Spatial Big Data Using Real-Time Nested Spatial Transactions and Imprecise Computation.
In Computer Systems and Applications (AICCSA), 2017 IEEE/ACS 14th International Conference
on, pp. 534-540. IEEE
Hammer, M., and Niamir, B. (1979, May). A heuristic approach to attribute partitioning. In
Proceedings of the 1979 ACM SIGMOD international conference on Management of data, pp.
93-101. ACM.
Heuvelink, G.B.M. (1998). Error Propagation in Environmental Modelling with GIS. Taylor and
Francis, London.
Hoffer, J. A., and Severance, D. G. (1975, September). The use of cluster analysis in physical data
base design. In Proceedings of the 1st International Conference on Very Large Data Bases, pp.
69-86. ACM.
Huang, C. Y., and Liang, S. H. (2013). LOST-Tree: A spatio-temporal structure for efficient sensor
data loading in a sensor web browser. International Journal of Geographical Information Science,
vol. 27, no 6, pp. 1190-1209.
Iwerks, G. S., Samet, H., and Smith, K. (2003, September). Continuous k-nearest neighbor queries
for continuously moving points with updates. In Proceedings of the 29th international conference
on Very large data bases-Volume 29, pp. 512-523. VLDB Endowment.
Jindal, A., and Dittrich, J. (2011, September). Relax and let the database do the partitioning online. In
International Workshop on Business Intelligence for the Real-Time Enterprise, pp. 65-80. Springer,
Berlin, Heidelberg.
Kalepu, S., Krishnaswamy, S., and Loke, S.W., 2003. Verity: a QoS metric for selecting web services
and providers. In: Proceedings of Web Information Systems Engineering Workshops, pp. 131-139.
Karypis, G., and Kumar, V. (1995). METIS–unstructured graph partitioning and sparse matrix
ordering system, version 2.0.
Kiran, M., Murphy, P., Monga, I., Dugan, J., and Baveja, S. S. (2015, October). Lambda architecture
for cost-effective batch and speed big data processing. In Big Data (Big Data), 2015 IEEE
International Conference on, pp. 2785-2792. IEEE.
Kwon, D., Lee, S., and Lee, S. (2002, January). Indexing the current positions of moving objects
using the lazy update R-tree. In Mobile Data Management, 2002. Proceedings. Third International
Conference on, pp. 113-120. IEEE.

Lee, M. L., Hsu, W., Jensen, C. S., Cui, B., and Teo, K. L. (2003, September). Supporting frequent
updates in r-trees: A bottom-up approach. In Proceedings of the 29th international conference on
Very large data bases-Volume 29, pp. 608-619. VLDB Endowment.
Leng, F., Bao, Y., Yu, G., Shi, J., and Cai, X. (2011, September). Requirement-based query and update
scheduling in real-time data warehouses. In International Conference on Web-Age Information
Management, pp. 379-389. Springer, Berlin, Heidelberg.
Lepofsky, M., Abkowitz, M., and Cheng, P. (1995). Transportation hazard analysis in an integrated
GIS environment. In Computer Supported Risk Management (pp. 115-131). Springer Netherlands.
Li, H. F., and Lee, S. Y. (2009). Mining frequent itemsets over data streams using efficient window
sliding techniques. Expert systems with applications, vol. 36, no 2, pp. 1466-1477.
Li, H., Zhang, N., Zhu, J., Cao, H., and Wang, Y. (2014). Efficient frequent itemset mining methods
over time-sensitive streams. Knowledge-Based Systems, vol. 56, pp. 281-298.
Lin, X., Orlowska, M., and Zhang, Y. (1993). A graph based cluster approach for vertical partitioning
in database design. Data and Knowledge Engineering, vol. 11, no 2, pp. 151-169.
Liroz-Gistau, M., Akbarinia, R., Pacitti, E., Porto, F., and Valduriez, P. (2012, September). Dynamic
workload-based partitioning for large-scale databases. In International Conference on Database
and Expert Systems Applications, pp. 183-190. Springer, Berlin, Heidelberg.
Lodha, R., Jan, H., and Kurup, L. (2014). Big data challenges: Data analysis perspective. Int J
Current Eng Technol, vol. 4, no 5, pp. 3286-3289.
Loukides, M. (2012). Data science and data tools. In Big Data Now, chapter 1. O’Reilly.
Mani, A., and Nagarajan, A. (2005). Understanding quality of service for web services, 2002.
[Online] https://www.ibm.com/developerworks/library/ws-quality.
Memar, M., Deypir, M., Sadreddini, M. H., and Fakhrahmad, S. M. (2011, July). A block-based
approach for frequent itemset mining over data streams. In Fuzzy Systems and Knowledge
Discovery (FSKD), 2011 Eighth International Conference on (Vol. 3, pp. 1647-1651). IEEE.
Mokbel, M. F., Xiong, X., and Aref, W. G. (2004, June). SINA: Scalable incremental processing
of continuous queries in spatio-temporal databases. In Proceedings of the 2004 ACM SIGMOD
international conference on Management of data, pp. 623-634. ACM.
Mokbel, M. F., Xiong, X., Aref, W. G., Hambrusch, S. E., Prabhakar, S., and Hammad, M. A.
(2004, August). PALACE: a query processor for handling real-time spatio-temporal data streams.
In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pp.
1377-1380. VLDB Endowment.
Navathe, S., Ceri, S., Wiederhold, G., and Dou, J. (1984). Vertical partitioning algorithms for
database design. ACM Transactions on Database Systems (TODS), vol. 9, no 4, pp. 680-710.
Navathe, S. B., and Ra, M. (1989, June). Vertical partitioning for database design: a graphical
algorithm. In ACM Sigmod Record (Vol. 18, No. 2, pp. 440-450). ACM.
Papadomanolakis, S., and Ailamaki, A. (2007, April). An integer linear programming approach to
database design. In Data Engineering Workshop, 2007 IEEE 23rd International Conference on (pp.
442-449). IEEE.
Phansalkar, S., and Ahirrao, S. (2016, December). Survey of data partitioning algorithms for big data
stores. In Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference
on (pp. 163-168). IEEE.

Prabhakar, S., Xia, Y., Kalashnikov, D. V., Aref, W. G., and Hambrusch, S. E. (2002). Query indexing
and velocity constrained indexing: Scalable techniques for continuous queries on moving objects.
IEEE Transactions on Computers, vol. 51, no 10, pp. 1124-1140.
Ramamritham, K., Son, S. H., and Dipippo, L. C. (2004). Real-time databases and data services.
Real-time systems, vol. 28, no 2-3, pp. 179-215.
Rodríguez, L., and Li, X. (2011, October). A support-based vertical partitioning method for database
design. In Electrical Engineering Computing Science and Automatic Control (CCE), 2011 8th
International Conference on (pp. 1-6). IEEE.
Salmon, L., and Ray, C. (2017). Design principles of a stream-based framework for mobility analysis.
GeoInformatica, vol. 21, no 2, pp. 237-261.
Saltenis, S., Jensen, C. S., Leutenegger, S. T., and Lopez, M. A. (2000). Indexing the positions of
continuously moving objects (vol. 29, no. 2, pp. 331-342). ACM.
Schafer, R. P., Thiessenhusen, K. U., Brockfeld, E., and Wagner, P. (2002). A traffic information
system by means of real-time floating-car data. In Proc. ITS World Congress, Chicago USA.
Shraddha Phansalkar, D. A. (2014). Transaction aware vertical partitioning of database (TA VPD) for
responsive OLTP applications in cloud data stores. Journal of Theoretical and Applied Information
Technology, vol. 59, no 1.
Son, J. H., and Kim, M. H. (2004). An adaptable vertical partitioning method in distributed systems.
Journal of Systems and Software, vol. 73, no 3, pp. 551-561.
Tao, Y., Papadias, D., and Shen, Q. (2002, August). Continuous nearest neighbor search. In
Proceedings of the 28th international conference on Very Large Data Bases (pp. 287-298). VLDB
Endowment.
Tao, Y., Papadias, D., and Sun, J. (2003, September). The TPR*-tree: an optimized spatio-temporal
access method for predictive queries. In Proceedings of the 29th international conference on Very
large data bases-Volume 29 (pp. 790-801). VLDB Endowment.
TPC-DS: Transaction Performance Processing Council for Decision Support-decision Support
[online] http://www.tpc.org/tpcds/ [accessed, April 30, 2014].
Tsui, F. C., Espino, J. U., Dato, V. M., Gesteland, P. H., Hutman, J., and Wagner, M. M. (2003).
Technical description of RODS: a real-time public health surveillance system. Journal of the
American Medical Informatics Association, vol. 10, no 5, pp. 399-408.
van der Zee, E., and Scholten, H. (2014). Spatial dimensions of big data: Application of geographical
concepts and spatial technology to the internet of things. In Big Data and Internet of Things: A
Roadmap for Smart Environments (pp. 137-168). Springer, Cham.
Wei, Y., Son, S. H., Stankovic, J. A., and Kang, K. D. (2003, December). QoS management in
replicated real-time databases. In Real-Time Systems Symposium, 2003. RTSS 2003. 24th IEEE
(pp. 86-97). IEEE.
Zhang, S., Yang, Y., Fan, W. and Winslett, M. (2014a) ’Design and implementation of a real-time
interactive analytics system for large spatio-temporal data’, Proceedings of the VLDB Endowment,
vol. 7, no. 13, pp. 1754-1759.
Zhang, S., Yang, Y., Fan, W., Lan, L. and Yuan, M. (2014b) ’OceanRT: real-time analytics over
large temporal data’, in Proceedings of the 2014 ACM SIGMOD International Conference on
Management of Data , ACM, June, pp.1099-1102.
Zhao, W., Cheng, Y ., and Rusu, F. (2015). Workload-Driven Vertical Partitioning for Effective Query
Processing over Raw Data. arXiv preprint arXiv:1503.08946.
