Representation And Searching Algorithm For Opening Hoursdoc
=== Representation and searching algorithm for opening hours ===
A Novel Representation and Searching Algorithm for Opening Hours
Teodora Husar
Department of Computer Science and Information Technology, University of Oradea
Oradea, Romania
e-mail: [anonimizat]
Cornelia Győrödi
Department of Computer Science and Information Technology, University of Oradea
Oradea, Romania
e-mail: [anonimizat]
Robert Győrödi
Department of Computer Science and Information Technology, University of Oradea
Oradea, Romania
e-mail: [anonimizat]
Sorin Sarca
Department of Informatics, Faculty of Sciences University of Oradea
Oradea, Romania
ail: [anonimizat]
Abstract — Opening Hours can be considered a data type having a human representation; this means that it can be easily understood by human beings and hardly understood by computers because the lack of a standard structured representation. In essence, the opening hours gives us a simple information: opening state at a certain date and time, and this is our focus in this paper. So far, this kind of functionality does not exists in today's database management systems because there are no algorithms developed in this way. The purpose of this paper is to presents a novel and easy to implement algorithm for encoding opening hours in order to quickly search and get the opening state for records.
Keywords – Apache Solr, Opening Hours, Java, optimizations.
INTRODUCTION
A few years ago an application normally only used to have thousands of users to tens of thousands of users in the most extreme cases, and currently there are applications that have millions of users and amount of data is increasing, so is very important to perform efficient search to get relevant information from data [7]. It is important to use search engines to access information, that has grown tremendously based on the needs of users [2]. The goal of information extraction methods is the extraction of specific information from data documents [1].
Opening Hours are becoming increasingly significant in online environment, because is very important to know when a restaurant, store or other business is open, this information can help you avoid unnecessary roads, or in late hours you can search for open places where you can fix urgent problems: dentist, auto service and so on. Businesses is usually listed in directories where you can perform a search to get relevant results, but these results are not relevant to current time, nor the results will not be relevant for a future date.
If there would be a way to filter the results only get the ones that are open now or in a future date, then you will save a lot of time, otherwise you have to loop over the result list and check each opening hour. So far, there is no algorithm to solve this problem.
This paper proposes an encoding of the opening hours, which can show in a very short time if it is opened or closed at a certain date and time. There are many representations for opening hours but this encoding is not assuming any, because there is not a standard representation. Instead it is a byte sequence, not easily readable by humans, which encapsulates opening hours info, so that if given a date and hour it can determine very fast if it is open or closed. This paper make a first attempt to describe a new algorithm for searching through opening hours. To perform comparisons tests, we also achieved an implementation of algorithm written in Java [4][5] for Apache Solr [6].
In the next sections of this paper we will describe the algorithm to encode opening hours, the algorithm for sorting and searching, Java and Apache Solr implementations, benchmarks and optimizations and the final conclusions.
THE ALGORITHM REPRESENTATION
Opening Hours can be composed from many entities (day of week, date, time) and to easily represent them we must choose some notations. We consider the following primitives:
Z – represents the day of the week using one digit, 1 (for Monday) through 7 (for Sunday)
H – represents hour of the day using four digits, 0000 (for 00:00) to 2359 (for 23:59)
D – represents the date of year (only month and day) using four digits, 0101 (for January 1) to 1231 (for December 31)
Having these three primitives, we can move forward and create some types representing time intervals.
ZHH type
With this format, we can encode one interval of opening hours in a single day. Therefore, encoding “Monday from 8:00 to 16:00” will produce 108001600. We can see this representation in Table 1 that indicates every primitive used:
Table 1 – The representation of the ZHH type
DHH type
Similar to ZHH but specifies an exact date instead of day of week. This is just great if you have some special hours in a date, because it can encode something like “Opened on December 25 from 9:00 to 12:00” to 122509001200, and the Christmas day is saved. We can see this representation in Table 2.
Table 2 – The representation of the DHH type
ZDDHH type
This type extends ZHH and DHH types by adding a date interval. More precisely, it can encode something like “Opened every Sunday from 9:00 to 18:00 between February 12 and March 16” to 70212031609001800, so you can easily represent opening hours for seasons. The date interval must be greater than a week, otherwise you can use ZHH or DHH types. This representation is shown in Table 3.
Table 3 – The representation of the ZDDHH type
-D
We can use this type to represent a closed date. Therefore, if we close on December 25 the encoding will be -1225. This representation is shown in Table 4.
Table 4 – The representation of the -D type
-DD
This type is just like -D type but specifies a date interval instead of a single date. Can encode a range of closed dates, “Closed between January 1 and January 3” to -01010103. We show this representation in Table 5.
Table 5 – The representation of the -DD type
Examples of encoded hours
Now that we have our types, we can go ahead and try to encode some opening hours. For example:
Monday – Friday from 8:00 to 16:00
For each day of week, we must add a separate record of type ZHH: 108001600, 208001600, 308001600, 408001600,
508001600.
Saturday from 9:00 to 13:00
Simply use a ZHH record: 609001300.
Sunday – closed
We do not have to do anything, if there is no other data for opening hours we assume it is closed.
Monday – Friday from 8:00 to 12:00 and from 14:00 to 18:00
Because each day contains two intervals of opening time we must add a ZHH record for each interval: (108001200, 114001800), (208001200, 214001800), …, (508001200,
514001800).
Sunday from 9:00 to 12:00 – only during summer season
Considering the summer season as being between June 1 and August 31, we could just use a ZDDHH record: 70601083109001200.
There are some special hours: December 24 from 9:00 to 13:00 December 25 – closed
January 1 & 2 – closed
For December 24 we must use a DHH record: 122409001300, for December 25 a -D record: -1225 and for January 1 & 2 a -DD record: -01010102.
We consider specifying a closed date record (-D or -DD) implies that you also use the other types to specify opening hours, otherwise it is pointless.
Special cases are:
Opened 24/7
We must use seven ZHH records to specify the opening interval as 00:00 – 23:59 for each day of week: 100002359, 200002359, …, 700002359. Please note that specifying interval 00:00 – 00:00 is like saying that it is open only at 00:00 (exactly one minute, until 00:01).
Monday – Friday from 18:00 to 02:00
In this case, the opening interval is past midnight, so it means we must split them in ZHH records to mark as open the first two hours of next day: (118002359, 200000200), (218002359, 300000200), …, (518002359, 600000200).
If Saturday or Sunday is closed, we must not specify a -D record for it.
Evaluation order algorithm
So far, we defined representation types and we know how to represent opening hours using our types. However, to know for a specified date and hour if it is opened or not, we consider that each opening hours contains a list of encoded representations in different types. We also set the evaluation order from the most specific to most general type and compare our date and hour to specified representation. For our representation types, the priorities are (from most important to less important): -D, -DD, DHH, ZDDHH, ZHH. We define further the related algorithm for determining opening state:
Let X = searched date (month and day) Let Y = searched time (hour and minute)
Step 0
Let Q = day of week for X
Step 1
Get next -D, if no -D goto Step 2. If X equals D return CLOSED else repeat Step 1.
Step 2
Get next -DD, if no -DD goto Step 3. If X between DD return CLOSED else repeat Step 2.
Step 3
Get next DHH, if no DHH goto Step 4. If X equals D and between HH return OPEN else repeat Step 3.
Step 4
Get next ZDDHH, if no ZDDHH goto Step 5. If Q equals Z and X between DD and Y between HH return OPEN else repeat Step 4.
Step 5
Get next ZHH, if no ZHH goto Step 6. If Q equals Z and Y between HH return OPEN else repeat Step 5.
Step 6
Return CLOSED.
SORTING AND SEARCHING ALGORITHM
In this section, we describe some code implementations for our algorithm. We are not making any assumptions for the original format of opening hours because there is not a standard to represent them, so there could be many implementations. We just consider that there exists a list of records encoded using our representation types.
Sorting the list of records
Each type has different length (except closing ones, which will be put first) so we will use that to sort the records. We present below the Java code for sorting the list of records.
import java.util.Collections; import java.util.LinkedList; import java.util.List;
public class OH {
final protected static int Z = 1; final protected static int H = 4; final protected static int D =4;
final protected static int _D = 1 + D; final protected static int _DD = _D + D; final protected static int DHH = D + H + H;
final protected static int ZDDHH = Z + D + D
+ H + H;
final protected static int ZHH = Z + H + H; public static void sortRecords(List<String>
records) {
LinkedList<LinkedList<String>> ordered = new LinkedList<LinkedList<String>>();
ordered.add(null); // _D, _DD ordered.add(null); // DHH ordered.add(null); // ZDDHH ordered.add(null); // ZHH
LinkedList<String> list = null; int index; int len;
while (!records.isEmpty()) { String r = records.remove(0); if (r == null) continue;
len = r.length();
if (r.charAt(0) == '-') {
if (len == _D || len == _DD) { index = 0;
} else {
continue;
}
} else {
switch (len) {
case DHH: index = 1; break; case ZDDHH: index = 2; break; case ZHH: index = 3; break; default: continue;
}
}
list = ordered.get(index); if (list == null) {
list = new LinkedList<String>(); ordered.set(index, list);
}
list.add(r);
}
list = ordered.removeFirst(); if (list != null) {
Collections.sort(list); records.addAll(list);
}
while (!ordered.isEmpty()) {
list = ordered.removeFirst(); if (list == null) {
continue;
}
Collections.sort(list); Collections.reverse(list); records.addAll(list);
}
}
public static void main(String[] args) { List<String> records = new
LinkedList<String>();
records.add("10212031208001800"); records.add("-0213"); records.add("102120312019002000");
records.add("20212031208001800"); records.add("209001200"); records.add("109001200"); records.add("409001200"); records.add("031312001400"); records.add("309001200"); records.add("509001200"); records.add("30212031208001800"); records.add("40212031208001800"); records.add("50212031208001800");
sortRecords(records);
public static boolean isOpen(byte[] bytes, int date, int hour, int year) {
int max = bytes.length; if (max == 0) {
return false;
}
long nr = 0; // interval int length = 0; // type
boolean close = false; // is closed marker
Calendar calendar = Calendar.getInstance(); calendar.set(year, date / 100 – 1, date % 100);
int currentD = calendar.get(Calendar.DAY_OF_WEEK); calendar = null;
// Because Sunday is 1, we must decrement
// and set Sunday to 7, so that Monday is 1 if (–currentD == 0) {
currentD = 7;
}
// will be used for checking if the next
// interval has the same type int cType = 0;
int firstDigit = 0; // used for day of week for (int i = 0; i < max; i++) {
if (bytes[i] == '-') { // check if closed close = true;
continue;
System.out.println(records);
/*
Should output:
[-0213, 031312001400, 50212031208001800,
40212031208001800, 30212031208001800,
20212031208001800, 10212031208001800, 509001200,
409001200, 309001200, 209001200, 109001200]
*/
}
}
Searching through records
We implemented in Java [8] the evaluation order algorithm described above and we considered that the function isOpen receives as argument a byte array containing sorted records delimited by semicolon.
import java.util.Calendar;
}
// Check if is a number
if (bytes[i] >= '0' && bytes[i] <= '9') { if (firstDigit == 0) {
// set firstDigit with the day of week firstDigit = bytes[i] – '0';
}
nr = nr * 10 + bytes[i] – '0'; length++;
}
else if (bytes[i] == DELIMITER) {
// finished format interval
if (close) { // check for closed
if (date == nr || (length > 4 && date >= nr
/ 10000 && date <= nr % 10000)) { return false;
}
close = false; continue;
}
if (cType > 0 && length != cType) {
// not the same type as previous return false;
public class OH {
final protected static int Z = 1; final protected static int H = 4; final protected static int D = 4;
final protected static int _D = 1 + D; final protected static int _DD = 1 + D + D; final protected static int DHH = D + H + H;
final protected static int ZDDHH = Z + D + DHH; final protected static int ZHH = Z + H + H; final protected static char DELIMITER = ';';
}
// Check if is open switch (length) { case DHH:
if (date == nr / 100000000L) { if (hour <= (nr % 10000L) &&
hour >= (nr / 10000L) % 10000L) {
return true;
}
cType = DHH;
}
break;
case ZDDHH:
if (firstDigit == currentD
&& date >= (nr / 1000000000000L) % 10000 && date <= (nr / 100000000L) % 10000) {
if (hour >= (nr % 100000000L) / 10000 && hour <= (nr % 10000)) {
return true;
}
cType = ZDDHH;
{!frange u=0}openingHoursFCT(FIELD, DATE, HOUR [, YEAR]) – to get closed
FIELD – field name to check
DATE – four digit date (month and day)
HOUR – four digit hour
YEAR – optional, current year (to establish day of week)
}
break; case ZHH:
if (currentD == firstDigit) {
if (hour >= (nr % 100000000L) / 10000L && hour <= (nr % 10000L)) {
return true;
The parser responsibility is to parse request arguments and pass them further to the function. The next block of code shows the parser implementation.
package husart.solr.openinghours;
} import
}
break;
}
// Reset
nr = firstDigit = length = 0;
}
}
return false;
}
public static void main(String[] args) { String records = "-
0213;031312001400;50212031208001800;4021203120800180
0;30212031208001800;20212031208001800;10212031208001
800;509001200;409001200;309001200;209001200;10900120
0;";
byte[] bytes = records.getBytes();
// false
System.out.println(isOpen(bytes, 1225, 1245,
2016));
// true
System.out.println(isOpen(bytes, 111, 1200,
2016));
// false
System.out.println(isOpen(bytes, 111, 1201,
2016));
}
}
APACHE SOLR IMPLEMENTATION
Apache Solr implementation was made by creating a parser and a function. The exported jar must be placed in a lib folder and referenced in solrconfig.xml file under the <config> tag.
<lib dir="../lib/" regex=".*\.jar" />
<valueSourceParser name="openingHoursFCT" class="husart.solr.openinghours.OpeningHoursParser"
/>
After indexing data in Apache Solr, we can test the function:
{!frange l=1}openingHoursFCT(FIELD, DATE, HOUR [, YEAR]) – to get opened
org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.search.FunctionQParser; import org.apache.solr.search.SyntaxError; import org.apache.solr.search.ValueSourceParser;
public class OpeningHoursParser extends ValueSourceParser {
@Override
public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
ValueSource fieldName = fqp.parseValueSource();
int date = fqp.parseInt(); int hour = fqp.parseInt();
if(fqp.hasMoreArguments()){
return new OpeningHoursFunction(fieldName, date, hour,fqp.parseInt());
}
return new OpeningHoursFunction(fieldName, date, hour);
}
}
The OpeningHoursFunction method is used to compare arguments against field value and return true if open or false if closed. The isOpen() method described in the evaluation order algorithm chapter and implemented by us can see in [8]
package husart.solr.openinghours;
import java.io.IOException; import java.util.Calendar; import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext; import org.apache.lucene.queries.function.FunctionValues; import org.apache.lucene.queries.function.ValueSource; import org.apache.lucene.queries.function.docvalues.BoolDoc Values;
public class OpeningHoursFunction extends ValueSource {
protected final ValueSource field; final public int date;
final public int hour; public int year;
public OpeningHoursFunction(ValueSource field, int date, int hour) {
Calendar calendar = Calendar.getInstance(); this.field = field;
this.date = date; this.hour = hour;
this.year = calendar.get(Calendar.YEAR); // current year
}
public OpeningHoursFunction(ValueSource field, int date, int hour, int year) {
this(field, date, hour); this.year = year;
BENCHMARKS
When comparing the performance of two search algorithms or two sorting algorithms, we consider two types of operations: data movements, or swaps, and comparisons.[3]
Because the test results depend on the computer on which these tests are carried out, it is important to note that all the results presented below in Table 6, were obtained from studies conducted on a computer with the following characteristics: processor Intel Core i7, 4 GB RAM memory and 320 GB SSD.
}
@Override
public String description() { return "Opening hours";
}
@Override
public boolean equals(Object o) {
if (o instanceof OpeningHoursFunction) { final OpeningHoursFunction obj =
(OpeningHoursFunction) o;
return this.field.equals(obj.field) && this.date == obj.date && this.hour == obj.hour && this.year == obj.year;
}
return false;
}
@Override
public BoolDocValues getValues(@SuppressWarnings("rawtypes") Map context,
AtomicReaderContext readerContext) throws IOException {
final FunctionValues fieldValue = field.getValues(context, readerContext);
return new BoolDocValues(field) {
@Override
public boolean boolVal(int doc) {
String code = fieldValue.strVal(doc);
// get field value
if (code == null) { return false;
Table 6 – The results of tests
In Java* implementation we can improve the speed of isOpen() function by removing the code that generates the currentDay integer (by creating a new Calendar instance), and pass it as an argument.
From results tests presented in Table 6 we can say that the resulted times are excellent: we can search through 1.000.000 database records/documents in less than one second and with optimization in less than half a second.
For Apache Solr the search through 100.000 documents took 100 milliseconds, which is great. In a real life case there will often be some enforced search criteria such as a limit or a category, city and so on; because it does not make sense to return 100.000 documents to end-user. In other words, the algorithm can handle very well millions of records if other search criteria are used.
Without taking into account the database management system where this algorithm is implemented, for the best
}
return isOpen(code.getBytes(), date, hour, year);
}
};
}
@Override
public int hashCode() {
return field.hashCode() + date * 10000 + hour
+ year;
}
}
results one must consider to:
use a separate field for the encoded opening hours. Even if you can reverse the encoding, it is not a good idea because the format is intended for search only, so use your original format of opening hours if you want to show it to your users
do the sorting once (when you insert or update), doing it on every search request will slow-down the process
do not return the value of encoded field (to save bandwidth and speed-up the search)
make sure that the check is done only when necessary. Remember that if you have A && B, B is not evaluated if A evaluates to false
remove redundant records from opening hours (for example having a -D between a -DD is useless)
CONCLUSIONS
To sum-up, the advantages of the algorithm are:
O(n) complexity for search (in the worst case)
covers all cases for opening hours, including special hours or season hours
easy to extend (for example adding time zone support) and implement since it operates on a byte sequence
The only downside of algorithm is the size of needed byte sequence. However, when it comes to search, we will not have to worry about disk or ram space because speed comes first. However, there are some possible workarounds:
add another types, such as ZZHH (which extends ZHH, using ZZ as interval) and a special one to represent always open
use a fast method to pack and unpack the sequence (you can reduce the size to half by using something similar to BCD – binary coded decimal)
In the end we can say that our method is a good proof of concept for encoding opening hours in order to determine if is open or closed at certain date and time considering the low complexity and high speed of search.
REFERENCES
[1] Jadhav Bhushan G, Warke Pushkar U, Kuchekar Shivaji P, Kadam Nikhil, “Searching Research Papers Using Clustering and Text Mining”, International Journal of Emerging Technology and Advanced Engineering, Volume 4, Issue 4, April 2014, Available: http://ijetae.com/files/Volume4Issue4/IJETAE_0414_135.pdf, accessed jan 2016.
[2] E.A. Calvillo, A. Padilla, J. Munoz, J.T. Fernandez, ”Searching research papers using clustering and text mining”, International Conference on Electronics, Communications and Computing (CONIELECOMP), 11-13 March 2013, pp. 78 – 81, ISBN 978-1-4673-6156-9, Available:
https://www.researchgate.net/publication/261036444_Searching_researc h_papers_using_clustering_and_text_mining, accessed jan 2016.
[3] Amy Csizmar Dalal, “Searching and Sorting Algorithms”, Supplementary Lecture Notes, 2004, Available: http://www.cs.carleton.edu/faculty/adalal/teaching/f04/117/notes/search Sort.pdf, accessed jan 2016.
[4] S. Wild and M. E. Nebel, “Analysis of Yaroslavskiy's dual-pivot quicksort used in Java 7”. In Proceedings of the 20th European Symposium on Algorithms, 2012.
[5] Java programming language, https://docs.oracle.com/en/java/ [6] Apache Solr, Available: http://lucene.apache.org/solr/
[7] Cornelia Győrödi, Robert Győrödi, George Pecherle, Andrada Olah, “A comparative study: MongoDB vs. MySQL”, IEEE – 13th International Conference on Engineering of Modern Electric Systems (EMES), 2015, Oradea, Romania, 11-12 June 2015, ISBN 978-1-4799-7649-2, pag. 1-6.
[8] Solr Opening Hours solutions, Available: https://github.com/husart/SolrOpeningHours
Copyright Notice
© Licențiada.org respectă drepturile de proprietate intelectuală și așteaptă ca toți utilizatorii să facă același lucru. Dacă consideri că un conținut de pe site încalcă drepturile tale de autor, te rugăm să trimiți o notificare DMCA.
Acest articol: Representation And Searching Algorithm For Opening Hoursdoc (ID: 119808)
Dacă considerați că acest conținut vă încalcă drepturile de autor, vă rugăm să depuneți o cerere pe pagina noastră Copyright Takedown.
