University POLITEHNICA of Bucharest
Faculty of Automatic Control and Computers,
Computer Science and Engineering Department
BACHELOR THESIS
Full system level simulation and testing
of an IP core in QEMU
Scientific Adviser: As. Drd. Ing. Carabaș Mihai
Author: Vasile G. Cristi-Alexandru
Bucharest, 2017
Abstract
Contents
Acknowledgements
Abstract
1 Related work
1.1 Software face detection
1.2 Hardware accelerated face detection
2 Background/state of the art
2.1 Emulation versus simulation
2.2 Available Virtual Machines and Emulators
2.2.1 The Bochs Emulator
2.2.2 The EM86 Emulator
2.2.3 The User Mode Linux
2.2.4 The QEMU Emulator
2.3 The ARM® architecture
2.4 The Android Operating System
2.4.1 The Linux Kernel
2.4.2 System Applications
A Project Build System Makefiles
A.1 Makefile.test
List of Figures
1.1 Output of the OpenCV face detector
1.2 The cascade-like structure of detectors proposed in the Viola-Jones algorithm
1.3 The design proposed by L. Acasandrei and A. Barriga (Source: [2])
1.4 Example of an integral image where the value in position (i, j) is the sum of all the values in the blue rectangle
1.5 (a) Haar-like features (b) Haar-like features applied to a window
1.6 Architecture of a convolutional neural network, Source: [1]
2.1 Comparison between the emulator model and the virtual machine model
2.2 Intended block diagram of the project
List of Tables
Chapter 1
Related work
Although it is one of the most requested types of detection, face detection has proven very hard
to achieve in real commercial products (handheld cameras, smartphones). Its complex algorithms
are difficult to implement efficiently on a regular PC or laptop, let alone deliver notable
performance on embedded devices with a RISC architecture (mainly ARM). Over the years, many
proposals have been made on how to optimize, accelerate and implement these algorithms on
specific hardware platforms in order to achieve real-time performance.
1.1 Software face detection
Software face detection refers to all detection algorithms that are implemented on a standard
processor architecture. This means that the algorithm does not use specialized computational
architectures to accelerate detection.
One of the best-known APIs for computer vision is the OpenCV (Open Source Computer Vision)
library. It is free for academic and commercial use and it supports a wide variety of operating
systems and programming languages. It contains image processing algorithms that implement face
detection, edge detection and feature detection, as well as machine learning frameworks and
GPU-accelerated computer vision.
The documentation can be found online and makes it easy to write short programs (about 20 lines
of code) that can do a lot. For example, here is a code snippet in Python that performs
real-time face detection by acquiring images from the camera and passing them through a face
detector:
import cv2

# Open the default camera and load the Haar cascade model provided by OpenCV
webcam = cv2.VideoCapture(0)
faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
cv2.namedWindow('Live feed')

while True:
    ok, frame = webcam.read()
    if not ok:  # stop if no frame could be acquired
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray)
    for r in faces:
        # r = (x, y, width, height); the colour tuple is in BGR order
        cv2.rectangle(frame, (r[0], r[1]), (r[0] + r[2], r[1] + r[3]),
                      (0, 255, 255), 3)

    cv2.imshow('Live feed', frame)

    # Quit when the 'q' key is pressed
    key = cv2.waitKey(1)
    if key == ord('q'):
        break

webcam.release()
cv2.destroyAllWindows()
Listing 1.1: OpenCV Live Face Detection
In the code, the image is acquired from the webcam and converted to grayscale. The detector is
instantiated from an XML model (provided by OpenCV) that describes its internal structure and
parameters. After that, each detected face is represented by a ROI (region of interest)
rectangle, which is drawn onto the frame with a thick yellow line (the colour tuple in the
listing is given in BGR order). The result of running this program is shown below in Figure 1.1.
Figure 1.1: Output of the OpenCV face detector
In their 2004 paper, Robust Real-Time Face Detection [4], P. Viola and M.J. Jones proposed a
breakthrough in the domain of face detection. Their algorithm is complex but yields impressive
results and is used in the above-mentioned OpenCV library.
They developed a series of detectors arranged in a cascade-like structure (Figure 1.2), each of
them trained using the AdaBoost algorithm. First, the algorithm scans the image with windows of
different sizes in order to search for faces. Each window is passed through the detectors
sequentially. The structure was chosen in such a way that if a certain window is probably not a
face, it is discarded after the first few detectors. Only the windows that are likely to contain
a face are passed through all the levels.
Figure 1.2: The cascade-like structure of detectors proposed in the Viola-Jones algorithm
This structure ensures that the computational power of the processing element is spent mainly on
the windows with the highest probability of containing faces. All the detectors are built from
weak learners trained through the AdaBoost algorithm, because each stage should be fast to
compute in order to ensure maximum performance. The detection speed peaks at 15 frames per
second (implemented on a desktop PC, around the year 2000).
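The early-rejection idea behind the cascade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stages here are hypothetical brightness thresholds (the names cascade_accepts and mean_brightness_stage are invented for this sketch), whereas the real detector uses AdaBoost-trained stages over Haar-like features.

```python
from typing import Callable, List, Sequence

Window = Sequence[Sequence[int]]   # grayscale pixel rows
Stage = Callable[[Window], bool]   # True means "may still be a face"


def cascade_accepts(window: Window, stages: List[Stage]) -> bool:
    """Run the stages in order and reject at the first stage that fails."""
    for stage in stages:
        if not stage(window):
            return False   # early rejection: later stages are never evaluated
    return True


def mean_brightness_stage(threshold: float) -> Stage:
    """Hypothetical stage: passes windows whose mean pixel value exceeds threshold."""
    def stage(window: Window) -> bool:
        pixels = [p for row in window for p in row]
        return sum(pixels) / len(pixels) > threshold
    return stage


# Assumed thresholds, chosen only to make the example visible.
stages = [mean_brightness_stage(50), mean_brightness_stage(100), mean_brightness_stage(150)]
dark = [[10, 20], [30, 40]]          # mean 25: rejected by the first stage
bright = [[200, 210], [220, 230]]    # mean 215: passes every stage
print(cascade_accepts(dark, stages), cascade_accepts(bright, stages))
```

Because most windows in a typical image resemble the dark example and fail early, the expensive later stages run only on a small fraction of the windows.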
1.2 Hardware accelerated face detection
Hardware accelerated face detection refers to all the detectors that use special computational
units (such as ASICs or FPGAs) together with conventional processors in order to achieve
acceleration.
L. Acasandrei and A. Barriga propose a very interesting approach to this subject (FPGA
Implementation of an Embedded Face Detection System Based on LEON3 [2]). They describe almost
the same algorithm, which uses a series of weak learners to explore an image and sequentially
eliminate non-faces. The method used is the Viola-Jones object detection algorithm. Because of
its large computational requirements, the performance was fairly low, hence they decided to
split the architecture into two parts: the software block and the hardware block, as seen in
Figure 1.3.
Figure 1.3: The design proposed by L. Acasandrei and A. Barriga (Source: [2])
The software part uses OpenCV to detect faces with the Viola-Jones algorithm in a specific
window within the picture, while the hardware module computes and extracts the windows of
various sizes from the picture. The result they obtained is a significant speedup (up to 5 times
faster) over the original software-only algorithm. The only problem is that, at this speed, the
input image is not large enough for commercial use.
Another research team that took note of the Viola-Jones algorithm has tried to implement it on
an FPGA to process frames coming from an analog video camera. In their paper, System
Architecture for Real-Time Face Detection on Analog Video Camera [3], M. Kim, D. Lee and K. Kim
propose a hardware architecture whose main blocks are the image scale block (ISB), the integral
image processing block (IPB) and the feature processing block (FPB).
After the frame has been acquired from the camera, it is passed through an ADC (analog-to-digital
converter) and stored. The ISB scales the input image to the size desired by the detector. The
next step involves the IPB, which calculates the integral image. This is a matrix in which the
value of each pixel is replaced with the sum of all pixels in the rectangle bounded by the origin
and the current pixel; an example can be seen in Figure 1.4. The integral image is computed so
that the sum of all pixels in any rectangle bounded by two arbitrary pixels, which the detection
algorithm needs, can be calculated easily. The final hardware block, the FPB, passes the window
through all the trained detectors to calculate the probability of it being a positive. The FPB
has a pipeline structure and takes its parameters from the feature memory, which stores the
Haar-like features that are applied to detect faces (Figure 1.5). In the end, the classifier
controller returns the detection results.
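The integral image and the rectangle sums it enables can be illustrated with a short Python sketch. It is a plain software illustration, not the hardware design from [3]; the function names (integral_image, rect_sum, haar_two_rect) and the particular two-rectangle feature are assumptions made for this example.

```python
from typing import List

Matrix = List[List[int]]


def integral_image(img: Matrix) -> Matrix:
    """ii[y][x] = sum of img over rows 0..y and columns 0..x (inclusive)."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii: Matrix, top: int, left: int, bottom: int, right: int) -> int:
    """Sum of the pixels in an inclusive rectangle, using at most four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


def haar_two_rect(ii: Matrix, top: int, left: int, height: int, width: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    bottom = top + height - 1
    left_sum = rect_sum(ii, top, left, bottom, left + half - 1)
    right_sum = rect_sum(ii, top, left + half, bottom, left + width - 1)
    return left_sum - right_sum


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii[2][2])                      # sum of the whole image
print(rect_sum(ii, 1, 1, 2, 2))      # 5 + 6 + 8 + 9
print(haar_two_rect(ii, 0, 0, 3, 2)) # first column minus second column
```

Once the integral image is built in a single pass, every rectangle sum costs a constant number of memory reads regardless of the rectangle size, which is what makes evaluating many Haar-like features per window affordable.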
This optimized design can detect faces from an analog camera at 42 frames per second at an
operating frequency of 100 MHz. The image acquired from the camera is 320 x 240 pixels.
Figure 1.4: Example of an integral image where the value in position (i, j) is the sum of all
the values in the blue rectangle
Figure 1.5: (a) Haar-like features (b) Haar-like features applied to a window
"Convolutional neural networks or ConvNets are a special kind of neural networks that take
advantage of the locality of data in images to reduce the number of parameters needed to
process large images." [1]
This type of neural network exploits the shape, colour and size of an object in order to
generalize and detect all objects of the same kind. The learning process is hard and requires
many examples for the accuracy to be high. An example of such a detector can be seen in
Figure 1.6.
For each training example, the parameters of the network (called weights and biases) are
adjusted so that the error decreases and the output matches the ground truth. This training
process is called stochastic gradient descent and is computationally heavy. When the error
function reaches a minimum (hopefully a global minimum), the training is complete and the
parameters are saved. The next step is to use the detector. An input image is forwarded through
all the layers of the network, which establishes whether or not the frame contains the desired
object.
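The per-example parameter update can be made concrete with a deliberately tiny model. The sketch below fits a single weight and bias to a toy line rather than training a convolutional network; the learning rate, number of epochs and the data set are assumptions chosen only for illustration.

```python
import random


def sgd_fit(samples, learning_rate=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                    # visit the examples in random order
        for x, y in data:
            error = (w * x + b) - y          # prediction minus ground truth
            w -= learning_rate * error * x   # gradient of 0.5 * error**2 w.r.t. w
            b -= learning_rate * error       # gradient of 0.5 * error**2 w.r.t. b
    return w, b


# Toy ground truth: y = 2x + 1
samples = [(x, 2 * x + 1) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = sgd_fit(samples)
print(round(w, 3), round(b, 3))
```

A real network repeats the same step for every weight and bias, with the gradients obtained by backpropagation, which is where the heavy computational cost comes from.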
This category of detector can be trained to learn what a face looks like. The task proves quite
challenging because the feature space of common faces is very large. The designer has to choose
between high detection rates (in spite of head rotation, poor image quality, bad lighting, etc.)
and good performance. Usually the result is a compromise between the two.
Figure 1.6: Architecture of a convolutional neural network, Source: [1]
In the paper "Hardware Accelerated Convolutional Neural Networks for Synthetic Vision" [1], the
authors propose a specialized SIMD processor designed specifically for convolutional networks.
This design allows for very high processing speed thanks to its multiple parallel processing
units. The design is composed of three elements: a Control Unit CPU, a Memory Interface
Streaming Engine and multiple ALUs. The Control Unit acts as a master processor: it manages the
runtime parameters and controls the dataflow between the other units. The ALUs, on the other
hand, are arranged in a grid-like structure and are capable of producing a result per clock
cycle. These units are mostly used to perform convolution and propagation of the input image.
The Memory Interface Streaming Engine is a bus controller that is specifically built for image
manipulation.
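To show the kind of work the ALU grid is meant to parallelize, here is a minimal software sketch of a 2D convolution. It is an assumption-laden illustration (no padding, stride 1, a hypothetical 2x2 kernel) and does not reproduce the processor described in [1].

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1).

    As in most ConvNet code, the kernel is not flipped, so strictly speaking
    this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Each output value is an independent multiply-accumulate chain,
            # which is what parallel processing units can exploit.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            output[y][x] = acc
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0]]
kernel = [[1, 0],
          [0, -1]]   # hypothetical kernel, chosen only for the example
print(convolve2d(image, kernel))
```

In software the four nested loops run sequentially; since no output value depends on another, a grid of processing units can compute many of them at the same time.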
This architecture was implemented in an HDL and deployed on a Xilinx Virtex-4 SX35 FPGA board.
The results show a speedup of almost two orders of magnitude compared to a regular CPU.
[To do: conclusion and a transition that foreshadows the next chapter]
Chapter 2
Background/state of the art
2.1 Emulation versus simulation
An emulator is a special type of program that allows a user to encapsulate a full computational
system (hardware and software) inside a different type of system. The emulated system is called
the guest, and the system that supervises the emulation (the system in which the emulation takes
place) is called the host. The difference between an emulator and a virtual machine is that the
emulator does not run guest code directly on the host's CPU but relies on an intermediate
software layer that connects the host and the guest. A virtual machine, managed by a hypervisor,
relies heavily on the CPU's virtualization technology and sometimes runs code directly on the
CPU without any intermediary (Figure 2.1). Of course, this technique brings a performance gain,
but at a security cost, and it is not practical for every scenario (e.g. the host runs on an x86
architecture but the operating system and the programs inside the emulator/virtual machine
should run on the ARM® ISA (Instruction Set Architecture)).
Figure 2.1: Comparison between the emulator model and the virtual machine model
Emulation and simulation are sometimes confused with one another, but when talking about
computational systems they are entirely different concepts. Emulation of a certain system mimics
its internals exactly as they would be if the system were standalone. Hence, the emulation
follows a rigorous list of rules that describe the original system. The simulation of a system,
on the other hand, is independent of that system's inner workings; the only restriction is that
the outputs of the original system and of the simulated system match when the input is
identical. You can think of it this way: the simulation is a "black box" that you cannot
distinguish from the original system unless you look inside it, whereas the emulation and the
original system should be very close implementation-wise.
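The distinction can be illustrated on a deliberately small "system", an 8-bit adder. In this sketch (the function names are invented for the example), the emulation-style version reproduces the internal full-adder structure bit by bit, while the simulation-style version only guarantees the same outputs for the same inputs.

```python
def emulated_add8(a: int, b: int) -> int:
    """Emulation-style: reproduce an 8-bit ripple-carry adder stage by stage."""
    result, carry = 0, 0
    for bit in range(8):
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        s = x ^ y ^ carry                     # full-adder sum output
        carry = (x & y) | (carry & (x ^ y))   # full-adder carry output
        result |= s << bit
    return result                             # the final carry is dropped (8-bit wrap)


def simulated_add8(a: int, b: int) -> int:
    """Simulation-style: only the input/output behaviour has to match."""
    return (a + b) & 0xFF                     # relies on the host's own adder


# Seen from the outside, the two are indistinguishable for every input pair.
assert all(emulated_add8(a, b) == simulated_add8(a, b)
           for a in range(256) for b in range(256))
print(emulated_add8(200, 100), simulated_add8(200, 100))
```

The simulated version is shorter and faster because it borrows the host's arithmetic, while the emulated version stays close to how the original hardware is built.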
Most so-called emulators do not purely emulate a computer system. Some use a combination of
simulation and emulation to reach stable performance: they emulate one part of the hardware in
software that behaves in a hardware-like manner and simulate the other part by using libraries
located on the host.
As can be seen in Figure 2.2, the intended structure of the project requires a proper emulator
that can run an Android operating system with a Linux kernel on an ARM® target CPU, on top of a
real-time Windows system. This emulator should allow easy integration of a simulated peripheral
device that can be modeled inside the Linux kernel with a device driver. Another requirement is
that the emulator is open-source and has good documentation and a good implementation.
Figure 2.2: Intended block diagram of the project
2.2 Available Virtual Machines and Emulators
As mentioned earlier, a proprietary/closed-source emulator is out of the question, hence none of
them will be discussed. Although there are many commercial virtual machine/emulator applications
worth mentioning, they are beyond the scope of this project and paper.
2.2.1 The Bochs Emulator
With its initial release dating back to 1994, Bochs is one of the earliest PC emulators. At that
time it was released under a commercial license, but in March 2000 the source code was published
under the GNU Lesser General Public License. Bochs is able to emulate various x86 CPUs
regardless of the host's architecture, by using an interpreter that executes the guest's
instructions in software on behalf of the guest operating system. Its design includes device
models for most PC peripherals (written in C++), from keyboard and mouse to network cards.
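The interpreter approach can be sketched as a fetch-decode-execute loop. The example below uses a hypothetical toy instruction set (mov, add, dec, jnz, halt) invented for this illustration; it is not x86 and does not reflect the internals of Bochs, but it shows how every guest instruction is carried out by host code.

```python
def run(program, registers=None, max_steps=1000):
    """Interpret a toy guest program; each guest instruction is handled in host code."""
    regs = dict(registers or {})
    pc = 0                                   # guest program counter
    for _ in range(max_steps):
        op, *args = program[pc]              # fetch and decode
        pc += 1
        if op == "mov":                      # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":                    # add dst, src: dst += src
            regs[args[0]] = regs[args[0]] + regs[args[1]]
        elif op == "dec":                    # dec reg
            regs[args[0]] -= 1
        elif op == "jnz":                    # jnz reg, target: jump if reg != 0
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit reached")


# Guest program: acc = 5 + 5 + 5 (multiplication by repeated addition)
program = [
    ("mov", "acc", 0),
    ("mov", "step", 5),
    ("mov", "count", 3),
    ("add", "acc", "step"),    # index 3: start of the loop body
    ("dec", "count"),
    ("jnz", "count", 3),
    ("halt",),
]
print(run(program)["acc"])
```

Since the guest state lives entirely in host data structures, the same loop works on any host architecture, at the price of several host operations per guest instruction.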
2.2.2 The EM86 Emulator
EM86 is an emulator that was released in 1997 and never reached version 1.0. Its latest release,
v0.2 (Beta 2), is capable of running an operating system inside another by using a proprietary
interpreter. It is limited to the x86 architecture.
2.2.3 The User Mode Linux
User Mode Linux provides a virtual machine that can run other Linux kernel versions or other
programs, all contained inside a process. It is mainly used to test applications in isolation,
so that any damage that affects the virtual machine has no effect on the physical machine.
2.2.4 The QEMU Emulator
QEMU is a free, open-source emulator and virtualizer that supports various types of machines.
2.3 The ARM® architecture
2.4 The Android Operating System
2.4.1 The Linux Kernel
2.4.2 System Applications
Appendix A
Project Build System Makefiles
A.1 Makefile.test
import cv2

# Open the default camera and load the Haar cascade model provided by OpenCV
cap = cv2.VideoCapture(0)
faceCascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
cv2.namedWindow('Live feed')

while True:
    ok, frame = cap.read()
    if not ok:  # stop if no frame could be acquired
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray)

    for r in faces:
        # r = (x, y, width, height); the colour tuple is in BGR order
        cv2.rectangle(frame, (r[0], r[1]), (r[0] + r[2], r[1] + r[3]),
                      (0, 255, 255), 3)

    cv2.imshow('Live feed', frame)

    # Quit when the 'q' key is pressed
    key = cv2.waitKey(1)
    if key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
Listing A.1: Testing Targets Makefile (Makefile.test)
Bibliography
[1] Yann LeCun et al. Hardware accelerated convolutional neural networks for synthetic vision.
[2] L. Acasandrei, A. Barriga. FPGA implementation of an embedded face detection system based on LEON3.
[3] Mooseop Kim, Ki-Young Kim. Hardware architecture for real-time face detection on embedded analog video cameras.
[4] P. Viola, M.J. Jones. Robust real-time face detection.
