Manuscript number: JKSUCIS_2017_345
An Efficient Architecture and FPGA
Implementation of Haar Wavelet Filter
Abstract: This paper presents an efficient 2-D DWT architecture. To limit the complexity of the design, a basic linear
algebra approach is used to describe the signal flow graph of the 2-D DWT architecture. In this context, the DWT, with the
Haar function as the mother wavelet, was selected as the main analytical method for this paper. The paper
has two key parts. The first is the proposed linear algebra wavelet-based coding, which
produces nearly the same output as the matrix multiplication method while requiring fewer resources. In the second, a distinct 1D-DWT
filter is incorporated into the proposed 2-D DWT architecture in order to minimize hardware cost.
The synthesizable 2-D DWT module contains a 1D-DWT module, which is the most significant part of the design. To
validate the proposed scheme, a circuit for the DWT computation has been designed, simulated and implemented in
FPGA.
Keywords: linear algebra HWT; FPGA implementation; VHDL; HDL
1. Introduction
The major benefit of an FPGA is that the designed
hardware architecture is not restricted to a
fixed, unalterable hardware function. Designing
circuits for an FPGA is analogous to ASIC design,
but comes with the added possibility of
modifying the design afterwards [1]. The term
"field-programmable" denotes that the function of
the device can be changed "in the field" by a customer
or a designer. The phrase "gate array" refers to the
basic internal architecture that makes such
reprogramming possible [2]. In general, FPGAs comprise a
collection of programmable logic blocks (LBs)
that execute logic functions and can be
interconnected with each other, as well as with
the programmable I/O blocks that make off-chip
connections, through some sort of
programmable routing architecture [3].
The term "programmable" in FPGAs refers to the
ability to implement a new function on the
chip after its fabrication [1]. A traditional
FPGA architecture is composed of an array of
tiles, and each tile comprises one logic block (LB),
two connection blocks (CBs) and one switch block
(SB) [4]. Each FPGA depends on a programming
technology that controls the programmable
switches which give the FPGA its
programmability. Each of these technologies has
different characteristics, which in turn have
considerable impact on the programmable
architecture [5]; [6]; [7].
Xilinx Incorporated and Altera Corporation are
the FPGA manufacturers that together account for over
90% of such devices [8]. These two
large corporations provide diverse ranges of
FPGAs that differ in performance and cost.
They face competition from
Lattice Semiconductor (SRAM-based), Actel
(flash-based), SiliconBlue Technologies
(extremely low-power SRAM-based FPGAs),
Achronix (SRAM-based), and QuickLogic.
Both Xilinx and Altera offer computer-aided
design (CAD) software for Windows and Linux: Xilinx ISE
(Integrated Software Environment) is the design
tool produced by Xilinx, while Altera Quartus is the
programmable logic device design software
developed by Altera [8]. FPGAs such as the Altera
Cyclone II, found on the Development and
Education boards DE1 and DE2, are designed as a
base for digital signal processing (DSP)
applications. The DE2 board is one of the most widely
used boards for developing FPGA designs
and implementations [9].
With the standard FPGA CAD tools, designs
can be entered by means of a Hardware
Description Language (HDL), for instance the
VHSIC (Very High Speed Integrated Circuits)
Hardware Description Language (VHDL) or
Verilog HDL. It is also feasible to merge blocks
created with different entry methods into a single design
[9]; [10].
A hardware description language is inherently
parallel, i.e. commands, which correspond to
logic gates, are executed (computed) in parallel
as soon as a new input arrives. An HDL program
mimics the behavior of a physical, usually digital,
system. It also allows timing specifications
(gate delays) to be incorporated and a system to be
described as an interconnection of different
components [11]. A digital system can be
represented at different levels of abstraction [12],
which keeps the description and design of complex
systems manageable. VHDL includes facilities to
describe structure and function at various levels of
abstraction (above gate level). Figure 1 shows the
different levels of abstraction [11].
Figure 1: Levels of abstraction: Behavioral,
Structural and Physical
VHDL allows a digital system to be described at the
structural or the behavioral level. The behavioral
level can be further divided into two
styles: data flow and algorithmic [11]. The data
flow representation describes how data moves
through the system, typically in terms
of data flow between registers (register transfer
level, RTL). The data flow model makes use of
concurrent statements that are executed in parallel
as soon as data arrives at the input [13]. Sequential
statements, on the other hand, are executed in
the order in which they are specified [11]. The
performance of a hardware-based image
compression scheme can be evaluated using
FPGA circuits that process the image directly, which would
yield significant reductions in energy consumption
and processing time [14].
The remainder of this paper is organized as follows.
Section 2 describes the proposed 2-D DWT
architecture in FPGA. Section 3 evaluates its
VHDL implementation through simulation.
Section 4 concludes the paper and gives
recommendations for future work.
2. 2-D DWT architecture in FPGA
The proposed 2-D DWT design has been
verified using the VHDL
language. The design consists of three major
VHDL modules, which have been coded to
implement the 2-D DWT
architecture. The overall design is composed of a
2-D DWT module written as synthesizable
VHDL code and a memory unit devised for
the sole purpose of simulation. The 2-D DWT
module contains a 1D-DWT module, which is
the main component of the design. In
data-intensive algorithms such as the 2-D DWT,
memory accesses are very frequent [15]. Hence,
from an energy point of view, single-port RAMs
are most suitable when the image memory is
off-chip, at the expense of the higher
performance of multi-port RAMs [16].
3. VHDL Simulation results
VHDL code is also written to specify how the
read and write processes are implemented for still
images of different sizes. This VHDL code can
be used as a RAM for the 256×256 Lena test image
data, read for example from an input file named
“TestData.txt” at a particular location. The memory entity uses
a data type definition that is useful for constructing
an array of STD_LOGIC_VECTOR, as illustrated
in the model given in Figure 2.
Figure 2: Memory entity schematic diagram
The data converter in Figure 3 reads the data from the
external memory array and stores it back into
an external memory array module. Matlab
programs convert the input grayscale
image into a hexadecimal format file before it is
saved in the memory. These data are
then used as input to the memory
module, from which the dwt entity produces wavelet
coefficient text data in hexadecimal (ASCII text)
form. The hexadecimal text
file is converted back by Matlab programs
into grayscale image files, which are
used at the output stage to examine
the content of the memory files, as shown in
Figure 3.
Figure 3: FDWT system block diagram
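The paper performs the image-to-hexadecimal conversion with Matlab programs that are not listed. As a rough stand-in, the Python sketch below shows one way an 8-bit grayscale image could be written as hexadecimal text and read back. The one-pixel-per-line, two-digit layout and the function names are assumptions made for illustration; the actual format of “TestData.txt” is not specified in the text.

```python
def pixels_to_hex_lines(image):
    """Flatten a 2-D list of 8-bit pixels (row by row) into two-digit hex strings."""
    lines = []
    for row in image:
        for p in row:
            if not 0 <= p <= 255:
                raise ValueError("pixel out of 8-bit range")
            lines.append(format(p, "02X"))
    return lines


def hex_lines_to_pixels(lines, width):
    """Rebuild a 2-D list of pixels from hex strings, `width` pixels per row."""
    values = [int(s, 16) for s in lines]
    return [values[i:i + width] for i in range(0, len(values), width)]


image = [[0, 15, 16, 255],
         [128, 64, 32, 1]]
hex_lines = pixels_to_hex_lines(image)            # text form fed to the memory model
restored = hex_lines_to_pixels(hex_lines, width=4)  # back to a grayscale array
```

The round trip is lossless for 8-bit data, which is what allows the output text file to be inspected as an image again.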
The core dwt module is a synthesizable
VHDL module that forms the central
and essential component of the proposed
architecture for the FDWT design. This core block
performs the actual wavelet computation on the
image data. The VHDL code of the dwt is
written so that the entire
code can be reused. The dwt entity is a BEHAVIORAL
description of the functional one-dimensional
DWT; for the 256×256 Lena test image example,
it reads data from a particular location in a memory entity (mem)
and writes the results to a particular location in that
memory entity, as shown in the schematic diagram of
Figure 4.
Figure 4: dwt entity schematic diagram
To increase the computation speed and lower the
complexity, the linear algebra equations of the HWT are
used in the VHDL implementation of the
algorithm. The FDWT module,
comprising an adder and a right shifter, is used to
obtain the low-pass and high-pass components.
The low_op (lo) and high_op (hi) computation
functions return a value for the current pixel sample x(a) and
the following pixel sample y or x(b) using equations
(1) and (2):
$$L_i = a_i = \frac{1}{2}\bigl(x(a) + x(b)\bigr) \qquad (1)$$
$$H_i = d_i = \frac{1}{2}\bigl(x(a) - x(b)\bigr) \qquad (2)$$
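To make equations (1) and (2) concrete, the following Python sketch applies them to successive pixel pairs of one line. It is a software illustration with exact arithmetic, not the paper's VHDL; the function name `haar_1d` is hypothetical.

```python
def haar_1d(line):
    """One level of the Haar transform on an even-length sequence.

    Returns (averages, differences) following equations (1) and (2).
    """
    if len(line) % 2 != 0:
        raise ValueError("line length must be even")
    averages, differences = [], []
    for i in range(0, len(line), 2):
        x_a, x_b = line[i], line[i + 1]        # current and following sample
        averages.append((x_a + x_b) / 2)       # equation (1)
        differences.append((x_a - x_b) / 2)    # equation (2)
    return averages, differences


lo, hi = haar_1d([10, 6, 7, 7, 200, 100, 3, 4])
# lo == [8.0, 7.0, 150.0, 3.5], hi == [2.0, 0.0, 50.0, -0.5]
```

Each pair of samples yields one average and one difference, so an N-sample line produces N/2 low-pass and N/2 high-pass values.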
Contrary to regular computer programs which are
sequential, VHDL statements are inherently
concurrent (parallel). For that reason, VHDL is
usually referred to as a code rather than a program.
The VHDL code structure of dwt entity is:
A finite state machine (FSM), or simply a state
machine, is a model of behavior composed of a
finite number of states, transitions between those
states, and actions. It resembles a flow graph in which
the logic advances when certain conditions are met.
The FSM of the dwt module is depicted in Figure 5.
Figure 5: FSM design of dwt Architecture using
VHDL Forward 1-DWT module state machine
As motivated above, the outputs are derived from
the next state. The outputs therefore also depend
on the (status) inputs, making this FSM a so-called
Mealy machine (in Moore machines the output
depends only on the current state). During
the horizontal pass, when the isvertical signal
= '0', the low_op (lo) and high_op (hi)
DWT components are calculated for the current pixel sample x_a and
the subsequent pixel sample y or x_b.
To generate each lo and hi
component, the Haar transform computes an
average a_i and a difference d_i on a pair of values.
The algorithm then shifts over by two values and
calculates another average and difference on the
next pair. The corresponding VHDL code can be written as:
The current and succeeding pixel samples are each
right-shifted by one bit (division by two) and added,
giving the lo (average) wavelet component. The
hi (difference) wavelet component is obtained by
subtracting the next pixel value from the current
one after both have been divided by two through the
same right-shift operation. The one-bit right
shift of the input data x_a, which uses x_a(7),
is shown in Figure 6.
Figure 6: Right shifting operation ( x_a(7) & x_a )
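The shift-then-add arithmetic described above can be modelled in software as follows. This is a sketch under stated assumptions, not the paper's VHDL: it assumes 8-bit samples interpreted as two's-complement values and an arithmetic right shift that replicates bit 7, as the expression x_a(7) & x_a in Figure 6 suggests. Whether the design treats pixel bytes as signed or unsigned is not stated in the text, and all function names are hypothetical.

```python
def asr1_8bit(bits):
    """Arithmetic right shift by one of an 8-bit pattern: the MSB is replicated."""
    bits &= 0xFF
    msb = bits & 0x80              # keep bit 7 in place
    return msb | (bits >> 1)       # remaining bits move one position right


def to_signed8(bits):
    """Interpret an 8-bit pattern as a two's-complement integer (assumption)."""
    return bits - 256 if bits & 0x80 else bits


def lo_hi(x_a, x_b):
    """Shift-then-add/subtract model of the lo and hi components."""
    half_a = to_signed8(asr1_8bit(x_a))
    half_b = to_signed8(asr1_8bit(x_b))
    return half_a + half_b, half_a - half_b


print(lo_hi(10, 6))   # (8, 2): matches the exact average and difference
print(lo_hi(7, 4))    # (5, 1): exact values would be 5.5 and 1.5
```

Because each operand is halved before the addition, the least significant bit of every sample is discarded, so the lo value can fall below the exact average of equation (1) by at most one.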
Figure 7 shows the first 8×8 samples of the
256×256-pixel version of the Lena test image,
which were used as input. This part of the work was
carried out entirely in VHDL to make sure that the
HWT calculation was fully understood
and to serve as a validation
reference.
Figure 7: The first 8×8 Lena image samples for the
size 256×256 pixels
The VHDL simulation results of the transformed DWT
in the memory module are shown in Figure 8.
Figure 8: Waveform indicating DWT results of
memory module
It is assumed that the memory stores pairs of
coefficients, with the low coefficient L0=A1 at address 0,
the high coefficient H0=A2 at address 1, and so
on, as illustrated in Figure 12. The pair of
coefficients L0=A1, H0=A2 is formed through
IDWT filtering of one level of decomposition.
Given that the input coefficients are already stored
in the external memory module, L0 and H0 can
be stored at addresses 0 and 1. Similarly, all
pairs of resulting wavelet coefficients L(i) and
H(i) can be saved at addresses 2i and 2i+1,
respectively, as shown in Figure 9.
Figure 9: Waveform indicating FDWT results
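The addressing rule above (L(i) at address 2i, H(i) at address 2i+1) can be illustrated with a short Python sketch. The list standing in for the external memory, the `base` offset parameter and the function name are assumptions for illustration only.

```python
def store_coefficients(memory, base, lo, hi):
    """Write L(i) to base + 2*i and H(i) to base + 2*i + 1."""
    for i, (l, h) in enumerate(zip(lo, hi)):
        memory[base + 2 * i] = l        # low (average) coefficient
        memory[base + 2 * i + 1] = h    # high (difference) coefficient
    return memory


memory = [0] * 8
store_coefficients(memory, 0, lo=[8, 7, 150, 3], hi=[2, 0, 50, -1])
# memory == [8, 2, 7, 0, 150, 50, 3, -1]
```

With this interleaved layout each coefficient pair occupies the two addresses of the pixel pair it was computed from, so a line can be transformed in place.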
The vertical pass starts after all the
lines in the horizontal pass have been completed. The
third module is the 2d dwt control unit.
The main purpose of this module is to generate the
control signals needed to access
the memory, as well as the signals the dwt module
requires on the horizontal or vertical passes to
carry out 2-D DWT processing. In general, the
proposed FDWT algorithm comprises three
phases: an initialization phase, a horizontal pass phase,
and a vertical pass phase. The 2d dwt control unit
is given parameters, requests a transform from the dwt
architecture and waits for it to complete; this is
repeated for all rows and columns during
the horizontal and vertical passes until the end of the
2-D DWT process. The state machine bubble
diagram in Figure 10 shows the operation of a
nine-state machine that reacts to inputs as well
as previous-state conditions.
Figure 10: Practical FSM design of 2d dwt
Architecture using VHDL
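The data flow that the control unit orchestrates, a horizontal pass over every row followed by a vertical pass over every column, can be sketched in Python as below. This is a software illustration only. For readability it places the averages in the first half of each line and the differences in the second half, which is an assumption that differs from the interleaved storage at addresses 2i and 2i+1 used in the paper; the function names are hypothetical.

```python
def haar_line(line):
    """1-D Haar step on one line: averages first, then differences."""
    lo = [(line[i] + line[i + 1]) / 2 for i in range(0, len(line), 2)]
    hi = [(line[i] - line[i + 1]) / 2 for i in range(0, len(line), 2)]
    return lo + hi


def haar_2d_one_level(image):
    """Horizontal pass over every row, then vertical pass over every column."""
    rows = [haar_line(row) for row in image]              # horizontal pass
    cols = [haar_line(list(col)) for col in zip(*rows)]   # vertical pass
    return [list(r) for r in zip(*cols)]                  # back to row order


image = [[4, 2, 8, 6],
         [2, 0, 6, 4],
         [1, 3, 5, 7],
         [3, 1, 7, 5]]
result = haar_2d_one_level(image)
# Top-left 2x2 block of `result` holds the LL sub-band: [[2, 6], [2, 6]]
```

Each LL value is the mean of one 2×2 block of the input, which is why the same 1-D routine can be reused for both passes.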
Behavioral code is characterized by PROCESS
statements. To describe a state machine in
Quartus II VHDL, an enumeration type can be
declared for the states, and a PROCESS statement used for
the state register and the next-state logic. The 2d
dwt state machine includes a PROCESS statement
for the next-state logic that is activated on every
positive edge of the clk control signal. The state
machine has an asynchronous reset. At startup, the
2d dwt state machine is initialized to the reset
state. In the proposed 2-D FDWT part, the VHDL
initialization code uses the
(reset) signal to initialize the 2-D DWT module. The
piece of code used to implement the reset
phase of the 2d dwt module is shown below:
Each CASE alternative is a state in the state
machine. The first CASE statement determines the
transitions between the states (that is, which state
to enter on the next rising edge of clk), and the
second CASE statement determines the value of the
ctrl_sig outputs for each ctrl_data state. All
assignments to the signal or variable that
represents the state machine are within the
PROCESS. This PROCESS transitions to the next
state on the rising edge of clk. The initialization phase
entails the user providing the necessary information
(required addresses) of the input image
to the 2-D DWT module via the control bus. This
information holds the start address of the memory
space used by the image. The start address of
the memory space for temporary
data storage (to store the intermediate wavelet
transform results) is also specified, as depicted in Figure 4.6. In
addition, it includes the style employed to store
images (either row by row or column by column)
as well as the number of required transformation
levels.
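The two-CASE structure described above, one block for transitions and one for outputs, can be mirrored in a small Python model. This is a hedged sketch and not the paper's controller: the design in Figure 10 has nine states whose names are not given in the text, so the five states below, the inputs `pass_done` and `more_levels`, and the function names are all hypothetical. Only the signal names isvertical and ready are taken from the text.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    H_PASS = auto()
    V_PASS = auto()
    NEXT_LEVEL = auto()
    DONE = auto()


def next_state(state, pass_done, more_levels):
    """Transition logic (the role of the first CASE statement)."""
    if state is State.RESET:
        return State.H_PASS
    if state is State.H_PASS:
        return State.V_PASS if pass_done else State.H_PASS
    if state is State.V_PASS:
        if not pass_done:
            return State.V_PASS
        return State.NEXT_LEVEL if more_levels else State.DONE
    if state is State.NEXT_LEVEL:
        return State.H_PASS
    return State.DONE


def outputs(state):
    """Output logic (the role of the second CASE statement)."""
    return {
        "isvertical": 1 if state is State.V_PASS else 0,
        "ready": 1 if state is State.DONE else 0,
    }


def run(events):
    """Apply one (pass_done, more_levels) pair per clock edge, starting from RESET."""
    state = State.RESET
    trace = [state]
    for pass_done, more_levels in events:
        state = next_state(state, pass_done, more_levels)
        trace.append(state)
    return trace


trace = run([(False, False), (False, False), (True, False), (True, False)])
```

Keeping the transition function and the output function separate corresponds to the separation of the two CASE statements: the state changes only on a clock edge, while the outputs are a function of the state.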
Afterwards, the 2-D DWT module applies the
internal reset signal to the 1D-DWT module to
carry out the initialization process. Thereafter,
the 1D-DWT is given the information required to
carry out the horizontal or vertical passes:
the start address of the memory space that holds
the current line vector of pixels and is
also used for temporary data storage. The line
type is determined through the (isvertical) control
signal, where 0 signifies horizontal passes and 1 denotes
vertical passes. The isvertical signal is set to 0 to
initialize the horizontal pass. The user
can either test the horizontal pass only, to
implement the 1-D DWT, or continue from the horizontal
pass to the vertical pass to implement the complete
2-D DWT. The vertical pass starts
after all the lines in the horizontal pass have been
completed and dwt_ready = '1'. The width of the
current line and the number of line vectors in the
current image are also provided. The width of the
current line is equal to the image size at level one,
and is subsequently divided by two at each new
level.
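The per-line information handed to the 1D-DWT module (start address, line type and width) can be illustrated with the Python sketch below. It assumes the image is stored row by row in a flat memory, which is only one of the two storage styles mentioned in the text; the parameter and function names are hypothetical.

```python
def line_addresses(base, image_width, index, width, is_vertical):
    """Addresses of the `width` samples of one line, for a row-by-row image.

    is_vertical = 0 selects row `index`; is_vertical = 1 selects column `index`.
    """
    if is_vertical:
        start, stride = base + index, image_width      # walk down a column
    else:
        start, stride = base + index * image_width, 1  # walk along a row
    return [start + k * stride for k in range(width)]


def level_widths(n, levels):
    """Line width at each level: n at level one, halved at each new level."""
    return [n >> level for level in range(levels)]


print(line_addresses(0, 4, 1, 4, 0))   # [4, 5, 6, 7]   -> second row
print(line_addresses(0, 4, 1, 4, 1))   # [1, 5, 9, 13]  -> second column
print(level_widths(256, 3))            # [256, 128, 64]
```

Under this assumption the only difference between a horizontal and a vertical pass is the start address and the address stride, which is consistent with a single 1-D module serving both passes.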
After the FDWT has been applied to one line
vector of the input image pixels, the 1D-DWT
module generates an internal (Ready) signal to
notify the 2-D DWT module. The 2-D DWT
module then gives the 1D-DWT module the
information required for the next line. The
vertical pass starts after all the
lines in the horizontal pass have been completed. The
low and high wavelet components of the
FDWT are computed and saved during
the horizontal and vertical pass phases. The
vertical pass is complete when all of
its lines have been processed and
dwt_ready = '1'. The intermediate wavelet
coefficient data, in HEX form, are stored in the external
memory, and then the dump signal is activated,
which signifies the end of the transformation
process. The 2-D DWT module generates a ready
signal to the outside environment indicating the
end of the first level of decomposition after the
989565 clock cycles required to code
256×256 pixels through FDWT filtering, as shown
in Figure 11, which illustrates the clock
cycle usage of the 2-D DWT coding process.
Figure 11: Waveform indicating end of first level
of decomposition process FDWT and 256×256
pixels image size version (N=256, L=1), at 989565
clock cycles; ModelSim-Altera 6.5 test bench
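As a quick sanity check on the reported figure, the cycle count can be normalized by the image size. The short Python calculation below assumes that each of the two passes touches every one of the 256×256 samples once and that the total includes all control overhead; it gives roughly 15.1 clock cycles per pixel, or about 7.5 cycles per sample per pass.

```python
total_cycles = 989565      # reported for N = 256, L = 1
pixels = 256 * 256         # 65536 samples in the image
passes = 2                 # horizontal and vertical (assumed to cover all samples)

cycles_per_pixel = total_cycles / pixels
cycles_per_sample_per_pass = cycles_per_pixel / passes

print(round(cycles_per_pixel, 2), round(cycles_per_sample_per_pass, 2))
```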
Following the execution of the 1-level Haar transform,
repeating the process to obtain a multi-level
transform is straightforward. According to the (levels)
control signal, the 2-D DWT module re-initializes the
1D-DWT module with the required information about
the LL sub-band of the image derived from the
preceding level in order to initiate new horizontal
and vertical passes, and the NEXT LEVEL state is entered.
The width of the current line is
equal to the image size at level one, and is
subsequently divided by two at each new level.
When no further levels remain, the 2-D DWT module
asserts an active (ready) signal,
signifying the completion of the entire 2-D
DWT process. The design executes similar code
in the IDWT module, which comprises adders
without the right-shift operation. The
original pixel data input to the FDWT can be
completely recovered from the approximation
(average) and detail wavelet coefficient values by
applying linear equations.
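The linear equations for recovery follow directly from equations (1) and (2): adding the two coefficients gives x(a) and subtracting them gives x(b), with no division. The Python sketch below illustrates this with exact arithmetic; it is not the paper's IDWT code, and the function names are hypothetical.

```python
def haar_pair_forward(x_a, x_b):
    """Equations (1) and (2) for one pixel pair."""
    return (x_a + x_b) / 2, (x_a - x_b) / 2


def haar_pair_inverse(lo, hi):
    """Recover the pixel pair with one addition and one subtraction, no shift."""
    return lo + hi, lo - hi


lo, hi = haar_pair_forward(7, 4)          # (5.5, 1.5)
x_a, x_b = haar_pair_inverse(lo, hi)      # (7.0, 4.0)
```

Exact recovery in this sketch relies on keeping the half-integer results of the forward step; an implementation that truncates them when shifting would lose the least significant bit.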
4. Conclusions
Throughout this paper, we have chosen the
simplest wavelet, the Haar wavelet function, as the proposed
wavelet. This work is based on the hardware
implementation of a flexible architecture for
multi-level decomposition HWT. The architecture is
described and synthesized with a VHDL-based
methodology. The VHDL module successfully
performed the 2-D DWT on images of different
dimensions and is capable of varied levels of
decomposition. The HWT was implemented effectively at
a substantially reduced hardware cost
by using linear equations instead of the matrix
approach. Multilevel decomposition will be
evaluated in future work.
References
[1] Farooq, Umer, Zied Marrakchi, and Habib
Mehrez. “Tree-Based ASIF Using Heterogeneous
Blocks,” Tree-based Heterogeneous FPGA
Architectures. Springer New York,pp.153-171,
2012.
[2] Pellerin, David, and Scott Thibault. “Practical
FPGA programming in C.” Prentice Hall Press, 2005.
[3] Gokhale, Maya B., and Paul S.
Graham. “Reconfigurable computing: Accelerating
computation with field-programmable gate arrays.”
Springer Science & Business Media, 2006.
[4] Cong, Jason, and Bingjun Xiao. “mrFPGA: A
novel FPGA architecture with memristor-based
reconfiguration." Proceedings of the 2011
IEEE/ACM International Symposium on Nanoscale
Architectures. IEEE Computer Society
(NANOARCH), San Diego, CA, pp. 1-8, 2011.
[5] Di Carlo, Stefano, et al. “A low-cost FPGA-
based test and diagnosis architecture for
SRAMs." Advances in System Testing and
Validation Lifecycle, 2009. VALID'09. First
International Conference on. IEEE, 2009.
[6] Jia, James Yingbo, et al. “Performance and
reliability of a 65nm Flash based FPGA.”
Solid-State and Integrated Circuit Technology (ICSICT),
2012 IEEE 11th International Conference on. IEEE,
Xi'an, pp. 1-3, 2012.
[7] N Patil, D Das, E Scanff, M Pecht, “Long
term storage reliability of antifuse field
programmable gate arrays." Microelectronics
Reliability, vol.53, no.12, pp. 2052-2056, 2012.
[8] Andres, Esther, Markus Widhalm, and A.
Caloto. “Achieving High Speed CFD Simulations:
Optimization, Parallelization, and FPGA
Acceleration for the Unstructured DLR TAU
Code." Proc. 47th AIAA Aerospace Sciences
Meeting Including The New Horizons Forum and
Aerospace Exposition, Orlando, FL, January,
vol.47, pp. 8745-8764, 2009.
[9] J. O. Hamblen, T. S. Hall, and M. D. Furman,
“Rapid Prototyping of Digital Systems: Quartus II
Edition,” Springer, 2006.
[10] Altera Corporation. Quartus II Introduction
using Schematic Design, 2008.
[11] der Spiegel, J. V. “Vhdl tutorial,” Department
of Electrical and Systems Engineering, University
of Pennsylvania, 2010.
[12] D. D. Gajski, R. H. Kuhn, “Guest Editors'
Introduction: New VLSI tools", IEEE Computer,
vol.16, pp.11-14, 1983.
[13] Bhasker, Jayaram. “A Vhdl primer,” Prentice-
Hall, 1999.
[14] Hasan, Khamees Khalaf, Umi Kalthum Ngah,
and Mohd Fadzli Mohd Salleh. “Efficient
hardware-based image compression schemes for
wireless sensor networks: A survey,” Wireless
personal communications, vol.77, no. 2, pp.1415-
1436, 2014.
[15] Al-Azawi, Saad. “Low-Power, Low-Area
Multi-level 2-D Discrete Wavelet Transform
Architecture." Circuits, Systems, and Signal
Processing, pp.1-15, 2017.
[16] N.D. Zervas, G.P. Anagnostopoulos, V.
Spiliotopoulos, “Evaluation of design alternatives
for the 2-D-discrete wavelet transform.” IEEE
Transactions on Circuits and Systems for Video
Technology, vol. 11, no. 12, pp. 1246-1262, 2001.
