MPEG-7 Features for Audio Signals
Department of Electronics Basics
Technical University of Cluj-Napoca
Cluj-Napoca, Romania
Abstract—The subject of this diploma project is the study and implementation of MPEG-7 features for audio signals. To illustrate these features, an 18 ms audio signal (from a clavecin) was used. For a sampling rate of 44.1 kHz, a Hamming window of length 1323 was used and the FFT was computed on 2048 points. The following MPEG-7 features have been exemplified: harmonic instrument timbre, audio spectrum centroid, audio spectrum envelope, spectral analysis, audio spectrum flatness, and audio spectrum basis.
I. INTRODUCTION
In October 1996, MPEG started a new work item to provide a solution to the problem of describing multimedia content. The new member of the MPEG family, named "Multimedia Content Description Interface" (in short, MPEG-7), provides standardized core technologies allowing the description of audiovisual data content in multimedia environments. It extends the limited capabilities of the proprietary solutions for identifying content that exist today, notably by including more data types.
MPEG-7 allows different granularity in its descriptions, offering the possibility of different levels of discrimination. Even though the MPEG-7 description does not depend on the (coded) representation of the material, MPEG-7 can exploit the advantages provided by MPEG-4 coded content. If the material is encoded using MPEG-4, which provides the means to encode audio-visual material as objects having certain relations in time (synchronization) and space (on the screen for video, or in the room for audio), it will be possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects.
The level of abstraction is related to the way the features can be extracted: many low-level features can be extracted in fully automatic ways, whereas high-level features need (much) more human interaction.
II. CURRENT STAGE
The increasing growth of multimedia data available on networks, such as the Internet, requires effective means of identifying and indexing content for search and retrieval. For audio data, a number of solutions have been proposed for attaching additional descriptive data (metadata). Examples are CDID and CD-Text for CDs, or ID3 tags for MP3 compressed audio files.
A. The MPEG-7 standard consists of the following parts:
1. MPEG-7 Systems – tools for preparing MPEG-7 descriptions for efficient transport and storage, and the terminal architecture.
2. MPEG-7 Description Definition Language – the language for defining the syntax of the MPEG-7 description tools and for defining new description schemes.
3. MPEG-7 Visual – description tools dealing with (visual) descriptions only.
4. MPEG-7 Audio – description tools dealing with (audio) descriptions only.
5. MPEG-7 Multimedia Description Schemes – description tools for generic features and multimedia descriptions.
6. MPEG-7 Reference Software – a software implementation of the relevant parts of the MPEG-7 standard.
7. MPEG-7 Conformance Testing – instructions and procedures for testing the conformance of MPEG-7 implementations.
8. MPEG-7 Extraction and Use of Descriptions – informative material (in the form of a technical report) on the extraction and use of some of the description tools.
9. MPEG-7 Profiles and Levels – provides standard guidelines and profiles.
10. MPEG-7 Schema Definition – specifies the schema using the Description Definition Language.
B. Definition of the scope
This international standard defines a multimedia content description interface, specifying a series of system-level and application-level interfaces to allow disparate systems to exchange information about multimedia content. It describes the system architecture, a language for extensions and specific applications, audio and visual description tools, and non-audiovisual description tools. Overall, this international standard, which includes all of the above-mentioned components, is known as "MPEG-7" [1].
MPEG-7 is not a standard for real-time encoding of moving pictures and audio, such as MPEG-1, MPEG-2 and MPEG-4. It uses XML to store metadata, and the descriptions can be attached to a time code, for example to label certain events or to synchronize lyrics with a song.
The main elements of the MPEG-7 standard are:
• Description tools: Descriptors (D), which define the syntax and semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components; these components can be both descriptors and description schemes.
• A Description Definition Language (DDL) to define the syntax of the MPEG-7 description tools, to allow the creation of new description schemes and possibly descriptors, and to allow the extension and modification of existing description schemes.
• System tools to support a binary coded representation for efficient storage and transmission, transmission mechanisms (both for textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
• Information about user interaction with the content (user preferences, usage history).
III. THEORETICAL FUNDAMENTALS
MPEG-7 audio tools are applicable in two general areas: lower-level audio description and application-based description. The Audio Framework tools are applicable to general audio, without taking into account the specific content of the encoded signal.
Figure 1 shows a highly abstract block diagram of a possible MPEG-7 processing chain, included here to explain the scope of the MPEG-7 standard. This chain includes the extraction of the features (analysis), the description itself, and the search engine (application). To fully exploit the possibilities of MPEG-7 descriptions, automatic extraction of features will be extremely useful. It is also clear that automatic extraction is not always possible. As noted above, the higher the abstraction level, the more difficult automatic extraction becomes, and interactive extraction tools will be useful. However useful they are, the algorithms for automatic and semi-automatic feature extraction do not fall within the scope of the standard. The main reason is that their standardization is not necessary to allow interoperability, while leaving space for competition in industry. Another reason for not standardizing the analysis is to allow good use of the expected improvements in these technical areas. Also, search engines, filter agents, or any other program that can make use of the description are not specified in MPEG-7; again, this is not necessary, and here competition will produce the best results [1].
The standardized MPEG-7 features support a wide range of applications (for example, digital libraries, broadcast media selection, multimedia editing, home entertainment devices, etc.). MPEG-7 will also make the Web as searchable for multimedia content as it is today for text. This will especially apply to high-value content archives that are made publicly accessible, as well as to multimedia catalogs that allow people to identify content for purchase. The information used to retrieve content can also be used by agents to select and filter "push" broadcast material or for personalized advertising. Additionally, MPEG-7 descriptions will allow fast and efficient use of the underlying data, enabling semi-automatic multimedia presentation and editing.
A. Fields of application
All application domains that use multimedia will benefit from MPEG-7. Since it is currently hard to find one that does not use multimedia, a list is expanded below.
• Architecture, real estate, and interior design (e.g., searching for ideas).
• Broadcast media selection (for example, radio channel, TV channel).
• Cultural services (history museums, art galleries, etc.).
• Digital libraries (image catalogs, music dictionaries, biomedical imaging catalogs, film, video and radio archives).
• E-commerce (e.g., personalized advertising, on-line catalogs).
• Defining objects, including patches of colors or textures, and extracting examples, including selecting interesting objects to create your design.
• On a specific set of multimedia objects, describing the movements and relationships between objects and searching for animations that meet the described temporal and spatial relationships.
• Describing actions and getting a list of scenarios containing such actions [1].
B. Metadata
Metadata are data that provide information about other data. There are three distinct types of metadata: descriptive metadata, structural metadata, and administrative metadata.
Descriptive metadata describe a resource for purposes such as discovery and identification. They may include elements such as title, abstract, author, and keywords [2].
1) Structural metadata are metadata about data containers and indicate how compound objects are grouped, for example, how pages are ordered to form chapters. They describe the types, versions, relationships, and other features of digital materials.
2) Administrative metadata provide information to help manage a resource, such as when and how it was created, the file type and other technical information, and who can access the resource.
MPEG-7 can be used independently of other MPEG standards. The basic descriptive elements in MPEG-7 are called Descriptors (D) and represent specific properties or attributes of the content through a defined syntax and semantics. MPEG-7 uses the following tools:
• Descriptor (D): a representation of a syntactically and semantically defined feature. A single object may be described by several descriptors.
• Description Schemes (DS): specify the structure and semantics of the relationships between their components; these components may be descriptors (D) or description schemes (DS).
• Description Definition Language (DDL): based on XML, it is used to define the structural relationships between the descriptors. It allows creating and modifying description schemes, as well as creating new descriptors (D).
• System tools: these tools provide the binarization, synchronization, transport and storage of descriptors. They also deal with the protection of intellectual property.
C. Terms and definitions
1. Frame
A frame is defined as a short segment of the signal on which the instantaneous analysis is performed. For an observed (continuously recorded) signal s(t) and a Hamming analysis window h(t) of temporal length lw, frame f of the signal is defined as:
x(f, t) = s(t) h(t − f·S),    (1)
where S is the hop size.
2. Running window analysis
A running window analysis is an analysis obtained by multiplying the signal by a window function that is shifted in time by a multiple of a parameter called the hop size. For a window function h(t) and a hop size S, the shifted window is h(t − f·S). A sketch of this framing is given after this paragraph.
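The following MATLAB sketch illustrates the running-window framing with the parameters used later in Section IV (44.1 kHz sampling rate, window length 1323); the hop size of 441 samples and the file name 'Clavecin.wav' are assumptions made only for this example.

% Sketch: running-window framing of a signal (assumed hop size and file name).
[s, fs] = audioread('Clavecin.wav');          % audio samples and sampling rate
s  = s(:, 1);                                 % keep one (monophonic) channel
lw = 1323;                                    % analysis window length (30 ms at 44.1 kHz)
S  = 441;                                     % hop size in samples (assumed: 10 ms)
n  = (0:lw-1)';
h  = 0.54 - 0.46*cos(2*pi*n/(lw-1));          % Hamming analysis window of length lw
nFrames = floor((length(s) - lw)/S) + 1;      % number of complete frames
frames  = zeros(lw, max(nFrames, 0));
for f = 0:nFrames-1
    frames(:, f+1) = s(f*S + (1:lw)) .* h;    % x(f,t) = s(t)*h(t - f*S), eq. (1)
end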
3. Instantaneous values
The instantaneous value of a frame-based descriptor is defined as the result of a frame-level analysis. The global value of a frame-based descriptor is defined as the average of the instantaneous values over all frames of the segment.
D. Symbols:
– ASR – Automatic Speech Recognition
– CPU – Central Processing Unit
– D – Descriptor
– DC – Direct Current (0 Hz)
– DDL – Description Definition Language
– DFT – Discrete Fourier Transform
– DS – Description Scheme
– FFT – Fast Fourier Transform
– HMM – Hidden Markov Model
– Hz – Hertz, frequency in cycles per second
– LLD – Low-Level Descriptor
– log – logarithm (unspecified base)
– LPC – Linear Predictive Coding
– MSD – Mean Square Deviation (from the average)
– OOV – Out Of Vocabulary, describing a word that is not in the vocabulary of an automatic speech recognizer
– RMS – Root Mean Square
– SR – Sampling Rate
– STFT – Short-Time Fourier Transform
– XML – Extensible Markup Language [3]
1) MPEG-7 multimedia content (general MPEG-7 concepts)
2) MPEG-7 Audio Descriptors: The following LLDs and DSs are used to describe the (monophonic) musical signals (abbreviations indicated in brackets):
• Audio Waveform LLD (maximum – AXV, minimum – AMV);
• Audio Power (AP) LLD;
• Audio Spectrum Centroid (ASC) LLD;
• Audio Spectrum Flatness (ASF) LLD;
• Audio Spectrum Spread (ASS) LLD;
• Audio Fundamental Frequency (F0) LLD;
• Audio Signature DS (mean – ASM, variance – ASV);
• Harmonic Instrument Timbre DS (Harmonic Spectral Centroid – HSC, Harmonic Spectral Deviation – HSD, Harmonic Spectral Spread – HSS) [5].
3) Waveform LLD: The Audio Waveform LLD describes the waveform of the signal in a compact manner that allows a "brief" display of an audio file. The LLD attributes used in the proposed onset detection system are the maximum (AXV) and minimum (AMV) signal values of each time frame.
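A minimal sketch of this computation, reusing the signal s, hop size S, window length lw and frame count nFrames from the framing sketch in Section III; the raw, unwindowed samples of each frame are used here.

% Sketch: Audio Waveform LLD – maximum (AXV) and minimum (AMV) value per frame.
AXV = zeros(1, max(nFrames, 0));
AMV = zeros(1, max(nFrames, 0));
for f = 0:nFrames-1
    seg = s(f*S + (1:lw));      % raw (unwindowed) samples of frame f
    AXV(f+1) = max(seg);        % maximum sample value of the frame
    AMV(f+1) = min(seg);        % minimum sample value of the frame
end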
4) Audio power: The Audio Power LLD describes the instantaneous power P(t) of the input signal s(t), temporally smoothed over each frame w.
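A minimal sketch, reusing the windowed frames matrix from the framing sketch; the instantaneous power is taken here as the mean squared amplitude of each frame, and the global value as its average over all frames.

% Sketch: Audio Power LLD – instantaneous power per frame and global value.
AP        = mean(frames.^2, 1);   % one instantaneous power value per frame
AP_global = mean(AP);             % average over all frames of the segment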
5) Audio spectrum centroid
The Audio Spectrum Centroid LLD describes the center of gravity of the log-frequency power spectrum, indicating whether low- or high-frequency content dominates. The power spectrum coefficients are given by:
Px(k) = |Xw(k)|² / (lw · NFFT),    (3)
where Xw(k) is the DFT of the windowed frame, lw is the window length and NFFT is the DFT size.
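A sketch of the centroid computation following eq. (3), reusing frames, lw and fs from the framing sketch; the 2048-point FFT matches Section IV, implicit expansion (a recent MATLAB release) is assumed, and the grouping of the lowest-frequency coefficients used in the standard is replaced here by a simpler fix for the DC bin.

% Sketch: power spectrum coefficients (eq. 3) and log-frequency centroid.
NFFT = 2048;
Xw = fft(frames, NFFT);                          % DFT of each windowed frame
Px = abs(Xw(1:NFFT/2+1, :)).^2 / (lw * NFFT);    % power spectrum, eq. (3)
fk = (0:NFFT/2)' * fs / NFFT;                    % bin frequencies in Hz
fk(1) = fk(2);                                   % avoid log2(0) at DC (simplification)
ASC = sum(log2(fk/1000) .* Px, 1) ./ sum(Px, 1); % centroid in octaves re. 1 kHz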
IV. IMPLEMENTATION OF THE SOLUTION ADOPTED
A. Extract and represent the characteristics
1) Low-level descriptors for audio identification
During the MPEG-7 audio standardization process, several LLDs were adopted. Each of these descriptors was designed for a specific purpose, such as power measurement, spectral envelope estimation, or spectral flatness measurement [4].
According to the recommended method for extracting the MPEG-7 Audio Spectrum Envelope descriptor, the various audio identification features were generated from a short-term frequency analysis using a discrete Fourier transform (DFT). For a 44.1 kHz audio sampling rate, a Hamming window of length 1323 and a DFT (FFT) length of 2048 are used. In order to increase the discriminating capability of an identification system, it is desirable to calculate several values for each block. For spectral characteristics such as SFM and other tone-related measures, this can be accomplished by dividing the spectrum into several frequency intervals that may or may not overlap. There are many possible ways of dividing the frequency range, but only two different rules have been applied and tested. First of all, the most obvious configuration consists of a linear division of the frequency range into equally spaced bands. The second configuration uses a logarithmic partitioning of the frequency range. This type of partitioning mimics, to some extent, the natural frequency selectivity of the human ear and is used for the calculation of the MPEG-7 Audio Spectrum Envelope. A sketch of this band partitioning is given below.
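The sketch below illustrates the logarithmic partitioning, reusing Px and fk from the centroid sketch; the band edges (62.5 Hz to 16 kHz in 1/4-octave steps) are assumptions chosen for this example among the resolutions the standard allows.

% Sketch: Audio Spectrum Envelope over logarithmically spaced bands.
loEdge = 62.5;  hiEdge = 16000;  r = 0.25;        % assumed edges, 1/4-octave resolution
edges  = loEdge * 2.^(0:r:log2(hiEdge/loEdge));   % logarithmically spaced band edges
nBands = numel(edges) - 1;
ASE = zeros(nBands, size(Px, 2));
for b = 1:nBands
    bins = fk >= edges(b) & fk < edges(b+1);      % FFT bins falling inside band b
    ASE(b, :) = sum(Px(bins, :), 1);              % power of band b for each frame
end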
From Figure 2 we see that, using the spectrum centroid, there is similarity between the restaurant and office environments and between the machine environments. However, using the fundamental frequency, the restaurant environment can be separated from the office environment, as can be seen in Figure 4. These figures demonstrate that different MPEG-7 audio features have different discrimination capabilities. Therefore, we involve all the MPEG-7 audio features to recognize the environment in our method.
PCA projects the features onto the lower-dimensional space spanned by the most significant eigenvectors. All features are normalized to zero mean and unit variance. In our experiments, we use the following sets of characteristic parameters; numbers in parentheses next to a feature name correspond to the dimension of the feature vector, e.g. MPEG-7 audio features after PCA. A sketch of this step is given below.
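A sketch of this normalization and projection step, applied here to the ASE matrix from the previous sketch; the number of retained components (10) is an assumption of this example.

% Sketch: zero-mean/unit-variance normalization followed by PCA projection.
F = ASE';                               % one feature vector per row (frames x bands)
F = (F - mean(F, 1)) ./ std(F, 0, 1);   % normalize each feature dimension
[V, D]   = eig(cov(F));                 % eigenvectors/eigenvalues of the covariance
[~, idx] = sort(diag(D), 'descend');    % order by decreasing eigenvalue
k    = 10;                              % assumed number of retained components
Fpca = F * V(:, idx(1:k));              % features projected onto the PCA subspace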
B. Select an audio descriptor
MPEG-7 Audio describes audio content using low-level features, structures and models. The aim of MPEG-7 Audio is to provide fast and efficient searching, indexing and retrieval of information from audio files. The features can be divided into scalar and vector types. Scalar types return scalar values, such as power or fundamental frequency, while vector types return, for example, the flatness of the spectrum calculated for each band in a frame.
The MPEG-7 features used are:
• Audio Waveform (AWF): describes the shape of the signal by calculating the maximum and minimum values of the samples in each frame.
• Audio Power (AP): provides the smoothed instantaneous signal power.
• Audio Spectrum Envelope (ASE): describes the short-term power spectrum for each band in a frame of the signal.
• Audio Spectrum Centroid (ASC): returns the center of gravity (centroid) of the log-frequency spectrum of a signal. It highlights the dominance of high- or low-frequency components in the signal.
• Audio Spectrum Spread (ASS): returns the second moment of the log-frequency power spectrum. It shows how much the power spectrum is spread around its centroid, measured as the root mean square deviation of the spectrum with respect to its center of gravity. This feature can help differentiate between noise, tones and speech; a sketch follows this list.
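A minimal sketch of the spread computation, reusing fk, Px and ASC from the centroid sketch in Section III.

% Sketch: Audio Spectrum Spread – RMS deviation of the log-frequency
% power spectrum around its centroid, computed per frame.
ASS = sqrt( sum((log2(fk/1000) - ASC).^2 .* Px, 1) ./ sum(Px, 1) );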
C. MPEG-7 Extracting and Using Descriptions
The "MPEG-7 Extraction and Use of Descriptions" technical report includes informative material on the extraction and use of some of the description tools, providing both further insight into the implementation of the MPEG-7 reference software and alternative approaches.
V. EXPERIMENTAL RESULTS
In this chapter, some MPEG-7 features that describe low-level audio properties are presented.
By calling the function HarmonicInstrumentTimbreDS('Clavecin.wav', 1, 'HarmonicInstrument') from the MATLAB command line, the results are obtained in a text file named HarmonicInstrument.
In theory, the signal spectrum extends from zero frequency, f = 0, to infinite frequency; in practice, the components at very high frequencies have negligible amplitudes, so that for concrete signals the frequency band occupied by the spectrum has finite width, i.e. the spectrum is limited. The amplitudes of the components decrease with increasing frequency all the faster the smoother the signal is.
VI. CONCLUSIONS
This paper discusses the topic of automatic identification of audio signals based on the MPEG-7 description of audio content. From a fundamental point of view, this task can be addressed using generic pattern recognition technologies, including feature extraction and classification techniques.
Naturally, the performance of such a system depends essentially on the choice of the underlying features for extracting the essence of the audio signal. Specifically, a descriptor that captures the spectral flatness property of the audio signal is defined.
Spectrum visualization descriptors, together with the audio spectrum basis descriptor, can be used to view and represent in a compact way the independent subspaces of a spectrogram.
Thanks to the low-level descriptors, interoperability is guaranteed globally: every MPEG-7 search engine will be able to use a description regardless of where it was produced.
A compact binary representation of audio is currently outside the scope of the MPEG-7 audio standard. However, each application is free to use a customized internal representation if necessary. In this way, the MPEG-7 representation (based on XML) acts as a point of interoperability between various compact internal representations.
The term "spectral envelope" denotes a smooth function that passes through the prominent spectral peaks. For a harmonic signal, the prominent spectral peaks are generally the harmonics, but if some of the harmonics are missing or weak (as in the clarinet), the spectral envelope need not pass through them.
VII. REFERENCES
[1] "MPEG -7 Overview." [Online]. available:
https://hwiegman.home.xs4all.nl/fileformats/mpeg/mpeg -7.htm .
[2] "Audio Descriptors MPEG7." [Online]. available:
ftp://ftp.unicauca.edu.co/Facultades/FIET/DEIC/Materias/compu tacion
intelligent / proyecto / audio_descriptors / MPEG / MPEG7 / MPEG –
7 Overview.htm.
[3] MPEG -7 Multimedia Software Resources." [Online]. available:
http://mpeg7.doc.gold.ac.uk/ .
[4] O. Hellmu th, E. Allamanche, T. Kastner, M. Cremer, and W. Hirsch,
"Advanced audio identification using MPEG -7 content description,"
Audio Eng. Soc., Pp. 1 -12, 2001. [4] [5] D. Smith, E. Cheng, and I. S. Burnett, "Musical Onset Detection
Using MPEG -7 Audio Descriptors," Audio, no. August, pp. 1 -7,
2010.
[5] [6] "MPEG -7 High Level Tools." [Online]. available:
[6] http://citeseerx.ist.psu.edu/viewdoc/d ownload?d oi=10.1.1.68.1611&r
ep=rep1&t ype = pdf.
[7] [7] G. Mu hammad and K. Alg hathbar, "Environmental recognition
for digital audio forensics using MPEG -7 and melte r cepstral
features," J. Electr. Eng., Vol. 62, no. 4, pp. 199 -205, 2011.
[8] [8] "High -level Audio Description Tools." [Online]. available:
[9] https://hwiegman.home.xs4all.nl/fileformats/mpeg/ mpeg –
7.htm#E11E9.
[10] [9] "LowEdge -HighEdge." [Online]. available:
http://mpeg7.doc.gold.ac.uk/mirror/v1/Matlab -XM/XMdocs.txt .