
Emotionally Intelligent Cognitive Assistant for
Affective Assistance

Abstract—This paper outlines the impact of educational software
applications on children and defines a framework for improving
their knowledge and communication skills. It shows how a bot can
be configured and implemented in a software application. It also
shows how the use of such applications increases children's
familiarity with the world. The author presents the development
model of a bot that uses the main services of the IBM Watson
platform: Conversation, Text-to-Speech, Speech-to-Text and Tone
Analyzer, for a precise analysis of children's emotions.
Keywords—bot, machine learning, smart toy, conversation, tone
analyzer, IBM Watson
I. INTRODUCTION
Bots have become more and more popular lately due to
the increased capabilities of artificial intelligence and their
ability to be integrated easily into devices. That is why various
platforms have been launched for creating such programs,
including IBM Watson.
These bots are programmed to answer users' questions
and to collect relevant information from a huge amount of
data. Generally, they run simple repetitive tasks faster than
humans can. Bots are equipped with a certain degree of
artificial intelligence and become more and more efficient in
their responses.
A. Premises and Improvements
This paper starts from the observation that existing
conversational agents help children but do not empathize with
them. The work presented here was documented from the
following premises:
1. Deepening children's knowledge through conversational
agents
2. Interaction interfaces with conversational agents
3. Conversational agents in recommender systems
4. Conversational agents that detect emotions
B. Description of the Approach
The purpose of the paper is to create a conversational
agent that acts like a partner, helping a child detach from the
emotion detected in the first interaction by recommending an
activity. In order to achieve this purpose, the following tasks
were proposed: analyze a problem that society is currently
facing, develop the steps to create a bot, and study
programming languages such as C#.
The application focuses on the role that emotion plays
in adapting the behavior of agents and on how this emotional
reaction can be changed by recommending an activity.
II. APPLICABILITY OF COGNITIVE BOTS WITHIN CHILD
ASSISTANCE
A. Bot description and the influence on children
Bots perform simple, structurally repetitive tasks much
more quickly, efficiently and accurately than humans do. A bot
is also a service powered by machine learning algorithms.
A bot is a computer program that conducts an audio or
text conversation. Bots are commonly used in dialogue systems
for various practical purposes, such as customer service or
information acquisition. Some bots use natural language
processing systems, but many systems scan the input for
keywords and then select an answer based on those input
keywords.
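
To make this keyword-scanning approach concrete, here is a minimal C# sketch; the keywords and canned answers are hypothetical, for illustration only, and do not come from the paper's implementation:

using System;
using System.Collections.Generic;

// Minimal sketch of a keyword-scanning bot: it looks for known keywords
// in the input and returns the first matching canned answer.
class KeywordBot
{
    // Hypothetical keyword-to-answer pairs, for illustration only.
    static readonly Dictionary<string, string> Answers = new Dictionary<string, string>
    {
        { "dinosaur", "Dinosaurs lived millions of years ago!" },
        { "color",    "My favorite color is green. What is yours?" },
        { "sad",      "I'm sorry you feel sad. Shall we play a game?" }
    };

    static string Reply(string input)
    {
        string lowered = input.ToLowerInvariant();
        foreach (var pair in Answers)
            if (lowered.Contains(pair.Key))
                return pair.Value;
        // Fallback when no keyword matches (compare the anything_else node in section III).
        return "I didn't understand that. Could you repeat your question?";
    }

    static void Main()
    {
        Console.WriteLine(Reply("Tell me about a dinosaur")); // keyword "dinosaur" matches
        Console.WriteLine(Reply("I feel sad today"));         // keyword "sad" matches
    }
}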
B. Emotional Bot
There is a wide variety of bots, and most of them are
designed for learning purposes: to develop creativity,
communication skills, emotions and responsibility. This allows
children to interact with their bot, develop a more personal
relationship with it and understand their emotions.
C. Commercial Applications
Chatbots have been embedded in various devices, such
as toys. The IBM Watson computer has been used as a basis
for educational toys by companies like CogniToys, whose
smart toy is designed to interact with children and to offer
them an experience that is both educational and fun. If a child
is scared, the dinosaur consoles and encourages the child.
Also, while other smart toys rely on pre-programmed answers,
CogniToys listens to children's questions and adapts its
answers to their age.
Watson is a computerized system with artificial
intelligence capable of answering questions in natural
language. Watson went from simply speaking English to
understanding nine languages. It can deal with simple
questions and can also answer with complex data. Watson
can understand the emotion and tone of the speaker.
With the improvements in artificial intelligence and its
integration into mobile devices, a new era of virtual agents has
begun. Siri and Cortana are just a few examples that have
enjoyed tremendous success among users. They have become
popular due to their simplicity of use, their understanding of
natural language and the multitude of activities in which they
can assist us.

[Fig. 1: Start → question → is an intent/entity found? If yes, the bot answers and the Tone Analyzer is applied; if not, the bot does not give a specific answer and the user should repeat the question → Stop.]
Fig. 1. The logical scheme of a bot

IBM Conversation interprets user input, conducts the
conversation, and collects the information it needs to provide
an answer, as shown in Fig. 1. The application connects to a
workspace, which is a container for the dialog and the training
data. A workspace contains the following types of artifacts:
 Intent: an intent is the user's conversational goal.
 Entity: an entity is a term or object that is relevant to
the intent and that provides a specific context for it.
The Dialog component uses the intents and entities
identified in the user's input to gather the necessary
information and to provide a useful answer for each user input.
The dialog is the logical flow that determines the responses the
bot gives when certain intents and/or entities are detected.
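
As an illustration of this request/response cycle, the following minimal C# sketch sends one user utterance to the Conversation service over its public v1 REST endpoint and prints the returned JSON, which carries the detected intents, entities and dialog output. The credentials, workspace id and version date are placeholders; the paper's own Unity code is not reproduced here.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Minimal sketch: send one message to the Watson Conversation REST API.
class ConversationClient
{
    static async Task Main()
    {
        const string username    = "<service-username>"; // placeholder credential
        const string password    = "<service-password>"; // placeholder credential
        const string workspaceId = "<workspace-id>";     // placeholder workspace
        const string url = "https://gateway.watsonplatform.net/conversation/api";

        using var client = new HttpClient();
        var token = Convert.ToBase64String(Encoding.ASCII.GetBytes($"{username}:{password}"));
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", token);

        // The service matches the input text against the workspace's intents and
        // entities and runs the dialog to produce an answer.
        var body = new StringContent("{\"input\": {\"text\": \"I am scared of the dark\"}}",
                                     Encoding.UTF8, "application/json");
        var response = await client.PostAsync(
            $"{url}/v1/workspaces/{workspaceId}/message?version=2017-05-26", body);
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}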
III. INTEGRATED SOLUTION FOR ASSISTIVE BOT
IBM Watson is a cognitive technology platform that uses
natural language processing and learning algorithms to reveal
information from large amounts of unstructured data. Among
developers, IBM Watson is the best option so far because it has
the most complex range of services.
[Fig. 2: users ask questions through the Unity interface, which connects to IBM Conversation (intents, entities, dialog) and to IBM Tone Analyzer (emotional and language tones).]
Fig. 2. Use Case diagram

Conversation bots interpret what is said and try to
respond with a relevant dialogue in order to reach the goal.
IBM Watson Conversation can receive input from any user
interface; it then interprets the intent and collects the
information it needs, as shown in Fig. 2. Users interact with the
application through the interface, which can connect to other
Watson services that analyze user input, such as Tone Analyzer.
This service uses linguistic analysis to detect joy, fear,
sadness, anger, analytical, confident and tentative tones in
text. Watson Tone Analyzer applies cognitive linguistic
analysis to identify a variety of tones in individual sentences or
in a whole document. You can analyze JSON, plain text, or
HTML input submitted to the service. The input text is sent to
the service, and the service returns JSON results reporting the
tone of the input, as shown in Fig. 3.
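
A minimal C# sketch of such a request is shown below, assuming the public v3 tone endpoint; the credentials, the input sentence and the version date are placeholders, and the response shape in the comment follows the service's documented JSON format:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Minimal sketch of a Tone Analyzer request. The service answers with JSON
// such as: {"document_tone":{"tones":[{"score":0.82,"tone_id":"sadness",
// "tone_name":"Sadness"}]}}.
class ToneClient
{
    static async Task Main()
    {
        const string username = "<service-username>"; // placeholder credential
        const string password = "<service-password>"; // placeholder credential
        const string url = "https://gateway.watsonplatform.net/tone-analyzer/api";

        using var client = new HttpClient();
        var token = Convert.ToBase64String(Encoding.ASCII.GetBytes($"{username}:{password}"));
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", token);

        var body = new StringContent("{\"text\": \"I lost my toy and I feel very sad.\"}",
                                     Encoding.UTF8, "application/json");
        var response = await client.PostAsync($"{url}/v3/tone?version=2017-09-21", body);
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}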

[Fig. 3: submit your content → analyze your content → view a JSON analysis of your content → use the analysis to adjust the tone of your content.]

Fig. 3. The structure of Tone Analyzer Service

Tone Analyzer does not use variations in the user's
voice; it works on the textual input, analyzing the content of
the text and assigning scores to individual sentences. In the
application, the Tone Analyzer output is visualized with four
cylinders of different colors, which indicate the scores of the
detected tones, between 0 and 1.

Fig. 4. Cylinders indicating the detected emotions

Emotion   Color
Sadness   Blue
Joy       Yellow
Fear      Purple
Anger     Red

Table 1. Emotions and their corresponding colors

The service detects four emotions: joy, sadness, fear and
anger. Each emotion is assigned a value between 0 and 1: a
value below 0.5 indicates that the emotion is not present, a
value above 0.5 indicates that the emotion is probably present
to some extent, and a value greater than 0.75 indicates that the
emotion is most likely present.
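
These rules are simple enough to state in code; the following C# sketch encodes the thresholds above together with the color mapping from Table 1 (the type and method names are illustrative, not from the paper's implementation):

using System;

// Sketch of the scoring rules described above: below 0.5 the emotion is not
// present, above 0.5 it is probably present to some extent, and above 0.75
// it is most likely present. The colors follow Table 1.
static class ToneScale
{
    public static string Presence(double score) =>
        score < 0.5   ? "not present" :
        score <= 0.75 ? "probably present to some extent" :
                        "most likely present";

    public static string Color(string emotion) => emotion switch
    {
        "sadness" => "Blue",
        "joy"     => "Yellow",
        "fear"    => "Purple",
        "anger"   => "Red",
        _         => "None"
    };
}

class ToneScaleDemo
{
    static void Main()
    {
        Console.WriteLine(ToneScale.Presence(0.82)); // most likely present
        Console.WriteLine(ToneScale.Color("fear"));  // Purple
    }
}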
Each lower-level model uses a combination of learning
algorithms and language features such as words, phrases,
punctuation and overall sentiment.

[Fig. 5: text and voice ("voce") input pass through an NLP pipeline, producing text and voice output.]

Fig. 5. The structure of Conversation Service

With IBM Watson Conversation, you can create
applications and agents that understand natural language and
communicate with users by simulating a real human
conversation.
Fig. 5 shows the development model of a conversation
agent that uses the main services of the IBM Watson platform:
Conversation and Tone Analyzer for emotion analysis, plus
Text-to-Speech and Speech-to-Text.
The conversation starts in an initially configured node
with the special conversation_start condition. After that, the
dialog moves to the marked node as the active node. The
response configured in this node is shown to the user. User
input is analyzed for intents and entities, which are used to
select the next dialog node from the stream.
The conditions in the child nodes are evaluated in
order, from first to last, using the extracted intents and entities.
The first child node whose condition matches is selected as the
next active node, and a new round of the conversation begins.
If no child node matches, the Conversation service evaluates
the conditions of each base node in the dialog and selects the
first node that matches as the next active node.

[Fig. 6: base nodes ("noduri de bază") and child nodes ("noduri copil"), each guarded by a condition ("condiție"); the flow runs from conversation_start through the active node ("nod activ"), with anything_else as the fallback; the numbers 1-5 mark the evaluation order.]

Fig. 6. The algorithm for selecting the next node

A useful approach is to have a base node configured
with the anything_else condition, so that the conversation falls
back to a default response when no other node's condition
matches. The anything_else condition always evaluates to true.
You can use this node in the dialog to tell the user that the
input was not understood and to suggest a valid interaction. A
minimal sketch of this selection algorithm follows.
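
The following C# sketch illustrates this selection logic; the node names and the single-intent conditions are hypothetical, and the real Conversation service evaluates far richer conditions than shown here:

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the node-selection algorithm described above: the active node's
// child nodes are evaluated in order; if none matches, the base nodes are
// evaluated, with an anything_else node as the guaranteed fallback.
class DialogNode
{
    public string Name;
    public Func<string, bool> Condition;   // evaluated against the detected intent
    public List<DialogNode> Children = new List<DialogNode>();
}

class DialogEngine
{
    static DialogNode SelectNext(DialogNode active, List<DialogNode> baseNodes, string intent)
    {
        // 1. Evaluate the active node's children, in order.
        var child = active.Children.FirstOrDefault(n => n.Condition(intent));
        if (child != null) return child;

        // 2. Fall back to the base nodes; anything_else always matches.
        return baseNodes.First(n => n.Condition(intent));
    }

    static void Main()
    {
        var greeting = new DialogNode { Name = "greeting", Condition = i => i == "greeting" };
        var start    = new DialogNode { Name = "conversation_start", Condition = i => i == "conversation_start" };
        var fallback = new DialogNode { Name = "anything_else", Condition = _ => true };
        start.Children.Add(greeting);

        var baseNodes = new List<DialogNode> { start, fallback };
        // No child or base node matches the "weather" intent, so the fallback wins.
        Console.WriteLine(SelectNext(start, baseNodes, "weather").Name); // anything_else
    }
}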
Unlike human-to-human communication, human-
machine communication must produce data structures in a
deterministic way. In this case, deterministic means that a
computational system must generate the same representation
each time the same signal is processed. The knowledge sources
used by machines to decode messages are models of those
used by people for the same or similar purposes. Speech-to-
Speech (STS) is an important part of this: STS involves speech
recognition and machine translation. Ideal human-machine
communication is also considered to be speech-driven, for a
more natural interaction.
Speech to Text takes the voice as its object of study,
allowing the machine to automatically identify and understand
spoken language by processing the speech signal. The voice is
transformed into an electrical signal at the input of the
identification system; the voice signal is then analyzed and the
text is extracted.
A speech recognition system takes an audio stream as
input and converts it into text. Speech to Text processes the
audio stream, isolating the sound segments and converting
them into a series of numeric values that characterize the
voice sounds in the signal. The Speech to Text engine is a
specialized engine that processes the input and searches three
databases: an acoustic model, a lexicon and a language model.
The acoustic model represents the acoustic sounds of
a language and can be trained to recognize the characteristics
of the speech patterns and acoustic environment of a particular
user.
The lexicon lists a large number of words in the
language and provides information about the pronunciation of
each word.
The language model represents the way the words of
a language combine.

Text to Speech is the transformation of text into speech,
producing output that is as close as possible to real speech. The
text-to-speech process consists of two phases: in the first, the
text passes through analysis; in the second, the resulting
information is used to generate the speech signal. The purpose
of Text to Speech is to read texts aloud: the input element of
the system is a text, and the output element is a voice. A sketch
of both voice services follows.
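
The sketch below shows a minimal use of both voice services through their public REST endpoints: Speech to Text turns a WAV recording into a JSON transcript, and Text to Speech turns text into a WAV recording. The host names follow the historical Watson endpoints, and the credentials and file names are placeholders.

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Minimal sketch of the Speech to Text and Text to Speech REST calls.
class VoiceServices
{
    static void Authenticate(HttpClient client, string user, string pass)
    {
        var token = Convert.ToBase64String(Encoding.ASCII.GetBytes($"{user}:{pass}"));
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", token);
    }

    static async Task Main()
    {
        using var client = new HttpClient();
        Authenticate(client, "<username>", "<password>"); // placeholder credentials

        // Speech to Text: send the audio signal, receive a JSON transcript.
        var audio = new ByteArrayContent(File.ReadAllBytes("question.wav")); // placeholder file
        audio.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        var stt = await client.PostAsync(
            "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize", audio);
        Console.WriteLine(await stt.Content.ReadAsStringAsync());

        // Text to Speech: send text, receive synthesized audio.
        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize")
        {
            Content = new StringContent("{\"text\": \"Let's play a game together!\"}",
                                        Encoding.UTF8, "application/json")
        };
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("audio/wav"));
        var tts = await client.SendAsync(request);
        File.WriteAllBytes("answer.wav", await tts.Content.ReadAsByteArrayAsync());
    }
}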
Human-machine communication in this case is done
through the graphical interface, which mediates the human-
machine link. The interface is the visual representation of the
application and of the user interaction mode; it was developed
in Unity using the C# scripting language.
The interface was made in Unity and has four
buttons, one input field and a text box where the bot's answer
is written, as shown in Fig. 7. The interface is easy to use
because the buttons have suggestive names. Buttons are
selectable interaction components: they have functions for
displaying state transitions and for navigating to other
selectable elements using the keyboard.

Fig. 7. Interface

The main features of the application are as follows (a
sketch of the combined flow appears after the list):
 Retrieve the text entered by the user
 Display responses to the user's questions
 Analyze the text entered by the user to detect emotions
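
The sketch below chains these features end to end, assuming (as labeled in the comments) that AskConversation and AnalyzeTone wrap the REST calls sketched earlier; the canned values and the activity suggestion are illustrative only:

using System;

// End-to-end sketch of the application flow: take the child's text, obtain
// the bot's answer and the detected emotion, and append an activity
// recommendation when a negative emotion scores highly.
class AssistantFlow
{
    // Placeholder for the Watson Conversation call (see the earlier sketch).
    static string AskConversation(string text) => "Dinosaurs are friendly here!";

    // Placeholder for the Tone Analyzer call: returns the top emotion and its score.
    static (string emotion, double score) AnalyzeTone(string text) => ("sadness", 0.82);

    static string Respond(string userText)
    {
        string answer = AskConversation(userText);
        var (emotion, score) = AnalyzeTone(userText);

        // Recommend an activity to detach the child from a negative emotion,
        // using the "most likely present" threshold from section III.
        if (score > 0.75 && (emotion == "sadness" || emotion == "fear" || emotion == "anger"))
            answer += " How about we draw your favorite animal together?";
        return answer;
    }

    static void Main() => Console.WriteLine(Respond("I am sad, tell me about dinosaurs"));
}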

[Fig. 8: users send messages through the Unity interface in the browser; questions are routed to IBM Watson, where the Watson Conversation API searches for intents or entities and the Watson Tone Analyzer API checks emotions; the response is then displayed to the user.]

Fig. 8. Sequential diagram
Fig. 8 shows the route of a question from the moment it
is written by the user and how it is decomposed to receive a
corresponding answer at the end.
IV. RESULTS OF THE PROPOSED SOLUTION
The configuration of questions and answers was done
on the IBM Watson platform within the Conversation service
through its three sections: Intents, Entities and Dialog. The
Intents and Entities sections are represented by keywords
that help the Conversation service answer user questions by
detecting the keyword in the question; the response to the
user's question is implemented in the Dialog section after
the keyword has been found.
The implementation of intents on the IBM Watson
platform is shown in Fig. 9. This step consisted of adding
general questions about the chosen topic.

Fig. 9. Intents configuration

In the Dialog section, the response to user queries is
configured; this step uses the intents and example utterances
listed above. The implementation of the Dialog section is
shown in Fig. 10.

Fig. 10. Dialog configuration

After the three sections were implemented, the
application was tested using the IBM Watson-based chat
system called "Try it out", as shown in Fig. 11.

Fig. 11. Testing the question ontology

The connection to the IBM Watson Conversation
service and the Tone Analyzer service is performed using the
unique keys provided by IBM for each service; these keys are
used for communication with the Unity interface and are listed
in Table 2.

Keys for IBM Conversation:
private string _username = "66a1cbfe-e03f-45cb-9d72-30d8e068556";
private string _password = "8267cFw1oEN8";
private string _url = "https://gateway.watsonplatform.net/conversation/api";
private string _workspaceId = "80c1d742-20b4-4b46-ad19-26103fc688fd";

Keys for Tone Analyzer:
private string _username = "beb9c95a-be0e-49aa-882f-62269319688f";
private string _password = "nZTAbjW0GdnS";
private string _url = "https://gateway.watsonplatform.net/tone-analyzer/api";

Table 2. Unique keys for services on the IBM Watson platform
V. CONCLUSIONS
In this paper, an educational software application for
children was developed, defining a framework for
recommending activities based on the detected emotional state.
In order to implement this application, two services
provided by IBM, IBM Watson Conversation and Tone
Analyzer, were used. Watson Conversation processes natural
language by extracting key data from a user-entered phrase to
identify the subject the user is interested in, and Tone
Analyzer detects emotions in the text entered by the user. The
Unity program was used to create the user interface elements,
and Watson Conversation was used to configure the ontology
of the questions.
These bots are programmed to answer users' questions
quickly and to collect relevant information from a huge
amount of data. In general, they run simple repetitive tasks
faster than a human can. Bots are equipped with a certain
degree of artificial intelligence and become more and more
effective in their responses.
The contributions made by this application are:
1. Setting up the platform for recommending activities to
children based on the detected emotional state
2. Implementing an interface in Unity
3. Creating affective indicators of different colors depending
on the detected state
Extensions to scale the solution:
Integrating this software solution into hardware, such
as a toy, would be a more effective way to interact with
children and would provide an environment more conducive to
conversation. For example, hardware integration would
require: a Raspberry Pi, an SD card of at least 8 GB with the
Raspbian operating system, a Bluemix account, the Watson
platform services and a Node-RED application.
In conversation, the agent should detect the child's
emotional state and respond accordingly. It would be
interesting to add visual interaction for recognizing the user
and for easier detection of emotions in the room.
Another extension would be to improve the
conversation agent by adding a voice inflection recognition
service.
