The business value of language technologies: 2011

2 Sept 2011

Understanding the "foreigner"? Is what he said what he meant?

We have more and more iPhone and Android apps that, when we are in a foreign country whose language we do not understand, translate for us what we see, read or want to ask. Are these apps of great help? Judging by the usage statistics, I guess so.

Do these Apps translate well? What about the intention? Can we accurately translate intention?

A friend of mine sent me this nice link about cultural differences between Germans and Britons.

I like the text because it sets out to explain differences between cultures and thus works on our understanding of each other, brings us nearer, and shows us that the different nations in Europe are actually not that different. The text is good but not complete: in my opinion it does not go all the way. Or perhaps it does, but in a typically British, understated way! :-) I'd like to explain why with some examples from real life, illustrating how one action (or phrase) translates from one nation to another.

In every culture there is a gap between what is said and what is meant, not only in English culture, as the article would have it. For Germans as much as for Britons, this gap is often used to create humorous situations. Actually, I think it is more than that: this gap defines the culture of a nation. This applies to all aspects of life.

Politeness, to take just one of the many dimensions of social life, is not expressed by the words you use but by your intention. The words you use to express politeness are the ones typically used in your language and expected by your culture. The words reflect your intention within your culture ... and only within your culture. Translate these words literally into another language and they may convey something different, because they are now used within another culture.

To illustrate this, let's take the German request “Bitte gib mir das Brot”, which literally translates into “Please give me the bread”. This translation sounds rude to a British ear, because it does not take the cultural differences into account. A really good translation of “Bitte gib mir das Brot” should emphasise the "bitte", which indicates that the person is being very polite, and it should also smooth out the typical German directness. In the end we should obtain something like “May I have some bread, please?”, which conveys the "real" polite intention of the German speaker. And this holds in general: a German sentence containing "bitte + verb" should not be translated directly into "please + verb" but at least into "please may + verb", that is, by adding "may" or something similar.

And now, just for fun, let's go back to German: the literal translation of “May I have some bread, please?” back into German would give us “Könnte ich bitte etwas Brot haben?”. How would Germans perceive this sentence? A German would answer with a comment, something like “Sure you can!”, and would not lift a finger. Knowing that the person asking is not German, the German might add with a smile, “Do you want me to give you some?”

Yes, this is German culture. One of my German bosses, reviewing one of my papers for an international scientific conference for the first time, struck every "may" and "shall" from my text with the comment: "You did or did not, you will or will not ...".

We can go a little further and look at other cultural differences, for example in the following situation in a German city, where pedestrians are waiting for the green "walk" signal to cross the road at a zebra crossing. The pedestrian signal is still red when a French guy arrives and begins to cross the road. Remember, we are in Germany. A German waiting for the green signal will try to stop the guy from crossing by saying "Ampel ist rot!" ("Light is red!", a very factual statement, typical of German culture). The French guy will react to that statement in an impolite and aggressive manner that the German will not understand. World War III is around the corner ... What happened? The Frenchman understood the simple and direct statement "Light is red" as a command to stop. The German was saying, "Hey you, be careful, it is dangerous to cross the road while the light is red". The intention of the German was to warn; the Frenchman translated literally and understood a command.

Why did the Frenchman understand a command? By the way, I am French myself, which allows me to be harsh on my friends. French culture is very hierarchical: you have those who had good marks at school and those who had bad marks, those who are intelligent and the stupid, those who know and those who know less, those who have the power and those who have to rebel against the power if they want to be taken seriously (because those who know, know, and there is no discussion!). It is clear that within this culture, a direct informative statement like the German's remark about the red light can be understood as a teacher talking down to his stupid pupil. The German informative sentence is understood as a command from the one who knows to the one who does not know, which in France automatically leads to rejection. By the way, would you be surprised that the red light is a French invention? Not at all, it fits the culture: some guys who had good marks at school were looking for a solution to regulate the typically uncontrolled circulation of those who had bad marks at school. And the red light system was born, in Paris!

Why will the German warn? German society builds on respect for each other. Generally, decisions are not made top-down but after a consensus has been reached among all the parties involved. This implies a strong feeling of living *together*. Hence the warning to the "family" member crossing on red.

Going back to our conflict at the red light, the French guy translated what the German guy said literally into his own culture. In France such a sentence would have come from a policeman, a teacher, somebody who knows better: it would have been an order to stop walking. But the French guy is in Germany and is crossing the street in Germany. He should have taken German culture into account (the real one, not his interpretation of it) and translated the statement into "Hey, watch out, a car might be coming".

Oooh, now it's beginning to get complex: the French guy should have taken into account the real German culture, not his interpretation of it. Does he have an interpretation of German culture? Sure, based on his experience, on his own culture! You cannot escape your own culture: you live in it, with it, it is your reference. When you see something different you will compare it to what you know. "It's smaller than the garden of my uncle, but it's bigger than the helmet of my nephew," says Jolitorax about his boat in "Asterix in Britain".

What is the real German culture? How is it that the French have a different understanding of German culture than the Germans have of their own? Because it has been only partially translated. As I said, French culture is very hierarchical. The one who knows gives commands to the lower class, to the one who knows less. As a consequence, French companies are typically organised top-down. Now, even if French culture tends to ignore it, we all know that "stupidity" is evenly distributed across every level of society. To be less elegant but direct: you have the same proportion of stupid people in the class of people who know as in the class of people who do not know. Knowing does not protect from stupidity. It happens very often that a group receives a command from the boss that, according to the experience of that group, will lead to disaster. What will this group do in France? Society is hierarchically organised, so the group cannot go to the boss and tell him that what he has decided is stupid. The French solution then is to satisfy the boss by giving him the impression that everything is being done according to his view of things and, within the group, to work on the real solution, the one that will be successful. So the French see themselves as very flexible. And they are! And this is what I like in France: even in catastrophic situations, the fact that they are used to finding alternative paths makes them very creative, and they come up with excellent, refreshing ideas. "Impossible is not French!" is what you will then hear. And I like it.

Now the Germans are not as hierarchical as the French. The decision process is not top-down but based on consensus. Everybody discusses; every opinion is taken into account. Well, not every opinion, sure, but infinitely more than in the French process. It takes ages before a decision can be made. But once a decision is made, everybody is on board, and they all go like one man in the same direction ... because they all understand what the goal is and the great majority have agreed on how to achieve it. The French interpret this behaviour of everybody pulling in the same direction as "the Germans obey commands and refuse to criticise", "the Germans are inflexible" and even ... "German society is very hierarchical". But the French forget the very long decision process the Germans go through before becoming operational: if they all pull in the same direction like one man, it is because they have been involved in the decision process.

What happened in the French interpretation of the German working culture? The French do not look at the decision process, because for them it is nearly nonexistent (Sarkozy ... er ... the boss has decided!), and they look only at the operational process of the Germans, saying "look at these people, no flexibility". Indeed, there is no flexibility in the operational processes, but a lot in the decision process.

And the Germans will hate the French operational flexibility, saying "you cannot rely on them". Sure, in the German way, a consensus has been found, so why debate again? Actually, the flexibility of the French in operations is their way of finding a consensus, of finding the right solution to a problem.

Slowly, we begin to see that, when comparing people, comparing societies ... we cannot find any difference in intention. Everywhere you will observe the same distribution of values, those values so dear to me ... everybody has these values ... but there is a difference in how these values are expressed.

Another example? Away from language and processes? In certain West African cultures, the right hand is used for honourable tasks like eating, greeting, giving or receiving a gift; the left hand is used for subordinate tasks like washing. Greeting someone, even from a distance by waving a hand, cannot be done with the left hand: doing so is considered an offence. The intention of the West European greeting with his left hand (because he or she is carrying something heavy in the right hand) is positive, but the West African may feel offended. The European might be surprised by the reaction of the West African.

The same goes for white, which is associated with death in India and is hence interpreted as a sad colour, the complete opposite of the European perception of white: the European black is the Indian white.

And there are thousands of similar examples where the distance between the objective signal (what is said or done) and the intention behind the signal (what is meant) can lead to misunderstanding. This difference (between what is said and what is meant) is often used within a culture to make people laugh. It then defines the humour of this culture. Furthermore, this set of differences also defines a culture, as each set (the difference between what is said and what is meant) is typical of a region.

Like most of you, I also like to joke about foreigners, about people from other regions. But what are we doing there? We literally translate what they said into our own culture, which creates (for us) a new distance between what is said and what is meant. This new distance makes us laugh. Does it help us understand the people from the other region? No! On the contrary! Because we translate literally, we forget what is meant: this creates a breeding ground for stereotypes that are hard to get rid of. And I still like jokes about foreigners. But they make understanding foreigners much more complex!

It is not only stupidity that is evenly distributed across all levels of society, it is also our core values, like politeness, kindness, appreciation of good work, respect and whatever other "value" one wants to define human beings with. We all share the same values; we only express them differently. We should never translate a sentence literally: a good translation takes the initial intention behind the sentence, evaluates the distance between what is said and what is meant, and translates it into the target culture, building a new distance between what is meant and what is said, the one that expresses the initial intention in the target culture.

What can we do? A travel guide for tourists should certainly give plenty of illustrations of the distance between what is said and what is meant, written with knowledge not only of the target culture but also of the reader's culture. Is such an illustration enough? Probably not. It should also explain why the expressions are different.

So why not collect in a blog all the translations that have led to misunderstandings, to funny situations, to sad or disastrous ones? This will certainly help us understand each other!

Go to the page "translate your culture" on this blog ... and write your short story

6 Jun 2011

Is written communication becoming a trend?

We write more and more, we use less Tipp-Ex and we talk less and less on the phone. Our form of communication is changing: what does this mean for the customer communication of corporations?

But before going further we may want to ask ourselves:

What are the differences between spoken and written communication?
When, and in which situations, do we prefer one mode over the other?

A few speakers at the Mobile Voice conference in January in San Jose noted, as I did, that the number of phone calls has been going down for a few years and that, in parallel, the number of SMSs has been growing: it reached the same level as the number of voice calls in 2008 and was twice as big in 2010. Sure, generations X and Y tend to send more SMSs than the baby boomers or older generations, but the growth rate in the number of SMSs is the same across all generations.

These facts lead to a question: why are people using their voice less and less to communicate, and why are they using SMS more and more? Is it that we do not want to talk anymore? People have the choice. They can place a call or write an SMS. SMS can even be more expensive than voice. Why would more and more people use SMS, an unfriendly means of communication (with a high cognitive load)? What is behind this move from voice communication to written communication?

Perhaps I should begin by listing the properties of both channels: how does the voice channel compare with the written channel?

1) Voice is linear and one-dimensional, and it does not allow going backwards (what you said is out, you cannot delete it, and what you listened to is gone).
2) Voice is synchronous: it does not allow for reflection times longer than a few seconds.
3) Voice is slow: you cannot gain time by skimming, and you will interrupt a long sentence only if you are very sure that the information you are receiving is of no interest at all.
4) Voice is formal: socially, people have to go through a few formalities before getting to the point.
5) Voice is loud, intrusive and not discreet. Everybody around you can listen to what you say.
6) The only record you have of what you have listened to is your interpretation of what was said.
7) The only record you have of what you have said is what you meant to say, not what you said.
8) Emotion is easily conveyed through the voice channel and carries quite a bit of information.

Now looking at the written form of communication:

1) Writing is non-linear and multi-dimensional: the eye can go forward and backward at any time (you can rephrase your statement, you can reread what you read).
2) Writing is asynchronous: you can take as much time as you want to think about what has been written before answering.
3) Writing is quick: you can skim and come back when you notice something of interest to you.
4) Writing is informal: you can go directly to the point, with no need for a greeting that will be skipped anyway (see previous point).
5) Writing is discreet: it can be received on any portable screen and its content is non-intrusive.
6) What has been written is always available to you: if you are not sure, you can go back and read it again.
7) Likewise, what you have written is always available to you: you can understand from the answer or reaction of the other party that what you wrote was not clear.
8) It is hard to convey emotion in writing.

First, I would like to make a small remark: while you were reading the lists of points above, your eyes went up and down so that you could precisely recall and compare the points one by one. Had I put this comparison in a table (which I would have done if the blog formatting allowed it), I would have increased the multidimensional aspect of written communication and made your comparison easier. The numbering of each point, as I have done it, is another way to add multidimensionality, which means ease of understanding and comparability.

Let’s go further on point 8, that of emotion. Charismatic persons love it. Some others hate it.

Everybody knows that the less precise a statement, the more interpretations of that statement can be (and will be) made. You can make use of this property to understand your counterpart, which good salespeople, for example, love to do. The buyer, on the other side, will try to protect himself by requesting written documentation. It is the typical selling/buying game.

Apart from the selling/buying scenario, looking again at the properties of the voice channel, namely the consequences of its linearity and of discourse interpretation (points 1) + 6) + 7) above), one may notice that as soon as your counterpart is more literate than you, as soon as he or she can express him- or herself more fluently and precisely than you, you will be in a weak position and less able to achieve your goal. So an average person will win half the time and feel weaker half the time.

By contrast, a statement in writing is always available in its original form. By going back to it, one can understand the reaction of the other party and clear up any misunderstanding. The advantage of written communication lies in its ability to keep the whole communication as it was and to leave time between each communication “stroke”.

But the difficulty of conveying emotion over the written channel will challenge charismatic people, who regularly make use of their personal aura.

So when will a person use the voice channel, and when will he or she prefer the written channel?

A charismatic person will always prefer the voice channel. A person who needs or wants to convey emotion will take the voice channel. A person who wants to exchange ideas, who wants to discover something new, will very often take the voice channel. A person who wants to communicate will prefer the voice channel: it allows for high interactivity.

A person who simply wants a question answered, or who is not sure about his or her spoken fluency, will prefer a short written communication.

What can we take from this? We can define two types of communication mode: a primary communication mode and a secondary communication mode.

A person is in a primary communication mode (and why not in a communication mood) when his or her primary goal is to communicate, to discuss, to exchange ideas, to discover. The path (i.e. the communication) is the goal. This mode is chosen when people are looking for interaction, when they want to convey emotions, when they want to be able to interrupt the information flow and be passionate about their idea. It is also chosen when they are exploring and have no clear structure yet for what they are looking for. Interactivity and interruptibility are the main characteristics of the primary communication mode. Voice is the ideal channel for primary communication, as it is quick and personal and carries a lot of information such as emotion, tone and sincerity, but the channel can also be the web, with a flow of pictures that you browse through (impersonal). Typical examples are social, political, sales and philosophical discussions ...

A person is in a secondary communication mode when his or her primary goal is to achieve a result. The path (i.e. the communication) is the means to reach the intended goal. The person uses the communication path to obtain information or to purchase a good. The what takes precedence over the how. The information takes precedence over the means of communication. A search for objectivity underlies this communication mode. Writing is the ideal channel for secondary communication, as it allows for reflection time between turns.

The statistics document the trend towards the secondary communication mode. Customer service at corporations is slowly making the same observation. Too slowly! Corporations should support the written channel much more than they do today. Corporations have to adapt their form of communication to what people are asking for.

If you are asking yourself when and where to personally communicate with your customer,

when and where to automate your customer communication –

contact me any time: dugast@tech2biz.eu.

23 May 2011

What is easier: talking to a dumb person or to an agile one?

Subtitle: The impact of cognitive load on the acceptance of new technologies

I love language technologies. Whenever I can, I try to use them, on the phone, on the web. On the phone, for example, when an automatic system (IVR) asks me for my phone number, I will always pronounce it. I will never key it using the phone’s keyboard. In our family, we were among the first to have a very well featured telephone at home. It contains most of the phone numbers we dial more than once in a nice directory. And sure, it allows for voice dialing.

However, each time my wife wanted to call her father, she keyed the 10 digit old family phone number manually. I made her aware of the nice features of our phone, that her father’s phone number was in the directory. She did not care, she still went through the 10 strokes on the small phone-keyboard to call her father. So I asked why: using the recorded address book, she would need to press only 3 or 4 keys. Or even better, speaking the name, she would have to press only one button. She said she was aware of all this …but she preferred to type in the whole number anyway.

Another day she was dialing her father and at the same time she was explaining to me what she wanted to ask her father. She was dialing and talking to me. I know, women are multi-tasking enabled and men aren’t. But so much multi-tasking?

When I call her father … I use voice dialing.

OK, even my wife would not be able to talk to the phone AND to me simultaneously.

A few days later a friend, who is working as a physicist, explained to me one of the reasons why elderly persons recall much better what happened in their childhood compared to what they experienced a few days ago: The brain-energy needed to recall long term memory is much lower than the energy needed to access short term memory.

What a relief! I felt much better ... in the end, my wife is not so much more multi-tasking enabled than I am: her dad's phone number simply belongs to her long-term memory. She needs nearly no “CPU” to dial it. I tried it with my mother's phone number. It came out of my fingers; I had nothing to do!

When dialing by voice or with the address book, the brain is much more active: the information we get back from the phone, and to which we need to react, is always new, and often it is surprising: “Did it really mix up Müller with Miller? I thought I did not have a Miller in my directory.” We need brain activity; we need much more “CPU” to handle it.

In other words, the cognitive load for Voice Dialling is higher compared to the cognitive load for "finger" dialing.

Is this a specific problem to Voice Dialling?

Let’s take an example from the kitchen. Do you love a really good sauce hollandaise? I do...

For a good sauce hollandaise, you need the correct temperature (between 50°C and 60°C, never, never above 60°C!) and the right speed at which you pour the butter into the sauce (at the beginning very slowly, after 1-2 minutes you can be very quick).

It’s not very difficult, but it needs some attention. And it tastes so much better than the ready-made sauce hollandaise you can buy in every supermarket. And what do people typically do? They buy the ready-made sauce. Their kitchen is the most expensive room in their house, they love cooking (see the increasing number of cooking shows on TV, 43 per week already in Germany!), they will complain about a bad dish in a restaurant ... but they will buy ready-made sauce hollandaise, even if it tastes less creamy and smooth than a handmade one.

This is intrinsic to human beings. Quality sells only if it is a no-brainer. A human being will always prefer a solution with no cognitive load to a solution with cognitive load, even if there is a difference in quality. Actually it is worse: if the user has the perception that the new solution requires a higher cognitive load than the one he knows, the old habit will win. In other words, as soon as some cognitive load is required, customers will seriously consider competing methods with no cognitive load, even if they achieve poorer results.

Apply this to a voice dialing application. Because a speech recognizer does not have a 100% recognition rate, the customer, the speaker, has to be ready to react to any misinterpretation by the recognizer. The customer is anxious about something unpredictable coming soon. He does not yet know when and how, but he knows it is coming. What he is quite sure of is that his cognitive load will jump one level higher at some point in the dialogue. If he knows a way to avoid this, he will go for it. He will choose a competing method requiring less cerebral activity. Conversely, a solution demanding a higher cognitive load will be considered only if the competing methods are perceived as more difficult, e.g. dialing by hand while driving a car.

Apply this to any voice application. It is obvious that the cognitive load required of the user grows with the number of voice interactions needed to get through the voice application. Add to this that, as soon as the customer is surprised by the reaction of the system he is interacting with, his cognitive load jumps one level higher.

In general, the customer will prefer any alternative method with less cognitive load, even if the end-experience is less smooth.

Actually, this is not a surprise. We all know that speech but also written text is subject to various possible interpretations. Even between human beings.

If the person you are talking to is dumb, you have to be very precise and concise in your formulation: the whole cognitive load is on your side.
If the person you are talking to is agile, you can expect a reaction from his side as soon as you are imprecise in your formulation: the cognitive load is shared between both of you.

The higher the cognitive load required to use your voice application, the lower its market reach.

So the big question is how to keep the cognitive load imposed by a voice application as low as possible ... This is a combination of:

Addressing long term memory
Reducing the number of possible surprises
Streamlining
Thinking use case
Mastering processes within and across channels
And quite a bit more … in order to make the machine more intelligent or at least more comprehensive, more agile

Talk with us if you feel your customer communication needs to be improved: dugast@tech2biz.eu

16 May 2011

Voice Control is coming back … will it succeed this time? With or without dialogue? With or without error-recovery mechanism?

The idea of Voice Control (also called Voice Command = VC) is not new.

I recall 1998 or 1999 ... It was at the beginning of the Automatic Speech Recognition market, a time when we dreamed of being able to browse the web using speech. I met several times with a German company in Munich. They wanted to automatically speech-enable the whole web, without any human being in between ... that is, taking the HTML code of a webpage and automatically generating a voice presentation of that webpage for a person who would be on the phone and not in front of his PC.

The idea was straightforward: detect entry fields on a web page and speech-enable them by automatically pronouncing (prompting) the name of the field while at the same time automatically loading the speech recognizer with the vocabulary expected by the field. For example, let’s take an entry field asking for destination airports on the flight-booking webpage of an airline company: the name of this field is “Destination Airport” and it contains all the destination airports and cities of that airline. The voice browser would take the name of that field and ask with a synthesized voice, “please enter the destination airport”. Meanwhile, the recognizer would be loaded with the names of all airports and cities covered by the airline. The user pronounces the name of a city and the voice browser goes to the next entry field. Simple, isn't it? This Munich-based company even developed an extension of HTML for that purpose.
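To make the idea concrete, here is a minimal sketch in Python. It is not the Munich company's system or their HTML extension, only an illustration under assumptions of my own: the entry field is a plain `<select>` element whose `title` attribute holds the field name, and "loading the recognizer" is represented simply by returning the list of expected words. The names `FieldExtractor`, `voice_enable` and `PAGE` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical booking page: one entry field listing the airline's destinations.
PAGE = """
<form>
  <select name="destination" title="Destination Airport">
    <option>Munich</option>
    <option>Paris</option>
    <option>San Jose</option>
  </select>
</form>
"""


class FieldExtractor(HTMLParser):
    """Collect every <select> field with its name and its option texts."""

    def __init__(self):
        super().__init__()
        self.fields = []         # finished fields: {"label": ..., "vocabulary": [...]}
        self._current = None     # field currently being parsed, if any
        self._in_option = False  # True while inside an <option> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            # Assumption: the field name is in "title", falling back to "name".
            label = attrs.get("title") or attrs.get("name") or "value"
            self._current = {"label": label, "vocabulary": []}
        elif tag == "option" and self._current is not None:
            self._in_option = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_option and self._current is not None and text:
            self._current["vocabulary"].append(text)

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag == "select" and self._current is not None:
            self.fields.append(self._current)
            self._current = None
            self._in_option = False


def voice_enable(html):
    """Return, for each entry field, the prompt to speak and the words to expect."""
    parser = FieldExtractor()
    parser.feed(html)
    return [
        {
            "prompt": f"Please enter the {field['label'].lower()}",
            "vocabulary": field["vocabulary"],
        }
        for field in parser.fields
    ]


if __name__ == "__main__":
    for step in voice_enable(PAGE):
        print(step["prompt"], "->", step["vocabulary"])
```

A single field is the easy part. The sketch says nothing about the order in which several fields should be presented or about what to do when the recognizer gets a word wrong.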

Now the thing is that on the booking web page of an airline company, you typically have several forms like that to fill. Date, time, departure and arrival cities at least (and by the way, the time, was it arrival or departure time?). This means the voice browser would need to present each of these forms, one after the other. This also means automatic dialogue definition on the fly. With error recovery strategies at each step … a horror scenario for a “blind” over-the-phone access to the web page. Even today, 12 years later, with all the experience we have on dialogue handling of automated phone systems.

Well you can guess … it did not fly… it could not fly. The company eventually got bought by BurdaDigital … and I lost track of them.

The idea could not fly, for the simple reason that even today we have just enough understanding to build by hand, let’s be positive, a nice human-machine over-the-phone dialogue component. Automatically generating human-machine phone dialogues (as opposed to manually defining them), which was the idea of the Munich company, is still not solved today. And then add the question of presenting the context of a webpage to a caller on the phone, deciding which details of the webpage are interesting for a caller and which are not, and summarizing the interesting part in such a way that the caller stays on the phone ... A lot of work still has to be done here!

Now the idea of the Munich based company could have flown … on the web, not on the phone. On the web, that is, having the web page in front of you: Speech-in, graphic-out. On the web, that is replacing the input keyboard of your internet-connected device with your voice. In 1998 or 1999 the typical internet connection would have a 2 digit Kbit/second rate, far too slow to handle speech. So they could not even try the idea.

12 years later, the internet runs at gigabit rates. Even mobile internet. We have smartphones. And we have many more smartphones than there were PCs at that time. We also have 12 years of experience in speech recognition and dialogue handling. So you may think we should go for it, shouldn't we? The answer is, well, we are cautious. Very cautious!

The best examples are given by 2 big names, Apple and Google.

Apple introduced Voice Control in 2010 on iPhones and iPods. You can use Voice Control to place a call or to control your music library (iPod), in 17 languages plus some variants of some of these languages (e.g. US English being a variant of UK English). No dialogue … No error recovery: in case of error, you pronounce your command again. The decision not to offer error recovery is important: the user tries his luck, and if it does not work, he does not want to get involved in error correction. That, at least, is Apple’s design concept.

Google goes a little bit further on Android phones. They name it Voice Actions, essentially to differentiate it from Apple. In addition to the voice commands Apple offers, with Voice Actions you can dictate a text message (or a mail, or a note to yourself), and it integrates with a map and navigation application. Furthermore, it includes Voice Search, that is, searching the web with your voice instead of typing keywords on a keyboard. Google's Voice Actions understand only US English. No dialogue … error recovery here is simple: either you pronounce your command again, or you select from a list of proposed alternatives, or you correct the entry with the keyboard.

Apple is weaker on the number of command-actions (2 against 5 for Google), but Apple is stronger on the language coverage side (17 against 1 for Google). Neither Apple nor Google offers dialogues. We are still far away from the dream of the Munich company at the end of the last century.

But will the dream of dialoguing out of the blue with any website come true?
The answer is: Yes and no.

Yes! In Apple's and Google's sense ... that it is now possible to automatically generate a speaker-independent, good-quality voice-enabled input field together with a text-enabled entry field.

No! In the sense of dialoguing with any website. At first, if both Apple and Google want to be successful, they will keep it at the command/action level and not introduce dialogues, at least to train the general public to get used to a voice-enabled user interface.

What is the next step for both companies? To enlarge the number of commands/actions and the number of languages covered. Will it be 3 or 5 new commands/actions, will it be 10? No. It will be an infinite number. It will be generic and dynamic. Whatever app a user selects, each entry-field of an app will be keyboard-enabled AND voice-enabled. It will be possible to type or to speak the content of each and every entry-field of an app. This step sounds simple but it is a big step forward … because the recognizer has to be configured automatically, at run time, in the context of the entry-field, with all the extensions human beings expect, so that for an airport destination field users can say, for example, “hum yes. To New York … Kennedy Airport not Newark in New Jersey”.
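To make the airport example a bit more concrete, here is a minimal sketch in Python of what "configuring in the context of the entry-field" could mean once the recognizer has delivered a raw transcript. Everything in it is an assumption made for illustration: the airport vocabulary, the function name `interpret_destination` and the rule that a name directly preceded by "not" is rejected. A real recognizer would work on audio and on much richer grammars; this only shows the idea of a field-specific vocabulary that tolerates hesitations and corrections.

```python
import re

# Hypothetical vocabulary for one "destination airport" entry-field.
# The codes and spoken names are illustrative assumptions, not taken from any real app.
AIRPORT_NAMES = {
    "JFK": ["kennedy", "jfk"],
    "EWR": ["newark"],
    "LGA": ["la guardia", "laguardia"],
}


def interpret_destination(transcript, airport_names=AIRPORT_NAMES):
    """Return the single airport code the speaker accepted, or None if unclear."""
    # Normalise: lower-case, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z ]+", " ", transcript.lower())
    text = re.sub(r"\s+", " ", text).strip()

    accepted = []
    for code, names in airport_names.items():
        for name in names:
            # An airport name directly preceded by "not" counts as rejected.
            match = re.search(r"\b(not )?" + re.escape(name) + r"\b", text)
            if match:
                if match.group(1) is None:
                    accepted.append(code)
                break
    # Only a single unambiguous candidate fills the field.
    return accepted[0] if len(accepted) == 1 else None


print(interpret_destination(
    "hum yes. To New York … Kennedy Airport not Newark in New Jersey"))
```

Fillers such as "hum yes" are simply ignored because they are not part of the field's vocabulary, and an utterance naming two airports without a correction returns nothing, so the app could fall back to the keyboard.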

So, Voice Control is back, no matter how you name it, Voice Command or Voice Actions. It is back in a reduced form. With no dialogue, just as an input mechanism to replace the keyboard. In terms of features, a very simple next step … in terms of technology, a big next step. Customers will soon have 2 input alternatives on their smartphones … both with their inherent problems, either keyboard typos or speech-recognition errors. And then we will see from there. Automatic typo recovery is improving rapidly. Speech-recognition error recovery will combine speech and keyboard input. We will need to experiment and go by trial and error in order to understand which customers prefer to use what, and how.

And what about dialogue handling that oversees and guides users through a complete website? This is somewhat further off. But before taking that step, we need to ask ourselves an important question. Do we really need dialogues? My answer today is no: it is too complex for users. Will we ever need dialogue? If yes, in what form, and what for? Sure, I have my idea. Let's put it this way: the perceived relationship we have with a computer is still a master-slave relationship. Whatever communication model we build, we cannot forget this simple statement.

If you are asking yourself when and where to communicate personally with your customers, and when and where to automate your customer communication, contact me any time: dugast@tech2biz.eu.

17 Mar 2011

The Watson gift for the customer care market.

The timing for Watson could not be better. It is a nice anniversary gift IBM has given itself for the 100 years of its existence. But it is also a very nice gift for the customer care market. Why so?

Perhaps I should first recall the essentials of Watson. Watson is the new computer brain from IBM which can answer questions on different topics, ranging from science, culture and history to economics or even the latest pop music events and their related “chit chat”. The questions to Watson are formulated in natural language, just as human beings would ask them. And Watson also formulates the answer to each question in natural language. Sometimes Watson may say it does not know the answer.
Furthermore, Watson has been trained to play in competition against human beings on the US television quiz show Jeopardy. Jeopardy is the number one quiz show in North America. The best Jeopardy players are respected for their encyclopedic knowledge. Watson being able to understand a question and to give the correct answer to the formulated question, makes it somewhat human.

And IBM wanted to set a milestone, just as it did in 1997 with Deep Blue, the chess computer that won against the world chess champion, Garry Kasparov. Between the 14th and the 16th of February 2011, IBM brought Watson into competition against the best human Jeopardy players ever, Brad Rutter and Ken Jennings. And Watson did win, winning twice as much money as the better of the two human beings. A new milestone for IBM. For computer science. For the definition of human intelligence.

So what technology is behind Watson? It is called DeepQA, for Deep Question and Answer. At a very high level, DeepQA analyses natural-language sentences and extracts the information they contain. For example, from the sentence “Albert Einstein, born on the 14th of March 1879” it can fill its database of birthdates with the entry Albert Einstein and the corresponding date. It can also go further and extract interesting knowledge across the boundaries of sentences that are obviously in the same context. In the following example Watson makes use of temporal calculation, geographical relationships and paraphrasing statistics to relate the Vasco da Gama named in the sentence “on the 27th of May 1498, Vasco da Gama landed in Kappad Beach” to the explorer mentioned in the question “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India”.
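The two examples above can be imitated with a toy sketch in Python. To be clear, this is not how DeepQA works internally; it is a hand-written pattern and a subtraction, under the assumption that the sentence follows exactly the wording quoted above. The names `extract_birthdate` and `anniversary_year` are made up for the illustration.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hand-written pattern for sentences of the form
# "<Name>, born on the <day>th of <Month> <year>" (an illustrative assumption).
BORN = re.compile(
    r"(?P<name>[A-Z][\w.]*(?: [A-Z][\w.]*)+), born on the "
    r"(?P<day>\d{1,2})(?:st|nd|rd|th) of (?P<month>[A-Z][a-z]+) (?P<year>\d{4})")


def extract_birthdate(sentence):
    """Return (name, date) if the sentence matches the pattern, else None."""
    match = BORN.search(sentence)
    if match is None or match.group("month") not in MONTHS:
        return None
    birthday = date(int(match.group("year")),
                    MONTHS[match.group("month")],
                    int(match.group("day")))
    return match.group("name"), birthday


def anniversary_year(celebration_year, nth_anniversary):
    """Temporal calculation: the year of the event being celebrated."""
    return celebration_year - nth_anniversary


birthdates = dict([extract_birthdate(
    "Albert Einstein, born on the 14th of March 1879")])
print(birthdates)

# A 400th anniversary celebrated in 1898 points to an event in 1498,
# the year of the landing stated in the Vasco da Gama sentence.
print(anniversary_year(1898, 400))
```

The real difficulty, of course, is that Watson has to cope with thousands of phrasings nobody wrote a pattern for, which is where the statistics come in.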

Without going into the details of the technology behind it, this allows for 2 interesting activities.

The first one is data mining: Watson analyses text offered to it, categorises this knowledge and collects it into tables, knowledge bases and lists of semantically annotated sentences.
The second one is answering a question: Watson extracts the context from a question and evaluates the best possible fit between the question and Watson’s accumulated knowledge.

So like a human being? Yes, if we reduce a human being’s capacity to accessing their knowledge using simple calculation. The complex reasoning of human beings, their deduction and induction capabilities or reasoning by analogy, is still to be modeled by a computer.

But Watson is good enough to help human beings in decision making: Watson can extract from a huge database the relevant piece of information to support the decision process of human beings, for example on illness diagnosis.

And why is the showing of Watson good for the customer care market?

The essence of customer care is to answer questions. And if Watson can answer open questions on a very wide variety of themes, it certainly can be very efficient in answering customer care questions. What is more, Watson does it automatically.

And automation brings in quality: answer quality, process quality.

Automation brings control: you know exactly what your automated service delivers.
Automation brings repeatability. Same question same answer. As long as it is not changed or improved.
Automation makes it possible to work on increasing quality: one can teach and fine-tune an automated system.

And, as Watson shows, with very high quality!

Just as DeepBlue validated the market for automated chess players, DeepQA will validate the market for virtual assistants. Good news for a market (that of automating customer care) which until now took its time to mature.

14 Mar 2011

Customer Service ... or ... Service for the customer ?

There is customer service - and there is service for the customer, as some people say.
Ok, but what about the following experience. Does that fall into the category “service for the customer”?
I booked a flight online. When finished, I was pleased to immediately receive the booking confirmation by email. I skimmed through it and noticed that something had gone wrong. Something not important for the flight itself, something more of an administrative issue relevant to taxes. Something that, if not done well, may end up next year in a long mail exchange. The billing address was not complete. Probably the software on the web had truncated long entries in the form. Annoying, but something that can be solved rapidly. So I wanted to reply to the incoming mail and ask the airline to replace the billing address with the one the tax authority would accept.

Click on reply, write something like, “with regard to attached booking Nr xyz, please replace the billing address with the following address. Thank you. With my best regards”. At most 2 minutes' work. It ended up with more than 20 minutes of stress, tension, call waiting and real Euro costs.

Why? Because first of all the mail I received was a no-reply mail. It did not contain any contact information, no link to customer service, no phone number. But plenty of links to car rental, hotel booking and all the stuff you can imagine finding at the end of an airline mail. Only the attachment to the mail contained a 14 cent per minute phone number to call. I looked again … no way to contact the airline company by mail.

So I called the 14 cent a minute phone number, landed in a waiting queue and after 15 minutes got an agent who needed some context information like the booking number and my name. Just say “Dugast”, a French name to a German person on the phone and you are good for 20 seconds of spelling. And eventually we got to the reason of my call, to correct the billing address. But he did not want to go for that, so we went for a completely new billing address, starting from scratch with plenty of “Alpha Tango, yes I got it .. sorry … would you mind repeating please …”.

A 20 minute phone call in total with 15 minutes in the waiting line. Their buggy software cost me 2,80 €! Their buggy software cost me 20 minutes of my time. Their buggy software cost me unneeded negative stress!

And … at the end, no apologies! I can live with no apologies from an automatic system. But from an agent who even noted that the form-filling system on the web was going too far in cutting long entries, I would expect some basic natural human behavior. Why do I need personal assistance if it is no more human than speaking with a machine? And, on top of that, when the conversation with the agent is much more complex than using automatic written assistance?

A few days later, I went on the web site to look for online check-in. This is where I noted that this airline company does in fact allow for written communication. Yes they do. With a web form. As always hidden behind some small button. But I was surprised. How could I have overlooked this button when I wanted to change the billing address?

The answer is very simple: I was on the email channel. I was reading an email when I saw the problem. I was not on the web. I naturally wanted to reply to the mail … or to look within the mail for some means of communication. How can I imagine more communication capabilities on the web (which is by definition not communicative) than within a communication channel like email? So they write me an email, they communicate with me, but only one way … They do not allow me to use the channel they opened themselves. “Please do not come back to us, we do not want to know what problems or questions you may have.” This is the message they are sending implicitly.

How could they have done better? From a customer's point of view, it is very simple: they could have kept alive the communication channel they opened themselves. That is: allow for a mail answer when they are sending mails.

If the customer service is afraid of opening a communication channel with no constraints, then the next best solution would have been to display in the mail a link to the web form. As simple as that. Nothing more. The communication is established and kept with no channel disruption. That's the way you provide service for the customer, instead of customer service.

2 Feb 2011

Speech-IVR is dead - Long live speech command ?

Jay Wilpon, Executive Director for Speech Services Research at AT&T stated at the Mobile Voice Conference in San José (California) last week:
“until now we forced speech into where we wanted to have it (i.e. IVR), we are now bringing it to where it belongs to (i.e. mobile)”.

Marcello Typrin, Vice President Products at Yap Inc., a company that provides Voice to Text SMS services, began his talk with the statement “The Phone Call is Dead”, citing among others an article from TechCrunch.

These statements (plus a few others in the same spirit, all made during the Mobile Voice Conference 2011) are based on the fact that the number of calls is steadily going down and that the data traffic on phone lines is higher than the voice traffic and increasing. In 2008, for the first time, the average number of SMS per user reached the average number of voice calls; by the end of 2009 it was twice as high. And the gap is growing.

Did we err for so many years in developing voice-enabled IVR applications? Does this mean we have to stop all IVR voice applications? Certainly not, but we do not need to be obsessed with Voice IVR. As several speakers mentioned, there are good reasons why the number of calls is going down: speech is linear, formal, emotional, slow and implicitly puts time pressure on callers. As long as the caller can keep up the pressure it is OK; as soon as the pressure is reversed (e.g. from the IVR to the caller) the caller will prefer written communication (e.g. SMS, email …). On the other side, speech-enabled apps or voice search on smart phones are more fault tolerant: multi-modality allows for easy error correction. There certainly is more to think and talk about. I will blog more on this during the coming days.

Further, with the Cloud well established, most mobile applications have their recognition engine in the cloud. This means that the results of the recognition engine are directly accessible to speech recognition vendors. The results … and the spoken data. Speech recognition vendors can now access the spoken data, look for recognition errors and even for application design errors, and retrain using that real data. Ilya Bukchteyn from Microsoft-Tellme said they can learn from data 350 times a second. Vlad Sejnoha from Nuance Communications reported 1.2 million transactions per day with their Dragon Dictate and Dragon Search Apps on the iPhone. This huge amount of data can now be used in near real time to train and adapt technologies, to adapt Voice User Interfaces and to immediately see the consequences of corrections made. This is what vendors were always looking for. Vendors will experience exponential growth of their learning curve, which should improve the user experience on mobile applications.

I note that during the conference not much attention was devoted to embedded speech recognition. With smart phones that are now able to handle a powerful recognizer offline (i.e. a large-vocabulary recognizer can now run on the smart phone), the rules of the game have also changed here: the recognizer can be speaker dependent in every respect, e.g. acoustically, vocabulary-wise, semantics-wise, application-wise … just to name a few restrictions that can easily be built in and dramatically improve the usability of spoken commands. I am sure a few things will happen here, as much for TV and for cars as for smart phones.

To summarize, here are the most interesting statements that came out of the conference:

Speech Recognition goes away from IVR, essentially because the volume of voice calls is going down
Speech Recognition goes mobile, essentially with Smart Phones but also in the car (Multi modal application interaction)
Speech Recognition is well accepted for web search on Smart Phones
Business Models will change, but we do not know how
Speech Recognition is better accepted now, essentially because it is used on applications where recognition errors are acceptable/correctable
In terms of technology needs, understanding the meaning of a spoken sentence has to be improved
The Cloud allows for exponential growth of the learning curve (Microsoft has 350 possibilities per second to learn)
