Thursday, July 19, 2012

Time for an update!

Well, it's been far too long since I last blogged... I've been buried in several large projects with clients over the past year and have barely looked up.  Now that they've all wrapped up, I've decided to start a new chapter in my career and have accepted an offer to join LumenVox LLC of San Diego as Senior Director of Client Services.

Essentially, I'll be doing much of the same work that I have for the past 8 years in my consulting practice, only now with a great team working with me to support the effort.  My new team includes the technical support staff, the training team and the professional services delivery teams.  Our collective job is to enable LumenVox customers and partners to succeed in their speech recognition-based projects.

LumenVox produces ASR, TTS and Call Progress Analysis engines, along with the supporting tools and services.  You can check them out at http://www.lumenvox.com.

For now, I'll be working part of the time in San Diego at LumenVox's headquarters and part of the time remotely from Seattle, where I live.

If you're planning a trip to New York for SpeechTek 2012, be sure to stop by LumenVox's booth and say "hello." Also check out the program at SpeechTek; I'll be making a couple of presentations.

Stay tuned and I'll try to ramp back up my blogging and tweeting, especially from SpeechTek.

Cheers!

Wednesday, August 17, 2011

Observations from SpeechTek 2011 - New York

I'm just back from SpeechTek, the major industry conference in the Speech Recognition / Text-To-Speech / Voice Biometrics industry. I spent three great days attending sessions, catching up with friends in the industry and seeing the latest offerings from vendors in the space. I've been attending this conference for better than a dozen years now, and it's interesting to see how the industry has evolved and matured during that time. As I flew home to Seattle, I jotted down a few of my thoughts and observations. Three themes seemed to run through the conference:

· Cloud Computing
· Analytics, Analytics & Analytics
· Smart phones (and multi-modal applications)

and each of these themes converged to produce a trend I'd call Adaptive Personalization.

Cloud Computing
I've said it before and it's worth repeating: the clouds are gathering! By that, I mean that the speech recognition industry (and its related applications) is running full speed towards the trend in cloud computing. In fact, I think it may be the vanguard of that advance. So many major customer self-service applications today run in the cloud on platforms like Microsoft's Tellme, Nuance's BeVocal, Voxeo, Angel.com and others that it would be impossible to argue it's not a full-fledged trend. Millions of automated self-service calls (both inbound and outbound) pass through each of these today.

Supporting this growth of cloud computing related to speech recognition is the parallel movement of applications and data to the cloud, driven by the advent of Apple's iPad (and other tablet computers) along with the ever-growing use of smart phones. Both of these devices share a common trait: much of their application smarts, or functionality, comes from cloud-based services and data, using a model in which the device is primarily a presentation layer while the functional work and data storage are largely handled on a cloud-based platform or platforms. Many of these applications are even mashups which aggregate data and services from multiple cloud-based applications. A whole new generation of speech applications is cloud-based, using the cloud for application functionality, speech recognition, voice biometrics and data aggregation from multiple sources. This approach allows for incredibly rich applications with access to large data sets far beyond the limited processing power and storage capabilities of the typical individual smart phone.
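To make that thin-client pattern concrete, here's a minimal sketch of a device-side call to a cloud recognizer. The endpoint URL, request format and response fields are hypothetical, not any particular vendor's API:

```python
import requests  # third-party HTTP library

# Hypothetical cloud ASR endpoint; no specific vendor's API is implied.
ASR_ENDPOINT = "https://asr.example-cloud.com/v1/recognize"

def recognize_utterance(audio_path):
    """Ship a captured utterance to the cloud and return the transcript."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    resp = requests.post(
        ASR_ENDPOINT,
        data=audio,
        headers={"Content-Type": "audio/wav"},
        timeout=10,
    )
    resp.raise_for_status()
    # The device acts purely as a presentation layer: it captures audio
    # and renders whatever the cloud service sends back.
    return resp.json()["transcript"]
```

All of the heavy lifting (acoustic models, language models, compute) lives behind that one HTTP call, which is exactly why such rich applications can run on modest handsets.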

Analytics, Analytics & Analytics
If there was a single buzzword that prevailed at SpeechTek, it was Analytics. The term was so prevalent and so overloaded that it almost lost all meaning (the true sign of a buzzword). Every presentation, every piece of product literature, every vendor booth in the exhibit hall had some reference to analytics. Despite the overuse of the term, it clearly represents a major trend in the industry, and I believe one that offers the potential of significant benefit to the end users of these systems. Perhaps we can look to the web and the evolution of e-commerce for clues to what lies ahead in the speech industry. Analytics has found wide use on the Internet as a tool to understand user behavior and customer needs, and to help companies provide more carefully filtered and tailored information to users.

In reality, I think we saw three distinct applications of analytics (analytics is defined as the science of analysis; a simple and practical definition, however, would be the application of computer technology, operational research and statistics to solve problems): (1) using analytics as a discovery tool in customer service operations to help identify hot spots or problems (such as issues in self-service speech applications or e-commerce web sites), (2) using analytics (and computerized semantic processing) to process data from a variety of channels (Twitter, Facebook, email, blogs, speech-based self-service applications, etc.) to identify trends and customer issues, and (3) using analytics and data from all customer interface modalities (web, smartphone, IVR, call center agents, SMS messages, Twitter, etc.) to model and infer meaning and intent for individual customers. I believe that this third use is the most significant and potentially the most game-changing of the three.
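As a toy illustration of use (1), hot-spot discovery over self-service call logs can be as simple as computing failure rates per dialog state. The log format and threshold below are invented purely for the example:

```python
from collections import Counter

def find_hot_spots(log_records, threshold=0.25):
    """Return dialog states whose failure rate exceeds the threshold."""
    attempts, failures = Counter(), Counter()
    for record in log_records:
        attempts[record["state"]] += 1
        if record["outcome"] in ("no_match", "no_input", "hang_up"):
            failures[record["state"]] += 1
    return {state: failures[state] / attempts[state]
            for state in attempts
            if failures[state] / attempts[state] > threshold}

logs = [
    {"state": "get_account_number", "outcome": "no_match"},
    {"state": "get_account_number", "outcome": "success"},
    {"state": "main_menu", "outcome": "success"},
]
print(find_hot_spots(logs))  # {'get_account_number': 0.5}
```

Real analytics products layer far more sophistication on top (semantic processing, trending, cross-channel joins), but the discovery pattern is the same: instrument every interaction, then let the aggregates point you at the problems.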

Smartphones (and multi-modal applications)
With the rapid growth of smart phones, iPads and similar data/voice-enabled portable devices, we're seeing a new generation of applications emerge. The availability of voice, Internet access and background access to large amounts of data (especially real-time data) has enabled a new generation of mobile applications that are truly multi-modal; that is, they are capable of accepting typed and spoken inputs and delivering visual and audible outputs. This gives users a choice in their preferred communications channel and opens up these devices to more effective and efficient means of delivering complex data, such as lists, which don't lend themselves to audio output. A good example of this type of mixed-mode application is Nuance's "Dragon Go!", which is available on the iPhone and iPad. With this application, you can speak a simple query phrase. The application captures your utterance, ships it off to be processed in "the cloud" using natural language understanding, and returns search results from multiple data sources in visual form. You can get more information about the application from Nuance's web site or Apple's App Store.
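The general shape of that flow (spoken query in, cloud interpretation, results fanned out to several sources, rendered visually) might look something like the sketch below. Every service call here is a hypothetical stub for illustration, not Nuance's actual implementation:

```python
def cloud_nlu(audio):
    """Hypothetical cloud NLU call: audio in, interpreted query text out."""
    return "thai restaurants near pike place"

# Hypothetical back-end sources a cloud service might aggregate.
def search_web(q):   return ["web result for: " + q]
def search_local(q): return ["local listing for: " + q]
def search_media(q): return ["media result for: " + q]

def handle_spoken_query(audio):
    """One utterance in, a merged multi-source result set out."""
    query = cloud_nlu(audio)
    sources = (search_web, search_local, search_media)
    # The device simply renders this aggregate visually.
    return {src.__name__: src(query) for src in sources}

print(handle_spoken_query(b"...captured audio..."))
```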

Adaptive Personalization
The convergence of these three themes (cloud computing, analytics and multi-modal applications) offers us the most compelling trend of all. By combining access to large amounts of data and computing power in the cloud, the "intelligence" that can be gleaned from analytics (which can process information about the user from a variety of sources and channels), and the powerful presentation and input possibilities of multi-modal applications, we can make a leap forward to a "brave new world" where applications understand the context of our actions across multiple channels and products and present us with information, help or services tailored to exactly what we need, exactly when we need it. I'm calling this trend Adaptive Personalization. This kind of personalization goes far beyond the kind of customizing we see today, such as a search query using your location data to constrain the choices presented.

An example of this kind of adaptive personalization might involve a customer of a financial services company or bank who is applying for a loan on the institution's web site and encounters a question or issue not addressed in the online application process. Imagine that they grab their cell phone and call the institution's customer service number for assistance.

When they reach the customer service number, the application identifies the caller from their cell phone's ANI information. Then, rather than presenting them with a natural language question or a deep menu of choices, the application can use analytics to see that their most recent activity was in the loan application process on the web and offer to transfer them directly to a loan specialist who can assist. One early product that supports this capability is Genesys's Conversation Manager.
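In code, the routing decision in that scenario might look like the sketch below. The CRM and analytics lookups are hypothetical stand-ins keyed on the caller's ANI, not any vendor's actual API:

```python
def crm_lookup(ani):
    """Hypothetical CRM lookup: map a caller's ANI to a customer id."""
    return {"+12065550100": "cust-42"}.get(ani)

def recent_activity(customer_id):
    """Hypothetical cross-channel analytics query for the last touchpoint."""
    return {"cust-42": "web_loan_application"}.get(customer_id)

def route_call(ani):
    customer = crm_lookup(ani)
    if customer is None:
        return "main_menu"  # unknown caller gets the generic treatment
    if recent_activity(customer) == "web_loan_application":
        # Analytics already suggests why they're calling,
        # so skip the deep menu entirely.
        return "loan_specialist_queue"
    return "main_menu"

print(route_call("+12065550100"))  # loan_specialist_queue
```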

I don't think it will be too many years before this is commonplace in advanced customer service environments. When melded with information about customer channel preferences and proactive notification, it will completely turn the customer experience inside out, and in a good way.

That's my two cents' worth; let me know what you think, or feel free to add your own ideas and observations in the comments. If you'd like to see my tweets from the conference (and those of the other attendees), search using the tag #SpeechTek.

Monday, May 10, 2010

GM = Google Motors?

Thanks to Dan Miller at Opus Research for bringing this to my attention. I've included a hyperlink back to his original blog post on the story.

After Ford showcased the full spectrum of SYNC services on a sub-$16K Fiesta (even taking Kara Swisher for a test sit), GM appears prepared to counter with a broad variety of wireless mobile apps offered in conjunction with Google. In this article in Motor Trend, Todd Lassa lays out the basics of a relationship whereby the "open" Android operating system would be licensed for use in GM automobiles.

Lassa asserts that the GM/Google relationship would place emphasis on a better phone-to-car interface, as opposed to the voice control and voice user interface that Microsoft’s Speech Application Group has played up. Thus GM’s approach will enable drivers to use their phones to do such things as start or turn off their cars, lock and unlock doors, and make other adjustments. It was not spelled out explicitly in the article, but given Google’s efforts to invoke automated speech recognition whenever a keyboard comes into play on a mobile device, it is highly likely that all of these functions can be voice controlled – making starting your car another “speechable moment”.

As for the supposition that Android in the car spells the end of OnStar, that is highly unlikely. Lassa notes that turn-by-turn directions through OnStar would become unnecessary because Android phones using Google Maps and a special mount have been successfully deployed for in-car navigation. But OnStar has been sold more as a safety feature and remote diagnostic service. The Android operating system in the car is more likely to augment, rather than compete with, OnStar.

The prospect of more automobile-based Android apps is provocative. The car is destined to be the most fertile spawning ground for speech-based apps, and the prospects for Android-oriented developers to define a range of "hands-on-the-wheel/eyes-facing-forward" capabilities and activities are very promising. Meanwhile, Ford remains ahead of the game with a well-defined, and now time-tested, suite of voice control applications for frequent activities like carrying out phone conversations, messaging and controlling the car's entertainment system.

Friday, April 30, 2010

Voice Biometrics Conference Next Week

I'll be in the New York area next week for Opus Research's Voice Biometrics Conference.  It's being held at the Hyatt Regency Jersey City on the Hudson. If you're attending and we've not had a chance to meet in person, feel free to say hello.  It's not too late to register at http://www.voicebiocon.com/.  There is a great lineup of speakers and sessions.

You can follow my comments live from the conference on Twitter at http://twitter.com/jeff_hopper or using the tag #voicebiocon.

Tuesday, March 30, 2010

Nuance Shutters SpinVox Consumer Service

It's been just over 3 months since Nuance acquired SpinVox, saving the company from a death spiral. Since the acquisition, a string of news stories has surfaced about the financial shenanigans that went on at SpinVox prior to Nuance's purchase. It's a shame that SpinVox fell to such lows. They had a great idea and were developing credible technological solutions.

A recent and not too surprising post on SpinVox's website indicates that they will discontinue their consumer offerings, allowing them to focus on their carrier and network operator business.

A Twitter post from SpinVox stated: "We regret to inform you that SpinVox is no longer supporting individual user accounts. Your account will expire in 7 days."

No word yet on what they intend to do with consumer customers of Jott, the Seattle-based competitor of SpinVox that Nuance also purchased.

Wednesday, January 20, 2010

Speaker Authentication using Voice Biometrics - Now's the time!

I'm working on a project for one of my clients who's interested in using voice biometrics to authenticate callers. Voice biometrics uses the unique ways that individuals formulate phonemes to create a voice signature that can be used to validate the person's identity at a later time. It seems that the actual use cases today are fairly narrow: mostly password reset applications, such as those from Nuance Communications, or access to corporate auto-attendants.
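At a very high level, the enroll-then-verify flow works like the sketch below. The "voiceprint" here is a toy byte histogram purely for illustration; real engines build statistical models of how a speaker produces phonemes:

```python
voiceprints = {}  # speaker id -> enrolled voice signature

def extract_features(audio):
    """Toy 'voiceprint': a coarse byte histogram of the audio.
    Real engines model phoneme production statistically."""
    hist = [0] * 16
    for b in audio:
        hist[b % 16] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def similarity(a, b):
    """Toy similarity score in [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / 2.0

def enroll(speaker_id, enrollment_audio):
    voiceprints[speaker_id] = extract_features(enrollment_audio)

def verify(speaker_id, challenge_audio, threshold=0.9):
    """Compare a later utterance against the enrolled signature."""
    enrolled = voiceprints.get(speaker_id)
    if enrolled is None:
        return False
    return similarity(extract_features(challenge_audio), enrolled) >= threshold

enroll("caller-1", b"my voice is my passport")
print(verify("caller-1", b"my voice is my passport"))  # True
```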

Based on the research I've been doing for my client, several recent environmental and economic changes make this a compelling time to investigate integrating voice biometric based authentication into your transactional environment.

Three factors make this so:

  • The advent of so many hosted SaaS (software as a service) offerings from experienced voice services firms like Voxeo, TradeHarbor, BeVocal, Convergys, Authentify, Angel.com, CSIdentity, PhoneFactor and others.

  • Ubiquity of telephones (both land line and cellular) as a transaction end point for authentication across all channels. Even Internet-based transactions can use outbound phone calls to reach a user and authenticate them.

  • The ability to combine speech recognition and voice authentication to achieve true multi-factor authentication, with the correspondingly higher confidence it provides: speech recognition gathers content (something the user knows), while voice authentication confirms identity (something the user is). See the sketch just after this list.
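A minimal sketch of that two-factor check: the same utterance is scored twice, once for its content (the knowledge factor) and once for the speaker's voice (the inherence factor). All three engine calls below are hypothetical stubs:

```python
def recognize(audio):
    """Hypothetical ASR call: return the spoken answer as text."""
    return "blue horizon"

def stored_answer(caller_id):
    """Hypothetical lookup of the caller's enrolled passphrase."""
    return "blue horizon"

def voice_matches(audio, caller_id):
    """Hypothetical voice-biometric verification against the voiceprint."""
    return True

def authenticate(audio, caller_id):
    # Factor 1 (knowledge): does the content of the utterance match?
    knows = recognize(audio) == stored_answer(caller_id)
    # Factor 2 (inherence): is this actually the enrolled speaker?
    is_speaker = voice_matches(audio, caller_id)
    return knows and is_speaker

print(authenticate(b"...utterance audio...", "cust-42"))  # True
```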

The move to SaaS offerings is a real game changer, significantly lowering the barrier to entry by reducing both the cost and the complexity of integrating with existing applications, regardless of the channel. Since it's voice-based, and there is a phone available almost everywhere to use as the authentication end point, there is no need to invest in expensive dedicated hardware like fingerprint scanners or cameras for facial recognition.

When faced with the need for more secure access to transactions across a variety of channels (phone, web, smart phones, etc.), voice-based authentication can provide high-confidence, secure, multi-factor authentication with a lower capital expenditure, less complexity and a quicker time to implementation than any other biometric solution that I've examined.

Friday, July 24, 2009

On the BBC news story bashing SpinVox

I read with interest this morning a BBC story about SpinVox which suggests that the majority of messages on its platform have been heard and transcribed by call center staff in South Africa and the Philippines, rather than being transcribed into text using speech recognition technology.

The article goes on to say that messages appear to have been read by workers outside of the European Union, which raises questions about the firm's data protection policy. SpinVox's entry on the UK Data Protection Register says it does not transfer anything outside the European Economic Area.

Anyone with a working knowledge of the voice mail to text transcription industry (which includes other vendors like Jott Networks and Google Voice) understands that no speech recognition process available today can achieve perfectly accurate automated transcriptions for large volumes of voice mail messages from thousands of different callers, across the wide variety of audio quality typical of phone calls, especially poor cellular connections.

Today, almost everyone working in this space uses a combination of speech recognition technologies and human (read: call center agent) based quality assurance (q/a) to obtain transcriptions of usable quality. The human touch adds two elements: first, it can edit out errors from the automated transcription process; second, the markup data from the human q/a agents can be used to further refine the recognition process.
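A sketch of that hybrid pipeline, assuming a hypothetical ASR engine and agent workflow: low-confidence results are routed to a human, and the corrections are logged as data for refining the recognizer:

```python
corrections = []  # (automated draft, human correction) pairs for retraining

def asr_transcribe(audio):
    """Hypothetical ASR engine call returning (text, confidence)."""
    return ("call me back after too", 0.62)

def human_qa_review(audio, draft):
    """Hypothetical agent workflow: listen to the audio, correct the draft."""
    return "call me back after two"

def transcribe_message(audio, confidence_floor=0.85):
    text, confidence = asr_transcribe(audio)
    if confidence >= confidence_floor:
        return text  # fully automated path
    corrected = human_qa_review(audio, text)
    corrections.append((text, corrected))  # markup data refines the models
    return corrected

print(transcribe_message(b"...voice mail audio..."))  # "call me back after two"
```

The economics of the business are largely a question of where that confidence floor sits: the lower it goes, the cheaper the service, and the worse the transcripts.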

In the rare cases where no human q/a is used before delivering the transcription to the end user, the quality of the transcription almost always suffers. For example, has anyone seen a "great" transcription yet from Google Voice?

Unfortunately, the economic model playing out in this industry forces this q/a work to offshore or third-world call centers.

The BBC story is important in its discussion of the data security issues. So far, none of these services has provided sufficient details about the processes they use to assure data security, and it does appear on the surface that SpinVox may be violating the EU Data Protection Policy that it's committed to. To quote Ross Perot: "The nut's in the detail." We've not yet seen enough detail to know much about the "nut".

I've used Jott, Google Voice and SpinVox myself (in fact, I currently use SpinVox on my cellular voice mail) and I've found them all to be useful, but none to be superbly accurate in their transcriptions. However, the services with human-based q/a have fared much better.

What's your experience been? Are you concerned about the security of your message content when using these services for voicemail transcription? I'd be interested in hearing your comments!