Happy Talk
BelfastJS, 10/12/14
Peter Gasston
@stopsatgreen
broken-links.com
We are born to talk. Typing is a barrier to communication. Moviemakers know this.
Amazon : ‘Alexa’
Apple : Siri
Google : Voice Search
Microsoft : Cortana
Google is the clear winner, according to https://www.stonetemple.com/great-knowledge-box-showdown/
55% of teens
41% of adults
use voice search every day*
*maybe
From Google research, but sources not provided. Could mean 'of teens who use voice search, 55% use it every day'. http://googleblog.blogspot.co.uk/2014/10/omg-mobile-voice-survey-reveals-teens.html
10% of Baidu
search queries are by voice
That’s ~500m per day
Character input is hard, plus high rural illiteracy. http://blogs.wsj.com/digits/2014/11/21/baidus-andrew-ng-on-deep-learning-and-innovation-in-silicon-valley/
http://iaminchina.wordpress.com/2010/04/13/crowded-street-in-xian/
Synthesis
Long history of replicating the voice with sound (brazen heads back to ~12th C.), but the first computer systems emerged in the 1960s. In 1961 a Bell Labs computer sang 'Daisy Bell'; coincidentally, Arthur C. Clarke was visiting. Today Stephen Hawking still uses a system with his old voice because it's 'his'.
Chrome/Safari
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say);
Play
SSU Attributes
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
say.lang = 'en-GB';   // BCP 47 language tag
say.pitch = 0.75;     // 0 to 2, default 1
say.rate = 1.5;       // 0.1 to 10, default 1
say.volume = 0.5;     // 0 to 1, default 1
window.speechSynthesis.speak(say);
Play
SpeechSynthesis Methods
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say);
window.speechSynthesis.pause();   // pause(), resume() and cancel() take no arguments
window.speechSynthesis.resume();
window.speechSynthesis.cancel();
Play (Safari)
SpeechSynthesis Attributes
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say); // speak() returns nothing; state lives on speechSynthesis itself
if (window.speechSynthesis.pending) {}
if (window.speechSynthesis.speaking) {}
if (window.speechSynthesis.paused) {}
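Usage sketch (not from the slides): the paused and speaking flags can drive a simple pause/resume toggle. pauseBtn is a placeholder button element.
// Toggle pause/resume based on the current synthesis state
pauseBtn.onclick = function () {
    if (window.speechSynthesis.paused) {
        window.speechSynthesis.resume();
    } else if (window.speechSynthesis.speaking) {
        window.speechSynthesis.pause();
    }
};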
SSU Events
var txt = 'Hello world',
    say = new SpeechSynthesisUtterance(txt);
say.onstart = function () {};
say.onpause = function () {};
say.onresume = function () {};
say.onmark = function () {};     // there's no cancel event; mark and boundary fire as speech progresses
say.onboundary = function () {};
say.onerror = function () {};
say.onend = function () {};
window.speechSynthesis.speak(say);
Play (Safari)
Synthesis As A Service
http://developer.att.com/apis/speech
https://ws.neospeech.com/
https://www.cereproc.com/en/products/cloud
http://www.ivona.com/en/for-business/speech-cloud/
Neospeech
https://tts.neospeech.com/rest_1_1.php?method=ConvertSimple&email=mail@example.com&accountId=abcd1234&loginKey=LoginKey&loginPassword=123abc45de&voice=TTS_PAUL_DB&outputFormat=FORMAT_WAV&sampleRate=16&text=Hello+Belfast+JS
<response conversionNumber="28" resultCode="0" resultString="success" status="Queued" statusCode="1"/>
https://tts.neospeech.com/rest_1_1.php?method=GetConversionStatus&email=mail@example.com&accountId=abcd1234&conversionNumber=28
<response statusCode="1" downloadUrl="https://tts.neospeech.com/audio/a.php/23841309/d44caf624653/result_26.wav" resultCode="0" resultString="success" status="Queued"/>
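Rough sketch (not from the slides) of driving those two requests with XHR. convertUrl and statusUrl stand for the ConvertSimple and GetConversionStatus URLs above (the latter without the conversionNumber); audioEl is a placeholder <audio> element.
// Fetch a URL and hand back the parsed XML response
function getXML(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.onload = function () {
        callback(new DOMParser().parseFromString(xhr.responseText, 'text/xml'));
    };
    xhr.send();
}
// Queue the conversion, then ask for its status and play the audio
getXML(convertUrl, function (doc) {
    var conversion = doc.documentElement.getAttribute('conversionNumber');
    // In practice you'd poll until the status is no longer 'Queued'
    getXML(statusUrl + '&conversionNumber=' + conversion, function (status) {
        audioEl.src = status.documentElement.getAttribute('downloadUrl');
    });
});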
SSML
<speak version="1.0" etc>
  <p>
    <s>Hello Belfast.</s>
    <s>This is <prosody rate="-20%">SSML</prosody>.</s>
  </p>
</speak>
Recognition
Developed by Bell Labs in 1952: it could recognise digits spoken by a single person. [Get screengrab / find picture]. In the 1970s Carnegie Mellon's HARPY could recognise ~1,000 words. 1980s: hidden Markov models [Teddy Ruxpin], which chop the waveform into phonemes and attempt to form words.
Challenges
Accents
Multiple users
Multiple languages
Scottish + Siri
Web Speech API
var recog = new SpeechRecognition();
x-browser
var speechRecognition = (
    window.SpeechRecognition ||
    window.webkitSpeechRecognition
);
var recog = new speechRecognition();
SpeechRecognition Methods
var recog = new SpeechRecognition();
recog.start();
recog.stop();
recog.abort();
SpeechRecognition Events
var recog = new SpeechRecognition();
recog.onresult = function () {};
recog.onnomatch = function () {};
recog.onerror = function () {};
SpeechRecognitionError interface for reporting errors.
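Sketch of handling those (not from the slides): the error event's error attribute carries a code such as 'no-speech', 'audio-capture' or 'not-allowed', plus a message.
var recog = new SpeechRecognition();
recog.onerror = function (event) {
    // event.error is the SpeechRecognitionError code
    console.log('Recognition error:', event.error, event.message);
};
recog.onnomatch = function () {
    console.log('Speech was recognised, but with no confident match');
};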
MVS
var recog = new SpeechRecognition();
recog.onresult = function (result) {
    output.textContent = result.results[0][0].transcript;
};
btn.onclick = function () {
    recog.start();
};
SpeechRecognitionEvent, results list
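For example (a sketch, not from the slides): each entry in the results list is itself a list of alternatives, each with a transcript and a confidence score; maxAlternatives asks the recogniser for more than one guess.
var recog = new SpeechRecognition();
recog.maxAlternatives = 3; // up to three guesses per result
recog.onresult = function (event) {
    var result = event.results[0]; // a SpeechRecognitionResult
    for (var i = 0; i < result.length; i++) {
        console.log(result[i].transcript, result[i].confidence);
    }
};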
SpeechRecognition Events
start
audiostart
soundstart
speechstart
speechend
soundend
audioend
end
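A sketch (not from the slides) of using the lifecycle events above to drive a listening indicator; indicator is a placeholder element.
var recog = new SpeechRecognition();
recog.onstart = function () {
    indicator.textContent = 'Listening…';
};
recog.onspeechstart = function () {
    indicator.textContent = 'Hearing speech…';
};
recog.onend = function () {
    indicator.textContent = 'Stopped';
};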
Interim Results
var recog = new SpeechRecognition();
recog.interimResults = true;
recog.onresult = function (result) {
    var thisResult = result.results[0],
        transcript = thisResult[0].transcript;
    if (thisResult.isFinal) {
        finalOutput.textContent = transcript;
    } else {
        interimOutput.textContent = transcript;
    }
};
btn.onclick = function () {
    recog.start();
};
Continuous
var recog = new SpeechRecognition(),
    listening = false;
recog.continuous = true;
recog.onresult = function (result) {
    output.textContent = result.results[0][0].transcript;
};
btn.onclick = function () {
    if (listening) {
        recog.stop();
    } else {
        recog.start();
    }
    listening = !listening;
};
SpeechRTC +
Web Speech API
Recognition runs on a Node server when online and in Web Workers when offline. https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
(Picard demo)
Basically just matching a string / regex
JuliusJS
var recog = new Julius();
recog.onrecognition = function (result) {
    console.log(result);
};
Wit.ai : Node API
Web Speech API + text
Direct speech : getUserMedia (gUM), Web Audio API
http://blog.groupbuddies.com/posts/39-tutorial-html-audio-capture-streaming-to-node-js-no-browser-extensions
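For the 'Web Speech API + text' route, a rough sketch (assuming Wit.ai's GET /message endpoint; WIT_TOKEN is a placeholder access token, and the response shape depends on how your Wit.ai app is trained):
var recog = new SpeechRecognition();
recog.onresult = function (event) {
    var query = encodeURIComponent(event.results[0][0].transcript),
        xhr = new XMLHttpRequest();
    xhr.open('GET', 'https://api.wit.ai/message?q=' + query);
    xhr.setRequestHeader('Authorization', 'Bearer ' + WIT_TOKEN);
    xhr.onload = function () {
        console.log(JSON.parse(xhr.responseText)); // intent + entities
    };
    xhr.send();
};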
Wit.ai : Microphone.js
Built on WebRTC. Opinionated: gives you a handful of methods and events, with no fine control. http://localhost/~petergasston/prototypes/mucking-about/wit/
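From memory of the library's docs, usage looks roughly like this; the constructor, handler names and connect() signature here are assumptions, so check the Wit.ai Microphone documentation.
var mic = new Wit.Microphone(document.getElementById('microphone'));
mic.onresult = function (intent, entities) { // assumed handler signature
    console.log(intent, entities);
};
mic.connect('CLIENT_ACCESS_TOKEN'); // placeholder token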