Happy Talk – BelfastJS, 10/12/14





On GitHub: stopsatgreen/belfastjs

Happy Talk

BelfastJS, 10/12/14

Peter Gasston

@stopsatgreen

broken-links.com

We are born to talk. Typing is a barrier to communication. Moviemakers know this.

Amazon : ‘Alexa’

Apple : Siri

Google : Voice Search

Microsoft : Cortana

Google clear winner according to https://www.stonetemple.com/great-knowledge-box-showdown/

55% of teens

41% of adults

use voice search every day*

*maybe

From Google research, but sources not provided. Could be ‘of teens who use, 55% use every day’. http://googleblog.blogspot.co.uk/2014/10/omg-mobile-voice-survey-reveals-teens.html

10% of Baidu

search queries are by voice

That’s ~500m per day

Character input is hard, plus high rural illiteracy. http://blogs.wsj.com/digits/2014/11/21/baidus-andrew-ng-on-deep-learning-and-innovation-in-silicon-valley/ http://iaminchina.wordpress.com/2010/04/13/crowded-street-in-xian/

Synthesis

Long history of replicating voice with sound (Brazen Heads back to ~12th C.) but first systems emerged in 1960s. Bell Labs 1961 sang Daisy Bell, coincidentally Arthur C. Clarke was visiting. Today Stephen Hawking uses system with old voice as it’s ‘his’.

Chrome/Safari

var txt = 'Hello world',
	say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say);

SSU Attributes

var txt = 'Hello world',
	say = new SpeechSynthesisUtterance(txt);
	say.lang = 'en-GB';
	say.pitch = 0.75;
	say.rate = 1.5;
	say.volume = 0.5;
window.speechSynthesis.speak(say);
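
The attribute values above sit inside ranges that the Web Speech API draft defines: pitch 0 to 2, rate 0.1 to 10, volume 0 to 1. A minimal sketch of clamping settings before applying them follows; `clamp` and `clampVoiceSettings` are hypothetical helper names, not part of the API.

```javascript
// Hypothetical helpers, not part of the Web Speech API.
// Ranges follow the Web Speech API draft: pitch 0-2, rate 0.1-10, volume 0-1.
function clamp(value, min, max) {
	return Math.min(max, Math.max(min, value));
}

function clampVoiceSettings(settings) {
	return {
		pitch: clamp(settings.pitch, 0, 2),
		rate: clamp(settings.rate, 0.1, 10),
		volume: clamp(settings.volume, 0, 1)
	};
}

// Browser-only usage, assuming speechSynthesis is available:
// var say = new SpeechSynthesisUtterance('Hello world'),
// 	safe = clampVoiceSettings({ pitch: 0.75, rate: 1.5, volume: 0.5 });
// say.pitch = safe.pitch;
// say.rate = safe.rate;
// say.volume = safe.volume;
// window.speechSynthesis.speak(say);
```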

SpeechSynthesis Methods

var txt = 'Hello world',
	say = new SpeechSynthesisUtterance(txt);
window.speechSynthesis.speak(say);
// pause(), resume() and cancel() take no arguments
window.speechSynthesis.pause();
window.speechSynthesis.resume();
window.speechSynthesis.cancel();
Play (Safari)

SpeechSynthesis Attributes

var txt = 'Hello world',
	say = new SpeechSynthesisUtterance(txt),
	synth = window.speechSynthesis;
// speak() returns nothing; the state attributes live on speechSynthesis itself
synth.speak(say);
if (synth.pending) {}
if (synth.speaking) {}
if (synth.paused) {}
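
One way to use the three attributes together is to fold them into a single label. This is only a sketch: `describeSynthState` and the label strings are made up for illustration, and `paused` is checked first on the assumption that a paused synthesiser can still report `speaking` as true.

```javascript
// Hypothetical helper: turn the three boolean attributes into one label.
// `synth` is anything with pending / speaking / paused, e.g. window.speechSynthesis.
function describeSynthState(synth) {
	if (synth.paused) {
		return 'paused';
	}
	if (synth.speaking) {
		return 'speaking';
	}
	if (synth.pending) {
		return 'pending';
	}
	return 'idle';
}

// Browser-only usage:
// console.log(describeSynthState(window.speechSynthesis));
```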

SSU Events

var txt = 'Hello world',
	say = new SpeechSynthesisUtterance(txt);
say.onstart = function () {};
say.onpause = function () {};
say.onresume = function () {};
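// there is no cancel event on an utterance; the remaining events are boundary and mark
say.onboundary = function () {};
say.onmark = function () {};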
say.onerror = function () {};
say.onend = function () {};
window.speechSynthesis.speak(say);
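
The end and error events are enough to run code once speech has finished. A rough sketch with a Node-style callback follows; `speakThen` is a hypothetical name, and the synthesiser and utterance constructor are passed in as arguments so the function can also be run against fakes outside a browser.

```javascript
// Hypothetical wrapper: calls done(null, text) when speech ends, done(event) on error.
// `synth` is e.g. window.speechSynthesis, `Utterance` is e.g. SpeechSynthesisUtterance.
function speakThen(synth, Utterance, text, done) {
	var say = new Utterance(text);
	say.onend = function () {
		done(null, text);
	};
	say.onerror = function (event) {
		done(event);
	};
	synth.speak(say);
	return say;
}

// Browser-only usage:
// speakThen(window.speechSynthesis, SpeechSynthesisUtterance, 'Hello world',
// 	function (err) {
// 		if (!err) { console.log('Finished speaking'); }
// 	});
```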
Play (Safari)

Synthesis As A Service

http://developer.att.com/apis/speech https://ws.neospeech.com/ https://www.cereproc.com/en/products/cloud http://www.ivona.com/en/for-business/speech-cloud/

Neospeech

https://tts.neospeech.com/rest_1_1.php?method=ConvertSimple&email=mail@example.com&accountId=abcd1234&loginKey=LoginKey&loginPassword=123abc45de&voice=TTS_PAUL_DB&outputFormat=FORMAT_WAV&sampleRate=16&text=Hello+Belfast+JS
<response conversionNumber="28" resultCode="0" resultString="success" status="Queued" statusCode="1"/>
https://tts.neospeech.com/rest_1_1.php?method=GetConversionStatus&email=mail@example.com&accountId=abcd1234&conversionNumber=28
<response statusCode="1" downloadUrl="https://tts.neospeech.com/audio/a.php/23841309/d44caf624653/result_26.wav" resultCode="0" resultString="success" status="Queued"/>
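
The request and response above could be handled from JavaScript roughly as follows. The base URL and parameter names are copied from the example request; `buildQuery`, `buildConvertSimpleUrl` and `parseResponse` are hypothetical helpers, and the regex parsing assumes the response stays a single element with double-quoted attributes.

```javascript
// Base URL and parameter names are taken from the example request above.
var NEOSPEECH_URL = 'https://tts.neospeech.com/rest_1_1.php';

// Hypothetical helper: encode an object as a query string.
function buildQuery(params) {
	return Object.keys(params).map(function (key) {
		return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
	}).join('&');
}

// Hypothetical helper: build a ConvertSimple request URL.
// `account` holds email, accountId, loginKey and loginPassword.
function buildConvertSimpleUrl(account, text) {
	return NEOSPEECH_URL + '?' + buildQuery({
		method: 'ConvertSimple',
		email: account.email,
		accountId: account.accountId,
		loginKey: account.loginKey,
		loginPassword: account.loginPassword,
		voice: 'TTS_PAUL_DB',
		outputFormat: 'FORMAT_WAV',
		sampleRate: 16,
		text: text
	});
}

// Hypothetical helper: read the attributes of the one-element XML response.
function parseResponse(xml) {
	var attrs = {},
		pattern = /(\w+)="([^"]*)"/g,
		match;
	while ((match = pattern.exec(xml)) !== null) {
		attrs[match[1]] = match[2];
	}
	return attrs;
}
```

The conversionNumber read from the first response is what the GetConversionStatus request then needs.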

(Neospeech demo)


SSML

<speak version="1.0" etc>
  <p>
	<s>Hello Belfast.</s>
	<s>This is <prosody rate="-20%">SSML</prosody>.</s>
  </p>
</speak>
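
Markup like this can be generated from plain strings. The sketch below only builds the paragraph and sentence elements and escapes XML special characters; the `<speak>` root is left to the caller because its attributes are elided above. `escapeXml` and `sentencesToSsml` are hypothetical names.

```javascript
// Hypothetical helper: escape the characters that are special in XML text.
function escapeXml(text) {
	return String(text)
		.replace(/&/g, '&amp;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap each sentence in <s> and the whole set in <p>.
function sentencesToSsml(sentences) {
	var body = sentences.map(function (sentence) {
		return '<s>' + escapeXml(sentence) + '</s>';
	}).join('');
	return '<p>' + body + '</p>';
}

// Usage:
// sentencesToSsml(['Hello Belfast.', 'This is SSML.']);
```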

Recognition

Developed by Bell in 1952. Could recognise numbers spoken by one person. [Get screengrab / find picture]. 1970s Carnegie Mellon HARPY could recognise 1,000 words. 1980s Hidden Markov method [Teddy Ruxpin]. Chops waves into phonemes and attempts to form words.

Challenges

Accents
Multiple users
Multiple languages

Scottish + Siri

Web Speech API

var recog = new SpeechRecognition();

x-browser

var speechRecognition = (
	window.SpeechRecognition ||
	window.webkitSpeechRecognition
);
var recog = new speechRecognition();
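
The same check can be wrapped in a function that returns null when neither constructor exists, so the page can fall back to typed input. A minimal sketch; `getSpeechRecognition` is a hypothetical name, and the global object is passed in so the function also runs outside a browser.

```javascript
// Hypothetical helper: return the available constructor, or null if unsupported.
// `root` is the global object, i.e. window in a browser.
function getSpeechRecognition(root) {
	return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Browser-only usage:
// var Recognition = getSpeechRecognition(window);
// if (Recognition) {
// 	var recog = new Recognition();
// } else {
// 	// no speech recognition: keep the text input
// }
```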

SpeechRecognition Methods

var recog = new SpeechRecognition();
recog.start();
recog.stop();
recog.abort();

SpeechRecognition Events

var recog = new SpeechRecognition();
recog.onresult = function () {};
recog.onnomatch = function () {};
recog.onerror = function () {};
SpeechRecognitionError interface for reporting errors.
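
The error event carries an `error` code string. A sketch of turning a few of those codes into user-facing text follows: the codes shown are from the Web Speech API draft, while the messages and the names `recognitionErrors` and `describeRecognitionError` are illustrative.

```javascript
// Codes are from the Web Speech API draft; the messages are illustrative.
var recognitionErrors = {
	'no-speech': 'No speech was detected.',
	'audio-capture': 'No microphone could be found.',
	'not-allowed': 'Permission to use the microphone was denied.',
	'network': 'A network error occurred.'
};

// Hypothetical helper: map an error event to a message, with a generic fallback.
function describeRecognitionError(event) {
	if (Object.prototype.hasOwnProperty.call(recognitionErrors, event.error)) {
		return recognitionErrors[event.error];
	}
	return 'Recognition error: ' + event.error;
}

// Browser-only usage:
// recog.onerror = function (event) {
// 	output.textContent = describeRecognitionError(event);
// };
```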

MVS

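// output and btn are assumed to be existing DOM elements
var recog = new SpeechRecognition();
recog.onresult = function (result) {
	// results[0][0] is the top alternative of the first result
	output.textContent = result.results[0][0].transcript;
};
// wrap start() in a function so it runs on click, not on assignment
btn.onclick = function () {
	recog.start();
};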
SpeechRecognitionEvent, results list

SpeechRecognition Events

start audiostart soundstart speechstart speechend soundend audioend end

Interim Results

var recog = new SpeechRecognition();
recog.interimResults = true;
recog.onresult = function (result) {
  var thisResult = result.results[0],
	transcript = thisResult[0].transcript;
  if (thisResult.isFinal) {
	finalOutput.textContent = transcript;
  } else {
	interimOutput.textContent = transcript;
  }
};
btn.onclick = function () {
  recog.start();
};
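
When more than one result is in the list, the final and interim parts can be collected in a single pass. A sketch under the assumption that each entry in the results list has an `isFinal` flag and its top alternative at index 0, as in the handler above; `splitTranscripts` is a hypothetical name.

```javascript
// Hypothetical helper: build separate final and interim transcripts.
// `results` is array-like; each entry has isFinal and alternatives at [0], [1], ...
function splitTranscripts(results) {
	var finalText = '',
		interimText = '';
	for (var i = 0; i < results.length; i++) {
		var transcript = results[i][0].transcript;
		if (results[i].isFinal) {
			finalText += transcript;
		} else {
			interimText += transcript;
		}
	}
	return { finalText: finalText, interimText: interimText };
}

// Browser-only usage, assuming finalOutput and interimOutput elements exist:
// recog.onresult = function (event) {
// 	var text = splitTranscripts(event.results);
// 	finalOutput.textContent = text.finalText;
// 	interimOutput.textContent = text.interimText;
// };
```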


Continuous

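var recog = new SpeechRecognition(),
	listening = false;
recog.continuous = true;
recog.onresult = function (result) {
  // in continuous mode results accumulate, so show the most recent one
  var latest = result.results[result.results.length - 1];
  output.textContent = latest[0].transcript;
};
// track state so the button can toggle between start and stop
recog.onstart = function () { listening = true; };
recog.onend = function () { listening = false; };
btn.onclick = function () {
  if (listening) {
	recog.stop();
  } else {
	recog.start();
  }
};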

SpeechRTC + Web Speech API

Node online, Web Workers offline. https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web

(Picard demo)

Basically just matching a string / regex
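
That matching step can be sketched as a list of patterns tested against the transcript. The command names and phrases below are invented for illustration and are not taken from the demo; `matchCommand` is a hypothetical name.

```javascript
// Hypothetical command table; names and patterns are illustrative only.
var commands = [
	{ name: 'tea', pattern: /\btea\b.*\bearl grey\b/i },
	{ name: 'engage', pattern: /\bengage\b/i }
];

// Return the name of the first command whose pattern matches, or null.
function matchCommand(transcript, commandList) {
	for (var i = 0; i < commandList.length; i++) {
		if (commandList[i].pattern.test(transcript)) {
			return commandList[i].name;
		}
	}
	return null;
}

// Browser-only usage:
// recog.onresult = function (event) {
// 	var command = matchCommand(event.results[0][0].transcript, commands);
// };
```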

JuliusJS

var recog = new Julius();
recog.onrecognition = function (result) {
	console.log(result);
};

Wit.ai : Node API

Web Speech API + text
Direct speech: getUserMedia (GuM), Web Audio API
http://blog.groupbuddies.com/posts/39-tutorial-html-audio-capture-streaming-to-node-js-no-browser-extensions

Wit.ai : Microphone.js

WebRTC. Opinionated. Gives you a handful of methods & events, no fine control. http://localhost/~petergasston/prototypes/mucking-about/wit/

Wit.ai : Response

The End
