

Presentation for Comprehensive 2

On Github tbekolay / comp2

Biologically inspired methods in speech recognition and synthesis:

Closing the loop

Trevor Bekolay

Centre for Theoretical Neuroscience

Follow along:

$$\arg\min_{u} \; \overbrace{\sum_{n=1}^{N} C(u_n, t_n)}^{\text{Target cost}} + \overbrace{\sum_{n=2}^{N} C(u_{n-1}, u_n)}^{\text{Join cost}}$$
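The minimization above can be sketched as a dynamic program over candidate units. This is a minimal illustration, not the method of any particular synthesizer: the function name `select_units`, the candidate lists, and the absolute-difference cost functions are all assumptions made for the example.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return (total_cost, units) minimizing target cost plus join cost.

    candidates[n] is the list of candidate units for position n.
    """
    # best[n][k] = (cost of the cheapest path ending at candidates[n][k],
    #               index of the predecessor on that path)
    best = [[(target_cost(u, targets[0]), None) for u in candidates[0]]]
    for n in range(1, len(targets)):
        row = []
        for u in candidates[n]:
            # Cheapest predecessor, including the join cost into u.
            cost, prev = min(
                (best[n - 1][j][0] + join_cost(p, u), j)
                for j, p in enumerate(candidates[n - 1])
            )
            row.append((cost + target_cost(u, targets[n]), prev))
        best.append(row)

    # Trace back from the cheapest final unit.
    k = min(range(len(best[-1])), key=lambda i: best[-1][i][0])
    total = best[-1][k][0]
    path = []
    for n in range(len(targets) - 1, -1, -1):
        path.append(candidates[n][k])
        k = best[n][k][1]
    return total, path[::-1]


# Toy example (hypothetical): units and targets are plain numbers,
# and both costs are absolute differences.
targets = [1.0, 2.0, 3.0]
candidates = [[0.0, 1.0], [2.0, 5.0], [3.0, 4.0]]
total, units = select_units(
    targets,
    candidates,
    target_cost=lambda u, t: abs(u - t),
    join_cost=lambda prev, u: abs(prev - u),
)
print(total, units)  # prints: 2.0 [1.0, 2.0, 3.0]
```

The search is exact but its cost grows with the square of the number of candidates per position, which is one reason this approach leans on lots of data and compute.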
Lots of data + lots of compute power


Train an articulatory synthesizer to repeat utterances.


x ⇒ End-effector position ⇒ Auditory features
q ⇒ Joint coordinates ⇒ Control parameters
Control signal: $(q, \dot{q}, \ddot{x}_d) \to u$
Evaluation: $E = \int d(x_d, x)$
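One way to compute the evaluation measure numerically is sketched below. The slide does not specify the distance d, so Euclidean distance between feature vectors is an assumption here, as are uniform sampling in time, the trapezoidal rule, and the function name `trajectory_error`.

```python
import math


def trajectory_error(desired, produced, dt):
    """Approximate E = integral of d(x_d, x) with the trapezoidal rule.

    desired and produced are equal-length sequences of feature vectors
    sampled every dt seconds; d is assumed to be Euclidean distance.
    """
    if len(desired) != len(produced):
        raise ValueError("trajectories must have the same number of samples")
    # Distance between desired and produced features at each time step.
    dists = [math.dist(xd, x) for xd, x in zip(desired, produced)]
    # Trapezoidal rule over uniformly spaced samples.
    return sum(0.5 * (a + b) * dt for a, b in zip(dists, dists[1:]))


# Toy trajectories (hypothetical): the produced one deviates at the middle sample.
desired = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
produced = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(trajectory_error(desired, produced, dt=0.5))  # prints: 0.5
```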

A phenomenological model of the synapse between the inner hair cell and auditory nerve: long-term adaptation with power-law dynamics. Zilany et al. The Journal of the Acoustical Society of America, 126:2390–2412, 2009.

A computational model of filtering, detection, and compression in the cochlea. R. F. Lyon. In Proceedings of IEEE-ICASSP-82, 1282–1285, 1982.

Modeling consonant-vowel coarticulation for articulatory speech synthesis. Birkholz. PLoS ONE, 8(4):e60603, 2013.

Normal Angry Scared

Name  Description                       Min.   Max.  Unit
HX    Horizontal hyoid position          0.0    1.0
HY    Vertical hyoid position           -6.0   -3.4  cm
JX    Horizontal jaw displacement       -0.5    0.0  cm
JA    Jaw angle                         -7.0    0.0  deg
LP    Lip protrusion                    -1.0    1.0
LD    Vertical lip distance             -2.0    4.0  cm
VS    Velum shape                        0.0    1.0
VO    Velic opening                     -0.1    1.0
TCX   Tongue body center X              -3.0    4.0  cm
TCY   Tongue body center Y              -3.0    1.0  cm
TTX   Tongue tip X                       1.5    5.5  cm
TTY   Tongue tip Y                      -3.0    2.5  cm
TBX   Tongue blade X                    -3.0    4.0  cm
TBY   Tongue blade Y                    -3.0    5.0  cm
TRX   Tongue root X                     -4.0    2.0  cm
TRY   Tongue root Y                     -6.0    0.0  cm
TS1   Tongue side elevation 1           -1.4    1.4  cm
TS2   Tongue side elevation 2           -1.4    1.4  cm
TS3   Tongue side elevation 3           -1.4    1.4  cm
TS4   Tongue side elevation 4           -1.4    1.4  cm
MA1   Minimum area tongue back region    0.0    0.3  cm²
MA2   Minimum area tongue tip region     0.0    0.3  cm²
MA3   Minimum area lip region            0.0    0.3  cm²
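A controller that outputs these parameters has to stay inside the listed ranges. The sketch below shows one possible way to clip and rescale values; the dictionary holds only a few rows copied from the ranges above, and the helper names `clip_params` and `normalize` are hypothetical, not part of any synthesizer's API.

```python
# (min, max) ranges copied from a few of the parameters listed above.
PARAM_RANGES = {
    "HY": (-6.0, -3.4),   # vertical hyoid position, cm
    "JA": (-7.0, 0.0),    # jaw angle, deg
    "LD": (-2.0, 4.0),    # vertical lip distance, cm
    "TTX": (1.5, 5.5),    # tongue tip X, cm
}


def clip_params(params):
    """Clamp each parameter to its allowed [min, max] range."""
    return {
        name: min(max(value, PARAM_RANGES[name][0]), PARAM_RANGES[name][1])
        for name, value in params.items()
    }


def normalize(params):
    """Rescale each parameter linearly so its range maps onto [0, 1]."""
    return {
        name: (value - PARAM_RANGES[name][0])
        / (PARAM_RANGES[name][1] - PARAM_RANGES[name][0])
        for name, value in params.items()
    }


clipped = clip_params({"JA": -9.0, "LD": 1.0})
print(clipped)             # prints: {'JA': -7.0, 'LD': 1.0}
print(normalize(clipped))  # prints: {'JA': 0.0, 'LD': 0.5}
```

Normalizing to a common scale is a design choice that keeps parameters with different units (cm, deg, cm²) comparable when they are weighted together in a cost.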

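$$\begin{aligned}
\min_{u} \; & C(u) = u^{T} N u \\
\text{s.t.} \; & J\ddot{q} = \ddot{x}_{\text{ref}} - \dot{J}\dot{q} \\
& \ddot{x}_{\text{ref}} = \ddot{x}_d + K_d(\dot{x}_d - \dot{x}) + K_p(x_d - x)
\end{aligned}$$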
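To make the constrained minimization concrete, here is a deliberately simplified sketch. It assumes the control signal is the joint acceleration itself (u = q̈), a single task dimension, and a diagonal weighting matrix N. Under those assumptions the solution is the weighted minimum-norm form u = N⁻¹Jᵀ(J N⁻¹Jᵀ)⁻¹(ẍ_ref − J̇q̇). The full formulation in Peters & Schaal also involves the system dynamics, which this sketch leaves out, and the function names are hypothetical.

```python
def reference_acceleration(xdd_d, xd_d, x_d, xd, x, kd, kp):
    """PD-corrected task acceleration:
    xdd_ref = xdd_d + Kd (xd_d - xd) + Kp (x_d - x)."""
    return xdd_d + kd * (xd_d - xd) + kp * (x_d - x)


def min_weighted_norm(jac, weights, rhs):
    """Minimize sum_i weights[i] * u[i]**2 subject to sum_i jac[i] * u[i] == rhs.

    Closed form for a scalar task and diagonal N (an assumption of this sketch):
    u = N^-1 J^T (J N^-1 J^T)^-1 rhs.
    """
    denom = sum(j * j / w for j, w in zip(jac, weights))
    if denom == 0.0:
        raise ValueError("Jacobian is zero; the task cannot be controlled here")
    return [(j / w) * rhs / denom for j, w in zip(jac, weights)]


# Toy two-joint example (all numbers hypothetical).
xdd_ref = reference_acceleration(
    xdd_d=0.0, xd_d=0.0, x_d=1.0, xd=0.0, x=0.5, kd=2.0, kp=4.0
)
jdot_qdot = 0.0                       # the J-dot q-dot term, taken as zero here
jac = [1.0, 1.0]                      # 1 x 2 task Jacobian
weights = [1.0, 4.0]                  # diagonal of N: joint 2 is costlier to move
u = min_weighted_norm(jac, weights, xdd_ref - jdot_qdot)
print(u)  # the cheaper joint receives the larger share of the command
```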

Learning to control in operational space. Peters & Schaal. The International Journal of Robotics Research 27(2), 197-212, 2008

Extension 1

Generalize several utterances of the same category

Evaluation: User studies

Learning movement primitives. Schaal et al. Robotics Research, 561–572, 2005.

Learning attractor landscapes for learning motor primitives. Ijspeert et al. NIPS 2003: 1547–1554, 2003.

Extension 2

Classify utterances corresponding to different categories

Extensions of recurrent neural network language model. Mikolov et al. In ICASSP 2011, 2011.

Learning recurrent neural networks with Hessian-free optimization. Martens & Sutskever. ICML '11, 2011.

Thank you

This presentation:

My progress: