Scaling WebRTC Audio
for Gaming and other Applications
Ross Kukulinski / @rosskukulinski / SpeakIt.io
SpeakIt.io provides a browser-based collaboration environment
for distributed teams and remote employees.
WebRTC is Peer-to-Peer (usually)
Peer-to-Peer is awesome
Until it isn't
Pros: end-to-end encryption, decentralization, reduced server load/complexity
Audio Vocabulary 101
Transcoding
Mixing
Acoustic echo cancellation (AEC)
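To make "mixing" concrete, here is a minimal sketch (an illustration, not SpeakIt's implementation): mixing several 16-bit PCM streams by summing corresponding samples and clamping to the signed 16-bit range.

```javascript
// Mix N PCM16 streams by summing samples and clamping to [-32768, 32767].
// Hard clipping is the simplest overflow strategy; real mixers often use
// limiting or normalization instead.
function mixPcm16(streams) {
  const length = Math.min(...streams.map(s => s.length));
  const out = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (const s of streams) sum += s[i];
    out[i] = Math.max(-32768, Math.min(32767, sum));
  }
  return out;
}

// Two speakers; the last sample pair overflows and must be clamped
const a = Int16Array.from([1000, -2000, 30000]);
const b = Int16Array.from([500, 500, 10000]);
const mixed = mixPcm16([a, b]);
// mixed -> [1500, -1500, 32767]
```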
Fully-Meshed Architecture
Traditional WebRTC p2p conference
No central server (except for signaling)
No central point of failure
Participants can come and go
Con: requires more sophisticated endpoints (each must mix audio itself)
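The scaling problem is easy to quantify. A back-of-the-envelope sketch (the 40 kbps Opus figure is an illustrative assumption): in a full mesh of n peers, each endpoint holds n-1 peer connections and the network carries n(n-1)/2 distinct links.

```javascript
// Per-peer and total cost of a fully-meshed conference of n participants.
function meshStats(n, kbpsPerStream) {
  return {
    totalLinks: (n * (n - 1)) / 2,        // distinct p2p links in the mesh
    connectionsPerPeer: n - 1,            // PeerConnections each client maintains
    uplinkKbpsPerPeer: (n - 1) * kbpsPerStream, // client sends its stream to everyone
  };
}

const small = meshStats(4, 40);  // 4-way call, ~40 kbps audio per stream
// small: { totalLinks: 6, connectionsPerPeer: 3, uplinkKbpsPerPeer: 120 }
const large = meshStats(20, 40); // 20-way call
// large: { totalLinks: 190, connectionsPerPeer: 19, uplinkKbpsPerPeer: 760 }
```

The quadratic link count is why full mesh works fine for a handful of participants and falls over quickly beyond that.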
Larger Fully-Meshed Architecture
Star Mesh: Endpoint as Mixer
Still p2p
Mixing endpoint (A) can't leave
As the number of participants grows, A's CPU and bandwidth requirements increase
Could build a more complex system with multiple supernodes, but that increases
client-side complexity
Also: how do you decide which endpoint is the mixer?
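A quick sketch of why the mixer's load is so lopsided (illustrative, not SpeakIt code): the mixing endpoint A decodes a stream from every other peer and encodes a personalized mix back to each one (excluding that listener's own audio, to avoid echo), while a leaf endpoint only handles a single connection.

```javascript
// Stream counts in a star mesh of n participants with one mixing endpoint.
function starMeshLoad(n) {
  return {
    mixerDecodes: n - 1, // one incoming stream per other participant
    mixerEncodes: n - 1, // one personalized outgoing mix per other participant
    leafStreams: 2,      // each leaf sends 1 stream and receives 1 mix
  };
}

const load = starMeshLoad(6);
// load: { mixerDecodes: 5, mixerEncodes: 5, leafStreams: 2 }
```

The mixer's work grows linearly with n while every leaf stays constant, which is exactly why picking (and losing) the mixer is such a problem.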
Multi-Star
Scales better, but you still have to decide which endpoints act as mixers
Also: more complex if B wants to mute E
Multipoint Control Unit (aka 'Media Server')
Offloads mixing computation
Reduces bandwidth
Density important (how many streams can you mix?)
Con: Central point of failure
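The bandwidth savings fall out of a simple comparison (assuming an illustrative 40 kbps per Opus stream): with an MCU, each client exchanges exactly one stream with the server regardless of conference size, while full mesh scales with n.

```javascript
// Per-client audio bandwidth: fully-meshed conference vs. MCU.
function perClientKbps(n, kbps = 40) {
  return {
    mesh: { up: (n - 1) * kbps, down: (n - 1) * kbps }, // one stream per peer
    mcu:  { up: kbps, down: kbps }, // MCU mixes everyone else into one stream
  };
}

const tenWay = perClientKbps(10);
// tenWay.mesh -> { up: 360, down: 360 }; tenWay.mcu -> { up: 40, down: 40 }
```

The trade: clients stay cheap at any conference size, but all that decoding and mixing now lands on the server, which is why mixer density matters.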
Larger MCU
Offloads mixing computation
Reduces bandwidth
Density important (how many streams can you mix, transcode?)
Con: still a central point of failure
Advantages of MCUs
- Offloads processing from endpoints
- Recording / Transcription
- Re-broadcast (podcasts, live gaming events, etc.)
- Sound Effects / Text-to-Speech / Music
SpeakIt WebSocket Mixing Cluster
SpeakIt PeerConnection Mixing Cluster
So, that's cool. Now what?
Analyze your requirements
Roll your own vs Commercial
Open Source vs Off-the-shelf vs Hosted
Thanks!
Ross Kukulinski
ross at SpeakIt.io
@rosskukulinski