webrtc-can-be-easy




WebRTC can be easy!

On GitHub: michelle / webrtc-can-be-easy

WebRTC is easy

^

can be
Thanks for the introduction, Chris! I'm really excited to be here today--this is the first time I've given a talk at a web conference, and I was super nervous, so in the days leading up to today I practiced talking to myself a lot. It feels weird at first, but I think I've gotten the hang of it, somewhat. We'll see. Since I'm pretty used to the flow of doing these talks live, I'll answer any questions that come up at the end. So, hopefully everyone can see my slides now. For those following along with their own copy of the slides, you'll hopefully be able to see what slide I'm on in the URL bar. Today, I'll be talking to everyone about a technology called WebRTC: what it is, how it works, and how you can play around with it. If you already know everything about WebRTC, this talk probably won't be too exciting for you. A lot of people I've talked to before have mentioned finding WebRTC intimidating--in particular, it's a pretty frustrating technology to develop with, and I agree with that on many counts. However, my goal today is to show you all that WebRTC can be easy.
A little bit about myself, first: I'm a product engineer at a company called Stripe. We build an API and related tools to help developers and businesses process all sorts of payments, from credit card payments to ACH, Alipay to ACH. We're located...
We're located in sunny San Francisco, California, and I'm currently in one of our conference rooms. This one's called Zebra, as you can probably tell from the pictures behind me.
Now, we don't actually use WebRTC here at Stripe, but we do use Google Hangouts for our ultra-time-sensitive communications. I don't know if you've ever experienced this, but oftentimes, Google Hangouts will be pretty laggy. Chats will appear out of order momentarily, and often take more than 5 seconds to be delivered. I wish I had a dime for every time I got this message. So, we could just turn around and talk to each other face-to-face, but sometimes I find it hard to express myself fully without being able to use emoji. Jokes about in-person communication aside, the way Hangouts works right now doesn't make too much sense when I'm talking to a person who's sitting 20 feet away from me. My message has to travel from my computer here in San Francisco, to Stripe's router, all the way to some..
..random data center in The Dalles, Oregon.
That's over 600 miles (or 1000 km) away! My message not only takes the time to physically travel across the wire to Oregon,
but once it's there, I no longer have control over what I sent! It's out there. On some random lurker's (or Google's) server!
And after all that, it still needs to make that 600 mile/1000 km journey back to Stripe, where my message will end up 20 feet from where it started.
So, let's see if we can't eliminate some parts of this process. Here's the lifecycle of a Google Hangouts chat message. The chicken emoji represents a chat message that I commonly send, and you can see some third-party servers up top.

Why not this?

Well, it's hard.

Why can't we just send packets directly to each other's networks, if we know our IP addresses? Well, it's hard.

NAT and firewalls

Some of you probably know much more than I do about NAT, but I'll explain it briefly. NAT is most simply a sort of IP mask. It stands for network address translation, and it turns your internal IP address (192.168.something.something) into a public IP address. This means that if two browsers want to talk to each other, they have to figure out (a) if one or both are behind NAT and (b) what each other's actual IPs are.
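To make the "IP mask" idea a little more concrete, here's a toy model of a NAT. It's only a sketch: `isPrivateIPv4` and `ToyNat` are names made up for this example, the addresses are documentation addresses, and real NATs track a lot more (protocol, timeouts, the remote endpoint).

```javascript
// Toy model of a NAT: many private addresses share one public address.
// Illustrative only; ToyNat and isPrivateIPv4 are hypothetical helpers.

// True for the private IPv4 ranges 10.x.x.x, 172.16-31.x.x and 192.168.x.x.
function isPrivateIPv4(address) {
  const parts = address.split('.').map(Number);
  const valid = parts.length === 4 && parts.every(function (n) {
    return Number.isInteger(n) && n >= 0 && n <= 255;
  });
  if (!valid) {
    return false;
  }
  const a = parts[0];
  const b = parts[1];
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

class ToyNat {
  constructor(publicIp, firstPort) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.byInternal = new Map();   // "ip:port" -> public port
    this.byPublicPort = new Map(); // public port -> { ip, port }
  }

  // Outbound packet: rewrite the private source to the shared public address.
  outbound(ip, port) {
    const key = ip + ':' + port;
    if (!this.byInternal.has(key)) {
      const publicPort = this.nextPort++;
      this.byInternal.set(key, publicPort);
      this.byPublicPort.set(publicPort, { ip: ip, port: port });
    }
    return { ip: this.publicIp, port: this.byInternal.get(key) };
  }

  // Inbound packet: only deliverable if a device behind the NAT opened the mapping.
  inbound(publicPort) {
    return this.byPublicPort.get(publicPort) || null;
  }
}

const nat = new ToyNat('203.0.113.7', 40000);
const seenFromOutside = nat.outbound('192.168.1.20', 5000);
console.log(isPrivateIPv4('192.168.1.20'), seenFromOutside);
```

The browser only knows its 192.168 address, while the rest of the internet only ever sees the public address and port. That mismatch is what the two peers have to sort out before they can talk directly.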

:(

Additionally, even if we were to get past NAT and make a connection, we'd want our data to go over the wire securely, so I can't steal your chicken. It's best to have a standard for this, rather than rolling your own. Finally, web browsers have historically used TCP as their transport protocol of choice. For many peer-to-peer applications, such as real-time games and video streams, TCP becomes too slow. The good news is that WebRTC has solved these issues for us.

WebRTC is peer-to-peer video, audio, and data in the browser

WebRTC stands for Web Real-Time Communication, and is a set of standardized APIs for peer-to-peer video, audio, and data in the browser. It's pluginless, meaning you can send files and stream media without having to download any apps or browser extensions.

WebRTC is getUserMedia, RTCPeerConnection, and RTCDataChannel

More specifically, it's mainly made up of these three components: getUserMedia, which you heard about at the Web v. Native talk yesterday, RTCPeerConnection, and RTCDataChannel.

NAT traversal

End-to-end security with DTLS

New transport protocols available (UDP and SCTP)

With WebRTC in browsers, there's now built-in support for NAT traversal, built-in end-to-end encryption with DTLS, and new transport protocols available! What's really cool about WebRTC is that it shifts the paradigm of apps in the browser. We're entering a world where the server never has to touch any data. And even beyond that, it's pretty cool that you can now use UDP and SCTP in the browser. UDP is unreliable transport, and is useful for any application where the order and reliability of sending messages doesn't matter as much, such as realtime games or video streaming. SCTP is what data channels run on: it can be reliable and ordered like TCP, but it's message-oriented, and you can relax the ordering and retransmission guarantees when you'd rather have speed.
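Here's a small sketch of why a realtime game can live with unreliable, unordered delivery. It doesn't touch the network at all: `LatestStateReceiver` is a hypothetical class, and the sequence numbers and positions are made up for illustration.

```javascript
// Receiver for realtime state updates (for example, a player's position).
// Each update carries a sequence number; anything older than what has
// already been applied is dropped instead of being waited for.
class LatestStateReceiver {
  constructor() {
    this.lastSeq = -1;
    this.state = null;
    this.dropped = 0;
  }

  receive(packet) {
    if (packet.seq <= this.lastSeq) {
      this.dropped += 1; // stale or duplicate: a newer state is already applied
      return false;
    }
    this.lastSeq = packet.seq;
    this.state = packet.state;
    return true;
  }
}

// Packets 0..4 were sent; packet 2 was lost and packet 1 arrived late.
const receiver = new LatestStateReceiver();
[
  { seq: 0, state: { x: 0 } },
  { seq: 3, state: { x: 30 } },
  { seq: 1, state: { x: 10 } },
  { seq: 4, state: { x: 40 } },
].forEach(function (packet) {
  receiver.receive(packet);
});

// The receiver ends on the newest position and dropped one stale packet.
console.log(receiver.state, receiver.dropped);
```

With a reliable, ordered transport, the receiver would have had to wait for packet 2 to be retransmitted before seeing packets 3 and 4, even though the position in packet 2 is already out of date by then.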

Demo!

Now that your mind is full of all these transport protocols and APIs, let's make them a little less abstract and see a pluginless, WebRTC video chat in action! (NEXT)
http://cdn.peerjs.com/demo/videochat/demo.html I've set up a second laptop at my desk. It's a few hundred feet from this conference room, so I hope the connection is fast, or you'll all laugh at me. Okay, so I'll just open up this tab with the demo here... Just gotta allow access to my mic/video, and we'll be able to sneak a peek at what people are talking about around my desk. Hopefully it's nothing too confidential. Whoo! Usually I do this demo with a person on the other end, to prove it's not prerecorded or something. It didn't work out this time, but trust me, it's not prerecorded! I'm actually directly streaming to and from a computer in the other room, without going through any third-party server. It's pretty nice quality on my end, but because you may be seeing a stream through a stream, I'm not sure how nice it'll seem.

How did that work?

Let's walk through a simple peer-to-peer chat between [blue box] and [purple box].

So, how did that work? Let's walk through an example chat connection between blue box and purple box.
// Purple: I want to have a video chat with my friend!
var purpleConnection = new RTCPeerConnection(...);

          
// I'll create a data channel to relay chat messages with my friend.
var p2bChat = purpleConnection.createDataChannel('CHAT', ...);

          
// and another one for sending files.
var p2bFiles = purpleConnection.createDataChannel('FILES', ...);

          
// I'll create a video/audio stream so we can videochat.
navigator.getUserMedia({audio: true, video: true}, function(stream) {

  // And I'll add that stream to my connection.
  purpleConnection.addStream(stream);

  ...

}, errorHandler);

          
What I'm about to explain is a very simplified version of the raw WebRTC API, so if something seems magical to you right now, I'll probably come back and fill in the blanks later. The pseudo-WebRTC in the next two slides can all run on the client side, in your browser, which is a WebRTC client. I would first create an RTCPeerConnection object with some configuration object. At this point I should decide if I want to have a video call or just a text chat over DataChannel or even a filesharing session over DataChannel. If I want to create a data channel, at this point I would call createDataChannel on my peer connection object. If I want to add a mediastream, I can use `getUserMedia` to access my webcam and microphone. You'll notice that `getUserMedia` sticks out a bit. We'll get back to this in a bit, because it's a pretty cool API on its own. Theoretically, there's no reason I can't both create a data channel and add a media stream, because another cool feature of PeerConnections is that they can multiplex many mediastreams/datachannels.
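The slides use the callback-style calls. In current browsers the same steps are usually written with promises, `navigator.mediaDevices.getUserMedia`, and `addTrack` in place of `addStream`. Here's a rough sketch of that, not a drop-in replacement: `openChannels`, `attachStream`, and `setUpPurple` are helper names invented for this example, and the connection and media devices are passed in so nothing here depends on a particular configuration.

```javascript
// Purple's setup, sketched with the promise-based browser APIs.
// `pc` is expected to be an RTCPeerConnection and `mediaDevices` is expected
// to be navigator.mediaDevices; both are passed in by the caller.

// Create one data channel for chat messages and one for files.
function openChannels(pc) {
  return {
    chat: pc.createDataChannel('CHAT'),
    files: pc.createDataChannel('FILES'),
  };
}

// Add every audio/video track of the stream to the connection.
function attachStream(pc, stream) {
  return stream.getTracks().map(function (track) {
    return pc.addTrack(track, stream);
  });
}

// The whole setup: channels first, then camera and microphone.
async function setUpPurple(pc, mediaDevices) {
  const channels = openChannels(pc);
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });
  attachStream(pc, stream);
  return channels;
}
```

In a browser you would call something like `setUpPurple(purpleConnection, navigator.mediaDevices)` and handle a rejected promise where the slides pass `errorHandler`.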
// I want to talk to Blue, so I'll make Blue an offer to chat.
purpleConnection.createOffer(function(offer) {
          
  // I'll save it locally...
  purpleConnection.setLocalDescription(offer, function() {
          
    // ...and pass it on to Blue.
    magicallySend(offer, blueClient);

    // we'll talk about our magical sending apparatus in a bit.
          
  }, errorHandler);
          
}, errorHandler);
          
So now that I (Purple) have decided who I want to talk to and how I'm going to talk to them, I'll create something called an "offer". The format of the offer is called "SDP", or Session Description Protocol. SDP doesn't actually deliver any media, but rather serves as a way of letting your peer know of your configuration--like the media format or type you want to share, or the transport protocol you're using for your data channel. I record this offer on my peer connection locally using setLocalDescription, then magically send it to Blue. (The offer is an RTCSessionDescription.) Because the offer describes what's on the connection, if you ever add a new stream or change an existing stream on your peer connection, you'll have to go through this negotiation process again.
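If you're curious what's inside an offer, it's plain text. Here's a sketch that pulls the media sections out of one. The SDP body is abridged and made up for illustration (a real one has many more lines), and `listMediaSections` is a hypothetical helper, not part of the WebRTC API.

```javascript
// An abridged, made-up SDP body of the kind carried in offer.sdp.
const exampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=sendrecv',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=sendrecv',
  'm=application 9 DTLS/SCTP 5000',
].join('\r\n');

// Each "m=" line starts a media section: m=<kind> <port> <protocol> <formats...>
function listMediaSections(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter(function (line) {
      return line.startsWith('m=');
    })
    .map(function (line) {
      const fields = line.slice(2).split(' ');
      return { kind: fields[0], protocol: fields[2] };
    });
}

const sections = listMediaSections(exampleSdp);
console.log(sections);
```

In this example the audio and video sections come from the media stream, and the application section is where the data channels live, which is why adding a stream later means sending a fresh description.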
// Blue: I've magically received an offer, and I want to chat.

var blueConnection = new RTCPeerConnection(...);

blueConnection.setRemoteDescription(purpleOffer, function() {
          
  // I'll share my own media, but I only want to share video.
  navigator.getUserMedia({video: true}, function(stream) {
    blueConnection.addStream(stream);
          
    blueConnection.createAnswer(function(answer) {
          
      blueConnection.setLocalDescription(answer, function() {
          
        magicallySend(answer, purpleClient);
          
      }, errorHandler);
          
    }, errorHandler);

  }, errorHandler);

          
}, errorHandler);
          
Blue magically receives Purple's offer, and she decides to answer. Blue is pretty shy, so she's only going to share her video, not her audio.
// Purple: Amazing! I've magically received an answer.
purpleConnection.setRemoteDescription(blueAnswer, function() {

  // At this point, (as far as we care to know,) the connection is
  // established.

}, errorHandler);
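The order of those four set-description calls matters. Here's a toy model of the handshake that only tracks the signalling state and throws if a step comes out of order. `ToyPeer` is a stand-in written for this example and exchanges no real media or SDP; the state names are the ones a real `RTCPeerConnection` reports in `signalingState`.

```javascript
// Allowed transitions, keyed by "<current state>|<local or remote>|<description type>".
const TRANSITIONS = {
  'stable|local|offer': 'have-local-offer',
  'stable|remote|offer': 'have-remote-offer',
  'have-local-offer|remote|answer': 'stable',
  'have-remote-offer|local|answer': 'stable',
};

class ToyPeer {
  constructor(name) {
    this.name = name;
    this.signalingState = 'stable';
    this.localDescription = null;
    this.remoteDescription = null;
  }

  createOffer() {
    return { type: 'offer', sdp: '(session description from ' + this.name + ')' };
  }

  createAnswer() {
    if (this.signalingState !== 'have-remote-offer') {
      throw new Error(this.name + ' has no offer to answer');
    }
    return { type: 'answer', sdp: '(session description from ' + this.name + ')' };
  }

  setLocalDescription(description) {
    this.applyDescription('local', description);
  }

  setRemoteDescription(description) {
    this.applyDescription('remote', description);
  }

  applyDescription(side, description) {
    const key = this.signalingState + '|' + side + '|' + description.type;
    const next = TRANSITIONS[key];
    if (next === undefined) {
      throw new Error(this.name + ': cannot set ' + side + ' ' + description.type +
        ' while ' + this.signalingState);
    }
    this[side + 'Description'] = description;
    this.signalingState = next;
  }
}

const purple = new ToyPeer('purple');
const blue = new ToyPeer('blue');

const offer = purple.createOffer();
purple.setLocalDescription(offer);   // purple: have-local-offer
blue.setRemoteDescription(offer);    // blue: have-remote-offer
const answer = blue.createAnswer();
blue.setLocalDescription(answer);    // blue: stable
purple.setRemoteDescription(answer); // purple: stable

console.log(purple.signalingState, blue.signalingState);
```

Both peers end up back in the stable state, each holding its own description and the other's, which is all the handshake really is.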

Interlude

Remember getUserMedia?

"FaceKat"

Before PeerConnection even existed, though, getUserMedia enabled developers to do many cool things with the camera, including CV hacks that let you track a hand or a face--FaceKat, for example, allows you to navigate through vanilla dipping dots by moving your head. There are also cool photo filter apps, including one that lets you

asciify yourself.

shinydemos.com/getusermedia

(from Opera)

and there's even more you can play around with. http://shinydemos.com/getusermedia/ Definitely try some of these out when you get the chance!

Still pretty simple, theoretically

And cross-platform, cross-browser, etc.

So, offers, answers, local, remote...pretty simple, right? Basically just a simple handshake. And theoretically it's all supposed to just work, cross-browser, cross-platform, and have built-in security and NAT traversal and UDP. But that seems too good to be true. WebRTC is in a much better state than it was a year, or even a few months, ago. But it's still a reality that it's difficult to get a grasp on how the APIs work across browsers, depending on how up to spec the browser implementations are. And even if you force all your users to use Chrome, you have to deal with different versions of the browser...and mobile versions.
iswebrtcreadyyet.com (&yet)
In this really cool browser support table from iswebrtcreadyyet.com, you can see that there's a lot of red and yellow, and these are the parts you really end up pulling your hair out over. They're parts of the API that are not fully up to spec or not interoperable.
iswebrtcreadyyet.com (&yet)
Compared to the browser scorecard from almost a year ago when I first gave this talk, there's not an amazing amount of growth in percentage of the green portions. A lot of the work on WebRTC in recent months has been in nailing down the API and in supporting a broader range of data channel and media stream options.
Similarly, if you try to find info about the WebRTC APIs on MDN, you might first get a page that tells you that it's outdated... and then you'll go to the page they claim to be migrating to, and find this.
I wanted to make this whole process less of a pain, so I created and to this day maintain this library called PeerJS. It's okay-popular, and people use it for some real things and it scares me to death sometimes. I'll come back to this again later, but I wanted to take you through a pretty memorable bug that I encountered last year.
Last year, there was an issue filed on PeerJS, where mobile devices on Chrome 31/32 could not communicate with desktop browsers of the same version. How strange. So I got my hands on an Android device and checked the Chrome flags settings.
In Chrome 31, SCTP transport, the type of transport we wanted to see because that was what was in desktop browsers, was indeed behind a flag, so this was somewhat expected. But even with the flag enabled, I couldn't get a WebRTC connection to succeed. The story here is that I couldn't quickly figure out how to take a screenshot on an Android phone, so I took a picture with my iPhone.
So I searched the equivalent of Stack Overflow for WebRTC: the webrtc-discuss Google group. Sorry, the less searchable equivalent of Stack Overflow. It appears that blue censor bar here knows that it's not supported until 33. I don't know who blue censor bar is, and
I spent a good 5 minutes getting to this page from the last because Google Groups now has Google Plus tipsies hanging around, but blue censor bar seems legit. It's not working in Android. Which means Android is lying to me. Anyone less jaded than I am about WebRTC might hesitate to believe that. But after months of struggling with standards noncompliance, trying to implement WebRTC browser interoperability with two browsers that did not have a complete implementation, "firefoxisms", versions of Firefox only supporting servers specified by IP address, and random breaking changes in both browsers, I was more than willing to believe blue censor bar.
Issue #138 The bug today? It's closed, but I never fixed it or anything. I wrestled with a few hacks to detect whether SCTP was really enabled, but nothing felt satisfying. Eventually I decided that it wasn't worth the time. Chrome for Android would just roll out their fixes soon anyway. And indeed, now we're on like version 40 of Chrome, so it's no longer an issue!

Just a few missing pieces

There are just a few missing pieces from earlier; I'll go over them briefly because they're probably out of the scope of this talk.

Remember the magicallySend(something, someone) function from earlier?

It's a function that sends something to someone via a server.

Surprise! The peers don't just magically know how to call each other. We need what's commonly known as a signalling server to initiate their connection. Alas, something needs to relay the offer and answer. You might be thinking that I've misled you about not needing servers. Well, the configuration information is all the signalling server touches. Once the peer connection is established, the server no longer plays any role in the data transport, with one caveat: if you want to add another stream or data channel on your peer connection, you'll have to renegotiate and go through that offer-answer process again.
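To show how little the signalling server has to do, here's an in-memory stand-in for one. It's a sketch under big assumptions: `ToySignalingHub` and `makeMagicallySend` are invented for this example, and a real deployment would put something like a WebSocket server where the `Map` is.

```javascript
// An in-memory stand-in for a signalling server: it only relays small
// JSON messages (offers, answers, candidates) between registered clients.
class ToySignalingHub {
  constructor() {
    this.clients = new Map(); // client id -> callback for incoming messages
  }

  register(id, onMessage) {
    this.clients.set(id, onMessage);
  }

  send(from, to, payload) {
    const deliver = this.clients.get(to);
    if (!deliver) {
      return false; // nobody with that id is connected
    }
    // Round-trip through JSON, as a network hop would: only plain data crosses.
    deliver({ from: from, payload: JSON.parse(JSON.stringify(payload)) });
    return true;
  }
}

// Builds a magicallySend(something, someone) function for one client.
function makeMagicallySend(hub, from) {
  return function magicallySend(something, someone) {
    return hub.send(from, someone, something);
  };
}

const hub = new ToySignalingHub();
const blueInbox = [];
hub.register('purple', function () {});
hub.register('blue', function (message) {
  blueInbox.push(message);
});

const magicallySend = makeMagicallySend(hub, 'purple');
magicallySend({ type: 'offer', sdp: '(purple session description)' }, 'blue');

console.log(blueInbox);
```

Notice that the hub never sees a chat message, a file, or a video frame. It only carries the descriptions that let the two peers find each other.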

Remember "..." from earlier?

var someConnection = new RTCPeerConnection(...);

var someDataChannel = someConnection.createDataChannel('CHAT', ...);
Now, remember ..., the object that we passed into our RTCPeerConnection constructor, and a separate object that we passed as our createDataChannel options?
// A simple config for an RTCPeerConnection...
var pcDotDotDot2 = {'iceServers': [
  { url: 'stun:stun.l.google.com:19302' },
  { url: 'turn:homeo@turn.bistri.com:80', credential: 'homeo' }
]};

          
// What it looks like for some older versions of some browsers...
var pcDotDotDot1 = {'iceServers': [
  { url: 'stun:23.21.150.121:19302' }
]};

          
// For an unreliable, unordered (UDP-like) data channel on some browsers...
var dcDotDotDot1 = {
  maxRetransmits: 0,
  ordered: false
};

          
// For an unreliable (UDP-like) data channel on older versions of some browsers...
var dcDotDotDot2 = {
  reliable: false
};
          
Even more servers, right? You'll notice that the two servers passed in are a STUN and a TURN server, respectively. Let's talk a bit about what those are.
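The config shapes above are the ones from when this talk was written. Current browsers document the server list with `urls`, plus separate `username` and `credential` fields for TURN. Here's a sketch of a converter, with the caveat that `normalizeIceServer` is a hypothetical helper and that reading the username out of a `turn:user@host` URL is my assumption about the older format, not something a browser does for you.

```javascript
// Convert an older-style ICE server entry ({ url: ... }) into the
// { urls, username, credential } shape. Hypothetical helper for illustration.
function normalizeIceServer(server) {
  const rawUrls = server.urls !== undefined ? server.urls : server.url;
  const urls = Array.isArray(rawUrls) ? rawUrls : [rawUrls];
  const normalized = { urls: [] };
  if (server.username !== undefined) {
    normalized.username = server.username;
  }
  if (server.credential !== undefined) {
    normalized.credential = server.credential;
  }

  urls.forEach(function (url) {
    // Assumed legacy form: "turn:user@host:port" carries the username in the URL.
    const match = /^(turns?):([^@]+)@(.+)$/.exec(url);
    if (match) {
      if (normalized.username === undefined) {
        normalized.username = match[2];
      }
      normalized.urls.push(match[1] + ':' + match[3]);
    } else {
      normalized.urls.push(url);
    }
  });
  return normalized;
}

const legacyConfig = { iceServers: [
  { url: 'stun:stun.l.google.com:19302' },
  { url: 'turn:homeo@turn.bistri.com:80', credential: 'homeo' },
] };

const modernConfig = { iceServers: legacyConfig.iceServers.map(normalizeIceServer) };
console.log(JSON.stringify(modernConfig, null, 2));
```

Keeping the conversion in one small function means the rest of an app can use a single config shape and leave the browser-version differences at the edge.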

STUN, TURN, ICE(, NAT)

you.onicecandidate = function(event) {
  // event.candidate is null once all candidates have been gathered.
  if (event.candidate) {
    magicallySend(event.candidate, client);
  }
};

STUN servers and ICE allow you to connect to most peers behind NAT. TURN servers are the fallback.

And of course, the acronyms you'll probably hear a lot in talks about WebRTC have to do with how to get past NAT: there's STUN (Session Traversal Utilities for NAT), TURN (Traversal Using Relays around NAT), and ICE (Interactive Connectivity Establishment), which is the protocol WebRTC uses and which, in conjunction with a STUN server, facilitates NAT traversal. Third-party STUN servers are lightweight and sit on the public internet. A STUN server allows an application to determine whether it's located behind a NAT: the application sends a message, and the STUN server responds with the IP address and port of the client, as observed from the public internet. The STUN/ICE method's success rate is actually 80%, and in cases where a p2p connection cannot be made (technical term: symmetric NAT), you can specify a TURN server URL, which is basically a last-ditch effort to try to get data to your peer. If no TURN server is specified in those cases, the connection will simply fail. These things really add complexity to the simple WebRTC flow I showed you earlier. Now I'm going to show you a pretty terrifying picture.
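Each ICE candidate is a single line of text, and its `typ` field tells you how the address was found. Here's a sketch that reads that field. The candidate strings are made up with documentation addresses, and `parseCandidate` and `HOW_DISCOVERED` are hypothetical names for this example.

```javascript
// What each candidate type means in terms of STUN and TURN.
const HOW_DISCOVERED = {
  host: 'local address of the device itself',
  srflx: 'public address as seen by a STUN server',
  relay: 'address allocated on a TURN server',
};

// Candidate line layout:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line) {
  const fields = line.replace(/^candidate:/, '').split(' ');
  if (fields.length < 8 || fields[6] !== 'typ') {
    return null;
  }
  return {
    transport: fields[2].toLowerCase(),
    address: fields[4],
    port: Number(fields[5]),
    type: fields[7],
  };
}

const exampleCandidates = [
  'candidate:1 1 udp 2122260223 192.168.1.20 54321 typ host',
  'candidate:2 1 udp 1686052607 203.0.113.7 40000 typ srflx raddr 192.168.1.20 rport 54321',
  'candidate:3 1 udp 41885439 198.51.100.9 3478 typ relay raddr 203.0.113.7 rport 40000',
];

const parsed = exampleCandidates.map(parseCandidate);
parsed.forEach(function (candidate) {
  console.log(candidate.address + ':' + candidate.port, '->', HOW_DISCOVERED[candidate.type]);
});
```

ICE tries pairs of these candidates between the two peers. A relay candidate is the one that still sends your data through a third-party server.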

You may or may not be able to read the title of the slide, but it says "Simple Call Flow". This diagram is supposed to represent the smallest set of events and signalling required to make a peer-to-peer connection, but of course, this includes interactions with STUN and TURN servers, ICE candidate transmission, and renegotiation of offers when the peer adds a new stream, among other things. It's actually really interesting if you want to take the time to dive in.
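One wrinkle hiding in that diagram is that candidates can arrive over signalling before the remote description has been set, and `addIceCandidate` rejects in that case. A common workaround is to hold early candidates and apply them later. Here's a minimal sketch of that idea. `CandidateBuffer` is a hypothetical class, and `pc` is assumed to behave like an `RTCPeerConnection` (a `remoteDescription` property and a promise-returning `addIceCandidate`).

```javascript
// Holds ICE candidates that arrive before the remote description is set.
class CandidateBuffer {
  constructor(pc) {
    this.pc = pc;
    this.pending = [];
  }

  // Call for every candidate received over signalling.
  add(candidate) {
    if (this.pc.remoteDescription) {
      return this.pc.addIceCandidate(candidate);
    }
    this.pending.push(candidate); // too early: keep it for later
    return Promise.resolve();
  }

  // Call right after setRemoteDescription has succeeded.
  flush() {
    const queued = this.pending;
    this.pending = [];
    return Promise.all(queued.map((candidate) => this.pc.addIceCandidate(candidate)));
  }
}
```

You would create one buffer per peer connection, route every incoming candidate through `add`, and call `flush` once the offer or answer from the other side has been applied.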

So simple~

So now, going back to our original flying chicken example, we've transformed this very simple, aspirational peer-to-peer chicken chat...

???

offer
answer
Into something a bit more like this. You've got your offer and answer exchange, and ICE candidate exchange, and like 3 different third-party servers are involved now. Signalling, STUN, TURN...and in the end your chicken might still go through a third-party server if there's a symmetric NAT going on...

But it doesn't have to be that scary.

(WebRTC can be easy, remember?) So at this point you're probably like, "But the first slide of the talk says that WebRTC can be easy! And all you're doing is scaring me!" Well, despite all the scary things I've just shown you, the good news is that you can play around with WebRTC without understanding any of it. There are a few libraries out there that'll make it super easy to prototype quickly with WebRTC. WebRTC is a native browser API like any other, and native browser APIs are often hard to digest without some nice wrappers.
Here are a few that have been around. Definitely look some of these up. As I mentioned earlier, I help maintain PeerJS. And as a maintainer, one thing I find that makes open source WebRTC libraries a bit different from other JS libraries is that they require a bit of background knowledge about the WebRTC APIs, which can understandably seem scary. This makes it so that great developers, like yourselves, don't tend to contribute as much. But now that you've sat through this talk, I'd like to encourage you to try your hand at contributing to some of these libraries! It continues to be a really exciting time for WebRTC, and the more folks we have using and contributing to these libraries, the better they can become for everyone.

"Write a realtime chat app in just 1 line of code!!!"

github.com/michelle/jquery.peer

Time for a final demo! This is a small WebRTC videochat library I made last year called jQuery.peer. I made this jQuery plugin right before I first gave this talk because I wanted to show off just how easy WebRTC can be if we all contributed to these nice little WebRTC wrappers. Back when realtime was all the rage, there were all these demos that were like, "build a realtime chat room in 5 lines of code!" and so since WebRTC is also all about realtime, I felt compelled to one-up them. jquery.peer is built on top of PeerJS, so the scary server components that we previously went over are all taken care of by the PeerJS cloud server. Here on the left, in my HTML, I import PeerJS, jquery.peer, and jQuery. The only element I add to my markup is one with an id of "videochat", which I'll then select in my JavaScript. $('#videochat').peer({id: 'mbu', room: 'jssummit'}); So, I've gone ahead and added my one line of code and I've connected to the jssummit room. I should once again be able to connect to my desk! http://jsbin.com/kutigewogo/1/edit?html,js,output

Join in!

cdn.peerjs.com/jquery.peer

If you want to join in on the demo, you can visit the link above. If you have trouble connecting, it's often due to company firewalls.

Thanks!

@michelle on GitHub

@hazelcough on Twitter

michelle@michellebu.com

That's all! Thanks for listening, everyone, and thanks to Chris/Ari for organizing and keeping track of questions. I'll go through and see if I can answer a few of these now. If I don't get around to yours, or somehow miss it, feel free to tweet at me, or shoot me an email.