The 30-Second WebRTC Guide

(Web technology changes fast! Mind the date this post was written, which was November 2019.)

I get the feeling nobody uses WebRTC in the real world, since all of the tutorials use the same toy examples that don’t involve any actual network connectivity. That’s a shame, because WebRTC makes peer-to-peer communication a cakewalk. Somewhere in our imaginations, there’s a whole category of decentralized web apps, just waiting to get written!

Anyway, this post serves as a quick, practical guide to WebRTC. The first thing to realize is that it’s not just another web API that’s ready to go out of the box—WebRTC requires three distinct services to work its magic. Fortunately, the browser handles much of the communication behind the scenes, so you don’t need to worry about all of the nitty-gritty details.

Figure: A network diagram illustrating the relationships between browsers and the signalling, STUN, and TURN servers in WebRTC. Diagram courtesy of draw.io.

let me = { isInitiatingEnd: () => { ... },
           sendToOtherEnd: (type, data) => { ... } };

To use WebRTC, you need some kind of out-of-band signalling system—in other words, a middleman—to deliver messages between the browsers. This is how they exchange the networking information necessary to negotiate a direct connection. Obviously, if they could deliver it directly, then they would have no need for WebRTC!

The design of the signalling system itself is left entirely up to you. Choose any technology you please—WebSocket, QR code, email, carrier pigeon. As we will see, WebRTC provides the necessary hooks to abstract over the underlying technology.
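
To make that interface concrete, here is a minimal sketch of a signalling pair that lives entirely in memory, which can be handy for experimenting before a real server exists. The name createLoopbackPair is my own invention, not part of WebRTC, and delivery here is synchronous, whereas a real channel (WebSocket or otherwise) would be asynchronous. Messages take a round trip through JSON to mimic what a real wire would do to them.

```javascript
// Hypothetical in-memory signalling pair, for local experiments only.
// A real deployment would replace the direct call with a network hop.
function createLoopbackPair() {
  const ends = [];
  const makeEnd = (index) => ({
    // By convention in this sketch, the first end is the initiator.
    isInitiatingEnd: () => index === 0,
    // Assigned by the application, like me.receiveFromOtherEnd in this post.
    receiveFromOtherEnd: null,
    sendToOtherEnd: (type, data) => {
      // Serialize and parse to mimic the wire; plain SDP and ICE
      // dictionaries survive JSON unchanged.
      const wire = JSON.stringify({ type, data });
      const message = JSON.parse(wire);
      const other = ends[1 - index];
      if (other.receiveFromOtherEnd)
        other.receiveFromOtherEnd(message.type, message.data);
    },
  });
  ends.push(makeEnd(0), makeEnd(1));
  return ends;
}

// Usage: whatever one end sends, the other end's handler receives.
const [me, them] = createLoopbackPair();
them.receiveFromOtherEnd = (type, data) => console.log(type, data);
me.sendToOtherEnd("SDP-OFFER", { type: "offer", sdp: "..." });
```

Swapping in a real transport should only require changing the body of sendToOtherEnd and calling receiveFromOtherEnd from the transport's message handler.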

const STUN_SERVERS = { urls: ["stun:stun.l.google.com:19302"] },
      TURN_SERVERS = { urls: "turn:turn.example.com", username: ..., credential: ... };
let rtc = new RTCPeerConnection({ iceServers: [STUN_SERVERS, TURN_SERVERS]});

If you expect your WebRTC session to traverse different networks, your clients will also need access to a Session Traversal Utilities for NAT (STUN) server. This is a service that informs browsers of their public IP address and port number, which can only be determined from a host on the public Internet. (STUN servers consume very few resources, so there are many that are freely available.)

Sometimes, despite the browsers’ best efforts, the network topology is too restrictive to achieve a direct connection. When this happens, WebRTC can fall back to a Traversal Using Relays around NAT (TURN) server, which is another middleman that can forward network traffic between clients. It’s like your signalling server, except it uses a standardized protocol explicitly designed for high-bandwidth streams. The more clients need such a middleman, the more bandwidth the TURN server will consume; therefore, if you want one, you will most likely need to run your own.
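
If you are curious which of these servers actually contributed to a connection, the candidate lines the browser generates carry a type field: host for a local address, srflx for an address learned through STUN, prflx for one discovered during connectivity checks, and relay for a TURN allocation. Below is a small sketch that pulls the type out of a candidate line. The helper name candidateType and the sample addresses are made up for illustration; in a browser you would feed it event.candidate.candidate from the icecandidate event (browsers also expose the same value as event.candidate.type).

```javascript
// Extracts the candidate type from an ICE candidate line, or null if absent.
// "host"  = local address          "srflx" = address learned via STUN
// "prflx" = found during checks    "relay" = address allocated on a TURN server
function candidateType(candidateLine) {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// Sample line in the usual candidate format; the addresses are invented
// and come from the ranges reserved for documentation.
const sample =
  "candidate:1 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.2 rport 46154";
console.log(candidateType(sample)); // prints "srflx"
```

If you only ever see host candidates, the STUN server is probably unreachable; if a session only works once relay candidates show up, that is the TURN fallback doing its job.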

if (me.isInitiatingEnd())
        rtc.addEventListener("negotiationneeded", async (event) => {
                const sdpOffer = await rtc.createOffer();
                await rtc.setLocalDescription(sdpOffer);
                me.sendToOtherEnd("SDP-OFFER", sdpOffer);
        });
rtc.addEventListener("icecandidate", async (event) => {
        if (event.candidate)
                me.sendToOtherEnd("ICE-CAND", event.candidate);
});

me.receiveFromOtherEnd = async (type, data) => {
        switch (type) {
        case "SDP-OFFER":
                await rtc.setRemoteDescription(data);
                const sdpAnswer = await rtc.createAnswer();
                await rtc.setLocalDescription(sdpAnswer);
                me.sendToOtherEnd("SDP-ANSWER", sdpAnswer);
                break;
        case "SDP-ANSWER":
                await rtc.setRemoteDescription(data);
                break;
        case "ICE-CAND":
                await rtc.addIceCandidate(data);
                break;
        }
};

Okay, this is the big one—here, the browsers use your signalling system to perform a two-phase pairing operation. First, in the Session Description Protocol (SDP) phase, they share information about audio, video, and data streams and their corresponding metadata; second, in the Interactive Connectivity Establishment (ICE) phase, they exchange IP addresses and port numbers and attempt to punch holes in each other’s firewalls.

WebRTC provides the negotiationneeded and icecandidate events to abstract over your signalling system. The RTCPeerConnection object fires these events whenever the browser needs to exchange SDP or ICE information (respectively), which can happen multiple times over the course of a WebRTC session as network conditions change.

Only the side that initiates the connection need be concerned with negotiationneeded. There’s a specific protocol both sides need to follow when responding to these events, or to messages from each other—it’s best to let the code speak for itself.
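
One practical wrinkle, assuming your signalling system does not guarantee ordering: an ICE candidate can show up before the SDP message it belongs to, and addIceCandidate rejects when no remote description has been set yet. The following is a sketch of one way to cope, not something WebRTC itself provides. The wrapper name makeCandidateBuffer is hypothetical; it holds early candidates back and replays them once the remote description is in place.

```javascript
// Hypothetical wrapper around an RTCPeerConnection-like object. Candidates
// that arrive before the remote description are queued instead of applied.
function makeCandidateBuffer(rtc) {
  const pending = [];
  let remoteSet = false;
  return {
    async setRemoteDescription(description) {
      await rtc.setRemoteDescription(description);
      remoteSet = true;
      // Replay the queued candidates in arrival order.
      while (pending.length > 0)
        await rtc.addIceCandidate(pending.shift());
    },
    async addIceCandidate(candidate) {
      if (remoteSet)
        await rtc.addIceCandidate(candidate);
      else
        pending.push(candidate);
    },
  };
}

// In the message handler, the wrapper would stand in for rtc:
//   const buffered = makeCandidateBuffer(rtc);
//   case "SDP-OFFER": await buffered.setRemoteDescription(data); ...
//   case "ICE-CAND":  await buffered.addIceCandidate(data); ...
```

If your signalling channel is ordered and reliable (a single WebSocket, say), you can likely skip this entirely.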

let dataChannel = rtc.createDataChannel("data", { negotiated: true, id: 0 });
dataChannel.addEventListener("open", (event) => { ... });

Finally, set up your media and data streams. (For data channels, you can get away with a negotiated opening, which means the stream is pre-programmed on both ends and doesn’t require another handshake.) Wait for any open events to be fired.
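
Waiting for open reads a little nicer as a promise. Here is a minimal sketch; waitForOpen and its timeout parameter are my own additions rather than anything built into WebRTC. Keep in mind that with a negotiated channel, both ends have to call createDataChannel with the same id, or the channel will never open.

```javascript
// Hypothetical helper: resolves with the channel once it is open,
// or rejects if that does not happen within timeoutMs.
function waitForOpen(channel, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    // The channel may already be open by the time we are called.
    if (channel.readyState === "open") return resolve(channel);
    const timer = setTimeout(
      () => reject(new Error("data channel did not open in time")),
      timeoutMs);
    channel.addEventListener("open", () => {
      clearTimeout(timer);
      resolve(channel);
    }, { once: true });
  });
}

// In the browser, with the negotiated channel from this post on both ends:
//   const dataChannel = rtc.createDataChannel("data", { negotiated: true, id: 0 });
//   await waitForOpen(dataChannel);
//   dataChannel.send("hello");
```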

You’re all done!