[Front-end real-time audio and video] An overview of WebRTC

In the front-end field, WebRTC is a relatively niche technology, but for online education it is absolutely core. There are many articles about WebRTC on the Internet; this one walks through WebRTC's working process so that readers come away with a complete picture of the technology.

WebRTC (Web Real-Time Communications) is an audio and video technology open-sourced by Google and promoted into the W3C standard. It aims to enable real-time, point-to-point audio and video communication between browsers without an intermediary relaying the media.

The biggest difference from the classic B/S architecture of the Web is that WebRTC clients communicate with each other directly rather than through a server, which saves server resources and improves communication efficiency. To achieve this, a typical WebRTC session goes through four steps: find the other party, negotiate, establish a connection, and communicate. These four steps are described below.

Step 1: find the other party

Although the media does not need to flow through a server, each party must learn of the other's existence before communication can start. This is where a signaling server comes in.

signaling server

The so-called signaling server is the "intermediary" that helps both parties establish a connection. WebRTC does not specify a standard for the signaling server, which means developers can implement it with any technology they like, such as WebSocket or AJAX.

The two ends of a WebRTC communication are called peers, and a successfully established connection is called a PeerConnection. One WebRTC session can contain multiple PeerConnections.

const pc2 = new RTCPeerConnection({...});

In the peer-discovery phase, the signaling server's job is generally to identify and authenticate the participants. The browser connects to the signaling server and sends the information needed for the session, such as the room number and account information; the signaling server then matches it with a peer it can talk to, and the two sides start trying to communicate.

In fact, the signaling server plays a very important role throughout the WebRTC communication process. Besides the functions above, the SDP exchange and the ICE connection also depend on signaling, as we will see later.
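Since WebRTC leaves the signaling format entirely to the developer, here is a minimal sketch of the JSON messages a WebSocket-based signaling channel might carry. All names here (`makeSignal`, the message types, `room-42`) are hypothetical and illustrative, not part of any standard:

```javascript
// Hypothetical message format for a WebSocket signaling channel.
// WebRTC does not standardize any of this; names are illustrative.
function makeSignal(type, roomId, payload) {
  // type: 'join' | 'offer' | 'answer' | 'candidate'
  return JSON.stringify({ type, roomId, payload });
}

// A client joining a room might send something like:
const joinMsg = makeSignal('join', 'room-42', { user: 'alice' });
// {"type":"join","roomId":"room-42","payload":{"user":"alice"}}
```

The same envelope can then carry the offers, answers, and ICE candidates discussed in the following sections.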

Step 2: negotiation

The negotiation process mainly refers to SDP exchange.

SDP protocol

SDP (Session Description Protocol) is a general-purpose protocol whose use is not limited to WebRTC. It is mainly used to describe multimedia sessions, covering session announcement, session invitation, session initialization, and so on.

In WebRTC, SDP is mainly used to describe:

  • Media capabilities supported by the device, including codecs
  • ICE candidate address
  • Streaming media transport protocol

The SDP protocol is text-based and very simple in format: it consists of multiple lines, each of the form:

<type>=<value>

where type is the attribute name and value is the attribute value, whose exact format depends on the type. The following is a typical SDP example:

v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 host.anywhere.com
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 31
a=rtpmap:31 H261/90000
m=video 53000 RTP/AVP 32
a=rtpmap:32 MPV/90000

Where:

  1. v= is the protocol version number
  2. o= is the originator of the session, including the username, session ID, etc.
  3. s= is the session name, a mandatory field
  4. c= is the connection information, including network type, address type, and address
  5. t= is the session time, i.e. the start/end time; 0 0 indicates a persistent session
  6. m= is a media description, including media type, port, transport protocol, and media formats
  7. a= is an additional attribute, used here to extend the media description
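Given the line format above, SDP text can be split mechanically into (type, value) pairs. A minimal sketch of such a parser (illustrative only, not a complete SDP implementation):

```javascript
// Parse SDP's <type>=<value> lines into an ordered list of attributes.
// A sketch for illustration; real SDP parsing has more edge cases.
function parseSdp(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter(line => line.includes('='))   // skip empty lines
    .map(line => ({
      type: line[0],                      // single-letter attribute name
      value: line.slice(2)                // everything after "x="
    }));
}

parseSdp('v=0\nm=audio 49170 RTP/AVP 0');
// → [{ type: 'v', value: '0' }, { type: 'm', value: 'audio 49170 RTP/AVP 0' }]
```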

Plan B vs. Unified Plan

During WebRTC's development, the semantics of SDP changed several times; Plan B and Unified Plan are the most widely used today. Both can represent multiple media streams in one PeerConnection. The difference is:

  • Plan B: all video streams share one m= line and all audio streams share another, distinguished by ssrc
  • Unified Plan: each stream gets its own m= line

The newly released WebRTC 1.0 adopts Unified Plan, which is supported by mainstream browsers and enabled by default. In Chrome, the semantics currently in use can be read through the following API:

// Chrome, where pc is an RTCPeerConnection instance
pc.getConfiguration().sdpSemantics; // 'unified-plan' or 'plan-b'

Negotiation process

The negotiation process is not complicated:

The session initiator creates an offer via createOffer and sends it to the receiver through the signaling server; the receiver calls createAnswer to create an answer and returns it to the initiator, completing the exchange.

// Sender; sendOffer/onReceiveAnswer are pseudo methods
const pc1 = new RTCPeerConnection();
const offer = await pc1.createOffer();
await pc1.setLocalDescription(offer);
sendOffer(offer);

onReceiveAnswer(async (answer) => {
  await pc1.setRemoteDescription(answer);
});

// Receiver; sendAnswer/onReceiveOffer are pseudo methods
const pc2 = new RTCPeerConnection();
onReceiveOffer(async (offer) => {
  await pc2.setRemoteDescription(offer);
  const answer = await pc2.createAnswer();
  await pc2.setLocalDescription(answer);
  sendAnswer(answer);
});

It is worth noting that the SDP exchange may occur many times as the relevant information of both parties changes during the communication process.

Step 3: establish connection

The modern Internet environment is very complex: our devices are usually hidden behind layers of gateways, so to establish a direct connection we must first learn the usable connection addresses on both sides. This process is called NAT traversal; it is mostly carried out with the help of ICE servers, so it is also known as ICE hole punching.

ICE

An ICE (Interactive Connectivity Establishment) server is a third-party server independent of the two communicating parties. Its main function is to obtain usable addresses of the device for the peer-to-peer connection, chiefly via a STUN (Session Traversal Utilities for NAT) server. Each usable address is called an ICE candidate, and the browser selects the most suitable one from the candidates. The candidate types and priorities are as follows:

  1. Host candidate: obtained from the device's network interface, usually an intranet address; highest priority
  2. Server-reflexive candidate: obtained via the STUN server; it is the device's address as seen from the public network. The acquisition process is involved, but can be summarized as: the browser sends several probe requests to the server and deduces its own public address from the responses
  3. Relay candidate: provided by a TURN relay server, used when the first two are not feasible; lowest priority
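The type of each candidate is visible in its raw string form as the token after `typ` ('host', 'srflx' for server-reflexive, 'relay' for TURN). A small sketch of extracting it (the sample candidate strings are illustrative):

```javascript
// Extract the candidate type from a raw ICE candidate string, e.g.
// "candidate:1 1 udp 2122260223 192.168.1.2 56143 typ host"
function candidateType(candidateStr) {
  const match = candidateStr.match(/ typ (\S+)/);
  return match ? match[1] : null;
}

candidateType('candidate:1 1 udp 2122260223 192.168.1.2 56143 typ host');
// → 'host'
```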

When creating a PeerConnection, you can specify the ICE server addresses. Each time WebRTC discovers a usable candidate, an icecandidate event fires; the candidate is then delivered to the remote peer over the signaling channel, and the peer calls the addIceCandidate method to add it to the connection:

const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turnserver.com', username: 'user', credential: 'pass' }
  ] // Configure ICE servers
});
pc.addEventListener('icecandidate', e => {
  // Send the candidate to the remote peer over the signaling channel;
  // the peer then calls addIceCandidate with it
  sendCandidateToPeer(e.candidate); // pseudo method
});

ICE connections established through candidates fall roughly into two cases:

  1. A direct P2P connection, using candidates of types 1 and 2 above;
  2. A relayed connection through a TURN (Traversal Using Relays around NAT) server, using a type-3 candidate.
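The priority ordering above follows the ICE candidate priority formula from RFC 8445: priority = 2^24 · (type preference) + 2^8 · (local preference) + (256 − component ID). A sketch using the commonly used default type-preference values (host 126, srflx 100, relay 0):

```javascript
// Candidate priority formula from RFC 8445. Type-preference values
// below are the commonly used defaults, which is why host candidates
// beat reflexive ones and relay is the last resort.
const TYPE_PREFERENCE = { host: 126, srflx: 100, relay: 0 };

function candidatePriority(type, localPref = 65535, componentId = 1) {
  return (2 ** 24) * TYPE_PREFERENCE[type] +
         (2 ** 8) * localPref +
         (256 - componentId);
}

candidatePriority('host') > candidatePriority('srflx'); // true
```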

Similarly, due to network changes and other reasons, ICE hole punching may also happen multiple times during a session.

Step 4: communicate

WebRTC uses UDP as the underlying transport protocol. Why not the more reliable TCP? There are three main reasons:

  1. UDP is connectionless, with low resource consumption and high speed
  2. Losing a small amount of data in transit has little impact on audio and video
  3. TCP's timeout-and-retransmit mechanism would introduce very noticeable delays

On top of UDP, WebRTC uses two further protocols, RTP and RTCP:

  • RTP (Real-time Transport Protocol): mainly used to transmit data with strict real-time requirements, such as audio and video
  • RTCP (RTP Control Protocol): as the name suggests, mainly used to monitor the quality of data transmission and feed it back to the sender

During actual communication, the two protocols send and receive data at the same time.
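For example, the cumulative loss counters carried in RTCP receiver reports (exposed in the browser through getStats() 'inbound-rtp' entries as packetsLost and packetsReceived) let a sender compute a loss fraction and adapt its bitrate. A minimal sketch of that computation:

```javascript
// Turn RTCP-style cumulative counters into a loss fraction.
// Field names follow the WebRTC getStats() 'inbound-rtp' report.
function lossFraction({ packetsLost, packetsReceived }) {
  const expected = packetsLost + packetsReceived;
  return expected === 0 ? 0 : packetsLost / expected;
}

lossFraction({ packetsLost: 5, packetsReceived: 95 }); // → 0.05
```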

Key APIs

The following demo code shows which APIs front-end WebRTC uses:

HTML

<!DOCTYPE html>
<html>
<head>

    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, user-scalable=yes, initial-scale=1, maximum-scale=1">
    <meta name="mobile-web-app-capable" content="yes">
    <meta id="theme-color" name="theme-color" content="#ffffff">
    <base target="_blank">
    <title>WebRTC</title>
    <link rel="stylesheet" href="main.css"/>
</head>

<body>
<div id="container">
    <video id="localVideo" playsinline autoplay muted></video>
    <video id="remoteVideo" playsinline autoplay></video>

    <div class="box">
        <button id="startButton">Start</button>
        <button id="callButton">Call</button>
    </div>
</div>

<script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>
<script src="main.js" async></script>
</body>
</html>

JS

'use strict';

const startButton = document.getElementById('startButton');
const callButton = document.getElementById('callButton');
callButton.disabled = true;
startButton.addEventListener('click', start);
callButton.addEventListener('click', call);

const localVideo = document.getElementById('localVideo');
const remoteVideo = document.getElementById('remoteVideo');

let localStream;
let pc1;
let pc2;
const offerOptions = {
  offerToReceiveAudio: 1,
  offerToReceiveVideo: 1
};

async function start() {
  /**
   * Get local media stream
   */
  startButton.disabled = true;
  const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
  localVideo.srcObject = stream;
  localStream = stream;
  callButton.disabled = false;
}

function gotRemoteStream(e) {
  if (remoteVideo.srcObject !== e.streams[0]) {
    remoteVideo.srcObject = e.streams[0];
    console.log('pc2 received remote stream');
    setTimeout(() => {
      pc1.getStats(null).then(stats => console.log(stats));
    }, 2000)
  }
}

function getName(pc) {
  return (pc === pc1) ? 'pc1' : 'pc2';
}

function getOtherPc(pc) {
  return (pc === pc1) ? pc2 : pc1;
}

async function call() {
  callButton.disabled = true;
  /**
   * Create call connection
   */
  pc1 = new RTCPeerConnection({
    sdpSemantics: 'unified-plan', // Specify Unified Plan (Chrome-specific option)
    iceServers: [
        { urls: 'stun:stun.l.google.com:19302' },
        { urls: 'turn:turnserver.com', username: 'user', credential: 'pass' }
    ] // Configure ICE servers
  });
  pc1.addEventListener('icecandidate', e => onIceCandidate(pc1, e)); // Listening for ice candidate events

  /**
   * Create answer connection
   */
  pc2 = new RTCPeerConnection();

  pc2.addEventListener('icecandidate', e => onIceCandidate(pc2, e));
  pc2.addEventListener('track', gotRemoteStream);

  /**
   * Add local media stream
   */
  localStream.getTracks().forEach(track => pc1.addTrack(track, localStream));

  /**
   * pc1 createOffer
   */
  const offer = await pc1.createOffer(offerOptions); // Create offer
  await onCreateOfferSuccess(offer);
}

async function onCreateOfferSuccess(desc) {
  /**
   * pc1 Set local sdp
   */
  await pc1.setLocalDescription(desc);

  
  /******* The following takes pc2 as the opposite party to simulate the scenario of receiving an offer*******/

  /**
   * pc2 Set up remote sdp
   */
  await pc2.setRemoteDescription(desc);
  
  /**
   * pc2 createAnswer
   */
  const answer = await pc2.createAnswer(); // Create answer
  await onCreateAnswerSuccess(answer);
}

async function onCreateAnswerSuccess(desc) {
  /**
   * pc2 Set local sdp
   */
  await pc2.setLocalDescription(desc);

  /**
   * pc1 Set up remote sdp
   */
  await pc1.setRemoteDescription(desc);
}

async function onIceCandidate(pc, event) {
  try {
    await (getOtherPc(pc).addIceCandidate(event.candidate)); // Set ice candidate
    onAddIceCandidateSuccess(pc);
  } catch (e) {
    onAddIceCandidateError(pc, e);
  }
  console.log(`${getName(pc)} ICE candidate:\n${event.candidate ? event.candidate.candidate : '(null)'}`);
}

function onAddIceCandidateSuccess(pc) {
  console.log(`${getName(pc)} addIceCandidate success`);
}

function onAddIceCandidateError(pc, error) {
  console.log(`${getName(pc)} failed to add ICE Candidate: ${error.toString()}`);
}

Final words

As an "overview", this article introduces WebRTC at a relatively shallow level; many details and principles are not covered in depth due to space limits. The author has only worked with WebRTC for a few months, so please point out any mistakes.

In real business scenarios, using WebRTC is not simply P2P communication. In a later article we will discuss how online-education products use WebRTC for real-time audio and video, and how they support millions of concurrent online classes.

Tags: Javascript Front-end webrtc

Posted by aaron_karp on Mon, 30 May 2022 14:07:51 +0530