From audio and video chat to social gaming, WebRTC protocols are involved in more and more of what people do online. But the same qualities that make WebRTC well suited to real-time apps can also make it harder to consistently deliver a great user experience.
This is because WebRTC was designed for maximum flexibility, allowing various technologies and protocols to be used. While this flexibility has helped expand the use of WebRTC, it also means that less-than-ideal negotiations sometimes occur, which can cause quality-of-service issues such as slow connection setup and lag.
Whether it’s a peer-to-peer or peer-to-media-server connection, the first step in delivering real-time communications is signaling. This is like a handshake where both parties set mutually agreeable terms for exchanging data during a session, and it’s one of the most complicated aspects of WebRTC.
The best user experience is achieved when this session establishment is fast. There are various techniques to achieve this. One is called perfect negotiation, a pattern introduced to minimize the impact of offer/answer conflicts between two peers while they establish a session.
In this IntroGuide, we’ll look at what needs to happen for WebRTC to be successful and discuss how to optimize negotiation patterns for the best—and fastest—possible user experience.
Keep reading to learn the ins and outs of delivering fast, high-quality real-time connectivity, every single time.
Why negotiation patterns matter
From a user perspective, the makings of a good WebRTC experience are fairly straightforward: instant connectivity with no connection delays or buffering, no jitter, and no lags. For developers, however, delivering an optimal WebRTC experience every time can be a whole other story.
In order to understand why there is so much variability in things like latency and quality of service, it’s important to first know what we are actually talking about when we refer to WebRTC.
WebRTC doesn’t refer to a specific technology or solution: it encompasses the various protocols that enable real-time communications over the web. These include more than a dozen standards, spanning protocols, application and browser APIs, and data formats. Notably, during the WebRTC standardization phase, a decision was made not to mandate a specific signaling protocol. As a result, Session Initiation Protocol (SIP), which is often used in VoIP applications, is just one of the possible signaling protocols.
Because the goal of WebRTC is to enable real-time audio and video in all browsers, something that requires a highly flexible structure, WebRTC isn’t prescriptive about signaling. There is no mandated signaling mechanism; it is up to app developers to decide how they want to handle connections between peers. The multimedia-specific metadata the peers exchange, such as codecs and media formats, is described using the Session Description Protocol (SDP), which the application carries over whatever signaling channel it chooses.
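To make the SDP part concrete, here is a minimal TypeScript sketch that lists the codecs an SDP blob advertises by reading its a=rtpmap lines. The a=rtpmap syntax is standard SDP. The listCodecs helper, the Codec shape, and the abbreviated example SDP are hypothetical and for illustration only. A real SDP carries many more attributes (such as a=fmtp codec parameters), and this sketch ignores them.

```typescript
// Hypothetical shape for one codec entry parsed from SDP.
interface Codec {
  payloadType: number;
  name: string;
  clockRate: number;
  channels?: number;
}

// Reads "a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]" lines.
function listCodecs(sdp: string): Codec[] {
  const codecs: Codec[] = [];
  // SDP lines are CRLF-separated; tolerate bare LF as well.
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^\/\s]+)\/(\d+)(?:\/(\d+))?/.exec(line);
    if (match === null) continue;
    const codec: Codec = {
      payloadType: Number(match[1]),
      name: match[2],
      clockRate: Number(match[3]),
    };
    if (match[4] !== undefined) codec.channels = Number(match[4]);
    codecs.push(codec);
  }
  return codecs;
}

// Abbreviated, illustrative SDP: one audio section (Opus, PCMU), one video section (VP8).
const exampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

for (const codec of listCodecs(exampleSdp)) {
  console.log(codec.payloadType, codec.name, codec.clockRate);
}
```

During negotiation, each side's offer or answer carries lists like this one, and the peers can only use codecs that appear on both sides.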
This ultimately allows for a great deal of design freedom. But when you consider the many underlying protocols that can enable real-time applications, it’s not hard to understand how this also results in a high level of variability in latency and quality of service among apps (and even sessions!).
Essentially, the flexibility of WebRTC is a double-edged sword: while developers have a lot of choices in how to implement it, this same flexibility creates greater potential for suboptimal results. Most developers use Trickle ICE, which sends candidate network paths to the other peer asynchronously as soon as each one is gathered, instead of waiting for all the candidates to be collected first. But (womp, womp) this often leaves the two machines out of sync, for example when candidates arrive before the session description they belong to has been applied.
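One common form of that sync problem is a remote candidate that arrives before the remote description has been applied. At that point browsers reject calls to addIceCandidate(). A usual remedy is to queue such candidates and deliver them once the description is in place. The TypeScript sketch below assumes a hypothetical CandidateBuffer class and RemoteCandidate shape. In a browser, the apply callback would call pc.addIceCandidate(...). It illustrates the buffering idea only and is not a complete signaling implementation.

```typescript
// Hypothetical shape of a remote ICE candidate received over signaling
// (loosely modeled on the browser's RTCIceCandidateInit).
interface RemoteCandidate {
  candidate: string;
  sdpMid: string | null;
}

// Queues remote candidates until the remote description is applied, then
// delivers them in arrival order. Class and method names are hypothetical.
class CandidateBuffer {
  private pending: RemoteCandidate[] = [];
  private remoteDescriptionApplied = false;
  private readonly apply: (c: RemoteCandidate) => void;

  // In a browser, `apply` would call pc.addIceCandidate(c).
  constructor(apply: (c: RemoteCandidate) => void) {
    this.apply = apply;
  }

  // Call for every candidate that arrives from the other peer.
  onRemoteCandidate(c: RemoteCandidate): void {
    if (this.remoteDescriptionApplied) {
      this.apply(c);
    } else {
      this.pending.push(c);
    }
  }

  // Call right after setRemoteDescription() has resolved.
  onRemoteDescriptionApplied(): void {
    this.remoteDescriptionApplied = true;
    const queued = this.pending;
    this.pending = [];
    for (const c of queued) this.apply(c);
  }

  get queuedCount(): number {
    return this.pending.length;
  }
}

// Usage: a candidate trickles in before the offer has been applied.
const delivered: string[] = [];
const buffer = new CandidateBuffer((c) => {
  delivered.push(c.candidate);
});
buffer.onRemoteCandidate({ candidate: "candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host", sdpMid: "0" });
console.log(delivered.length, buffer.queuedCount); // 0 1
buffer.onRemoteDescriptionApplied();
console.log(delivered.length, buffer.queuedCount); // 1 0
```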
Understanding how to optimize negotiation patterns can help to reduce these issues and pave the way to faster, smoother real-time connectivity and performance.
The steps of successful WebRTC negotiation
In order to explore the opportunities for optimizing WebRTC negotiation patterns, it’s helpful to first understand what is involved in establishing a successful connection.
Step 1: User discovery and signaling
Before two peer computers can connect with each other, they first have to find each other. Because WebRTC does not specify how this happens, the application must provide a signaling mechanism that both peers agree to use.
This initial process is handled by a third party, called a signaling server. Once two machines have discovered each other, they exchange some basic information before they can establish a session. (This sounds simple, but the speed of this process comes down to how many options are on the table. For example, if you’re trying to establish a session and each client puts forth five or six candidate addresses, such as relay addresses from several TURN servers, then there are 30 or so candidate pairs, one for every local and remote combination, that may need to be tested before the optimal connection is determined.)
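The arithmetic behind that estimate is simple. ICE pairs every local candidate with every remote candidate and then ranks the pairs. The sketch below uses the candidate-priority formula, the pair-priority formula, and the recommended type preferences from RFC 8445. The Candidate shape, the makeCandidates helper, and the five-by-six candidate mix are hypothetical. Real ICE agents also prune redundant pairs, so the count here is an upper bound.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445, Section 5.1.2.2.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// Hypothetical, simplified candidate record.
interface Candidate {
  label: string;
  type: CandidateType;
  localPreference: number; // 0..65535
}

// Candidate priority, RFC 8445 Section 5.1.2.1 (component 1 by default).
function candidatePriority(c: Candidate, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[c.type] + 2 ** 8 * c.localPreference + (256 - componentId);
}

interface CandidatePair {
  local: Candidate;
  remote: Candidate;
  g: number; // controlling agent's candidate priority (local side here)
  d: number; // controlled agent's candidate priority
}

// Pair every local candidate with every remote one, then rank the pairs.
function formPairs(local: Candidate[], remote: Candidate[]): CandidatePair[] {
  const pairs: CandidatePair[] = [];
  for (const l of local) {
    for (const r of remote) {
      pairs.push({ local: l, remote: r, g: candidatePriority(l), d: candidatePriority(r) });
    }
  }
  // RFC 8445 Section 6.1.2.3: priority = 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D ? 1 : 0).
  // Candidate priorities are below 2^31, so 2*MAX + 1 < 2^32 and comparing MIN,
  // then MAX, then the G>D bit yields the same order without 64-bit arithmetic.
  pairs.sort(
    (a, b) =>
      Math.min(b.g, b.d) - Math.min(a.g, a.d) ||
      Math.max(b.g, b.d) - Math.max(a.g, a.d) ||
      Number(b.g > b.d) - Number(a.g > a.d),
  );
  return pairs;
}

// Hypothetical candidate mix: one host, one STUN-derived, and `relays` TURN relays.
function makeCandidates(prefix: string, relays: number): Candidate[] {
  const list: Candidate[] = [
    { label: `${prefix}-host`, type: "host", localPreference: 65535 },
    { label: `${prefix}-srflx`, type: "srflx", localPreference: 65535 },
  ];
  for (let i = 1; i <= relays; i++) {
    list.push({ label: `${prefix}-relay${i}`, type: "relay", localPreference: 65536 - i });
  }
  return list;
}

// Five local candidates and six remote ones.
const pairs = formPairs(makeCandidates("A", 3), makeCandidates("B", 4));
console.log(pairs.length); // 30
console.log(pairs[0].local.label, pairs[0].remote.label); // A-host B-host
```

Because the relay type preference is 0, any pair involving a relay candidate ranks below every pair without one. This is one reason relayed paths tend to be tried last.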
The information exchanged during signaling includes:
- Control messages, which are used to open and close the connection and to report errors
- IP address and port information (ICE candidates) for each available network interface, to be used to exchange media
- Media capability negotiation, such as the codecs and media formats that can be supported during the session
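Because WebRTC leaves the message format to the application, each app defines its own envelope for these three kinds of information. The following TypeScript sketch shows one hypothetical design: a tagged union serialized as JSON. The SignalingMessage type, its field names, and the encode/decode helpers are illustrative assumptions and are not part of any standard.

```typescript
// Hypothetical wire format covering the three kinds of signaling information.
type SignalingMessage =
  | { kind: "control"; action: "open" | "close" | "error"; detail?: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null }
  | { kind: "description"; type: "offer" | "answer"; sdp: string };

function encode(msg: SignalingMessage): string {
  return JSON.stringify(msg);
}

// Parses an incoming message and checks the fields each kind requires.
// Returns null for anything malformed or unknown.
function decode(raw: string): SignalingMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const m = data as Record<string, unknown>;
  let valid = false;
  switch (m.kind) {
    case "control": // open, close, or report errors
      valid = m.action === "open" || m.action === "close" || m.action === "error";
      break;
    case "candidate": // IP/port information for one network path
      valid = typeof m.candidate === "string" && (typeof m.sdpMid === "string" || m.sdpMid === null);
      break;
    case "description": // media capabilities, carried as SDP
      valid = (m.type === "offer" || m.type === "answer") && typeof m.sdp === "string";
      break;
  }
  return valid ? (m as unknown as SignalingMessage) : null;
}

// Usage: round-trip an offer, and reject a message with an unknown kind.
const wire = encode({ kind: "description", type: "offer", sdp: "v=0\r\n" });
console.log(decode(wire)?.kind); // description
console.log(decode('{"kind":"chat","text":"hi"}')); // null
```

A tagged union keeps each handler simple: the receiving side switches on kind and always knows which fields to expect.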
Step 2: NAT/firewall traversal
For security and privacy reasons, many devices live behind NATs and firewalls. As far as WebRTC is concerned, this creates some challenges.
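More specifically, NATs rewrite the addresses in IP packet headers but typically not the addresses carried inside packet contents, such as the connection details a peer advertises during signaling. This is a long-standing problem for VoIP protocols such as SIP, and it affects WebRTC in the same way: a peer behind a NAT may advertise a private address that the other party cannot reach. When one or more of the computers involved is behind a NAT, WebRTC therefore needs workarounds such as public address discovery and data relays to succeed. In such instances, ICE, or Interactive Connectivity Establishment, is the framework for establishing a connection.
There are several categories of NATs, and how easily peers can exchange the information needed to establish a WebRTC session depends on which kind is in play. Symmetric NATs pose unique challenges because they assign a different public port (and sometimes a different IP address) for each destination. As a result, the address a client learns from a STUN (Session Traversal Utilities for NAT) server is not the one its peer would see, which renders STUN, the simplest and most direct method of NAT traversal, insufficient.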
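You can see which traversal method produced a candidate from its typ field. It is host for a local interface, srflx for an address learned via STUN, prflx for one discovered during connectivity checks, and relay for a TURN allocation. The TypeScript sketch below parses a candidate string laid out like the standard SDP candidate attribute. The parseCandidate and describeType helpers are hypothetical, and they ignore optional extensions such as raddr and rport.

```typescript
// Hypothetical result shape for a parsed ICE candidate.
interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string;
}

// Parses "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
// (optionally prefixed with "a=", as it appears inside SDP). Fields after the type are ignored.
function parseCandidate(line: string): ParsedCandidate | null {
  const body = line.trim().replace(/^a=/, "");
  const prefix = "candidate:";
  if (body.slice(0, prefix.length) !== prefix) return null;
  const f = body.slice(prefix.length).split(/\s+/);
  if (f.length < 8 || f[6] !== "typ") return null;
  return {
    foundation: f[0],
    component: Number(f[1]),
    transport: f[2].toLowerCase(),
    priority: Number(f[3]),
    address: f[4],
    port: Number(f[5]),
    type: f[7],
  };
}

// Maps each standard candidate type to the mechanism that produced it.
function describeType(type: string): string {
  switch (type) {
    case "host": return "local interface address";
    case "srflx": return "public address learned from a STUN server";
    case "prflx": return "address discovered during connectivity checks";
    case "relay": return "relay address allocated on a TURN server";
    default: return "unknown candidate type";
  }
}

// Usage with an illustrative server-reflexive candidate.
const sample = "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154";
const parsed = parseCandidate(sample);
if (parsed !== null) {
  console.log(parsed.type, parsed.address, parsed.port, describeType(parsed.type));
}
```

Behind a symmetric NAT, a srflx candidate may still be gathered but then fail connectivity checks. A relay candidate then becomes the fallback.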
“With ICE, each client can also collect publicly reachable candidates and communicate them to the other party. STUN servers are used to learn their public IP and port. When NAT conditions prevent direct communication between clients, then TURN servers can be used for relaying media.”
-Giacomo Vacca (Senior Voice Engineer, Subspace)
The sluggishness of traditional TURN, however, makes it a last resort. Though TURN is necessary when peers sit behind symmetric NATs, the delays it adds to establishing a session mean it’s often preferable only to not being able to connect at all. (Don’t worry, we discuss an even better alternative to traditional TURN further down in this post.)
Establishing perfect negotiation
Each step of establishing a WebRTC connection takes just a fraction of a second. But WebRTC negotiations are a complex process, and testing all the potential network paths can take some time.
Ultimately, these micro delays all add up, which is why users may experience a lag establishing a connection. Given users’ high expectations for real-time services (read: instant), this is inevitably a formula for disappointment.
But there is a way to reduce some of the complexity of WebRTC negotiation and deliver smooth connections more consistently. Perfect negotiation is a recommended design pattern for keeping the signaling flow as smooth as possible.
Essentially, each peer is assigned a specific role in the negotiation process. This limits the impact of “glare,” the signaling collision that occurs when both peers send an offer at the same time and that would otherwise prolong the process.
Here’s how it works. In perfect negotiation, one peer is assigned the role of “polite” peer and the other the role of “impolite” peer, and both are capable of sending out offers. If an offer arrives while the polite peer has an offer of its own in progress, it rolls back its offer and answers the incoming one. The impolite peer, by contrast, ignores any conflicting incoming offer in favor of its own. When there is a collision, the impolite peer always prevails.
Assigning these roles helps negotiation proceed more predictably and smoothly. In the event of conflicting offers, both devices know how to resolve the conflict instead of stalling. If you are handling RTCPeerConnection code yourself, following the perfect negotiation example in the W3C WebRTC specification makes for a more reliable implementation.
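The collision rule above fits in a few lines. This TypeScript sketch models the decision each peer makes when a remote description arrives, using a simplified form of the collision check from the perfect negotiation pattern. The handleIncoming function and the Decision labels are hypothetical. In a browser, "accept" corresponds to calling pc.setRemoteDescription(...), followed by setLocalDescription() and sending the answer when the incoming description is an offer. In current browsers, that call also performs the polite peer’s rollback implicitly.

```typescript
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer"
  | "closed";

// Hypothetical labels for what a peer does with an incoming description.
type Decision = "accept" | "ignore" | "rollback-and-accept";

// Decides how to treat an incoming offer or answer under perfect negotiation.
// polite:      this peer's assigned role
// makingOffer: true while this peer is creating or sending its own offer
// state:       this peer's current signaling state
function handleIncoming(
  polite: boolean,
  makingOffer: boolean,
  state: SignalingState,
  incoming: "offer" | "answer",
): Decision {
  // Glare: an offer arrives while we are making, or already hold, our own offer.
  const offerCollision = incoming === "offer" && (makingOffer || state !== "stable");
  if (!offerCollision) return "accept";
  // Impolite peer keeps its own offer; polite peer abandons its own and answers instead.
  return polite ? "rollback-and-accept" : "ignore";
}

// Usage: both peers send offers at the same moment, then each receives the other's.
const politeSide = handleIncoming(true, false, "have-local-offer", "offer");
const impoliteSide = handleIncoming(false, false, "have-local-offer", "offer");
console.log(politeSide, impoliteSide); // rollback-and-accept ignore
```

Because the outcome depends only on the assigned role, the two peers never both ignore, or both abandon, their offers. Exactly one offer/answer exchange completes.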
A network that’s purpose-built for real-time
But some WebRTC problems can’t be accounted for in the coding. That’s because sometimes, the blame for suboptimal WebRTC negotiations lies with the internet itself.
It’s yet another double-edged sword scenario. The internet is built for resilience, and its countless pathways and routing options are a design feature meant to ensure that if one path fails, there are always lots of other routes. The problem is that routing designed for resilience doesn’t necessarily produce the fast, stable paths that a seamless negotiation process, and the best real-time experience, depend on.
“As a developer of WebRTC solutions, there are steps I can take to make the session negotiation as smooth as possible. But that’s only part of the equation of delivering a fast connection. Network conditions play a significant role too, and will impact the setup time and the overall perceived quality.”
- Giacomo Vacca (Senior Voice Engineer, Subspace)
Thankfully, it’s not an either/or situation. It’s possible to improve WebRTC negotiation patterns and have path selection that’s always on and optimized with a network that’s purpose-built for real-time apps.
Using a global IP proxy like Subspace, which is built to accelerate real-time traffic over TCP and UDP, including SIP, removes much of this negotiation complexity. And with TURN as a service, there’s no need to continually tune your own TURN servers to identify the best path. Instead of having to test all the options, the fastest path is selected automatically using AI.
With Subspace’s WebRTC-CDNN, traffic is distributed and does not go through a single relay point, thus reducing bottlenecks. And by utilizing a direct path, the solution prevents issues such as hairpinning and extra hops that can further erode the user experience.
Latency, jitter, and packet loss are all improved by up to 80%, and real-time apps perform just as they were intended.
As real-time apps become ubiquitous, user expectations are mounting, and the connection lags caused by inefficient WebRTC negotiation patterns are becoming less and less tolerable.
Though the negotiation patterns associated with establishing real-time communications are complicated, developers can take steps to unleash faster connectivity and drive better user experiences. These steps include simplifying negotiations by optimizing peer communications and working with a technology partner that can put you in the fast lane. By addressing these inherent challenges, real-time apps can indeed deliver on their promise and offer smooth, instant connections—every time.
See how Subspace can help you optimize and accelerate your real-time communication.