Real-time performance is a complex puzzle that isn’t easy to solve or even define. A host of networking and architecture questions need answering before applications can achieve real-time performance. Subspace is here to answer these questions and give developers and service providers the steps needed to complete the puzzle.
Estimated read time: 7 minutes
“Blink and you’ll miss it.”
Is there any more intuitive way to define “real-time performance”? Users don’t care how much delay they experience. They become irate over any perceived delay, period. To meet these demanding expectations, real-time applications must employ the fastest and most efficient resources, from coding and network quality to server and switch infrastructure.
The public internet, of course, does not always employ the fastest or most efficient … anything. We’re all too familiar with slow page loads, the effects of high latency, and stream buffering. It’s 2021, yet we still watch web pages crawl to life component by component, our gameplay still suffers from bad ping, and the live stream we try to watch still lags, buffering every few minutes. There are myriad reasons for this, but ultimately network performance is a thousand-piece puzzle whose pieces must fit together in just the right way for applications with real-time performance requirements.
Fortunately, solving this puzzle isn’t left to chance. Application developers and service providers can take steps to get their puzzle perfectly assembled.
Start with the Right Questions
Some (OK, most) network factors will be outside your control. You should still build those factors into your planning, and that starts with answering questions in several key areas.
Where will users be located? What is the range of distances between them? Don’t make the mistake of planning for the average distance; aim instead for the 99th percentile so that users with maximum physical separation will still have satisfactory experiences.
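One way to make the 99th-percentile advice concrete is to compute it over pairwise user distances and convert the result into a floor on one-way delay. The sketch below is illustrative only: the city coordinates are hypothetical sample data, the helper names are invented for this example, and the figure of roughly 200 km per millisecond for light in optical fiber is an approximation that ignores routing detours, queuing, and processing time.

```python
import math
from itertools import combinations

# Hypothetical user locations as (latitude, longitude) in degrees.
USER_LOCATIONS = {
    "London": (51.5074, -0.1278),
    "New York": (40.7128, -74.0060),
    "Sao Paulo": (-23.5505, -46.6333),
    "Singapore": (1.3521, 103.8198),
    "Sydney": (-33.8688, 151.2093),
}

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers about 200 km per ms


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def min_one_way_ms(distance_km):
    """Lower bound on one-way delay from propagation alone."""
    return distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    distances = [haversine_km(p, q)
                 for p, q in combinations(USER_LOCATIONS.values(), 2)]
    mean_km = sum(distances) / len(distances)
    p99_km = percentile(distances, 99)
    print(f"mean: {mean_km:.0f} km, delay floor {min_one_way_ms(mean_km):.1f} ms")
    print(f"p99:  {p99_km:.0f} km, delay floor {min_one_way_ms(p99_km):.1f} ms")
```

Comparing the two printed lines shows how far a plan built on the average can sit from what the most distant pairs of users will actually experience.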
What’s the minimum number of users you’ll need for multiplayer matchmaking? What player growth do you project? What are the average and maximum numbers of concurrent sessions? Even if all is well on the coding side, heavy loads may still overwhelm unprepared server and network bandwidth resources.
What type of network will users likely be on—the public internet? Behind a corporate firewall? Connecting through a VPN? All of these carry certain limitations and performance overhead ranges that must be factored into service-level expectations.
Do you give users any data guarantees, such as latency maximums, sustained throughput, or percentage of packets delivered? In the last case, such assurances may dictate the use of certain protocols (e.g., TCP rather than UDP), and the application and host network must then support those protocols. Again, lots of puzzle pieces must interlock to make sure you deliver on those guarantees.
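If you do publish guarantees like these, it helps to check measured sessions against them automatically. The following is a minimal sketch under stated assumptions: the `Guarantee` and `SessionStats` types, their field names, and the threshold values used with them are all hypothetical, and how the measurements are collected is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guarantee:
    """Hypothetical service-level targets promised to users."""
    max_latency_ms: float
    min_throughput_kbps: float
    min_delivery_pct: float


@dataclass(frozen=True)
class SessionStats:
    """Hypothetical measurements collected for one session."""
    latency_ms: float
    throughput_kbps: float
    packets_sent: int
    packets_received: int

    @property
    def delivery_pct(self):
        # Treat a session that sent nothing as fully delivered.
        if self.packets_sent == 0:
            return 100.0
        return 100.0 * self.packets_received / self.packets_sent


def violations(stats, guarantee):
    """Return the names of the guarantees this session failed to meet."""
    failed = []
    if stats.latency_ms > guarantee.max_latency_ms:
        failed.append("latency")
    if stats.throughput_kbps < guarantee.min_throughput_kbps:
        failed.append("throughput")
    if stats.delivery_pct < guarantee.min_delivery_pct:
        failed.append("delivery")
    return failed
```

A delivery guarantee near 100 percent is the kind of result from `violations` that would push a design toward a reliable transport, at the cost of the retransmission delay that comes with it.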
Selecting Performant Protocols and Architecture
The type of network protocol you select for your application should be bound to the application’s use model and scaling needs. This can start with decisions around whether the application is one-to-one or one-to-many. For example, WebRTC is a well-known protocol for one-to-one, media-centric applications. Critics point to WebRTC being a poor solution for scaling to many users, as more connections breed traffic that can lead to lag and choppy performance. However, while it may be difficult to scale WebRTC, it’s not impossible. Scaling a protocol like WebRTC will require more planning and investment. Thus, developers must decide if such costs are worthwhile as they design for scale.
Beyond protocols, designers must turn to architecture options. Each architecture type carries its own benefits, downsides, and compromises. Some examples include:
Mesh. Service mesh architecture focuses on efficient communication between microservices, covering functions such as load balancing, authentication, traffic management, and telemetry. Coordinating these functions across microservices can be difficult to implement and to maintain over time, so a service mesh establishes service proxies that abstract the functions and make them easier to manage.
Forwarding. The concept of packet forwarding—relaying packets from one network segment to another—has been integral to networks for decades. As packet/data volumes rise, though, forwarding requires more efficient architectures to handle the load. As outlined in one academic paper co-authored by Intel, the rise of hardware components able to accommodate parallel processing opened the door to software that would take advantage of such capabilities. However, there is no single industry standard for “the most effective” forwarding system. Implementations can be endlessly tuned, and results will vary from one network to another.
Mix. As the name implies, a mixing architecture involves having a server perform media services for a number of clients, including decoding, mixing, and encoding. Especially with newer hardware platforms, mixing architecture can perform stream scaling well and adapt smoothly to changing conditions. However, the workload can quickly become crushing as user counts climb. Resource monitoring and load balancing solutions will be important.
Hybrid. The term “hybrid” pops up all over the computing and network worlds, along with “heterogeneous” (as in heterogeneous compute across CPU, GPU, and FPGA chips). Here, it means a melding of the above three models, depending on user needs. For a point-to-point session, mesh may be the easiest and most resource-efficient approach, while forwarding will likely be better for larger sessions.
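To see why these models scale so differently, it can help to count media streams. The sketch below is a rough illustration, not a sizing tool. It assumes one media stream per participant, and it reads “mesh” in the session sense used in the hybrid description above, where every participant connects directly to every other. The function names and the `mesh_limit` threshold are hypothetical.

```python
def stream_counts(topology, participants):
    """Streams per client and at the server, assuming one stream per participant."""
    n = participants
    if n < 2:
        raise ValueError("a session needs at least two participants")
    if topology == "mesh":
        # Every client sends to and receives from every other client directly.
        return {"client_up": n - 1, "client_down": n - 1,
                "server_in": 0, "server_out": 0}
    if topology == "forwarding":
        # Clients upload once; the server relays each stream to the other n - 1.
        return {"client_up": 1, "client_down": n - 1,
                "server_in": n, "server_out": n * (n - 1)}
    if topology == "mixing":
        # The server decodes n streams and encodes one mixed stream per client.
        return {"client_up": 1, "client_down": 1,
                "server_in": n, "server_out": n}
    raise ValueError(f"unknown topology: {topology!r}")


def choose_topology(participants, mesh_limit=2):
    """Hybrid rule of thumb: mesh for small sessions, forwarding beyond that."""
    return "mesh" if participants <= mesh_limit else "forwarding"


if __name__ == "__main__":
    for name in ("mesh", "forwarding", "mixing"):
        print(name, stream_counts(name, 10))
```

With ten participants, a mesh client uploads nine streams while a forwarding or mixing client uploads one. The mixing server handles the fewest outbound streams but pays for them in decode and encode work, which these counts do not capture.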
No matter the architecture, you’ll need to consider your use cases and their latency tolerances, then see how well these needs map to the network infrastructure.
Choosing Hardware Requirements with Scale in Mind
Eventually, every application or service must arrive at its hardware requirements. Platform selection sounds simple, and it’s tempting simply to pad current needs and call it a day. (After all, everyone knows that “640K of memory ought to be enough for anybody.”) However, several best practices apply in making sure that hardware requirements will satisfy application needs for the long term.
First, beware of small bandwidth tests. There’s a difference between evaluating a new restaurant during the 4:00 PM doldrums and during the 6:30 PM dinner stampede. Smaller bandwidth tests are more likely to let an application appear optimized. Instead, imagine rampant success and mass user growth, then slam the application and its hardware platform with 10x the traffic, or whatever you imagine the right edge of the traffic bell curve would look like. (For perspective, recall the Pokémon GO rollout, when developer Niantic expected 5x the normal traffic as the worst case... and then actual traffic went to 50x.) Too often, subsystems such as memory, networking, and storage will bottleneck or throttle under that load.
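A quick back-of-the-envelope projection can show which subsystem gives out first as the traffic multiplier grows. This sketch assumes utilization scales linearly with traffic, which is usually optimistic. The baseline percentages and the 80 percent limit are hypothetical placeholders for your own measurements, and the 5x, 10x, and 50x multipliers are the ones mentioned above.

```python
# Hypothetical utilization (percent of capacity) measured in a small test.
BASELINE_PCT = {"cpu": 6.0, "memory": 12.0, "network": 9.0, "storage": 3.0}


def project_utilization(baseline_pct, multiplier):
    """Scale each subsystem's utilization linearly by the traffic multiplier."""
    return {name: pct * multiplier for name, pct in baseline_pct.items()}


def bottlenecks(baseline_pct, multiplier, limit_pct=80.0):
    """Names of subsystems projected to exceed limit_pct, sorted by name."""
    projected = project_utilization(baseline_pct, multiplier)
    return sorted(name for name, pct in projected.items() if pct > limit_pct)


if __name__ == "__main__":
    for multiplier in (5, 10, 50):
        print(f"{multiplier}x traffic:", bottlenecks(BASELINE_PCT, multiplier))
```

A projection like this only tells you where to aim the real stress test. It does not replace one, because contention and throttling rarely behave linearly.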
Make sure your stress testing incorporates the protocol and architecture model needs mentioned above. For instance, if you plan on shoehorning an ostensibly one-to-one protocol like WebRTC into a one-to-many or many-to-many deployment, then don’t rely on protocol benchmarking tools made for the lighter use model.
And when, in this case, you’re ready for WebRTC load testing, you’ll likely need signaling and TURN servers (TURN being a type of server that relays peer-to-peer traffic and is often deployed in WebRTC solutions) with failover provisioning, a server outputting strenuous datasets (preferably real production workloads), and dynamic load balancing. You may also need to see whether, and how much, GPU acceleration is needed on the client side; integrated graphics may not be sufficient for some rich content streams.
Only Count on Real-World Conditions
It doesn’t make sense to test applications in ways that won’t map to actual deployment. You don’t want to test a conferencing app from on-prem resources to an audience within the same city when you expect to run in production from the cloud across multiple geographies. If you test locally, you’ll be inclined to optimize for local use cases during development, and this can come back to bite you after launch.
Compounding this problem, many cloud providers will feed service from locations closest to developers. While this is done with the best intentions to provide the best service, it can artificially give the impression of good performance in ways that won’t apply to real deployment conditions. What looked flawless in the same metro area suddenly looks like a return to dial-up when accessed from rural areas.
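One guard against this trap is to break test results out by where the client sits instead of reporting a single blended number. The sketch below assumes you already have round-trip-time samples tagged by region. The region names, sample values, 150 ms budget, and 5 percent tolerance are all hypothetical.

```python
# Hypothetical round-trip times in milliseconds, grouped by client region.
RTT_MS_BY_REGION = {
    "same-metro": [8, 9, 10, 11, 12, 9, 10, 30],
    "cross-continent": [140, 150, 160, 155, 145, 210, 150, 148],
    "rural": [90, 95, 300, 110, 100, 105, 98, 250],
}


def share_over_budget(samples, budget_ms):
    """Fraction of samples whose round-trip time exceeds the budget."""
    return sum(1 for s in samples if s > budget_ms) / len(samples)


def failing_regions(rtt_ms_by_region, budget_ms, max_share=0.05):
    """Regions where more than max_share of samples exceed the budget."""
    return sorted(region for region, samples in rtt_ms_by_region.items()
                  if share_over_budget(samples, budget_ms) > max_share)


if __name__ == "__main__":
    print(failing_regions(RTT_MS_BY_REGION, budget_ms=150))
```

With this sample data, the same-metro results pass comfortably while the other two regions do not, which is exactly the gap a local-only test would hide.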
Consider Network Optimization Options
The internet is a hodge-podge of connections, with some links faster than others and some networks routing differently than others. It makes trying to run real-time applications over the network challenging, to put it politely. Content delivery networks (CDNs) abound, but they can’t provide real-time results for all users across all applications. Especially when used across large distances, CDNs and edge computing aren’t likely to resolve performance gaps.
That leaves one fundamental question for your application—perhaps the biggest question of all: Within your stack, what can you do to shape and optimize traffic paths?
Historically, the answer has been “Nothing. It’s just the internet.”
But that’s not true anymore.
New options like Subspace can provide an alternative, intersecting, fully optimized, real-time network that dovetails seamlessly with the public internet. As we’ve seen, there are a host of networking and architecture questions that need answering before real-time performance can be achieved in applications. But the biggest criterion of all may be the network itself.
Subspace GlobalTURN allows you to run TURN globally without having to maintain servers around the world. And with RTPspeed, we provide the highest quality voice and video streaming (via a global RTP proxy) the internet has ever seen. Both solutions include in-line, zero-latency DDoS protection.
Want to start building on Subspace today? Sign up here.