TL;DR
Real-time applications need a healthy, optimized network to run. That requires the right tools, both for network monitoring and software optimization. Countless factors go into tuning a healthy network, but it’s a doable feat—especially if you are plugging into a network service that’s already been through the learning and development curve.
Estimated read time: 9 minutes
Who shot first? It’s impossible to know when network lag creates in-game rubber banding.
Why are you watching a circle spin around and around instead of enjoying your video? Because a spike in network requests created a localized traffic storm, and the platform isn’t smart enough to work around it.
The public internet does not run at optimal performance levels, and the evidence is just one end-user expletive away. The obvious lack of network reliability and wide variance in bandwidth take a hard toll on application providers. Playing it safe and restricting performance to stay within sub-optimal network thresholds limits functionality and throttles a provider’s ability to scale both applications and the customer base.
For reasons outlined in recent articles, ISPs either can’t or don’t optimize their traffic. Some of the cause lies in insufficient metrics and imprecise measurement. Some of it has to do with protocol compromises. And some of it just stems from how the internet was built decades ago: to survive nuclear blasts, not explosions in user traffic.
Naturally, service providers do the best they can within these restrictions, but they’re trapped within conventional (and outdated) infrastructures and models. A better, optimized approach (one that is free from public network constraints but remains seamlessly compatible with that network) has been needed for a long time.
Approaching network optimization
An optimized network is one in which every single component—from node to switch to server, including each device’s processes and connections—consistently performs at peak levels. Of course, nothing constantly performs at 100 percent. Rather, optimization means that the network maintains performance at levels that never impair the user experience.
There will be network issues; in fact, they arise constantly. However, a healthy, optimized network has the precision and integrated intelligence to take corrective action before those issues become detectable.
We measure network health with a range of metrics, including:
- Uptime: Users measure uptime with the number of “nines”; “four nines” refers to 99.99% uptime. For consumer gaming, three-nines uptime may keep players happy, while mission-critical enterprise applications often demand five nines.
- Bandwidth utilization: Just as some people seem to monopolize entire meetings while others never get a word in, some network hosts process an inordinate amount of inbound or outbound traffic while others sit relatively idle. Monitoring tools combined with load balancers can help keep hosts and bandwidth running at resource- and cost-efficient levels.
- Packet loss: Often caused by congestion, device saturation, and/or faulty hardware or software, packet loss occurs when data sent from one network node never reaches its destination. The result is either incomplete data at the receiver or retransmission of the packet(s), which adds delay and potentially further congestion.
- Latency and jitter: Both will wreck real-time services. Latency is the time it takes a packet to travel from one point to another over a network; jitter is the variation in that latency over time. Few people care if an email fetch or file download experiences jitter, but these delays can be maddening in voice and video applications. (The sketch after this list shows one way to derive these metrics from simple probe samples.)
- Packet errors and discards: As we all learn in high school biology, sometimes cells don’t divide properly, and the body has ways of recognizing and disposing of such errors. Similarly, transmission and format errors can corrupt packets, which are then discarded, slowing packet flow. Excessive discards often indicate misconfiguration or hardware failure.
- WAN performance: This embodies all of the above and often takes a more granular look at how given application types impact total bandwidth. Local content caching can help with some application types.
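To make a few of these metrics concrete, here is a minimal sketch in TypeScript of how jitter, packet loss, and the downtime allowed by a given number of “nines” might be derived from ping-style probe samples. The `ProbeResult` shape and the sample numbers are illustrative assumptions, not output from any particular monitoring tool.

```typescript
// Illustrative only: deriving basic health metrics from ping-style probes.
// The ProbeResult shape and the sample values are assumptions for this sketch.
type ProbeResult = { sent: number; received: number; rttsMs: number[] };

// Jitter approximated as the mean absolute difference between consecutive RTTs.
function jitterMs(rtts: number[]): number {
  if (rtts.length < 2) return 0;
  let total = 0;
  for (let i = 1; i < rtts.length; i++) {
    total += Math.abs(rtts[i] - rtts[i - 1]);
  }
  return total / (rtts.length - 1);
}

// Packet loss as the percentage of probes that never came back.
function lossPercent(p: ProbeResult): number {
  return ((p.sent - p.received) / p.sent) * 100;
}

// Allowed downtime per year for "n nines" of uptime (4 nines = 99.99%).
function downtimeMinutesPerYear(nines: number): number {
  return 365.25 * 24 * 60 * Math.pow(10, -nines);
}

const sample: ProbeResult = { sent: 200, received: 196, rttsMs: [42, 45, 41, 60, 43, 44] };
console.log(`jitter ≈ ${jitterMs(sample.rttsMs).toFixed(1)} ms`);   // ≈ 8.8 ms
console.log(`loss ≈ ${lossPercent(sample).toFixed(1)} %`);          // 2.0 %
console.log(`4 nines allow ≈ ${downtimeMinutesPerYear(4).toFixed(0)} min/yr of downtime`); // ≈ 53
```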
Optimization approaches
Famed management thinker Peter Drucker is often credited with saying, “If you can’t measure it, you can’t improve it.” Well, measuring network performance and health starts with monitoring. Network monitors can be divided into two main groups: bandwidth tools (which identify which hosts and applications consume the most capacity and where slowdowns originate) and performance tools (which focus on monitoring, availability, and intra-network performance analysis). The market offers plenty of tools in both categories.
Just as traffic congestion in a city center can be reduced by drawing drivers out to the urban perimeter and suburbs, placing high-demand or large content away from the network core and closer to users can alleviate network congestion. That is why content delivery networks often improve user experience and reduce network capex/opex.
For over three decades, virtualization technologies have steadily grown on mainstream computing platforms, allowing physical hardware resources to be abstracted as software so that multiple operating systems and “virtual machines” can run on a single physical system. The abstraction of networking components has swept from desktop PCs to core network infrastructure. There are many advantages to network virtualization, especially for centralized management. In turn, network virtualization can play a key role in wide-area network (WAN) optimization, which then dovetails into software-defined WANs (SD-WANs).
SD-WANs are very flexible and can accommodate various routing schemes and network types to connect users with applications. Moreover, SD-WANs can centrally control hardware infrastructure, or virtualized versions of that equipment, through policy, function, and management tools. Because they allow so much control over network loading and can respond to real-time conditions, SD-WANs can improve how inter-node links are made and managed, including how they are secured. However, there are no free lunches; expect potential cost and compatibility issues, and plan accordingly.
Where conventional approaches disappoint
The methods and tools we’ve covered here will do the trick for a wide variety of online content. However, applications that are more graphics-intensive or demand higher real-time performance may require more than these conventional approaches. Caching content locally is great, but it’s of little use for real-time audio or video communications and gaming: you can’t pre-cache live events. (If you figure out a time-displacement methodology that enables this, please call us to discuss.) The problem grows worse in very compute-heavy real-time tasks, such as augmented and virtual reality (AR and VR).
The rule of thumb is simple: If your users can detect network traffic impairment, you need network optimizations. Potentially, that means procuring and using all the tools we discussed earlier and plenty more. That’s what real-time performance demands.
One common approach for handling live media applications today is to use Web Real-Time Communications (WebRTC). This framework arrived a decade ago as a peer-to-peer solution that allows multimedia to stream into web browsers and run inside web pages. WebRTC is aimed at clients that may not have sufficient CPU or GPU resources for graphics-rich content. Applications span from gaming and videoconferencing to industrial design and scientific modeling. Not surprisingly, applications looking to leverage WebRTC often need optimized networks.
Putting WebRTC and TURN to work
WebRTC operates on top of several protocols. For example, its peer-to-peer functionality works because of the underlying Interactive Connectivity Establishment (ICE) framework, which lets two endpoints discover and agree on a path for communicating directly. ICE is common in real-time voice and video communications, where routing every packet through a server would cripple application performance.
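As a rough sketch of what that looks like in a browser, the standard WebRTC API exposes ICE through `RTCPeerConnection`: creating an offer starts candidate gathering, and the application relays each discovered candidate to the other peer over whatever signaling channel it already has. The `signaling` object below is a placeholder assumption; WebRTC deliberately leaves signaling up to the application.

```typescript
// Sketch of ICE candidate gathering with the browser WebRTC API.
// `signaling` is a hypothetical app-provided channel (e.g. a WebSocket);
// WebRTC does not define signaling, so this part is an assumption.
declare const signaling: { send: (msg: unknown) => void };

const pc = new RTCPeerConnection();

// ICE hands us candidate network paths as it discovers them; we forward each
// one to the remote peer so both sides can agree on a direct route.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send({ kind: "ice-candidate", candidate: event.candidate.toJSON() });
  }
};

async function startCall(): Promise<void> {
  // Adding media and creating an offer kicks off ICE gathering.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ kind: "offer", sdp: offer.sdp });
}
```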
Even though WebRTC helps applications run on clients without powerful GPU resources, modern platforms can recognize if a client has a suitable GPU and use it to assist in stream decoding. Offloading the decoding work from the CPU improves system performance and complements network optimization for a superior end-user experience. Thus, WebRTC users should confirm that their clients integrate this hardware-level support.
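One way a web client can check for that hardware support, assuming a browser that implements the Media Capabilities API’s “webrtc” decoding type, is to ask whether a given stream configuration can be decoded power-efficiently, which generally indicates a hardware decoder. The codec, resolution, bitrate, and framerate below are arbitrary example values.

```typescript
// Sketch: probing for hardware-assisted decode via the Media Capabilities API.
// Assumes a browser that supports the "webrtc" decoding type; the codec and
// stream parameters are example values only.
async function hasEfficientDecode(): Promise<boolean> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: "webrtc",
    video: {
      contentType: "video/VP8",
      width: 1920,
      height: 1080,
      bitrate: 2_000_000,
      framerate: 30,
    },
  });
  // `powerEfficient` is the strongest available hint that decoding will be
  // offloaded from the CPU (typically to a GPU or dedicated media block).
  return info.supported && info.powerEfficient;
}
```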
Another way to improve network performance is with the Traversal Using Relays around NAT (TURN) protocol. That’s a mouthful, so let’s back up. In routers, Network Address Translation (NAT) lets many devices share the router’s single public IP address by mapping each connected device’s private IP address to it. The Session Traversal Utilities for NAT (STUN) protocol allows a device behind NAT to discover the public IP address and port it appears to use from the outside, and whether peers can reach it there directly. However, some routers employ “symmetric NAT,” which accepts inbound traffic only from systems the device has already contacted. That sort of whitelisting blocks the direct connections STUN discovers and can hamper communications. The TURN protocol, running on a small, separate “TURN server,” bypasses symmetric NAT restrictions by acting as a stream-forwarding relay between the peers. Relaying through TURN adds some traffic overhead and latency compared with a direct peer-to-peer path, but it keeps real-time streams flowing when that path is blocked.
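In practice, wiring STUN and TURN into WebRTC is just configuration on the `RTCPeerConnection` from the earlier sketch: you list ICE servers, and ICE falls back to the TURN relay only when a direct, STUN-discovered path cannot be established. The server URLs and credentials below are placeholders, not real infrastructure.

```typescript
// Sketch: pointing ICE at STUN and TURN servers. Hostnames and credentials
// are placeholders for illustration.
const pcWithRelay = new RTCPeerConnection({
  iceServers: [
    // STUN: lets the client learn its public address/port for direct paths.
    { urls: "stun:stun.example.com:3478" },
    // TURN: a relay of last resort when symmetric NAT blocks direct paths.
    {
      urls: "turn:turn.example.com:3478",
      username: "example-user",
      credential: "example-secret",
    },
  ],
  // Optional: force relay-only traffic to verify the TURN server works.
  // iceTransportPolicy: "relay",
});
```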
Or just use an already-optimized network
In this piece, we’ve touched on several ways to improve conventional network operation for real-time applications. There are hundreds more, from minor code tweaks to core hardware configuration. It’s a lot to track.
We know because we have done it all in creating and managing the Subspace network. Our engineers collected every possible enhancement and created today’s only purpose-built, fully real-time optimized network solution.
On the hardware side, Subspace is a sort of parallel internet, running alongside the public network and intersecting with it.
On the software side, Subspace has devised incredibly sophisticated and precise systems for measuring conditions between points of presence. Subspace analyzes this data in real time and uses it to construct a continuous, dynamic “weather map” of network conditions. Because Subspace analysis operates at sub-millisecond speeds, the network can automatically detect when communications issues are beginning and reroute traffic to the next most optimal path. The process is transparent to users and vastly more efficient than the human-conducted routing decisions made by ISPs.
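Purely as a conceptual illustration of that kind of rerouting logic, and not Subspace’s actual implementation, the idea reduces to continuously scoring candidate paths and switching when the current one degrades past a threshold. The path names, latencies, and hysteresis value below are invented for the sketch.

```typescript
// Conceptual sketch only: choose the lowest-latency path, switching away from
// the current one only when an alternative is clearly better.
type PathId = string;

const HYSTERESIS_MS = 5; // avoid flapping between near-equal paths

function pickPath(current: PathId, measuredLatencyMs: Map<PathId, number>): PathId {
  let best: PathId = current;
  let bestLatency = measuredLatencyMs.get(current) ?? Number.POSITIVE_INFINITY;
  for (const [path, latency] of measuredLatencyMs) {
    if (latency + HYSTERESIS_MS < bestLatency) {
      best = path;
      bestLatency = latency;
    }
  }
  return best;
}

// Example: the current path spikes, so traffic shifts to the next-best route.
const next = pickPath("path-A", new Map([
  ["path-A", 90], // congestion spike
  ["path-B", 28],
  ["path-C", 35],
]));
console.log(next); // "path-B"
```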
Subspace has built its own network stack specifically for real-time applications and opens these resources to application developers through an easy-to-implement API. The net result is immediate optimization of TCP and UDP traffic, allowing real-time app communications to move around the world at previously unachievable speeds with far greater reliability than ever possible before.
Certainly, developers and service providers should learn some of the intricacies involved in operating an optimized network; such knowledge can only help inform and improve their offerings. They should also do whatever is possible to maintain an edge over the competition and ensure that those offerings reach customers over an optimized network, whether that means taking on the time and cost of tackling the job in-house or plugging into an already-optimized solution like Subspace.
Want to start building on Subspace today? Sign up here.