
How WebRTC Is Driving New and Interesting Use Cases for the Metaverse

Published Dec 08, 2021 by Subspace Team
TL;DR
WebRTC’s affordances create unique benefits and functionality for the metaverse, including security and better video quality. Read more about the use cases WebRTC is driving for the metaverse here. See how Subspace can improve your WebRTC app by trying it for free now.
Estimated read time: 7 minutes

What is the Metaverse?

If it seems like everyone is talking about the Metaverse lately, that’s because it’s a concept whose time has come. The Metaverse is a maturing vision of an immersive branch, extension, or reimagination of the Internet built on virtual and augmented reality. Numerous private companies, organizations, and individuals are deeply invested in making it a reality, and many are making significant strides.
It’s difficult to imagine the Metaverse right now, just as it was difficult for most people to imagine a smartphone prior to 2007. Most people had cell phones at that time, but those phones weren’t also cameras, GPS devices, and small computers, and they certainly weren’t as ubiquitous as smartphones are today. Matthew Ball wrote a seminal essay titled The Metaverse: What It Is, Where to Find It, Who Will Build It that describes many of the economic, creative, and social elements of the metaverse.
In some ways, we are already seeing the early phases of the metaverse. One of the most obvious manifestations is in the gaming industry, but there are other examples as well. Non-fungible tokens, or NFTs, are enjoying a moment, and they are likely to become a core component of the metaverse. NFTs are already being used in the visual arts, writing, music, and gaming. Although the current bubble may burst, NFTs aren’t going anywhere (at least for a while).
Conversational AI, which is used in industries from healthcare to gaming, is an important foundational component of the metaverse. The stereotype of a robotic voice will soon be a nostalgic thing of the past (or already is, if you consider devices like Alexa and Siri). NVIDIA demonstrated its progress in combining conversational AI with real-time graphics processing during the keynote of its 2021 annual conference.

WebRTC’s Underlying Technology Advances Real-Time Experiences

Web Real-Time Communication, more commonly known as WebRTC, is a free and open source project that provides web browsers and mobile applications with real-time communication (RTC) through application programming interfaces (APIs). It allows audio and video communication to work inside web pages through direct peer-to-peer communication, eliminating the need to install plugins or download native apps.
One example of how important WebRTC is for normal-feeling communication on the Web is Google’s Project Starline. According to Matthew Ball, this is “a hardware-based booth designed to make video conversations feel like you’re in the same room as the other participant, powered by a dozen depth sensors and cameras, as well as a fabric-based, multi-dimensional light-field display, and spatial audio speakers.” All of this information is compressed and delivered through WebRTC.
WebRTC eliminates the middleman: media does not pass through an intermediary server on its way from one computing client to another. One source of lag in video conferencing is the time it takes for a client to send information to a server, which then redistributes it to the other participants. WebRTC instead allows peer-to-peer communication: servers are involved only in establishing the connection, after which the clients (browsers) communicate directly with each other, removing the delay that relaying through a server can cause. A STUN server (Session Traversal Utilities for NAT) helps each client discover its public address so the initial connection can be established across Network Address Translators (NATs). Once that initial connection is established, the clients’ browser APIs communicate directly with each other.
Some of the components of WebRTC include:
  • getUserMedia, which acquires audio and video media, enabling access to a device’s camera and microphone
  • RTCPeerConnection, which carries audio and video communication between peers
  • RTCDataChannel, which makes bidirectional data communication between peers possible
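Together, these components form a complete connection flow. The sketch below is a minimal, hypothetical example: the STUN URL (Google’s public server), the channel name, and the `signaling` object are illustrative placeholders, and a real application must supply its own signaling transport (for example, a WebSocket).

```javascript
// Build an RTCConfiguration object with one or more STUN servers.
function stunConfig(urls) {
  return { iceServers: urls.map((url) => ({ urls: url })) };
}

// Browser-only sketch wiring up the three WebRTC components.
async function connectPeer(signaling) {
  // RTCPeerConnection manages the peer-to-peer session;
  // the STUN server helps it establish a connection across NATs.
  const pc = new RTCPeerConnection(
    stunConfig(["stun:stun.l.google.com:19302"])
  );

  // RTCDataChannel: bidirectional messaging alongside the media streams.
  const channel = pc.createDataChannel("chat");
  channel.onopen = () => channel.send("hello");

  // getUserMedia: capture the camera and microphone (prompts the user).
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // Hand the offer to the application's signaling channel; once the
  // remote peer answers, media flows directly between the browsers.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(offer);
  return pc;
}
```

Note that the signaling step (exchanging the offer and answer) is deliberately left to the application: WebRTC standardizes the media path, not the rendezvous mechanism.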
Want to learn more about the Subspace WebRTC solution? Check out this white paper here.

Sound in the Metaverse

Real-time communication, such as web conferencing, only works well when there’s no lag or jitter. When the audio and video don’t match, or there’s an echo, or the video lags or pixelates, it’s frustrating, and almost everyone would prefer a phone call. When you consider that the metaverse requires users to feel physically present, the importance of very good real-time communication becomes clearer. For example, high-quality spatial audio (so that a person to your right sounds like they are to your right) is a crucial aspect. This technology exists and is in use in applications such as hubbub, a chat app, and Google’s Project Starline. Dolby has been using WebRTC to improve audio quality, paying particular attention to spatial audio and to things like overlapping speech, laughter, and other aspects of natural conversation.
Along with voices coming from the correct place in space, sound also needs to adapt to the virtual environment. If you’re standing in a virtual cave, your voice should echo; in an open field, it shouldn’t. (And that’s before considering whether you should sound hoarse when you’re tired in a virtual world.)
Another concern is the architecture for audio routing, which must be based on location rather than room identifiers, and must scale to an enormous number of participants. If you’re at a concert in the metaverse, it should sound like a concert venue in the physical world, where thousands of people are talking at the same time but you can still hear the person next to you. Although the technology for excellent audio exists, it’s important to remember that people use different sorts of microphones and headphones. Plus, equipment manufacturers don’t always include high-end audio technology in their end products; for example, many mics record only in mono rather than in stereo.
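The position-to-ear mapping at the heart of spatial audio can be sketched with simple math. The function below is a minimal equal-power panning sketch, not any particular engine’s implementation; production systems (and the browser’s Web Audio PannerNode) also model distance, elevation, and room acoustics.

```javascript
// Equal-power stereo pan from a source's horizontal angle.
// angle: radians, 0 = straight ahead, +PI/2 = hard right, -PI/2 = hard left.
// Returns [leftGain, rightGain] with leftGain^2 + rightGain^2 === 1,
// so perceived loudness stays constant as the source moves.
function equalPowerPan(angle) {
  // Clamp to the front half-plane, then map [-PI/2, +PI/2] to [0, PI/2].
  const clamped = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, angle));
  const t = (clamped + Math.PI / 2) / 2;
  return [Math.cos(t), Math.sin(t)];
}
```

Equal-power panning keeps the combined energy constant, so a voice sweeping from left to right doesn’t dip in loudness at the center; distance attenuation and environmental effects (the cave echo above) would be layered on top, for example with Web Audio’s ConvolverNode.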
Venture Beat sums up the problem: “When we’re in the metaverse, it won’t be good enough to just hear a flat voice from a person’s avatar. We’ll need 3D audio and more to make those voices sound realistic and convey the full range of possible emotion. Without it, conversations in the metaverse will sound no less realistic than a Zoom conference.” High-quality audio that conveys emotion, suits the virtual location, and comes from the correct virtual position is crucial for the metaverse. But along with audio, there’s video: facial features and expressions must be captured from the sender and rendered realistically in 3D.

Video May Be Less Important

It’s possible that “traditional video transmission will become less relevant and even marginal” and that it will be largely replaced with something akin to “super high quality 3d animojis that will even be photorealistic for some use cases.” Consider Bitmoji and Animoji, among others: these applications track thousands of points on a user’s face to create high-fidelity avatars. Beyond high-quality audio and video, the right data capabilities are also necessary to bring the metaverse to fruition, and low latency is obviously paramount.

WebRTC Benefits

Security matters — even, or perhaps especially, in the metaverse. One of the benefits of WebRTC is that it has always-on voice and video encryption through the Secure RTP protocol (SRTP). Other benefits include:
  • Free
  • Platform and device independence, which is necessary for bringing together people using different equipment
  • Secure voice and video
  • Advanced voice and video quality
  • Reliable session establishment, so that you don’t get kicked out of the metaverse
  • Multiple media streams, which are necessary to build a virtual world
  • Adaptability to network conditions, which vary considerably
  • Interoperability with VoIP and video
  • Rapid application development, allowing the metaverse to continue to evolve in the same way the physical world is constantly changing

Conclusion

To support these initiatives, along with the growth of the metaverse itself, networks will need to provide safe, secure, and robust infrastructure. At Subspace, we offer the low-latency connections and support resources you need to build your metaverse-related application.
Want to start building on Subspace today? Sign up here.
