MaskMask

Why In-Game Voice Chat Became Essential to Multiplayer Gaming and How It Impacts the Metaverse

PublishedDec 14, 2021BySubspace Team
TL;DR
In-game voice chat played a key role in the development of multiplayer gaming. Voice deepens the immersive world, helps forge social bonds, and strengthens online play. Now, as gaming is quickly blending into the ascending metaverse, real-time communications matter more than ever, which means both gaming and metaverses need a network capable of enabling them. See how Subspace can improve your voice app by trying Subspace WebRTC-CDN for free now.
Estimated read time: 10 minutes

From its earliest implementations, voice chat has been critical in driving multiplayer gaming evolution. Before widespread broadband and Voice over Internet Protocol (VoIP) availability, players needed to game over their data connection and communicate over a separate phone line, which would be tied up for the duration of game play. Things began to change in 1995, with the integration of VoIP technology into MechWarrior 2: 31st Century Combat, which had a networked player vs. player mode under Windows 95. Sega had numerous Dreamcast games with voice chat in 1999, and then Microsoft introudced voice into their Xbox. Voice would become pivotal in the success of Microsoft’s subscription-based Xbox Live service.
On the PC side, third-party applications, such as Ventrilo, TeamSpeak, and Xfire began to offer voice services that ran alongside multiplayer games. Some of these applications gained impressive functionality over time. For example, by the end of the early 2000s, Xfire had added support for Google Talk, Facebook Chat, and even in-game video broadcasting. Many PC game developers didn’t bother to integrate VoIP into their titles because players were already using these third-party tools (although there were plenty of exceptions, beginning with Dungeons and Dragons Online in 2005). And upon its public release in 2015, Discord blew every other gaming-centric chat app out of the water, in no small part because it was high-quality, scalable to 5,000 simultaneous users, and free. Microsoft provided Discord integration for Xbox Live in 2018. While Discord’s focus has broadened beyond gaming, it remains the favored solution for gaming voice chat, as evidenced by Sony bringing Discord to the PlayStation Network in 2022.
A decade ago, University of Melbourne student Gregory Robert Wadley submitted his 313-page PhD thesis titled “Voice in virtual worlds.” Despite its age, the project remains arguably the best summation of how real-time voice technology influences, elevates, and occasionally inhibits amazing online entertainment experiences. It’s a sweeping analysis with extensive context and insight on how modern use of voice can enhance multiplayer gaming and the coming metaverse. Yet many people resisted voice when it first began to replace text in online multiplayer games, in part because users felt self-conscious about their voices and how their voice might be at odds with the persona they wanted to project.
Modern voice modulation technologies can change speech to fit the user’s preferences in real time. Wadley found that “voice transforms the user experience of virtual worlds” and “makes some forms of collaboration more efficient.” Spatial voice, in which speakers in a virtual space are perceived in terms of direction and proximity, “increases immersion, reduces channel clutter, and affords new strategies for team coordination.” In short, when deployed well, voice can radically improve the social elements of virtual existence…provided the software and underlying infrastructure can deliver the performance necessary to fulfill voice’s promise.

Why in-game voice chat matters to the gaming experience

But having voice chat in games isn’t always desirable. Technically, it can be difficult to achieve satisfactory real-time voice performance across multi-platform games, although Fortnite’s Party Hub is one exemplary exception. Bringing in voice and background sounds from the real world can break the fantasy illusion of a gaming universe (especially parental hollering). Some players find the auditory chaos of public game lobbies confusing and overpowering (and potentially abusive). In situations where players on a team have not developed group habits or discipline, players may talk over each other and inhibit satisfying gameplay.
In most cases, though, players find in-game voice chat to be both more enjoyable and additive to in-game effectiveness, as when working as a squad in tactical first-person shooters. Voice alleviates the need to type messages in text chat, allowing both hands to stay on the keyboard/mouse and better keep players in the immediate game experience. Socially, voice in games can heighten fun, suspense, and even fear. It also enables easier conversation, allowing strangers to become acquaintances, and allows friendships to form that may even extend offline.
Voice also brings humanity into the gaming experience. Sometimes, that humanity distinguishes other in-game entities from AI-based non-player characters and makes the world more immersive. Voice makes the fun of in-game trash talk more visceral and engaging. (Try trash-talking your Alexa device for an exercise in futility.) Better engagement translates to more time playing. Ali Nhari, head of sales for audio/video SDK firm Agora, notes, “It’s not only about strategy, shooter, or role-playing games. Casual and card games, for instance, are directly benefiting from In-game Video/Voice chat, increasing their daily active users by at least 20% with double session time!”

Voice chat is transitioning multiplayer games to the metaverse

Everyone with two working eyes knows the benefits of stereoscopic vision — seeing the world in 3D depth rather than the strangely flatter world witnessed with only one eye. Similarly, we perceive stereo sound as richer and more lifelike than mono. The benefits continue with added channels and speaker locations. So-called surround sound via 5.1, 7.1, Dolby Atmos, and DTS:X grow increasingly close to reality, albeit with more hardware, complexity, and cost.
Since the 1990s, companies have sought to mimic surround sound through as few as two speakers (as with headphones) by using complex auditory algorithms. Put simply, it’s possible to model how sounds from various sources penetrate the ear canals. Imagine someone drops a ball several feet away to your right. The sound made by the bounce reaches your right ear before your left. It also reflects off walls and other surfaces, so it will reach your ears differently. These varying paths can be simulated in a model known as head-related transfer function (HRTF). The best results require calibration to the specific user, and headphones can yield more consistent results than desktop speakers due to how the user changes position within the speakers’ sound field. When done well, 3D spatial audio can sound like it’s coming from any direction (including above, below, and behind the user) and any distance.
In a field that benefits from immersion, the leap from stereo to spatial audio feels like the jump from mono to stereo. However, the computation needed to create and play spatial audio is formidable, and today, that means requiring a dedicated audio chip to handle that computation so the CPU remains free for other tasks. Fortunately, gaming hardware manufacturers are increasingly ready for this. Oculus has supported 3D audio in its API since the Rift headset. The PlayStation 5 and Xbox Series X|S have hardware-enabled 3D audio for their PS5 3D, and Windows Sonic for Headphones products, respectively.
The implications of spatial audio for gaming and in-game voice chat should be crystal clear. In conventional shooter games, the sound of gunfire from a rooftop sniper sounds like it’s coming from the ground. Spatial audio fixes this. In horror games with low visibility, when the user relies more on audio cues, spatial audio can be more informative and deliciously more ominous. Similarly, when a teammate shouts, “Get down!” the user has a better sense of the other player’s direction and distance. In fact, in any game environment, from FPS to casino blackjack, spatial audio will provide a greater sense of environmental immersion, improved communication clarity, and closer bonding to nearby characters. It also allows game developers to limit perception of other voices to only nearby speakers “within hearing range,” thus cutting down on that previously mentioned environmental confusion.
As you might guess from Oculus’s early integration of 3D audio, it’s only a small hop from the virtual worlds of conventional gaming to a bona fide metaverse. Battle royale titan PUBG incorporates spatial audio, which helps with team mechanics, and now PUBG creator Krafton is talking about expanding the game into a “complete interactive virtual world.” In September 2021, Roblox followed suit, announcing an in-game voice chat feature along with a new spatial voice beta for developers, all wrapped up in the context of Roblox being a metaverse in which “people can come together within millions of 3D experiences to learn, work, play, create, and socialize.” In Roblox’s mind, these elements are inextricably linked. The object is to let users communicate in the most natural, convenient, and immersive ways possible.

Impact of in-game voice chat on the metaverse

In Part III of his “The Metaverse Primer,” Matthew Ball discusses differences between a “mirrorworld” like Microsoft Flight Simulator and a more conventional game like Roblox. Flight Simulator is based on 2.5 petabytes of information (potentially even including real-time) pulled straight from the real world. The titanic size of this virtual world prohibits keeping more than a sliver of it on a local system. During play, mass amounts of data must stream in from remote servers, filling in the world the user is about to enter just before he or she reaches it. On the other hand, Roblox keeps most world data on the client device and primarily updates incidental data, such as object positions. The burden falls on local storage and computation rather than the network.
The existence of these two metaverse models raises several questions. With Roblox, changes within the world are temporary and only seen by a relative few people. This involves far less world-level resource load than having all changes be permanent for all users forever. Which model is better for a metaverse? If the answer is some hybrid mix, how will that be handled? And what happens if and when metaverses overlap, yet they use different permanence models?
As so often happens in “this or that” technology questions, the answer is “both.” We’ll need local compute and storage for impermanent worlds, and exceptional network performance for permanent and/or mirrorworld ones. And keep in mind that online games based in the former model already strain for performance across conventional network links. Latency — the delay between transmission and reception between two network entities — currently bedevils many games and real-time communication applications, because even a split-second delay will ruin the experience. Closer to the user, network bandwidth becomes a potential issue, as the resolution and refresh rates of next-generation virtual worlds can dwarf the bandwidth needs of, say, high-def streaming video. And not least of all, reliability must persist from the user to the application’s core servers and all network exchange points in between. In a world of congestion, cut lines, and misconfigured equipment, this is easier said than done.
The point is that the network bottlenecks that have challenged real-time communications for decades will not only persist but magnify as gaming evolves into even more resource-intensive metaverses. In-game voice chat is an essential, enabling component of bridging games into metaverses, but it is only one of many such components. Metaverses of any stripe will need sufficient network performance to enable real-time communications that span the real and virtual worlds concurrently. Very little distinction will soon exist between games and virtual worlds in this regard.
This rising need is why Subspace is now rolling out, as VentureBeat calls it, the company’s “parallel and real-time internet service for gaming and the metaverse.” This will be a network-as-a-service offering for developers, especially for those making real-time games set in massive virtual worlds. Subspace is a direct response to the public internet’s technical shortcomings and their impacts on gaming and communication experiences. The company’s globally deployed network operates alongside the public internet and addresses those needs around latency, bandwidth, and reliability.
Tomorrow’s metaverse will likely be made by game publishers. Some of those publishers are already giving better gaming experiences to millions of new users on Subspace’s network. As the benefits of those improved experiences drive more revenue, Subspace will continue to expand its infrastructure, so the metaverse will have a network foundation able to meet any emerging model or need.

Share this post

Subscribe to our newsletter

The world’s fastest internet for real-time applications—period. Every millisecond counts. Learn more in our newsletter.

Related Articles