What if our only method of daily communication were verbal, with no hand gestures, no facial expressions, and no body language? Do you realize that we unconsciously change our physical gestures and posture to match what we’re saying? These actions have become integrated into our daily behavior, shifting with the message we are trying to convey. Multimodal communication is the process of combining various methods, ranging from speech to nonverbal signals, to enhance comprehension and meaning. It comes naturally to most people and is an essential part of daily social interaction, drawing on five main modes: linguistic, visual, aural, gestural, and spatial [1].

Human communication is based on a system that is constantly modified and adapted through social interaction, beginning with our early ancestors around 2.5 million years ago [2]. Within this system, broad categories have been defined to sort the different ways, or “modes”, of transferring information. The linguistic mode is the most widely used and covers written and spoken words, whether delivered through audio or on paper. The visual mode, which many people find especially helpful, combines images, videos, and color to convey meaning more efficiently. The aural mode is also among the most common and covers anything that uses sound, such as music or tone of voice, to enhance emotion in a text. Whereas the linguistic mode focuses on the content and meaning of a message, the aural mode emphasizes how it is delivered. In face-to-face interactions, people constantly use their hands and change their facial expressions to emphasize their points; these movements fall under the gestural mode. Finally, the spatial mode relies on the physical layout and organization of multiple elements, as when a web designer decides where to place each button to maximize user interaction.

Language isn’t always aural: sign language, for example, uses hand motions and visual cues to transmit information. In fact, it is estimated that more than 70 million deaf people worldwide rely on this form of communication [3]. American Sign Language (ASL), for instance, uses non-manual markers in addition to manual signs; these are the facial expressions and head tilts that signers use in place of voice inflections [4]. When signing a yes-or-no question, the eyebrows should be raised and the head tilted slightly forward, whereas when signing a question that includes “who”, “what”, “when”, “where”, or “why”, the eyebrows should be lowered and slightly furrowed, with the head tilted back a bit [5]. Such variations in visual cues make it easier to interpret a language in which emotion is not carried by vocal pitch.

Gestures, along with speech, play a foundational role in language and are an innate component of human language production. Babies gesture before they speak their first words, and congenitally blind speakers gesture even when talking to other blind people, which indicates that gesturing is a foundational brain mechanism rather than a behavior learned by observation [6]. These manual signals are used more frequently by people with neurogenic communication and language disorders that affect speech, such as aphasia. Specifically, studies have shown that people with non-fluent aphasia (speech limited to short phrases) tend to gesture at a higher rate than those with fluent aphasia (smooth speech) [7]. A multimodal approach to communication has been shown to be highly effective and to hold an advantage over unimodal methods [8]. An emotion recognition study reported that multimodal systems had, on average, a 9.83% increase in accuracy in conveying a speaker’s intention [9]. Gestures and visual stimuli make up the core of these signals, but compared with gestures alone, adding vocalizations often makes a message more coherent and easier to understand.

Combining different communication methods has been shown to improve understanding of conveyed messages and to enrich expression and emotion in everyday conversation. Multimodal communication has already had profound effects by increasing accessibility for people with different needs and disabilities, and it will continue to evolve over time.