Since Siri was introduced in 2010, the world has been increasingly enamored with voice interfaces. When we need to adjust the thermostat, we ask Alexa. If we want to put on a movie, we ask the remote to search for it. By some estimates, 33 million voice-enabled devices had reported for duty in American homes by the end of 2017.
But there are limitations to voice-enabled interactions. They’re slow, embarrassing when other humans are around, and require awkward trigger phrases like “Okay, Google” or “Hey, Siri.”
Thankfully, though, talking into midair is no longer our only—or best—option.
The new iPhone introduced a camera that can perceive three dimensions and record a depth for every pixel, and home devices like the Nest Cam IQ and Amazon's Echo Look now have cameras of their own. These cameras create a point cloud or depth map of the people in a scene, how they are posing, and how they are moving; combined with neural nets that learn and improve with more training data, they can be trained to recognize specific people, classify their activities, and respond to gestures from afar. Together, neural nets and better cameras open up an entirely new space for gestural design and gesture-based interaction models.
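To make the idea of a "depth for every pixel" concrete, here's a minimal sketch of how a per-pixel depth map can be back-projected into a 3-D point cloud with the standard pinhole camera model. The intrinsics (focal lengths and principal point) are made-up values for illustration, not parameters of any particular device:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map (in meters) into an Nx3
    point cloud using the pinhole model: X = (u - cx) * z / fx."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A tiny 2x2 "depth map" with every pixel one meter away
depth = np.ones((2, 2))
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

A real depth camera produces hundreds of thousands of such points per frame; this is the raw material a neural net consumes when it classifies poses and gestures.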
These new options raise the question: Of the existing interaction modalities—haptics (touch), sound (voice), and vision (gesture)—which is better to use when, and why?
The use case points toward an answer. When you’re SCUBA diving, or water skiing, or directing traffic on the deck of an aircraft carrier, the auditory channel isn’t available, so gesture or touch become essential.
In an operating room, a surgeon's hands are sterile, so she can't flip through radiology scans by touch; only speech and gesture are available. If you're conducting an orchestra or on a military raid, you can't call out commands, so we're back to gesture.
From Charlie Chaplin to cricket, there is a wide range of sources to inspire us as we design for gestural communication.
To dig into it further, our team at the Cambridge studio snagged a camera like the one in the new iPhone and performed a series of experiments to figure out when gesture might be the best choice.
First, we gave pairs of people an idea, then we asked them to make a four-handed pose to express that idea.
IDEO designers pair up to express ideas through gestures.
Then we recorded stories and tracked people’s hands using computer vision to study when we naturally deploy gestures to amplify emotion or explain a concept.
Tracking how we gesticulate to bring stories to life.
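The hand-tracking experiment above boils down to a simple signal: how fast the hands are moving from frame to frame. As an illustrative sketch (not the team's actual pipeline), tracked hand positions can be reduced to a per-frame "motion energy" curve whose peaks mark moments of emphatic gesturing:

```python
import numpy as np

def motion_energy(hand_positions):
    """Given an (n_frames, 2) array of tracked hand centroids in
    pixels, return per-frame speed (displacement between frames).
    Peaks in this signal are candidate moments of emphatic gesture."""
    deltas = np.diff(hand_positions, axis=0)
    return np.linalg.norm(deltas, axis=1)

# A still hand, then a sudden sweep to the right
track = np.array([[100, 100], [100, 100], [160, 100], [220, 100]], float)
energy = motion_energy(track)  # [0.0, 60.0, 60.0]
```

Aligning peaks in this curve with a transcript shows where a storyteller leans on gesture to amplify emotion or explain a concept.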
Ask a group of people to perform an action or request and you get some variation. The trick is finding gestures that are as universal as possible.
Lastly, we trained a neural network to recognize a small set of gestures, and used these to control a Philips Hue light set and a Spotify station, creating an installation for the office.
Using a set of gestures, we trained a few devices in our office to respond to our cues and adjust the lights and music.
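The glue between a gesture classifier and the devices can be very thin. The sketch below is a hypothetical dispatch layer, not the team's actual code: the gesture labels and handler bodies are stand-ins, and a real version would call the Philips Hue and Spotify APIs from the handlers (omitted here). Gating on the classifier's confidence keeps the lights from flickering on every uncertain prediction:

```python
# Hypothetical glue between a gesture classifier and home devices.
actions = {}

def on_gesture(name):
    """Register a handler for a recognized gesture label."""
    def register(fn):
        actions[name] = fn
        return fn
    return register

@on_gesture("palm_up")
def lights_on():
    # A real handler would hit the Hue bridge's REST API here.
    return "lights: on"

@on_gesture("pinch")
def volume_down():
    # A real handler would call Spotify's player API here.
    return "music: volume down"

def dispatch(label, confidence, threshold=0.8):
    """Fire the action only when the classifier is confident enough."""
    if confidence < threshold or label not in actions:
        return None
    return actions[label]()

result = dispatch("palm_up", 0.93)  # "lights: on"
```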
In messing around with these exercises, we discovered that gestures work best when they're sequential, like a sentence: noun then verb, object plus operation. For example, for "speaker, on," one hand designates the noun and the other the verb: you point to the speaker with your left hand and turn the volume up by raising your right hand.
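The noun-then-verb pattern can be modeled as a tiny two-step parser: the first gesture selects a target, the next applies an operation to it. The gesture labels and device names below are invented for the sketch:

```python
# Minimal sketch of a "noun then verb" gesture grammar:
# one gesture selects a device, the next applies an operation to it.
OBJECTS = {"point_left": "speaker", "point_right": "lamp"}
OPERATIONS = {"raise_hand": "volume_up", "lower_hand": "volume_down"}

class GestureSentence:
    def __init__(self):
        self.subject = None

    def feed(self, gesture):
        """Consume one recognized gesture; return a (device, operation)
        command once a full noun+verb pair has been seen, else None."""
        if gesture in OBJECTS:
            self.subject = OBJECTS[gesture]   # noun: pick the target
            return None
        if gesture in OPERATIONS and self.subject:
            command = (self.subject, OPERATIONS[gesture])  # verb: act
            self.subject = None
            return command
        return None

parser = GestureSentence()
parser.feed("point_left")        # noun: selects the speaker
cmd = parser.feed("raise_hand")  # verb: ("speaker", "volume_up")
```

Requiring the noun first also acts as a natural trigger, much like a wake word, so stray hand movements don't fire commands on their own.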
Another surprising insight: Gestures are generation-specific.
When asked to signal turning up the volume, a few people twisted an invisible knob, but most of the under 30s lifted a palm or made a pinching gesture with their fingers.
Turns out gestures are generation-specific.
After analyzing our results, we boiled our thoughts down to four reasons to opt for gesture over voice or touch:
It's exciting to imagine a whole new category of products that will take advantage of gesture’s subtlety, expressiveness, and speed.
How might we use gesture in unexpected ways? I’d love to hear your thoughts.
Many thanks to the rest of the team: Lisa Tacoronte, Todd Vanderlin, Jason Robinson, Danny DeRuntz, Brian Standeford, Eric Chan, and Ari Adler.