Growing up with blindness or low vision can be difficult for kids, and not just because they can’t read the same books or play the same games as their sighted peers: vision is also a big part of social interaction and conversation. This Microsoft research project uses augmented reality to help kids with vision impairment “see” the people they’re talking with.

The challenge people with vision impairment encounter is, of course, that they can’t see the other people around them. This can prevent them from detecting and using many of the nonverbal cues sighted people use in conversation, especially if those behaviors aren’t learned at an early age.

Project Tokyo is a new effort from Microsoft in which its researchers are looking into how technologies like AI and AR can be useful to all people, including those with disabilities. New technology doesn’t always serve everyone equally, though it must be said that voice-powered virtual assistants are a boon to many who can’t as easily use a touchscreen or mouse and keyboard.

The project, which started a few years ago as an informal challenge to improve accessibility, began with the team observing people traveling to the Special Olympics, then following that up with workshops involving the blind and low vision community. Their primary realization was how much subtle context sight provides in nearly all situations.

“We, as humans, have this very, very nuanced and elaborate sense of social understanding of how to interact with people — getting a sense of who is in the room, what are they doing, what is their relationship to me, how do I understand if they are relevant for me or not,” said Microsoft researcher Ed Cutrell. “And for blind people a lot of the cues that we take for granted just go away.”

In children this can be especially pronounced: having perhaps never learned the relevant cues and behaviors, they may develop habits that can come across as antisocial, like resting their head on a table while conversing, or not facing a person when speaking to them.

To be clear, these behaviors aren’t “problematic” in themselves, as they are just the person doing what works best for them, but they can inhibit everyday relations with sighted people, and it’s a worthwhile goal to consider how those relations can be made easier and more natural for everyone.

The experimental solution Project Tokyo has been pursuing involves a modified HoloLens, minus the lens, of course. Display aside, the HoloLens is also a highly sophisticated imaging device that can identify objects and people if provided with the right code.

The user wears the device like a high-tech headband, and a custom software stack provides them with a set of contextual cues:

  • When a person is detected, say four feet away on the right, the headset will emit a click that sounds like it is coming from that location.
  • If the face of the person is known, a second “bump” sound is made and the person’s name announced (audible only to the user, like the other sounds).
  • If the face is not known or can’t be seen well, a “stretching” sound is played that modulates as the user directs their head towards the other person, ending in a click when the face is centered on the camera (which also means the user is facing them directly).
  • For the benefit of those nearby, an LED strip shows a white light in the direction of a person who has been detected, and a green light if they have been identified.
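
To make the logic of those cues concrete, here is a minimal sketch in Python of how a detection might be mapped to sounds and an LED color. This is not Microsoft’s code: the `Detection` fields, the cue strings, and the 5-degree centering tolerance are all assumptions made up for illustration, and the real system presumably works with spatialized audio rather than text labels.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: how close to straight ahead counts as "centered".
CENTER_TOLERANCE_DEG = 5.0


@dataclass
class Detection:
    """One person seen by the headset (all fields are illustrative)."""
    azimuth_deg: float           # 0 = straight ahead, positive = to the right
    distance_ft: float
    name: Optional[str] = None   # set only when the face is recognized


def audio_cues(d: Detection) -> List[str]:
    """Return labels for the sounds to play for one detection, in order."""
    # The click is placed at the person's direction and distance.
    cues = [f"click@{d.azimuth_deg:+.0f}deg/{d.distance_ft:.0f}ft"]
    if d.name is not None:
        # Known face: a "bump" followed by the spoken name.
        cues += ["bump", f"say:{d.name}"]
    elif abs(d.azimuth_deg) <= CENTER_TOLERANCE_DEG:
        # Unknown face, now centered on the camera: closing click.
        cues.append("center-click")
    else:
        # Unknown face, off-center: stretch sound varies with the offset.
        cues.append(f"stretch:{abs(d.azimuth_deg):.0f}")
    return cues


def led_color(d: Detection) -> str:
    """White for a detected person, green once they are identified."""
    return "green" if d.name is not None else "white"
```

Under these assumptions, an unrecognized person four feet away and 30 degrees to the right would yield a located click plus a stretch sound and a white LED; once the user turns to face them, the stretch gives way to the closing click, and a recognized face swaps in the bump, the name and a green light.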

Other tools are being evaluated, but this set is a start, and based on a case study with a game 12-year-old named Theo, they could be extremely helpful.

Microsoft’s post describing the system and the team’s work with Theo and others is worth reading for the details, but essentially Theo began to learn the ins and outs of the system and in turn began to manage social situations using cues mainly used by sighted people. For instance, he learned that he could deliberately direct his attention at someone by turning his head towards them, and developed his own method of scanning the room to keep tabs on those nearby, neither of which is possible when one’s head is on the table.

That kind of empowerment is a good start, but this is definitely a work in progress. The bulky, expensive hardware isn’t exactly something you’d want to wear all day, and naturally different users will have different needs. What about expressions and gestures? What about signs and menus? Ultimately the future of Project Tokyo will be determined, as before, by the needs of the communities who are seldom consulted when it comes to building AI systems and other modern conveniences.