A new frontier for artificial intelligence

Using algorithms to process sound is a booming field. Here are four promising innovations.

Machine learning is increasingly expanding into acoustics. Engineers feed data to an algorithm, which can then recognise an object or system and act on that knowledge. “Big brands like Google and Microsoft have been very interested in this over the last few years”, says Sebastian Tschiatschek, a researcher at the Institute of Machine Learning at ETH in Zurich. Four innovations are particularly promising:

Identifying personalities

By recording someone’s speech for 15 minutes, you can determine whether they are, for example, optimistic or self-confident. Developed by Germany’s Precire, the algorithm uses more than 5,000 voice samples to spot personality traits and conditions like stress. Some 20 companies now use Precire to screen job candidates, and with each use the pool of recorded voice samples grows. This is just the beginning, says CEO Dirk Gratzel: the technology could also be used by marketers to address audiences in the right tone. But is it accurate? A recent study and several experts say yes.

Improving orientation for robots

A recent innovation at MIT could help robots understand their surroundings. Researchers fed an algorithm 1,000 videos containing 46,000 sounds, most of which came from hitting objects with a stick. When the algorithm was then shown a video without its sound, it could still work out which noise belonged to which video. “If robots can predict the acoustical properties of an object, they can better understand their surroundings and adapt their interactions”, explains Tschiatschek.


Imitating someone else’s voice

Whether it’s your GPS, your smartphone or the smart fridge updating you on your groceries, any gadget will soon be able to speak in the voice of your choice. This is the aim of CandyVoice, a French start-up working on an algorithm with Microsoft. “The technology isn’t very complicated”, says Matthias Althoff of the Technical University of Munich. “And there’s real demand because the fun factor is high.” Reading 160 short sentences aloud gives the algorithm enough material to imitate your voice on different devices. This could help people who have lost their voice after an accident or operation, provided they recorded a voice file beforehand. Still, Tschiatschek worries about the potential for abuse by people who might pretend to be someone else on the telephone.

Composing songs

By classifying audio files according to musical style, artificial intelligence can compose new songs. You just have to select the desired style and add such criteria as length and instruments. UK start-up Jukedeck already uses this technology to write songs for Coca-Cola. But, according to Althoff, it’s hard to say whether artificial intelligence will make music for the mass market. “I see this more as a tool for professional composers.”

Tracking aircraft with sound waves

Where radar struggles with interference from electronic gadgetry, “acoustic prisms” could offer a new solution

Interference from electronic devices disturbs the tracking of aircraft on or near the ground, says Hervé Lissek from the École Polytechnique Fédérale de Lausanne. That is why he and his team have invented the audio equivalent of the optical prism, which splits light into its component frequencies (seen as different colours). Lissek notes that devices functioning like an acoustic prism already exist, but that they rely on signal processing. In contrast, his group’s invention acts purely through the physical manipulation of sound waves. That, he argues, is a good thing because the transition from acoustics to electronics inevitably distorts the signal. “There’s some advantage to going back to old-school engineering”, he says.

How does it work? The “acoustic prism” is a rectangular aluminium tube about 50 cm long that is divided into a series of chambers, each with its own hole to the outside and separated from the neighbouring one by a membrane. When a multi-frequency sound wave is generated at the open end of the prism, it travels along the tube and emits sub-waves from each side hole. Because the delay between emissions from successive holes depends on frequency, the way those emitted waves interfere with one another is likewise frequency dependent. Waves of different frequencies are therefore emitted at different angles, which means that microphones dotted around the prism each pick up a different pitch.
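The frequency-to-angle behaviour described above can be sketched as a delay-steered array. Everything numerical below is an assumption made for illustration: the hole spacing, the dispersion formula and the function names are not taken from the EPFL design, only the general principle (a frequency-dependent delay between holes tilts the emitted wavefront) is.

```python
import math

C = 343.0  # speed of sound in air (m/s)
D = 0.01   # spacing between neighbouring side holes (m) -- assumed value

def hole_delay(freq_hz):
    """Hypothetical frequency-dependent delay between emissions from
    successive holes. In the real device this dispersion comes from the
    membranes; the formula here is invented for illustration only."""
    return (D / C) * freq_hz / (freq_hz + 500.0)

def emission_angle_deg(freq_hz):
    """Tilt of the emitted wavefront away from broadside, via the standard
    delay-steered-array relation sin(theta) = c * tau(f) / d."""
    s = C * hole_delay(freq_hz) / D
    return math.degrees(math.asin(min(s, 1.0)))

# Different frequencies leave the prism at different angles:
for f in (250, 500, 1000, 2000):
    print(f"{f} Hz -> {emission_angle_deg(f):.1f} degrees")
```

Because the delay grows with frequency in this toy model, higher pitches leave the tube at steeper angles, which is exactly why microphones at different positions each hear a different pitch.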

This principle can also be turned on its head, so that the prism acts like a direction finder. When a source of sound, such as a plane, travels overhead along the axis of the tube, the sound waves arrive at the prism with a slight delay from one hole to the next. Because the amount of delay will depend on exactly where the object is relative to the prism, the object’s position can be tracked by measuring the changing frequency of the wave set up inside the tube.
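The direction-finding mode can be sketched the same way, by inverting the delay relation. Again, the hole spacing is an assumed value and the function is my own illustration, not EPFL's method:

```python
import math

C = 343.0  # speed of sound in air (m/s)
D = 0.01   # spacing between neighbouring side holes (m) -- assumed value

def source_angle_deg(delay_s):
    """Recover the direction of an overhead sound source from the measured
    delay between its arrivals at successive holes:
    delay = d * sin(theta) / c, hence theta = asin(c * delay / d)."""
    s = C * delay_s / D
    if not 0.0 <= s <= 1.0:
        raise ValueError("delay is inconsistent with the hole spacing")
    return math.degrees(math.asin(s))

# A source directly broadside arrives at all holes at once (zero delay);
# as it moves toward the tube axis, the inter-hole delay grows:
print(source_angle_deg(0.0))          # broadside
print(source_angle_deg(0.5 * D / C))  # partway toward the axis
```

Tracking a moving plane then amounts to repeating this measurement over time and watching the recovered angle change.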

Sebastian Tschiatschek (researcher at the Institute of Machine Learning at ETH in Zurich), Matthias Althoff (TUM), Hervé Lissek (EPFL)