

Funded by

Helmut Fischer Stiftung

Project description

AI has advanced computer vision and image generation tremendously. But how does AI understand images? Can it connect what it “sees” to different known concepts? And when it generates new samples, is it creative? We train a system of neural networks to read images and music. Then we link it with another network that connects lines, shapes and colours into rhythm and pitch.

Organising information

Babies teach themselves how to see: they learn to distinguish colours, shapes and objects long before they know what those objects are. Autoencoder neural networks do the same: they start out knowing nothing, yet teach themselves to extract, encode and decode visual information.
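The self-teaching described above can be sketched with a tiny linear autoencoder trained on toy data. This is a minimal, hypothetical example in plain NumPy, not the project's actual networks: the data, dimensions and learning rate are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 200 samples of 8-dimensional data that really live on a
# 2-dimensional subspace, mimicking hidden structure in natural images.
latent_true = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent_true @ mixing

# Linear autoencoder: encoder (8 -> 2 bottleneck), decoder (2 -> 8).
W_enc = rng.normal(scale=0.2, size=(8, 2))
W_dec = rng.normal(scale=0.2, size=(2, 8))

lr = 0.05
for _ in range(3000):
    Z = X @ W_enc           # encode into the bottleneck
    X_hat = Z @ W_dec       # decode back to the input space
    err = X_hat - X         # reconstruction error: the only "teacher"
    # Gradient descent on the mean squared reconstruction error
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

mse = float(np.mean((X - (X @ W_enc) @ W_dec) ** 2))
print(mse)  # small after training: the network taught itself the structure
```

No labels are involved anywhere: the network improves only by comparing its own reconstruction to the input, which is what "teaching itself" means here.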

See the invisible, hear the inaudible

In Wolfgang Heckl’s installation “Atomare Klangwelten” (“Atomic Soundscapes”, 2018), an algorithm translates scanning tunneling microscope images, pixel by pixel, into musical notes. The microscope shows what was invisible; the algorithm plays what had no sound. Following a similar principle, we built a mapping between colours and notes, which turns colour pictures into music and vice versa.
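A pixel-by-pixel colour-to-note mapping might look like the following sketch. The specific choices (hue mapped to a C major scale, brightness mapped to loudness) are hypothetical illustrations, not the mapping used in the installation or in this project.

```python
import colorsys

# One octave of the C major scale as MIDI note numbers (C4..C5).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

def pixel_to_note(r, g, b):
    """Map an RGB pixel (0-255 per channel) to a (pitch, velocity) pair."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Hue selects the scale degree; brightness sets the loudness.
    pitch = C_MAJOR[min(int(h * len(C_MAJOR)), len(C_MAJOR) - 1)]
    velocity = int(round(v * 127))
    return pitch, velocity

print(pixel_to_note(255, 0, 0))  # pure bright red -> (60, 127): a loud C4
```

Because the rule is deterministic, it runs in both directions in principle: a note can be mapped back to a colour, which is what makes pictures-to-music and music-to-pictures translation possible.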

Please be creative

While these pixel-note maps are intuitive, they rarely produce captivating pictures or pleasing music. However, we can use their consistent audiovisual associations to guide the neural networks to connect pictures and music. Thus the networks learn to freely associate shapes and colours with musical ideas: they translate music samples into abstract pictures and compose short melodies inspired by images. A person doing this is undoubtedly creative. Is the machine as well?


Two-media interpolation

Images morph into one another continuously, while the soundtrack reflects the changing shapes and colours.
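The continuous morphing can be sketched as linear interpolation between two codes in the networks' shared latent space. This is a generic illustration of the technique; the vectors and step count are made up, and the real system would decode each intermediate code into an image and, via the audiovisual mapping, into the matching slice of soundtrack.

```python
import numpy as np

# Two hypothetical latent codes, e.g. the encodings of a start and end image.
z_start = np.array([0.0, 1.0, -0.5])
z_end = np.array([1.0, -1.0, 0.5])

steps = 5
# Straight-line path through latent space: t = 0 is the start image,
# t = 1 is the end image, everything in between is a smooth blend.
frames = [(1 - t) * z_start + t * z_end
          for t in np.linspace(0.0, 1.0, steps)]

print(frames[0], frames[2], frames[-1])  # start, midpoint blend, end
```

Decoding the sequence of intermediate codes frame by frame is what produces the impression of one image flowing into another while the sound changes along with it.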
