May 2023. María José Apolo, a former IMFD student and a graduate in Civil Computer Engineering from the Federico Santa María University, has been working in the field of creative artificial intelligence, studying how style can be transferred from music to images using deep learning techniques.
In the research “Bimodal style transfer from musical composition to image using deep generative models”, carried out with Marcelo Mendoza, IMFD associate researcher and professor at the Computer Science Department of the Pontificia Universidad Católica (DCC UC), the authors start from the observation that different musical styles tend to be represented by characteristic imagery, and they study the transfer of style from songs to album covers.
The goal of the research is to evaluate whether it is possible for a system to automatically create, using the music as information, an image that is suitable, for example, as an album cover.
Every song is an image
To achieve this, the first step was to feed a model with data: “The dataset I generated is a multimodal dataset in which each song, represented as a spectrogram, is associated with its album cover. The model was fed approximately 20,000 (song, cover) pairs”, explains María José Apolo.
What does it mean that the songs have been processed as spectrograms? A spectrogram is a visual representation of sound that shows how intensity and frequency vary over the duration of a song. It is a much richer version of the level meter we see, for example, when we use a computer microphone: that horizontal bar shows only the volume of our voice (intensity) as we speak, whereas a spectrogram also shows how that intensity is spread across frequencies.
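As a rough illustration of what a spectrogram contains, the following Python sketch computes a small one by hand: the signal is cut into short overlapping frames, and the intensity of each frequency is measured in every frame. This is not the researchers' pipeline. The function name `spectrogram`, the frame size, the hop, and the synthetic 128 Hz test tone are all assumptions made for this example, and a real system would normally rely on an optimized signal-processing library rather than this plain-Python transform.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame.

    Hypothetical helper for illustration: a short-time Fourier transform
    written with the standard library only.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window softens the frame edges to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        # Only the first half of the bins is needed for a real-valued signal.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic "song": a pure 128 Hz tone sampled at 1024 Hz (assumed values).
sample_rate = 1024
tone_hz = 128
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = spectrogram(signal)

# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "frequency bins, peak at", peak_hz, "Hz")
```

Each row of `spec` is one moment in time and each column one frequency band, so the result can be drawn as an image. For this pure tone, the strongest bin in every frame is the one corresponding to 128 Hz; a real song would light up many bins that change from frame to frame.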
Spectrograms of similar songs tend to share measurable characteristics, and an AI model can analyze these to determine which songs are close together and which are far apart on the musical spectrum.
After this, the researchers began to query the model: given a specific song, which the system identifies through its spectrogram, the model returned the 100 covers it considered most similar or closest to the queried song. María José comments that, for the most part, the covers the system returned were of the same genre as the queried song or of similar genres.
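The query step can be pictured as a nearest-neighbor search. The sketch below is only an assumption about how such a lookup might work: it supposes that songs and covers have already been mapped to vectors in a shared space and that closeness is measured with cosine distance. The article does not state which representation or distance the researchers used, and the vectors, cover names, and function names here are invented for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_covers(song_vec, cover_vecs, k=100):
    """Return the names of the k covers nearest to the song vector.

    Hypothetical helper: ranks every cover by its distance to the song.
    """
    ranked = sorted(
        cover_vecs,
        key=lambda name: cosine_distance(song_vec, cover_vecs[name]),
    )
    return ranked[:k]


# Toy embeddings (made up): each cover is a point in a 3-dimensional space.
covers = {
    "rock_cover_a": [0.9, 0.1, 0.0],
    "rock_cover_b": [0.8, 0.2, 0.1],
    "jazz_cover_a": [0.1, 0.9, 0.2],
    "pop_cover_a": [0.2, 0.3, 0.9],
}
query_song = [1.0, 0.1, 0.0]  # a song vector lying near the rock covers

top2 = closest_covers(query_song, covers, k=2)
print(top2)
```

With these toy vectors the two rock covers are ranked first, which mirrors the observation that the returned covers mostly shared the genre of the queried song. Sorting every cover is fine for about 20,000 pairs; much larger collections would usually call for an approximate nearest-neighbor index.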
Creating album covers with AI
In a second phase, the researcher used the 100 covers that the system returned for a queried song to train a generative image model, similar to systems such as DALL-E or Stable Diffusion, which extracted the common characteristics of those covers and began to create new ones.
The aim was that, given a song of a certain genre, the generative system would create a completely new image that actually matched its theme. This was done with songs from 10 different genres, and the generative model, when generating new covers, obtained a score of 20.89 on a scale of 0 to 250, where values closer to zero indicate greater accuracy or similarity.
“This research topic is of particular interest to me: I think it is fascinating, from the deep learning point of view, to analyze the possibility of modeling and interpreting an abstract and subjective question, such as the transfer of style from a musical work to an image, where a translation must be made of multimodal attributes that are conditioned by the individuals’ own concepts and their biological and social background”, explains María José Apolo.
Looking ahead, the researcher hopes to build a platform that lets emerging artists with limited resources obtain an original album cover that matches the theme of their music.