Projects

A selection of creative projects I have been involved in.

Vocal Detection for Pioneer DJ

A common mistake among amateur DJs is to overlay two tracks that both contain vocals, leading to a sonic clash and a jarring listening experience. To avoid this, DJs traditionally had to memorize precisely where the vocals sit in every track and stay constantly aware of any potential overlap.

In my role as Principal Researcher at Qosmo, I developed and implemented a deep learning-based vocal detection algorithm that automatically finds the locations of all the vocals in a track. Pioneer DJ, the world’s leading manufacturer of DJing technology, integrated this algorithm into their rekordbox software, which indicates with a simple overlay where the vocals in a track are, allowing the DJ to focus fully on the creative aspects of the mix.
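The shipped algorithm itself is not public, but the general shape of such a system is straightforward to sketch. Below is a minimal, purely illustrative example in TensorFlow, assuming a frame-wise classifier over mel spectrograms; the architecture, layer sizes, and threshold are my assumptions, not the actual rekordbox implementation:

```python
import numpy as np
import tensorflow as tf

# Hypothetical frame-wise vocal detector: mel spectrogram in, per-frame
# vocal probability out. Architecture and sizes are illustrative only.
def build_vocal_detector(n_mels=80):
    frames = tf.keras.Input(shape=(None, n_mels))  # (time, mel bins), any length
    x = tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu")(frames)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(64, return_sequences=True))(x)
    probs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(vocal) per frame
    return tf.keras.Model(frames, probs)

model = build_vocal_detector()
mel = np.random.rand(1, 1000, 80).astype("float32")  # stand-in for a real track
vocal_mask = model.predict(mel)[0, :, 0] > 0.5       # threshold into vocal frames
```

In practice, the per-frame probabilities would also be smoothed and merged into contiguous vocal regions before being drawn as an overlay.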

More details about this project can be found on the Qosmo website.


Screenshot of Neural Beatbox. Design by Alvaro Arregui (Nuevo.Studio)

Neural Beatbox

Rhythm is one of the most ancient means of human communication. Neural Beatbox enables anyone to collectively create beats and rhythms using their own sounds. The AI segments and classifies sounds into drum categories and continuously generates new rhythms. By combining the contributions of multiple viewers, it creates an evolving musical dialogue between people. The AI’s slight imperfections enrich the creative expression by generating unique musical experiences.
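
To give a rough idea of the segment-and-classify step (this is an illustrative sketch, not the actual Neural Beatbox code), the pipeline could look something like this, using librosa for onset detection; the input file and drum categories are hypothetical:

```python
import librosa
import numpy as np

# Illustrative pipeline: cut a recording at detected onsets, then assign
# each slice to a drum category.
y, sr = librosa.load("recording.wav", sr=22050)  # hypothetical input file

onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
slices = [s for s in np.split(y, onsets) if len(s) > 0]

DRUM_CLASSES = ["kick", "snare", "hi-hat", "clap"]  # illustrative categories

def classify(slice_, sr):
    # A real system would feed features like these to a trained classifier;
    # this placeholder just derives a deterministic dummy label from the MFCCs.
    mfcc = librosa.feature.mfcc(y=slice_, sr=sr, n_mfcc=13).mean(axis=1)
    return DRUM_CLASSES[int(abs(mfcc[1])) % len(DRUM_CLASSES)]

kit = {classify(s, sr): s for s in slices}  # one sample per drum category
```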

I worked on this project in my role as Principal Researcher at Qosmo, where I was responsible for research and management. As an installation, the piece was exhibited at the Barbican Centre in London as part of the exhibition AI: More than Human. A browser-based version is now available online.

You can find more details about this project on the Qosmo website.


SampleVAE

SampleVAE is a multi-purpose tool for sound design and music production. The deep learning-based tool allows for various types of new sample generation, as well as sound classification and searching for similar samples in an existing sample library. The deep learning part is implemented in TensorFlow and consists mainly of a Variational Autoencoder (VAE) with Inverse Autoregressive Flows (IAF) and an optional classifier network.
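To make the architecture concrete, here is a bare-bones TensorFlow sketch of the core idea; the IAF layers and training loop are omitted and all dimensions are illustrative, so treat this as my simplification rather than the real SampleVAE code:

```python
import tensorflow as tf

# Simplified SampleVAE-style model: VAE over fixed-size spectrograms
# plus an optional classifier head on the latent code. Sizes are illustrative.
LATENT_DIM, N_CLASSES = 64, 10

encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu",
                           input_shape=(128, 128, 1)),   # spectrogram input
    tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2 * LATENT_DIM),               # mean and log-variance
])

decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(32 * 32 * 64, activation="relu",
                          input_shape=(LATENT_DIM,)),
    tf.keras.layers.Reshape((32, 32, 64)),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same",
                                    activation="relu"),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
])

classifier = tf.keras.Sequential([                       # the optional classifier
    tf.keras.layers.Dense(N_CLASSES, activation="softmax",
                          input_shape=(LATENT_DIM,)),
])

def encode(spectrogram):
    mean, logvar = tf.split(encoder(spectrogram), 2, axis=-1)
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * logvar) * eps             # reparameterisation trick
```

The same latent code then serves all three use cases: decoding it generates new samples, the classifier labels it, and nearest-neighbour search over codes finds similar samples in a library.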

Building on my NeuralFunk project from 2018 (see below), I wanted to create a tool that is accessible to creatives with no deep learning background and only minimal coding experience. SampleVAE was my attempt at this. In my role as technical partner and one of the facilitators of the MUTEK.JP AI Music Lab 2019, I offered it as one of the tools for the artists in the lab to explore, and one group, Gadara, ended up incorporating it into their live performance at the main MUTEK.JP festival.

The tool and code are freely available on GitHub. Pre-trained models are provided, but the tool can also easily be trained on your own data. For more details, check out the article I wrote about SampleVAE.

Presenting SampleVAE during my public talk at the Mutek AI Music Lab 2019 at EDGEof in Tokyo.


One of the plots from my study, linking sleep latency (in seconds) with blood caffeine concentration.

Sleep Tracking

Sleep is one of the most important factors for optimal mental and physical performance. As Dr. Matthew Walker points out in his book Why We Sleep, “Sleep is the single most effective thing we can do to reset our brain and body health each day.” As someone interested in quantified self and biohacking, and as someone who has always had sleep issues, I have long been interested in tracking and optimising my sleep.

In this particular study, published in Better Humans, I analysed the results of tracking my sleep (with an Oura ring), caffeine intake, exercise, and alcohol consumption over several months. In a nutshell: alcohol is terrible for my sleep, caffeine does not have as bad an effect as I had feared, and the results for exercise were inconclusive (except that exercising too close to bedtime is a bad idea). For all the details, have a look at the article.
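For readers who want to try something similar, a first pass over such a log can be done in a few lines of pandas; the file name and column names below are hypothetical stand-ins, not the actual Oura export format:

```python
import pandas as pd

# Hypothetical log layout; file and column names are illustrative stand-ins.
df = pd.read_csv("sleep_log.csv", parse_dates=["date"])
# expected columns: date, sleep_latency_s, caffeine_mg, alcohol_units, exercise_min

# Correlate each tracked habit with that night's sleep latency.
for habit in ["caffeine_mg", "alcohol_units", "exercise_min"]:
    r = df[habit].corr(df["sleep_latency_s"])
    print(f"{habit}: r = {r:+.2f}")

# Compare average latency on nights with and without alcohol.
print(df.groupby(df["alcohol_units"] > 0)["sleep_latency_s"].mean())
```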

I have since been consistently tracking the same data for well over a year, as well as my blood glucose for several months, and plan to analyse this more extensive dataset in the near future.


Cover of NeuralFunk. Design by Alvaro Arregui (Nuevo.Studio)

NeuralFunk

NeuralFunk is an experiment in using deep learning for sound design. It is an experimental track entirely made from samples that were synthesized by neural networks. It is not music made by AI, but music made using AI as a tool for exploring new ways of creative expression.

Two types of neural networks were used in the creation of the samples: a VAE trained on spectrograms, and a WaveNet that could additionally be conditioned on spectrogram embeddings from the VAE. Together these networks provided numerous tools for generating new sounds, from reimagining existing samples or combining multiple samples into unique sounds, to dreaming up entirely new sounds completely unconditioned.
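For the curious, the gist of the conditioning mechanism can be sketched in a few lines of TensorFlow. This is a toy illustration of global conditioning in a WaveNet-style stack of dilated causal convolutions, not the code used for NeuralFunk; all shapes and sizes are illustrative:

```python
import tensorflow as tf

# Toy global conditioning: gates of a dilated causal convolution stack are
# biased by a fixed embedding, e.g. a spectrogram embedding from the VAE.
audio = tf.random.normal([1, 16000, 1])   # one second of fake raw audio
z = tf.random.normal([1, 1, 64])          # stand-in for a VAE embedding

x = tf.keras.layers.Conv1D(32, 1)(audio)
for d in [1, 2, 4, 8]:                    # exponentially growing receptive field
    f = tf.keras.layers.Conv1D(32, 2, dilation_rate=d, padding="causal")(x)
    g = tf.keras.layers.Conv1D(32, 2, dilation_rate=d, padding="causal")(x)
    f = f + tf.keras.layers.Dense(32)(z)  # condition the filter, broadcast over time
    g = g + tf.keras.layers.Dense(32)(z)  # condition the gate
    x = x + tf.keras.layers.Conv1D(32, 1)(tf.tanh(f) * tf.sigmoid(g))  # residual
logits = tf.keras.layers.Conv1D(256, 1)(x)  # per-sample logits over 8-bit mu-law values
```

Because the embedding biases every layer, the same network can be steered toward different timbres simply by swapping in a different latent code.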

The resulting samples were then used to produce the final track. The title NeuralFunk is a nod to the drum & bass sub-genre Neurofunk, which is what I initially had in mind. Over the course of the project, however, the track turned into something more experimental, matching the experimental nature of the sound design process itself.

NeuralFunk, as well as another artwork of mine, was accepted to the NeurIPS Workshop on Machine Learning for Creativity and Design 2018. You can also listen to it, along with my other tracks, on Spotify. For more details, check out my article on NeuralFunk.


Illustration by Kittyzilla.

The Variational Autoencoder as a Two-Player Game

Variational Autoencoders (VAEs) are a type of generative deep learning model with many practical applications. They can be used to compress data or to reconstruct noisy or corrupted data. They allow us to smoothly interpolate between real data points, e.g. taking a photo of one face and gradually morphing it into another. And they allow for sophisticated data manipulation, for example realistically varying the hair length in the image of a person, or smoothly changing a voice recording from male to female without altering any other sound characteristics.
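The interpolation trick in particular is simple once a model is trained: encode both data points, walk between the two latent codes, and decode along the way. A minimal sketch, assuming hypothetical encode and decode functions from some trained VAE:

```python
import numpy as np

# Latent-space interpolation; `encode` and `decode` stand in for the encoder
# and decoder of a trained VAE (hypothetical functions, not from the articles).
def interpolate(x_a, x_b, encode, decode, steps=10):
    z_a, z_b = encode(x_a), encode(x_b)
    # Walking a straight line between two latent codes and decoding each point
    # yields a smooth morph from one data point to the other.
    return [decode((1 - t) * z_a + t * z_b) for t in np.linspace(0.0, 1.0, steps)]
```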

Besides their practical applications, they are also theoretically very appealing, especially from an information theoretic point of view.

In this three-part article series, which I created in collaboration with my illustrator friend Kittyzilla, I try to make the basic ideas behind VAEs and their application to natural language processing (NLP) as accessible as possible, and to encourage people already familiar with them to view them from a new perspective. Specifically, I invite the reader to imagine the VAE as a collaborative game played by two players.

Part I explores the foundations of autoencoders. Part II takes a look at why it makes sense to make them variational (and what that even means). Finally, Part III looks at why encoding text is particularly challenging, and some approaches to make it more feasible.