So the outward-facing deep dive into the relationships between light and sound that I conducted earlier this week left me feeling seriously invigorated, and with plenty of new ideas to test out. All of this fuel resulted in a somewhat unexpected process of drum recording and iterative video editing in an experiment that I'm calling The AV Drumkit: a 2-day exploration of the relationship between 'sound emitters' and 'reactive visuals'. In total contrast to my last audiovisual project featured in this research and development blog (Sono-Subjective), this week's experiment allowed me to work in a more measured way, breaking down discrete components of visuals and sound in order to pull together more considered, harmonious combinations for the interplay between the visual and sonic modalities.

As I began scribbling some initial ideas down, it dawned on me pretty quickly that I was dealing with technical keywords and phrases that span the disciplines associated with each of these modalities. Ingredients such as attack, decay, temperature, velocity, colour, shape, position, movement, volume, and so on. These words are familiar to all of us because they're used in a multitude of contexts: in everyday language, but also in established specialist lexicons. In the spirit of my pursuit of a multimodal praxis, let's begin to think about these words with an interdisciplinary mindset, i.e. how they span specialised disciplines. This is where this experiment would be valuable: to explore this context-spanning language and see how it can influence and inform the design of new media and multimodal outputs.
For this experiment, I was looking at the relationship between 2 sensory modalities: sight and sound. The plan was to first establish a fairly basic drum groove, one that (for the majority of the phrase) consisted of a somewhat monophonic voice structure (i.e. only one drum being sounded at any one moment). Like so...
Iteration #1 - The drum loop.
Badda bing badda boom. It even comes complete with a cheeky tooth 'ding' and a rickety drum fill to play us out for good measure... Get used to it, this clip is going to be used in each of the following iterations.
To start this exploration, the first thing to implement was a basic reactive visual. So let's begin with a look into attack and decay.
With each hit, there are a variety of sonic ingredients that we can draw upon: the initial attack of each hit, the decay of each hit, and the silences in-between each hit.
It was time to explore what attack and decay meant within the manipulation and design of a reactive visual. So, from these elements I assembled the first layer of reactive visual for this experiment:
Iteration #2 - The reactive visual.
In this second iteration, we see a simple rectangular plane that now occupies the right-hand side of the video clip. The opacity of this rectangle hits 100% at the peak of each drum hit. From here the opacity decays to 0% as each drum hit decays to silence.
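The attack/decay-to-opacity behaviour described above can be sketched in code. This is a minimal illustrative sketch, not the actual implementation used for the video: all names and the `decay_rate` value are assumptions, and the input is assumed to be a per-frame amplitude signal normalised to 0.0-1.0.

```python
def opacity_envelope(amplitudes, decay_rate=0.08):
    """Map a per-frame amplitude signal (0.0-1.0) to rectangle opacity.

    On each drum hit (a new amplitude peak), opacity snaps up to match
    the attack; between hits it fades toward 0% as the sound decays.
    """
    opacity = 0.0
    frames = []
    for amp in amplitudes:
        if amp > opacity:      # attack: a new hit overrides the fade
            opacity = amp
        else:                  # decay: fade toward silence
            opacity = max(0.0, opacity - decay_rate)
        frames.append(round(opacity, 3))
    return frames
```

With a single full-strength hit followed by silence, the opacity jumps to 1.0 and then steps down by `decay_rate` each frame, mirroring the arrival and departure of the sonic 'event'.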
This relationship is all about content presence: the attack and decay being interpreted as the arrival and departure of any 'event' within each modality. This works fairly well... but it currently provides only a tiny slice of what is actually going on within the sonic modality. As mentioned earlier, the drum groove established for this clip was written so that most of the hits were separate and distinct, as part of a monophonic phrasing. This was to ensure that the visualisation efforts for this experiment could focus on one sound at a time. However, during the fills, two or more voices of the drumkit do indeed sound at the same time, and this isn't explicitly represented with the current reactive visual. For this, we now need to take a look at volume.
Considering the cold hard physics, the increase in amplitude created when 2 or more drum voices sound at the same time is a negligible qualifier for a discernible visual counterpart. This is because the additive amplitude of these two sounds is not double what each would be on its own: two equal, incoherent sources sum to only around 3 dB more than either alone, while a perceived doubling of loudness requires roughly a 10 dB increase. The science behind this is well documented; perceived loudness isn't just about sound intensity. However, when two drum hits are sounded simultaneously, the audio seemingly offers 'more' to the listener at that moment, and this is a psychoacoustic phenomenon that absolutely needs to be reflected within the reactive visual, beyond the pure maths that describes the volume of the audio. With this in mind, the next iteration aims to resolve this:
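The 'cold hard physics' above can be checked with a few lines of arithmetic. This is a generic acoustics calculation (the function name is mine, not from any library): incoherent sources sum in intensity, so levels in dB are converted back to intensity, added, and converted back.

```python
import math

def combined_level_db(levels_db):
    """Sum incoherent sound sources given their individual levels in dB.

    Intensity is proportional to 10^(dB/10), and intensities add
    linearly - so two equal sources land only ~3 dB above one source,
    nowhere near a doubling of perceived loudness.
    """
    total_intensity = sum(10 ** (lvl / 10) for lvl in levels_db)
    return 10 * math.log10(total_intensity)

# Two drums at 90 dB each combine to roughly 93 dB, not 180 dB.
```

This is exactly why raw amplitude is a 'negligible qualifier' for the visual: the measurable jump when a second drum joins in is far smaller than what the listener subjectively experiences.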
Iteration #3 - Audiovisual volume.
This paints the picture regarding 'volume' a little more accurately now. The opacity of the reactive visual only reaches 100% when the greatest number of drum voices sounds simultaneously throughout the phrase; in this case, 3 is the maximum that ever sound at once. The white rectangle hits around 80% opacity when 2 drums are sounded simultaneously, and 60% when only one drum is sounded. Then, for some extra expressive measure, any subtle ghost notes on the snare drum come in at 30% opacity.
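The voice-count-to-opacity mapping described above is simple enough to express as a lookup. A minimal sketch, assuming the percentages from this iteration; the function and parameter names are illustrative, not taken from any real tooling.

```python
def hit_opacity(voice_count, ghost_note=False):
    """Map simultaneous drum voices to rectangle opacity (iteration #3).

    3 voices -> 100%, 2 -> 80%, 1 -> 60%, snare ghost notes -> 30%,
    and silence leaves the rectangle fully transparent.
    """
    if ghost_note:
        return 0.30
    mapping = {1: 0.60, 2: 0.80, 3: 1.00}
    return mapping.get(voice_count, 0.0)
```

Note that the mapping is deliberately non-linear, echoing the point above: the visual steps are scaled to perceived 'more-ness' rather than to the raw summed amplitude.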
This may seem like a subtle addition to the visual component of this experiment, but in the same way that velocity automation within DAW workflows is subtle yet undoubtedly effective at generating more expressive sound, this serves as the expressive counterpart within the design of reactive visuals.
But let's face it, it's not implemented in the best way it could be here. This current visual is feeling choked, erratic, and cluttered. In what currently works as a kind of audiovisual illusion, iteration #3 makes it look as though the visual component is doing more work than the sonic component, even though this is not the case. This is because the reactive visual is a single entity: one 'object' representing many sonic 'objects', some of which overlap. So in response to this, the next ingredient I explored was spatialisation.
When thinking about the spatialisation of sound within a digital medium, stereo, surround, and binaural audio would allow for the spatialised placement of the voices that make up the drums within the sonic modality. In the case of this video clip, we are dealing with rudimentary stereo audio. But looking past the video file itself, and thinking about the drums within the space in which they are situated, we can see that the hardware making up this instrument occupies different spaces and would, therefore, emit sound from different spatial positions in the room it occupies. Let's reflect this visually in the next iteration of the video:
Iteration #4 - Spatialised voices.
...and with the introduction of spatialisation, we now have what can be described as a visual drumkit: a multi-voice spatialised reactive visual that distinctly shows each discrete hit. With this new configuration, other phenomena emerge that ape the qualities of the original audio, such as the repetition of discernible patterns. This is suddenly becoming a very simple but powerful way to visualise a drumkit.

Now let's deal with colour. I haven't mentioned this yet, but I decided to use white to represent the sound of the drumkit in response to some of the observations that emerged from the outward-facing research earlier this week. In the last iteration of this video, I absolutely could have used different colours, instead of spatialisation, to help us discern the separate voices that make up the full drum kit, but this wouldn't have been the most harmonious interdisciplinary ingredient to utilise for that purpose. Why? Well... the tones of the voices that make up the drum kit are not the most melodic (at least, relative to other instruments that often accompany drums within a wider musical arrangement). Their tonal makeup is much more textural, as they contain many more frequencies than, say, a piano key or a plucked guitar string. To draw on a comparable visual ingredient, they are much closer to 'white light', i.e. many summed frequencies. Okay, they're not quite white noise (which would be the direct sonic equivalent!), but for the purpose of the relative range of sonic content that will form this experiment, they are absolutely the closest thing to such an ingredient. So with this, I'm deciding to leave these white, as this is something that the drums can confidently and distinctly own within the next iteration of this video.
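The spatialised, per-voice layout from iteration #4 can be sketched as data plus a small render step. Everything here is a hypothetical reconstruction: the voice names, screen coordinates (given as fractions of the frame), and structure are my assumptions, not the actual edit.

```python
# Hypothetical screen layout: each drum voice owns its own region of the
# frame, mirroring the physical positions of the kit in the room.
VOICE_REGIONS = {
    "kick":  {"x": 0.40, "y": 0.75, "w": 0.20, "h": 0.20},
    "snare": {"x": 0.30, "y": 0.55, "w": 0.15, "h": 0.15},
    "hihat": {"x": 0.15, "y": 0.45, "w": 0.12, "h": 0.12},
    "tom":   {"x": 0.55, "y": 0.40, "w": 0.15, "h": 0.15},
}

def frame_rectangles(active_hits):
    """Return one white rectangle per sounding voice for this frame.

    active_hits maps voice name -> current envelope level (0.0-1.0);
    each voice now attacks and decays independently in its own position,
    so simultaneous hits appear as distinct shapes rather than one.
    """
    return [
        {"voice": voice, "opacity": level, **VOICE_REGIONS[voice]}
        for voice, level in active_hits.items()
        if voice in VOICE_REGIONS and level > 0.0
    ]
```

The key design shift is from one shared rectangle to one rectangle per voice: overlapping hits no longer fight over a single opacity value, which is what made iteration #3 feel choked.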
Because next, I'm adding a new voice into the sonic mix - a simple synth melody - so we will need a new unallocated visual ingredient to represent this new sonic ingredient, and for this, I'm going to utilise colour:
Iteration #5 - Adding a melodic voice.
When I said simple, I meant simple. This newest iteration features a synth line built from 3 sustained notes - a C, a D#, and another C - which is then rounded off by a series of staccato notes, again pitched at D#, and a final staccato note pitched at C at the very end of the phrase. The C is represented in blue, and the D# is represented in pink; colour selections I've made in line with the new colour scale I daringly established in my previous post. Instead of introducing a new visual to denote the attack and decay of this new instrumental voice, I have used the pitch of this melody to colour the rhythmic elements of the drum kit. This is an effort to explore the influence that multi-voice music compositions have on the listener, and the way this can be represented in the visual modality.

It's very safe to say that the rhythm of a track is owned by the percussion. Okay, yes... rhythm is naturally applied to other voices within musical arrangements, but the dominant element, the one responsible for championing this musical component, is the percussion. Likewise, the 'colour' of a piece of music is carried by the melodic instrumentation. Drawing upon these standardised roles within typical music arrangements, we end up with a reactive visual where the rhythmic motion of the visuals is driven by the various percussive voices, and the colour of the visuals is driven by the melodic content.

...and here is where I've decided to stop. I could go deeper and deeper with this experiment, and at some point I fully expect to do so, because the sheer wealth of learning here has been utterly invaluable to me, and I can confidently say that this will drastically change the way I design any audiovisual artifacts in the future! (I can already see some version of iteration #5 manifesting itself as a projection-mapped live AV drumkit experience... I totally need to find time to explore this soon!)
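The pitch-to-colour routing from iteration #5 reduces to a tiny lookup. A sketch under stated assumptions: the hex values are illustrative placeholders standing in for the blue and pink of my colour scale, and the names are mine.

```python
# Hypothetical pitch-to-colour lookup following the post's mapping:
# C -> blue, D# -> pink. Hex values are illustrative placeholders.
PITCH_COLOURS = {
    "C":  "#2244ff",   # blue
    "D#": "#ff44aa",   # pink
}

WHITE = "#ffffff"

def visual_colour(current_pitch):
    """Colour applied to the drum rectangles, driven by the melody.

    While a melodic note sustains, its pitch tints every rhythmic
    element; when no note sounds, the drums keep the white they 'own'.
    """
    return PITCH_COLOURS.get(current_pitch, WHITE)
```

This captures the division of labour described above: the percussive voices drive *when and where* shapes appear, while the melodic voice drives *what colour* they are.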
However, by iteration #5 the allocated time for this 2-day experiment was up, and other priorities within this explorative journey dictate that I must move on. After all, the purpose of this process was to help inform a wider objective: what does my multimodal toolkit/praxis need to consider in order to support multisensory and interdisciplinary methods like the process above?

So, let's reflect and summarise. The nature of my multimodal praxis-in-waiting will categorically avoid the notion of 'right or wrong' when it comes to assembling intermodal relationships in new media design. The above process of iterative videos was indeed trying to explore a cognitive 'harmony' between these modalities, but there may very well be creative briefs or opportunities to develop experiences that are actually 'dissonant' in the assembly of their modal components, not 'harmonious'. The core objectives of any given new media experience or artifact will ultimately dictate the nature of this, but even so, this experiment has still been very useful. It's shown that an awareness of transdisciplinary language - technical keywords that are used across different creative disciplines - is vital when it comes to navigating the landscape to achieve what is desired via any given piece of work, and this is undoubtedly going to be a valuable ingredient within my multimodal praxis. This is exactly the kind of line-in-the-sand drawing I was desperate to achieve via these experiments: what components of specialist theory need to be considered and abridged within a toolkit that supports generalist workflows? ...well, I can confidently say that a Rosetta Stone for interdisciplinary language is without a doubt one such ingredient!