Humanisation in Sonic Pi

Not sure where to put this, so: General.

I’m playing around with humanisation. I wanted to make a live-coded piece using MIDI control of Logic Pro X. Originally I tried Pro Tools, but it started having response issues when I sent it a lot of MIDI on multiple tracks.

I was working on a loop that would take a description of a guitar chord and play it back using a fingerstyle guitar pattern and an acoustic guitar sound on a DAW. However, the lack of humanisation made the result sound very mechanical. A mechanical feel seems to work with synth sounds, but with natural sounds it just sounds bad.

The following is me playing around with humanisation. I use normally distributed errors, and specify my drum pattern as an array of rings and individual values describing: the drum sounds played, their times, typical velocities, and the standard deviations of the timing and velocity errors.

When played using built-in samples, the result still sounds quite mechanical unless I increase the size of the errors, which makes the drums sound extremely poorly played. When I set it to output MIDI and use drum kits in Logic, the result sounds more humanised, and I had to reduce the average size of the errors.

I’m not sure if going to the effort of making the timing and velocity errors normally distributed was worth it.

I’ll post my code here. Are there things that I’m doing wrong, or doing inefficiently? Note that I know that .is_a?(Numeric) is undocumented. Is there a documented method of achieving the same effect? How do others achieve humanisation?

#
# Calculate a normally distributed random number with mean and standard
# deviation
#

define :boxMuller do |mean=0, sd=1|
  pi = 3.14157
  u1 = rrand( 0, 1 )
  u2 = rrand( 0, 1 )
  return mean + sd * Math.sqrt( -2 * Math.log( u1 )) * Math.cos( 2 * pi * u2 )
end
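
As a quick sanity check of the above (a throwaway snippet, not part of the piece - the sample count and the mean/sd of 60 and 5 are arbitrary), you can draw a batch of values and confirm the empirical mean and standard deviation come out close to what was requested:

# Draw 10,000 samples with mean 60 and sd 5, then measure them.
samples = 10000.times.map { boxMuller( 60, 5 ) }
mean = samples.inject(:+) / samples.length
var = samples.map { |x| (x - mean) ** 2 }.inject(:+) / samples.length
puts mean             # expect roughly 60
puts Math.sqrt(var)   # expect roughly 5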

#
# humanise a value, most likely time or velocity, by adding
# normally distributed 'noise' to it. Numbers are constrained
# within a minimum and maximum, and may be rounded to an integer
#

define :humanise do |value,dev,min,max,integer=false|
  
  nvalue = value + boxMuller( 0, dev )
  
  if nvalue < min
    nvalue = min
  end
  
  if nvalue > max
    nvalue = max
  end
  
  if integer
    return nvalue.round
  else
    return nvalue
  end
end
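
For example (with throwaway values), a typical velocity of 100 with a deviation of 10 comes back as an integer clamped to the MIDI range, while a beat time gets a much smaller nudge:

puts humanise( 100, 10, 0, 127, true )   # e.g. 104
puts humanise( 1.5, 0.01, 0, 8 )         # e.g. 1.4937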

#
# Play a drum pattern to use the above
#

use_bpm 127
use_midi_defaults port: 'iac_driver_bus_1', channel: 10
set_mixer_control! limiter_bypass: 1
use_debug false

#
# A data structure. First element (ring) of the array is the drum sounds,
# as sample names (pattern1) or MIDI note numbers (pattern2).
# Second element is typical velocity for each note.
# Third element is typical time that the note is played.
# Fourth element is standard deviation of velocity.
# Fifth element is standard deviation of time.
#

pattern1 = [ (ring :bd_haus, :sn_dolf, :bd_haus, :bd_haus, :sn_dolf,
              :drum_cymbal_closed, :drum_cymbal_closed,
              :drum_cymbal_closed, :drum_cymbal_closed ),
             (ring 100, 90, 80, 90, 90, 70, 70, 70, 70 ),
             (ring 0, 1, 1.5, 2, 3, 0.5, 1.5, 2.5, 3.5),
             10, 0.01 ]

pattern2 = [ (ring 36, 38, 36, 36, 38, 42, 42, 42, 42 ),
             (ring 100, 90, 80, 90, 90, 70, 70, 70, 70 ),
             (ring 0, 1, 1.5, 2, 3, 0.5, 1.5, 2.5, 3.5),
             3, 0.01 ]

#
# choose the sample-based pattern (pattern1) or the MIDI-based pattern (pattern2)
#

pattern = pattern2

#
# use this to switch humanisation on or off.
#

set :human, true

#
# Actually play the pattern. With or without humanisation.
#


live_loop :drums do
  sync :loop
  human = get :human
  
  veldev = pattern[3]
  timedev = pattern[4]
  
  pattern[0].length.times do |i|
    
    samp = pattern[0][i]
    vel = pattern[1][i]
    time = pattern[2][i]
    
    if human
      vel = humanise( vel, veldev, 0, 127, true )
      time = humanise( time, timedev, 0, 8 )
    end
    
    at time do
      if samp.is_a?(Numeric)
        midi samp, sustain: 0.125, velocity: vel
      else
        sample samp, amp: vel / 127.0
      end
    end
  end
end

#
# metronome loop
#

live_loop :timing do
  cue :loop
  sleep 4
end



I looked into implementing the humaniser patch that gets used with Ableton Live and Max/MSP.

There’s an algorithm to simulate group interaction here: https://gist.github.com/xavriley/2a36215e43f6503e19bc

Single-player humanisation is here: https://gist.github.com/xavriley/c3e79c8b3fa4c1d85aac

Let me know if you have any questions.


It sounds great! Definitely feels more alive. I tried to learn more about it, but the first two links on the page describing it are dead. There is the reference though, so I can find it easily.

I’m still a bit confused about it. Could it be implemented in the same fashion as an advanced randomization function, doing a very precise drunk-walk around a value, or is it totally different?


Sorry, I didn’t see this until now. There’s a link to one of the original papers here: The Nature and Perception of Fluctuations in Human Musical Rhythms. The Google Scholar page for Holger Hennig has others he’s done (but he also works in other fields, so you may need to dig a bit).

Essentially, yes, it’s a kind of advanced randomisation. The research used something called “detrended fluctuation analysis” (DFA), which is a useful tool for teasing apart a noisy signal to show which distributions of noise make it up. White noise is a totally random distribution. Pink noise (aka 1/f noise) is like white noise but with the randomness sloping off at higher frequencies. The humaniser work managed to figure out that there’s a bit of both of these when a human plays a rhythm.

“a very precise drunk-walk around a value”

This is a really good way of putting it! The only clarification is that a drunk walk is classified as brown noise (if memory serves me correctly), but the idea is basically right. The other thing is that we’re talking about timing variations of the smallest subdivision of a beat, which is typically very short. That means the model in the paper needed to be super precise to achieve the same results as they observed. In turn, that means it needs a fancy algorithm to work out the pink noise, as the more common ones weren’t accurate enough.
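
To make the distinction audible, here is a hypothetical comparison (the loop names and step sizes are mine; run one loop at a time): a white-noise hi-hat re-rolls its offset independently on every hit, while a drunk-walk hi-hat accumulates small steps and slowly wanders around the grid:

# White noise: a fresh, independent offset each hit.
live_loop :white_hat do
  at rrand(0, 0.01) do
    sample :drum_cymbal_closed, amp: 0.5
  end
  sleep 0.5
end

# Brown noise (drunk walk): small steps accumulate, so the offset
# drifts. Clamped so the wander stays within 0..0.01 beats.
offset = 0.005
live_loop :drunk_hat do
  offset += rrand(-0.002, 0.002)
  offset = [[offset, 0].max, 0.01].min
  at offset do
    sample :drum_cymbal_closed, amp: 0.5
  end
  sleep 0.5
end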

I managed to find and include this fancy pink noise in Sonic Pi recently, so the building blocks are all there to implement a proper humaniser - I just need to knuckle down and do it.
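
In the meantime, here is a rough sketch of the common Voss-McCartney approximation of pink noise (the function name and octave count are my own, and this is not the high-accuracy algorithm mentioned above):

# Voss-McCartney-style pink noise: row i is a white-noise value that
# only re-rolls every 2**i calls, so the lower rows change slowly.
# Averaging the rows gives an approximation of 1/f noise.
define :make_pink do |octaves = 5|
  rows = Array.new(octaves) { rand * 2 - 1 }
  count = 0
  lambda do
    count += 1
    octaves.times do |i|
      rows[i] = rand * 2 - 1 if (count % (2 ** i)) == 0
    end
    rows.inject(:+) / octaves
  end
end

pink = make_pink 5
16.times { puts pink.call }   # smoother, more correlated than white noise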


I think that humanisation should be about more than adding randomness across the board. In a real performance - say, a little combo with drums, bass, guitar, and vocals - you’d expect the bass and drums to be solidly in time, with the guitar maybe a bit more fluid, whereas the vocals could meander around really quite a lot, to taste.

Even then, depending on the style, the drummer might push ahead of the pulse or pull it back, which can involve playing some parts of the kit in time and moving other parts around - like playing the snare slightly early or late. But I think these variations are typically a lot smaller than the singer is allowed. And the variation from bar to bar should be a lot less than the advance/lag itself - or the drums just sound wrong.
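
To sketch what that could look like in code (every name and number here is invented for illustration, and only drag offsets are used, since pushing ahead of the beat would need the grid scheduled early):

# Hypothetical per-part "feel" table: offset drags the part behind the
# pulse by a constant amount; jitter is the much smaller per-hit wobble.
feel = { kick:  { offset: 0.000, jitter: 0.001 },   # locked to the grid
         snare: { offset: 0.008, jitter: 0.002 },   # dragged slightly behind
         vocal: { offset: 0.015, jitter: 0.010 } }  # free to meander

live_loop :combo do
  feel.each do |part, f|
    t = f[:offset] + rrand( -f[:jitter], f[:jitter] )
    at [t, 0].max do
      sample :bd_haus          if part == :kick    # stand-ins for real parts
      sample :drum_snare_hard  if part == :snare
      synth :beep, note: :e4   if part == :vocal
    end
  end
  sleep 1
end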

Just some thoughts.


Totally, I think there’s more going on in a real performance. However, in another paper by the same author, he puts forward a model that allows for these (he calls it “mutually interacting complex systems”), which has a parameter for how loose each player is.
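
As a toy illustration of the coupling idea (purely invented, not Hennig’s actual model), each player’s timing error could be their own noise plus a pull towards the other player’s previous error, with a per-player looseness scaling the noise:

# Two coupled players: player B is looser (bigger noise) than player A.
# The coupling term pulls each player towards the other's last error.
a_err, b_err = 0.0, 0.0
coupling = 0.3
live_loop :duo do
  a_err = 0.7 * a_err + rrand(-0.004, 0.004) + coupling * (b_err - a_err)
  b_err = 0.7 * b_err + rrand(-0.010, 0.010) + coupling * (a_err - b_err)
  at [0.02 + a_err, 0].max do
    sample :bd_haus
  end
  at [0.02 + b_err, 0].max do
    sample :drum_cymbal_closed
  end
  sleep 0.5
end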

The other aspect of humanization is dynamics, which this doesn’t cover at all, but it’s likely to follow similar principles (e.g. random variation around a center).

There’s a cool project by Google called GrooVAE, where they managed to capture some of the timing and dynamics variation from hours of recordings of real drummers playing an electronic kit. It allows you to transfer a “groove” to a MIDI drum input. It’s pretty cool - I’d love to get it into Sonic Pi when I have time.


Yes, that sounds interesting - much better than the simple ‘add some randomness’ algos I’ve seen in the DAWs, although I expect Ableton will have some AI in it now or soon! (Ableton, pah… :smile:)

I’ve got to say - and this is very much my own idiosyncratic view - that the more the electronic stuff approaches real playing, the more I like synthetic sounds that really do sound like a machine made them. I liked synths originally because they didn’t sound like real instruments.

I feel the same about CGI - it’s just too realistic to be fun any more. Does that make any sense? When it was less good, it had a bigger wow factor. Anyone? Just me…?


I wanted a normal distribution. This deserves more attention. Thank you.

Edit: I went to use it and just saw that, with your value of pi, tires run slightly flat. That last digit would usually be a 9, not a 7.