Markov chains for beginners

Recently there have been some discussions on the usage of Markov chains in SPI programs. However, the discussions have been on a technically high level from the beginning and basically within a small group of experts. Therefore, I would like to do a basic educational contribution explaining what a Markov chain and matrix is and why this is useful for SPI programmers.

The motivation to use Markov chains is to reduce the randomness of fully random functions such as rand, choose, shuffle, pick,... that basically are based on equal probabilities for all choices they make. It is possible to give the randomness a ‘color’ using the command use_random_source, but still it is not possible to really give the random process a purposeful direction.

This is what Markov processes do. The name stems from a russian mathematician who was born in the 19th century. In a nutshell, using Markov chains it is possible introduce probabilities for a sequence of notes or chords, where there is a progression in a certain direction, but still has some randomness.

I’ll explain this concept using two examples. The first is a sequence of notes that is purely random and most of you have used such randomness in your programs many times.

use_synth :tri
my_scale = (scale :C3, :ionian)

live_loop :pure_random do
  sca = my_scale.shuffle
  sca.each do |n|
    play n
    sleep 0.5
  end
end

The shuffle command is used to generate a random sequence of notes from a given scale. Listen to it and you will hear how pure randomness sounds like. Mathematicians would call that a ‘white noise random process’.

Now lets give the random process an upward direction, not strictly but with some randomness remaining. First we construct a matrix that contains the probabilities we would like to use:

markov_matrix = [
  [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
  [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0],
  [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1],
  [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8],
  [0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
]

The interpretation is as follows. If we are playing the first note of the scale :C3, then this note has got the index 0 in the scale: :C3 = my_scale[0]. The probabilities for the notes to follow are contained in the row of our markov_matrix that has got the same index 0: markov_matrix[0]. If you read the first row, you will find the propability 0.1 as the first entry. This means: choose as next note the one with index 0 with a probability of 0.1. In other words: play the same note again with 10% probability. The next entry in the row has got the value 0.8. That means, play note with index 1 (my_scale[1] = :D3) with a high probability of 0.8. So, if we are playing a :C3, then a :D3 will follow with probability 80%. And so on.

If you compare the rows of the matrix, you will recognize a pattern. No matter which row number n we take, the probability to play as next the note my_scale[n+1]is always given by 0.8. There are no probabilities given for stepping back a note, only probabilities for playing the same note again (0.1), playing the next note (0.8) and playing the second next note (0.1). This will produce a raising sequence of notes, but not strictly and subject to some randomness.

The rest of the program is using some technique on how to use the built-in function rand to actually get the next note with the probabilities specified in the Markov matrix. In case you have some question on how exactly the function next_note_idx works, just ask.

Here is the complete Markov chain program. Listen to it and compare it to the pure random program. What you will hear is what mathematicians call a Markov process.

# Markov chains for beginners

use_synth :tri
my_scale = (scale :C3, :ionian)

markov_matrix = [
  [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
  [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0],
  [0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0],
  [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1],
  [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8],
  [0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
]

define :next_note_idx do |current_note_idx|
  n = 0
  r = rand
  pp = 0
  markov_matrix[current_note_idx].each do |p|
    pp += p
    break if pp > r
    n += 1
  end
  return n%8  # avoid numbers > 7
end


n = 0  # initial index of note in scale
live_loop :markov do
  n = next_note_idx(n)
  play my_scale[n]
  sleep 0.5
end

Play with the probabilities to try chains that also are allowed to make backward steps or larger jumps. Just make sure that all numbers in a given row sum up to 1.0. That’s a mandatory requirement for Markovian matrices, because there should be a next note to play with probability 1.0 (otherwise the process may be caught in a never ending loop without playing something).

If you put single 1.0 probabilities in a row, then you can actally force a certain note always following a certain previous note.

16 Likes

Great tutorial! I’ll throw my few pennies to this tutorial for Markov chains. There are always other ways to do the same thing.

Instead of matrix, you can also use different data structures to represent the Markov chain. For example hash:

h = {
  0=>[0,1,2,3,1],
  1=>[2,3,4,5,3,2],
  2=>[2,0,1,5],
  3=>[1,3,4,6],
  4=>[2,0,2],
  5=>[3,2,5],
  6=>[2,3,1]
}

degree = 0

live_loop :markov do
  play scale(:E, :major)[degree]
  degree = h[degree].choose
  sleep 0.25
end

Dont mind the chain - its prob awful to listen to but i hope it proves the point. I usually like this approach because it’s easier to follow which degree might go to which and so on. The chain contains essentially the same information as the Markov matrix.

Don’t believe me? Try it yourself

def chain_to_matrix(h)
  h.values.map do |o|
    arr = o.inject(Array.new(h.length, 0.0)) {|r,v| r[v]+=1.0; r}
    arr.map{|v| v/=arr.inject(0, :+)}
  end
end

print chain_to_matrix h
5 Likes

Hi @amiika thanks for the support! Yes, in practise you would use hash maps for an efficient implementation of the Markov matrix, as you suggested. I tried to be close to the notion of a matrix because this is what you’ll find in the literature or in Wikipedia on the subject matter.

In the example Harmonic random walk in major I am using a 2-dimensional hash map and a state space that is not just a scale of 7 notes but a complete selection of chords and tonic keys. But the idea is the same: Whenever you want to combine rules and randomness, Markov chains are a great way of achieving it.

Why would we want to combine rules and randomness? Well, the rules can implement the concept of style, while the randomness is the variation within a style. If we would be able to map Beethoven’s style of composition inta a Markovian matrix and do the same with Schubert’s or Chopin’s style, they would have different Markovian matrices. And when you let the program play, it will produce a random output but sound like the respective styles (ok, in practise this would be a very difficult and complex task, but I just wanted to illustrate the idea of combining rules and randomness).

2 Likes

Just to make it complete: The matrix of my example above would translate into @amiika map as follows:

h = {
  0=>[0,1,1,1,1,1,1,1,1,2],
  1=>[1,2,2,2,2,2,2,2,2,3],
  2=>[...],
  3=>[...],
  4=>[...],
  5=>[...],
  6=>[1,1,1,1,1,1,1,1,1,2,6]
}

The repetition of the numbers gives us the 0.8 versus 0.1 probabilities when using a simple choose. From this example it also becomes clear, that this data structure is not so efficient when it comes to strange proportions of probabilities such as one note having 0.13 versus other note having 0.87 probability.

Another possibility would be using the knit command in order to establish the right proportion of numbers.

h = {
  0=>(knit 0, 1, 1, 8, 2, 1),
  1=>(knit 1, 1, 2, 8, 3, 1),
  2=>(...),
  3=>(...),
  4=>(...),
  5=>(...),
  6=>(knit 1, 8, 2, 1, 6, 1)
}
2 Likes

Should do a transformation method from matrix to hash just for fun :slight_smile:

As a next chapter to this tutorial would be to show a simple way to create a Markov chain from existing melody.

This example shows how to create a chain from existing melody using a very simple approach by simply following the melody and listing all the changes from one note to another. I used a melody from a flute suite that @robin.newman posted few years ago:

# https://gist.github.com/rbnpi/f17ab21789cf94bf42c62638473ec76c#file-bjs1031b-rf-rb-L9
a=[:D5,:Ef5,:D5,:D5,:G5,:Ef5,:C5,:C5,:D5,:C5,:C5,:A5,:C5,:C5,:Bf4,:A4,:G4,:r,:Bf4,:D5,:Ef5,:C6,:Ef5,:Ef5,:D5,:F5,:Bf5,:F5,:G5,:F5,:Ef5,:D5,:C5,:F5,:A5,:C6,:Bf5,:D6,:C6,:Bf5,:A5,:G5,:F5,:Ef5,:D5,:r,:F5,:G5,:F5,:F5,:Bf5,:G5,:Ef5,:Ef5,:F5,:Ef5,:Ef5,:C6,:Ef5,:D5,:F4,:G4,:A4,:Bf4,:C5,:D5,:F5,:Bf5,:D5,:D5,:C5,:A5,:C5,:A4,:Bf4,:D5,:G5,:A5,:Bf5,:A5,:G5,:A5,:G5,:F5,:E5,:G5,:F5,:E5,:D5,:Df5,:D5,:E5,:Af5,:A5,:E5,:F5,:D5,:Df5,:D5,:E5,:Af5,:A5,:E5,:F5,:D5,:Df5,:D5,:E5,:A4,:Bf4,:D5,:Df5,:D5,:E5,:D5,:A4,:D5,:Df5,:D5,:E5,:D5,:G4,:Bf4,:A4,:G4,:F4,:E4,:F4,:D4,:r,:r,:D5,:Ef5,:D5,:D5,:G5,:Ef5,:C5,:C5,:D5,:C5,:C5,:A5,:C5,:Bf4,:G4,:A4,:Bf4,:C5,:D5,:Ef5,:F5,:Ef5,:D5,:C5,:Bf4,:A4,:Df5,:D5,:A4,:Bf4,:G4,:Fs4,:G4,:A4,:Df5,:D5,:A4,:Bf4,:G4,:Fs4,:G4,:A4,:G4,:A4,:Bf4,:C5,:D5,:Ef5,:D5,:C5,:Bf4,:D5,:C5,:Bf4,:A4,:G4,:Ef5,:C5,:Bf4,:A4,:Bf4,:A4,:G4,:D5,:Bf4,:G4,:G5,:Bf4,:r,:Bf4,:A4,:Bf4,:G5,:Bf4,:Bf4,:C5,:A4,:Bf4,:A4,:G4]
b=[0.75,0.25,0.5,0.5,0.5,0.5,3.0,0.75,0.25,0.5,0.5,0.5,0.375,0.125,0.75,0.25,0.5,1.5,0.5,0.5,1.0,0.5,0.5,0.5,1.0,0.5,0.5,1.0,0.5,1.0,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,1.5,1.5,0.75,0.25,0.5,0.5,0.5,0.5,3.0,0.75,0.25,0.5,0.5,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.5,0.5,0.5,1.5,0.5,0.75,0.25,0.25,0.25,0.25,0.25,0.75,0.25,0.5,0.75,0.25,0.75,0.25,0.25,0.25,0.25,0.25,1.0,0.5,0.25,0.25,0.25,0.25,0.25,0.25,1.0,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.5,1.0,1.5,3.0,0.75,0.25,0.5,0.5,0.5,0.5,3.0,0.75,0.25,0.5,0.5,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,1.0,0.5,0.25,0.25,0.25,0.25,0.25,0.25,1.0,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.5,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.375,0.125,0.125,0.0625,0.0625,0.75,0.5,0.5,0.5,0.5,0.5,0.5,0.25,0.25,0.25,0.25,0.25,0.25,0.375,0.125,0.0625,0.0625,0.875,3.0]

def to_markov(a,chain = {})
  a.each_with_index do |n, i|
    chain[n] = [] if !chain.key?(n)
    if a[i+1]
      next_n = a[i+1]
      chain[n].push(next_n)
    end
  end
  chain
end

melody_chain = to_markov a
rhythm_chain = to_markov b

print melody_chain
print rhythm_chain

note = melody_chain.keys.choose
rest = rhythm_chain.keys.choose

with_fx :reverb, room: 0.8,mix: 0.6 do
  use_synth :tri
  live_loop :markov do
    play note, sustain: rest*0.9, release: rest*0.1
    sleep rest
    note = melody_chain[note].choose
    rest = rhythm_chain[rest].choose
  end
end

Here you have it. New music from an existing piece. This simple approach is sometimes called first order markov chain, which does not take account different movements and series of notes. You could also create n-order markov chains that would store more information and thus create new melody that is more true to the original piece.

5 Likes

Great! This is a very nice demonstration of how a style can be coded using Markov matrices.

As a teacher I can`t but add another educational bit:

This is an example of what is called machine learning. You have got samples of existing data (= the melody and rythm of the existing piece) and then your algorithm extracts the structure of the piece and represents it in a condensed form as a model (= Markov matrix/map). Then, using the model, you create new things that are similar to the data that have been used to learn the model.

Isn’t it fascinating how simple things can be?

2 Likes

:exploding_head: I didn’t have a practical application for Markov matrices till now, because I found them too involved, when a general approach could do the trick. You’ve got me thinking. Thanks for teaching!

My thinking is this: Markov chains provide a causal mechanism between musical ideas. What’s more, these chains can be combined!

I’ve also been thinking about genetic algorithms and “gene transfer”. I wonder how all these pieces might fit together!

I think so far we have seen two different possible applications:

First is to construct a Markov matrix in order to achieve a behaviour that is intended and therefore designed: an upwards movement of a melody, or, a chord progression that respects a given ruleset.

Second application is to imitate the style of existing stuff by means of machine learning. You do not explicitely construct the matrix but learn it.

These are very different approaches to achieve different goals. But both use Markov matrices as a tool.

I didn’t want to take this approach, because: 1) it takes more work to construct a good matrix than to generate something at random with a “good enough” heuristic, and 2) it reduces surprises.

This second application still reduces surprises, but can be automated. I can just keep generating ideas until I find some I like. Then I can develop a piece based on variations of the ideas.

I guess my definition of “practicality” is automatability and ease of use.

Chains can be combined as well as be more complex. For example you could create n-order chains that combine both the rhythm and takes account consecutive notes:

{
  [:C,0.5]=>[[:B,0.25],[:D,0.15]], 
  [[:B,0.25],[:D,0.15]]=>[[:F,0.25],[:C,0.15]] 
}
2 Likes

I understand that point of view. My personal approach is to follow the hard path and put a lot of effort into designing the matrix/map. Because when learning the matrix from data, you depend on the existence of previously created material, see @amiika 's example. And you always will follow the paths taken by your predecessors :wink:

1 Like

I think the more complex the chain is however, the more data is required to train it? I’ll be working with random data, so it may actually be the case that less is more for me. I might even want to reduce the model further to interval data as opposed to pitch and duration data. It’s very general, but more specific than the heuristic I’m currently using.

It might take a similar amount of time to take the easier path and to collect random ideas until I have a suitable gene pool :sweat_smile:

1 Like

Not necessarily more, at least for example 2-order chain with the rhythm included. More complexity with small melody means that it would just sound very similar to the original piece. When trained with multiple pieces from the same composer it would then start to mimic the style of the composer. There’s also companies who are training their sets and selling the results like http://aiva.ai. I bet they are just training their sets with massive amount of music from the same genres.

More is always more with the AI, but less is still sometimes better :slight_smile:

Also, it’s fairly common to use intervals and chromatic scale with the Markov matrix approach.

1 Like

Mhm, depending on the piece, that might actually be desirable. I’m considering complementing motifs with Markov chains, which would generate variants of musical ideas instead of repeating them exactly. Maybe I could train a model with some improv sessions and my personal compositions too!

Ooh, I remember AIVA! There’s also MuseNet by OpenAI. Really cool stuff!

I never thought I’d have the chance to work with AI. Excited!

Let’s get philosphical. If companies are training their models from existing pieces of music and sell their music generators to music creaters, and then the music creaters will create new music using those generators, and the companies will use this new music to further train their models, which they then sell to music creators, who … :scream:

Then basically creators will help train the model! It’s a symbiotic relationship between humans and the artificial music brain. Composers who are contemporaries of each other do it too when they share ideas.

Nobody works in a vacuum, so I think that these models will physically manifest what past composers meant when they said that ideas were “in the air”.

If machines would produce outputs that other machines use as inputs and so on, this would just be a neverending cycle of imitation without creation. This way there is no progress in ideas and style. That’s actually the main reason why I believe it is better to design a Markov matrix because that’s a modern way of being creative.

You are right, also humans tend to do a lot of imitation and not always create new things. But sometimes they do. And if it’s ‘in the air’, the new idea has already been created. It’s just not yet widely known and established.

Then the only thing preventing that kind of stasis is copyright :laughing: Music half a century ago was quite different from music now…

But that’s not the only way. I think what we would need is a small chance of mutation in the system. Less can be more. Nature makes coral reefs and rainforests even in places that are nutritionally poor. All with variations of the four nucleotides of DNA! Gradually and in punctuated bursts, the systems evolve and become ever more diverse, as nature finds a niche to accommodate differences. So too, maybe music could evolve organically, using an AI-human ecosystem as a medium.

1 Like

Funny, I had the same thought whether evolution would help. But it won’t. In nature there is THE GREAT SELECTOR that eliminates not so good mutations. Who or what is THE GREAT SELECTOR in the world of machines producing music? If this is another machine that has learned what is good and what is bad, then the process becomes stationary agian, because who or what would evolve that super machine?

Maybe the future role of the humans is to take that role? Listen to ever more of produced variations of music based on previous music? Then click the like button and tell the machines what they like? Maybe …