# Why is array's sample method not random?

I’ve written a simple function to do Brownian motion on an array by producing an auxArray of transitions that is 50% 1s and 50% -1s. You can see from the prints of auxArray that it’s 50/50, but when I call auxArray.sample, it consistently returns about twice as many -1s as +1s. I understand Sonic Pi’s randomness is deterministic, so I run this function on various sets of data, but it consistently produces around 67% -1s. Does anyone know what’s wrong, or know an alternative that will give me the 50/50 behavior I need?

``````
define :BrownianTransition do |currIndex, vector|
  # Given a vector and a current index within it, update the index in a Brownian
  # manner and return an array consisting of the new index and the vector's
  # value at that new index.
  auxArray = []

  vector.length.times { |idx|
    auxArray[idx * 2] = -1
    auxArray[(idx * 2) + 1] = 1
  }

  print("Brownian transition probabilities are", auxArray)

  brownianMove = auxArray.sample
  print("brownianMove is", brownianMove)

  return [(currIndex + brownianMove), vector[(currIndex + brownianMove) % vector.length]]
end

BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 3, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 4, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 5, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 6, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 7, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 8, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
``````

I replaced your sample line with this one:

``````
brownianMove = [-1, 1].choose
``````

and then it worked. No idea why `auxArray.sample` does not work as expected.

ADDITION: If you need different probabilities, this is a way to do it:

``````
brownianMove = (knit -1, 3, 1, 5).choose
``````

This will give you a 3:5 chance of -1 vs. 1 (i.e., a 3/8 probability of -1).
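Outside Sonic Pi, `knit -1, 3, 1, 5` corresponds to the array `[-1, -1, -1, 1, 1, 1, 1, 1]`. A quick plain-Ruby tally (a sketch using Ruby's own `Array#sample` with a seeded generator, rather than Sonic Pi's `choose` and its deterministic stream) shows the draw frequency settling near 3/8:

```ruby
# Plain-Ruby equivalent of (knit -1, 3, 1, 5): three -1s and five 1s.
weighted = [-1] * 3 + [1] * 5

rng = Random.new(42)  # seeded so the run is reproducible
draws = 100_000
minus_ones = draws.times.count { weighted.sample(random: rng) == -1 }

# The proportion of -1s should hover around 3/8 = 0.375.
puts minus_ones.fdiv(draws)
```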


Ah yes, `sample` behaves oddly, based on my brief experience. If you need a non-repeating selection of more than 1 element (say 5), the workaround I’ve found is something like this:

``````
brownianMove = auxArray.shuffle.take(5)
``````

Otherwise, if you only need 1, then `choose` is the way to go, as @Nechoj has already suggested.
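The same `shuffle.take(n)` idiom works in plain Ruby too; here is a minimal sketch (note the `random:` keyword and the example `notes` array are plain-Ruby additions for reproducibility, not part of Sonic Pi's seeded shuffle):

```ruby
# Draw 5 distinct elements from an array without replacement.
notes = [:c4, :d4, :e4, :f4, :g4, :a4, :b4]
rng = Random.new(7)  # seeded so the run is reproducible

picked = notes.shuffle(random: rng).take(5)

puts picked.inspect  # five distinct notes, no position repeated
```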


Sure, [1, -1], but this is a toy case of the Markov-chain transitions I’m working on, which generate more complex auxArrays in a similar manner. I did try the choose method instead of sample on my toy above, and choose generates exactly the same 2:1 proportion. By the way, I don’t see choose documented here.

Even with [1, -1].choose, I seem to be getting a 3:2 ratio on the data I’ve tested so far. In fact, on this data set, [1, -1].choose gives 18:12, but my auxArray.choose gives 16:14, which is acceptable:

``````
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 3, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 4, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 5, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 6, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 7, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 8, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 3, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 4, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 5, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 6, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 7, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 8, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7, 8]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7]
BrownianTransition 2, [0, 1, 2, 3, 4, 5, 6, 7]
BrownianTransition 1, [0, 1, 2, 3, 4, 5, 6, 7]
``````

Using rrand_i, I get a 55:45 proportion, which is promising, I guess:

``````
brownianMove = rrand_i(0, 1)
if (brownianMove == 0)
  brownianMove = -1
end
``````

Interesting. Have you looked at the other types of noise available with the new `use_random_source`? It should be `:white` by default, so it’s odd that there’s an uneven distribution. That said, the draws are independent, so slight imbalances are within the realm of probability.

Also, have you tried using other random seeds to see if it might lean the other way? 2:3 instead of 3:2 for instance. That would indicate it’s just luck.

Given a much larger sample, things should average out.
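As a plain-Ruby illustration of that averaging (using Ruby's `Array#sample` with a seeded generator rather than Sonic Pi's random stream, so this is only an analogy):

```ruby
# The observed percentage of 1s drifts toward 50% as the sample size grows.
rng = Random.new(1)  # seeded so the run is reproducible
pool = [-1, 1]

[20, 1_000, 100_000].each do |n|
  plus = n.times.count { pool.sample(random: rng) == 1 }
  puts "n=#{n}: #{(100.0 * plus / n).round(1)}% ones"
end
```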

Over here, I found this code:

``````
t = Time.now.to_i
use_random_seed t
``````

Doing that in conjunction with rrand_i does give different sequences, so it’s not deterministic, but the ratios are just as bad: 60-67%. As pointed out, pseudo-randomness and repeatability have musical value, but I’m concerned about melodies always ending up in the stratosphere, and things like that.

I think it’s just a peculiarity of the particular deterministic sequence of random numbers used by Sonic Pi. Changing the dataset would not affect the sequence of 1s and -1s (at least if the data sets are the same length). Instead, you could change it by adding a call to `use_random_seed 10000` (or some other number) at the start.

In my post simultaneous with yours, I did try use_random_seed, and the results are no better. It’s odd that Sonic Pi’s random stream doesn’t tend to 50/50 over the long run. There might be some other factor involved here.

You really need to look at large numbers. All three of the following approaches give about 5000 ones:

``````
ones = 0
10000.times do
  if [-1, 1].choose == 1
    ones += 1
  end
end
puts ones
``````
``````
ones = 0
10000.times do
  if [-1, 1].sample == 1
    ones += 1
  end
end
puts ones
``````
``````
auxArray = []
20.times { |idx|
  auxArray[idx * 2] = -1
  auxArray[(idx * 2) + 1] = 1
}
ones = 0
10000.times do
  if auxArray.sample == 1
    ones += 1
  end
end
puts ones
``````

That’s great to know, thanks for doing that. It does restore my faith.

I’m definitely getting a consistent 60-67% imbalance at around 20 trials, so I wonder if the devs can tweak the randomization stream so that it doesn’t take so long to lose its bulge. Even changing seeds reproduces the early-segment bulge, which does cause a problem: my Markov melodies (whose matrices I purposely assemble for balance) bottom out too soon and throw an error when the note falls out of playable range (MIDI < 0). I can have the melody return early if an error condition is detected, or I can adjust my Markov chains to Sonic Pi’s biases. I actually think adjusting to Sonic Pi’s quirks is fine, because re-weighting and aborting early are themselves musical effects that can work out nicely when you know generally what to expect (i.e., exploiting the happy accidents). I wouldn’t complain, though, if they flattened out the bulge or offered some alternative randomness sources.

Statistical theory gives us the following estimate:

If p is the probability of a 1 and n is the number of trials, then sd = sqrt(p*(1-p)/n) is the standard error of the proportion of 1s (assuming independent trials).

That means if p=0.5 and n=25, then sd=0.1. This gives you a 95% confidence interval of [0.5-2sd, 0.5+2sd] = [0.3, 0.7]. In other words, with 95% confidence the achieved proportion of 1s will be within this interval when trying 25 times. You shouldn’t be surprised to get a bias of 0.6 at this low number of trials.

When you do n=100 trials, then sd=0.05 and the 95% confidence interval is [0.4, 0.6].
When you do n=10000 trials, then sd=0.005 and the 95% confidence interval is [0.49, 0.51].

That’s the Law of Large Numbers at work.
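The arithmetic above can be checked with a few lines of plain Ruby (a sketch of the formulas only, not Sonic Pi code):

```ruby
# Standard error of a proportion and the resulting ~95% confidence interval,
# for p = 0.5 and several trial counts n.
prob = 0.5
[25, 100, 10_000].each do |n|
  sd = Math.sqrt(prob * (1 - prob) / n)
  lo = prob - 2 * sd
  hi = prob + 2 * sd
  puts "n=#{n}: sd=#{sd.round(3)}, 95% CI=[#{lo.round(3)}, #{hi.round(3)}]"
end
```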

OK, but it’s so consistently 60-67%. I guess I could run a large number of small trials to see if I ever get something close to 50/50. But there’s another aspect I’ve forgotten to mention: the bulge is always in the same direction. That’s weird, isn’t it?

Any updates? Here’s a large number of small trials:

``````
leanFalse = 0
leanTrue = 0

(0...1000000).each do |i|
  use_random_seed(i)
  numTrues = 0
  25.times do
    numTrues += 1 if one_in(2)
  end
  if (numTrues > 12)
    leanTrue += 1
  else
    leanFalse += 1
  end
end

puts("leanFalse: #{leanFalse}")
puts("leanTrue: #{leanTrue}")
``````

Results:
leanFalse: 478156
leanTrue: 521844

That’s roughly 52:48 (true to false) over a million trials of 25 flips each.

Great, but I’ve never seen a single one that was better than 60/40, not one. How about this: assuming overall odds of 52% for true, what are the odds of getting 100 consecutive trials that are each more than 60% true?

Here, every time I hit 100 consecutive matches of your condition, I reset the counter, so as to count every hundred:

``````
numChains = 0
currentChain = 0

numTrials = 25
threshold = (numTrials * 0.6).to_i

(0...1000000).each do |i|
  use_random_seed(i)
  numTrues = 0
  numTrials.times do
    numTrues += 1 if one_in(2)
  end
  if (numTrues > threshold)
    currentChain += 1
    if (currentChain == 100)
      numChains += 1
      currentChain = 0
    end
  else
    currentChain = 0
  end
end

puts("numChains: #{numChains}")
``````

Result: 0

The streaks you’re observing are probably all within the realm of probability.

It looks pretty balanced around 50% to me:

``````
auxArray = []

10.times { |idx|
  auxArray[idx * 2] = -1
  auxArray[(idx * 2) + 1] = 1
}

define :checkBalance do |samples|
  plus = 0
  minus = 0
  samples.times do
    if auxArray.sample > 0
      plus += 1
    else
      minus += 1
    end
  end
  100.0 * plus / samples
end

20.times do
  puts "balance (%):", checkBalance(15)
end
``````
``````
{run: 14, time: 0.0}
├─ "balance (%):" 33.333333333333336
├─ "balance (%):" 46.666666666666664
├─ "balance (%):" 60.0
├─ "balance (%):" 53.333333333333336
├─ "balance (%):" 40.0
├─ "balance (%):" 40.0
├─ "balance (%):" 60.0
├─ "balance (%):" 60.0
├─ "balance (%):" 53.333333333333336
├─ "balance (%):" 53.333333333333336
├─ "balance (%):" 53.333333333333336
├─ "balance (%):" 66.66666666666667
├─ "balance (%):" 60.0
├─ "balance (%):" 40.0
├─ "balance (%):" 40.0
├─ "balance (%):" 60.0
├─ "balance (%):" 46.666666666666664
├─ "balance (%):" 40.0
├─ "balance (%):" 60.0
└─ "balance (%):" 60.0

``````

Here is a modification of @d0lfyn’s attempt, with 100000 samples of size 25. It tallies how often each result occurs and compares the counts to the theoretical values (based on binomial coefficients).

``````
def binom(n, k)
  if k > 0 and n != k
    (1 + n - k..n).inject(:*) / (1..k).inject(:*)
  else
    1
  end
end

results = (knit 0, 26).to_a
num_tries = 100000

(0...num_tries).each do |i|
  use_random_seed(i)
  numTrues = 0
  25.times do
    numTrues += 1 if one_in(2)
  end
  results[numTrues] += 1
end

results.each_with_index do |r, i|
  puts "#{i} #{r} #{(binom(25, i) * 0.5**25 * num_tries).to_i}"
end
``````

The result is a list of three figures: the number of 1s achieved (0 … 25), how often that result occurred during the run, and the theoretically expected count of such results.

Trying with 100000 samples of size 25, there is good correspondence between measurement and theory. So there is no issue with small sample sizes, provided the seed is chosen randomly.

ADDITION: It works the same way when you omit the `use_random_seed` line; it’s not needed to prove that small samples are OK.


Pardon my math, but I don’t follow you here. I’m asking if the chances of true are 52% on every trial, what are the odds (1 in how many) that 100 consecutive trials will be more than 60% true? When you post “Result: 0”, I don’t know what you’re referring to.

For example, a parallel with slightly different numbers: the chance of a coin flip coming up heads is 50%. The odds of getting heads 10 times in a row are 1 in 1024 (I know how to calculate that because it’s easy). So then, if I do 25-flip trials, what are the odds that 100 consecutive of these 25-flip trials produce 15 or more heads? I don’t know how to do that (without a good bit of studying).

To complete the analogy, say I have a bad coin that comes up heads 52% of the time. What are the odds that 100 consecutive 25-flip trials produce 15 or more heads? I don’t know how to do that.
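For the biased-coin version of the question, the tail probability of a single 25-flip trial is a binomial sum, and (assuming independent trials) the chance of 100 consecutive such trials is that probability raised to the 100th power. A sketch in plain Ruby (the `choose` helper here is an ordinary n-choose-k function, unrelated to Sonic Pi's `choose`):

```ruby
# n choose k, built incrementally; each partial product is an exact integer.
def choose(n, k)
  (0...k).reduce(1) { |acc, i| acc * (n - i) / (i + 1) }
end

# P(X >= 15) for X ~ Binomial(25, 0.52): one "bad coin" 25-flip trial.
prob = 0.52
tail = (15..25).sum { |k| choose(25, k) * prob**k * (1 - prob)**(25 - k) }

puts tail        # roughly 0.27, i.e. about 1 in 4 per trial
puts tail**100   # vanishingly small for 100 consecutive trials
```

So with independent trials the 100-in-a-row event is astronomically unlikely, which is why the seeding question below matters.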

How is the seed chosen in each of those trials? If the seed is random and independent in each trial, that would be a very low probability, but if the seed is the same each time, or highly correlated (e.g. `Time.now.to_i`) then it could be pretty likely.
