Choosing random ints, favouring large/small

For my current project, I’ve decided to improve one of my random int generators, which is supposed to favour small or large numbers within a given array. I googled and found out that a proper solution would involve a cumulative distribution function for probabilities, so I tried to build one myself. I’m no math wiz, but the results of my tests have been promising (I’ve included my test below). I’m looking to build an efficient and effective solution, so any suggestions would be appreciated! Also, please feel free to use this code if it’s helpful!

Where pInts is an array of integers, and pWeight is a float between [-1,1] (inclusive) telling the function how heavily it should favour large values (positive) or small values (negative):

define :chooseFavouredAbsInt do |pInts, pWeight|
  sortedUniqAbsValues = pInts.map { |v| v.abs }.uniq.sort
  if (pWeight.zero? || (sortedUniqAbsValues.length < 2))
    return pInts.choose
  else
    minInt = sortedUniqAbsValues.first
    maxInt = sortedUniqAbsValues.last
    refInts = (minInt..maxInt).to_a
    diffs = refInts.map { |v| (v - minInt) }
    partialSum = 0
    partialSums = diffs.take(diffs.length)
    diffs.each_index do |i|
      partialSum += diffs[i]
      partialSums[i] = partialSum
    end
    probabilities = partialSums.map { |ps| ((ps * pWeight.abs) / partialSums.last.to_f) }
    
    if (pWeight > 0)
      pool = pInts.select { |i| evalChance?(probabilities[refInts.index(i.abs)]) }
      while pool.empty?
        pool = pInts.select { |i| evalChance?(probabilities[refInts.index(i.abs)]) }
      end
      
      return pool.choose
    else
      pool = pInts.select { |i| evalChance?(1 - probabilities[refInts.index(i.abs)]) }
      while pool.empty?
        pool = pInts.select { |i| evalChance?(1 - probabilities[refInts.index(i.abs)]) }
      end
      
      return pool.choose
    end
  end
end

The above code depends on this little function, which returns a boolean by evaluating a given chance (a float between [0,1] inclusive):

define :evalChance? do |pChance|
  return ((pChance >= 1) || ((pChance > 0) && (rand() < pChance)))
end

I’ve tested my code with the following:

randomSeed = Time.new.to_i
testInts = (0...10).to_a
trials = 100000

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (testInts.choose >= 5)
end
puts("random choose >= 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (chooseFavouredAbsInt(testInts, 0.8) >= 5)
end
puts("favoured choose >= 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (testInts.choose < 5)
end
puts("random choose < 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (chooseFavouredAbsInt(testInts, -0.8) < 5)
end
puts("favoured choose < 5: #{successes.to_s}")

The results are what I would expect. Something like this:
├─ “random choose >= 5: 49674”
├─ “favoured choose >= 5: 88853”
├─ “random choose < 5: 50326”
└─ “favoured choose < 5: 65728”

I have looked into the math side of the problem and suggest to use basic functions that produce weighted random numbers. The following functions behave like rrand and rrand_i except that they have a third parameter weight that you can choose between -1 and +1.

define :rrand2 do |min, max, w|
  if w < -1 or w > 1
    return nil
  end
  
  if w == 0
    r = rrand(min, max)
  else
    r = 0.5*(w + Math.sqrt(w**2 + 4*w*rrand(0, 1) - 2*w + 1) - 1)/w.to_f
    r = (max - min)*r + min
  end
  r
end

define :rrand2_i do |min, max, w|
  rrand2(min, max+1, w).to_i
end

From there you can easily build functions that select elements of an array by means of random indices produced by rrand2_i. Just test it out whether this fits your problem.

1 Like

Wow, so 5 arrays aren’t necessary at all, because that frightening quadratic equation with the weight can do the trick! Thank you, I couldn’t have figured that out myself (it’s pretty much basic arithmetic for me) :sweat_smile: I’d like to use your solution with one small simplification (rand == rrand(0,1) if I’m not mistaken), and also I’m going to change the variable names to my preference. I’ll build a choose function on top of it.

Would you like to be attributed in my source code? If so, do you have a contact you’d like me to add? At the very least, I’ll reference your post. (# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn ) This will soon be a project on GitHub, and I think I’ll set up a CONTRIBUTORS.md just like the Sonic Pi project.

I love this thread: I’ve been wanting to do something pretty much like this, but bitwise. I hope I can adapt what’s posted here.

1 Like

Thanks for the offer, if you add a link to the post that’s very kind, thanks. Of course, everything I posted here is provided for free usage anyhow.

If I understand you correctly you would like to rename rrand(0, 1) to rand because it’s shorter, right?

Yep! Also, it’s kind of unnecessary, but I prefer to spell out my names so I don’t have to remember what they stand for :slightly_smiling_face:

Here’s what my adaptation of your solution looks like. I’ve broken it down into the purely rand component, so that its usage matches Sonic Pi’s rand:

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

rrand follows easily:

define :rrandWithWeight do |pWt, pMin, pMax|
  return (randWithWeight(pWt, (pMax - pMin)) + pMin)
end
1 Like

In case somebody would like to understand what happened here:

The rand function has got a uniform distribution, i.e. the propability density function is a horizontal line above the interval [0, 1]. The parameter ‘weight’ turns this into a straight line with gradient 2*weight, symmetric over the interval [0, 1]. Now, to generate random numbers for this new probability density function, we need to calculate the propability distribution function, which is the integral over the density. For lines with a gradient, this gives us a quadratic function. The inverse of the distribution function is called the quantile function, and in our case this is the sqrt function you can see inside the above code. When we put uniformly distributed random numbers generated by rand into this sqrt quantile function, the result are random numbers according to the new linear density function with a gradient.

1 Like

EDIT: minAbsInt and maxAbsInt weren’t abs before.

I’ve coded, tested, and benchmarked the following choose function based on @Nechoj’s solution (many thanks!), comparing it to my initial solution to see how they performed relative to each other. First, some observations:

  • my initial solution is skewed and also underperforms extremely when the weight is small
  • under optimal conditions however (i.e. a large weight and a large sample size) and with a few fixes (I can share the code if anyone is interested), my initial solution can outperform the new solution
  • that said, the new solution performs consistently under all conditions I’ve tested (i.e. small and large weights, small and larger sample sizes, and samples with negative values)

With that, here’s the code along with the functions it depends upon:

define :chooseAbsIntWithWeight do |pWt, pInts|
  if (pWt.zero? || (pInts.length < 2))
    return pInts.choose
  else
    minAbsInt = pInts.min { |a, b| (a.abs <=> b.abs) }.abs
    maxAbsInt = pInts.max { |a, b| (a.abs <=> b.abs) }.abs
    weightedPool = []
    while weightedPool.empty?
      if (pWt < 0)
        weightedPool = pInts.select { |i| (i.abs <= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      else
        weightedPool = pInts.select { |i| (i.abs >= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      end
    end
    
    return weightedPool.choose
  end
end

define :randIWithWeight do |pWt, pMax = 1|
  return randWithWeight(pWt, (pMax + 1)).to_i
end

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

define :rrandIWithWeight do |pWt, pMin, pMax|
  return (randIWithWeight(pWt, (pMax - pMin)) + pMin)
end
r=[1,2,2,3,3,3,4,4,4,4]
weighted = choose(r)
1 Like

:exploding_head: I need to test this!

1 Like

Regarding performance, it is always faster to stay with integers and avoid complex float operations. If you just need integers, a targeted solution for the specific problem is certainly fast. If you need a generic solution, special versions of rand, rrrand, … are the way to go because those can be used in any application.

If software needs fast float computations, often the problem is translated into a discrete computation problem, also with pre-computed tables for e.g. sqrt. This would be the way forward when integrating such functionality into the SPI core and make it really fast.

1 Like

If you really want to go down that route, just imaging how your code would look like if you wanted two integers 1 and 2, where the probabilities are p(1) = 1.0001 and p(2) = 0.9999

Yeah, I’m not sure how to implement this with weights. If the weight were 0.99999 for instance, I think I would need a lot of ints.

I appreciate the suggestion though, it’s attractive for its seeming simplicity.

And if you stay with very even probabilities such as 1/4, 3/4, 1/5 etc, then @hitsware s solution is good enough. Just like this:

(knit 1, 5, 2, 4, 3, 13, 4, 5).choose()
1 Like

" I need to test this! "

w=[1,2,2,3,3,3,4,4,4,4]

one = 1
two = 1
three = 1
four = 1

for i in 0..63
  a = choose(w)
  if a == 1 then one = one + 1; end
  if a == 2 then two = two + 1; end
  if a == 3 then three = three + 1; end
  if a == 4 then four = four + 1; end
end

puts "one   =", one
puts "two   =", two
puts "three =", three
puts "four  =" , four
2 Likes

Thanks! Now to turn this into a weighted function to compare to the current solution… because I think I like the approach, but I realise ahead of time that an irrational weight would complicate things.

Absolutely. Just imagine p(1) = 1/Pi (Pi as in Sonic Pi)

1 Like

Haha, when I first started I wondered whether the random source was connected to Pi in some way!

@Nechoj I’m doing my best here, but I’m rusty as hell. Can you elaborate a little (in words) what rrand2() does? What I understand is this:

It returns a random number in the range, but it biases the choice to the left or right by the weighting factor. If that’s correct, what I’m unable to get a feel for is how much influence the weighting factor has. Because rrand() is pseudo-random and SPI is deterministic, I can’t simply run a whole bunch of trials manually to get a sense of the kind of results rrand2 produces (right?), so I wonder if you can say something to shed a bit more light on that.

@d0lfyn d0lfyn I do appreciate more descriptive labels. Are you aiming for some specific functional refinement, or is your adapting effort strictly a readability thing?

If you’d like to do that, here’s my complete testing code to get you started! You can take it apart to make things easier for manual testing. Note that I’m testing the custom choose function I built.

require 'benchmark'

define :chooseAbsIntWithWeight do |pWt, pInts|
  if (pWt.zero? || (pInts.length < 2))
    return pInts.choose
  else
    minAbsInt = pInts.min { |i0, i1| (i0.abs <=> i1.abs) }.abs
    maxAbsInt = pInts.max { |i0, i1| (i0.abs <=> i1.abs) }.abs
    weightedPool = []
    while weightedPool.empty?
      if (pWt < 0)
        weightedPool = pInts.select { |i| (i.abs <= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      else
        weightedPool = pInts.select { |i| (i.abs >= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      end
    end
    
    return weightedPool.choose
  end
end

define :randIWithWeight do |pWt, pMax = 1|
  return randWithWeight(pWt, (pMax + 1)).to_i
end

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

define :rrandIWithWeight do |pWt, pMin, pMax|
  return (randIWithWeight(pWt, (pMax - pMin)) + pMin)
end

define :rrandWithWeight do |pWt, pMin, pMax|
  return (randWithWeight(pWt, (pMax - pMin)) + pMin)
end

randomSeed = Time.new.to_i
fulcrum = 2
testInts = (-3...7).to_a
trials = 100000
weight = 0.8

results = Benchmark.bm do |x|
  x.report("random") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (testInts.choose >= fulcrum)
    end
    puts("random choose, >= #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("weighted") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (chooseAbsIntWithWeight(weight, testInts) >= fulcrum)
    end
    puts("weighted choose, >= #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("random") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (testInts.choose < fulcrum)
    end
    puts("random choose, < #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("weighted") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (chooseAbsIntWithWeight(-weight, testInts) < fulcrum)
    end
    puts("weighted choose, < #{fulcrum.to_s}: #{successes.to_s}")
  end
end

results.each do |result|
  puts result
end

In the above you’ll find a benchmark kit along with code for printing the results of running a specific trial a number of times. Here’s some sample data you may find useful:

The following configuration produces the following results:

randomSeed = 0
fulcrum = 2
testInts = (-3...7).to_a
trials = 100000
weight = 0.2

├─ “random choose, >= 2: 49699”
├─ “weighted choose, >= 2: 70859”
├─ “random choose, < 2: 50301”
├─ “weighted choose, < 2: 68066”
├─ #<Benchmark::Tms:0x000000001051e130 @label=“random”, @real=0.12473049998516217, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
├─ #<Benchmark::Tms:0x000000001a6ccb80 @label=“weighted”, @real=2.6765335999953095, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6720000000000255, @total=2.6720000000000255>
├─ #<Benchmark::Tms:0x000000001a6cc1f8 @label=“random”, @real=0.12538829998811707, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
└─ #<Benchmark::Tms:0x00000000186adf48 @label=“weighted”, @real=2.682990199973574, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6719999999999686, @total=2.6719999999999686>

Whereas the exact same configuration with a weight of 0.8 yields this:

├─ “random choose, >= 2: 49699”
├─ “weighted choose, >= 2: 78443”
├─ “random choose, < 2: 50301”
├─ “weighted choose, < 2: 74023”
├─ #<Benchmark::Tms:0x000000001051da28 @label=“random”, @real=0.12240550000569783, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
├─ #<Benchmark::Tms:0x000000001a6cd4b8 @label=“weighted”, @real=2.6702070999890566, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6719999999999686, @total=2.6719999999999686>
├─ #<Benchmark::Tms:0x000000001a6ccbf8 @label=“random”, @real=0.12392630000249483, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
└─ #<Benchmark::Tms:0x00000000186afd20 @label=“weighted”, @real=2.682374300027732, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6720000000000255, @total=2.6720000000000255>

As you can see, the weighting isn’t actually very strong. I might revisit my custom function because it yielded stronger results, if I’m able to balance it properly. I’d like something more linear perhaps.

I try to have each function do one basic thing, developed from fundamentals, which aids re-use, extension, maintenance, and readability. I think I read somewhere in Steve McConnell’s Code Complete that modularity should reduce the cognitive load of working with a piece of code, by reducing the amount of context a coder needs to be aware of. For me, the rand function adaptation is an easily understood black box I can use as I see fit.

Re: labeling. Yeah, I want to know at a glance what my code does without having to read documentation and work through the logic again. As a result however, I get some rather long descriptions like define :eliminateOverlapExceptSpacesWithHypotheses do |pPositionsAvailable, pHypotheses, pAllActiveSyntheses| and variable names like chanceCreateNoConsecutivelyRepeatedDisplacements. With a good text editor (i.e. Atom), I deal with such long variable names as semantic chunks. Cf. how we read words and not individual letters. In the same way, I (try to) read chunks and not words.