Choosing random ints, favouring large/small

d0lfyn · March 9, 2021, 10:56am

For my current project, I’ve decided to improve one of my random int generators, which is supposed to favour small or large numbers within a given array. I googled and found out that a proper solution would involve a cumulative distribution function for probabilities, so I tried to build one myself. I’m no math wiz, but the results of my tests have been promising (I’ve included my test below). I’m looking to build an efficient and effective solution, so any suggestions would be appreciated! Also, please feel free to use this code if it’s helpful!

Where pInts is an array of integers, and pWeight is a float between [-1,1] (inclusive) telling the function how heavily it should favour large values (positive) or small values (negative):

define :chooseFavouredAbsInt do |pInts, pWeight|
  sortedUniqAbsValues = pInts.map { |v| v.abs }.uniq.sort
  if (pWeight.zero? || (sortedUniqAbsValues.length < 2))
    return pInts.choose
  else
    minInt = sortedUniqAbsValues.first
    maxInt = sortedUniqAbsValues.last
    refInts = (minInt..maxInt).to_a
    diffs = refInts.map { |v| (v - minInt) }
    partialSum = 0
    partialSums = diffs.take(diffs.length)
    diffs.each_index do |i|
      partialSum += diffs[i]
      partialSums[i] = partialSum
    end
    probabilities = partialSums.map { |ps| ((ps * pWeight.abs) / partialSums.last.to_f) }
    
    if (pWeight > 0)
      pool = pInts.select { |i| evalChance?(probabilities[refInts.index(i.abs)]) }
      while pool.empty?
        pool = pInts.select { |i| evalChance?(probabilities[refInts.index(i.abs)]) }
      end
      
      return pool.choose
    else
      pool = pInts.select { |i| evalChance?(1 - probabilities[refInts.index(i.abs)]) }
      while pool.empty?
        pool = pInts.select { |i| evalChance?(1 - probabilities[refInts.index(i.abs)]) }
      end
      
      return pool.choose
    end
  end
end

The above code depends on this little function, which returns a boolean by evaluating a given chance (a float between [0,1] inclusive):

define :evalChance? do |pChance|
  return ((pChance >= 1) || ((pChance > 0) && (rand() < pChance)))
end

I’ve tested my code with the following:

randomSeed = Time.new.to_i
testInts = (0...10).to_a
trials = 100000

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (testInts.choose >= 5)
end
puts("random choose >= 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (chooseFavouredAbsInt(testInts, 0.8) >= 5)
end
puts("favoured choose >= 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (testInts.choose < 5)
end
puts("random choose < 5: #{successes.to_s}")

successes = 0
use_random_seed(randomSeed)
trials.times do
  successes += 1 if (chooseFavouredAbsInt(testInts, -0.8) < 5)
end
puts("favoured choose < 5: #{successes.to_s}")

The results are what I would expect. Something like this:
├─ “random choose >= 5: 49674”
├─ “favoured choose >= 5: 88853”
├─ “random choose < 5: 50326”
└─ “favoured choose < 5: 65728”

Nechoj · March 9, 2021, 2:26pm

I have looked into the math side of the problem and suggest to use basic functions that produce weighted random numbers. The following functions behave like rrand and rrand_i except that they have a third parameter weight that you can choose between -1 and +1.

define :rrand2 do |min, max, w|
  if w < -1 or w > 1
    return nil
  end
  
  if w == 0
    r = rrand(min, max)
  else
    r = 0.5*(w + Math.sqrt(w**2 + 4*w*rrand(0, 1) - 2*w + 1) - 1)/w.to_f
    r = (max - min)*r + min
  end
  r
end

define :rrand2_i do |min, max, w|
  rrand2(min, max+1, w).to_i
end

From there you can easily build functions that select elements of an array by means of random indices produced by rrand2_i. Just test it out whether this fits your problem.

d0lfyn · March 9, 2021, 3:37pm

Wow, so 5 arrays aren’t necessary at all, because that frightening quadratic equation with the weight can do the trick! Thank you, I couldn’t have figured that out myself (it’s pretty much basic arithmetic for me) I’d like to use your solution with one small simplification (rand == rrand(0,1) if I’m not mistaken), and also I’m going to change the variable names to my preference. I’ll build a choose function on top of it.

Would you like to be attributed in my source code? If so, do you have a contact you’d like me to add? At the very least, I’ll reference your post. (# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn ) This will soon be a project on GitHub, and I think I’ll set up a CONTRIBUTORS.md just like the Sonic Pi project.

blipson · March 9, 2021, 3:58pm

I love this thread: I’ve been wanting to do something pretty much like this, but bitwise. I hope I can adapt what’s posted here.

Nechoj · March 9, 2021, 4:14pm

Thanks for the offer, if you add a link to the post that’s very kind, thanks. Of course, everything I posted here is provided for free usage anyhow.

If I understand you correctly you would like to rename rrand(0, 1) to rand because it’s shorter, right?

d0lfyn · March 9, 2021, 4:25pm

Yep! Also, it’s kind of unnecessary, but I prefer to spell out my names so I don’t have to remember what they stand for

Here’s what my adaptation of your solution looks like. I’ve broken it down into the purely rand component, so that its usage matches Sonic Pi’s rand:

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

rrand follows easily:

define :rrandWithWeight do |pWt, pMin, pMax|
  return (randWithWeight(pWt, (pMax - pMin)) + pMin)
end

Nechoj · March 9, 2021, 4:52pm

In case somebody would like to understand what happened here:

The rand function has got a uniform distribution, i.e. the propability density function is a horizontal line above the interval [0, 1]. The parameter ‘weight’ turns this into a straight line with gradient 2*weight, symmetric over the interval [0, 1]. Now, to generate random numbers for this new probability density function, we need to calculate the propability distribution function, which is the integral over the density. For lines with a gradient, this gives us a quadratic function. The inverse of the distribution function is called the quantile function, and in our case this is the sqrt function you can see inside the above code. When we put uniformly distributed random numbers generated by rand into this sqrt quantile function, the result are random numbers according to the new linear density function with a gradient.

d0lfyn · March 9, 2021, 7:35pm

EDIT: minAbsInt and maxAbsInt weren’t abs before.

I’ve coded, tested, and benchmarked the following choose function based on @Nechoj’s solution (many thanks!), comparing it to my initial solution to see how they performed relative to each other. First, some observations:

my initial solution is skewed and also underperforms extremely when the weight is small
under optimal conditions however (i.e. a large weight and a large sample size) and with a few fixes (I can share the code if anyone is interested), my initial solution can outperform the new solution
that said, the new solution performs consistently under all conditions I’ve tested (i.e. small and large weights, small and larger sample sizes, and samples with negative values)

With that, here’s the code along with the functions it depends upon:

define :chooseAbsIntWithWeight do |pWt, pInts|
  if (pWt.zero? || (pInts.length < 2))
    return pInts.choose
  else
    minAbsInt = pInts.min { |a, b| (a.abs <=> b.abs) }.abs
    maxAbsInt = pInts.max { |a, b| (a.abs <=> b.abs) }.abs
    weightedPool = []
    while weightedPool.empty?
      if (pWt < 0)
        weightedPool = pInts.select { |i| (i.abs <= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      else
        weightedPool = pInts.select { |i| (i.abs >= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      end
    end
    
    return weightedPool.choose
  end
end

define :randIWithWeight do |pWt, pMax = 1|
  return randWithWeight(pWt, (pMax + 1)).to_i
end

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

define :rrandIWithWeight do |pWt, pMin, pMax|
  return (randIWithWeight(pWt, (pMax - pMin)) + pMin)
end

hitsware · March 9, 2021, 7:50pm

r=[1,2,2,3,3,3,4,4,4,4]
weighted = choose(r)

d0lfyn · March 9, 2021, 7:54pm

I need to test this!

Nechoj · March 9, 2021, 7:56pm

Regarding performance, it is always faster to stay with integers and avoid complex float operations. If you just need integers, a targeted solution for the specific problem is certainly fast. If you need a generic solution, special versions of rand, rrrand, … are the way to go because those can be used in any application.

If software needs fast float computations, often the problem is translated into a discrete computation problem, also with pre-computed tables for e.g. sqrt. This would be the way forward when integrating such functionality into the SPI core and make it really fast.

Nechoj · March 9, 2021, 8:01pm

If you really want to go down that route, just imaging how your code would look like if you wanted two integers 1 and 2, where the probabilities are p(1) = 1.0001 and p(2) = 0.9999

d0lfyn · March 9, 2021, 8:06pm

Yeah, I’m not sure how to implement this with weights. If the weight were 0.99999 for instance, I think I would need a lot of ints.

I appreciate the suggestion though, it’s attractive for its seeming simplicity.

Nechoj · March 9, 2021, 8:09pm

And if you stay with very even probabilities such as 1/4, 3/4, 1/5 etc, then @hitsware s solution is good enough. Just like this:

(knit 1, 5, 2, 4, 3, 13, 4, 5).choose()

hitsware · March 9, 2021, 8:21pm

" I need to test this! "

w=[1,2,2,3,3,3,4,4,4,4]

one = 1
two = 1
three = 1
four = 1

for i in 0..63
  a = choose(w)
  if a == 1 then one = one + 1; end
  if a == 2 then two = two + 1; end
  if a == 3 then three = three + 1; end
  if a == 4 then four = four + 1; end
end

puts "one   =", one
puts "two   =", two
puts "three =", three
puts "four  =" , four

d0lfyn · March 9, 2021, 8:36pm

Thanks! Now to turn this into a weighted function to compare to the current solution… because I think I like the approach, but I realise ahead of time that an irrational weight would complicate things.

Nechoj · March 9, 2021, 8:37pm

Absolutely. Just imagine p(1) = 1/Pi (Pi as in Sonic Pi)

d0lfyn · March 9, 2021, 8:54pm

Haha, when I first started I wondered whether the random source was connected to Pi in some way!

blipson · March 10, 2021, 8:52am

@Nechoj I’m doing my best here, but I’m rusty as hell. Can you elaborate a little (in words) what rrand2() does? What I understand is this:

It returns a random number in the range, but it biases the choice to the left or right by the weighting factor. If that’s correct, what I’m unable to get a feel for is how much influence the weighting factor has. Because rrand() is pseudo-random and SPI is deterministic, I can’t simply run a whole bunch of trials manually to get a sense of the kind of results rrand2 produces (right?), so I wonder if you can say something to shed a bit more light on that.

@d0lfyn d0lfyn I do appreciate more descriptive labels. Are you aiming for some specific functional refinement, or is your adapting effort strictly a readability thing?

d0lfyn · March 10, 2021, 10:03am

If you’d like to do that, here’s my complete testing code to get you started! You can take it apart to make things easier for manual testing. Note that I’m testing the custom choose function I built.

require 'benchmark'

define :chooseAbsIntWithWeight do |pWt, pInts|
  if (pWt.zero? || (pInts.length < 2))
    return pInts.choose
  else
    minAbsInt = pInts.min { |i0, i1| (i0.abs <=> i1.abs) }.abs
    maxAbsInt = pInts.max { |i0, i1| (i0.abs <=> i1.abs) }.abs
    weightedPool = []
    while weightedPool.empty?
      if (pWt < 0)
        weightedPool = pInts.select { |i| (i.abs <= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      else
        weightedPool = pInts.select { |i| (i.abs >= rrandIWithWeight(pWt, minAbsInt, maxAbsInt)) }
      end
    end
    
    return weightedPool.choose
  end
end

define :randIWithWeight do |pWt, pMax = 1|
  return randWithWeight(pWt, (pMax + 1)).to_i
end

# solution adapted from Jochen Hertle: https://in-thread.sonic-pi.net/t/choosing-random-ints-favouring-large-small/5234/2?u=d0lfyn
define :randWithWeight do |pWt, pMax = 1|
  if ((pWt < -1) || (pWt > 1))
    return nil
  elsif pWt.zero?
    return rand(pMax)
  else
    return (pMax * (0.5 * (pWt + Math.sqrt(pWt**2 + 4*pWt*rand() - 2*pWt + 1) - 1) / pWt.to_f))
  end
end

define :rrandIWithWeight do |pWt, pMin, pMax|
  return (randIWithWeight(pWt, (pMax - pMin)) + pMin)
end

define :rrandWithWeight do |pWt, pMin, pMax|
  return (randWithWeight(pWt, (pMax - pMin)) + pMin)
end

randomSeed = Time.new.to_i
fulcrum = 2
testInts = (-3...7).to_a
trials = 100000
weight = 0.8

results = Benchmark.bm do |x|
  x.report("random") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (testInts.choose >= fulcrum)
    end
    puts("random choose, >= #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("weighted") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (chooseAbsIntWithWeight(weight, testInts) >= fulcrum)
    end
    puts("weighted choose, >= #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("random") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (testInts.choose < fulcrum)
    end
    puts("random choose, < #{fulcrum.to_s}: #{successes.to_s}")
  end
  
  x.report("weighted") do
    successes = 0
    use_random_seed(randomSeed)
    trials.times do
      successes += 1 if (chooseAbsIntWithWeight(-weight, testInts) < fulcrum)
    end
    puts("weighted choose, < #{fulcrum.to_s}: #{successes.to_s}")
  end
end

results.each do |result|
  puts result
end

In the above you’ll find a benchmark kit along with code for printing the results of running a specific trial a number of times. Here’s some sample data you may find useful:

The following configuration produces the following results:

randomSeed = 0
fulcrum = 2
testInts = (-3...7).to_a
trials = 100000
weight = 0.2

├─ “random choose, >= 2: 49699”
├─ “weighted choose, >= 2: 70859”
├─ “random choose, < 2: 50301”
├─ “weighted choose, < 2: 68066”
├─ #<Benchmark::Tms:0x000000001051e130 @label=“random”, @real=0.12473049998516217, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
├─ #<Benchmark::Tms:0x000000001a6ccb80 @label=“weighted”, @real=2.6765335999953095, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6720000000000255, @total=2.6720000000000255>
├─ #<Benchmark::Tms:0x000000001a6cc1f8 @label=“random”, @real=0.12538829998811707, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
└─ #<Benchmark::Tms:0x00000000186adf48 @label=“weighted”, @real=2.682990199973574, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6719999999999686, @total=2.6719999999999686>

Whereas the exact same configuration with a weight of 0.8 yields this:

├─ “random choose, >= 2: 49699”
├─ “weighted choose, >= 2: 78443”
├─ “random choose, < 2: 50301”
├─ “weighted choose, < 2: 74023”
├─ #<Benchmark::Tms:0x000000001051da28 @label=“random”, @real=0.12240550000569783, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
├─ #<Benchmark::Tms:0x000000001a6cd4b8 @label=“weighted”, @real=2.6702070999890566, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6719999999999686, @total=2.6719999999999686>
├─ #<Benchmark::Tms:0x000000001a6ccbf8 @label=“random”, @real=0.12392630000249483, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.125, @total=0.125>
└─ #<Benchmark::Tms:0x00000000186afd20 @label=“weighted”, @real=2.682374300027732, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=2.6720000000000255, @total=2.6720000000000255>

As you can see, the weighting isn’t actually very strong. I might revisit my custom function because it yielded stronger results, if I’m able to balance it properly. I’d like something more linear perhaps.

I try to have each function do one basic thing, developed from fundamentals, which aids re-use, extension, maintenance, and readability. I think I read somewhere in Steve McConnell’s Code Complete that modularity should reduce the cognitive load of working with a piece of code, by reducing the amount of context a coder needs to be aware of. For me, the rand function adaptation is an easily understood black box I can use as I see fit.

Re: labeling. Yeah, I want to know at a glance what my code does without having to read documentation and work through the logic again. As a result however, I get some rather long descriptions like define :eliminateOverlapExceptSpacesWithHypotheses do |pPositionsAvailable, pHypotheses, pAllActiveSyntheses| and variable names like chanceCreateNoConsecutivelyRepeatedDisplacements. With a good text editor (i.e. Atom), I deal with such long variable names as semantic chunks. Cf. how we read words and not individual letters. In the same way, I (try to) read chunks and not words.

Topic		Replies	Views
Customised or exclusive random range Support, Help & Resources	1	89	June 9, 2024
Weighted randomization Support, Help & Resources	2	319	January 28, 2023
Why is array's sample method not random? Support, Help & Resources	29	734	March 2, 2021
Random choose by chance Creations & Ideas	0	428	November 16, 2019
Algorithm to partition an integer? Support, Help & Resources	40	1319	April 16, 2021

Choosing random ints, favouring large/small

Related topics