Random numbers are everywhere in computing. They bring life and unpredictability to machines of otherwise relentless routine but sometimes we need to reign in their blissful chaos and exercise just a little bit more control. In this post I’ll take a look at a few ways of doing just that.

# Part 1: The Basics

In mathematics, probabilities are usually described as a proportion of one; so a 50% chance is 0.5, 100% is 1 and so on. This aligns neatly with random number generation in many languages: Math.random() in JavaScript, ActionScript and Java all produce a floating point number between 0 and 1. This can be used in a really simple way, lets say our program has to draw a grey monkey 30% of the time, the rest of the time she should be red:

if(Math.random() < 0.3) // draw grey monkey else // draw red monkey

each time it is run our program will randomly chose to draw a grey or red monkey, overall the red monkey will be chosen 70% of the time and the grey the remaining 30%.

# Part 2: Many Outcomes

The above example is fine in simple cases but quickly becomes cumbersome when many outcomes are necessary. We could cater for red, yellow, blue, grey and green monkeys with multiple *else if’s* but it would be ugly, difficult to read and adjust the probabilities if needed. What we really want is a simple data structure which defines a series of outcomes and their individual probabilities:

// array of prizes (value) with their specific probability of being awarded (p) var outcomes = [ { value: 'red', p: 0.3 }, { value: 'grey', p: 0.2 }, { value: 'yellow', p: 0.1 }, { value: 'blue', p: 0.05 }, { value: 'green', p: 0.35 } ]; getOutcome(outcomes); function getOutcome(outcomes) { // n is the 'event' from which we'll determine an outcome var n = Math.random(); // p_total is the cumulation of all the potential outcomes we've considered var p_total = 0; var count = 0; var len = outcomes.length; while(count < len) { var outcome = outcomes[count]; p_total += outcome.p; if(n < p_total) return outcome.value; ++count; } }

So our getOutcome function considers each outcome in turn until p_total (the cumulation of all probabilities we’ve considered so far) is greater than our random number, that signifies our hit so at that point the current value is returned. Let’s see it in action, here getOutcome is executed many times and the results outputted:

This is already pretty powerful, we have complete control over each outcome however we have to be careful that the sum of all the probabilities is 1; anything less means there’s a chance no outcome will occur, anything more could mean that certain outcomes cannot occur at all. If we want to adjust the balancing we’ll end up juggling lots of numbers less than one, it’s not very intuitive. Let’s look at how we can formulate the probabilities.

# Part 3: Quadratic Distribution

You might be familiar with the basic quadratc parabola of y = x^{2}:

We’re going to use the area between the x-axis and the curve for different regions to represent the probability of our various outcomes.

The actual equation we’ll use is:

By changing the values of a, b and c we’ll be able to produce different types of curve and so give us probability distributions to suit many different situations. In the graph above, a=1, b=0 and c=0.

First we need to integrate the function. Fortunately (given the scope of integration) this is a simple one:

Using our integral we can find the area under the curve for any given region like so:

This gives us a total area for our outcomes. Next we divide this region into smaller portions, one for each outcome and calculate their individual areas:

Dividing each region by the total area gives its proportion; in other words the probability.

We can plug these probabilities straight back in to our getOutcome function. Now when we come to adjust the balancing we can simply experiment with different values for a, b and c until we find something suitable:

But wait!! Something isn’t quite right. Certain values of a, b and c produce some rather unusual results. Try inputting a=1, b=-3, c=-10 (lower bound=0, upper bound = 10) in the example above and you’ll see some negative probabilities; not good! The problem is that regions *under* the x-axis are considered to have negative area. This upsets all of our calculations; we’re not done with the maths just yet.

To calculate the true area of each region we need to separate it into subregions wherever the graph crosses the x-axis. The total area for the region is the sum of the *magnitude* of each of it’s parts. We need to find where the quadratic equation crosses the x-axis, i.e. where y=0, also known as finding the roots.

Not all quadratic curves cross the x-axis so we first determine if it does with the following formula:

if this produces a negative result our equation cannot be solved for zero; *it doesn’t cross the x-axis*. If it is positive (or zero) we can find a solution, and we do so with the following:

Most quadratic curves will have two solutions (if any), hence the two formulae above; this is a result of the ambiguity which arises when working with square roots.

Ok, that is all of the formulae we need 🙂 When it’s all put together we can produce a range of distributions and easily adjust the skewing of our probabilities by adjusting a number or two.

With the right values of a, b and c we can produce a broad range of distributions:

# Source

var assignQuadraticP = function(outcomes, a, b, c, lower_bound, upper_bound) { // our integrated quadratic function var integral = function(x) { return 1 / 3 * a * Math.pow(x, 3) + 1 / 2 * b * Math.pow(x, 2) + c * x; }; // this tests whether our quadraric formula crosses the x-axis within a given range // returning the x coordinates of any intersects as an array. Note that the x-axis // intersects have already been calculated and stored in the variable roots when this // is invoked var get_encapsulated_roots = function(lower_bound, upper_bound) { var contained = []; var len = roots.length; for(var idx = 0; idx < len; ++idx) { var root = roots[idx]; if(root < upper_bound && root > lower_bound) contained.push(root); } return contained; }; // find the area under the quadratic curve for any given region. Any region with // encapsulated roots is split into subregions and the magnitude of those subregions // is summated var find_area = function(lower_bound, upper_bound) { // find any roots within the region var sub_regions = get_encapsulated_roots(lower_bound, upper_bound); // sort encapsulated roots in ascending order sub_regions.sort(function(a, b){return a - b;}); // if there are encapsulated roots... if(sub_regions.length > 0) { sub_regions.unshift(lower_bound); sub_regions.push(upper_bound); var area_total = 0; var len = sub_regions.length; // add up the total area from each subregion for(var idx = 1; idx < len; ++idx) area_total += find_area(sub_regions[idx - 1], sub_regions[idx]); console.log('find area between: ', lower_bound, upper_bound, '=', area_total); return area_total; } else { // if there are no encapsulated roots return the area between the boundaries console.log('find area between: ', lower_bound, upper_bound, '=', integral(upper_bound), '-', integral(lower_bound), Math.abs(integral(upper_bound) - integral(lower_bound))); return Math.abs(integral(upper_bound) - integral(lower_bound)); } }; var roots = []; // test whether the curve crosses the x-axis var exprss_1 = Math.pow(b, 2) - 4 * a * c; // if it does then calculate the roots if(exprss_1 >= 0) { var exprss_2 = 1 / 2 * a; var exprss_3 = Math.sqrt(exprss_1); roots.push(exprss_2 * (-b + exprss_3)); roots.push(exprss_2 * (-b - exprss_3)); } // find total area of our region var area_total = find_area(lower_bound, upper_bound); console.log(area_total); var evaluated = []; var len = outcomes.length; var step_x = (upper_bound - lower_bound) / len; // for each possible outcome calculate its probability by finding the area of its assigned region // as a proportion of the total area for(var idx = 0; idx < len; ++idx) { evaluated.push({ value: outcomes[idx], p: find_area(lower_bound + (step_x * idx), lower_bound + (step_x * (idx + 1))) / area_total }) console.log(find_area(lower_bound + (step_x * idx), lower_bound + (step_x * (idx + 1)))); } return evaluated; }; var getOutcome = function(outcomes) { // n is the 'event' from which we'll determine an outcome var n = Math.random(); // p_total is the cumulation of all the potential outcomes we've considered var p_total = 0; var count = 0; var len = outcomes.length; while(count < len) { var outcome = outcomes[count]; p_total += outcome.p; if(n < p_total) return outcome.value; ++count; } }; // define out possible outcomes, the order in which they're defined will dictate the // order in which they're allocated a region when we start integrating var outcomes = [ 'red', 'grey', 'yellow', 'blue', 'green' ]; var probabilities = assignQuadraticP(outcomes, a, b, c, low, high); getOutcome(probabilities);