Introduction

The presentation here is based on the following resources: the AgentModels book, the BCM (Bayesian Cognitive Modeling) book, the WebPPL examples, and the PPAML 2016 summer school materials, as credited in the section titles below.

Simple Start (Sum)

Goal: If U1 and U2 are two independent random variables, each uniformly distributed over the integers 1, 2, and 3, what is the distribution of U1 + U2?



var main = function () {

  var u1 = uniformDraw([1,2,3])
  var u2 = uniformDraw([1,2,3])
  
  return u1 + u2
}

var dist = Infer(
  {method: 'enumerate'},
  main);

viz.table(dist);
viz.auto(dist);

Consider: What are the distributions of U1 - U2, U1 * U2, and U1 / U2? What changes if the distribution of U2 is uniformDraw([0,1,2])?
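
For example, the difference variant needs only a one-line change (a minimal sketch; note that in JavaScript, dividing by a zero drawn from uniformDraw([0,1,2]) yields Infinity rather than an error):

var main = function () {
  var u1 = uniformDraw([1,2,3])
  var u2 = uniformDraw([1,2,3])
  return u1 - u2  // also try u1 * u2 and u1 / u2
}

viz.table(Infer({method: 'enumerate'}, main));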

Simple Start (Comparison)

Goal: If U1 and U2 are two independent random variables, each uniformly distributed over the integers 1, 2, and 3, what is the distribution of U1 == U2?



var main = function () {

  var u1 = uniformDraw([1,2,3])
  var u2 = uniformDraw([1,2,3])
  
  return u1 == u2
}

var dist = Infer(
  {method: 'enumerate'},
  main);

viz.table(dist);
//viz.auto(dist);

Consider: How can you simplify the above program to draw only a single sample (from another distribution)?
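
One possible simplification (a sketch): P(U1 == U2) = 3/9 = 1/3, so a single Bernoulli draw suffices:

var dist = Infer(
  {method: 'enumerate'},
  function () { return flip(1/3) });

viz.table(dist);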

Consider: How do the inference results change if we instead use this inference method: var dist = Infer( {method: 'MCMC', samples: 100}, main); (try running it several times).

Consider: How much do the run-to-run differences change if we increase the number of samples to 1000 or 10000?
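
A minimal sketch for comparing the run-to-run variability, assuming main from above; the boolean return value is projected to 0/1 so that we can take an expectation:

var estimate = function (n) {
  var d = Infer({method: 'MCMC', samples: n}, main);
  return expectation(d, function (x) { return x ? 1 : 0 });
};

map(function (n) {
  print('samples: ' + n + ', estimated P(U1 == U2): ' + estimate(n));
}, [100, 1000, 10000]);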

Several Simple Models (AgentModels)

WebPPL is a functional subset of JavaScript with several probabilistic constructs. To get accustomed to the syntax, read and try out the examples from this page.

Simple Rate Inference (Beta-Bernoulli)

Goal: The Beta distribution is a common prior for the parameter of a Bernoulli distribution. We want to infer the underlying heads probability theta of a coin that is tossed three times:


var main = function () {
  var theta = beta(1,1)

  var k1 = flip(theta)
  var k2 = flip(theta)
  var k3 = flip(theta)

  // booleans coerce to 0/1 in JavaScript, so this conditions on two heads
  condition (k1+k2+k3 == 2)

  return theta
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);
viz.auto(dist);

Consider: How would you modify the example to model tossing 5 coins? 7 coins?
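
One possible generalization (a sketch, run with the same Infer call as above): draw the tosses with repeat, relying on the same boolean-to-number coercion:

var main = function () {
  var theta = beta(1,1)
  var tosses = repeat(5, function () { return flip(theta) })  // use 7 for seven coins
  condition (sum(tosses) == 2)  // 2 observed heads is a placeholder to vary
  return theta
}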

Consider: Why can't we use the inference method 'enumerate' for this example?

Consider: Try plotting the prior, beta(1,1). You can do this using the inference method 'forward'. What distribution does it look like?
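
A minimal sketch of forward-sampling the prior:

var prior = Infer({method: 'forward', samples: 10000},
                  function () { return beta(1,1) });
viz.auto(prior);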

Simple Rate Inference (Beta-Binomial)

Goal: Infer the underlying success rate for a binary process (a binomial distribution represents a sum of independent coin tosses). We want to determine the posterior distribution of the rate theta.


var main = function () {
  var n = 10 //100
  var kk = 5 //50
  var theta = beta(1,1)

  var k = binomial(theta, n)

  condition (k == kk)

  return theta
}

var dist = Infer(
  {method: 'MCMC', samples: 10000},
  main);
viz.auto(dist);

Consider: Suppose we instead have n = 100 trials and observe kk = 50 successes. What changes in the posterior? What remains the same?

Difference Between Two Rates (BCM Book)

Goal: We want to compare two different binomial processes.



var main = function () {
  var n1 = 10
  var n2 = 10
  var theta1 = beta(1,1)
  var theta2 = beta(1,1)

  
  var k1 = binomial(theta1, n1)
  var k2 = binomial(theta2, n2)
  
  condition (k1 == 5)
  condition (k2 == 7)
  
  var delta = theta1 - theta2
  
  return delta
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);

viz.auto(dist);

Consider: What can we conclude about the individual processes and their difference? The code below may be helpful:


// same as before
var main = function () {
  var n1 = 10
  var n2 = 10
  var theta1 = beta(1,1)
  var theta2 = beta(1,1)

  
  var k1 = binomial(theta1, n1)
  var k2 = binomial(theta2, n2)
  
  condition (k1 == 5)
  condition (k2 == 7)
  
  var delta = theta1 - theta2
  
  return [theta1, theta2] // note: here we return the joint distribution
}

var dist1 = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.auto(dist1);
viz.marginals(dist1); // we can plot the marginals directly
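
One way to summarize the comparison (a sketch using dist1 from above): estimate the posterior probability that theta1 exceeds theta2.

print('P(theta1 > theta2): ' +
      expectation(dist1, function (p) { return p[0] > p[1] ? 1 : 0 }));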

Inferring a Common Rate (BCM Book)

Goal: Infer the success rate shared by two binary processes. We want to determine the posterior distribution of the common rate theta.



var main = function () {
  var n1 = 10
  var n2 = 10
  var theta = beta(1,1)
  
  var k1 = binomial(theta, n1)
  var k2 = binomial(theta, n2)
  
  condition (k1 == 5)
  condition (k2 == 7)
    
  return theta
}

var dist = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.auto(dist);

Comparing the Exact and Approximate Computations

We will now analyze the accuracy of a computation with injected errors. First, let us define a simple function and its inputs (recall that Beta(1,1) === Uniform(0,1)):


var func = function (x) {
  return 2.5*(x + x*x)  // x*x, not x^2: in JavaScript ^ is bitwise XOR
}

var main = function () {

  var input = 10.0*beta(1,1)
  
  var res_o = func(input)

  return res_o
  
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);

viz.auto(dist);
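
As a sanity check: with X = 10*Beta(1,1), i.e. X ~ Uniform(0,10), we have E[X] = 5 and E[X^2] = 100/3, so E[func(X)] = 2.5*(5 + 100/3) ≈ 95.8; the MCMC estimate expectation(dist) should be close to this.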

Second, let us define a custom error model. The error is additive to the function result (for simplicity; we could make it more 'interesting' by applying it after every instruction):

var customerr = function(d) {
  if(flip(0.5))
    return 0.01*gaussian(0,d)
  else
    return 0.01*uniform(-d,d)

}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  function () { return customerr(1) });  // note: the 'return' is needed, and d must be
                                         // positive for gaussian/uniform to be valid

viz.auto(dist);
Now it is time to see how the approximate function behaves:

///fold:
var customerr = function(d) {
  if(flip(0.5))
    return 0.01*gaussian(0,d)
  else
    return 0.01*uniform(-d,d)

}

var func = function (x) {
  return 2.5*(x + x*x)  // x*x, not x^2: in JavaScript ^ is bitwise XOR
}
///
var main = function () {

  var input = 10.0*beta(1,1)
  
  var res_o = func(input)
  
  var res_a = res_o + customerr(input)

  return [res_o, res_a]
  
}

var dist = Infer(
  {method: 'MCMC', samples: 100000},
  main);

viz.marginals(dist);

viz.auto(dist)

Let us now compute (1) the difference of the means and (2) the mean of the differences. Note that the first can never exceed the second, since |E[X]| <= E|X|:

var customerr = function(d) {
  if(flip(0.5))
    return 0.01*gaussian(0,d)
  else
    return 0.01*uniform(-d,d)

}

var func = function (x) {
  return 2.5*(x + x*x)  // x*x, not x^2: in JavaScript ^ is bitwise XOR
}

var inputD = function() {
  return 10.0*beta(1,1)
}

///fold:

var main_o = function () {

  var input = inputD()
  
  var res_o = func(input)
  
  return res_o
  
}
var dist_o = Infer(
  {method: 'MCMC', samples: 100000},
  main_o);

var main_a = function () {

  var input = inputD()
  
  var res_o = func(input)
  
  var res_a = res_o + customerr(input)

  return res_a
  
}
var dist_a = Infer(
  {method: 'MCMC', samples: 100000},
  main_a);
///


var main_d = function () {

  var input = inputD()
  
  var res_o = func(input)
  
  var res_a = res_o + customerr(input)

  return res_o - res_a
  
}
var dist_d = Infer(
  {method: 'MCMC', samples: 100000},
  main_d);

viz.auto(dist_d)


print ("\n")

print ("|E(main_o) - E(main_a)|: " + Math.abs(expectation(dist_o) - expectation(dist_a)) + "\n" +
       "E|main_o - main_a|: " + expectation(dist_d, function (x) { return Math.abs(x) } ) + "\n")

Inferring Mean and Variance (BCM Book)

Warmup: factor vs condition


var dist = Infer({method: 'MCMC', samples: 10000},
                  function () { return gaussian(0,1) } );
viz.auto(dist)


var xs = [0, 0.1, -0.1, 0.2, -0.2, 0.02];

var main = function () {
    var mu = gaussian(0, 3)

    var XDist = Gaussian({mu: mu, sigma: 1})
    observe (XDist, xs[0]) // factor(XDist.score(xs[0])); 
    observe (XDist, xs[1])
    observe (XDist, xs[2])
    observe (XDist, xs[3])
    observe (XDist, xs[4])
    observe (XDist, xs[5])
  
    return mu
}

var dist = Infer({method: 'MCMC', samples: 10000},
                  main );
viz.auto(dist);
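
For reference, observe(D, x) has the same effect as factor(D.score(x)), and condition(pred) is the special case factor(pred ? 0 : -Infinity). A minimal sketch of that last equivalence:

var d1 = Infer({method: 'enumerate'}, function () {
  var x = flip(0.5)
  condition(x)                 // hard constraint
  return x
});

var d2 = Infer({method: 'enumerate'}, function () {
  var x = flip(0.5)
  factor(x ? 0 : -Infinity)    // the same constraint expressed as a factor
  return x
});

viz.table(d1);
viz.table(d2);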

The score method computes the log-probability (for continuous distributions, the log-density) of a value under a distribution; exponentiating it recovers the probability or density:


print("Score of val = 0 for N(0,1): " + Math.exp(Gaussian({mu: 0, sigma: 1}).score(0)) )
print("Score of val = 1 for N(0,1): " + Math.exp(Gaussian({mu: 0, sigma: 1}).score(1)) )
print("Score of val = 2 for N(0,1): " + Math.exp(Gaussian({mu: 0, sigma: 1}).score(2)) )
print("Score of val = 3 for N(0,1): " + Math.exp(Gaussian({mu: 0, sigma: 1}).score(3)) )

Goal: Infer the mean and the variance of the Gaussian process that likely generated the data:


var dataset = [10, 7, 9, 10, 15, 3];

var main = function () {

  var mu = gaussian(7, 1)
  var sigma = uniform(0,3)

  var Xdist = Gaussian({mu: mu, sigma: sigma})
  map(
    function(val) { observe(Xdist, val)
                    // factor(Xdist.score(val));
                  },
    dataset
  );

  return [mu,sigma]
}

var dist = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.auto(dist);
viz.marginals(dist);



Some more statistics on the mean. Note that the moment helpers below sum over the distribution's support and assume scalar values, so the folded model now returns only mu:


///fold:
var dataset = [10, 7, 9, 10, 15, 3];

var main = function () {

  var mu = gaussian(7, 1)
  var sigma = uniform(0,3)

  var Xdist = Gaussian({mu: mu, sigma: sigma})
  map(
    function(val) { observe(Xdist, val)
                    // factor(Xdist.score(val));
                  },
    dataset
  );

  return mu // return a scalar so the moment helpers below apply
}

var d = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.auto(d);
///

var getMean = function(dist_){
  var support = dist_.support()
  var e = map(function(x){return x * Math.exp(dist_.score(x))},support)
  return sum(e)
}

var getSecondMoment = function(dist_){
  var support = dist_.support()
  var e = map(function(x){return (x * x) * Math.exp(dist_.score(x))},support)
  return sum(e)
}


var getVar = function(dist_){
    var secondMoment = getSecondMoment(dist_)
    var mean = getMean(dist_)
    var mean_square = mean * mean
    var Variance = (secondMoment - mean_square)
    return Variance
}


var getNthMoment = function(dist_,n){
  var support = dist_.support()
  var e = map(function(x){return Math.pow(x,n) * Math.exp(dist_.score(x))},support)
  return sum(e)
}

display("Mean:")
display(getMean(d))
display("Variance:")
display(getVar(d))
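
The built-in expectation helper computes the same sums (a sketch):

display("Mean via expectation:")
display(expectation(d))
display("Variance via expectation:")
display(expectation(d, function (x) { return x * x }) - Math.pow(expectation(d), 2))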

Students Learning (BCM Book)

Goal: An exam consists of 40 true/false questions, and 15 students take it in total. Given the students' scores, infer whether each student studied for the exam or just guessed randomly.

//some data:

var allscores = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 36, 35 ];

var main = function () {

  var preparationprob = uniform(0.5, 1)
  
  var guessprob = 0.5
  
  var total = 40
  
  var studied = map(
      function(testscore) { 
        
        var z = bernoulli(0.5);
        var theta = z? preparationprob : guessprob;
        
        var k = Binomial({p: theta, n: total})


        // equivalent but slower: condition(binomial(theta, total) == testscore)
        factor(k.score(testscore))
        
        return z
      },
    allscores);


  return studied
}

var dist = Infer(
  {method: 'MCMC', samples: 1000},
  main);

viz.marginals(dist);

Return to Simple Rate Inference (BCM Book)

Goal: Infer the underlying success rate for a binary process. We want to determine the posterior distribution of the rate theta.

Show how to distinguish between:

  • Prior distribution over parameters -- describes our initial belief.
  • Prior predictive distribution -- describes what data to expect from the model given the prior beliefs.
  • Posterior distribution over parameters -- describes our belief about the parameter distribution updated after observing data.
  • Posterior predictive distribution -- describes what data to expect from the model given the updated beliefs.

var main = function () {
  var n = 10

  // prior:
  var theta = beta(1,1)

  // observed data; conditioning on it turns the prior over theta into the posterior:
  var k = binomial(theta, n)

  condition (k == 5)

  // prior predictive:
  var thetaprior = beta(1,1)
  var priorpredictive = binomial(thetaprior, n)

  // posterior predictive:
  var postpredictive = binomial(theta, n)

  return [theta, priorpredictive, postpredictive]
}

var dist = Infer(
  {method: 'MCMC', samples: 10000, burn: 1000, lag: 2},
  main);
viz.marginals(dist);

Seven Scientists (BCM Book)

Goal: Seven scientists are measuring the same (inherently noisy) phenomenon. Some scientists are better at measurement than others. Based on the reported values, infer which scientists are more likely to have identified the true value.

//some data:

var measurements = [-27, 3.50, 8.19, 9., 9.6, 9.95, 10.1 ];

var main = function () {

  var mu = gaussian(9, 1)
  
  var sigmas = map(
      function(datapt) { 
        
        var sigma = uniform(0,10);
        
        var x = Gaussian({mu: mu, sigma: sigma}); 
        factor(x.score(datapt)); 
      
        return sigma
      },
    measurements);


  return sigmas
}

var dist = Infer(
  {method: 'MCMC', samples: 10000},
  main);

viz.marginals(dist);

Linear Regression (WebPPL Examples)

//some data:
var xs = [0, 1, 2, 3];
var ys = [0, 1, 4, 6];

var model = function() {
  var m = gaussian(0, 2);
  var b = gaussian(0, 2);
  var sigma = gamma(1, 1);

  var f = function(x) {
    return m * x + b;
  };

  map2(
    function(x, y) {
      factor(Gaussian({mu: f(x), sigma: sigma}).score(y));
    },
    xs,
    ys);

  return f(4);
}

viz.auto(Infer({method: 'MCMC', samples: 10000}, model));
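
A variant (a sketch, reusing xs and ys from above): return the parameters themselves and inspect their marginal posteriors instead of the prediction at x = 4.

var paramModel = function() {
  var m = gaussian(0, 2);
  var b = gaussian(0, 2);
  var sigma = gamma(1, 1);

  map2(
    function(x, y) {
      observe(Gaussian({mu: m * x + b, sigma: sigma}), y);
    },
    xs,
    ys);

  return {m: m, b: b};
}

viz.marginals(Infer({method: 'MCMC', samples: 10000}, paramModel));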

HMM (PPAML 2016)


var trueObservations = [false, false, false];

var transition = function(s) {
  return s ? flip(0.7) : flip(0.3)
}
var observeState = function(s) { // renamed so it does not shadow WebPPL's built-in observe
  return s ? flip(0.9) : flip(0.1)
}
var hmm = function(n) {
  var prev = (n == 1) ? {states: [true], observations: []} : hmm(n - 1);
  var newState = transition(prev.states[prev.states.length - 1]);
  var newObs = observeState(newState);
  return {
    states: prev.states.concat([newState]),
    observations: prev.observations.concat([newObs])
  };
}
var stateDist = Infer({method: 'enumerate'},
  function() {
    var r = hmm(3);
    map2(function(o,d){condition(o===d)}, r.observations, trueObservations)
    return r.states;
  }
)

viz.table(stateDist)
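
Note that 'enumerate' is exact and feasible here because every random choice is a discrete flip, so there are only finitely many execution paths to sum over.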