The PyMC library offers a solid foundation for probabilistic programming and Bayesian inference in Python, but it has some drawbacks. Although the API is robust, it has changed frequently along with the shifting momentum of the entire PyMC project (formerly "PyMC3"). This is most evident in the abandoned "PyMC4" project and the stranding of Theano. The PyMC developers have picked up where the developers of Theano left off, introducing "Aesara" as a replacement; but, as it stands, little has been written about how to migrate from Theano to Aesara, especially for PyMC users who don't have prior experience with computational graphs or tensor libraries.

PyMC is actually quite expressive and there are rich examples of how to use it for more than just Bayesian inference. These community-contributed examples are not always updated, however, to reflect the changes to PyMC over the years. In particular, there are few examples of how to use PyMC for approximate Bayesian computation (ABC), and those I've seen [1,2] are functionally deprecated given the current state of the library.

ABC, also referred to as likelihood-free inference [3] involves Bayesian simulation—primarily through Markov Chain Monte Carlo (MCMC)—for models which have intractable or analytically unavailable likelihood functions. This includes models that have no formal likelihood, such as physical models or other simulations that have no analytical representation. In such cases, we require an approach to modeling with a "black-box" likelihood function, for instance, the mean squared error between a model's predictions and observed values. This approach is popular for fitting a variety of models to observed data and is the focus of another Python library, SPOTPY [4]. However, in my experience, SPOTPY has some outstanding issues related to the proposal distribution, in which bounded priors are sampled in a way that fails to preserve balance in the Markov chain [5]. Moreover, it doesn't have the advantages that PyMC offers, with its rich diagnostics and performance enhancements. I spent some time figuring out how to do ABC in PyMC, so I think it's worth reporting here in case anyone else is struggling to navigate the vast and shifting sands of PyMC's API.

Lotka-Volterra Example

Here, I use the example of the Lotka-Volterra ordinary differential equations (ODEs), as presented in one of the deprecated PyMC examples [2]. These well-known ODEs are used to model the dynamics of a pair of interacting predator and prey populations. \(N_t\) and \(M_t\) describe the size of prey and predator populations, respectively, at time \(t\). Change in the populations is given:

$$ \frac{d N_t}{d t} = \alpha N_t - \beta N_t M_t $$
$$ \frac{d M_t}{d t} = -\gamma M_t + \delta N_t M_t $$

Where \(\alpha\) is the prey growth rate (in the absence of predators), \(\beta\) is the prey mortality rate, \(\gamma\) is the predator mortality rate (in the absence of prey), and \(\delta\) is the conversion efficiency by which prey are consumed and predators increased. A Python implementation of this model, postponing a solution, could be written as follows where, for simplicity, we set \(\gamma = -1.5\) and \(\delta = 0.75\):

import numpy as np

# True parameter values (alpha, beta)
alpha = 1.0
beta = 0.1

# Initial population of rabbits and foxes
X0 = [10.0, 5.0]
size = 100 # Size of data
time = 15 # Time lapse
t = np.linspace(0, time, size)

# Lotka-Volterra equations
def dX_dt(X, t, a = alpha, b = beta):
    'Return the growth rate of fox and rabbit populations.'
    return np.array([
        a * X[0] - b * X[0] * X[1],
        -1.5 * X[1] + 0.75 * b * X[0] * X[1]
    ])

The solution to this system of ODEs might be written:

from scipy.integrate import odeint

def competition_model(params, x = None):
    a, b, _ = params
    'Simulator function (an ordinary differential equation to be integrated)'
    return odeint(dX_dt, y0 = X0, t = t, rtol = 0.01, args = (a, b))

We generate some synthetic, "observed" data as:

# Observed data (with added random noise)
data = competition_model(
    params = (a, b, None)) + np.random.normal(size = (size, 2))

The Black-Box Likelihood Operation

PyMC is built on top of Theano, a computational graph library that allows for symbolic computing and automatic differentiation. When trying to do ABC, we have to develop a way for Theano to sample the posterior likelihood even though we have no formal likelihood function that can be represented analytically. With ABC, there is no likelihood function that Theano can differentiate with respect to model parameters; no gradient to be calculated.

We can, however, create a custom Theano operator, tt.Op, that returns a quasi-likelihood value, allowing for a sampler (e.g., Metropolis-Hasting) to sample from the posterior likelihood even though a gradient is unavailable. Here, the quasi-likelihood function is the root-mean squared error (RMSE), though we could use any other goodness-of-fit metric. The perform() method represents the operation that is performed on Theano's computational graph: given a vector of proposed parameter values, calculate the posterior likelihood. For calculating the RMSE, it is necessary that this operation have the observed values available in-memory, so we set them as an instance attribute, self.observed. Because our model may require some additional driver data, we also set these data as an instance attribute, self.x.

import theano.tensor as tt

class BlackBoxLikelihood(tt.Op):
    itypes = [tt.dvector] # Expects a vector of parameter values when called
    otypes = [tt.dscalar] # Outputs a single scalar value (the log likelihood)

    def __init__(self, model, observed, x):
        '''
        Parameters
        ----------
        model : Callable
            An arbitrary "black box" function that takes two arguments: the
            model parameters ("params") and the forcing data ("x")
        observed : numpy.ndarray
            The "observed" data that our log-likelihood function takes in
        x:
            The forcing data (input drivers) that our model requires
        '''
        self.model = model
        self.observed = observed
        self.x = x

    def loglik(self, params, x, observed):
        # The root-mean squared error (RMSE)
        predicted = self.model(params, x)
        return -np.sqrt(np.nanmean((predicted - observed) ** 2))

    def perform(self, node, inputs, outputs):
        # The method that is used when calling the Op
        (params,) = inputs
        logl = self.loglik(params, self.x, self.observed)
        outputs[0][0] = np.array(logl) # Output the log-likelihood

Note that we return the negative RMSE, as PyMC samplers are used to working with log-likelihoods and we wish to maximize the likelihood function (i.e., obtain the smallest negative RMSE).

For some applications, we may want to use the Gaussian log-likelihood or a similar exponential likelihood function.

$$ \mathrm{log}\,\mathcal{L}(\hat{\theta}, \hat{\sigma}) = -\frac{N}{2}\,\mathrm{log}(2\pi\hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2} \sum (\hat{y}(\hat{\theta}) - y)^2 $$

In this case, we need to include as a parameter in our model the error variance, sigma; it's most convenient to make this last the parameter in the parameters vector:

def loglik(self, params, x, observed):
    predicted = self.model(params, x)
    sigma = params[-1]
    return -0.5 * np.log(2 * np.pi * sigma**2) - (0.5 / sigma**2) *\
        np.nansum((predicted - observed)**2)

Parameter Estimation with PyMC

Finally, we're ready to estimate our model's parameters using PyMC. If you're unfamiliar with PyMC and how models are defined, check out the tutorial, "Getting started with PyMC3" before trying to figure out what's going on below. If you are already familiar with PyMC models, hopefully there are no surprises here.

In this example, we use a Gaussian log-likelihood function, so we add the error variance, sigma, as a model parameter.

import arviz as az
import pymc3 as pm
from matplotlib import pyplot

loglik = BlackBoxLikelihood(competition_model, data, None)

with pm.Model() as my_model:
    # (Stochastic) Priors for unknown model parameters
    a = pm.HalfNormal('a', sd = 1)
    b = pm.HalfNormal('b', sd = 1)
    sigma = pm.HalfNormal('sigma', sd = 1)

    # Convert model parameters to a tensor vector
    params = tt.as_tensor_variable([a, b, sigma])

    # Define the likelihood as an arbitrary potential
    pm.Potential('likelihood', loglik(params))

    trace = pm.sample(
        draws = NUM_DRAWS, step = pm.Metropolis(),
        start = {'a': 0.5, 'b': 0.5, 'sigma': 10})
    az.plot_trace(trace)
    pyplot.show()

Because, in this toy example, we know the true parameter values, a look at the trace plot suggests that they are correctly identified. However, there is a lot of uncertainty in the \(\beta\), or b, parameter.

Let's try it again but using the RMSE quasi-likelihood this time. Again, we'll take 50,000 samples of the posterior in each of four chains. At first glance, our posterior distributions seem narrower with the RMSE quasi-likelihood; however, if we look closely at the horizontal axes, we see that the RMSE quasi-likelihood is permitting a few larger jumps in the proposal distribution than we saw with the Gaussian likelihood.

Despite these differences, the choice of whether to use the RMSE or a formal likelihood function should be based on substantive modeling concerns. In this case, is the Gaussian likelihood appropriate, given that our observed data might not be independent and identically distributed?

Other Considerations

When fitting a model to data with MCMC, we gain the ability to estimate the uncertainty in our model parameters, which can be visually assessed from plots likes the ones above. But how do we pick the best-fit parameters? In this toy example, the maximum posterior estimate obviously corresponds to the true parameter values, but in models with less well-identified parameters (or more process or instrument noise), it may not be obvious what to choose. We may want the parameters associated with the maximum model log-likelihood. You may have expected that PyMC would record the model's log-likelihood at each sampling step—I certainly did. But that's not the case, as can be seen from the InferenceData that were returned:

>>> trace
Inference data with groups:
    > posterior
    > sample_stats

The reason for this is complicated and related to the active development of PyMC. For now, you can store the model log-likelihood by explicitly defining it during the model's configuration:

with pm.Model() as my_model:
    ...
    # Store the model log-likelihood as well
    loglik = pm.Deterministic('log_likelihood', model.logpt)

References

  1. "Using a 'black box' likelihood function" (Accessed: November 13, 2021.)
  2. "Sequential Monte Carlo - Approximate Bayesian Computation" (Accessed: November 13, 2021.)
  3. Sisson, S.A. & Y. Fan. 2011. "Likelihood-Free MCMC." Chapter 12, Handbook of Markov Chain Monte Carlo. CRC Press, LLC.
  4. Houska, T., Kraft, P., Chamorro-Chavez, A. & Breuer, L. 2015. "SPOTting model parameters using a ready-made Python package." PLoS ONE, 10(12), e0145180.
  5. Vrugt, J. A. 2016. Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation. Environmental Modelling & Software, 75, 273–316.