Bayesian Inference in Python

CosmoMC Bayesian Inference Package - sampling posterior probability distributions of cosmological parameters.

Observational astronomers don’t simply present images or spectra, we analyze the data and use it to support or contradict physical models. A key aspect of data analysis is understanding the certainty of claims that are made. Thus using statistics is a fundamental part of observational astronomy.

Statistical inference is one method of drawing conclusions, and establishing their certainty, given a set of observational data that is subject to random variation. There are two broad categories of statistics in wide-spread use in astronomy: Frequentist and Bayesian statistics. The frequentist sees probability as the long-run expected frequency of occurrence while the Bayesian view of probability is related to degree of belief; it is a measure of the plausibility of an event given incomplete knowledge. I won’t go into the pros and cons of each approach as there is plenty of fodder to be found on the internet. For beginners, check out this Astrobites post introducing Bayesian methods, or if want a to learn a little of both, try Practical Statistics for Astronomers by Wall & Jenkins. I will say that for complex data sets and complex models, bayesian inference is gaining traction in astrophysics (Loredo 1992Trotta 2008).

Bayesian inference is quite simple in concept, but can seem formidable to put into practice the first time you try it (especially if the first time is a new and complicated problem). Here are two interesting packages for performing bayesian inference in python that eased my transition into bayesian inference:

  • pymc – Uses markov chain monte carlo techniques. There is also a useful set of examples using and extending pymc on the Healthy Algorithms blog. A useful introduction was presented at the SciPy 2011 conference and was recorded here.
  • multinest – Uses nested sampling techniques, which are superior for parameter spaces with strong and non-linear correlations as is usually the case for astronomical applications. The package is actually written in Fortran and has a C-wrapper that is callable from python. Here are two possible python wrappers used by lisasolve and a general purpose pymultinest. I have successfully used the second one, but if you are interested in running in parallel (with MPI), the first might be a better alternative. The hardest part about this package was getting it installed (fortran and C compilers have to play nice together).
In my own personal experience, I used pymc and, once I got passed the learning curve of understanding how they handle parameter and model descriptions, I found it to be elegant and easily extensible to other MCMC sampling techniques. However, I eventually moved to multinest due to the strongly degenerate and curved nature of the parameter space I needed to sample. Multinest was far superior in handling this situation (ran 5-10 times faster and provided a more accurate result on simulated data sets), even over hit-and-run MCMC algorithms supposedly designed for such problems.
Here are a few other packages I found with a simple google search, but I don’t have experience with these:
What packages do you use or do you role your own? Any other thoughts on the pros and cons of the various packages?
3 comments… add one

Leave a Reply

Your email address will not be published. Required fields are marked *