Observational astronomers don’t simply present images or spectra, we analyze the data and use it to support or contradict physical models. A key aspect of data analysis is understanding the certainty of claims that are made. Thus using statistics is a fundamental part of observational astronomy.
Statistical inference is one method of drawing conclusions, and establishing their certainty, given a set of observational data that is subject to random variation. There are two broad categories of statistics in wide-spread use in astronomy: Frequentist and Bayesian statistics. The frequentist sees probability as the long-run expected frequency of occurrence while the Bayesian view of probability is related to degree of belief; it is a measure of the plausibility of an event given incomplete knowledge. I won’t go into the pros and cons of each approach as there is plenty of fodder to be found on the internet. For beginners, check out this Astrobites post introducing Bayesian methods, or if want a to learn a little of both, try Practical Statistics for Astronomers by Wall & Jenkins. I will say that for complex data sets and complex models, bayesian inference is gaining traction in astrophysics (Loredo 1992, Trotta 2008).
Bayesian inference is quite simple in concept, but can seem formidable to put into practice the first time you try it (especially if the first time is a new and complicated problem). Here are two interesting packages for performing bayesian inference in python that eased my transition into bayesian inference:
- pymc – Uses markov chain monte carlo techniques. There is also a useful set of examples using and extending pymc on the Healthy Algorithms blog. A useful introduction was presented at the SciPy 2011 conference and was recorded here.
- multinest – Uses nested sampling techniques, which are superior for parameter spaces with strong and non-linear correlations as is usually the case for astronomical applications. The package is actually written in Fortran and has a C-wrapper that is callable from python. Here are two possible python wrappers used by lisasolve and a general purpose pymultinest. I have successfully used the second one, but if you are interested in running in parallel (with MPI), the first might be a better alternative. The hardest part about this package was getting it installed (fortran and C compilers have to play nice together).
- BIP – Bayes Melding Module
- OpenBayes – Bayesian Nets
- Inference Package – developed by astronomers
- emcee – developed by astronomers (see astrobites summary)
Another piece of software that performs Bayesian Estimation via MCMC is WinBUGS and OpenBUGs (BUGS = Bayesian using Gibbs sampler). Both of these programs are easy to use and there are plenty of examples to help you along the way. OpenBUGS is essentially the newer form of WinBUGS (they have stopped development of WinBUGS and are now concentrating on OpenBUGS). I believe the biggest difference between the two (and I may be wrong) is WinBUGS is heavily dependent on the Gibbs sample while OpenBUGS is based on the Metropolis-Hastings algorithm. Fortunately, both are free!
If anyone is interested, I can provide some references to good books on Bayesian Statistics.
WinBUGS: http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
OpenBUGS: http://www.openbugs.info/w/
There is also Martyn Plummer’s JAGS which can run on its own, and has an R interface, with a Python interface in development: http://doingbayesiandataanalysis.blogspot.com/2014/09/doing-bayesian-data-analysis-in-python.html
Bayes++ (http://bayesclasses.sourceforge.net/Bayes++.html) is a strong Bayesian framework. Commonly used in Robot Navigation.