For many of us astronomers, statistics is an integral part of our analysis procedure, yet we typically get very little formal training in this area. Recently, several astro-statistics papers have been posted on astro-ph that specifically address some of the unique attributes of astronomical data (small numbers of data points, frequent systematic errors or outliers, non-linear models, heteroskedastic error bars and limits, etc.). I would like to round up these and other similar astro-stats papers and post them as a collection on the wiki. Do you know of additional astro-stats papers to include in this collection?

- Error Estimation in Astronomy: a Guide, Andrae (arXiv:1009.2755)
  - a critical look at the common practice of rescaling error bars to get a reduced chi-squared of 1
  - a very good description of the different methods for defining confidence intervals

- Dos and Don’ts of Reduced Chi-Squared, Andrae, Schulze-Hartung, & Melchior (arXiv:1012.3754)
- Data Analysis Recipes: Fitting a Model to Data, Hogg, Bovy, & Lang (arXiv:1008.4686)
  - describes the basics of both frequentist and Bayesian approaches

Note: There are some excellent books for purchase on astro-statistics; but here I am collecting papers that are available via astro-ph or in journals that most institutions have access to.

On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy, Cameron (arXiv:1012.0566)

- The correct way to determine confidence intervals for population proportions using the beta distribution. Relevant when you want to measure the fraction of a population falling into a given category.
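For anyone who wants to try this quickly, here is a minimal sketch of a beta-distribution interval. It assumes a uniform prior on the fraction and an equal-tailed interval, so k successes out of n give a Beta(k+1, n-k+1) posterior. That is my reading of the approach, so check the paper before relying on it. The function names are my own, and the code uses only the Python standard library by exploiting the fact that the beta CDF with integer parameters is a binomial sum.

```python
from math import comb


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p, valid for positive integers a and b.

    Uses the identity I_p(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) p^j (1-p)^(a+b-1-j).
    """
    m = a + b - 1
    return sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(a, m + 1))


def beta_quantile_int(q, a, b, tol=1e-12):
    """Invert the CDF by bisection (the CDF is monotonic on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def binomial_fraction_interval(k, n, level=0.683):
    """Equal-tailed interval on the fraction, given k successes in n trials.

    Assumption: uniform prior, so the posterior is Beta(k + 1, n - k + 1).
    """
    a, b = k + 1, n - k + 1
    tail = 0.5 * (1.0 - level)
    return beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)


# Hypothetical example: 3 of 20 galaxies fall into the category of interest.
lower, upper = binomial_fraction_interval(3, 20)
print(f"fraction = {3 / 20:.2f}, 68.3% interval = [{lower:.3f}, {upper:.3f}]")
```

Unlike the usual sqrt(p(1-p)/n) estimate, this interval stays inside [0, 1] and remains sensible when k = 0 or k = n.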

The classic reference for survival analysis, regression, etc. with upper/lower limits is Isobe+ (1986, ApJ, 306, 490), “Statistical methods for astronomical data with upper limits. II – Correlation and regression”. It’s worth making a note of.

Here is a textbook that focuses heavily on censored data (left, right, and interval). Although the applications are primarily estimation of lifetime distributions for items, I do feel that the presentation is general enough to be applied to other situations. There are also simple, graphical methods for assessing distributional fits using Kaplan-Meier estimates.

http://www.public.iastate.edu/~stat533/meeker_escobar.html
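To make the Kaplan-Meier idea concrete, here is a minimal sketch of the product-limit estimator for right-censored data, using only the Python standard library. The function name and the toy numbers are my own and are not taken from the textbook. Astronomical upper limits are left-censored, and the usual trick there is to flip the sign of the data so that they become right-censored before applying the estimator.

```python
def kaplan_meier(values, detected):
    """Product-limit estimate of the survival function S(t) = P(T > t).

    values   : measured values, or censoring limits where detected is False
    detected : True for a real measurement, False for a right-censored point
    Returns a list of (t, S(t)) pairs, one per distinct detected value.
    """
    data = sorted(zip(values, detected))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group ties: censored points at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy example: four points, the second one is censored.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
# → [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Plotting the resulting step function against a candidate distribution is the kind of simple graphical check mentioned above.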

http://adsabs.harvard.edu/abs/2007ApJ...665.1489K , B. Kelly, “Some Aspects of Measurement Error in Linear Regression of Astronomical Data”

How to fit a straight line to data when there are errors on both axes and there is intrinsic scatter in the relationship – which is practically always.
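To show what "errors on both axes plus intrinsic scatter" means in a likelihood, here is a simplified sketch. To be clear, this is not Kelly's method, which is a hierarchical Bayesian model that also describes the distribution of the true x values. It is the simpler "effective variance" Gaussian likelihood, in which each point's variance is sigma_y^2 + slope^2 * sigma_x^2 + scatter^2. It assumes independent Gaussian errors and no selection effects, and all names and data values are made up for illustration.

```python
from math import log, pi


def neg_log_likelihood(slope, intercept, scatter, x, y, sigma_x, sigma_y):
    """Negative log-likelihood of a straight line y = intercept + slope * x.

    Each point's variance combines the y error, the x error projected
    through the slope, and an intrinsic scatter term (in y).
    """
    total = 0.0
    for xi, yi, sxi, syi in zip(x, y, sigma_x, sigma_y):
        variance = syi**2 + (slope * sxi) ** 2 + scatter**2
        residual = yi - (intercept + slope * xi)
        total += 0.5 * (log(2.0 * pi * variance) + residual**2 / variance)
    return total


# Made-up data lying near y = 1 + 2x, with errors on both axes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.8, 7.2, 8.9, 11.1]
sigma_x = [0.1] * 5
sigma_y = [0.2] * 5

# Compare two candidate lines; the lower value is the better fit.
print(neg_log_likelihood(2.0, 1.0, 0.1, x, y, sigma_x, sigma_y))
print(neg_log_likelihood(3.0, 1.0, 0.1, x, y, sigma_x, sigma_y))
```

In practice you would minimize this over (slope, intercept, scatter) or sample it with MCMC. With large x errors or a selection limit this simple form can be biased, which is the situation the paper is designed to handle.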

Don’t let me catch you using bisector fitting on anything important, especially if there is a selection limit on one of the coordinates.

By the way, a lot of people cite and use Isobe et al. 1990, http://adsabs.harvard.edu/abs/1990ApJ...364..104I , “Linear regression in astronomy.” But it’s my opinion that the recommendations of that paper are seriously flawed in practice. The flaw will only be significant sometimes (when you have data with a lot of scatter and a selection limit), but that’s exactly when you should care about the fitting method. If you have enough data to make a rock-solid measurement, the details of choosing a fitting method are less important.

Would also like to add

http://arxiv.org/abs/1012.3589

hi~ 🙂 how about this one? http://arxiv.org/abs/physics/0511182