How to easily create, read, write and manipulate tables and databases in Python with ATpy and asciitable

With the release of ATpy 0.9.4 and asciitable 0.2.4, it’s about time we told our readers how incredibly easy it is to create, read, write, and manipulate tables and databases of data in Python!
 
Most, if not all of us have to manage tabular data at one point or another, and it is safe to say that the number of different formats for tables is well beyond ridiculous. FITS, VO, HDF5, IRSA/IPAC, MRT/CDS, DAOphot, RDB, and CSV tables are some of the most common (and I’ll spare you the acronym definitions). In addition, most of us shy away from using databases, because the learning curve is steep, and often requires learning new tools. This is where asciitable and ATpy come in.

The goal of asciitable, developed by Tom Aldcroft, is to provide an extensible framework for reading in tables of ASCII data in arbitrary formats. If the table format you are using is not one of the presets, you can easily define your own!
 
But that is only half of the story. Manipulating tables (adding/removing/renaming columns, sorting, etc.) is still not easy, and asciitable does not help with non-ASCII formats. Enter ATpy, a tool that Eli Bressert and I have developed, which brings everything together in an easy-to-use package. ATpy uses asciitable to read ASCII tables, but also uses other packages to directly read and write from FITS, VO, and HDF5 tables, and can also talk to SQL databases, including MySQL, PostGreSQL, and SQLite. Reading tables is a one liner for each format:
 
>>> import atpy
>>> t = atpy.Table('mytable.fits') # FITS table
>>> t = atpy.Table('mytable.hdf5') # HDF5 table
>>> t = atpy.Table('mytable.xml') # VO table
>>> t = atpy.Table('mytable.tbl') # IRSA/IPAC table
>>> t = atpy.Table('sqlite', 'mytable.db') # SQLite database
>>> t = atpy.Table('mysql', db='python', user='bob', \
                   passwd='1(@N(!10))') # MySQL database
>>> t = atpy.Table('postgres', database='python', \
                   user='alice', password='1(@N(!10))') # PostGreSQL database
and so on. Writing tables is equally easy:
 
>>> t.write('mynewtable.fits') # FITS
>>> t.write('mynewtable.hdf5') # HDF5
>>> t.write('mynewtable.xml') # VO
    etc.
The file type is automatically recognized from the extension, but you can always override the default or define your own extensions. You can also create tables from scratch:
 
>>> import atpy
>>> import numpy as np
>>> ra = np.random.random(5)
>>> dec = np.random.random(5)
>>> flux = np.random.random(5)
>>> t = atpy.Table()
>>> t.add_column('ra', ra, unit='deg')
>>> t.add_column('dec', dec, unit='deg')
>>> t.add_column('flux', flux, unit='mJy')
Either way, accessing and modifying the data is simple and straightforward:
 
>>> t['ra']
array([ 0.2331109 ,  0.23474117,  0.62186612,  0.65640893,  0.71596516])
>>> t['ra'][1] = 0.1
>>> t['ra']
array([ 0.2331109 ,  0.1       ,  0.62186612,  0.65640893,  0.71596516])
as is manipulating tables:
 
>>> t.sort('ra')
>>> t.rename_column('ra','ra_j2000')
>>> t.remove_column('flux')
Hopefully, if you are not already using ATpy, we’ve convinced you to at least give it a try! Installation instructions and a user-friendly manual are located here. We’ve recently created a new user discussion group at Google groups, which we encourage you to make use of! To read ASCII tables, you will also need to install asciitable. For more general information about using Python in Astronomy, we encourage you to visit the astropython.org web site!
11 comments… add one
  • Ben Maughan Sep 6, 2010 @ 8:22

    No discussion of table manipulation is complete without mentioning TOPCAT
    http://www.star.bris.ac.uk/~mbt/topcat/
    This is an extremely flexible and powerful tool for visualising, filtering, matching, manipulating and plotting table data. It is mostly GUI based so is complementary to scripting methods like the one discussed in this article, but certainly worth a mention. How about an astrobetter post on TOPCAT?

  • John Sep 6, 2010 @ 10:51

    Ben makes a good point. And, in fact, STILTS basically provides all the TOPCAT functionality wrapped up into a series of modules callable either from the command line or from Jython.

    Personally, I don’t think Jython is (yet?) a realistic CPython alternative for most applications, but it’s definitely worth considering.

  • Prasanth Sep 6, 2010 @ 22:15

    According to the SAMPy library website, http://cosmos.iasf-milano.inaf.it/pandora/sampy.html, this library can be used to access TOPCAT services using the SAMP protocol. Perhaps there is a way to incorporate it into ATpy?

  • Tom Sep 6, 2010 @ 23:21

    Getting ATpy and TOPCAT to talk is a great idea! One thing that I did not mention is that ATpy uses a modular extension system to add read/write capabilities, so a TOPCAT reader/writer using SAMPy would be easy to develop. If anyone is interested in helping out with this, please drop us an email at astropython [at] gmail.com!

  • Andy Sep 9, 2010 @ 3:47

    Hey Tom,

    I wrote something to interact with ATpy and TOPCAT a couple of weeks ago using SAMPy. It performs all the calculations I need on my spectra, initiates a SAMP hub then if TOPCAT is running it broadcasts the completed data table. It’s completely switched me to using VO table formats instead of ASCII data, and I’ll never turn back.

    After the table is broadcasted, I can just select ‘Broadcast row’ in TOPCAT then the python script waits to see what data is highlighted. When it gets a highlighted row, all the relevant information is shown in a figure. This includes the spectra, and all of the IRAF routine images (from fitprofs, fxcor, etc). I HIGHLY recommend using something like this to everyone, as I can quickly see why any outliers in my data exist, and interactively see EXACTLY what fittings and correlations were made to the spectra.

    Andy

  • Tom Sep 9, 2010 @ 8:37

    @Andy: this sounds great! Would you be happy to share the part of your code that deals with SAMP with us to get us started on implemented something like this in ATpy by default?

  • Andy Casey Sep 10, 2010 @ 7:32

    @Tom: No problems, I’m happy to help out with whatever I can. Keep your judging hat off though, because I’m only recent to astronomy and python. 🙂

    Below is a link to some sample code that uses ATpy to generate a table with some faux data, and a sampCommunications class I’ve written to (as cleanly as possible) interact with TOPCAT. I’ll be outlining this in more detail, as well as some quirks and ideas I’ve come across in the process, in a blog post on my website in the next few (working) days. Hope this helps out!

    http://pastebin.com/WPahuQEv

    With the code in the above link, the table will load instantaneously in TOPCAT, and if you click the ‘Broadcast row’ tick, any highlighted data points or rows will be sent back to the samp.highlightRow function.

    If you’re going to be tinkering around with SAMPy and TOPCAT/Aladin etc, I highly recommend using JSAMP (also by Mark Bristol who made TOPCAT), which is a SAMP hub that you can use to track all SAMP messages, so you can see which end the problem is if something doesn’t work. If JSAMP isn’t your thing, use the log=’logfile’ parameter in sampy.SAMPHubServer

    Please note as well you can use TOPCAT to broadcast in any way you want. If you click the Activation Actions you can send any info you want!

    Cheers

  • Andy Casey Sep 10, 2010 @ 7:34

    Oops,..

    *Mark Taylor, at Bristol University

    Sorry Mark 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *