Some drawbacks to Python: Python Provocation | The e-Astronomer

Python occupies a strange territory between the easy peasy world of “download this app and start clicking” and the stern world of “if you don’t know what a makefile is, you’d better look somewhere else mate”. At first I thought this was precisely its strength: grown up stuff for busy people. But now I ain’t so sure. Neither use nor ornament, as EG used to say.

I’m neither here nor there about Python yet…all I know is that the tide is slowly but surely changing in its direction. I can also report that my method of having my students learn Python rather than IDL is working well. Without much effort, I’m learning how to use it (they bring me printouts of their code), I don’t/can’t do debugging for them and so they work it out as a team, and my IDL library is gradually being converted. And I feel better about them having Python on their CVs rather than IDL.

Other than the obvious problem with the packages strewn across everywhere and there not being a centralized astro library yet, what do you think the major drawbacks of Python are?

11 comments
  • Tom Oct 2, 2010 @ 14:52

    I think that the number one difficulty with using Python, especially for beginners, is getting to the stage where one has a fully functional Python installation with the necessary packages installed. There are many different ways to get there, some very easy, and some very hard, and beginners will often just try the first set of instructions they find (which could range from one-click or one-command installers to building Python from source). In the end, I think astronomers will sometimes get frustrated with the installation and give up, and forget that this should really be the job of system administrators.

    With a fully functional distribution of Python, I think that the challenge is then the overabundance of tutorials for beginners. There are many introductions to Python, some excellent, and some confusing, but finding them or deciding which to use is not necessarily straightforward.

  • Christopher Hanley Oct 2, 2010 @ 17:11

    I think that there is a strong push to make astrolib a central library / repository for Python code. People from STScI and Gemini are already contributing code. A number of university faculty members are also starting to use it. You can find its Wiki/Trac page here:

    https://trac6.assembla.com/astrolib

  • PaulHancock Oct 2, 2010 @ 19:20

    I have been using python for a good number of years now and have recently seen a huge explosion in the number of astro related modules that I am able to use in python. Whether this is because they have only recently been made, or because I only recently became aware of them I can’t say. I am now at a point where any data reduction or analysis that needs to be done is prefaced with the question: is there a python module/interface for this? The answer is almost always yes. In fact I now use the python interpreter instead of a calculator, and it is the first tool that I pull out when looking at a new data file.

    One of the problems that I still have, though, is that there are many different ways of obtaining/installing the different modules. The simplest ones require only that you copy a file/folder to an accessible location and you are good to go. The more tricky ones require a setup/install stage which can sometimes break or interfere with other modules. The worst are the programs that require all sorts of os/sys specific files and non python libraries that take a lot of coaxing to work, and easily break. I tend to find that the more useful or complex modules are harder to install but are nearly always worth the effort – Matplotlib is a good example of this.

  • Michael Aye Oct 3, 2010 @ 7:01

    @Tom:
    Installation: At least for IDL the nightmare of installation is just the same, at least on a 64-bit Linux. I had to go through their feedback website to find the tricks that were missing for a 64-bit install.
    Python’s ‘easy_install’ at least enables users who only have one Python interpreter at a time to very quickly install most of the relevant new modules.
    If you need several Python interpreters (the 64-bit mess on the Mac to name one reason), it is sad that even something as clever as ‘virtualenv’ does not work right out of the box, at least not for me. On Windows, it seems, virtualenv is able to provide you the safety of not compromising your existing Python setup when installing new modules (which, to be fair, happens quite rarely if you only need the standard science modules).
    If any beginner reads this: Try http://www.enthought.com; they provide the same full scientific python environment for Mac, Win and *nixes. I have no affiliation, just good experience with that package.
    And the best: If you are a member of a university, you can get the full Enthought library for free. (‘Academic license’)
    Documentation/Tutorials: And for which scientific working environment is it NOT the case that there are many tutorials and you don’t know which one is good or bad? I find it strange to say that this is a problem of Python.
    For beginners of Python: Just read the tutorial of the Python inventor himself on python.org. It’s one of the best and gets you running in 1-2 hours! I myself started to code GUIs in Python, 12 hours after I first started with it, using all the info I found on the web! It really is that simple.
    @Paul: At least for Mac, there is a good installer now for matplotlib. But if one used the Enthought distribution, it’s already included anyway.

  • Prasanth Oct 5, 2010 @ 0:08

    I agree with Tom, in comment #1 above, that a large part of the installation of python libraries is a job for the system administrator. This part of setting up a working installation of python for astronomy, i.e., the work to be done by a system administrator, is similar to setting up a C or Fortran development environment for scientific programming, where one has to install a large set of libraries such as ATLAS, GSL, pgplot etc., and system dependent libraries such as readline, JPEG libraries, GCC and others, in addition to the software needed by the particular research that a user is interested in. This is not surprising, since like C, python is a general purpose programming language.

    Instead of providing a central distribution of python libraries, which I understand is extremely hard because there are too many dependencies that are difficult to solve in a generic manner, we could provide a community resource, at astrobetter/astropython, aimed at system administrators and sufficiently motivated individuals who have administrator privileges on a computer, where instructions for setting up different operating systems are provided. Administrators can then adapt these for their specific needs. These instructions can come from those who have already set up python for their work. An example is the py4science article by Fernando Pérez.

    Regarding conflicting python libraries, I have found virtualenv and virtualenvwrapper, to be of great use. They allow a user to test different versions of python modules, without interfering with existing installations. One can even use these two tools to setup an alternative python environment that satisfies the requirements for a particular module, if access to that module is vital.

    Proliferation of tutorials is an issue that can be mitigated to a certain extent by an article or two, followed by a collection of comments and enhancements, at astrobetter, astropython and webpages of individual research groups, that describe a set of tutorials/books that people have found to be good. Professors and advisers can then point to these, as starting points for beginners in their research groups.

    An enhancement to these, or perhaps even an alternative to wikis and forums, would be a StackExchange like platform that provides a question-and-answer style resource for tackling the issues raised in the article above. Perhaps the administrators of astrobetter and astropython can set up a knowledge exchange site in the same style as the various StackExchange sites. Solace is an open source platform, written in python, that can be used to create such a resource, without waiting for StackExchange to approve an astronomy Q&A site through their “Area 51” proposals. Cordino is another such platform. I am not an expert and I do not know the overhead/expenses involved, but if it is similar to that for maintaining astrobetter/astropython/AstroPy mailing list, then it could be a viable option. An example of such a system is http://ask.scipy.org.

    In short, resources such as astrobetter, astropython and AstroPy mailing list, are the answer to these issues. With active participation from the community a collective for solving these issues can be easily formed.

    Since I should practice what I preach, I have written a small article available at http://oneau.wordpress.com/2010/10/02/python-for-astronomy/. Hopefully, this will complement the article, Py4Science, by Fernando Pérez mentioned above.
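
Editorial aside on the virtualenv point in the comment above: the idea of an isolated environment can be sketched with `venv`, the standard-library module that later Python versions ship for the same purpose. This is an assumption-laden illustration, not what the commenter ran in 2010; `venv` is swapped in for virtualenv here, and the function name `create_isolated_env` and the directory name `astro-env` are hypothetical.

```python
import os
import sys
import venv
from pathlib import Path


def create_isolated_env(env_dir):
    """Create a bare virtual environment and return the path to its interpreter.

    Packages installed into this environment do not touch the system-wide
    Python installation, which is the property the comment relies on.
    """
    env_dir = Path(env_dir)
    # with_pip=False keeps the sketch fast and offline; pass True to get pip.
    # Symlinks are used everywhere except Windows, mirroring `python -m venv`.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Calling `create_isolated_env("astro-env")` and then running the returned interpreter gives a Python whose installed modules are kept apart from the existing setup, so a module with unusual requirements can be tried without risk to a working installation.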

  • Matt Kenworthy Oct 5, 2010 @ 5:03

    All computer languages suck, but what really matters is the libraries.

    I’ve seen this cycle repeat several times now:

    1) Computer people get excited about new computer $LANG, and there’s an initial burst of activity. Shiny websites! Awesome documentation! Great tutorials!

    2) Other computer savvy astronomers start writing basic library packages, excited by the endless possibilities, and share some of them with close friends. See, it’s so easy to write in $LANG!

    3) Slower astronomers hear about $LANG, get interested and try it out. They get excited, thinking about all the problems they’re having with $OLD_LANG – the frustration, the obscure bugs, the way printing out strings seems to be hell on earth…..

    4) Computer astronomers get busy with the semester/proposal/paper/duties/significant other’s endless attention on new $LANG, and stop writing libraries/maintaining their modules. As $LANG works through later revisions, $LANG’s library modules start to break down.

    5) Astronomers now find the ‘edge cases’ where $LANG modules don’t work properly or in an expected fashion. They get frustrated and drop $LANG like a hot potato, or make the jump and attempt to rewrite $LANG libraries and pitch in to the community effort. Which is very, very hard to do right.

    6) $OLD_LANG=$LANG, and a new $LANG is assigned. Repeat until you move to industry or retirement.

    I’ve seen $LANG=python,ruby,pyraf and personally done $LANG=iraf,starlink,matlab,idl and settled on $LANG=pdl for a minimum of suckyness, but I just can’t bring myself to go through the agony of IDL again (or getting one of those ‘infinite licenses’ that are oh so naughty).

    IRAF may be appalling and old, but its libraries are the least sucky and most trusted, because so many astronomers’ eyeballs have found the edge cases and there (was) a paid scientific staff to go fix it. It required several FTEs to do this.

    This is why astronomers usually stick with the old packages, and why take-up of these new languages runs through the above cycle.

    At the end of the day, you either write all your analysis code from scratch yourself, or be an eloi and trust that someone else’s black box works as promised.

  • Marshall Oct 5, 2010 @ 9:54

    I agree that Python’s loose, organic nature (tons of tutorials, lots of competing/redundant/optional modules, etc) makes for a steeper learning curve, but I think that’s something that will settle out in time as the landscape of astronomy resources matures. I do think this would be helped by the development of a consistent set of “core astro” resources, equivalent to the Goddard IDL Astro library. That process is under way, but it’s clearly going to take a while to really come together.

    How best to install packages is a trickier issue, given that the much larger Python community can’t even sort that out itself. easy_install vs. pip vs. ports vs prebuilt packages like Enthought vs. just compile from source… I do think it’s a shame that many of the key astro packages aren’t available for install from easy_install and/or pip. That’s probably the most obvious low-hanging fruit for improving the setup experience.

    I wish I could agree with Tom and Prasanth that setup should be the job of systems administrators. But the reality is that there are plenty of departments out there with understaffed, overworked sysadmins with no Python experience at all. I would never have gotten through grad school if I’d had to rely on the very limited systems support we had, instead of just compiling and building my own environments with up-to-date packages. Yeah, sure, this is far from an optimal solution. But it’s the reality, so we should strive for an environment that can be built and installed successfully by a grad student working on his or her own.

  • Matt Kenworthy Oct 5, 2010 @ 14:32

    I’d written a longer comment, but the internet burped and it disappeared, probably for the better.

    My comment is that all computer languages suck to varying degrees, but what’s important is the set of libraries you use and their subsequent maintenance.

    IRAF is difficult to program in and very old, but what made it used by so many astronomers is the fact that its routines had been error checked by a large fraction of the astronomical community, and it had several FTEs at NOAO debugging it and maintaining it. The edge cases that typically make you pull your hair out in frustration were duly found, and either corrected or labeled off with suitable warnings.

    IDL has taken over this position by dint of being the least awful language and it being there and installed on clusters when IRAF didn’t cut it.

    The library problem is still there with python, ruby, matlab, labview, and any other language of the week. At some point you don’t have the time to rewrite a 2-D gaussian fitting routine and you trust someone else’s black box. I don’t see python being different in this regard.

  • Matt T Oct 6, 2010 @ 21:02

    For data analysis I use Python exclusively, and I also wrote a package for analysis of simulation data that is written in Python called yt which has since become a true community project. I agree with the other posters — Python “suffers” from a preponderance of packages as well as a barrier to entry in the form of installation requirements.

    The solution presented by virtualenv isn’t really right for scientific users (it was designed by and for the web deployment community, which has a different set of needs) but pip goes a long way toward making installation of pure python packages easy. Recent evidence of this is the ability to pip install mpi4py, petsc4py and slepc4py — all very non-trivial packages with a number of dependencies, but which can be and are “pip installable” now.

    The users of my package have to deploy it wherever their simulations are run — this could be on TeraGrid machines, local clusters, desktops (particularly OSX) or even the various Linux boxes out there. We tried for a long time to make this fit nicely with the module system at supercomputer centers, with the packaging system on Ubuntu/RHEL/CentOS/Slackware (see Zed Shaw’s recent post or Marshall’s comment above for more information on why this is probably a mistake, and then consider that HPC centers are often even more conservative than distros in upgrading packages.) This never, ever worked. The OSX NumPy install is broken, the package names in Ubuntu changed with every release … and on and on. EPD was a non-starter on almost all systems, as well.

    So, I ended up writing an installation script that built from source a Python installation and the necessary bits to run the analysis software. This included most of the stack of software including zlib and bzlib, although not quite as pathologically complete as the SAGE or FEMhub software stacks. This works, and it doesn’t require recompilation when the old HDF5 module that was deployed gets changed during the global software stack update on the shared resource. It was overkill to do this, but it ensured a standardized software environment. We provided a binary installation for OSX, but it took more than it gave and it has been dropped.

    I think one thing that’s really getting missed here, though, is that unlike IDL, Python is a fully open source and inspectable system. Reproducible research requires an open pipeline; IDL does not provide this. Plus Python has better foreign function interfaces, web application support (the only mechanism we were able to standardize on for a GUI), test structures, stateful OO programming, and on and on. Furthermore, this doesn’t even address the fact that the real innovation is happening in the open source community — even within Python alone, packages like Theano, CorePy, PyOpenCL, Cython, SAGE and so on are being developed in the open and released to the public.

    Python might not be the End All Be All of languages, as Matt K above pointed out. But it has momentum, and it does a lot of things really quite well.

  • Prasanth Oct 7, 2010 @ 3:16

    I completely agree with Matt Kenworthy above, in that there are many “old” scripts and packages that have been well tested by astronomers and these should be used as much as possible. In fact, if one has a good working system for data analysis then there is no need to change it, just because something else has become fashionable. But on the other hand, if I can get access to this working system, without having to learn 2 or 3 programming interfaces, then that would be a real advantage. The use of python as a common scripting interface to various old and new packages is what I find to be the most compelling reason to learn python, more than its use as a new language for writing my own set of data analysis packages per se. If folks at STScI had used ruby to write their nice software, I would have learned that instead of python. I know next to nothing about how one language is better/worse than the other.

    I had to learn IRAF scripting, GIPSY scripting, with AIPS waiting in the wings, not to mention the several shell scripts, bash AND tcsh, when working with some radio and optical data and it was just too much work. It is not too much of an exaggeration to say that it was like having to work with vim, emacs and Microsoft Word, at the same time! If they had one language as a scripting interface, or at least one main language aided with shell scripts and such, even if the core is completely different, it would have made things much easier. One of my friends complained that she wanted to do science and not just data reduction and moved to working on binary stars in clusters, where, apparently, she got to the science part faster.

    I haven’t used pyraf, but it is one example of using python to access existing code. The Python interface to SLALIB written by Scott Ransom is another one. Python is used as a scripting interface in the CIAO package from Chandra X-ray Observatory, and in the radio astronomy package GIPSY. I haven’t used these interfaces, but if I had to work on data that needed these, I could use my python knowledge to start using these packages, even though I would still have to know many things about how features are implemented and what the different routines do.

    Openness, as mentioned by Matt T, and software being free are also important factors. I can’t use MOOG, the spectrum analysis package by Chris Sneden, as I currently don’t have access to SM! The same for IDL, though GDL allows me to run idlastro code and so at least I can learn those programs.

    I mentioned administrators, since one needs administrator privileges to install many libraries and this is not always available on office computers. But, Marshall is correct in pointing out that a graduate student must be able to install libraries on his/her own. I recently asked one of my friends to try out a program that I had written, that used matplotlib. My friend was eager to help and installed matplotlib, but something caused python to complain that it can’t find the library. I sent some suggestions; it has been several weeks and I haven’t received any replies. This person, by the way, is competent enough to win a NASA post-doc fellowship; so skill and motivation are not the issues here.

    Sorry for the long post, but hopefully I have put forward some good reasons for supporting the work by folks at STScI and other places, in making it easier to use and install python.
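
Editorial aside on the "python can’t find the library" anecdote above: a common cause of this symptom is that the module was installed for a different interpreter than the one running the script. The sketch below, which uses only the standard library, gathers the facts that usually settle the question. The helper names `locate_module` and `describe_environment` are hypothetical, and this is one possible diagnostic under that assumption, not a description of what went wrong in the commenter’s case.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken spec.
        return None
    if spec is None:
        return None
    # origin is a file path for ordinary modules and "built-in" for built-ins.
    return spec.origin


def describe_environment(module_name):
    """Collect the interpreter, version, module location and search path."""
    return {
        "interpreter": sys.executable,
        "version": sys.version.split()[0],
        "module": module_name,
        "location": locate_module(module_name),
        "search_path": list(sys.path),
    }
```

Running `describe_environment("matplotlib")` with the same interpreter that runs the failing script shows whether the module is visible to it. If `location` is `None`, comparing `interpreter` with the Python that the installer used is a reasonable next step.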

  • Tom Oct 7, 2010 @ 16:07

    @Prasanth: “a graduate student must be able to install libraries on his/her own”. I completely agree with this, although it’s unfortunate that it is very far from the truth! (and hence my suggestion about system administrators). I am surprised that many astronomers, not just graduate students, can use multi-million dollar/euro telescopes, run complex simulations on large computing clusters, but will freeze up if they follow some instructions for installing a computer package and an error occurs. A large fraction generally will just give up, and say that the package ‘doesn’t work’, or find another scientist that can install it for them. This is especially unfortunate since in most cases, googling the error message will often lead to the answer after a bit of investigating! I never have the courage to point them to this, but this situation is what that website was built for 😉
