Data Exploration with Glue

Chris Beaumont is a software engineer at Counsyl, and previously a software engineer at Harvard and the Space Telescope Science Institute. Glue began as a side project during Chris’ PhD thesis, and is now being developed to visualize data from the James Webb Space Telescope.

We’ve recently released version 0.4 of Glue, a Python-based GUI for visually exploring related datasets.

Glue is a package that lets users compare several related datasets: images, catalogs, image cubes, etc. Glue provides a graphical interface for making basic visualizations of each dataset, with the important feature that all plots are selectable: users can draw geometric regions on any plot to define subsets used to filter data. These subsets can propagate across several datasets, so a user can trace a geometrical structure in an astrophysical image and use it to filter points in a spatially overlapping catalog. These linked-view interactions make it much easier to discover multidimensional structures within and between datasets.
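To make the idea of a propagating subset concrete, here is a minimal sketch in plain Python. It is not Glue's API or implementation: `RectRegion`, `subset_mask`, and the catalog values are hypothetical names and made-up numbers, and it assumes the image and the catalog already share one coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectRegion:
    """Hypothetical rectangular selection in shared coordinates (degrees)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y):
        """True if the point (x, y) lies inside the rectangle, edges included."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def subset_mask(region, xs, ys):
    """Return one boolean per point: True where the point is inside the region."""
    return [region.contains(x, y) for x, y in zip(xs, ys)]


# A region a user might draw on an image (coordinates are made up).
region = RectRegion(x_min=10.0, x_max=10.5, y_min=-5.0, y_max=-4.5)

# A catalog that spatially overlaps the image (values are illustrative only).
catalog = {
    "ra": [10.1, 10.7, 10.4, 9.9],
    "dec": [-4.8, -4.6, -4.9, -4.7],
    "flux": [1.2, 3.4, 0.8, 2.1],
}

# The same region, defined once, filters the second dataset.
mask = subset_mask(region, catalog["ra"], catalog["dec"])
selected_flux = [f for f, keep in zip(catalog["flux"], mask) if keep]
```

The point of the sketch is that the subset is defined once, as a region in shared coordinates, and then evaluated against whichever dataset needs it.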

Also central to Glue’s philosophy is its hybrid nature. Most data exploration tools are primarily either graphical user interfaces (ds9, Topcat, Aladin, filtergraph, etc.) or programmatic interfaces (Python, IRAF, IDL, etc.). Glue sits between these two extremes: it provides a graphical way of exploring data without having to write code, but also provides several hooks for integrating code with the GUI. Here are some examples of the ways Glue can be extended with user-written Python:

  • Users have access to the Python command line from within Glue, and can run arbitrary code using data loaded into Glue.
  • People familiar with Matplotlib (Python’s main data visualization library) can easily create custom interactive data viewers. Importantly, they can do so without writing code to deal with user interaction — they focus on visualizing data in a particular way, and Glue generates an appropriate user interface automatically.
  • It’s easy to write custom data loaders, to load file formats Glue doesn’t understand by default.
  • Users can write scripts to load data and configure standard plots, which eliminates repetition when exploring datasets several times over the course of a research project.
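As a rough illustration of the custom-loader idea, the parsing half of a loader can be sketched in plain Python. The file format here (semicolon-separated columns with `#` comment lines) and the function name `load_semicolon_table` are invented for this example. Wiring such a function into Glue is covered in Glue's own documentation and is not shown here.

```python
def load_semicolon_table(text):
    """Parse a made-up text format into a dict of column name -> list of floats.

    Assumed format: optional '#' comment lines, one header line of column
    names, then rows of numbers, with fields separated by semicolons.
    """
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if not lines:
        raise ValueError("no header line found")

    names = [name.strip() for name in lines[0].split(";")]
    columns = {name: [] for name in names}

    for row_number, line in enumerate(lines[1:], start=1):
        values = line.split(";")
        if len(values) != len(names):
            raise ValueError(
                f"row {row_number} has {len(values)} fields, expected {len(names)}"
            )
        for name, value in zip(names, values):
            # float() tolerates the surrounding whitespace left by split().
            columns[name].append(float(value))
    return columns


# Illustrative input; a real loader would read this text from a file.
example = """# hypothetical catalog
ra; dec; flux
10.1; -4.8; 1.2
10.4; -4.9; 0.8
"""

columns = load_semicolon_table(example)
```

Returning named columns keeps the parser independent of any particular tool; the resulting dictionary can then be handed to whatever data container the application expects.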

For more information about Glue, you can watch one of the demo videos, check out the GitHub source (where we welcome bug reports and contributions), or join the Google Group to ask more general questions. We hope you’ll find Glue useful for your own data exploration needs.

