This is a guest post by Alex Cioc, a sophomore at Caltech working on new data visualization techniques.
In the era of “Big Data” science, the real challenges come not from the sheer volume of data obtained but rather from the complexity of those data. For example, in a typical sky survey we measure tens or even hundreds of attributes for each detected source. The data represent vectors in a parameter space with that many dimensions. Visualizing such data is a highly non-trivial task, yet visualization is a key part of the analysis.
Ideally, we can solve this issue with our own intuition, by utilizing the great pattern recognition engines in our brains – our innate abilities to spot trends, correlations, and outliers in a visual landscape. If we can create a bridge between the quantitative content of the data and our understanding, we can form an effective visualization engine. We already use 2-dimensional plots, and even 2-D projections of 3-D data representations. But what do we do if the data space has more than 3 dimensions? Answering this question is a critical issue in “Big Data” science, astronomy included.
Our group at Caltech has been experimenting with the use of immersive virtual reality (VR) spaces as a data visualization platform. The initial efforts were performed under the auspices of the Meta-Institute for Computational Astrophysics (MICA; Farr et al. 2009, Djorgovski et al. 2013). Above and beyond any good 3-D data-plotting package, this form of visualization delivers user immersion. As humans, we are “optimized” to interact within a 3-D world. A handy platform for such 3-D exploration is virtual worlds, such as Second Life™ and its open-source counterparts built on the Open Simulator (OpenSim) platform. There, a scientist can “walk” into their data while interacting and collaborating with colleagues in the same virtual space. Equally notable, such methods have a low barrier to entry – these virtual worlds offer free access and can be reached from any mainstream desktop or laptop computer. Furthermore, they allow for the encoding of up to nine dimensions of a data space using the XYZ positions, RGB colors, transparency (alpha layer), size, and shape of 3-D data objects. Additional dimensions may also be encoded through animation (e.g., spinning or pulsation), textures, or other such methods.
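To make the nine-channel encoding concrete, here is a minimal illustrative sketch (in Python, purely hypothetical – not the actual code behind our browser) of how a table of numeric attributes might be mapped onto the visual properties of 3-D data objects. The channel ordering, the shape palette, and the function names are all assumptions for illustration.

```python
# Hypothetical sketch: map up to nine data attributes per row onto the
# visual channels of a 3-D data object (position, color, alpha, size, shape).

def normalize(values):
    """Scale a sequence of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

# Assumed small palette for the (quantized) shape channel.
SHAPES = ["sphere", "cube", "cone", "pyramid"]

def encode(rows):
    """Encode each row's first nine attributes as visual properties.

    Channel order (an assumption): X, Y, Z position; R, G, B color;
    alpha (transparency); size; shape (quantized to the palette).
    """
    columns = [normalize(col) for col in zip(*rows)]
    objects = []
    for i in range(len(rows)):
        v = [col[i] for col in columns]
        objects.append({
            "position": (v[0], v[1], v[2]),
            "color": (v[3], v[4], v[5]),
            "alpha": v[6],
            "size": v[7],
            "shape": SHAPES[min(int(v[8] * len(SHAPES)), len(SHAPES) - 1)],
        })
    return objects
```

Each column is normalized independently, so every attribute gets the full dynamic range of its visual channel regardless of its physical units.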
Nevertheless, these virtual worlds have technical limitations and drawbacks, and, as a result of their game-like appearance, sometimes carry a subjective stigma. With that in mind, we started to develop a 3-D data browser that can run either as a standalone program on Mac/Windows/Linux or inside any standard Web browser. We used the Unity3D™ game development engine because it is an efficient and popular commercial engine.
Our prototype data browser (Cioc et al. 2013) uses the same approach to encode multiple data dimensions, supports multiple forms of user control, and allows for the loading of local or external data sets. Users on different platforms, represented by small cubes, can interact within the same space and control what they see. On a mid-2011 MacBook Air, our application can render 100,000 data objects in about 15 seconds. The same computer can render over half a million 3-D data objects, as shown in the figure below.
As soon as we feel that it is ready and robust enough for public distribution, the data browser will be freely available. We hope that it will encourage the adoption of immersive, collaborative 3-D and VR data visualization tools, and that it can serve as a step towards solving the visualization problem at the heart of “Big Data” science. Stay tuned!