Slim down your bloated graphics
This post was originally written by Erik Petigura, who is a graduate student at UC Berkeley and a visitor at the IfA, Hawaii, working on exoplanet hunting. The post was modified by Jessica R. Lu to incorporate suggestions from a Facebook Astronomers Group thread on the same topic.
Have you ever encountered a paper or proposal PDF that was painfully slow to scroll? The culprit is often a plot with far too many points on it. Here’s an example created with matplotlib; panning and zooming through this plot is painful.
import pylab as py
arr = py.randn(100000, 2)
py.plot(arr[:,0], arr[:,1], 'o', alpha=0.1, rasterized=False)
py.savefig('dots_vector.pdf')
# File size is 1.6 MB
One way around this problem is to rasterize these graphics. However, text and line art are also rasterized, which can look ugly unless the dots per inch (DPI) is very high. For example, the journal Science wants line art with a resolution of 1200 dpi. Using the command-line tool convert (part of ImageMagick), we can save a rasterized version of a plot at any DPI, but the file size tends to be large.
convert -density 1200 dots_vector.pdf dots_raster.png
# File size is 3.0 MB
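If convert is not at hand, a fully rasterized copy can also be written straight from matplotlib by passing dpi to savefig. The sketch below is only an illustration: the small 4 x 3 inch figure, the 300 dpi value, the point count, and the Agg backend are assumptions chosen to keep the example quick, and it uses numpy and matplotlib.pyplot directly instead of pylab. The pixel dimensions of the PNG are simply the figure size in inches multiplied by the DPI.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window needed
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((10_000, 2))

# Figure size is given in inches; savefig's dpi sets pixels per inch.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(arr[:, 0], arr[:, 1], "o", alpha=0.1)
fig.savefig("dots_raster.png", dpi=300)
plt.close(fig)

# Read the PNG back to inspect its size in pixels: (rows, columns).
pixels = mpimg.imread("dots_raster.png")
print(pixels.shape[:2])
```

Raising dpi to 1200 quadruples each dimension relative to 300, so the pixel count (and typically the file size) grows quickly.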
My favorite solution is to use the rasterized keyword in the matplotlib plot function. Points are rasterized, but text and line art remain vector. Even at 400 dpi (publication quality), the rasterized file is half the size of the original vector file.
py.plot(arr[:,0], arr[:,1], 'o', alpha=0.1, rasterized=True)
# File size is 0.8 MB
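To check the size difference on your own machine, something like the following sketch can save both versions and report their sizes. The helper name save_scatter, the output file name dots_mixed.pdf, the fixed random seed, and the Agg backend are assumptions made for this illustration, and the sizes you get will depend on your matplotlib version and figure settings. It uses numpy and matplotlib.pyplot directly instead of pylab.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window needed
import matplotlib.pyplot as plt
import numpy as np


def save_scatter(path, rasterized, n_points=100_000, dpi=400, seed=0):
    """Save a scatter of random points to `path` and return its size in bytes."""
    rng = np.random.default_rng(seed)
    arr = rng.standard_normal((n_points, 2))
    fig, ax = plt.subplots()
    # Only the points are rasterized; axes, ticks and labels stay vector.
    ax.plot(arr[:, 0], arr[:, 1], "o", alpha=0.1, rasterized=rasterized)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    # dpi controls the resolution of the rasterized artists in the PDF.
    fig.savefig(path, dpi=dpi)
    plt.close(fig)
    return os.path.getsize(path)


if __name__ == "__main__":
    for name, flag in [("dots_vector.pdf", False), ("dots_mixed.pdf", True)]:
        size = save_scatter(name, rasterized=flag)
        print(f"{name}: {size / 1e6:.2f} MB")
```

Lowering dpi shrinks the mixed file further, at the cost of blurrier points when zoomed in.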
Note the above use of alpha=0.1, which gives some means of visualizing the density of points. A suggested alternative is to stop plotting all of those points and instead use some other means of visualizing the density of points. Possible options include:
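One such option, chosen here for illustration rather than taken from the original list, is a 2D histogram: the points are binned on a grid and only the counts per bin are drawn, so the file size no longer scales with the number of points. This is a minimal sketch; the bin count, colormap, output file name dots_density.pdf, and Agg backend are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((100_000, 2))

fig, ax = plt.subplots()
# hist2d bins the points on a regular grid and draws the counts as colors.
counts, xedges, yedges, image = ax.hist2d(
    arr[:, 0], arr[:, 1], bins=100, cmap="viridis"
)
fig.colorbar(image, ax=ax, label="points per bin")
fig.savefig("dots_density.pdf")
plt.close(fig)
```

The output contains one colored cell per bin (100 x 100 here) regardless of how many points went in, which is why it stays light even for very large samples.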
If you are stuck reading someone else’s PDF with a “too many vector graphics” problem (on OS X), you can go into Preview -> Preferences and turn off PDF smoothing. This often produces better scrolling behavior at the expense of readable text in some of the figures.