Backup or Shut Up [Links]

March 31 is World Backup Day. Just like chocolate on Valentine’s, this Hallmark holiday calls for a links roundup:

Honestly, I have no sympathy for people who lose work due to not being backed up. Given the abundance of free and affordable options available, we should all be backed up pretty much continuously. I think manual backups are a thing of the past.

I bought a 10-seat CrashPlan Pro license to keep the research-related bits of my group’s laptops backed up all the time, both on and off campus, in case of major tragedy, and I’ve been very pleased. (Does anybody know CrashPlan’s current education pricing structure?) Once set up, it runs in the background and, unlike Time Machine, I never notice it. After a mishap on my couch involving coffee, I restored from CrashPlan to a spare computer while my laptop was in the shop, and the process was intuitive and seamless. I use Dropbox for accessing old versions of my day-to-day documents. I don’t stress about backing up Google (e.g., Gmail, Docs) or iCloud (e.g., iCal, Address Book) products; I consider them too big to fail.

What’s your backup strategy?

11 comments
  • Anon Mar 28, 2012 @ 9:04

    While google may be unlikely to lose your data due to a random drive failure, there are other vulnerabilities (see article link below). If/when google does lose your data, they are so big, we are so small, we are not their ‘customer’ (advertisers are their customers, we are google’s ‘product’), that google isn’t going to be much help in getting our data back.

    I’m not saying don’t use google. I’m saying use google wisely and have a good backup plan. There are several programs out there that make backing up your google data easy and automatic. Google for them.

    http://www.theatlantic.com/magazine/archive/2011/10/hacked/8673/

  • Ian Crossfield Mar 28, 2012 @ 10:42

    I use an rsync-based Python script, which runs every 1-2 days.
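
    For readers who have not seen this kind of setup, here is a minimal sketch of what an rsync-based Python backup script might look like. It is not the commenter's actual script: the function names and the example paths are hypothetical, and the choice of the `--archive` and `--delete` options is an assumption about what a mirror-style backup would want. Running it every day or two would be left to a scheduler such as cron.

```python
import subprocess


def build_rsync_command(source, destination, dry_run=False):
    """Build an rsync command that mirrors `source` into `destination`.

    Hypothetical helper for illustration only.
    """
    # --archive preserves permissions, timestamps and symlinks;
    # --delete removes files from the backup that no longer exist in the source.
    cmd = ["rsync", "--archive", "--delete"]
    if dry_run:
        # --dry-run reports what would change without copying anything.
        cmd.append("--dry-run")
    # A trailing slash on the source makes rsync copy the directory's
    # contents rather than nesting the directory inside the destination.
    cmd += [str(source).rstrip("/") + "/", str(destination)]
    return cmd


def run_backup(source, destination, dry_run=False):
    """Run the mirror and raise CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, destination, dry_run), check=True)
```

    A first run with `dry_run=True`, for example `run_backup("/home/me/work", "/mnt/backup/work", dry_run=True)` with placeholder paths, is a cheap way to check what `--delete` would remove before trusting the script.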

  • Matt K Mar 28, 2012 @ 10:59

    I clone my work hard drive every month with SuperDuper!, which has a slightly silly name but works well.

    I have two complete cloned copies – I rotate them between work and home. If the work computer dies, I have an immediate working clone to hand. If that doesn’t work, I have one at home.

    I have a work laptop, and that is a complete synced version of all my work computer work directories, minus the large (usually very static) data directories. I use Unison file synchroniser to keep my computers completely in lock step. I never have to wonder which version of a document is on my work computer and which is on my laptop – a simple sync command means that there are two *identical* copies, with one acting as backup at all times.

    That’s been quite a liberating step, because the amount of time I spent trying to keep two separate work environments managed was, well, a waste of time. If my laptop breaks, I’ve only lost back to the last sync call. I get a new laptop (eventually!) and I do a big pull over from my work desktop.

    So, there are two copies of my work data valid for a month, and my current work and documents have an additional backup in the form of my laptop. It seems to work well so far.

  • ALK Mar 28, 2012 @ 15:44

    My backup strategy has two main components:

    1) Active papers, proposals, and analysis sit in Dropbox, where they’re shared between work, home, laptop, and mobile devices. In order to preserve space, I move items out to long-term storage folders on my work desktop as they’re finished.

    2) I back up my entire work desktop monthly using Time Machine and an external USB drive. I then store this drive at home in between backups, providing off-site redundancy.

    I typically mirror any significant analysis work from my laptop over to my desktop pretty quickly, so I haven’t adopted a separate backup scheme for my laptop. If it dies, I’m comfortable with starting clean on a fresh install.

  • Tom Mar 29, 2012 @ 7:22

    I use Dropbox for papers, documents, code, etc., and critical code gets pushed to GitHub. I also have CrashPlan (which optionally allows you to encrypt data client-side). Finally, my institute provides a command-line utility that runs as a cron job and can back up terabytes off-site. So critical code is actually backed up 4 times! I still managed to lose some data last week which was on a non-backed-up drive, but I know when I use that disk that it’s not backed up, so I don’t put anything critical on it (it’s kind of a scratch disk).

    Also, I just thought I should mention that RAID is not backup. The large disk I lost above was a RAID 5, so you’d think it’d be resilient to data loss, but what actually happened was data corruption – the disks themselves are fine. So don’t think that you don’t need to back up a RAID array.

    @ALK: your backup strategy for your desktop is dangerous, and not really off-site. It’s only off-site between backups, but of course plenty could go wrong when you actually do the backup, as you’re moving cables and devices around, etc. If you want *true* off-site backup, then you should have *two* USB hard drives, and always make sure one of them is off-site.

    More generally, a single backup is not enough. If you lose your backup drive, or it breaks, etc., then suddenly you only have a single copy of the data until you buy a replacement. Once you do have the replacement, you will be putting a lot of stress on the desktop/laptop’s hard drive to back up everything from scratch, and of course that is the perfect time for a hard drive to fail!

  • C Mar 29, 2012 @ 7:23

    I use Time Machine for everyday backups, and Carbon Copy Cloner to get an exact (bootable) drive copy whenever I feel like it (although this can be scheduled in CCC). I’ve never noticed Time Machine running, but I understand that it can slow things down for people with millions of files constituting several terabytes of data on their drives. But Time Machine’s much better as a selective backup anyway – deselect system files, your copy of the entire SDSS-III dataset, etc., and use it for personal files. Use CCC or similar for a complete, bootable backup (I tested this when I had my iMac HD replaced – worked fine, even booting from an external USB HD). Time Machine also seamlessly backs up anything in Mail (I’ve recovered a few emails this way).

    Very important things are also periodically synced to a USB stick that I carry with me. Someday I’ll trust the cloud, perhaps, but not yet 🙂

  • Andy Robertson Mar 29, 2012 @ 7:59

    FULL DISCLAIMER – I work for SocialSafe.

    SocialSafe allows you to back up your social media accounts to your own computer, presenting your content in a searchable offline digital journal. Currently supporting Facebook (profiles and pages), Twitter, Google+, Viadeo and LinkedIn, you can back up as little or as much of your content as you please, and view it all in either diary or file form.

    CSV export and search are other functions that our users find to be very useful, and scheduled backups mean you don’t have to worry about remembering to save your online social life.

    You can download a free trial at http://socialsafe.net

    Any questions feel free to drop me a line.

    All the best,

    Andy

  • ALK Mar 29, 2012 @ 14:05

    Tom – you have a good point, but I think the 2 hours per month that the drive is hooked up to its dedicated set of cables are an acceptable window of vulnerability. The entire rest of the time, it’s sitting on a shelf at home.

    Hard drive failures happen to me at a rate of ~1/4 per year (across several machines with many disks), so I was only looking for a 99% solution, to push that out to ~1/400 per year. Very little of what I have outside Dropbox is critical enough to be worth trying for more 9s on the safety margin, especially since almost all of my data is archived at the observatories anyway. I guess maybe I should go ahead and mirror /oldpapers and /oldproposals across multiple local drives, just because they’re small enough to fit pretty much anywhere with zero impact.

    (Of course, a lot of the motivation for off-site solutions isn’t related to equipment failures anyway. Earthquakes, fires, theft, etc.)
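
    The arithmetic behind the "99% solution" above can be sketched in a few lines. This assumes that "99%" means the backup recovers the data in 99 out of 100 drive failures, and that backup failures are independent of drive failures; the function name is hypothetical and the rates are the rough figures quoted in the comment.

```python
from fractions import Fraction


def residual_loss_rate(failure_rate, backup_coverage):
    """Expected rate of unrecoverable losses per year.

    failure_rate: drive failures per year.
    backup_coverage: fraction of failures the backup can recover
    (assumed independent of the drive failures themselves).
    """
    return failure_rate * (1 - backup_coverage)


# ~1/4 failures per year, with a backup that covers 99% of them.
rate = residual_loss_rate(Fraction(1, 4), Fraction(99, 100))
print(rate)  # 1/400
```

    Each further "9" of coverage divides the residual rate by ten under the same assumptions, which is why the extra effort is only worthwhile for data that is genuinely critical.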

  • Marshall Perrin Mar 30, 2012 @ 9:42

    Multiple layers: First is daily Time Machine backups for both my desktop and laptop, with an additional off-site drive at my parents’ house (onto which I Time Machine my laptop every couple of months). Then most of my current working projects are on Dropbox, too. My personal version control repository (which hosts both code and papers) is hosted on yet another machine, plus various other hosted VCSes for different projects. And then also much of what I’m working on is on STScI’s central network attached storage, which is Somebody Else’s Problem to back up.

    My weakest links are probably the offsite backups (could be more often, but now that I no longer live in earthquake land I’ve not been motivated to look into cloud backup yet), and also I’ve got a 2 TB external drive at home that I use for videos & photos etc (really important personal stuff!). That one I clone onto another 2 TB drive every month or so, with that drive living in my fireproof lockbox at home along with legal documents and so on. That could probably be done better. On the other hand, that drive is turned off 90% of the time and the backup 99%; does anyone know statistics on whether keeping a drive powered off most of the time actually decreases its failure rate?

    Years ago, I suffered a hard drive failure on my laptop in the midst of writing my thesis. It was a pretty nice feeling to not stress about that since I knew I’d backed up the day before.

  • Evgenya Apr 3, 2012 @ 19:21

    Thanks Kelle for the post. You’ve inspired me (or scared me) into getting CrashPlan Pro, in addition to my daily Time Machine backup system.

  • Kelle Apr 5, 2012 @ 10:20

    CrashPlan PROe is having a webinar on April 12 for Higher-Ed users. Before I signed up, I attended one of their webinars and found it to be very well run and informative. If you are considering implementing something a bit bigger for yourself or your group, I recommend attending (hm, you might want to make sure it’s targeted at individual PIs rather than the entire institution):

    Space is limited.
    Reserve your Webinar seat now at:
    https://www1.gotomeeting.com/register/486887593
    Please join this webinar to learn more about how CrashPlan PROe can assist the colleges and universities with their unique backup and recovery challenges. We will provide an overview of the CrashPlan PROe system and other topics to include:

    – How CrashPlan PROe can resolve issues pertaining to the unique nature of the higher education user community
    – The security needs of handling grant-funded data and regulations set on government research projects
    – How other colleges and universities have applied the technology and have allocated the costs
    Title: Solutions for Higher Education
    Date: Thursday, April 12, 2012
    Time: 11:30 AM – 12:30 PM CDT
