Version Control Demystified, Part 2: A Subversion Primer

Introduction

A couple of weeks ago, I outlined how version control, which I defined as the management of changes to documents, code, or information, is both something you should care about and something you may already be using, or can easily start using, with Time Machine, Dropbox, Pages.app, etc.

One of the areas where version control is extremely useful is when writing code. If you think that it is only useful for large collaborations working on complex codes, think again. If you’ve ever found yourself with this kind of directory ‘structure’:

script.py
script_final.py
script_final_20nov.py
script_new.py
script_new2.py
script_newest.py
script_old.py

then version control is your friend! As well as keeping a history of the code as it evolves, version control makes it easier to find when bugs were introduced, and lets you experiment with a script knowing that you can revert to the previous version. All of this works whether you are a single user or part of a collaboration.

This week, I will talk about a type of version control which follows a client-server approach. This means that a single server is used to store the code, and different users can ‘check out’ a version onto their own computers. In this post, I will specifically give a brief introduction to Subversion (also known as svn). Next week, I will cover distributed version control systems, which operate differently, with some advantages and drawbacks.

Before you start worrying about setting up an svn server, there are two important things to know: your computer can act as both a client and a server, and svn works well over ssh, so the ‘server’ can just be a directory you own on another computer – essentially, you don’t need to worry about bugging your system administrator, setting up servers, root access, etc. To check that you have subversion installed, type:

$ svn
Type 'svn help' for usage.

If instead you get svn: command not found, then you need to install subversion (see here for more details). However, svn is included by default on many systems.
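If you would rather check from a script than read the output by eye, something like the following sketch could work. It only relies on the standard `command -v` shell builtin and on `svn --version --quiet`, which prints just the version number; the variable name `SVN_STATUS` is my own choice for illustration.

```shell
# Report whether the svn client is on the PATH, and which version it is.
if command -v svn >/dev/null 2>&1; then
    SVN_STATUS="installed"
    echo "svn found at $(command -v svn), version $(svn --version --quiet)"
else
    SVN_STATUS="missing"
    echo "svn not found; install Subversion before continuing"
fi
```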

Quick start

We are first going to need to create a repository. This is a directory which will contain the complete history for one or more projects. To create a repository, go to the directory where you want to create the repository and type:

$ svnadmin create my_repository  # creates a repository with name `my_repository'

If you look inside you will see that some kind of directory structure has already been set up:

$ ls my_repository/
README.txt conf       db         format     hooks      locks

but you should never have to go into that directory (in fact you probably shouldn’t unless you know what you are doing!). Now, let’s create a project directory and create a file with “Hello, world” written in it:

$ mkdir my_project
$ cd my_project/
$ echo "Hello, world" > README
$ ls
README
$ cat README 
Hello, world
$ cd ..

Import this project into the subversion repository:

$ svn import my_project file:///PATH/my_repository/my_project -m "Initial import"
Adding         my_project/README
Committed revision 1.

You will need to replace PATH with the absolute path to the directory containing the repository. The “Initial import” is a commit message, which is the description of this specific version of the project. Now that the project has been imported into the repository, we can check it out (which means requesting a working copy) by doing:

$ svn checkout file:///PATH/my_repository/my_project project_work
A    project_work/README
Checked out revision 1.
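The PATH placeholder in the import and checkout commands is easy to get wrong, so here is a small sketch that builds the file:// URL for you. It assumes the repository lives in `./my_repository` relative to the current directory; `REPO_DIR`, `REPO_PARENT`, and `REPO_URL` are names I made up for illustration. Note that a path containing spaces or other special characters would also need to be percent-encoded, which this sketch does not attempt.

```shell
# Assumed location: adjust REPO_DIR to wherever you ran `svnadmin create`.
REPO_DIR="./my_repository"

# Resolve the parent directory to an absolute path, so that a relative
# REPO_DIR still produces a valid URL.
REPO_PARENT="$(cd "$(dirname "$REPO_DIR")" && pwd)"

# "file://" plus an absolute path (which already starts with "/")
# gives the three-slash form file:///...
REPO_URL="file://${REPO_PARENT}/$(basename "$REPO_DIR")"
echo "$REPO_URL/my_project"
```

You could then pass `"$REPO_URL/my_project"` to `svn import` or `svn checkout` in place of the hand-typed URL.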

Note that the original directory you imported into the repository is not a working copy, and can eventually be deleted. We can now go into the working copy and add some text to the existing file:

$ cd project_work
$ echo "How are you?" >> README 
$ cat README 
Hello, world
How are you?

You can then find out what the status of the files in the directory is:

$ svn status
M       README

which means that README has been modified. You can view the modifications with:

$ svn diff
Index: README
===================================================================
--- README	(revision 1)
+++ README	(working copy)
@@ -1 +1,2 @@
 Hello, world
+How are you?

Now, commit the changes:

$ svn commit -m "Added a new line in README"
Sending        README
Transmitting file data .
Committed revision 2.

If you make a change that you don’t like, you can just revert to the latest version of a specific file:

$ echo "Blarg" >> README 
$ cat README 
Hello, world
How are you?
Blarg
$ svn status
M       README  # this file has been modified
$ svn diff
Index: README
===================================================================
--- README	(revision 2)
+++ README	(working copy)
@@ -1,2 +1,3 @@
 Hello, world
 How are you?
+Blarg
$ svn revert README
Reverted 'README'  # changes present in README have been removed
$ cat README
Hello, world
How are you?
$ svn status
$  # README no longer displays any changes

Adding files to the repository is very easy:

$ echo "import antigravity" > test.py
$ svn status
?       test.py
$ svn add test.py
A       test.py
$ svn status
A       test.py
$ svn commit -m "Added script"
Adding         test.py
Transmitting file data .
Committed revision 3.
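If you want to replay the whole sequence so far without touching any real files, the sketch below runs it in a throwaway directory created with `mktemp -d`. The variable names (`SCRATCH`, `REPO_URL`, `YOUNGEST`, `START_DIR`) are my own, and the script simply skips the svn steps if Subversion is not installed. It uses `svnlook youngest`, which asks the repository for its latest revision number; after one import and two commits that should be 3, matching the transcript above.

```shell
# Work in a throwaway directory so nothing here touches real projects.
SCRATCH="$(mktemp -d)"
REPO_URL="file://$SCRATCH/my_repository/my_project"
YOUNGEST="skipped"   # stays "skipped" if Subversion is not installed

if command -v svn >/dev/null 2>&1 \
   && command -v svnadmin >/dev/null 2>&1 \
   && command -v svnlook >/dev/null 2>&1; then
    START_DIR="$PWD"
    cd "$SCRATCH"

    # Create the repository and import a one-file project (revision 1).
    svnadmin create my_repository
    mkdir my_project
    echo "Hello, world" > my_project/README
    svn import -q my_project "$REPO_URL" -m "Initial import"

    # Check out a working copy, edit the file, and commit (revision 2).
    svn checkout -q "$REPO_URL" project_work
    cd project_work
    echo "How are you?" >> README
    svn commit -q -m "Added a new line in README"

    # Add a new file and commit it (revision 3).
    echo "import antigravity" > test.py
    svn add -q test.py
    svn commit -q -m "Added script"

    # Ask the repository itself for its latest revision number.
    YOUNGEST="$(svnlook youngest "$SCRATCH/my_repository")"
    cd "$START_DIR"
fi

echo "Latest revision: $YOUNGEST"
rm -rf "$SCRATCH"
```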

And we can now view our hard work with:

$ svn update
At revision 3.
$ svn log
------------------------------------------------------------------------
r3 | tom | 2010-11-21 18:54:22 -0500 (Sun, 21 Nov 2010) | 1 line
 
Added script
------------------------------------------------------------------------
r2 | tom | 2010-11-21 18:50:04 -0500 (Sun, 21 Nov 2010) | 1 line
 
Added a new line in README
------------------------------------------------------------------------
r1 | tom | 2010-11-21 18:47:26 -0500 (Sun, 21 Nov 2010) | 1 line
 
Initial import
------------------------------------------------------------------------
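The log becomes more useful once you start asking it narrower questions. The sketch below builds a tiny two-revision repository in a scratch directory (all directory and variable names are made up for illustration) and then shows a few standard options: `svn log -l` to limit the number of entries, `svn log -r` and `svn diff -r` to look at a revision range, and `svn cat -r` to print a file as it was at an earlier revision. It skips the svn steps if Subversion is not installed.

```shell
SCRATCH="$(mktemp -d)"
REPO_URL="file://$SCRATCH/repo/demo"
OLD_README="skipped"   # stays "skipped" if Subversion is not installed

if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    START_DIR="$PWD"
    cd "$SCRATCH"

    # Build a small history: revision 1 (import) and revision 2 (edit).
    svnadmin create repo
    mkdir demo
    echo "Hello, world" > demo/README
    svn import -q demo "$REPO_URL" -m "Initial import"
    svn checkout -q "$REPO_URL" wc
    cd wc
    echo "How are you?" >> README
    svn commit -q -m "Added a new line in README"
    svn update -q                 # bring the whole working copy to revision 2

    svn log -l 1                  # only the most recent log entry
    svn log -r 1:2 README         # history of one file over a revision range
    svn diff -r 1:2               # what changed between revisions 1 and 2

    # Contents of README as of revision 1, without changing the working copy.
    OLD_README="$(svn cat -r 1 README)"
    cd "$START_DIR"
fi

echo "README at revision 1: $OLD_README"
rm -rf "$SCRATCH"
```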

If you want to work with a repository on a different computer, then you can use svn+ssh://username@host/PATH instead of file:///PATH, but otherwise everything remains the same.
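To make the two URL forms concrete, here is a small helper that produces either one. The function name `repo_url`, the path, the user name, and the host are all hypothetical, chosen only to show the pattern; it builds strings and does not contact any server.

```shell
# Build a repository URL: local if no host is given, svn+ssh otherwise.
# Arguments: absolute repository path, then an optional user@host.
repo_url() {
    path="$1"
    remote="${2:-}"
    if [ -n "$remote" ]; then
        printf 'svn+ssh://%s%s\n' "$remote" "$path"
    else
        printf 'file://%s\n' "$path"
    fi
}

# Hypothetical path, user name, and host, for illustration only.
LOCAL_URL="$(repo_url /home/tom/my_repository/my_project)"
REMOTE_URL="$(repo_url /home/tom/my_repository/my_project tom@example.org)"
echo "$LOCAL_URL"
echo "$REMOTE_URL"
```

Either string can then be given to `svn checkout` exactly as in the quick start.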

I’ve only described about 1% of what svn can do, so if you are interested in learning more about it, you can read the very good and free Version Control with Subversion online book!

Graphical Interfaces

It’s interesting to play around with the above commands to start with to understand how svn works, but you can actually make your life a lot easier by using a graphical interface (GUI) on a day-to-day basis, such as Cornerstone ($59) or Versions (35 euros). There are many other GUIs out there to choose from – if you know any good ones, feel free to suggest them in the comments!

Hosting repositories

While you can use any local computer to act as an svn server, in some cases you might want to use online repositories to collaborate with people at other institutions. There are many services that provide free and not-so-free repository hosting solutions. In general, free hosting solutions usually mean (with some exceptions) that your code is public, so if you have top-secret code that you can’t share, you might need to go for a non-free solution. Examples of hosting providers include:

Downsides

While a client-server model can work well when working with multiple users, by ensuring that there is only one ‘true’ history of the code on the server, having a separate server and client can be a bit of a pain, and also a bit overkill. For example, if you move your files to a new computer, you need to remember to move both the working copies and the repositories, and if the absolute path to the repositories changes, you may need to hack a bit to get the working copies to see them again. Git and Mercurial, which I will talk about next week, get around this problem by making each working copy a full repository. Stay tuned for more information in my next post!
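For the case where the repository's absolute path changes, newer versions of Subversion (1.7 and later) provide `svn relocate`, and older ones offer the equivalent `svn switch --relocate`. The sketch below simulates a move inside a scratch directory and then re-points an existing working copy; the directory and variable names are made up for illustration, and the svn steps are skipped if Subversion is not installed.

```shell
SCRATCH="$(mktemp -d)"
OLD_ROOT="file://$SCRATCH/my_repository"
NEW_ROOT="file://$SCRATCH/moved_repository"
WC_URL="skipped"   # stays "skipped" if Subversion is not installed

if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    START_DIR="$PWD"
    cd "$SCRATCH"

    # Set up a repository and a working copy at the original location.
    svnadmin create my_repository
    mkdir my_project
    echo "Hello, world" > my_project/README
    svn import -q my_project "$OLD_ROOT/my_project" -m "Initial import"
    svn checkout -q "$OLD_ROOT/my_project" project_work

    # Simulate the repository moving to a new absolute path.
    mv my_repository moved_repository

    # Rewrite the URL prefix stored in the working copy (svn 1.7 or later;
    # on older versions use: svn switch --relocate OLD NEW).
    cd project_work
    svn relocate "$OLD_ROOT" "$NEW_ROOT"

    # Read back the URL the working copy now points at.
    WC_URL="$(svn info | sed -n 's/^URL: //p')"
    cd "$START_DIR"
fi

echo "Working copy URL: $WC_URL"
rm -rf "$SCRATCH"
```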

If you have any questions about Subversion or are having difficulties in using it for real-life cases, please post a comment! Similarly, if you have been successful in using Subversion, then please share your experiences with others!

18 comments
  • John Nov 29, 2010 @ 9:19

    Subversion isn’t going anywhere, because lots of systems have huge piles of legacy baggage. Hence, it’s probably worth learning something about it.

    However, if you’re starting version control from scratch, please consider waiting for the next article which will discuss more modern systems! The whole client-server approach taken by Subversion is not just old-fashioned, but it’s introducing a lot of complexity which beginners ought not to worry about. Juggling working copies, repositories, servers, etc shouldn’t be necessary in this day and age: to get started with a modern system (git, Mercurial, etc), you just need to work in one directory and can worry about the esoteric details (by which I mean anything beyond init, add and commit!) once you’re comfortable with that.

  • Adam Ginsburg Nov 29, 2010 @ 12:35

    On the GUI and hosting front: I use google code (code.google.com) to host my repositories. It has very nice source-code browsing features (including syntax highlighting, but not for IDL), allows you to build a wiki frontpage, and works with both svn and mercurial.

    I’m curious to hear more about git and mercurial. I still can’t understand git despite working with it on two projects; I use mercurial but more often than not I wish it would just behave like svn. Do you have any comments to offer on bazaar and/or other less-used systems?

  • Tom Nov 29, 2010 @ 14:51

    While it’s true that ‘modern’ systems such as Git and Mercurial make it a lot easier to set up version control and work with branches and tags (which I haven’t mentioned here), I think that the concept of developing with multiple users is actually easier to understand with svn for complete beginners. The idea of a ‘server’ containing the ‘true’ version of the code, and users checking out ‘working copies’ is in some ways easier to understand than multiple users working on different clones of a repository and having to merge the code history further down the road. I personally had much less trouble learning svn than git/mercurial, and I think that ultimately it’s actually worth learning both types of version control and seeing what works best for a given user. I actually use both git and svn on a daily basis for slightly different types of projects.

    Another small point is that the existing svn GUIs are (for the moment) much nicer than the equivalent git/mercurial GUIs. I’ve been using Cornerstone for the last two years, and it’s been one of the most useful apps on my Mac!

  • Ben Nov 29, 2010 @ 18:11

    A comment from someone who only uses version control on a minority of projects, so it may be off base: If you’re going to move files to a new computer, I don’t understand why you should move the working directories. It seems cleaner to check the code into the repository, move the repository, and then check the code out on the new computer.

  • Tom Nov 29, 2010 @ 18:16

    @Ben: Absolutely!

    One thing that I did not mention in the above post, is that you can keep your repository and working copies in Dropbox! This way you can use them on multiple computers and this also gets around the problem of migration entirely. Before I used Dropbox, I had separate working copies on my desktop and laptop, which was a pain because I had to finish implementing something on the desktop before coding on the same project on the laptop (or had to rsync the uncommitted changes). Now I can just work on the same working copy on any computer and commit the changes once I’m ready.

  • John Nov 30, 2010 @ 5:25

    I’m no Dropbox expert, but… what does it do in terms of locking and so on? What if you share your repository using Dropbox, and then commit different things to different copies of the repository before Dropbox can sync everything up? Naively, I’d assume that chaos will ensue.

  • John Nov 30, 2010 @ 5:36

    I think worrying about multiple users, branches, tags, servers, repositories, … is all being very complicated.

    Start with an empty directory: “git init”. Add some files: “git add”. Commit your changes: “git commit”. Congratulations, you have version control!

    Sure, using remote tracking branches to pull in other people’s changes, merge history, rebase, whatever, are potentially complex procedures, but all of them are optional, incremental changes on top of a really simple model which lets people get started with no need to go about setting up servers and suchlike.

  • Tom Nov 30, 2010 @ 7:59

    I agree that placing a repository in Dropbox *may* not always be a good idea, but I have had about 20 svn working copies and git repositories in Dropbox for a year and have never run into any issues. So if you are unsure, place the repository outside Dropbox, and the working copies in Dropbox.

    I also completely agree that for single users, git is ideal, and in fact, that is going to be the point of my next post! 🙂 But multiple users isn’t a scenario that is *that* unlikely in science, and in that scenario, some users (including myself at this time) will prefer svn, hence why I think it’s useful to learn both.

  • Matt Nov 30, 2010 @ 16:58

    I think the only issue with using repositories inside DropBox is how it handles hardlinks. Mercurial uses hardlinks for cloned repositories (not sure about git.)

    For hosting solutions for DVCS (specifically mercurial) bitbucket.org just made most of their hosting free. It’s also trivial to set up something like hgweb on most shared hosting providers or any cgi-bin capable webserver. More advanced solutions like RhodeCode can also be set up on shared hosting providers, but they won’t benefit from things like backend message queues, which are something of a value add. Plus, mercurial’s built in “serve” command is amazing — it spawns a self-contained web server which provides graph view, changeset/changelog view, file access, and push/pull/clone access to the repository.

    For the Enzo simulation code we found that the Subversion concept of a single shared state — which is a very valid one for many environments — was not really the right model for development of a shared code base. The same was true for the yt project. Both have since moved from SVN to Mercurial and it has been exceptionally beneficial. As the person who advocated for Mercurial over SVN, as well as implemented the transition, I am comfortable saying the transition would have been substantially harder had we moved to git. Mercurial’s kitchen-sink approach, the awesome branching support, the ability to write extensions and most of all its much shallower learning curve have been extremely beneficial. It’s also been much easier to install on various supercomputer centers than git, largely because of issues with expat and libcurl, which surprised me. We now include mercurial with our installation scripts, and the built-in ability to call the API has eased providing auto-upgrade, auto-versioning and access to remote repositories of cookbook and canned analysis scripts. It’s been an extremely positive transition.

  • Adam Ginsburg Nov 30, 2010 @ 17:40

    Matt and Tom –
    I tried cloning a mercurial repository onto my dropbox and ran into problems right away. First off, my repository only contains ~100 MB of material, but the clone somehow pulled over untracked files – I think that should not have happened, but it did and I’m still investigating it. Then, after the clone, hg status turns up “?” in front of every file. So, overall, it looks like the clone failed.

    There are a lot of posts on the web about hosting a repository on dropbox, but since I want to use it the same way I think Tom does (avoiding scp/rsync between laptop and host machine), none of those appear to be a solution.

    svn checkout on dropbox worked fine, though.

  • Matt Nov 30, 2010 @ 17:46

    Hi Adam,

    One thing about distributed version control systems is that unless you specifically clone only a shallow portion of them, they contain the entire state of the repository all the way back to the very first revision. This is in stark contrast to SVN, and it’s essentially the source of their large quantity of awesome. When you clone a repository, you are peered with the original source; there’s no additional information. This makes them larger (see the discussion on ESR’s blog about reposurgeon for a historical perspective, with the SCCS and CVS conservative approach discussed) but it also means you can update to and compare against any previous state in the repository.

    Did you clone from an external host, or from someplace that appears to exist on the same file system? If it’s from the same file system, hg probably tried to do hardlinks, which may not be the right operation. You may try “hg clone --pull /path/to/other/repo” which will avoid this.

    I suppose one question I have, with using dropbox as an additional layer, is why would one want to do this? I use dropbox, and dropbox has its own revision control, but I fail to see what dropbox provides that using an external host like bitbucket, google code, github and so on does not provide for DVCS repositories. You could use private branching or explicit rather than implicit pushing for instance; what advantage does having dropbox manage this provide?

  • Adam Ginsburg Nov 30, 2010 @ 17:56

    Thanks Matt. How do you do a shallow clone? Also, thanks for the --pull tip; it is on the same filesystem and probably did try hardlinks.

    Why dropbox? I have found myself in a situation where I have code + data on my home machine and a copy of the repository on my laptop. I edit the code on my laptop, then in some way have to copy the code to my home machine before running it. One way to do this is commit/push/pull/update on the home machine, but that generates a lot of overhead commands and also requires a lot of (probably redundant) log comments. Alternately, I can rsync. But that’s still an extra command every time I want to run a piece of code. My coding style requires a lot of interactive debugging, so this gets to be annoying. My hope is that I could just edit code on dropbox on my local machine and immediately run it on the remote machine.

    I don’t like editing code directly on the remote machine because I usually run into intolerable lag because I’m transmitting data while editing.

  • Matt Nov 30, 2010 @ 18:04

    Adam — shallow clones are actually kind of tricky, and something I’m not extremely familiar with. You might also try something like exporting (hg export) then when you want to commit, rsync back to a repository on a location without the same super-tight space constraints. As for your explanation about dropbox, that makes sense to me. I think there are some technical solutions (of varying degrees of annoyance) that could get around this, but really, pragmatism would suggest you’ve hit on the best one for you!

  • Tom Nov 30, 2010 @ 18:15

    Matt – Thanks for all the great advice on Mercurial! 🙂

    Adam – In addition to svn working copies, I have all my git repositories in Dropbox (which I push to github from time to time). My reason for having the repositories in Dropbox is for the exact same reasons as you, to edit/run code on different machines seamlessly, and only committing code when I actually need to, and I haven’t run into any issues so far (several months).

  • Tiago Nov 30, 2010 @ 20:59

    I’ve got to say, I am no expert in Version Control. Actually, I have started using it after reading this post. But, after googling around a way to connect a svn server with Xcode on Mac I am pretty happy with the result. If you are not a hard user as I am and if you use Xcode, you should probably give it a try…

    Thanks for the post! Very useful!

  • John Dec 1, 2010 @ 8:22

    Re keeping repositories in Dropbox: please be aware that it really is a risk! Subversion doesn’t have any concept of merging separate repositories, so, if for whatever reason (network glitches, etc) you have multiple copies of the repository which diverge, you really are stuffed. How likely that is in your particular usage pattern is another question: some people might get away with it, others would potentially be less lucky. Personally, if it were my data, I wouldn’t take the risk!

  • Tom Dec 1, 2010 @ 8:51

    John – I agree that keeping svn repositories in Dropbox isn’t a good idea, and I retract that suggestion. What I really meant to say is that keeping svn working copies in Dropbox is fine, and in that case the worst that can happen is that you have to check out a new copy if things go wrong. In my particular case, I am not worried about the git repositories in Dropbox, because I regularly push my repositories to github, and they get backed up by TimeMachine and Dropbox, so if things go wrong, I can always get things back the way they were.

  • Cameron Dec 8, 2010 @ 9:13

    hey tom, good article – any discussion of version control for astronomy is a Good Thing in my book. I’m looking forward to seeing your article covering GIT.

    re: your comments on the distributed end — I don’t agree with you that SVN is easier. In fact, I think using it as a start just reflects our paths (that we learned something like SVN/CVS first, so use it as a comparison).

    I’ve been a GIT user for .. over 4 years now. One of the things I found very natural and useful (once I learned how!) was micro commits (edit and commit quickly), which I found really difficult with the CVS/SVN approach. This led to better use of branching on my part, and I felt much more free. In addition, the quick initialization of repos in GIT made it much more likely I would create and update code and papers. The huge number of utilities available in GIT makes it possible to restructure the repo later if I decide that is necessary. GIT might be somewhat complex, but the basics can really be shored up to be as simple as SVN. Finally, I want to point out that GIT is becoming more available at computing centers. However, if necessary — I’ve used the CVS interface to a GIT archive very successfully.

    Anyhow, I’m just a user (contrary to the above sales pitch). Going back to my original point: I want to stress my feelings that introducing GIT (or mercurial?) might have merit without the SVN first reference.
