Subversion to Git Migration

Recently, I migrated about 20 projects from SVN to Git. While this may seem like a straightforward job, especially with many migration guides available on the internet, the devil is really in the details. None of the guides that I found talks about the potentials gotchas in the migration process, which is why this article exists. In this article, I’ll focus on the pitfalls that can come back to bite you and omit some of the mundane details for brevity.

Atlassian has a multipage migration guide that you can refer to. The basic steps involved are:

  1. Prepare the migration environment

    The migration should be done on a case-sensitive file system to avoid corrupting the repository. If you’ve a Linux system available for the migration, skip the rest of this section. Atlassian has instructions for mounting a case-sensitive file system on Windows and Mac OS X (both file systems are case-insensitive), but it seemed more trouble than its worth, so I did the migration using a Debian Docker image I’d previously created for development purposes. The image is published to my Docker Hub account, and comes with JDK 8, SVN and Git installed. Having created a container from the image, install git-svn.

    I no longer have the Debian image available publicly; you can use asarkar/alpine-plus instead and adapt the following commands for Alpine.

    $ docker run -it abhijitsarkar/docker:debian-dev bash
    dev@b22ce7fa8250:~$ sudo apt-get update
    dev@b22ce7fa8250:~$ sudo apt-get install git-svn
    

    Refer to the Docker official site if you need further details.

    The login user is dev, the password is the same. dev has sudo access to run as root.

    The Alpine image runs as root.

    If you’re not familiar with Docker, I recommend using a virtual machine for doing the migration. Oracle VirtualBox is free and so are most Linux distros, so take your pick.

  2. Create an authors mapping file

    This file maps the SVN usernames to Git committers. SVN stores only usernames while Git stores full names and emails. Thus, each entry in the mapping file will map a SVN username to a Git committer full name and email. Something like this:

    username = Full Name <email>

    How long this step will take depends on the format of the SVN username, the number of users, and the number of SVN repositories being migrated.

    The authors file is simply used for converting the history and the users don’t actually need to exist in the Git system. Don’t be confused if there’re users that no longer work for your organization, keep those entries as usual.

    The Atlassian guide assumes that the SVN repo you are extracting the author information from doesn’t require authentication, which almost always is false. Thus the command they’ve in the guide doesn’t work. Use the following command instead:

    $ java -jar svn-migration-scripts.jar authors <svn-repo> \
    <svn-username> <svn-password> > authors.txt
    
  3. Clone the SVN repository

    There are various important details that the Atlassian guide misses to mention regarding this phase.

    • Limiting the beginning and ending SVN revision:

      Because SVN trunk, branches and tags can be created inside any regular directory, you may have multiple directories in SVN with their own trunk, branches and tags. We did. SVN tracks all changes using a global unique revision number that is incremented for every commit. Thus if you make a commit on directory dir1, it gets a unique revision number, and next time you commit something to dir2, it gets a higher revision number.

      During conversion, dir1 with it’s trunk, branches and tags is probably going to become a Git repo. In order to avoid SVN beginning the history from the earliest available revision number, it’ll save you a lot of time, and unrelated history, if you just begin from the earliest available revision number for that particular SVN repo. You can find that number by using the following command:

      $ svn log <svn-repo> | tail where <svn-repo> is a URL to dir1.

      Two other useful options are --no-minimize-url and --no-metadata. Refer to the git-svn man page for details.

      Thus, the command to convert a SVN repo with standard layout may be as follows:

      $ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
      --stdlayout --authors-file=authors.txt \
      <svn-repo>/<project> <git-repo-name>
      

      where 12345 is the earliest available revision number corresponding to the repo.

    • Dealing with non-standard layout:

      If the repo doesn’t follow “standard layout”, meaning it doesn’t have the usual trunk, branches and tags or has them but not all in lower case, then you need to tailor the command to the custom layout. Assuming the trunk is named as TRUNK, and that you don’t care about the branches and tags, the clone command will become as follows:

      $ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
      --trunk=TRUNK/<project> --prefix='' -authors-file=authors.txt \
      <svn-repo> <git-repo-name>
      

      Note the --prefix='' option. Without it, the Atlassian migration scripts fail to sync with SVN once the initial migration is done but the cutover isn’t. You can go crazy with the custom layout; refer to the git-svn man page for other options.

    • Merging multiple SVN directories into one Git repo:

      To do this, you’ll need to use a SVN URL that is one or more levels above the topmost directory that you want to merge, and use --include-paths and/or --exclude-paths filters. Something like this:

      $ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
      --stdlayout -authors-file=authors.txt --include-paths="dir1|dir2" \
      <svn-repo> <git-repo-name>
      

      Above I’ve done an OR filter that’ll include both dir1 and dir2. You can use include/exclude with non-standard layout as well.

  4. Clean the Git repository

    This basically consists of converting the weird remote references that git svn sets up to local branches and tags. The Atlassian migration scripts do a decent job at this but make some assumptions that may not work for you. First, to check what remote references have been set up, run the following command:

    $ git branch -a

    For me, I only wanted to migrate the trunk and thus simply deleted the remote references without converting them to local branches by using the following command:

    $ rm -Rf .git/refs/remotes/origin

    Run the git branch -a to verify that the references are indeed gone. If they still linger, then they exist in the .git/packed-refs file. Simply edit the file and delete the unwanted lines. You can read all about packed-refs here.

  5. Push to the Git server

    This consists of 2 steps, adding a Git remote and pushing to it.

    $ git remote add origin git@my-git-server:myrepository.git
    $ git push origin --all
    $ git push origin --tags
    
  6. SVN sync until cutover

    The Atlassian guide talks about a 2-way sync, meaning people can work in SVN and Git, and your local migration repo can be synchronized with both. Unless you’re paid by the hour, I don’t recommend it. The best case scenario is a big bang cutover; if that’s not possible, you can have people continue committing to SVN and sync your local with it. The Atlassian migration scripts have a sync-rebase option that can do this.

    For non-standard layout, The Atlassian migration scripts only work if you’d used the --prefix='' option during cloning as I’d described earlier.

Comments