Recently, I migrated about 20 projects from SVN to Git. While this may seem like a straightforward job, especially with many migration guides available on the internet, the devil is really in the details. None of the guides that I found talks about the potentials gotchas in the migration process, which is why this article exists. In this article, I’ll focus on the pitfalls that can come back to bite you and omit some of the mundane details for brevity.
Atlassian has a multipage migration guide that you can refer to. The basic steps involved are:
Prepare the migration environment
The migration should be done on a case-sensitive file system to avoid corrupting the repository. If you’ve a Linux system available for the migration, skip the rest of this section. Atlassian has instructions for mounting a case-sensitive file system on Windows and Mac OS X (both file systems are case-insensitive), but it seemed more trouble than its worth, so I did the migration using a Debian Docker image I’d previously created for development purposes.
The image is published to my Docker Hub account, and comes with JDK 8, SVN and Git installed.Having created a container from the image, install
I no longer have the Debian image available publicly; you can use
asarkar/alpine-plusinstead and adapt the following commands for Alpine.
$ docker run -it abhijitsarkar/docker:debian-dev bash dev@b22ce7fa8250:~$ sudo apt-get update dev@b22ce7fa8250:~$ sudo apt-get install git-svn
Refer to the Docker official site if you need further details.
The login user is
dev, the password is the same.
sudoaccess to run as root.
The Alpine image runs as
If you’re not familiar with Docker, I recommend using a virtual machine for doing the migration. Oracle VirtualBox is free and so are most Linux distros, so take your pick.
Create an authors mapping file
This file maps the SVN usernames to Git committers. SVN stores only usernames while Git stores full names and emails. Thus, each entry in the mapping file will map a SVN username to a Git committer full name and email. Something like this:
username = Full Name <email>
How long this step will take depends on the format of the SVN username, the number of users, and the number of SVN repositories being migrated.
The authors file is simply used for converting the history and the users don’t actually need to exist in the Git system. Don’t be confused if there’re users that no longer work for your organization, keep those entries as usual.
The Atlassian guide assumes that the SVN repo you are extracting the author information from doesn’t require authentication, which almost always is false. Thus the command they’ve in the guide doesn’t work. Use the following command instead:
$ java -jar svn-migration-scripts.jar authors <svn-repo> \ <svn-username> <svn-password> > authors.txt
Clone the SVN repository
There are various important details that the Atlassian guide misses to mention regarding this phase.
Limiting the beginning and ending SVN revision:
tagscan be created inside any regular directory, you may have multiple directories in SVN with their own
tags. We did. SVN tracks all changes using a global unique revision number that is incremented for every commit. Thus if you make a commit on directory
dir1, it gets a unique revision number, and next time you commit something to
dir2, it gets a higher revision number.
tagsis probably going to become a Git repo. In order to avoid SVN beginning the history from the earliest available revision number, it’ll save you a lot of time, and unrelated history, if you just begin from the earliest available revision number for that particular SVN repo. You can find that number by using the following command:
$ svn log <svn-repo> | tailwhere
<svn-repo>is a URL to
Two other useful options are
--no-metadata. Refer to the
git-svnman page for details.
Thus, the command to convert a SVN repo with standard layout may be as follows:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \ --stdlayout --authors-file=authors.txt \ <svn-repo>/<project> <git-repo-name>
12345is the earliest available revision number corresponding to the repo.
Dealing with non-standard layout:
If the repo doesn’t follow “standard layout”, meaning it doesn’t have the usual
tagsor has them but not all in lower case, then you need to tailor the command to the custom layout. Assuming the trunk is named as
TRUNK, and that you don’t care about the branches and tags, the
clonecommand will become as follows:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \ --trunk=TRUNK/<project> --prefix='' -authors-file=authors.txt \ <svn-repo> <git-repo-name>
--prefix=''option. Without it, the Atlassian migration scripts fail to sync with SVN once the initial migration is done but the cutover isn’t. You can go crazy with the custom layout; refer to the
git-svnman page for other options.
Merging multiple SVN directories into one Git repo:
To do this, you’ll need to use a SVN URL that is one or more levels above the topmost directory that you want to merge, and use
--exclude-pathsfilters. Something like this:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \ --stdlayout -authors-file=authors.txt --include-paths="dir1|dir2" \ <svn-repo> <git-repo-name>
Above I’ve done an OR filter that’ll include both
dir2. You can use include/exclude with non-standard layout as well.
Clean the Git repository
This basically consists of converting the weird remote references that
git svnsets up to local branches and tags. The Atlassian migration scripts do a decent job at this but make some assumptions that may not work for you. First, to check what remote references have been set up, run the following command:
$ git branch -a
For me, I only wanted to migrate the
trunkand thus simply deleted the remote references without converting them to local branches by using the following command:
$ rm -Rf .git/refs/remotes/origin
git branch -ato verify that the references are indeed gone. If they still linger, then they exist in the
.git/packed-refsfile. Simply edit the file and delete the unwanted lines. You can read all about
Push to the Git server
This consists of 2 steps, adding a Git remote and pushing to it.
$ git remote add origin git@my-git-server:myrepository.git $ git push origin --all $ git push origin --tags
SVN sync until cutover
The Atlassian guide talks about a 2-way sync, meaning people can work in SVN and Git, and your local migration repo can be synchronized with both. Unless you’re paid by the hour, I don’t recommend it. The best case scenario is a big bang cutover; if that’s not possible, you can have people continue committing to SVN and sync your local with it. The Atlassian migration scripts have a
sync-rebaseoption that can do this.
For non-standard layout, The Atlassian migration scripts only work if you’d used the
--prefix=''option during cloning as I’d described earlier.