Subversion to Git Migration
Recently, I migrated about 20 projects from SVN to Git. While this may seem like a straightforward job, especially with many migration guides available on the internet, the devil is really in the details. None of the guides that I found talk about the potential gotchas in the migration process, which is why this article exists. In this article, I’ll focus on the pitfalls that can come back to bite you and omit some of the mundane details for brevity.
Atlassian has a multipage migration guide that you can refer to. The basic steps involved are:
-
Prepare the migration environment
The migration should be done on a case-sensitive file system to avoid corrupting the repository. If you have a Linux system available for the migration, skip the rest of this section. Atlassian has instructions for mounting a case-sensitive file system on Windows and Mac OS X (both file systems are case-insensitive by default), but it seemed more trouble than it’s worth, so I did the migration using a Debian Docker image I’d previously created for development purposes.
The image is published to my Docker Hub account, and comes with JDK 8, SVN and Git installed. Having created a container from the image, install git-svn.
I no longer have the Debian image available publicly; you can use asarkar/alpine-plus instead and adapt the following commands for Alpine.
$ docker run -it abhijitsarkar/docker:debian-dev bash
dev@b22ce7fa8250:~$ sudo apt-get update
dev@b22ce7fa8250:~$ sudo apt-get install git-svn
Refer to the Docker official site if you need further details.
The login user is dev, the password is the same. dev has sudo access to run as root. The Alpine image runs as root.
If you’re not familiar with Docker, I recommend using a virtual machine for doing the migration. Oracle VirtualBox is free and so are most Linux distros, so take your pick.
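If you’re unsure whether a given directory sits on a case-sensitive file system, a quick probe can tell you. The following is a minimal sketch and not something from the Atlassian guide; the function name is_case_sensitive_dir is my own placeholder. It creates a mixed-case file and checks whether the lower-case spelling resolves to the same file.

```shell
# Hypothetical helper: returns 0 if the directory is case-sensitive,
# 1 if it is case-insensitive, 2 if the probe file could not be created.
is_case_sensitive_dir() {
  dir="${1:-.}"
  probe="$dir/CaseProbe.$$"
  lower="$dir/caseprobe.$$"
  # Create the mixed-case probe file.
  : > "$probe" || return 2
  # On a case-insensitive file system the lower-case name finds the same file.
  if [ -e "$lower" ]; then result=1; else result=0; fi
  rm -f "$probe"
  return "$result"
}

# Usage:
#   is_case_sensitive_dir /path/to/workdir && echo "safe to migrate here"
```

Run it against the directory where you plan to do the clone, not just your home directory, since the two may live on different volumes.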
-
Create an authors mapping file
This file maps the SVN usernames to Git committers. SVN stores only usernames while Git stores full names and emails. Thus, each entry in the mapping file will map a SVN username to a Git committer full name and email. Something like this:
username = Full Name <email>
How long this step will take depends on the format of the SVN username, the number of users, and the number of SVN repositories being migrated.
The authors file is simply used for converting the history and the users don’t actually need to exist in the Git system. Don’t be confused if there are users who no longer work for your organization; keep those entries as usual.
The Atlassian guide assumes that the SVN repo you are extracting the author information from doesn’t require authentication, which is almost always false. Thus the command they have in the guide doesn’t work. Use the following command instead:
$ java -jar svn-migration-scripts.jar authors <svn-repo> \
    <svn-username> <svn-password> > authors.txt
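If you’d rather not depend on the jar, a rough starting point can be put together from svn log output. This is a sketch under assumptions: the helper name svn_log_to_authors and the example.com domain are placeholders of mine, and the generated full names and emails are just the username repeated, so you still need to edit the result by hand.

```shell
# Hypothetical helper: reads `svn log --quiet` output on stdin and prints
# one "username = username <username@domain>" line per distinct author.
svn_log_to_authors() {
  domain="${1:-example.com}"   # placeholder domain, replace with your own
  # Revision lines look like: r123 | alice | 2015-01-02 10:00:00 +0000 (...)
  awk -F'|' '/^r[0-9]+ [|]/ { gsub(/^ +| +$/, "", $2); print $2 }' |
    sort -u |
    while IFS= read -r user; do
      printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
    done
}

# Usage (needs an SVN client and access to the repo):
#   svn log --quiet --username <svn-username> <svn-repo> \
#       | svn_log_to_authors mycompany.com > authors.txt
```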
-
Clone the SVN repository
There are various important details that the Atlassian guide fails to mention regarding this phase.
-
Limiting the beginning and ending SVN revision:
Because SVN trunk, branches and tags can be created inside any regular directory, you may have multiple directories in SVN with their own trunk, branches and tags. We did. SVN tracks all changes using a globally unique revision number that is incremented for every commit. Thus if you make a commit on directory dir1, it gets a unique revision number, and next time you commit something to dir2, it gets a higher revision number.
During conversion, dir1 with its trunk, branches and tags is probably going to become a Git repo. To avoid starting the history from the earliest revision of the whole SVN repository, begin from the earliest revision available for that particular directory; this saves a lot of time and keeps out unrelated history. You can find that number by using the following command:
$ svn log <svn-repo> | tail
where <svn-repo> is a URL to dir1.
Two other useful options are --no-minimize-url and --no-metadata. Refer to the git-svn man page for details.
Thus, the command to convert a SVN repo with standard layout may be as follows:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
    --stdlayout --authors-file=authors.txt \
    <svn-repo>/<project> <git-repo-name>
where 12345 is the earliest available revision number corresponding to the repo.
-
Dealing with non-standard layout:
If the repo doesn’t follow “standard layout”, meaning it doesn’t have the usual trunk, branches and tags or has them but not all in lower case, then you need to tailor the command to the custom layout. Assuming the trunk is named TRUNK, and that you don’t care about the branches and tags, the clone command becomes:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
    --trunk=TRUNK/<project> --prefix='' --authors-file=authors.txt \
    <svn-repo> <git-repo-name>
Note the --prefix='' option. Without it, the Atlassian migration scripts fail to sync with SVN once the initial migration is done but the cutover isn’t. You can go crazy with the custom layout; refer to the git-svn man page for other options.
-
Merging multiple SVN directories into one Git repo:
To do this, you’ll need to use a SVN URL that is one or more levels above the topmost directory that you want to merge, and use --include-paths and/or --exclude-paths filters. Something like this:
$ git svn clone -r 12345:HEAD --username <svn-username> --no-minimize-url \
    --stdlayout --authors-file=authors.txt --include-paths="dir1|dir2" \
    <svn-repo> <git-repo-name>
Above I’ve done an OR filter that’ll include both dir1 and dir2. You can use include/exclude with non-standard layout as well.
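As an alternative to eyeballing the tail of the log when looking for the starting revision, you can ask svn for the log in ascending order and keep only the first entry. The following is a sketch, not what I actually ran during the migration; the helper name first_revision is hypothetical, and it only parses log text, so it assumes the usual "rNNN | user | date" line format.

```shell
# Hypothetical helper: prints the number of the first revision line
# found in `svn log` output read from stdin.
first_revision() {
  awk -F'|' '/^r[0-9]+ / { rev = $1; gsub(/[^0-9]/, "", rev); print rev; exit }'
}

# Usage (needs an SVN client and access to the repo):
#   svn log -r 1:HEAD --limit 1 <svn-repo> | first_revision
# where <svn-repo> is the URL of the directory being migrated. The printed
# number can then be used as the start of the -r range in git svn clone.
```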
-
Clean the Git repository
This basically consists of converting the weird remote references that git svn sets up to local branches and tags. The Atlassian migration scripts do a decent job at this but make some assumptions that may not work for you. First, to check what remote references have been set up, run the following command:
$ git branch -a
I only wanted to migrate the trunk and thus simply deleted the remote references without converting them to local branches by using the following command:
$ rm -Rf .git/refs/remotes/origin
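A gentler alternative to removing the directory is to let Git delete each reference itself, since git update-ref -d removes a reference whether it is stored as a loose file or in packed-refs. This is a minimal sketch rather than the approach I took; the helper name delete_remote_refs is hypothetical, and the default prefix assumes the references live under refs/remotes/origin.

```shell
# Hypothetical helper: deletes every reference under the given prefix
# in the repository of the current working directory.
delete_remote_refs() {
  prefix="${1:-refs/remotes/origin}"   # assumed location of git-svn refs
  git for-each-ref --format='%(refname)' "$prefix" |
    while IFS= read -r ref; do
      # Works for both loose and packed references.
      git update-ref -d "$ref"
    done
}

# Usage, from inside the migrated repository:
#   delete_remote_refs
```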
Run git branch -a to verify that the references are indeed gone. If they still linger, then they exist in the .git/packed-refs file. Simply edit the file and delete the unwanted lines. You can read all about packed-refs here.
-
Push to the Git server
This consists of 2 steps, adding a Git remote and pushing to it.
$ git remote add origin git@my-git-server:myrepository.git
$ git push origin --all
$ git push origin --tags
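Before pointing at the real server, it can be worth rehearsing the push against a throwaway bare repository on disk and inspecting what arrives. A minimal sketch, assuming a hypothetical helper named push_migrated_repo and a scratch path of your choosing:

```shell
# Hypothetical helper: adds the given URL or path as the origin remote,
# then pushes all local branches followed by all tags.
push_migrated_repo() {
  remote_url="$1"
  git remote add origin "$remote_url" &&
    git push origin --all &&
    git push origin --tags
}

# Rehearsal, from inside the migrated repository (paths are placeholders):
#   git init --bare /tmp/rehearsal.git
#   push_migrated_repo /tmp/rehearsal.git
#   git --git-dir=/tmp/rehearsal.git for-each-ref
```

Once the rehearsal looks right, remove the remote with git remote remove origin and repeat the same steps with the real server URL.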
-
SVN sync until cutover
The Atlassian guide talks about a 2-way sync, meaning people can work in SVN and Git, and your local migration repo can be synchronized with both. Unless you’re paid by the hour, I don’t recommend it. The best case scenario is a big bang cutover; if that’s not possible, you can have people continue committing to SVN and sync your local repo with it. The Atlassian migration scripts have a sync-rebase option that can do this.
For non-standard layout, the Atlassian migration scripts only work if you used the --prefix='' option during cloning, as described earlier.