Friday, December 21, 2012

Unconventional to Git

I recently started working with a team that had an unconventional (to me, anyway) method of source control.  They created a Subversion repository on a file share and used a file:// URL with their tool of choice.  Now, I realize file-based Subversion repositories are fully supported, but as you'll come to see, they don't always behave as anticipated.

When first joining we had discussed the many different source control systems I had managed in the past.  There was some discussion about moving to something a bit more modern and distributed.  And hopefully something with less disastrous merging. Of course, I recommended mercurial or git with the latter quickly winning a decision.  All that was left was to set up a server, migrate their repositories and maintain history.

Git has a very useful command that allows you to easily migrate from subversion, aptly called 'svn'.

(Note: On my Ubuntu install, the default git-core did not have this command.  You have to install the package 'git-svn'.  If you're on Windows, msysGit should have it by default.)

The really cool thing about git-svn is that it's not only intended for migrating away from subversion.  It's more of a layer between the two.  For instance, say my team doesn't want to switch to git, but I can't stand going back to subversion?  I could just clone their svn repo with git and work from there.  The local git commands (committing, branching, diffing, etc.) work as usual.  The one difference is in syncing with subversion: instead of plain pull and push you use 'git svn fetch' or 'git svn rebase' to bring changes down and 'git svn dcommit' to send your commits back.

For our purposes, though, we merely want to use it as a means to an end: quickly setting up our new git repository!

(Another note: There is a ruby gem called 'svn2git' that attempts to make this process easier for you by calling git-svn in a script.  After learning how to use git svn, I really don't see how it's any easier.)

The simplest way to migrate a single svn repository to git would be to use this command:
git svn clone svn://foo-repo-url foo-git-dir

This will 'git init' into a directory (with some extra special sprinkles for svn), record your svn repo url in the new repository's config (as an svn-remote rather than a regular 'origin') and pull it down with history intact.

(Note: Notice I'm using svn:// and not file://? You're a freaking detective! In my special case, git svn would *not* accept a file:// url.  I had to run svnserve and point to the file path and then reference my local machine as the svn repository)

That's all fine and good, but subversion only tracks a user's name in its commit log!  git won't be able to match that up with our name/email combination!

Not so I say!  You merely need to point to a map for git to use as it goes through the process.

Create a file with the following format, listing all of the users with history to migrate (the addresses here are placeholders):

svnUserOne = John User <john.user@example.com>
svnUserTwo = Joe Doe <joe.doe@example.com>

Reference this file when running your git svn clone command with the --authors-file (or -A) argument.

git svn clone svn://repo my_dir --authors-file=my_user_mapping.txt
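If the list of committers is long, a starting template for that mapping file can be generated from the subversion log.  The function below is only a sketch: the name authors_template and the example.com domain are made up for illustration, and it assumes the usual 'svn log -q' output where each revision line looks like 'r12 | username | date'.  The real names and addresses still have to be filled in by hand afterwards.

```shell
# Reads `svn log -q` output on stdin and prints one mapping line per
# distinct committer, in the "svnUser = Name <email>" format git svn expects.
# The domain argument is a placeholder (hypothetical); edit the result by hand.
authors_template() {
  local domain="${1:-example.com}"
  # Revision lines look like: r12 | username | 2012-12-21 10:00:00 ...
  awk -F'|' '/^r[0-9]+ / { gsub(/^ +| +$/, "", $2); print $2 }' \
    | sort -u \
    | while IFS= read -r user; do
        printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
      done
}
```

Something like 'svn log -q svn://repo | authors_template example.com > my_user_mapping.txt' would then give you a file to clean up before running the clone.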

"What about my branches!?" you ask.  "Are trunk and branches and tags and all that going into my git repo?" you bellow.

Well, yes that's another thing you should be concerned about.  I wasn't, though, because we didn't have that sort of setup in the repository.  I told you it was unconventional.  I suppose I should have explained at the top of this article, I'll have to edit this out!

There were no branches.  There were two main repositories with more than ten projects in each.  I believe they were split between 'internal' and 'public-facing'. As such, the root of every repository was project directories.  I didn't have to worry about branches or tags, but if you do, don't worry. git svn has you covered for that as well.

Here's a quick list of the arguments you'll need:

--trunk=your trunk directory
--branches=your branches directory
--tags=your tags directory

If your layout is fairly standard (/trunk, /branches, /tags), you should be good to go using --stdlayout or -s to signify that fact.  If you break things up a bit (say, /code/branches, /code/trunk, /tags), you'll have to specify each path yourself.  Remember / signifies the root of your repository.
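To make the two cases concrete, here is a small sketch that prints the layout flags for either situation.  The helper name layout_flags is hypothetical, and the paths are given relative to the repository root without a leading slash, which is one of the forms git svn accepts for these options.

```shell
# Print the layout flags for `git svn clone`.
# No arguments: assume the standard /trunk, /branches, /tags layout.
# Three arguments: trunk, branches and tags paths relative to the repo root.
layout_flags() {
  if [ "$#" -eq 0 ]; then
    printf '%s\n' '--stdlayout'
  elif [ "$#" -eq 3 ]; then
    printf -- '--trunk=%s --branches=%s --tags=%s\n' "$1" "$2" "$3"
  else
    echo "usage: layout_flags [trunk branches tags]" >&2
    return 1
  fi
}

# Intended use (svn://server/repo is a placeholder, not run here):
#   git svn clone $(layout_flags) svn://server/repo repo_dir
#   git svn clone $(layout_flags code/trunk code/branches tags) svn://server/repo repo_dir
```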

I didn't need to pass any of those arguments since I didn't have to worry about branches or tags, but what about the fact that I have 20-odd projects, two subversion repositories and want to end up with a git repository per project?  Unfortunately, that goes beyond a simple git svn command.

The first thing I did was experiment.  Would it work if I gave the path directly to the project?

git svn clone svn://server/project project --authors-file=file.txt

Excellent! I now have my project directory filled with code only from that project!

Essentially, this means if I have a list of projects (or at least the names of their paths), a mapping file for the subversion users and a few lines of ruby I should be able to get this to work!

Of course, I'm on my machine and I need to put these git repos on a centralized server.

For that, I mapped the server directory holding our repositories locally.  I had to create a directory for each project and then 'git init' inside of each of them.  From there I could set the 'origin' for these local git repositories to that centralized server and then push the code to it.  With 20+ projects, that certainly complicates things, eh?
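The steps above can be sketched as a small bash loop.  This is not the ruby script used for the actual migration, just an illustration under assumptions: the function names, the svn://server url, the /mnt/git share path and the DRY_RUN switch are all hypothetical.  It also uses 'git init --bare' for the central copies, since a regular (non-bare) repository refuses pushes to its checked-out branch by default.

```shell
# Run a command, or only print it when DRY_RUN=1 (handy for a rehearsal).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Migrate one project directory of the svn repository into its own git
# repository, then publish it to a bare repository on the mapped share.
migrate_project() {
  local svn_root="$1" project="$2" authors="$3" central="$4"
  local bare="$central/$project.git"

  run git svn clone "$svn_root/$project" "$project" --authors-file="$authors"
  run git init --bare "$bare"
  run git -C "$project" remote add origin "$bare"
  run git -C "$project" push origin --all
}

# Usage: migrate_all <svn root url> <authors file> <central dir> <project>...
migrate_all() {
  local svn_root="$1" authors="$2" central="$3" project
  shift 3
  for project in "$@"; do
    migrate_project "$svn_root" "$project" "$authors" "$central"
  done
}
```

Running 'DRY_RUN=1 migrate_all svn://server file.txt /mnt/git projectA projectB' first prints the commands without touching anything, which is a cheap way to check the paths before doing it for real.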

I decided that maybe a script to sit in front of git-svn would be useful after all.  If only to manage all that extraneous stuff for which git-svn isn't intended.  My mad hacking resulted in a less than impressive ruby script, but I'll still attach it here for posterity.  From there, I had the idea that I could improve it and it might be useful for others in a similar situation.

My svn to git migration project is currently hosted on GitHub, with the horribly pun-tastic name of 'GetSvn', waiting for me to finally improve it. Is that enough links?

Original column-like ruby script, as promised

Saturday, December 1, 2012

Helpful bash aliases

I thought I'd share a few of the most helpful bash functions and aliases I've collected throughout the years I've been using Linux and cygwin (sometimes Windows is unavoidable!)

I should probably explain that these should be entered into a file that will typically be read and executed on login. Something like .bashrc or .bash_aliases is popular. You may have to configure .bash_aliases to be sourced from another dot file (.bashrc or .bash_profile).
Feel free to check my files on github to see how I do it (though it's definitely not 'correct' :D )

A lot of these were copied straight from other people's configs from forums or github. Some started as copied functions and were then edited to fit my particular usage. Some are my own.

First up, extract

# ex - archive extractor
# usage: ex <file>
ex ()
{
  # "$1" is quoted throughout so file names with spaces still work
  if [ -f "$1" ] ; then
    case "$1" in
      *.tar.bz2)   tar xjf "$1"   ;;
      *.tar.gz)    tar xzf "$1"   ;;
      *.bz2)       bunzip2 "$1"   ;;
      *.rar)       rar x "$1"     ;;
      *.gz)        gunzip "$1"    ;;
      *.tar)       tar xf "$1"    ;;
      *.tbz2)      tar xjf "$1"   ;;
      *.tgz)       tar xzf "$1"   ;;
      *.zip)       unzip "$1"     ;;
      *.Z)         uncompress "$1";;
      *.7z)        7z x "$1"      ;;
      *)           echo "'$1' cannot be extracted via ex()" ;;
    esac
  else
    echo "'$1' is not a valid file"
  fi
}

You can see it merely checks the file extension and calls the appropriate binary with the appropriate switches. Not so complicated.  One thing I would like to do in the future is hook into the bash completion scripts so a switch like this would not even be necessary. Next up, we go sort of inverse from extract with roll().

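# roll - archive wrapper
# usage: roll <archive> ./foo ./bar
roll ()
{
  local FILE="$1"
  # drop the archive name so "$@" holds only the files to pack
  shift
  case "$FILE" in
    *.tar.bz2) tar cjf "$FILE" "$@" ;;
    *.tar.gz)  tar czf "$FILE" "$@" ;;
    *.tgz)     tar czf "$FILE" "$@" ;;
    # -r so directories are added with their contents
    *.zip)     zip -r "$FILE" "$@" ;;
    # rar needs the 'a' (add) command to create an archive
    *.rar)     rar a "$FILE" "$@" ;;
    *)         echo "'$FILE' is not a supported archive type for roll()" ;;
  esac
}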

As you can see it's fairly simplistic just like ex, though adding the element of shift does raise the complexity a bit. Essentially we just need to detect the type of archive based on the chosen file name and return the binary and switches to create it. Again, I think I would rather look into the bash completion code to find them, but for now this works great.

ducks is a simple alias for finding the largest entries in the current directory. It replaces aliases and functions I've written several times before without ever getting them quite how I wanted. Luckily I happened across another person's bash_alias with this defined, and it's been better than anything I've done with this idea so far. I'm not giving up, though!



alias ducks='du -cksh * |sort -rn |head -11'

I'll summarize the man page, since the alias just invokes a few du switches:

-c prints a grand total for all of the listings
-k reports sizes in 1k blocks
-s summarizes each argument instead of listing every subdirectory, so du -s * gives roughly what du --max-depth=1 . does
-h makes the sizes human readable, so you'll see something like 296M instead of 303012

What I want to know: why isn't it called ducksh? :) I guess duchkmd=1 doesn't have the same ring to it.

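I've definitely noticed some issues with this approach. Things aren't quite sorted like you'd figure: sort -n only compares the leading number and ignores the K/M/G suffix that -h adds, so 900K lands above 2.0M. The head -11 also cuts the list off after the total plus ten entries. It might be nice to optionally pass that limit in yourself.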
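One possible way around both issues is to sort on the raw kilobyte counts first and only make them human readable afterwards, with the limit passed in as an argument. This is a rough sketch rather than a finished replacement: the name biggest is made up (a function can't reuse the name ducks while the alias is defined), it drops the grand total line, and it only formats up to gigabytes. On systems with GNU sort, sort -h is another option.

```shell
# biggest - list the largest entries in the current directory
# usage: biggest [limit]   (limit defaults to 10)
biggest ()
{
  local limit="${1:-10}"
  # Sort on plain kilobyte counts, then convert to K/M/G for display,
  # so the order is never thrown off by the unit suffix.
  du -sk -- * 2>/dev/null \
    | sort -rn \
    | head -n "$limit" \
    | awk -F'\t' '{
        size = $1; unit = "K"
        if (size >= 1048576)   { size /= 1048576; unit = "G" }
        else if (size >= 1024) { size /= 1024;    unit = "M" }
        printf "%.1f%s\t%s\n", size, unit, $2
      }'
}
```

So 'biggest 3' shows just the three largest entries, and plain 'biggest' behaves much like ducks without the total.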


Well, that's it for now. Check back for some more bash configuration fun as well as a few ruby scripts that might make CLI maintenance on your linux machines a little less painful!

* From what I can tell some of this code has been passed around so many times that its original authors are impossible to find. If you notice your own code here, or something obviously modified from your code, feel free to let me know and I'll give you credit and a link back!