Removing Large Files from TFS

13 Jun

I recently worked with a client to automate a process that rebuilds several databases from licensed data delivered to them weekly via FTP.  At the end of this process, a backup of each newly built database is checked into version control, which provides backup, history, and deployment (i.e., it makes the database conveniently available to other developers and to additional build automation).

They use Team Foundation Server (TFS) for version control.  We grew concerned that these check-ins, along with a number of others, would bloat the version control database over time, especially since TFS stores full copies of every revision of very large files.  I found a post by Eugene Zakhareyev describing how to identify large files, but I still wanted a good strategy for cleaning them up.

TFS provides a destroy command that permanently removes some or all revisions of a file from version control.  Obviously, though, we didn’t want to destroy these files entirely, only their older revisions, so I decided to automate the process.  I put together a Perl script that enumerates all the files under a specified folder in version control, identifies those larger than a user-configurable threshold, and emits tf.exe commands to destroy older revisions of those files.  The user can specify a maximum number of revisions to keep as well as a minimum timespan of history to keep; if the two conflict, the timespan takes precedence.
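The retention rule just described can be sketched as follows. This is a minimal Python illustration, not the Perl script itself; the `Revision` type, the `revisions_to_destroy` function, and its parameters are hypothetical names, and the sketch assumes the newest revision is always kept. A revision is marked for destruction only if it falls outside the newest `max_keep` revisions *and* is older than the `min_history` window, which is how the timespan overrides the revision count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass
class Revision:
    changeset: int   # TFS changeset number of this revision
    date: datetime   # check-in date of this revision


def revisions_to_destroy(revisions: Sequence[Revision], max_keep: int,
                         min_history: timedelta, now: datetime) -> List[Revision]:
    """Return the revisions that both retention rules allow us to destroy.

    Hypothetical helper: keeps the newest `max_keep` revisions (at least one),
    and additionally keeps anything newer than `now - min_history`, so the
    timespan wins whenever it protects more revisions than the count does.
    """
    newest_first = sorted(revisions, key=lambda r: r.changeset, reverse=True)
    cutoff = now - min_history
    keep_count = max(max_keep, 1)  # assumption: never destroy the latest revision

    doomed = []
    for index, rev in enumerate(newest_first):
        if index < keep_count:
            continue  # protected by the revision count
        if rev.date >= cutoff:
            continue  # protected by the minimum timespan of history
        doomed.append(rev)

    # Report oldest first, which is the natural order for destroy commands.
    return sorted(doomed, key=lambda r: r.changeset)
```

Turning the result into actual `tf destroy` commands is left out here; check `tf help destroy` for the options your TFS version supports before building command lines from it.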

Some readers may be asking why I used Perl for this.  As I saw it, there were three basic approaches: querying the TFS version control database directly, writing an application against the version control API, or scripting tf.exe.  Working directly against the database was a bad idea because its schema isn’t documented, and the risk of shooting myself in the foot seemed high.  Writing a compiled program felt heavyweight for this task, so I chose to script tf.exe, though in hindsight a compiled program probably would have been simpler.

The resulting script is available on GitHub.  It has been tested with Strawberry Perl against multiple TFS 2008 version control databases.  Along the way, I learned several unrelated but interesting things:

  • Perl feels much more mature than when I last used it five or six years ago.  I had to pull in a number of modules to complete this script, and all of them were included in the Strawberry Perl distribution; I didn’t have to go out to CPAN for anything, and every module worked exactly as documented on CPAN.
  • At first, I was shocked at the script’s poor performance, which was due to the many separate invocations of tf.exe.  After searching a bit, I discovered that tf.exe has an “@ mode” or “console mode” that lets you run many version control commands with a single invocation of tf.exe.  It appears to be undocumented; the only mention of it I found was in an MSDN forums thread.  After the fact, I discovered that tf.exe also supports response files, and I expect they would have worked equally well.
  • One complication involved dates: the console output of tf history prints dates according to Windows’ regional settings, which can vary from one system to the next.  To address this, I pull the Windows date format from the Registry and then munge that string a little so that I can parse it with the Time::Piece module.  This worked quite nicely, and the approach could prove useful for scripting other command-line utilities.
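To illustrate the batching idea from the “@ mode” bullet, here is a minimal Python sketch that builds one block of commands and sends it to a single tf.exe process. It assumes that `tf @` reads commands from standard input, one per line, without the leading `tf`; since the mode appears to be undocumented, verify that behavior against your own tf.exe before relying on it. `build_batch` and `run_batch` are hypothetical helper names, and the quoting only wraps arguments that contain spaces.

```python
import subprocess
from typing import Sequence


def build_batch(commands: Sequence[Sequence[str]]) -> str:
    """Join several tf subcommands into newline-separated text.

    Each command is a list of arguments, e.g. ["history", "$/Proj/db.bak"].
    Arguments containing spaces are wrapped in double quotes.
    """
    lines = []
    for args in commands:
        quoted = [f'"{arg}"' if " " in arg else arg for arg in args]
        lines.append(" ".join(quoted))
    return "".join(line + "\n" for line in lines)


def run_batch(commands: Sequence[Sequence[str]], tf_exe: str = "tf.exe") -> str:
    """Send every command to one tf.exe process and return its combined output.

    Assumption (undocumented behavior): "tf @" reads commands from stdin.
    """
    result = subprocess.run(
        [tf_exe, "@"],
        input=build_batch(commands),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

All results come back on one output stream, so a caller still has to split that stream per command; how tf.exe delimits results in this mode is not covered here.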
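The date-format trick in the last bullet carries over to other languages. The following is a minimal Python sketch, not the script’s Perl and Time::Piece code, although Time::Piece’s `strptime` accepts similar `%` directives. `windows_to_strptime` (a hypothetical helper) rewrites a Windows format pattern such as `M/d/yyyy` into a `strptime` format string; it assumes the pattern uses only the common day, month, year, hour, minute, second, and AM/PM tokens, with literal text in single quotes, and it does not handle era tokens like `gg`. `read_windows_formats` (also hypothetical) assumes the patterns live in the `sShortDate` and `sTimeFormat` values under `HKEY_CURRENT_USER\Control Panel\International`; which value matches tf history’s output depends on the output format you parse.

```python
import re
from datetime import datetime

# Windows date/time format tokens mapped to strptime directives.
_TOKEN_MAP = {
    "dddd": "%A", "ddd": "%a", "dd": "%d", "d": "%d",
    "MMMM": "%B", "MMM": "%b", "MM": "%m", "M": "%m",
    "yyyy": "%Y", "yy": "%y",
    "HH": "%H", "H": "%H", "hh": "%I", "h": "%I",
    "mm": "%M", "m": "%M", "ss": "%S", "s": "%S",
    "tt": "%p",
}

# Quoted literals first, then runs of a single format letter (case-sensitive).
_TOKEN_RE = re.compile(r"'[^']*'|d+|M+|y+|H+|h+|m+|s+|t+")


def windows_to_strptime(pattern: str) -> str:
    """Convert a Windows format pattern (e.g. 'M/d/yyyy') to strptime syntax."""
    out = []
    pos = 0
    for match in _TOKEN_RE.finditer(pattern):
        # Copy separators such as '/', '.', ':' and spaces, escaping any '%'.
        out.append(pattern[pos:match.start()].replace("%", "%%"))
        token = match.group()
        if token.startswith("'"):
            out.append(token[1:-1].replace("%", "%%"))  # quoted literal text
        elif token in _TOKEN_MAP:
            out.append(_TOKEN_MAP[token])
        else:
            raise ValueError(f"unsupported format token: {token!r}")
        pos = match.end()
    out.append(pattern[pos:].replace("%", "%%"))
    return "".join(out)


def read_windows_formats() -> tuple:
    """Read the user's short date and time patterns (Windows only)."""
    import winreg  # standard library, available only on Windows
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                        r"Control Panel\International") as key:
        short_date, _ = winreg.QueryValueEx(key, "sShortDate")
        time_format, _ = winreg.QueryValueEx(key, "sTimeFormat")
    return short_date, time_format
```

For example, joining the converted `M/d/yyyy` and `h:mm:ss tt` patterns with a space gives a format that `datetime.strptime` can apply to a timestamp like `6/13/2008 3:05:09 PM`.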

If you have been looking for a way to clean large files out of your TFS database and have stumbled upon this post, I encourage you to give the script a try.  It’s safe to run: it only emits the tf destroy commands you’ll want to run and leaves it to you to actually execute them.