Permanently delete a file from all revisions of a Subversion repository

Tested on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Lucid, Maverick, Natty, Precise, Trusty)

Objective

Permanently delete a file from all revisions of a Subversion repository

Scenario

Suppose that /var/lib/svn/foo is a publicly-accessible Subversion repository containing a file called /trunk/hello_world.c. A litigious Utah-based software company has claimed that this file infringes their copyright, and for that reason you wish to completely remove it from the repository.

Many hundreds of revisions have been committed since the file was originally added, so rolling back to a point before it existed is not an option.

Method

Overview

Retroactive file deletion is a highly invasive procedure which involves rebuilding the repository. You will require administrative access to it, and you should inform other users of what has been done. In principle it ought to be safe, but there is significant potential for data loss if you make a mistake.

The method described here has five steps:

  1. Identify any copies of the unwanted content.
  2. Dump the repository to a file.
  3. Filter the dumpfile to exclude the unwanted content.
  4. Load the filtered dumpfile into a new, empty repository.
  5. Replace the old repository with the new repository.

Do not use this procedure if you merely wish to delete a file or directory from the head revision. This can be done without administrative access using the svn rm command.

Identify any copies of the unwanted content

When Subversion copies or moves a file it does not duplicate the content: instead a link is created that refers to the existing copy. This link would be broken if you excluded the original without excluding the copy.

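You need to identify every path through which the unwanted content can be reached, including copies of the file itself and copies of any directory that contained it.

Although this may seem straightforward enough, the effects are quite subtle, as the log entries shown later in this section illustrate.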

If you are uncertain as to what copies might exist then you can find out by searching the log. The verbose option (-v) provides information about where files have been copied from:

svn log -v file:///var/lib/svn/foo

For example, if trunk/hello_world.c had been copied to trunk/hello_again_world.c at revision 137 then the log would contain an entry similar to:

------------------------------------------------------------------------
r137 | user | 2011-03-01 09:00:00 +0000 (Tue, 01 Mar 2011) | 1 line
Changed paths:
   A /trunk/hello_again_world.c (from /trunk/hello_world.c:125)

Began work on the 'Again' variant of 'Hello World'.
------------------------------------------------------------------------

but the following pair of revisions would be equally problematic despite making no explicit reference to trunk/hello_world.c:

------------------------------------------------------------------------
r138 | user | 2011-03-01 10:00:00 +0000 (Tue, 01 Mar 2011) | 1 line
Changed paths:
   A /branches/test (from /trunk:125)

Created test branch.
------------------------------------------------------------------------
r139 | user | 2011-03-01 11:00:00 +0000 (Tue, 01 Mar 2011) | 1 line
Changed paths:
   M /branches/test/hello_world.c

Experimental modification to 'Hello World'.
------------------------------------------------------------------------
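On a long log it can help to reduce the output to just the copy operations that might carry the unwanted content. The following is a minimal sketch, not part of Subversion: the function name find_copy_sources is hypothetical, and it assumes the default svn log -v layout shown above and paths without spaces. It reports every added or replaced path whose copy source is the target file or one of its parent directories.

```shell
# find_copy_sources TARGET
# Reads `svn log -v` output on stdin and prints "rNNN DEST (from SRC:REV)"
# for every path copied from TARGET itself or from a directory containing it.
# Hypothetical helper; assumes paths contain no spaces.
find_copy_sources() {
    awk -v target="$1" '
        # Remember the revision of the log entry currently being read.
        /^r[0-9]+ \|/ { rev = $1 }
        # Copy lines look like:   A /dest (from /src:REV)
        $1 ~ /^[AR]$/ && $3 == "(from" {
            src = $4
            sub(/:[0-9]+\)$/, "", src)      # strip the ":REV)" suffix
            # Report if src is the target or an ancestor directory of it.
            if (src == target || index(target, src "/") == 1)
                print rev, $2, "(from " $4
        }
    '
}
```

It could be used as svn log -v file:///var/lib/svn/foo | find_copy_sources /trunk/hello_world.c. The sketch does not follow chains of copies, so it would need to be run again for each copy it reports, and it does not check whether the file existed in the copied revision, so some of the results may be false alarms.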

Another technique that may prove useful is to use the svnlook tree command to search for files that end with the same leafname as the original, for example:

svnlook tree --full-paths /var/lib/svn/foo | grep '/hello_world\.c$'

Results located within tags are unlikely to require explicit deletion, but those in branches or any other location would merit further investigation.
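If the search returns many results, the ones under tags can be filtered out so that only those meriting investigation remain. This is a rough sketch under stated assumptions: the function name candidates_outside_tags is hypothetical, and it assumes the conventional layout in which tags live in a directory literally named tags.

```shell
# candidates_outside_tags LEAFNAME
# Reads `svnlook tree --full-paths` output on stdin and prints each path whose
# final component is exactly LEAFNAME, skipping anything beneath a tags/ directory.
# Hypothetical helper; assumes the conventional trunk/branches/tags layout.
candidates_outside_tags() {
    awk -v leaf="$1" '
        {
            path = $0
            n = split(path, parts, "/")     # directories end in "/", so their last part is empty
            if (parts[n] == leaf && path !~ /^tags\// && path !~ /\/tags\//)
                print path
        }
    '
}
```

For example: svnlook tree --full-paths /var/lib/svn/foo | candidates_outside_tags hello_world.c. Unlike the grep above, this compares the leafname literally and also matches a file in the repository root. Like svnlook tree itself, it only sees the youngest revision.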

Dump the repository to a file

Use the svnadmin dump command to create a dumpfile:

cd /var/lib/svn
svnadmin dump foo > foo.dump

Filter the dumpfile to exclude the unwanted content

Use the svndumpfilter command to create a filtered copy of the dumpfile from which the unwanted file or files have been removed:

svndumpfilter exclude trunk/hello_world.c < foo.dump > foo.filtered.dump

Some revisions may be left empty by this operation. This might be considered untidy, but it is not actually harmful. You can delete the empty revisions and renumber the remainder if you so wish:

svndumpfilter exclude --drop-empty-revs --renumber-revs trunk/hello_world.c < foo.dump > foo.filtered.dump

however this will invalidate any working copies that refer to the renumbered revisions.
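If the search for copies turned up additional paths, they can all be excluded in a single pass, because svndumpfilter exclude accepts more than one path prefix. A minimal sketch follows. The wrapper name filter_dump is hypothetical, and the extra paths in the usage line are only the illustrative ones from the log excerpts earlier.

```shell
# filter_dump SRC DST PATH...
# Writes to DST a copy of dumpfile SRC with every listed path prefix excluded.
# Hypothetical wrapper around `svndumpfilter exclude`.
filter_dump() {
    src=$1 dst=$2
    shift 2
    # Refuse to run with no paths, which would only copy the dumpfile unchanged.
    [ "$#" -gt 0 ] || { echo "filter_dump: no paths given" >&2; return 2; }
    svndumpfilter exclude "$@" < "$src" > "$dst"
}
```

A possible invocation would be filter_dump foo.dump foo.filtered.dump trunk/hello_world.c trunk/hello_again_world.c branches/test/hello_world.c. Keeping the original foo.dump untouched means the filter can be rerun if a further copy is discovered later.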

Load the filtered dumpfile into a new, empty repository

Use the svnadmin create command to create a new, empty repository, then use the svnadmin load command to load it with the content of the filtered dumpfile:

svnadmin create foo.new
svnadmin load foo.new < foo.filtered.dump
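Before going any further it may be worth checking that the load produced a sound repository. The sketch below is one possible check, with a hypothetical function name (check_loaded_repo). It runs svnadmin verify on the new repository and compares the youngest revision number of the old and new repositories using svnlook youngest. The comparison assumes the dumpfile was filtered without --drop-empty-revs or --renumber-revs, since otherwise the numbers are expected to differ.

```shell
# check_loaded_repo OLD NEW
# Returns 0 if NEW passes `svnadmin verify` and has the same youngest revision
# as OLD. Hypothetical helper; only meaningful when no revisions were dropped
# or renumbered during filtering.
check_loaded_repo() {
    old=$1 new=$2
    svnadmin verify "$new" || return 1
    old_rev=$(svnlook youngest "$old") || return 1
    new_rev=$(svnlook youngest "$new") || return 1
    if [ "$old_rev" -ne "$new_rev" ]; then
        echo "revision mismatch: $old is at r$old_rev, $new is at r$new_rev" >&2
        return 1
    fi
}
```

Run from /var/lib/svn this would be check_loaded_repo foo foo.new. A zero exit status only shows that the repository is structurally intact and complete. It does not prove that the unwanted content has gone, which is better confirmed by repeating the log and svnlook tree searches against foo.new.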

If the repository needs to be owned by a different account from the one you are using then you should (recursively) change its ownership now. For example, if it is served using Apache then it may need to be owned by www-data:

chown -R www-data:www-data foo.new

Replace the old repository with the new repository

The new repository should now be ready for use, so it can be moved into place:

mv foo foo.old
mv foo.new foo
