Act on all files in a directory tree using find and xargs

Tested on

Debian (Etch, Lenny, Squeeze)
Fedora (14)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)

Objective

To perform a given action on all files within a given directory tree.

Scenario

The path ~/foo/ is the root of a working copy of a Subversion repository. You wish to calculate the MD5 checksum of every file within that working copy, except for those created for internal use by Subversion. The latter can be identified by the fact that they are stored within subdirectories called .svn.

Method

List the files to be acted upon using find, then remove any pathnames that contain the string /.svn/, then pipe the result into xargs:

find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 -n 1 md5sum

The first argument to find is the root of the directory tree to be searched. The option -type f specifies that only regular files are to be matched (as opposed to, for example, directories). By default find will match any type of object that it finds in the given directory tree, but this is rarely the desired behaviour when passing the pathnames to a command.

The -v option to grep specifies that it should search for lines that do not match the pattern (as opposed to its normal behaviour where it searches for lines that match).
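The filtering step can be tried in isolation. The following is a minimal sketch that assumes GNU grep; the three pathnames are made up for illustration, and tr is used only to make the null-terminated output readable on a terminal:

```shell
# Feed three null-terminated pathnames to grep and keep only those
# that do not contain the string /.svn/ (hypothetical pathnames).
kept=$(printf '%s\0' 'foo/a.txt' 'foo/.svn/entries' 'foo/sub/b.txt' \
  | grep -zZv "/[.]svn/" \
  | tr '\0' '\n')

# Print the surviving pathnames, one per line.
printf '%s\n' "$kept"
```

The pathname inside the .svn directory should be absent from the output, leaving the other two.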

The -n 1 option to xargs specifies that the given command should be run separately for each pathname. This is not necessary in the example above because md5sum accepts multiple pathnames on a single command line. It would therefore have been equally valid (and significantly more efficient, since far fewer processes are started) to invoke xargs without the -n option:

find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 md5sum
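To try the pipeline without touching a real working copy, it can be run against a throwaway directory tree. This sketch assumes GNU find, grep, xargs and md5sum; the directory layout and file names are invented for the test, with a temporary directory standing in for ~/foo:

```shell
# Build a small tree that imitates a Subversion working copy.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo/.svn" "$tmp/foo/src"
printf 'hello\n' > "$tmp/foo/src/a file.txt"    # ordinary file, name contains a space
printf 'internal\n' > "$tmp/foo/.svn/entries"   # stands in for Subversion's own data

# Same pipeline as above, pointed at the temporary tree.
sums=$(find "$tmp/foo" -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 md5sum)
printf '%s\n' "$sums"

# Remove the temporary tree again.
rm -rf "$tmp"
```

Only the file under src should be listed, and its name should survive intact despite the embedded space.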

The same is true for many other commands, but it is not true in all cases. The safe option is therefore to include -n 1 unless you have positively established that it is not needed.
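The effect of -n 1 can be seen by substituting echo for the real command. This is an illustrative sketch only; the arguments a, b and c and the run: label are placeholders:

```shell
# Without -n, xargs passes as many arguments as will fit to one invocation.
batched=$(printf '%s\0' a b c | xargs -0 echo run:)

# With -n 1, the command is invoked once per argument.
single=$(printf '%s\0' a b c | xargs -0 -n 1 echo run:)

printf '%s\n' "$batched"   # a single line: run: a b c
printf '%s\n' "$single"    # three lines, one per argument
```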

By default, find produces a newline-separated list of pathnames, and xargs treats any whitespace as a separator (it also gives special meaning to quotation marks and backslashes). This works provided that none of the pathnames contains whitespace or any of those characters; if one does, it is split or mangled and the command is run against the wrong names. The -print0 option to find, the -z and -Z options to grep, and the -0 option to xargs avoid this problem by using a null-terminated list instead. These options originated as GNU extensions and may be missing from other implementations, but there is no completely safe alternative.
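The difference is easy to demonstrate by counting how many arguments xargs sees. This sketch assumes GNU find and xargs, and that the temporary directory returned by mktemp has no whitespace in its own path; the file names are invented:

```shell
# Create two files, one of which has a space in its name.
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/two words.txt"

# Newline-separated list: xargs splits "two words.txt" at the space.
unsafe=$(find "$tmp" -type f | xargs -n 1 echo | wc -l)

# Null-terminated list: each pathname arrives as a single argument.
safe=$(find "$tmp" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "newline-separated: $unsafe arguments; null-terminated: $safe arguments"
rm -rf "$tmp"
```

With two files present, the null-terminated form should report two arguments, whereas the default form reports one more because the second name is split in two.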

Testing

Commands of the type described here can damage a large number of files in a very short space of time if you make a mistake. You may therefore wish to perform a dry run first, to inspect the commands that would be executed without actually executing them. This can be done by inserting an echo command into the command line passed to xargs:

find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 echo md5sum

This technique will not work if the command contains any redirections, but you can work around that limitation by temporarily replacing the redirection token with some other string that has no special meaning to the shell.

(Note that spaces and other special characters in pathnames are not escaped in the output of the dry run, so the commands as listed cannot necessarily be pasted back into a shell unchanged.)
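The lack of escaping is visible when one of the pathnames contains a space. In this sketch the two pathnames are hypothetical and are supplied by printf rather than find:

```shell
# Dry run: echo prints the command line that xargs would have executed.
dry=$(printf '%s\0' 'foo/a.txt' 'foo/two words.txt' | xargs -0 echo md5sum)

# The output reads as three arguments, although md5sum would receive two.
printf '%s\n' "$dry"
```

When reading dry-run output, bear in mind that what looks like two separate arguments may in fact be one pathname.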

Troubleshooting

An effective strategy for troubleshooting is to first check that the correct pathnames are being selected, then that the correct commands are being executed.

To check which pathnames are selected, execute the find command without xargs or any filtering. While doing this you may want to revert to using newlines as terminators:

find ~/foo -type f

You can do this again as each stage of filtering is added to the pipeline:

find ~/foo -type f | grep -v "/[.]svn/"

If you are using the -name option to find, an easy mistake to make here is not putting the pattern in quotation marks. If you don't, and there are one or more files in the current working directory that match the pattern, then the pattern will be expanded by the shell before find sees it. If the pattern expands to exactly one pathname then the find command will appear to work, but will not necessarily have acted upon all the files it should have done.
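The quoting problem can be reproduced in a scratch directory. This is a sketch under the assumption that exactly one matching file exists in the current working directory; the names a.txt, sub and b.txt are invented:

```shell
# One matching file in the top directory, another in a subdirectory.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.txt" "$tmp/sub/b.txt"

# Quoted: find receives the pattern *.txt and matches both files.
quoted=$(cd "$tmp" && find . -type f -name '*.txt' | wc -l)

# Unquoted: the shell expands *.txt to a.txt first, so find only looks for that name.
unquoted=$(cd "$tmp" && find . -type f -name *.txt | wc -l)

echo "quoted: $quoted matches; unquoted: $unquoted matches"
rm -rf "$tmp"
```

The unquoted form runs without any error message, which is what makes this mistake easy to overlook.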

Note that long options to find begin with a single hyphen (not the double hyphen required for most other commands).

If the pathnames are correct then the error must lie in the commands that are executed. These can be inspected using the dry-run technique described above.

Tags: shell