Act on all files in a directory tree using find and xargs
Tested on
Debian (Etch, Lenny, Squeeze)
Fedora (14)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)
Objective
To perform a given action on all files within a given directory tree.
Scenario
The path ~/foo/ is the root of a working copy of a Subversion repository. You wish to calculate the MD5sum of all files within that working copy, except for those created for internal use by Subversion. The latter can be identified by the fact that they are stored within subdirectories called .svn.
Method
List the files to be acted upon using find, then remove any pathnames that contain the string /.svn/, then pipe the result into xargs:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 -n 1 md5sum
The first argument to find is the root of the directory tree to be searched. The option -type f specifies that only regular files are to be matched (as opposed to, for example, directories). By default find will match any type of object that it finds in the given directory tree, but this is rarely the desired behaviour when passing the pathnames to a command.
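The effect of -type f can be seen on a small throwaway tree. The following is a minimal sketch: the directory layout and file names are invented for illustration, and the tree is created under a temporary directory so that nothing real is touched.

```shell
# Build a throwaway tree (hypothetical names) under a temporary directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo/sub/.svn"
touch "$tmp/foo/a.txt" "$tmp/foo/sub/b.txt" "$tmp/foo/sub/.svn/entries"

# Without -type f, find lists directories as well as regular files:
# foo, foo/sub and foo/sub/.svn plus the three files.
all_count=$(find "$tmp/foo" | wc -l)

# With -type f, only the three regular files are listed.
file_count=$(find "$tmp/foo" -type f | wc -l)

echo "all objects: $all_count, regular files: $file_count"

rm -rf "$tmp"
```

Passing the unfiltered list to a command such as md5sum would make it complain about each directory, which is why the filter is normally wanted.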
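The -v option to grep specifies that it should search for lines that do not match the pattern (as opposed to its normal behaviour where it searches for lines that match). The dot in the pattern is written as [.] so that it matches a literal dot rather than any character.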
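As an illustration of the filtering step, the sketch below runs the newline-separated form of the pipeline on a throwaway tree. The directory asvn is a hypothetical name chosen to show why the dot is bracketed: with an unbracketed dot the pattern would also match /asvn/.

```shell
# Throwaway tree with a real .svn directory and a look-alike called asvn.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo/.svn" "$tmp/foo/asvn"
touch "$tmp/foo/a.txt" "$tmp/foo/.svn/entries" "$tmp/foo/asvn/c.txt"

# -v keeps only the pathnames that do NOT match; [.] matches a literal dot,
# so only the file under .svn is dropped.
kept=$(find "$tmp/foo" -type f | grep -v "/[.]svn/" | sort)
printf '%s\n' "$kept"

# With a bare dot the pattern matches any character in that position,
# so the file under asvn is discarded as well and only a.txt survives.
greedy_count=$(find "$tmp/foo" -type f | grep -v "/.svn/" | wc -l)
echo "pathnames kept with an unbracketed dot: $greedy_count"

rm -rf "$tmp"
```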
The -n 1 option to xargs specifies that the given command should be run separately for each pathname. This is not necessary in the example above because md5sum allows multiple pathnames to be passed on the same line. For this reason it would have been equally valid (and significantly more efficient) to invoke xargs without the -n option:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 md5sum
The same is true for many other commands, but it is not true in all cases. The safe option is therefore to include -n 1 unless you have positively established that it is not needed.
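The difference in the number of invocations can be observed directly. In this sketch a small sh -c command stands in for the real command and reports how many arguments it received each time it is run; the three input names a, b and c are placeholders and do not need to exist.

```shell
# With -n 1, xargs runs the command once per pathname: three invocations,
# each receiving a single argument.
per_file=$(printf 'a\0b\0c\0' \
  | xargs -0 -n 1 sh -c 'echo "run with $# argument(s)"' sh | wc -l)

# Without -n, xargs packs as many pathnames as will fit onto one command
# line: a single invocation receiving all three arguments.
batched=$(printf 'a\0b\0c\0' \
  | xargs -0 sh -c 'echo "run with $# argument(s)"' sh | wc -l)

echo "invocations with -n 1: $per_file, without: $batched"
```

For a large tree the batched form starts far fewer processes, which is where the efficiency gain comes from.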
By default, find produces a newline-separated list of filenames and xargs accepts any form of whitespace as a separator. This works well provided that none of the filenames contains whitespace, but fails badly as soon as one does. The -print0 option to find, the -z and -Z options to grep, and the -0 option to xargs circumvent this issue by using a null-separated list instead. These options are non-standard GNU extensions, but there is no completely safe alternative.
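The failure mode is easy to reproduce. The sketch below creates a single file whose name contains a space (a hypothetical name, in a temporary directory) and counts how many arguments the command receives in each case. It assumes the temporary directory path itself contains no whitespace.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/foo"
touch "$tmp/foo/two words.txt"

# Newline-separated list: xargs splits the pathname at the space, so the
# command receives two bogus arguments instead of one pathname.
unsafe=$(find "$tmp/foo" -type f | xargs sh -c 'echo "$#"' sh)

# Null-separated list: the pathname arrives intact as a single argument.
safe=$(find "$tmp/foo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "arguments seen without -print0/-0: $unsafe, with: $safe"

rm -rf "$tmp"
```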
Testing
Commands of the type described here can damage a large number of files in a very short space of time if you make a mistake. You may therefore wish to perform a dry run first, to inspect the commands that would be executed without actually executing them. This can be done by inserting an echo command into the command line passed to xargs:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 echo md5sum
This technique will not work if the command contains any redirections, but you can work around that limitation by temporarily replacing the redirection token with some other string that has no special meaning to the shell.
(Note that spaces and other special characters will not be escaped in the commands as listed.)
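A dry run on a throwaway tree shows both the technique and the caveat about escaping. This is a sketch with invented file names; sort -z (another GNU extension) is added only to make the order of the output predictable.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/foo"
touch "$tmp/foo/a.txt" "$tmp/foo/b c.txt"

# echo prints the command line that xargs would have run; md5sum itself
# is never executed.
dry_run=$(find "$tmp/foo" -type f -print0 | sort -z | xargs -0 echo md5sum)
echo "$dry_run"

rm -rf "$tmp"
```

The output is a single line beginning with md5sum and followed by both pathnames. The space in "b c.txt" is printed as-is, so the listing cannot be pasted back into a shell without adding quotes.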
Troubleshooting
An effective strategy for troubleshooting is to first check that the correct pathnames are being selected, then that the correct commands are being executed.
To check which pathnames are selected, execute the find command without xargs or any filtering. While doing this you may want to revert to using newlines as terminators:
find ~/foo -type f
You can do this again as each stage of filtering is added to the pipeline:
find ~/foo -type f | grep -v "/[.]svn/"
If you are using the -name option to find, an easy mistake to make here is not putting the pattern in quotation marks. If you don't, and there are one or more files in the current working directory that match the pattern, then the pattern will be expanded by the shell. If the pattern expands to exactly one pathname then the find command will appear to work, but will not necessarily have acted upon all the files it should have done.
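The quoting mistake can be reproduced as follows. This is a sketch using invented file names: one file matching the pattern is placed in the working directory so that the shell has something to expand the pattern to.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/foo/sub"
touch "$tmp/a.txt" "$tmp/foo/a.txt" "$tmp/foo/sub/b.txt"

# Unquoted: the shell expands *.txt to a.txt (the one match in the working
# directory) before find runs, so find searches for -name a.txt and
# silently misses foo/sub/b.txt.
unquoted=$(cd "$tmp" && find foo -type f -name *.txt | wc -l)

# Quoted: find receives the pattern itself and matches both files.
quoted=$(cd "$tmp" && find foo -type f -name "*.txt" | wc -l)

echo "matches unquoted: $unquoted, quoted: $quoted"

rm -rf "$tmp"
```

Had the working directory contained two or more matching files, the unquoted form would instead have failed with an error from find, which is at least easier to notice.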
Note that long options to find begin with a single hyphen (not the double hyphen required for most other commands).
If the pathnames are correct then the error must lie in the commands that are executed. These can be inspected using the dry-run technique described above.