Act on all files in a directory tree using find and xargs
|Debian (Etch, Lenny, Squeeze)|
|Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)|
To perform a given action on all files within a given directory tree.
Suppose that ~/foo/ is the root of a working copy of a Subversion repository. You wish to calculate the MD5 sum of every file within that working copy, except for those created for internal use by Subversion. The latter can be identified by the fact that they are stored within subdirectories called .svn.
List the files to be acted upon using find, then remove any pathnames that contain the string /.svn/, then pipe the result into xargs:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 -n 1 md5sum
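To try the pipeline without touching a real working copy, it can be pointed at a throwaway tree. The following is a minimal sketch: the sandbox directory and the file names main.c and entries are hypothetical, and it assumes GNU find, grep and xargs are installed.

```shell
# Build a throwaway tree that mimics a working copy (all names are hypothetical).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/foo/.svn" "$sandbox/foo/src"
echo "tracked" > "$sandbox/foo/src/main.c"
echo "internal" > "$sandbox/foo/.svn/entries"

# Same pipeline as above, pointed at the sandbox instead of ~/foo.
sums=$(find "$sandbox/foo" -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 -n 1 md5sum)
printf '%s\n' "$sums"

# Clean up the sandbox.
rm -rf "$sandbox"
```

Only src/main.c should be listed; the file under .svn is filtered out before md5sum is run.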
The first argument to find is the root of the directory tree to be searched. The option -type f specifies that only regular files are to be matched (as opposed to, for example, directories). By default find will match any type of object that it finds in the given directory tree, but this is rarely the desired behaviour when passing the pathnames to a command.
The -v option to grep specifies that it should select lines that do not match the pattern (as opposed to its normal behaviour, where it selects lines that do match).
The -n 1 option to xargs specifies that the given command should be run separately for each pathname. This is not necessary in the example above because md5sum allows multiple pathnames to be passed on the same command line. For this reason it would have been equally valid (and significantly more efficient) to invoke xargs without the -n 1 option:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 md5sum
The same is true for many other commands, but it is not true in all cases. The safe option is therefore to include -n 1 unless you have positively established that it is not needed.
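The effect of -n 1 can be observed without any real files by feeding xargs a few dummy pathnames and counting how many times the command runs. This is only an illustrative sketch: the names a, b and c are placeholders, and echo stands in for the real command so that each invocation produces exactly one line of output.

```shell
# Three null-terminated dummy pathnames (hypothetical; no files are needed).
# With -n 1, echo is invoked once per pathname, giving one output line each.
with_n1=$(printf 'a\0b\0c\0' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

# Without -n 1, all three pathnames are passed to a single invocation.
without_n1=$(printf 'a\0b\0c\0' | xargs -0 echo | wc -l | tr -d ' ')

echo "with -n 1:    $with_n1 invocation(s)"
echo "without -n 1: $without_n1 invocation(s)"
```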
By default, find produces a newline-separated list of filenames and xargs accepts any form of whitespace as a separator. This works well provided that none of the filenames contain any whitespace, but fails badly if any of them do. The -print0 option to find, the -z and -Z options to grep, and the -0 option to xargs circumvent this issue by using a null-separated list instead. These options are non-standard GNU extensions, but there is no completely safe alternative.
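The difference between the two separators can be demonstrated with a single filename that contains a space. In this sketch the name "my file.txt" is hypothetical and is supplied by printf rather than find, so nothing needs to exist on disk.

```shell
# Newline-separated input: xargs splits on the embedded space,
# so the one filename is treated as two separate arguments.
split_count=$(printf 'my file.txt\n' | xargs -n 1 echo | wc -l | tr -d ' ')

# Null-separated input: the filename is passed through intact.
safe_count=$(printf 'my file.txt\0' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "newline-separated: xargs saw $split_count argument(s)"
echo "null-separated:    xargs saw $safe_count argument(s)"
```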
Commands of the type described here can damage a large number of files in a very short space of time if you make a mistake. You may therefore wish to perform a dry run first, to inspect the commands that would be executed without actually executing them. This can be done by inserting an echo command into the command line passed to xargs:
find ~/foo -type f -print0 | grep -zZv "/[.]svn/" | xargs -0 echo md5sum
This technique will not work if the command contains any redirections, but you can work around that limitation by temporarily replacing the redirection token with some other string that has no special meaning to the shell.
(Note that spaces and other special characters will not be escaped in the commands as listed.)
An effective strategy for troubleshooting is to first check that the correct pathnames are being selected, then that the correct commands are being executed.
To check which pathnames are selected, execute the find command without xargs or any filtering. While doing this you may want to revert to using newlines as terminators:
find ~/foo -type f
You can repeat this check as each stage of filtering is added to the pipeline:
find ~/foo -type f | grep -v "/[.]svn/"
If you are using the -name option to find, an easy mistake to make here is not putting the pattern in quotation marks. If you don't, and there are one or more files in the current working directory that match the pattern, then the pattern will be expanded by the shell before find sees it. If the pattern expands to exactly one pathname then the find command will appear to work, but will not necessarily have acted upon all the files it should have done.
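The quoting problem is easy to reproduce in a scratch directory. The sketch below uses hypothetical file names (a.txt in the top-level directory and sub/b.txt beneath it) and runs find from inside that directory, so that exactly one file there matches the unquoted pattern.

```shell
# Scratch directory with one match at the top level and one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt"

# Unquoted: the shell expands *.txt to a.txt before find runs,
# so find only looks for files named exactly a.txt.
unquoted=$(cd "$demo" && find . -name *.txt | sort)

# Quoted: find receives the pattern itself and matches both files.
quoted=$(cd "$demo" && find . -name "*.txt" | sort)

echo "unquoted pattern matched:"
printf '%s\n' "$unquoted"
echo "quoted pattern matched:"
printf '%s\n' "$quoted"

# Clean up the scratch directory.
rm -rf "$demo"
```

The unquoted form silently misses sub/b.txt, which is the kind of partial result described above.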
Note that long options to find begin with a single hyphen (not the double hyphen required for most other commands).
If the pathnames are correct then the error must lie in the commands that are executed. These can be inspected using the dry-run technique described above.