Globally rename an identifier throughout a set of source files
Tested on: Debian (Lenny, Squeeze), Ubuntu (Lucid)
Objective
To globally rename an identifier throughout a set of source files
Scenario
Suppose that you have a set of C source files in which the function foo is declared, defined and used. You wish to change the name of this function to bar, but without affecting identifiers with similar names such as foo2 or zfoo.
The files are located in a single directory and (as is normal practice) have names ending in the extensions .c and .h. One of the files that makes use of the function foo is named main.c.
Method
Overview
There are a number of ways in which the required effect can be achieved with commonly available tools. Three methods are described here:
- using GNU sed,
- using POSIX sed, and
- using Perl.
All are equally effective, but vary in terms of portability and complexity.
Be warned that making global changes to a body of source code has the potential to cause severe data loss if the procedure were to go wrong for any reason. It would be prudent either to make a copy of the code, or better, ensure that it is fully checked into a revision control system before attempting to use any of the methods described here.
Method (using GNU sed)
If the GNU implementation of sed is available then the required effect can be obtained using the following editing script:
s/\bfoo\b/bar/g
The s command means ‘substitute’ and it takes two arguments: a regular expression to search for (\bfoo\b) and the text to substitute when a match is found (bar). The g flag requests that the substitution be performed globally, as opposed to only for the first match on each line.
Within the regular expression, \b matches a zero-width string at a word boundary. Word characters are letters, digits and underscores, so the boundaries matched are in most cases the same as would be recognised by a C compiler.
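As a quick illustration of the boundary rule, the following sketch runs the script over a made-up line containing foo alongside the similar names foo2, zfoo and foo_x. It assumes that sed is the GNU implementation; the sample text and the variable names are invented for this example.

```shell
# Made-up sample line; only the standalone identifier foo should change.
sample='foo(1); foo2(2); zfoo(3); foo_x = a.foo + foo;'

# Assumes GNU sed, since \b is a GNU extension.
result=$(printf '%s\n' "$sample" | sed 's/\bfoo\b/bar/g')

printf '%s\n' "$result"
# prints: bar(1); foo2(2); zfoo(3); foo_x = a.bar + bar;
```

The names foo2, zfoo and foo_x are left alone because in each case foo is adjacent to another word character, so there is no boundary on that side.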
For testing, sed should be invoked with just the script and the name of one of the files:
sed 's/\bfoo\b/bar/g' main.c
The result will be written to stdout. If it is satisfactory then the -i option can be added to enable in-place editing, and the list of files extended to include all source and header files:
sed -i 's/\bfoo\b/bar/g' *.c *.h
Both \b and -i are GNU extensions that are not required by POSIX. This method is therefore not suitable for use in scripts that need to be portable.
Method (using POSIX sed)
It is possible to achieve the required effect using a minimally POSIX-compatible implementation of sed. However, the procedure for doing so is somewhat more complicated than when the \b extension is available, and the effort is unlikely to be worthwhile in most cases. A suitable editing script would be:
s/\(^\|[^a-zA-Z0-9_]\)foo\([^0-9A-Za-z_]\|$\)/\1bar\2/g
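The boundaries are detected here by looking for non-word characters before and after the identifier. Because those characters now form part of the string that will be matched, they must be reinserted into the output using backreferences. An identifier at the start or end of a line has no such neighbouring character, so those positions are treated as a special case by offering the anchors ^ and $ as alternatives. Two limitations should be noted. First, the \| alternation operator is not defined by POSIX for basic regular expressions; it is accepted by GNU sed and some other implementations, but not by all. Second, each match consumes the delimiter that follows it, so when two occurrences are separated by a single character (as in foo(foo(x))) the second is not replaced.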
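For an implementation that lacks the \| operator, or for a line such as foo(foo(x)) in which two occurrences share a delimiter, one possible alternative is to use a separate s command for each position and to repeat them until nothing more changes. The following is a minimal sketch rather than a tested recipe for every sed implementation; the function name rename_foo and the label again are invented for this example.

```shell
# Hypothetical helper: rename foo to bar on standard input without
# relying on \b or \| (neither is required by POSIX).
rename_foo() {
    # LC_ALL=C gives the bracket ranges their plain ASCII meaning.
    LC_ALL=C sed \
        -e ':again' \
        -e 's/^foo$/bar/' \
        -e 's/^foo\([^A-Za-z0-9_]\)/bar\1/' \
        -e 's/\([^A-Za-z0-9_]\)foo$/\1bar/' \
        -e 's/\([^A-Za-z0-9_]\)foo\([^A-Za-z0-9_]\)/\1bar\2/' \
        -e 't again'
}

printf '%s\n' 'foo(foo(x), foo2, zfoo) + foo' | rename_foo
# prints: bar(bar(x), foo2, zfoo) + bar
```

The t command branches back to the label whenever a substitution was made, so the commands are retried until the line stops changing. This terminates only because the replacement text (bar) does not itself contain the identifier being replaced.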
In-place editing can be achieved either by writing to a temporary file then renaming it:
sed 's/\(^\|[^a-zA-Z0-9_]\)foo\([^0-9A-Za-z_]\|$\)/\1bar\2/g' < main.c > main.c.tmp
mv main.c.tmp main.c
or by using the sponge command (provided by the moreutils package on Debian-based systems):
sed 's/\(^\|[^a-zA-Z0-9_]\)foo\([^0-9A-Za-z_]\|$\)/\1bar\2/g' < main.c | sponge main.c
This method does not by itself allow multiple files to be processed. However, it can be placed within a for loop or used in combination with the find command if that is a requirement.
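The for loop approach might look something like the sketch below, which applies the temporary-file technique to each source and header file in the current directory. The function name rename_all and the variable script are invented for this example, and the editing script is the one given above, so it is subject to the same limitations.

```shell
# Editing script from above, held in a variable for readability.
script='s/\(^\|[^a-zA-Z0-9_]\)foo\([^0-9A-Za-z_]\|$\)/\1bar\2/g'

# Hypothetical helper: apply the script to every .c and .h file in the
# current directory, one file at a time.
rename_all() {
    for f in *.c *.h; do
        [ -f "$f" ] || continue    # skip a pattern that matched nothing
        # Replace the original only if sed succeeded.
        sed "$script" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Calling rename_all from within the source directory processes each file in turn. Joining the two commands with && means that a file is left untouched if sed fails for any reason.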
Method (using Perl)
The same outcome can be achieved using the following Perl script:
s/\bfoo\b/bar/g
This is identical to the GNU sed script presented above, and has the same meaning. For testing, it should be invoked with the -p and -e options:
perl -p -e 's/\bfoo\b/bar/g' main.c
The -e option specifies that the next argument is the script to be executed. The -p option causes repeated execution of that script, once for each line of input. The result will be written to stdout. If it is found to be satisfactory then the -i option can be added to enable in-place editing and the script applied to all files:
perl -pi -e 's/\bfoo\b/bar/g' *.c *.h
Further reading
- Regex Tutorial - \b Word Boundaries
- The Open Group, sed, Base Specifications Issue 7
- The GNU Project, GNU sed user's manual
- perlre (Perl regular expressions), Perl 5 documentation