
Convert leading spaces to tabs in a text file

Tested on

Ubuntu (Precise, Trusty)


To convert the indentation of a text file from spaces to tabs


There are two commonly used methods for indenting source code: using spaces (U+0020) or tab characters (U+0009). There is no consensus as to which is preferable, but it is considered good practice to be consistent within any given body of code. To achieve consistency it is sometimes necessary to convert from one format to the other.

The most effective way to use tab characters is for each to represent one logical level of indentation. You may think that this indents the code too deeply, since most tools default to eight columns per tab character, but the default can usually be changed. By using one tab per level you allow the indentation depth to be decided by the recipient of the code rather than being fixed by the author. This is an example of separating presentation from content.

Mixing tabs with spaces is not recommended, because the indentation will become irregular if the text is displayed with a tab width other than the one the author assumed.
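The effect of mixing can be illustrated with a minimal sketch. It assumes the expand command (also part of coreutils) is available, and uses it only to simulate how a viewer with a given tab width would render the text. The variable names are purely illustrative.

```shell
# Two lines at the same logical depth: one indented with a tab,
# the other with four spaces.
demo=$(printf '\tindented with a tab\n    indented with four spaces\n')

# Rendered with tab stops every 4 columns, the two lines align.
width4=$(printf '%s\n' "$demo" | expand -t 4)

# Rendered with tab stops every 8 columns, they no longer align.
width8=$(printf '%s\n' "$demo" | expand -t 8)

printf '%s\n\n%s\n' "$width4" "$width8"
```

In the first rendering both lines start in column 4. In the second the tab-indented line moves to column 8 while the space-indented line stays where it was.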


Suppose that you have received a C source file called input.c that is indented using four spaces per level. You wish to convert it to one tab per level.



Two methods are presented here: one using unexpand, and one using Perl.

There is little to choose between these methods unless you are performing the conversion from within a script, in which case considerations such as portability and speed may become significant.

Method (using unexpand)

The unexpand command is part of the coreutils package, which is installed automatically as part of most general-purpose Linux distributions. It takes a named file as input, and writes the result to standard output. The -t option can be used to specify the number of columns per tab. In GNU unexpand the -t option also enables -a (convert all runs of blanks, not only leading ones), so the --first-only option is then needed if you want only leading spaces to be converted:

unexpand -t 4 --first-only input.c > output.c

The command can instead be used as a filter by invoking it with no file argument:

unexpand -t 4 --first-only < input.c > output.c

If the -t option is omitted then unexpand defaults to tab stops every 8 columns and converts only leading blanks, so --first-only does not need to be given.
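As a rough check of the behaviour described above, the following sketch builds a small sample file and converts it. The file names sample.c and sample_tabs.c and the temporary directory are hypothetical, chosen only for illustration, and GNU coreutils is assumed.

```shell
# Create a scratch directory and a sample file indented with four
# spaces per level, including a run of four spaces inside a line.
tmpdir=$(mktemp -d)
printf 'int main(void) {\n    if (1) {\n        return 0;    /* done */\n    }\n}\n' > "$tmpdir/sample.c"

# Convert leading groups of four spaces to tabs. Because of
# --first-only, the spaces before the comment are left alone.
unexpand -t 4 --first-only "$tmpdir/sample.c" > "$tmpdir/sample_tabs.c"

# Display the result with tabs shown as ^I.
cat -T "$tmpdir/sample_tabs.c"
```

The second and fourth lines should begin with one ^I, and the third with two, while the four spaces before the comment remain as spaces.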

Method (using Perl)

The same effect can be achieved using the following Perl script, either as a filter:

perl -pe 's/(^|\G) {4}/\t/g' < input.c > output.c

or to modify a file in place:

perl -pi -e 's/(^|\G) {4}/\t/g' input.c

The -p option requests line-by-line iteration over the input. At the start of each iteration $_ contains the line to be processed and at the end of each iteration the content of $_ is printed.

The -i option, where used, requests in-place editing.

The -e option specifies the script to be executed. In this case it globally replaces groups of four space characters found either at the start of the line (matched by ^) or immediately after the previous match (matched by \G).

The script is therefore capable of matching an unlimited number of leading spaces, but will stop at the first non-space character found on each line.

Note that the \G anchor is allowed in Perl-compatible regular expressions, but not in POSIX basic or extended regular expressions. For this reason the script above is not suitable for use with sed.


Tabs in a file can be inspected using the -T option of cat:

cat -T output.c

They are represented by the two-character sequence ^I.
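Inspection by eye can be supplemented with an automated check. The sketch below is one possible approach and not part of the method itself. It assumes GNU coreutils (for the --initial option of expand) and an input file that contained no tab characters in its indentation before conversion. The file contents and variable names are hypothetical.

```shell
# Build a small input file and convert it as described above.
workdir=$(mktemp -d)
printf 'void f(void) {\n    g();\n        h();\n}\n' > "$workdir/input.c"
unexpand -t 4 --first-only "$workdir/input.c" > "$workdir/output.c"

# Count lines that still begin with a space. grep exits with a
# non-zero status when the count is 0, hence the "|| true".
leftover=$(grep -c '^ ' "$workdir/output.c" || true)

# Expanding only the leading tabs back to four columns should
# reproduce the original file byte for byte.
if expand -t 4 --initial "$workdir/output.c" | cmp -s - "$workdir/input.c"; then
    roundtrip=ok
else
    roundtrip=differs
fi

echo "lines still starting with a space: $leftover; round trip: $roundtrip"
```

A non-zero count is not necessarily an error: a line indented by a number of spaces that is not a multiple of four will keep its remainder as spaces.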
