Convert the line endings in a text file from DOS to UNIX format
To convert the line endings in a text file from DOS to UNIX format (CRLF to LF)
Most modern operating systems use the linefeed character (LF) as a line separator. The only notable exception is Microsoft Windows, which uses a carriage return followed by a linefeed (CRLF). When processing files that originated on a system running Microsoft Windows it is often necessary to convert them to use linefeeds only.
The use of CRLF as a line separator is often referred to as DOS format due to its historical use by PC-DOS, MS-DOS and related operating systems.
Suppose that you have received a DOS format text file called input.txt. You wish to convert it to UNIX format, writing the result to a file called output.txt. The conversion will be performed in an environment in which the line separator is a single linefeed (LF).
There are many different ways to convert from DOS to UNIX line endings, of which those presented here are only a selection. They can be grouped into those based on general-purpose tools that are likely to be installed already on most systems:
- using GNU sed
- using Perl
and those which make use of a program that is dedicated to the task:
- using fromdos
- using dos2unix
There is little to choose between these methods unless you are performing the conversion from within a script, in which case considerations such as portability and speed may become significant.
Note that the methods based on general-purpose tools are unlikely to work in environments where the line separator is not a single linefeed.
The sed command takes a script containing a list of editing commands and applies them to a stream of text. It can be invoked as a filter:
sed 's/\r$//' < input.txt > output.txt
or it can be given a list of files to read from:
sed 's/\r$//' input.txt > output.txt
or (in the case of GNU sed) it can be instructed to edit the file or files in place, overwriting the originals:
sed -i 's/\r$//' input.txt
The script in this case consists of a substitution command. It replaces the regular expression \r$ (a carriage return occurring at the end of a line) with the empty string. This is done for each line of input. Provided that the newline character is a linefeed, this amounts to replacing CRLF with LF.
Unfortunately, the notation \r to represent a carriage return is a GNU extension that will not be recognised by a minimally POSIX-compliant implementation of sed. The alternative is to insert a literal carriage return into the script, either by creating a file with the required content, or by using the shell if it has the required functionality. For example, using bash you could write:
sed s/$'\r'$// < input.txt > output.txt
Here $'\r' is an instance of bash's ANSI-C quoting, which expands to a carriage return.
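Where bash is not available, the carriage return can also be produced with printf, which is specified by POSIX. The following is a minimal sketch of that approach rather than a definitive implementation; the function name crlf_to_lf is hypothetical and chosen only for illustration.

```shell
# Sketch: strip a trailing carriage return from each line using POSIX features only.
# crlf_to_lf is a hypothetical helper name, not part of any standard tool.
crlf_to_lf() {
    # Command substitution strips trailing newlines only, so the CR survives.
    cr=$(printf '\r')
    # \$ passes a literal $ to sed, so the script it sees is s/<CR>$//
    sed "s/${cr}\$//"
}

# Usage: crlf_to_lf < input.txt > output.txt
```

Because the substitution is anchored to the end of the line, a carriage return that occurs in the middle of a line is left untouched.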
Perl can be used in a similar manner to sed, applying a given script to each line read from standard input:
perl -pe 's/\r\n/\n/' < input.txt > output.txt
or from a given list of input files:
perl -pe 's/\r\n/\n/' input.txt > output.txt
or modifying a file in place:
perl -pi -e 's/\r\n/\n/' input.txt
The -p option requests line-by-line iteration over the input. At the start of each iteration $_ is set to the line to be processed and at the end of each iteration the content of $_ is printed.
The -i option, where used, requests in-place editing.
The -e option specifies the script to be executed. In this case it replaces the string \r\n (CRLF) with the string \n (LF). You may encounter a variant of this substitution in which the g (global) flag is set. This is harmless but unnecessary, because each line contains at most one line terminator.
If you are automatically converting large numbers of files then this method is likely to be significantly slower than using sed because of the overhead of invoking Perl. Otherwise, it has similar advantages and disadvantages.
The fromdos command is part of the tofrodos package by Christopher Heng. On Debian-based systems it can be installed using that name:
apt-get install tofrodos
If it is given a filename as an argument then it will perform an in-place conversion (overwriting the original content):
fromdos input.txt
However, it can be made to write to a different file by invoking it with no arguments, in which case it acts as a filter:
fromdos < input.txt > output.txt
The dos2unix command by Benjamin Lin is a reimplementation of a command that was originally a feature of SunOS and Solaris. On Debian-based systems it is provided by the package of the same name:
apt-get install dos2unix
Like fromdos it can act as a filter:
dos2unix < input.txt > output.txt
or modify files in place:
dos2unix input.txt
Unlike fromdos it explicitly supports writing the output to a separate file, using the -n option:
dos2unix -n input.txt output.txt
However, be aware that some systems (notably Debian prior to Squeeze and Ubuntu prior to Maverick) implement dos2unix as a softlink to fromdos, in which case the -n option will not be available.
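Within a script, one way to sidestep the difference is to rely only on filter mode, which both commands support, and to fall back to sed when neither is installed. This is a sketch under those assumptions, not a definitive implementation; the wrapper name to_unix is hypothetical.

```shell
# Sketch: convert stdin from DOS to UNIX line endings with whichever tool exists.
# to_unix is a hypothetical wrapper name introduced for this example.
to_unix() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix            # filter mode: reads stdin, writes stdout
    elif command -v fromdos >/dev/null 2>&1; then
        fromdos             # filter mode when given no arguments
    else
        # Fallback: POSIX sed with a literal carriage return from printf.
        cr=$(printf '\r')
        sed "s/${cr}\$//"
    fi
}

# Usage: to_unix < input.txt > output.txt
```

Avoiding -n keeps the wrapper usable on systems where dos2unix is really fromdos.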
The line endings in a file can be inspected using the -e option of cat:
cat -e output.txt
Carriage returns are displayed as a caret followed by the letter M (^M) and newlines as a dollar sign ($). Here is an example of a line of text in DOS format, as displayed by cat -e:
The quick brown fox jumps over the lazy dog^M$
and here is the same line after conversion to UNIX format:
The quick brown fox jumps over the lazy dog$
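Visual inspection is awkward inside a script, where an exit status is more convenient. The following sketch reports whether any line of a file ends with a carriage return, using grep with a literal carriage return generated by printf. The function name has_crlf is hypothetical.

```shell
# Sketch: succeed (exit status 0) if any line in the file ends with a carriage return.
# has_crlf is a hypothetical helper name introduced for this example.
has_crlf() {
    cr=$(printf '\r')
    # -q suppresses output; only the exit status is of interest.
    grep -q "${cr}\$" "$1"
}

# Usage: if has_crlf input.txt; then echo "DOS line endings found"; fi
```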
- Convert the line endings in a text file from UNIX to DOS format
- Convert a text file from one character encoding to another