Convert the line endings in a text file from UNIX to DOS format
To convert the line endings in a text file from UNIX to DOS format (LF to CRLF)
Most modern operating systems use the linefeed character (LF) as a line separator. The only notable exception is Microsoft Windows, which uses a carriage return followed by a linefeed (CRLF). When preparing files intended primarily or exclusively for use on machines running Microsoft Windows, it may be desirable to convert them at source to use carriage returns and linefeeds.
The use of CRLF as a line separator is often referred to as DOS format due to its historical use by PC-DOS, MS-DOS and related operating systems.
Suppose that you have a UNIX-format text file called input.txt. You wish to convert it to DOS format, writing the result to a file called output.txt. The conversion will be performed in an environment in which the line separator is a single linefeed (LF).
There are many different ways to convert from UNIX to DOS line endings, of which those presented here are only a selection. They can be grouped into those based on general-purpose tools that are likely to be installed already on most systems:
- using GNU sed
- using Perl
and those which make use of a program that is dedicated to the task:
- using todos
- using unix2dos
There is little to choose between these methods unless you are performing the conversion from within a script, in which case considerations such as portability and speed may become significant.
Note that the methods based on general-purpose tools are unlikely to work in environments where the line separator is not a single linefeed.
The sed command takes a script containing a list of editing commands and applies them to a stream of text. It can be invoked as a filter:
sed 's/$/\r/' < input.txt > output.txt
or it can be given a list of files to read from:
sed 's/$/\r/' input.txt > output.txt
or (in the case of GNU sed) it can be instructed to edit the file or files in place, overwriting the originals:
sed -i 's/$/\r/' input.txt
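The script in this case consists of a single substitution command. It replaces the regular expression $ (which matches the empty string at the end of a line) with a carriage return. This is done for each line of input. Because sed removes the trailing newline from each line before applying the script and restores it on output, the carriage return ends up immediately before the newline. Provided that the newline character is a linefeed, this amounts to inserting CR prior to each LF.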
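To see the effect at the byte level, the substitution can be applied to a couple of sample lines and the result dumped with od. This is only an illustration, and it assumes GNU sed (for the \r notation discussed next); in the od output each line should now end in \r \n rather than just \n:

```shell
# Feed two LF-terminated sample lines through the GNU sed script and dump
# the resulting bytes; each line should now end in \r \n instead of \n.
printf 'one\ntwo\n' | sed 's/$/\r/' | od -c
```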
Unfortunately, the notation \r to represent a carriage return is a GNU extension that will not be recognised by a minimally POSIX-compliant implementation of sed. The alternative is to insert a literal carriage return into the script, either by creating a file with the required content, or by using the shell if it has the required functionality. For example, using bash you could write:
sed s/$/$'\r'/ < input.txt > output.txt
Here $'\r' is an instance of bash's ANSI-C quoting, and expands to a carriage return before sed sees the script.
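If bash is not available, a similar result can be obtained in any POSIX shell by capturing a carriage return with printf, which is required to understand the \r escape in its format string. The sketch below wraps this in a shell function; the name crlf_filter is hypothetical and not part of any package:

```shell
# crlf_filter: hypothetical helper name. Reads standard input and writes
# it to standard output with a carriage return inserted before each LF.
crlf_filter() {
    # POSIX printf understands \r; command substitution strips only
    # trailing newlines, so the carriage return is kept in CR.
    CR=$(printf '\r')
    # Inside double quotes, \$ reaches sed as the end-of-line anchor $,
    # while $CR is expanded by the shell to a literal carriage return.
    sed "s/\$/$CR/"
}
```

It would then be invoked as crlf_filter < input.txt > output.txt. Because the carriage return is already literal by the time sed reads the script, no GNU extension is needed.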
Perl can be used in a similar manner to sed, applying a given script to each line read from standard input:
perl -pe 's/\n/\r\n/' < input.txt > output.txt
or from a given list of input files:
perl -pe 's/\n/\r\n/' input.txt > output.txt
or modifying a file in place:
perl -pi -e 's/\n/\r\n/' input.txt
The -p option requests line-by-line iteration over the input. At the start of each iteration $_ contains the line to be processed, and at the end of each iteration the content of $_ is printed.
The -i option, where used, requests in-place editing.
The -e option specifies the script to be executed. In this case it replaces the string \n (LF) with the string \r\n (CRLF). You may encounter a variant of this substitution in which the g (global) flag is set. This is harmless but unnecessary, because each line read by -p contains at most one linefeed.
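The -p option can be pictured as wrapping the script in an explicit read-and-print loop. The following is a simplified, hand-written version of that loop (the actual expansion documented in perlrun differs in detail, for example in how print errors are handled). It feeds two sample lines through the loop and dumps the bytes with od so that the inserted carriage returns are visible:

```shell
# Simplified hand-written equivalent of: perl -pe 's/\n/\r\n/'
# Each line (including its trailing LF) is read into $_, the LF is
# replaced with CRLF, and $_ is printed.
printf 'one\ntwo\n' | perl -e 'while (<>) { s/\n/\r\n/; print; }' | od -c
```

To convert a file rather than sample text, the printf and od stages would be replaced by < input.txt > output.txt.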
If you are automatically converting large numbers of files, this method is likely to be significantly slower than using sed because of the overhead of invoking Perl. Otherwise, it has similar advantages and disadvantages to the sed method.
The todos command is part of the tofrodos package by Christopher Heng. On Debian-based systems the package can be installed using that name:
apt-get install tofrodos
If it is given a filename as an argument then it will perform an in-place conversion (overwriting the original content):
todos input.txt
However, it can be made to write to a different file by invoking it with no arguments, in which case it acts as a filter:
todos < input.txt > output.txt
The unix2dos command by Benjamin Lin is a reimplementation of a command that was originally a feature of SunOS and Solaris. On Debian-based systems it is provided by the dos2unix package:
apt-get install dos2unix
Like todos, it can act as a filter:
unix2dos < input.txt > output.txt
or modify files in place:
unix2dos input.txt
Unlike todos, it explicitly supports writing the output to a separate file, using the -n option:
unix2dos -n input.txt output.txt
However, be aware that some systems (notably Debian prior to Squeeze and Ubuntu prior to Maverick) implement unix2dos as a symbolic link to todos, in which case the -n option will not be available.
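When the conversion is performed from within a script that has to run on several systems, one option is to use whichever dedicated tool happens to be installed and fall back to sed otherwise. A minimal sketch, assuming a POSIX shell; the function name to_dos is hypothetical, and both tools are used only in the filter mode described above, so the missing -n option on older systems does not matter:

```shell
# to_dos: hypothetical helper name, not part of any package.
# Filter from stdin to stdout, preferring a dedicated tool if one is
# installed and falling back to POSIX sed with a literal CR otherwise.
to_dos() {
    if command -v unix2dos >/dev/null 2>&1; then
        unix2dos
    elif command -v todos >/dev/null 2>&1; then
        todos
    else
        # printf produces a literal carriage return portably.
        CR=$(printf '\r')
        sed "s/\$/$CR/"
    fi
}
```

Usage would be to_dos < input.txt > output.txt. Testing with command -v rather than checking fixed paths lets the shell's normal PATH lookup decide which tool is found.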
The line endings in a file can be inspected using the -e option of cat:
cat -e output.txt
Carriage returns are displayed as a caret followed by the letter M (^M), and the end of each line is marked with a dollar sign ($). Here is an example of a line of text in UNIX format, as displayed by cat -e:
The quick brown fox jumps over the lazy dog$
and here is the same line after conversion to DOS format:
The quick brown fox jumps over the lazy dog^M$
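In a script it may be more convenient to check the result mechanically than to inspect it by eye. The sketch below counts the lines of a file that do not end in a carriage return, so an output of 0 suggests that every line was converted. The function name count_lf_only_lines is hypothetical:

```shell
# count_lf_only_lines: hypothetical helper name. Prints the number of
# lines in the named file that do not end in a carriage return.
count_lf_only_lines() {
    CR=$(printf '\r')
    # -v selects lines NOT matching "CR at end of line"; -c counts them.
    # grep exits with status 1 when the count is 0, which is expected here.
    grep -c -v "$CR\$" "$1"
}
```

For example, count_lf_only_lines output.txt should print 0 after a successful conversion, and the number of lines in the file for an unconverted UNIX-format file.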
- Convert the line endings in a text file from DOS to UNIX format
- Convert a text file from one character encoding to another