Convert the line endings in a text file from UNIX to DOS format

Tested on

Debian (Lenny)

Objective

To convert the line endings in a text file from UNIX to DOS format (LF to CRLF)

Background

Most modern operating systems use the linefeed character (LF) as a line separator. The only notable exception is Microsoft Windows, which uses a carriage return followed by a linefeed (CRLF). When preparing files intended primarily or exclusively for use on machines running Microsoft Windows it may be desirable to convert them at source to use carriage returns and linefeeds.

The use of CRLF as a line separator is often referred to as DOS format due to its historical use by PC-DOS, MS-DOS and related operating systems.

Scenario

Suppose that you have a UNIX format text file called input.txt. You wish to convert it to DOS format, writing the result to a file called output.txt. The conversion will be performed in an environment in which the line separator is a single linefeed (LF).

Methods

Overview

There are many different ways to convert from UNIX to DOS line endings, of which those presented here are only a selection. They can be grouped into those based on general-purpose tools that are likely to be installed already on most systems (GNU sed and Perl) and those which make use of a program that is dedicated to the task (todos and unix2dos).

There is little to choose between these methods unless you are performing the conversion from within a script, in which case considerations such as portability and speed may become significant.

Note that the methods based on general-purpose tools are unlikely to work in environments where the line separator is not a single linefeed.

Method (using GNU sed)

The sed command takes a script containing a list of editing commands and applies them to a stream of text. It can be invoked as a filter:

sed 's/$/\r/' < input.txt > output.txt

or it can be given a list of files to read from:

sed 's/$/\r/' input.txt > output.txt

or (in the case of GNU sed) it can be instructed to edit the file or files in place, overwriting the originals:

sed -i 's/$/\r/' input.txt

The script in this case consists of a substitution command. It replaces the regular expression $ (an empty string occurring at the end of a line) with a carriage return. This is done for each line of input. Provided that the newline character is a linefeed, this amounts to inserting CR prior to each LF.

Unfortunately, the notation \r to represent a carriage return is a GNU extension that will not be recognised by a minimally POSIX-compliant implementation of sed. The alternative is to insert a literal carriage return into the script, either by creating a file with the required content, or by using the shell if it has the required functionality. For example, using bash you could write:

sed s/$/$'\r'/ < input.txt > output.txt

where $'\r' expands to a carriage return.
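
For shells that lack the $'...' syntax, the literal carriage return can instead be produced with printf. The following is a minimal sketch along those lines, assuming only a POSIX shell, printf and sed; the function name unix_to_dos is purely illustrative and is not part of any package:

```shell
# Sketch: insert a CR before each LF using only POSIX sed and printf.
# unix_to_dos is a hypothetical helper name chosen for this example.
unix_to_dos() {
    # printf '\r' emits a literal carriage return. Command substitution
    # strips only trailing newlines, so the CR is preserved.
    cr=$(printf '\r')
    # Inside double quotes \$ becomes a literal $, giving sed the
    # script s/$/<CR>/ with a real carriage return as the replacement.
    sed "s/\$/$cr/" "$@"
}
```

It can be used as a filter (unix_to_dos < input.txt > output.txt) or given a list of files to read from (unix_to_dos input.txt > output.txt).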

Method (using Perl)

Perl can be used in a similar manner to sed, applying a given script to each line read from STDIN:

perl -pe 's/\n/\r\n/' < input.txt > output.txt

or from a given list of input files:

perl -pe 's/\n/\r\n/' input.txt > output.txt

or modifying a file in place:

perl -pi -e 's/\n/\r\n/' input.txt

The -p option requests line-by-line iteration over the input. At the start of each iteration $_ contains the line to be processed and at the end of each iteration the content of $_ is printed.

The -i option, where used, requests in-place editing.

The -e option specifies the script to be executed. In this case it replaces the string \n (LF) with the string \r\n (CRLF). You may encounter a variant of this substitution in which the g (global) flag is set. This is harmless but unnecessary.

If you are automatically converting large numbers of files then this method is likely to be significantly slower than using sed because of the overhead of invoking Perl. Otherwise, it has similar advantages and disadvantages.
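
When many files do need to be converted, the number of process invocations can be reduced by passing files to the converter in batches. The following sketch assumes GNU sed (for both -i and \r) and a directory tree of files matching *.txt; the function name convert_tree and the filename pattern are illustrative choices rather than anything prescribed by this method:

```shell
# Sketch: convert every *.txt file beneath a directory in place.
# Assumes GNU sed. convert_tree is a hypothetical helper name.
convert_tree() {
    dir=$1
    # The "+" terminator makes find pass many filenames to each sed
    # invocation instead of starting one process per file.
    find "$dir" -type f -name '*.txt' -exec sed -i 's/$/\r/' {} +
}
```

For example, convert_tree . would convert the matching files in the current directory and its subdirectories, overwriting the originals.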

Method (using todos)

The todos command is part of the tofrodos package by Christopher Heng. On Debian-based systems it can be installed using that name:

apt-get install tofrodos

If it is given a filename as an argument then it will perform an in-place conversion (overwriting the original content):

todos input.txt

however it can be made to write to a different file by invoking it with no arguments, in which case it acts as a filter:

todos < input.txt > output.txt

Method (using unix2dos)

The unix2dos command by Benjamin Lin is a reimplementation of a command that was originally a feature of SunOS and Solaris. On Debian-based systems it is provided by the dos2unix package:

apt-get install dos2unix

Like todos it can act as a filter:

unix2dos < input.txt > output.txt

or modify files in place:

unix2dos input.txt

Unlike todos it explicitly supports writing the output to a separate file, using the -n option:

unix2dos -n input.txt output.txt

However, be aware that some systems (notably Debian prior to Squeeze and Ubuntu prior to Maverick) implement unix2dos as a symbolic link to todos, in which case the -n option will not be available.
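
In a script that must run on systems where it is not known which of these commands is installed, one way to avoid depending on the -n option is to use filter mode throughout. The following is a sketch of that idea, falling back to sed if neither dedicated program is found; the function name convert_to_dos is an assumption made for this example:

```shell
# Sketch: convert src to dst using whichever tool is available.
# convert_to_dos is a hypothetical helper name.
convert_to_dos() {
    src=$1
    dst=$2
    if command -v unix2dos >/dev/null 2>&1; then
        # Filter mode works whether unix2dos is the real program
        # or a link to todos.
        unix2dos < "$src" > "$dst"
    elif command -v todos >/dev/null 2>&1; then
        todos < "$src" > "$dst"
    else
        # Fallback: POSIX sed with a literal CR produced by printf.
        cr=$(printf '\r')
        sed "s/\$/$cr/" < "$src" > "$dst"
    fi
}
```

It would be invoked as convert_to_dos input.txt output.txt, leaving input.txt unmodified.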

Testing

The line endings in a file can be inspected using the -e option of cat:

cat -e output.txt

Carriage returns are displayed as a caret followed by the letter M (^M) and newlines as a dollar sign ($). Here is an example of a line of text in UNIX format, as displayed by cat:

The quick brown fox jumps over the lazy dog$

and here is the same line after conversion to DOS format:

The quick brown fox jumps over the lazy dog^M$
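
Visual inspection is impractical from within a script. As a rough automated alternative, the following sketch reports whether every line of a file ends with a carriage return. It assumes a POSIX grep and printf, and the function name is_dos_format is illustrative only:

```shell
# Sketch: succeed only if every line of the file ends in CR (before LF).
# is_dos_format is a hypothetical helper name.
is_dos_format() {
    cr=$(printf '\r')
    # With -v, grep selects lines that do NOT end in CR; with -q it
    # exits 0 as soon as one such line is found.
    if grep -qv "$cr\$" "$1"; then
        return 1
    fi
    return 0
}
```

For example: is_dos_format output.txt && echo "output.txt is in DOS format". Note that an empty file is treated as being in DOS format, since it contains no lines that fail the check.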
