Resume an interrupted download when using Wget
Tested on: Debian (Lenny, Squeeze); Ubuntu (Lucid)
Objective
To resume an interrupted download when using Wget
Scenario
Suppose that you have instructed Wget to download a large file from the URL http://www.example.com/image.iso:
wget http://www.example.com/image.iso
Unfortunately, it was necessary to terminate Wget before it had finished the download, in order to reboot the machine for maintenance. You wish to fetch the remainder of the file, but do not wish to refetch any data that has already been downloaded.
Method
Wget can be instructed to resume an interrupted download using the -c (--continue) option:
wget -c http://www.example.com/image.iso
If there is an existing file with the appropriate filename (image.iso in this instance), then it is assumed to contain a partially downloaded copy of the content. If possible, Wget will then skip forward by the appropriate number of bytes (the size of the existing file) and resume the download from the point at which it was interrupted.
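The offset calculation is simple enough to sketch. The shell function below, range_header_for (a hypothetical helper, not part of Wget, which does this internally), takes the size of the existing file as the number of bytes to skip and prints the corresponding HTTP/1.1 request header of the form Range: bytes=N-, which asks for everything from byte N to the end. It assumes a POSIX shell and is meant as an illustration of the idea rather than a description of Wget's internals.

```shell
#!/bin/sh
# Sketch: print the HTTP Range header needed to fetch the rest of a
# partially downloaded file. range_header_for is a hypothetical helper
# for illustration only; Wget performs the equivalent step itself.

range_header_for() {
    file=$1
    # With no partial file (or an empty one) there is nothing to resume,
    # so no Range header is needed; signal this with a non-zero status.
    if [ ! -s "$file" ]; then
        return 1
    fi
    # The number of bytes already on disk is the offset to resume from.
    offset=$(wc -c < "$file")
    # $((...)) normalises the number (some wc implementations pad it).
    printf 'Range: bytes=%d-\n' "$((offset))"
}
```

For example, if image.iso currently holds 10 bytes, range_header_for image.iso prints Range: bytes=10-.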
Be warned that Wget does not attempt to verify that the content of the partially downloaded file matches the content of the URL. This could be a problem if:
- the content of the URL has changed between the first and second attempted downloads, or
- there has been a system failure and the filesystem containing the local copy of the file was not cleanly unmounted.
Following a system failure, the risk of file corruption depends on the type of filesystem and how it is configured. Journalling filesystems are not necessarily immune to this issue. The risk can be reduced by using truncate to discard the last few seconds' or minutes' worth of data from the end of the partial file before resuming.
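A minimal sketch of that precaution, assuming GNU coreutils truncate and a POSIX shell: the hypothetical helper trim_tail removes a chosen number of bytes from the end of the partial file (or empties it if the file is smaller than that), after which wget -c resumes from the new, shorter length. How many bytes correspond to "a few seconds or minutes" depends on the download rate; at an assumed rate of 1 MiB/s, one minute is roughly 60 MiB.

```shell
#!/bin/sh
# Sketch: discard the tail of a partially downloaded file before resuming,
# on the assumption that the most recently written data is the part most
# likely to have been damaged by an unclean shutdown. trim_tail is a
# hypothetical helper; the margin is chosen by the user, not by Wget.

trim_tail() {
    file=$1
    margin=$2   # number of bytes to discard from the end of the file
    size=$(wc -c < "$file")
    size=$((size))
    if [ "$size" -gt "$margin" ]; then
        new_size=$((size - margin))
    else
        # The file is smaller than the margin: discard everything.
        new_size=0
    fi
    # truncate shrinks the file to exactly new_size bytes.
    truncate -s "$new_size" "$file"
}
```

For example, trim_tail image.iso $((60 * 1024 * 1024)) followed by wget -c http://www.example.com/image.iso. GNU truncate also accepts a relative size (truncate -s -60M image.iso); the explicit calculation above simply makes the arithmetic visible.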
If you have the means to verify the download (for example, using md5sum against a published checksum), then it would be prudent to do so.
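As a sketch of such a check, assuming GNU md5sum and that the expected checksum has been obtained from a trustworthy source such as the site hosting the file (the helper name verify_md5 is hypothetical):

```shell
#!/bin/sh
# Sketch: check a completed download against a published MD5 checksum.
# verify_md5 is a hypothetical helper for illustration.

verify_md5() {
    expected=$1
    file=$2
    # md5sum -c reads "CHECKSUM  FILENAME" lines (two spaces) from stdin
    # when given "-", and exits with status 0 only if every file matches.
    # --quiet suppresses the "OK" line for files that do match.
    printf '%s  %s\n' "$expected" "$file" | md5sum -c --quiet -
}
```

Usage: verify_md5 EXPECTED_MD5 image.iso, where EXPECTED_MD5 stands for the published 32-digit hexadecimal checksum. A non-zero exit status means the local file does not match.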
Note
Not all downloads can be resumed: it might not be allowed by the protocol used to perform the download, or it might be allowed by the protocol but not by the server at which the URL is hosted. The support provided by HTTP and FTP is as follows:
- HTTP/1.1 allows a partial download to be requested by means of the Range header; however, the server is not required to honour the request. Earlier versions of HTTP do not provide for partial downloads.
- FTP has supported the REST (restart) command since its inception, but with the fairly serious drawback that it did not apply to STREAM-mode file transfers. This omission was not formally rectified until March 2007, when the REST STREAM feature was added by RFC 3659; however, early drafts of this RFC indicate that REST STREAM was already in widespread use by 1997.
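Whether a particular server is likely to support resumption can sometimes be gauged in advance. The two hypothetical helpers below read server output on standard input and look for the relevant advertisement: an Accept-Ranges: bytes header for HTTP, and a REST STREAM line in the response to the FEAT command for FTP (RFC 3659 asks servers that support it to list it there). These are only hints: an HTTP server may honour Range requests without advertising them, so the definitive test remains whether wget -c actually resumes.

```shell
#!/bin/sh
# Sketch: look for hints that a server supports resumed transfers.
# Both functions are hypothetical helpers that read text on stdin and
# return status 0 if the advertisement is found.

# HTTP: "Accept-Ranges: bytes" advertises byte-range support, while
# "Accept-Ranges: none" means range requests are not accepted.
# Header names are case-insensitive, hence grep -i.
http_advertises_ranges() {
    grep -qi '^[[:space:]]*Accept-Ranges:[[:space:]]*bytes'
}

# FTP: each feature in a FEAT response is listed on its own line,
# indented by a space; look for the "REST STREAM" feature.
ftp_advertises_rest_stream() {
    grep -qi '^[[:space:]]*REST[[:space:]]\{1,\}STREAM'
}
```

For HTTP, the headers can be fetched without downloading the body, for example wget --spider --server-response http://www.example.com/image.iso 2>&1 | http_advertises_ranges && echo 'ranges advertised'. For FTP, the FEAT response has to be captured from an FTP session; how to do that depends on the client.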
The -c option is therefore worth trying for both HTTP and FTP downloads, but there is no guarantee that it will work in either case.
Further reading
- P. Hethmon, Extensions to FTP, RFC 3659, IETF, March 2007
- R. Elz and P. Hethmon, Extended Directory Listing and Restart Mechanism for FTP, draft RFC (expired), IETF, June 1997