Download an entire website using wget

If you ever need to download an entire Web site, perhaps for off-line viewing, wget can do the
job. For example:

Example 1:

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

This downloads the Web site www.website.org/tutorials/html/.

The options are:

  • --recursive: follow links and download the site recursively.
  • --domains website.org: don’t follow links outside website.org.
  • --no-parent: don’t follow links outside the directory tutorials/html/.
  • --page-requisites: get all the elements that compose the page (images, style sheets, and so on).
  • --html-extension: save HTML files with the .html extension (newer wget releases call this option --adjust-extension).
  • --convert-links: convert links so that they work locally, off-line.
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
  • --no-clobber: don’t overwrite any existing files (used in case the download is interrupted and
    resumed).
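
The command in Example 1 can be wrapped in a small shell function so that the domain and directory are not hard-coded. This is only a sketch: the function name mirror_section, its two parameters, and the DRY_RUN variable are assumptions made for illustration and are not part of wget. It also assumes the site is served from www.DOMAIN, as in the example above.

```shell
# Hypothetical helper: mirror one directory of a site for off-line viewing.
# Usage: mirror_section DOMAIN DIRECTORY
mirror_section() {
    site_domain=$1   # e.g. website.org
    site_dir=$2      # e.g. tutorials/html/

    # With DRY_RUN=1 the command is printed instead of executed.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        runner=echo
    else
        runner=
    fi

    # Same options as Example 1; only the domain and directory vary.
    $runner wget \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$site_domain" \
        --no-parent \
        "www.$site_domain/$site_dir"
}
```

Setting DRY_RUN=1 and then calling mirror_section website.org tutorials/html/ prints the wget command instead of running it, which is a cheap way to check the options before starting a long download.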

Example 2:

wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent http://example.org

Explanation of the various flags:

  • --mirror – Turns on recursive download with unlimited depth, along with time-stamping.
  • --convert-links – Converts all links (including those to CSS style sheets and similar files) to relative ones, so the copy is suitable for offline viewing.
  • --adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content type.
  • --page-requisites – Downloads things like CSS style sheets and images required to properly display the page offline.
  • --no-parent – When recursing, does not ascend to the parent directory. It is useful for restricting the download to only a portion of the site.

Alternatively, the command above may be shortened:

wget -mkEpnp http://example.org

Note that the last p is part of np (--no-parent), which is why p appears twice in the flags.
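
To make the bundled form easier to read, the mapping between the short and long options can be spelled out. The sketch below is only a reference aid: the function name describe_flags is hypothetical, and it simply prints the pairs listed in the explanation above.

```shell
# Print the long option behind each short flag bundled in "-mkEpnp".
# describe_flags is an illustrative name, not a wget feature.
describe_flags() {
    printf '%s\n' \
        '-m   --mirror' \
        '-k   --convert-links' \
        '-E   --adjust-extension' \
        '-p   --page-requisites' \
        '-np  --no-parent'
}

# The bundle can also be written with one flag per argument:
#   wget -m -k -E -p -np http://example.org
```

Writing the flags separately avoids the double p and makes it obvious that -np is a single option.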
