wget: Very Advanced Usage

 
 7.3 Very Advanced Usage
 =======================
 
    • If you wish Wget to keep a mirror of a page (or FTP
      subdirectories), use ‘--mirror’ (‘-m’), which is shorthand for
      ‘-r -N -l inf --no-remove-listing’.  You can put Wget in the
      crontab file, asking it to recheck a site each Sunday:
 
           # crontab entry: run at 00:00 every Sunday
           0 0 * * 0 wget --mirror https://www.gnu.org/ -o /home/me/weeklog
 
    • In addition to the above, you want the links to be converted for
      local viewing.  But, after having read this manual, you know that
      link conversion doesn’t play well with timestamping, so you also
      want Wget to back up the original HTML files before the conversion.
      The Wget invocation would look like this:
 
           wget --mirror --convert-links --backup-converted  \
                https://www.gnu.org/ -o /home/me/weeklog
 
    • But you’ve also noticed that local viewing doesn’t work all that
      well when HTML files are saved under extensions other than ‘.html’,
      perhaps because they were served as ‘index.cgi’.  So you’d like
      Wget to rename all the files served with content-type ‘text/html’
      or ‘application/xhtml+xml’ to ‘NAME.html’.
 
           wget --mirror --convert-links --backup-converted \
                --adjust-extension -o /home/me/weeklog      \
                https://www.gnu.org/
 
      Or, with less typing:
 
           wget -m -k -K -E https://www.gnu.org/ -o /home/me/weeklog
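
The invocations above can be collected into a small script that cron calls, so the crontab line stays short.  What follows is only a minimal sketch, not part of Wget itself: the function names ‘mirror_site’ and ‘cron_line’ and the script path in the usage note are hypothetical.  It also assumes you want the mirror stored under a chosen directory, which is done with the ‘-P’ (‘--directory-prefix’) option.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the mirroring command shown above.
# The function names and argument order are illustrative assumptions.

# mirror_site URL DIR LOG
# Mirror URL into DIR, writing Wget's messages to LOG.
mirror_site() {
    url=$1
    dir=$2
    log=$3

    # -m: --mirror, -k: --convert-links, -K: --backup-converted,
    # -E: --adjust-extension (older name: --html-extension),
    # -P: --directory-prefix, -o: --output-file (the log).
    wget -m -k -K -E -P "$dir" -o "$log" "$url"
}

# cron_line COMMAND
# Print a crontab entry that runs COMMAND at 00:00 every Sunday.
cron_line() {
    printf '0 0 * * 0 %s\n' "$1"
}
```

If these functions were saved in a script that ends with a call such as ‘mirror_site https://www.gnu.org/ /home/me/mirror /home/me/weeklog’, then ‘cron_line /home/me/bin/mirror.sh’ would print the matching crontab entry.  The exit status of ‘mirror_site’ is that of Wget, so the calling script can detect a failed run.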