At some point you may need to download a website to your machine, for various reasons. For many people the go-to tool is HTTrack. Here, I will show you another very simple way to download a website, using wget. The command is easy to understand and feature-rich, so it offers many possibilities.
According to the official website: "GNU Wget is a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc."
In simple words, wget is a tool that retrieves content from the internet. It comes preinstalled on most Linux distributions.
If you ever want to download an entire website from the internet, wget can do the job:
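Since "most distributions" is not "all", it can be worth checking that wget is actually available before going further. Below is a minimal sketch; the helper name have_cmd is my own choice for illustration and is not part of wget.

```shell
# Return 0 if the named command is found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    echo "wget is installed at $(command -v wget)"
else
    echo "wget not found; install it with your package manager" >&2
fi
```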
$ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --no-parent https://saur.in/
This command downloads the https://saur.in website into a saur.in directory under the current directory: pages and child pages, CSS, JS, images, videos and so on. It also converts the links so that they work locally and offline.
Options used are:
- --recursive : Download the website recursively, following the links found on each page. The default maximum depth is 5 levels (change it with --level). By default wget stays on the host given on the command line; it only follows links to other hosts if --span-hosts is specified.
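- --no-clobber : Don't overwrite existing files. This is useful in two ways: 1) files that are already on disk are not downloaded again, and 2) an interrupted download can be resumed without starting over. Note that newer wget versions print a warning that only --convert-links will be used when both options are given.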
- --page-requisites : Get all the elements that compose the page (Images, CSS, JS, fonts, videos and so on).
- --html-extension : Save pages with the .html extension. In newer wget versions this option is named --adjust-extension; the old name is still accepted.
- --convert-links : Convert all links so that they work locally, off-line.
- --restrict-file-names=windows : Modify filenames so that they will work in Windows as well.
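If you mirror sites often, the command above can be wrapped in a small shell function. This is only a sketch under my own assumptions: the function name mirror_site and the DRY_RUN switch are made up for illustration and are not wget features. The wget options themselves are exactly the ones from the command above.

```shell
# mirror_site URL
# Mirror URL into the current directory using the options described above.
# DRY_RUN=1 prints the command instead of running it (hypothetical
# convenience switch for this sketch, not something wget provides).
mirror_site() {
    url=${1:-}
    if [ -z "$url" ]; then
        echo "usage: mirror_site URL" >&2
        return 2
    fi

    # Rebuild the positional parameters as the wget argument list.
    set -- \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --no-parent \
        "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wget $*"
    else
        wget "$@"
    fi
}
```

Calling DRY_RUN=1 mirror_site https://saur.in/ prints the full wget command without touching the network. Without DRY_RUN=1 the download starts.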
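A few other useful options:
- --domains saur.in : Limit the domains that are followed to saur.in. Use a comma-separated list for multiple domains. This option does not turn on --span-hosts by itself.
- --no-parent : Never ascend to the parent directory when retrieving recursively, so only files below the starting URL are downloaded (already used in the command above).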
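- --quiet : Print no output to the terminal (the default is --verbose).
- --show-progress : Force the progress bar to be displayed at any verbosity level, for example together with --quiet.
- --timestamping : Don't re-retrieve files unless they are newer than the local copy. It cannot be combined with --no-clobber.
- --server-response : Print the headers sent by HTTP servers and the responses sent by FTP servers.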
- --ignore-case : Ignore case when matching files/directories.
- --no-directories : Don't create a directory hierarchy. Save everything in the current directory.
- --https-only : In recursive mode, follow only secure HTTPS links.
- --relative : Follow relative links only.
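To show how some of these extra options fit together, here is a sketch for refreshing a mirror that already exists on disk. The function name refresh_mirror and the DRY_RUN switch are again my own assumptions for illustration. It uses --timestamping instead of --no-clobber, because wget refuses to run with both, and pairs --quiet with --show-progress so that only the progress bar is shown. The --show-progress and --https-only options need a reasonably recent wget.

```shell
# refresh_mirror URL
# Re-download only the files that changed on the server since the last run.
# DRY_RUN=1 prints the command instead of running it (hypothetical switch
# for this sketch, not a wget feature).
refresh_mirror() {
    url=${1:-}
    if [ -z "$url" ]; then
        echo "usage: refresh_mirror URL" >&2
        return 2
    fi

    # --timestamping replaces --no-clobber here; the two cannot be combined.
    set -- \
        --recursive \
        --timestamping \
        --page-requisites \
        --convert-links \
        --no-parent \
        --https-only \
        --quiet \
        --show-progress \
        "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wget $*"
    else
        wget "$@"
    fi
}
```

Run it from the directory that holds the existing mirror, for example refresh_mirror https://saur.in/, or prefix it with DRY_RUN=1 to inspect the command first.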