Download Entire Website With Wget

According to the official website: "GNU Wget is a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc."

In simple words, wget is a tool for retrieving content from the internet. It comes preinstalled on most Linux distributions and is mostly used in cron jobs and scripts, to fetch files or to pull data from an API.

If you ever want to download an entire website from the internet, wget can do the job:

$ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --no-parent https://saur.in/

This command downloads the https://saur.in website into a directory named after the host, inside the current directory. It fetches pages and child pages, CSS, JS, images and videos, and converts the links so that they work locally and offline.

Options used are:

  • --recursive : Download the website recursively, following the links found on each page. By default wget goes at most 5 levels deep (change this with --level) and stays on the starting host; it only downloads from other hosts if --span-hosts is given.
  • --no-clobber : Don't overwrite existing files. This is useful in two ways: 1) a file that is already on disk is not downloaded again, and 2) an interrupted download can be resumed without starting over.
  • --page-requisites : Get all the elements needed to display the page (images, CSS, JS, fonts, videos and so on).
  • --html-extension : Save pages with the .html extension. Newer wget versions name this option --adjust-extension; the old name is still accepted.
  • --convert-links : Convert all links so that they work locally, offline.
  • --restrict-file-names=windows : Modify filenames so that they also work on Windows.
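
The same command is easier to review when written one option per line. The snippet below is only a sketch: mirror_cmd is a hypothetical helper name (not part of wget), and it uses --adjust-extension in place of --html-extension on the assumption that a reasonably recent wget is installed. It prints the command instead of running it, so you can check it before starting a long download.

```shell
#!/bin/sh
# mirror_cmd is a hypothetical helper used only for this sketch.
# It prints the wget command for the given start URL; nothing is downloaded.
mirror_cmd() {
  url="$1"
  # Same options as the one-line command in the post, except that
  # --html-extension is swapped for its newer name --adjust-extension.
  set -- \
    --recursive \
    --no-clobber \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --restrict-file-names=windows \
    --no-parent
  echo "wget $* $url"
}

mirror_cmd "https://saur.in/"
```

To actually start the download, copy the printed command into the terminal.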

A few other useful options:

  • --domains saur.in : Don't follow links outside the saur.in domain. Use a comma-separated list for multiple domains.
  • --no-parent : Don't ascend to the parent directory; only follow links at or below the starting URL.
  • --quiet : Print no output to the terminal (the default is --verbose).
  • --show-progress : Display a progress bar.
  • --timestamping : Don't re-retrieve files unless they are newer than the local copy.
  • --server-response : Print the headers sent by the server.
  • --ignore-case : Ignore case when matching files/directories.
  • --no-directories : Don't create directories. Save everything in the current directory.
  • --https-only : In recursive mode, only follow secure HTTPS links.
  • --relative : Follow relative links only.
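
As a rough sketch of how some of these options combine, the snippet below builds a command for refreshing an existing local copy. refresh_cmd is a hypothetical helper name, and the choice of options is an assumption about a typical use case, not a recommendation from the wget manual. It uses --timestamping instead of --no-clobber, because wget refuses to combine the two. It only prints the command.

```shell
#!/bin/sh
# refresh_cmd is a hypothetical helper used only for this sketch.
# $1 = start URL, $2 = comma-separated list of domains to stay within.
# It prints the wget command; nothing is downloaded.
refresh_cmd() {
  url="$1"
  domains="$2"
  # --timestamping re-fetches only files newer than the local copy.
  # --quiet together with --show-progress leaves just the progress bar.
  set -- \
    --recursive \
    --timestamping \
    --page-requisites \
    --no-parent \
    --domains="$domains" \
    --quiet \
    --show-progress
  echo "wget $* $url"
}

refresh_cmd "https://saur.in/" "saur.in"
```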

WordCamp Mumbai 2019

[Late Post] Organising a WordCamp involves a great many things. I had already had my turn with WordCamp Vadodara, and I wanted to experience and enjoy a WordCamp as an attendee. WordCamp Mumbai was that chance for me. Here are a few memories I captured from WordCamp Mumbai 2019.

I met with an accident!

This is quite a late post, but yes, it was unfortunate. On 30th Oct, I was driving back to Vadodara with my wife and daughter via the Godhra-Vadodara express highway. While we were crossing a roundabout on the highway near Halol, a truck scratched and bumped into the rear right door of my car. Fortunately we were safe and had no injuries. According to the truck driver, he was driving without a co-driver, misjudged the corner on his left side, and made this mistake.

Some pics from that accident.

WordCamp Vadodara 2019

On 12th Oct 2019, we organised WordCamp Vadodara at Lukshmi Villas Palace Banquet and Conventions, the iconic landmark of the city. It was a single-day event, and about 250 WordPress community members, students and enthusiasts attended and made it a success. As a co-organizer, I found it a unique experience that I will be proud of for life.
