Admins eHow SysAdmin Tips & Tricks

April 17, 2012

How to create a mirror of a site using wget

Filed under: General, linux. Posted by admin @ 7:33 am

First, make sure you have a recent version of wget. Some distros still ship older versions that have bugs in the mirroring functionality. At the time of writing the latest version is 1.13.4, so if yours is older, you can download the source and build it from the following link:

ftp://ftp.gnu.org/gnu/wget/
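The build itself is the usual configure/make routine. Here is a minimal sketch, not the only way to do it. It assumes the release tarballs in that directory are named wget-<version>.tar.gz, that a compiler and the SSL development headers are already installed, and that your existing (older) wget can do the download. The function names are made up for this example.

```shell
# Assumption: tarballs in the directory linked above are named wget-<version>.tar.gz.
wget_tarball_url() {
    echo "ftp://ftp.gnu.org/gnu/wget/wget-$1.tar.gz"
}

# Hypothetical helper: download, build and install the given wget version.
# The body runs in a subshell, so the cd does not affect your current shell.
build_wget() (
    ver=$1
    wget "$(wget_tarball_url "$ver")" &&
    tar xzf "wget-$ver.tar.gz" &&
    cd "wget-$ver" &&
    ./configure &&      # default prefix is /usr/local
    make &&
    sudo make install
)

# Example (commented out so that pasting this block does not start a build):
# build_wget 1.13.4
```

With the default prefix the new binary lands in /usr/local/bin, which is why the version check that follows matters: an older wget earlier in your PATH would still win.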

After building wget, make sure the new version is the one being used:

wget -V

Output:

GNU Wget 1.13.4 built on linux-gnu.

+digest +https +ipv6 -iri +large-file +nls -ntlm +opie +ssl/gnutls 

Wgetrc: 
    /usr/local/etc/wgetrc (system)
Locale: /usr/local/share/locale 
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc" 
    -DLOCALEDIR="/usr/local/share/locale" -I. -I../lib -I../lib -O2 
    -Wall 
Link: gcc -O2 -Wall -lgnutls -lgcrypt -lgpg-error -lz -lrt ftp-opie.o 
    gnutls.o ../lib/libgnu.a 

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.
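If you would rather script this check than read the banner, something like the following should work. It is only a sketch: the function names are invented here, it relies on sort -V from GNU coreutils, and it assumes the first line of the output has the same "GNU Wget <version> ..." shape as above.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Uses sort -V (version sort) from GNU coreutils to order the two strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the third word of the first line of `wget -V`,
# e.g. "1.13.4" for "GNU Wget 1.13.4 built on linux-gnu."
installed_wget_version() {
    wget -V | head -n 1 | awk '{ print $3 }'
}

# Example:
# version_at_least "$(installed_wget_version)" 1.13.4 && echo "wget is new enough"
```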

OK, you are good to go now. Just execute the following command and relax 🙂

wget -mkp -e robots=off http://site
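For reference, here is the same command spelled out with long options, wrapped in a small helper. The helper name and the DRY_RUN switch are my own additions for illustration. The flags themselves are standard wget options.

```shell
# Long-option spelling of: wget -mkp -e robots=off URL
#   --mirror           (-m) recursive download with timestamping, unlimited depth
#   --convert-links    (-k) rewrite links so the copy can be browsed locally
#   --page-requisites  (-p) also fetch images, CSS and other files the pages need
#   -e robots=off      ignore robots.txt
# Hypothetical helper: $1 is the site URL.
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    set -- wget --mirror --convert-links --page-requisites -e robots=off "$1"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example:
# DRY_RUN=1 mirror_site http://site
```

Since robots=off tells wget to ignore the site's robots.txt, only use it on sites you are allowed to copy.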

July 16, 2010

Find the fastest apt mirrors (repos) for Debian lenny

Filed under: Debian. Posted by admin @ 3:21 pm

apt-get install netselect-apt
netselect-apt -n -s lenny
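
Here -n adds the non-free section and -s adds deb-src lines. Run it as root. As far as I know, netselect-apt does not touch /etc/apt/sources.list. It writes a file named sources.list into the current directory, so you still have to put it in place yourself. A minimal sketch of that last step, using a made-up helper that keeps a backup of the old file:

```shell
# Hypothetical helper.
#   $1: the list generated by netselect-apt (normally ./sources.list)
#   $2: the file to replace (normally /etc/apt/sources.list)
install_sources_list() {
    # Refuse to continue if the generated list is missing or empty.
    [ -s "$1" ] || { echo "no generated list at $1" >&2; return 1; }
    # Keep a copy of the current file next to it before overwriting.
    if [ -e "$2" ]; then
        cp -p "$2" "$2.bak" || return 1
    fi
    cp "$1" "$2"
}

# Example (as root):
# install_sources_list ./sources.list /etc/apt/sources.list && apt-get update
```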

June 16, 2009

How to mirror a website on Linux?

Filed under: General, HTML. Posted by admin @ 2:34 pm

You almost certainly have wget already. Try wget --help at the command line. If you get an error message, install wget with your Linux distribution’s package manager, or fetch the source from the official wget page and compile your own copy.

Once you have wget installed correctly, the command line to mirror a website is:

wget -m -k -K -E http://url/of/web/site

See man wget or wget --help | more for a detailed explanation of each option.

If this command seems to run forever, there may be parts of the site that generate an infinite series of different URLs. You can combat this in many ways, the simplest being to use the -l option to specify how many links “away” from the home page wget should travel. For instance, -l 3 will refuse to download pages more than three clicks away from the home page. You’ll have to experiment with different values for -l. Consult man wget for additional workarounds.
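
As a sketch, the depth-limited version of the mirror command could be wrapped like this. The helper name, the default depth of 3 and the DRY_RUN switch are assumptions made for the example. Note that -l is placed after -m: -m on its own sets the depth to infinite, and to my knowledge the later option is the one that takes effect.

```shell
# Flags used, besides -m (mirror) and -l (maximum recursion depth):
#   -k  convert links for local browsing
#   -K  keep a .orig backup of each file before its links are converted
#   -E  save HTML pages with an .html extension where they lack one
# Hypothetical helper: $1 is the URL, $2 the depth (defaults to 3).
# Set DRY_RUN=1 to print the command instead of running it.
mirror_with_depth() {
    url=$1
    depth=${2:-3}
    # Accept only plain non-negative integers here (wget itself also accepts "inf").
    case $depth in
        ''|*[!0-9]*) echo "depth must be a non-negative integer" >&2; return 2 ;;
    esac
    set -- wget -m -k -K -E -l "$depth" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: start small and raise the depth if pages are missing.
# mirror_with_depth http://url/of/web/site 2
```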
