Advanced WGET on Windows

1. Download the UNIX tools from http://unxutils.sourceforge.net/ (besides wget, a lot of really useful tools have been ported over to Windows). After unpacking, rename the UNIX util sort.exe to something like usort.exe so it does not clash with the sort command that ships with Windows.

2. Add the .\unix\usr\local\wbin directory to your path.

Hit the 'windows key'+'Pause/Break' and select Environment Variables. Either create a new user variable named Path or edit the system Path variable. The Variable Name is 'Path'. If you are editing an existing Path, append a semicolon followed by the directory to the end of the current Variable Value. If you are creating a new variable, enter the directory by itself.
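
To confirm the Path change took effect, open a new command shell and check that wget can be found. The following is a minimal Python sketch of that check, not part of the original instructions; the helper names path_entries and find_program are made up for illustration.

```python
import os
import shutil


def path_entries(path_value):
    """Split a Path value into its non-empty directory entries."""
    return [entry for entry in path_value.split(os.pathsep) if entry]


def find_program(program, path_value=None):
    """Return the full path to `program`, or None if no Path entry contains it.

    With path_value=None, shutil.which searches the current PATH.
    """
    return shutil.which(program, path=path_value)


if __name__ == "__main__":
    # List each directory on the Path, then report where wget was found.
    for entry in path_entries(os.environ.get("PATH", "")):
        print("Path entry:", entry)
    print("wget found at:", find_program("wget"))
```

If the last line prints None, the wbin directory is probably missing from the Path, or the shell was opened before the variable was changed.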

3. Set up your wgetrc file and save it wherever you want to keep it. Make sure you edit the proxy server settings to reflect your proxy server's address.

       #############################

       ###
       ### Sample Wget initialization file .wgetrc
       ###

       ## You can use this file to change the default behaviour of wget or to
       ## avoid having to type many many command-line options. This file does
       ## not contain a comprehensive list of commands -- look at the manual
       ## to find out what you can put into this file.
       ##
       ## Wget initialization file can reside in /usr/local/etc/wgetrc
       ## (global, for all users) or $HOME/.wgetrc (for a single user).
       ##
       ## To use the settings in this file, you will have to uncomment them,
       ## as well as change them, in most cases, as the values on the
       ## commented-out lines are the default values (e.g. "off").


       ##
       ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
       ## Think well before you change them, since they may reduce wget's
       ## functionality, and make it behave contrary to the documentation:
       ##

       # You can set retrieve quota for beginners by specifying a value
       # optionally followed by 'K' (kilobytes) or 'M' (megabytes).  The
       # default quota is unlimited.
       #quota = inf

       # You can lower (or raise) the default number of retries when
       # downloading a file (default is 20).
       #tries = 20

       # Lowering the maximum depth of the recursive retrieval is handy to
       # prevent newbies from going too "deep" when they unwittingly start
       # the recursive retrieval.  The default is 5.
       #reclevel = 5

       # Many sites are behind firewalls that do not allow initiation of
       # connections from the outside.  On these sites you have to use the
       # `passive' feature of FTP.  If you are behind such a firewall, you
       # can turn this on to make Wget use passive FTP by default.
       #passive_ftp = off

       # The "wait" command below makes Wget wait between every connection.
       # If, instead, you want Wget to wait only between retries of failed
       # downloads, set waitretry to maximum number of seconds to wait (Wget
       # will use "linear backoff", waiting 1 second after the first failure
       # on a file, 2 seconds after the second failure, etc. up to this max).
       waitretry = 10


       ##
       ## Local settings (for a user to set in his $HOME/.wgetrc).  It is
       ## *highly* undesirable to put these settings in the global file, since
       ## they are potentially dangerous to "normal" users.
       ##
       ## Even when setting up your own ~/.wgetrc, you should know what you
       ## are doing before doing so.
       ##

       # Set this to on to use timestamping by default:
       #timestamping = off

       # It is a good idea to make Wget send your email address in a `From:'
       # header with your request (so that server administrators can contact
       # you in case of errors).  Wget does *not* send `From:' by default.
       #header = From: Your Name <username@site.domain>

       # You can set up other headers, like Accept-Language.  Accept-Language
       # is *not* sent by default.
       #header = Accept-Language: en

       # You can set the default proxies for Wget to use for http and ftp.
       # They will override the value in the environment.
       http_proxy = http://1.2.3.4:8080/
       #ftp_proxy = http://proxy.yoyodyne.com:18023/

       # If you do not want to use proxy at all, set this to off.
       use_proxy = on

       # You can customize the retrieval outlook.  Valid options are default,
       # binary, mega and micro.
       #dot_style = default

       # Setting this to off makes Wget not download /robots.txt.  Be sure to
       # know *exactly* what /robots.txt is and how it is used before changing
       # the default!
       #robots = on

       # It can be useful to make Wget wait between connections.  Set this to
       # the number of seconds you want Wget to wait.
       #wait = 0

       # You can force creating directory structure, even if a single file is
       # being retrieved, by setting this to on.
       #dirstruct = off

       # You can turn on recursive retrieving by default (don't do this if
       # you are not sure you know what it means) by setting this to on.
       #recursive = off

       # To always back up file X as X.orig before converting its links (due
       # to -k / --convert-links / convert_links = on having been specified),
       # set this variable to on:
       #backup_converted = off

       # To have Wget follow FTP links from HTML files by default, set this
       # to on:
       #follow_ftp = off  
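
Most lines in the sample file above are commented out, so it is easy to lose track of which settings are actually active. The sketch below is a simplified Python reader that lists only the uncommented "name = value" lines. It is an illustration, not wget's own parser, and the function name active_settings is hypothetical.

```python
def active_settings(text):
    """Return the uncommented `name = value` lines of a wgetrc as a dict.

    Simplified on purpose: blank lines and lines starting with '#' are
    skipped, and each remaining line is split at its first '='.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not a `name = value` line
        settings[name.strip()] = value.strip()
    return settings
```

Run against the sample file as shown, this should report only waitretry, http_proxy and use_proxy, which are the three lines left uncommented.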

4. Set the environment variable so wget knows where to look for the wgetrc file. In a command shell type the following, changing the path to reflect where you saved your wgetrc file above, and hit enter.

set WGETRC=C:\tools\usr\local\wbin\wgetrc

5. Add the variable to your system or user environment so it's set at boot or login time.

Hit the 'windows key'+'Pause/Break' and select Environment Variables. Create a new variable (user or system, whichever suits you and your permissions best). For the Variable Name enter 'WGETRC'. For the Variable Value enter the path to your wgetrc file from above and click OK.
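
A quick way to verify the variable is to open a new command shell and check that WGETRC is set and points at a real file. The following Python sketch does that; it is an illustrative check only, and the function name wgetrc_location is hypothetical.

```python
import os


def wgetrc_location(environ):
    """Return (path, is_file) for the WGETRC entry of an environment mapping.

    path is None when WGETRC is unset or empty.
    """
    path = environ.get("WGETRC")
    if not path:
        return None, False
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, is_file = wgetrc_location(os.environ)
    if path is None:
        print("WGETRC is not set in this shell")
    elif is_file:
        print("WGETRC points to an existing file:", path)
    else:
        print("WGETRC is set, but no file exists at:", path)
```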

6. Open up a command shell and enter the following command, replacing the output locations, proxy credentials and URL with your own.

wget -o c:\your_output_1.log -v -O c:\your_output_1.html --debug --proxy=on --proxy-user=user --proxy-passwd=password --server-response http://www.site.com/page-you-want.html
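
If you want to run the same retrieval from a script, the command can be assembled as an argument list so that each option stays separate and no shell quoting is needed. This is a rough Python sketch that only reuses the options from the command above; the function names build_wget_command and run_wget are hypothetical.

```python
import subprocess


def build_wget_command(url, output_file, log_file, proxy_user, proxy_password):
    """Build the argument list for the wget command shown above."""
    return [
        "wget",
        "-o", log_file,        # write wget's messages to this log file
        "-v",
        "-O", output_file,     # save the retrieved page to this file
        "--debug",
        "--proxy=on",
        "--proxy-user=" + proxy_user,
        "--proxy-passwd=" + proxy_password,
        "--server-response",
        url,
    ]


def run_wget(command):
    """Run the command and return wget's exit code (0 means success)."""
    return subprocess.run(command).returncode
```

Calling run_wget(build_wget_command(...)) with your own URL, file names and proxy credentials should behave like typing the command by hand, provided wget is on the Path.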

7. For more info on wget, type the following at a command shell:

wget --help

This should get you working with wget on Windows through your proxy server.

Malik

Reference: http://www.interlog.com/~tcharron/wgetwin.html



Copyright © 2000-2014 Whitehats.ca