Batch downloading with REXX and WGET

I recently had to reinstall SUSE Linux on my 64-bit PC, which meant restoring the tools I use often, including my REXX scripts. One of them read a text file and downloaded a numbered series of files matching a pattern, such as x001.jpg to x030.jpg. It did the job but was not flexible, so I decided to write a next-generation version that can handle more complex download sequences without modifying the code.

I took a modular approach and split the program into two parts: a core download script, and a simpler, higher-level script that reads a list file and offers the same ease of use as my old script.

The new core script is flexible but uses a more complex, Linux-like syntax:

dlfiles --start n1 --stop n2 --lz n3 --dest s1 --src s2

where...
n1 is where the numeric counter starts (default 1)
n2 is where the numeric counter stops (default 30)
n3 is the width to which the counter is padded with leading zeros, e.g. 3 gives 001 002 003 ... 099 (default 0, no padding)
s1 is the directory in which to store the downloads (default: the current directory)
s2 is the source URL of the downloads, with the counter portion replaced by the placeholder "~~"

Thus to download files x0001.txt to x0040.txt from a server the command to use would be

dlfiles --start 1 --stop 40 --lz 2 --src http://server/x00~~.txt
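
To make the substitution concrete, here is a minimal sketch of how the "~~" placeholder and the --lz width could combine into the final URLs. The helper name expand_url is hypothetical and not part of dlfiles; the URL is the example from above.

```rexx
/* Sketch: expand a "~~" URL template with a zero-padded counter */
template = "http://server/x00~~.txt"   /* example URL from the text */
lz = 2                                  /* padded width, as given to --lz */

do n = 1 to 3
    say expand_url(template, n, lz)
end
exit

/* expand_url is a hypothetical helper name, not part of dlfiles */
expand_url: procedure
    parse arg template, n, lz
    parse var template prefix "~~" postfix   /* split around the placeholder */
    if lz > 0 then
        counter = right(n, lz, "0")          /* pad on the left with zeros */
    else
        counter = n
    return prefix || counter || postfix
```

Note that RIGHT() also truncates: a counter with more digits than the --lz width loses its leading digits, so the width should be at least as large as the number of digits in the stop value.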

The second script, called "getem", reads a specified text file, parses each line, and calls dlfiles with the matching parameters. The file contains records such as:
Docs 1, http://server1/doc~~.doc
Docs 2, http://server2/text~~.odt

It relies on the dlfiles defaults (start at 1, stop at 30, no leading zeros) but tells dlfiles to store the files in the Docs 1 and Docs 2 folders. If these folders do not exist, they are created by WGET, the download program on which dlfiles depends.
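
As an illustration of the record format, the sketch below splits one index line into its folder and URL parts. It is a slight variation on what getem does: it splits on the comma alone and strips blanks afterwards, which is an assumption of mine to tolerate uneven spacing.

```rexx
/* Sketch: split one index record into a folder and a URL template */
record = "Docs 1, http://server1/doc~~.doc"   /* sample record from the text */

parse var record storepath "," fileurl
storepath = strip(storepath)   /* tolerate stray blanks around the comma */
fileurl   = strip(fileurl)

say "folder: ["storepath"]"
say "url:    ["fileurl"]"
```

The folder name keeps its embedded blank, which is why it has to be quoted when it is passed on to dlfiles.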

The main problem I ran into was REXX's limited argument parsing: it could not handle parameters containing spaces, even when they were delimited by quotation marks, so I had to write my own parsing code.
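
The idea behind the hand-written parser can be sketched as follows: instead of splitting on blanks, cut the argument string at each "--" marker, so that everything up to the next marker belongs to the current option. The variable names and the sample string are illustrative only.

```rexx
/* Sketch: split an argument string at each "--" marker */
argline = "--dest Docs 1 --src http://server/x~~.txt"   /* sample input */

count = 0
start = pos("--", argline)
do while start > 0
    next = pos("--", argline, start + 2)       /* where the next option begins */
    if next = 0 then chunk = substr(argline, start)
    else chunk = substr(argline, start, next - start)
    parse var chunk optname optval             /* first word, then the rest */
    count = count + 1
    opt.count = optname
    val.count = strip(optval)                  /* drop the blank before the next marker */
    start = next
end

do i = 1 to count
    say opt.i "= ["val.i"]"
end
```

The trade-off of this approach is that a value which itself contains "--", such as some URLs, would be cut in two.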

The end result is the two programs listed below. The usual copyright notice applies: give me credit if you use any of it for legal purposes. If your purposes are illegal, please do not include me in your illicit activities.

By the way, why do I use REXX instead of sh, Perl, or Python? Simple: I am an old OS/2 user, and much of the REXX programming I learned then is still my mainstay. As a result, I install REXX on every computer I use, including Windows and Linux machines. I currently use and support Open Object Rexx, since it is the direct descendant of the IBM REXX that was used on OS/2 PCs and OS/400 minicomputers.


-- dlfiles --

#!/usr/bin/rexx
/*
   dlfiles (c) 2007 Dion Mohammed
*/

parse arg params
DEBUG = 0
v. = ""
call proc_args
s = 'wget --tries=10 --continue --timeout=10 --directory-prefix="'v.dest'/"'
c = v.start

do while c <= v.stop   /* skip the loop entirely if start > stop */
    parse value v.src with v.prefix "~~"v.postfix
    if v.lzeros\=0 then
    do
        fspec=v.prefix""right(c, v.lzeros,"0")""v.postfix
    end
    else
    do
        fspec=v.prefix""format(c)""v.postfix
    end
    dncmd=s' "'fspec'"'
    IF DEBUG=1 then
    do
       say "DLFILES ==> "dncmd
       say "V.SRC: "v.src
       say "PREFIX: "v.prefix
       say "POSTFIX:"v.postfix
    end
    else
       dncmd
    c = c + 1
end
exit

proc_args:
procedure expose params v.
    SE = "Syntax Error: "
    x=1
    i=1
    v.dest = "."
    v.start=1
    v.stop=30
    v.lzeros=0
    v.src=""

    do while x\=0
       y=pos("--", params, x)
       if y=0 then leave   /* no options given: avoid substr() with start 0 */
       x=pos("--", params, y+1)
       if x=0 then
       do
           w.i = substr(params,y)
       end
       else
       do
           w.i = substr(params,y,(x-y))
       end
       i = i + 1
   end
   i = i - 1
   do c=1 to i
       parse value w.c with operand data
       data = strip(data)   /* drop the blank left before the next "--" */
       select
           when operand="--start" then v.start=data
           when operand="--stop" then v.stop=data
           when operand="--lz" then v.lzeros=data
           when operand="--dest" then v.dest=data
           when operand="--src" then v.src=data
       otherwise
           nop   /* ignore unknown options */
       end
   end

/* Validation */
   if v.src="" then
   do
       say SE "Error processing Source"
       exit
   end
   if v.dest="" then
   do
       say SE "Error processing Destination"
       exit
   end
   if datatype(v.start)\="NUM" then
   do
       say SE "Start value must be numeric"
       exit
   end
   if datatype(v.stop)\="NUM" then
   do
       say SE "Stop value must be numeric"
       exit
   end
   if datatype(v.lzeros)\="NUM" then
   do
       say SE "Leading zeros must be numeric"
       exit
   end
   return

 


-- getem --

#!/usr/bin/rexx
/*
 GETEM (c) 2007 Dion Mohammed
 process download index file
*/
parse arg indexfile .
do while lines(indexfile)>0
 currline = linein(indexfile)
 parse value currline with storepath", "fileurl
 cmdline = 'dlfiles --dest "'storepath'" --src "'fileurl'"'
 say "GETEM ==> "cmdline
 cmdline
end
exit

These are released for free use, with respect to the original owner of the source code. You are free to use and modify for your use, with acknowledgment given to me.