Posted By: Loki12583 $urlget bugs / discussion - 12/03/19 11:47 PM
Bug: .redirect property contains original url when the response contains no redirect.

Bug: Trying to cancel a download with either $urlget(%id,c) or $urlget(%url,c) first returns 1 then crashes mIRC.

"Bug": Downloading to binvar is slow, ~2 MB/s after 10 seconds compared to 60 MB/s for download to file.

Enhancement request: Allow verbs other than GET/POST - PUT/PATCH/DELETE etc.
Posted By: Wims Re: $urlget bugs / discussion - 12/03/19 11:59 PM
Improvement:

Add a switch to control how deep redirects are followed: -dN, where N = 0 allows unlimited redirects and N > 0 follows at most N redirects.

It would be nice to be able to keep the socket alive if the server answers with a keep alive header.
The syntax would become:
Code:
$urlget([id],url,gpfbrtcdN,target,alias,headers,body)
If "id" is specified (it is easy to tell apart from a URL), the same socket would be reused if one is already open for that id. Otherwise, either use the "passed" id as the "returned" id if possible (and error out if it is already in use), or just ignore the id parameter and create a new id.
Posted By: SykO Re: $urlget bugs / discussion - 13/03/19 04:56 AM
Bug: $urlget(http://usr:pass@host:port,gf,target,alias) fails

compared to:
curl --request GET --url http://usr:pass@host:port
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 09:28 AM
Quote:
Bug: .redirect property contains original url when the response contains no redirect.

Ah, right. I was wavering between calling this ".urlfinal" to indicate the final URL or ".redirect" to represent a redirect. Another option was to simply update .url to store the final URL. But I think .redirect makes the most sense. The .redirect property will be changed in the next beta to be empty unless a redirect takes place.

Quote:
Bug: Trying to cancel a download with either $urlget(%id,c) or $urlget(%url,c) first returns 1 then crashes mIRC.

I have not been able to reproduce this yet. I have tried starting multiple $urlget()s and cancelling them repeatedly and mIRC has not crashed yet. Can you show me the $urlget() call you are using to initiate the download?

Quote:
"Bug": Downloading to binvar is slow, ~2 MB/s after 10 seconds compared to 60 MB/s for download to file.

Hmm. There is only one download routine. The only difference between file and &binvar is that the file is written to during the download, whereas the &binvar is set at the end. So, technically, file should be slower. In my tests, I actually had to make file downloads cache data in memory up to a certain amount before writing, because Windows anti-virus was scanning the file after each write. Can you provide two $urlget() calls that reproduce this issue?
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 09:31 AM
Quote:
It would be nice to be able to keep the socket alive if the server answers with a keep alive header.
The syntax would become:

I'm afraid this is not possible. Each download is completely independent/encapsulated and cannot be re-used.
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 09:33 AM
Quote:
Bug: $urlget(http://usr:pass@host:port,gf,target,alias) fails

I have not been able to reproduce an issue with this. I tried the above call on a password protected http folder and it passed authentication and downloaded the file without any issues. Do you mean to say that usr:pass is not working? Or that there is an issue with :port? Or something else?
Posted By: Raccoon Re: $urlget bugs / discussion - 13/03/19 09:40 AM
Is there a means to handle response headers with a Content-Disposition recommended filename, so that filename can be used or later /renamed? eg, http://example.com/get?100 -> somefile.mp3
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 09:43 AM
Quote:
Add a switch to control how deep redirects are followed: -dN, where N = 0 allows unlimited redirects and N > 0 follows at most N redirects.

Currently, $urlget() gives up after 10 redirects. It does not detect cyclical redirections. As far as I know, most browsers have a redirect limit of between 10 and 20 redirects. Instead of adding an option for this, I would rather make it behave in a standard way. I could increase the limit to 20, but 10 seems reasonable? I would not want to allow infinite redirects.
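The following minimal Python sketch is not mIRC's implementation. It shows a hop limit combined with the cycle detection mentioned above. `follow_redirects`, `location_for` and `RedirectError` are hypothetical names, and `location_for` stands in for one HTTP request. Treating a repeated URL as a cycle is a simplification, because a server may legitimately redirect back to the same URL once a cookie has been set.

```python
from urllib.parse import urljoin


class RedirectError(Exception):
    """Raised when the redirect limit is exceeded or a cycle is found."""


def follow_redirects(url, location_for, max_redirects=10):
    """Follow Location headers starting at `url`.

    `location_for(url)` stands in for one HTTP request: it returns the
    Location header of a 3xx response, or None for a final response.
    Returns the list of URLs visited; the last entry is the final URL.
    """
    chain = [url]
    seen = {url}
    while True:
        location = location_for(url)
        if location is None:
            return chain
        if len(chain) > max_redirects:  # max_redirects hops already followed
            raise RedirectError(f"more than {max_redirects} redirects")
        url = urljoin(url, location)  # resolve relative Location values
        if url in seen:
            raise RedirectError(f"redirect cycle at {url}")
        seen.add(url)
        chain.append(url)
```

A two-URL loop is rejected on the second hop instead of burning through all 10 allowed redirects.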
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 09:54 AM
Quote:
Is there a means to handle response headers with a Content-Disposition recommended filename, so that filename can be used or later /renamed? eg, http://example.com/get?100 -> somefile.mp3

Nope. $urlget() allows you to specify no dir/filename, a dir, a filename, or a dir + filename. In all cases, it will either use the dir/filename you specify or will determine these itself from your DCC folders, the URL path, the redirect path, etc. If resume is not used, it will add an incrementing number to the filename if it already exists. For example: //echo result: $urlget(https://www.mirc.com/get.php,gf,,alias).

Update: there is an issue where making repeated calls to the above non-resume $urlget() results in some calls failing due to identical filenames being used. Fixing this required moving the non-resume filename check to a different point in the download. This change will be in the next version.
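One common way to avoid the race described in the update is to make the existence check and the file creation a single atomic step. The Python sketch below works under that assumption and is not how mIRC does it. `claim_unique_path` and `name_from_url` are hypothetical helpers, and the `get1.php`, `get2.php` numbering is an illustrative choice, not necessarily mIRC's naming scheme.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def name_from_url(url, default="download"):
    """Use the last path segment of the URL as the filename."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default


def claim_unique_path(directory, filename):
    """Create an empty file under a name nobody else holds and return its path.

    O_CREAT | O_EXCL turns "does it exist?" and "create it" into one atomic
    step, so two downloads started at the same moment cannot pick the same
    name. The numbering scheme (get.php, get1.php, ...) is hypothetical.
    """
    stem, ext = os.path.splitext(filename)
    n = 0
    while True:
        candidate = filename if n == 0 else f"{stem}{n}{ext}"
        path = os.path.join(directory, candidate)
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            n += 1
            continue
        os.close(fd)  # the empty file now reserves the name for the download
        return path
```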
Posted By: Raccoon Re: $urlget bugs / discussion - 13/03/19 10:35 AM
Consider this example. Again, it is the Content-Disposition response header that is intended to carry a server-specified filename. It is the correct method, and it replaces the redirect method.

https://www.oldtimeradiodownloads.com/download/get_file/79609

glenn-miller-glenn-millers-music-39-06-13-first-song-at-sundown.mp3

Code:
cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
cf-ray: 4b6d5b87aa8c98ad-LAX
content-disposition: attachment; filename=glenn-miller-glenn-millers-music-39-06-13-first-song-at-sundown.mp3
content-transfer-encoding: Binary
content-type: application/octet-stream
date: Wed, 13 Mar 2019 10:34:50 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
expires: Thu, 19 Nov 1981 08:52:00 GMT
pragma: public
server: cloudflare
set-cookie: PHPSESSID=f221d67c468d8d1037b0ba215d52cb7f; expires=Thu, 14-Mar-2019 05:35:15 GMT; Max-Age=86400; path=/
set-cookie: countPage=34
set-cookie: session_database=dfefb85ec34e0dd0cf90214b4c1eccabfcb716c5%7E5c8896946d9840-67444800; path=/
status: 200
x-powered-by: PHP/5.6.30
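Whatever API ends up returning the header (Khaled notes further down that WinInet lists HTTP_QUERY_CONTENT_DISPOSITION as obsolete), extracting the filename is mostly parameter parsing. The Python sketch below uses the standard library's `email.message.Message`, which splits header parameters, strips quotes and can decode RFC 2231-style `filename*=` values. `filename_from_disposition` is a hypothetical helper, and stripping directory components is an assumed safety measure.

```python
import posixpath
from email.message import Message


def filename_from_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoting and RFC 2231 parameters
    if not name:
        return None
    # Never trust a server-supplied path: keep only the final component.
    name = posixpath.basename(name.replace("\\", "/"))
    return name or None
```

Applied to the content-disposition line above, this yields `glenn-miller-glenn-millers-music-39-06-13-first-song-at-sundown.mp3`.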
Posted By: Loki12583 Re: $urlget bugs / discussion - 13/03/19 12:26 PM
The crash happens only with a binvar target; I have PM'd you links to two large files for testing.

Also, where are these temp files stored? I may now have many partial files left over from crash testing.

Code:
; Call /urlget.test %url

alias urlget.test {
  var %url = $iif($1,$1,http://localhost/)
  bset -t &header 1 Test: Header
  bset -t &body 1 foo1=bar1&foo2=bar2

  var %id = $urlget(%url,gb,&target.dat,urlget.callback,&header) | ; Change from binvar to file for full speed
  echo 4 -ag %id
  timer 1 5 urlget.callback %id
  ;timer 1 6 echo 4 -ag here: $!urlget( %id ,c)    | ; Uncomment to crash on binvar test
  timers
}

alias urlget.callback {
  var %id = $1

  echo -agi9 url      $urlget(%id).url
  echo -agi9 redirect $urlget(%id).redirect
  echo -agi9 method   $urlget(%id).method
  echo -agi9 type     $urlget(%id).type
  echo -agi9 target   $urlget(%id).target
  echo -agi9 alias    $urlget(%id).alias
  echo -agi9 id       $urlget(%id).id
  echo -agi9 state    $urlget(%id).state
  echo -agi9 size     $urlget(%id).size
  echo -agi9 resume   $urlget(%id).resume
  echo -agi9 rcvd     $urlget(%id).rcvd
  echo -agi9 time     $urlget(%id).time
  echo -agi9 reply    $urlget(%id).reply

  echo 4 -agi9 speed    $bytes($calc($urlget(%id).rcvd * 1000 / $urlget(%id).time)).suf $+ /s

  if ($urlget(%id).type == binvar) && ($bvar($urlget(%id).target,0)) {
    echo -agi9 response $bvar($urlget(%id).target,1-3000).text
  }
}
Posted By: Khaled Re: $urlget bugs / discussion - 13/03/19 02:52 PM
Quote:
Consider this example. Again, it's the Content-Disposition response header that's intended for a server specified filename.

Thanks, I am aware of this header, however the WinInet Query Info page lists HTTP_QUERY_CONTENT_DISPOSITION as obsolete. Puzzling. I will see if I can add support for it in the next version.
Posted By: Khaled Re: $urlget bugs / discussion - 14/03/19 12:49 AM
Quote:
Also, where are these temp files stored? I may now have many partial files left over from crash testing.

If you are telling $urlget() to save the result to a &binvar, no files are created. It will save the result to the &binvar.

If you are downloading a large file to a &binvar, I expect it would be very easy for your system to run out of memory, resulting in a crash.
Posted By: SykO Re: $urlget bugs / discussion - 14/03/19 04:54 AM
usr:pass seems to be the problem. I tested sending the Authorization header as follows:
Code:
bset -t &header 1 Authorization: Basic $encode(usr:pass,m)
noop $urlget(http://localhost:port/,gf,&target,noop,&header)


and it works
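To compare against what WinInet sends (for example in SmartSniff), the header implied by `http://usr:pass@host:port/` can be computed independently. In the post, `$encode(usr:pass,m)` is mIRC's MIME (Base64) encoding, and this Python sketch does the same starting from the URL. `basic_auth_header` is a hypothetical helper. Encoding the credentials as UTF-8 is an assumption, since RFC 7617 leaves the charset to the server unless it advertises one.

```python
import base64
from urllib.parse import unquote, urlsplit


def basic_auth_header(url):
    """Build the Authorization header implied by http://user:pass@host:port/.

    Returns (header_line, url_without_credentials), or (None, url) when the
    URL carries no username.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return None, url
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Drop the "user:pass@" prefix so credentials only travel in the header.
    host = parts.netloc.rpartition("@")[2]
    clean = parts._replace(netloc=host).geturl()
    return f"Authorization: Basic {token}", clean
```

For `usr:pass` both approaches should put `Authorization: Basic dXNyOnBhc3M=` on the wire.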
Posted By: Khaled Re: $urlget bugs / discussion - 14/03/19 04:02 PM
Quote:
usr:pass seems to be the problem

Thanks for testing this out. Unfortunately, I have not been able to reproduce this issue yet.

I tested this feature by creating a password protected folder on a website (through cPanel, htaccess, etc.). When I called $urlget() with the correct user:pass, it downloaded the page. When I called it with the wrong user:pass or none at all, it failed.

If I use SmartSniff to look at the packets sent/received, it shows the correct Authorization Basic header being sent.

If I then send the header using /bset, as in your example, it sends the same header.

Both methods work for me. The difference is that $urlget() currently uses WinInet to handle the authorization.

Which version of Windows are you using?
Posted By: Khaled Re: $urlget bugs / discussion - 14/03/19 04:32 PM
Quote:
The crash happens only with a binvar target; I have PM'd you links to two large files for testing.

Thanks, this issue has been fixed for the next version.
Posted By: Khaled Re: $urlget bugs / discussion - 14/03/19 06:24 PM
Quote:
"Bug": Downloading to binvar is slow, ~2 MB/s after 10 seconds compared to 60 MB/s for download to file.

I narrowed this down to the use of realloc() to repeatedly extend memory to store the downloaded bytes. The larger the memory, the slower realloc() gets. Pre-allocating large chunks helps a little.

Switching to a linked list implementation to store downloaded bytes makes it fast, however this leads to another problem - it needs to be reassembled at the end, which means first allocating contiguous memory to store the entire download in the binvar, effectively requiring double the amount of memory during the process.

(Currently, the memory pointer allocated during the download is assigned directly to the binvar structure, so no extra memory is needed)

In short, there does not seem to be an ideal solution to this - if we want to make the download available as a &binvar, we can opt for fast speed, double memory use, or slow speed, low memory use. Or we could just remove &binvar support and let scripters save to a file and load it as a &binvar if they need to.
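The following rough cost model in Python shows why growing the block by exactly what arrived degrades as the download gets larger. It assumes the worst case, where every realloc() has to move the existing data; real allocators can sometimes extend in place. It also shows a third option, geometric growth, which keeps a single contiguous block (so the pointer could still be handed to the binvar) while total copying stays below twice the final size. The price is unused capacity, which can usually be trimmed with a final shrinking realloc(). `growth_cost`, `exact_fit` and `doubling` are hypothetical names.

```python
def growth_cost(total, piece, grow):
    """Count reallocations and bytes moved while buffering `total` bytes
    that arrive `piece` bytes at a time.

    Worst-case model: every realloc() copies the existing data to a new block.
    `grow(capacity)` returns the capacity the growth policy asks for next.
    """
    capacity = used = reallocs = moved = 0
    while used < total:
        n = min(piece, total - used)
        if used + n > capacity:
            capacity = max(grow(capacity), used + n)
            moved += used  # bytes copied into the new block
            reallocs += 1
        used += n
    return reallocs, moved


def exact_fit(capacity):
    return 0  # grow only to exactly what is needed: one realloc per piece


def doubling(capacity):
    return max(2 * capacity, 64 * 1024)  # geometric growth, 64 KiB minimum
```

Buffering 10 MiB in 8 KiB pieces, `exact_fit` needs 1280 reallocations moving 6,705,643,520 bytes in this model, while `doubling` needs 9 moving 16,711,680 bytes.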
Posted By: Raccoon Re: $urlget bugs / discussion - 14/03/19 06:45 PM
I'd go for a compromise where, if the download is larger than, say, 1 megabyte, the download goes to a temp file and is loaded back into the &binvar when complete. You decide at what size realloc() becomes too clumsy and slow: 1 MB? 32 MB?

&binvar is going to be most handy for people performing page scraping, where the HTML/XML they're scraping never even approaches 1 MB in size.
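Raccoon's compromise can be sketched in Python: bytes stay in memory until a threshold is crossed, then everything moves to a temp file and is read back in one piece at the end. `SpillBuffer` is a hypothetical class, and the 1 MiB default simply mirrors the figure suggested above. Python's own `tempfile.SpooledTemporaryFile(max_size=...)` implements the same idea.

```python
import tempfile


class SpillBuffer:
    """Collect downloaded bytes in memory; move them to a temp file past `threshold`."""

    def __init__(self, threshold=1024 * 1024):
        self.threshold = threshold
        self._mem = bytearray()
        self._file = None  # becomes a temp file once the threshold is crossed

    @property
    def on_disk(self):
        return self._file is not None

    def write(self, data):
        if self._file is None and len(self._mem) + len(data) > self.threshold:
            self._file = tempfile.TemporaryFile()
            self._file.write(self._mem)  # move what we had so far to disk
            self._mem = bytearray()
        if self._file is None:
            self._mem += data
        else:
            self._file.write(data)

    def getvalue(self):
        """Load the complete download, e.g. to fill a &binvar at the end."""
        if self._file is None:
            return bytes(self._mem)
        self._file.seek(0)
        return self._file.read()

    def close(self):
        if self._file is not None:
            self._file.close()
```

Small page-scraping downloads never touch the disk; only large ones pay for the temp file.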
Posted By: Protopia Re: $urlget bugs / discussion - 14/03/19 07:19 PM
I would imagine there are four use cases, depending on whether there is a Content-Length field in the response header:

If we know what the content size is from the header, we should be able to allocate a &binvar of the right size from the start.

It is only when there is no content-length header that we potentially need to allocate memory several times or have multiple copies in memory.

P.S. It might be sensible to extend $urlget to include a maximum size, after which the download is terminated. Sometimes you are only interested in the <head> part of a web page, or only want to look at the beginning of a file to determine its content type. It would also help avoid situations where someone downloads a file without realising that it is way too big to fit into a &binvar, or way too big to download in a reasonable timeframe.
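With a Content-Length header, the buffer can be allocated once; without one, it has to grow as data arrives. The Python sketch below combines that with the proposed size cap. `read_body`, its `max_size` parameter and the truncate-instead-of-fail policy are all hypothetical, and `chunks` stands in for the pieces arriving from the socket.

```python
def read_body(chunks, content_length=None, max_size=None):
    """Collect a response body; return (body, truncated)."""
    if content_length is not None:
        cap = content_length if max_size is None else min(content_length, max_size)
        buf = bytearray(cap)  # single allocation sized from the header
        pos = 0
        for chunk in chunks:
            n = min(len(chunk), cap - pos)
            buf[pos:pos + n] = chunk[:n]
            pos += n
            if pos == cap:
                break  # stop reading: full body received, or cap reached
        return bytes(buf[:pos]), cap < content_length
    buf = bytearray()  # no Content-Length: grow as data arrives
    for chunk in chunks:
        if max_size is not None and len(buf) + len(chunk) > max_size:
            buf += chunk[:max_size - len(buf)]
            return bytes(buf), True
        buf += chunk
    return bytes(buf), False
```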
Posted By: Loki12583 Re: $urlget bugs / discussion - 14/03/19 08:28 PM
Allow use of the HEAD method and an option to call the alias after every state change (including after headers are received)? This way you can make some decisions without the need to have an arbitrary /timer

Edit: States similar to here? https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/readyState
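The following Python sketch shows how a callback on every state change would let a script act on the headers (for example, cancel when Content-Length is too large) without a /timer. The state numbering mirrors XMLHttpRequest.readyState. `run_download` and the convention that returning False cancels the download are hypothetical.

```python
from enum import IntEnum


class ReadyState(IntEnum):
    """Same numbering as XMLHttpRequest.readyState."""
    UNSENT = 0
    OPENED = 1
    HEADERS_RECEIVED = 2
    LOADING = 3
    DONE = 4


def run_download(headers, chunks, on_state):
    """Simulate a download, calling on_state(state, info) after every state change.

    Returning False from the callback cancels the download (returns None).
    """
    if on_state(ReadyState.OPENED, {}) is False:
        return None
    if on_state(ReadyState.HEADERS_RECEIVED, dict(headers)) is False:
        return None
    body = bytearray()
    for chunk in chunks:
        body += chunk
        if on_state(ReadyState.LOADING, {"rcvd": len(body)}) is False:
            return None
    on_state(ReadyState.DONE, {"rcvd": len(body)})
    return bytes(body)
```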
Posted By: Khaled Re: $urlget bugs / discussion - 16/03/19 09:44 AM
Originally Posted By: SykO
usr:pass seems to be the problem. I tested sending the Authorization header as follows:
Code:
bset -t &header 1 Authorization: Basic $encode(usr:pass,m)
noop $urlget(http://localhost:port/,gf,&target,noop,&header)

and it works

Is anyone else experiencing an issue with using user:password in $urlget()? I have tried to reproduce SykO's issue but in my tests, $urlget() is correctly logging into password protected folders.
Posted By: Loki12583 Re: $urlget bugs / discussion - 16/03/19 12:35 PM
I have tried user:pass even on a url which returns a redirect and still had no problem
Posted By: SykO Re: $urlget bugs / discussion - 16/03/19 05:58 PM
Might just be on my end, but I do find it weird that using the header directly works.

My OS:
Edition: Windows 10 Education
Version: 1803
OS build: 17134.648
Posted By: Khaled Re: $urlget bugs / discussion - 17/03/19 12:28 PM
Originally Posted By: SykO
Might just be on my end, but I do find it weird that using the header directly works.

My OS:
Edition: Windows 10 Education
Version: 1803
OS build: 17134.648

Hmm. If it isn't working for you, it will likely not work for others as well.

If you use SmartSniff to monitor http packets, can you see if mIRC is sending the Basic Authorization header? This will only be visible with http and not https. For https, you would need to install something like Fiddler (which installs a root certificate to enable https decryption).

Also, can you let me know the type of server you are logging into? Is it linux/windows/apache/cpanel/amazon/etc.? I might be able to set up a similar service to test it out.
Posted By: 0nslaught Re: $urlget bugs / discussion - 17/03/19 03:40 PM
Originally Posted By: Khaled
Quote:
Add a switch to control how deep redirects are followed: -dN, where N = 0 allows unlimited redirects and N > 0 follows at most N redirects.

Currently, $urlget() gives up after 10 redirects. It does not detect cyclical redirections. As far as I know, most browsers have a redirect limit of between 10 and 20 redirects. Instead of adding an option for this, I would rather make it behave in a standard way. I could increase the limit to 20, but 10 seems reasonable? I would not want to allow infinite redirects.


How about a switch to disable auto-redirects?
Posted By: Loki12583 Re: $urlget bugs / discussion - 13/04/19 12:01 AM
$urlget() does not process relative redirects properly. It seems to simply append the location to "scheme://domain/". It should instead construct an effective uri as described here:

https://tools.ietf.org/html/rfc7231#section-7.1.2
https://tools.ietf.org/html/rfc7230#section-5.5

Bug 1: $urlget() reverts to default port (80/443) when processing relative redirect from non-default port: http://localhost:8080/

Bug 2: $urlget() does not construct effective request uri. A request to "/relative/sub" with a redirect "./sub2" should create a new request for "/relative/sub2"

Bug 3: $urlget() adds an extra "/" with relative redirect (same as bug 2 really, simply appending instead of constructing the effective uri)
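The effective-URI construction described above is ordinary RFC 3986 reference resolution. Python's `urllib.parse.urljoin` implements it: it keeps the scheme, host and port of the request, merges relative paths and removes "." and ".." segments. The sketch below uses it to reproduce the request sequence the test server in this post expects. `redirect_chain` is a hypothetical helper, and fragment inheritance (RFC 7231 section 7.1.2) is not handled.

```python
from urllib.parse import urljoin


def redirect_chain(start, locations):
    """Resolve each Location header against the URL that produced it."""
    url, chain = start, [start]
    for location in locations:
        url = urljoin(url, location)  # RFC 3986 section 5 resolution
        chain.append(url)
    return chain
```

`redirect_chain("http://localhost:8080/relative", ["/relative/sub", "./sub2"])` stays on port 8080 and ends at `http://localhost:8080/relative/sub2`, the URL the test server answers with "success".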

I have tested $urlget() against an actual nginx server using relative redirects and against a local mirc implementation (below). Once the local servers are listening Chrome dev tools can be used to cross reference behavior: http://localhost:8080/relative

Quote:
Request on 8080
> GET /relative HTTP/1.1
> Accept: */*
> Accept-Encoding: gzip, deflate
> User-Agent: mIRC
> Host: localhost:8080
> Connection: Keep-Alive
> Cache-Control: no-cache
< HTTP/1.1 302 Temporary Redirect
< Location: /relative/sub
< Connection: close
< Content-Length: 0
<
-
Request on 80
> GET //relative/sub HTTP/1.1
> Accept: */*
> Accept-Encoding: gzip, deflate
> User-Agent: mIRC
> Host: localhost
> Connection: Keep-Alive
> Cache-Control: no-cache
< HTTP/1.1 302 Temporary Redirect
< Location: ./sub2
< Connection: close
< Content-Length: 0
<
-
Request on 80
> GET /./sub2 HTTP/1.1
> Accept: */*
> Accept-Encoding: gzip, deflate
> User-Agent: mIRC
> Host: localhost
> Connection: Keep-Alive
> Cache-Control: no-cache
< HTTP/1.1 200 OK
< Connection: close
< Content-Length: 7
< failure
-
url      http://localhost:8080/relative
redirect http://localhost/./sub2
method   get
type     binvar
target   &target
alias    urlget.callback
id       1066
state    ok
size     7
resume   0
rcvd     7
time     296
reply    HTTP/1.1 200 OKConnection: closeContent-Length: 7
response failure


Code:
alias urlget.test {
  urlget.listen
  
  var %url = $iif($1,$1,http://localhost:8080/relative)
  var %id = $urlget(%url,gb,&target,urlget.callback,)
}

alias urlget.callback {
  var %id = $1
  
  echo -ag -
  echo -agi9 url      $urlget(%id).url
  echo -agi9 redirect $urlget(%id).redirect
  echo -agi9 method   $urlget(%id).method
  echo -agi9 type     $urlget(%id).type
  echo -agi9 target   $urlget(%id).target
  echo -agi9 alias    $urlget(%id).alias
  echo -agi9 id       $urlget(%id).id
  echo -agi9 state    $urlget(%id).state
  echo -agi9 size     $urlget(%id).size
  echo -agi9 resume   $urlget(%id).resume
  echo -agi9 rcvd     $urlget(%id).rcvd
  echo -agi9 time     $urlget(%id).time
  echo -agi9 reply    $urlget(%id).reply

  if ($urlget(%id).type == binvar) && ($bvar($urlget(%id).target,0)) {
    echo -agi9 response $bvar($urlget(%id).target,1-3000).text
  }
}

alias urlget.listen {
  if (!$sock(urlget.listen)) socklisten -d 127.0.0.1 urlget.listen 80
  if (!$sock(urlget.listen2)) socklisten -d 127.0.0.1 urlget.listen2 8080
}

on *:socklisten:urlget.listen*:{
  echo -ag -
  echo 4 -ag Request on $iif($sockname == urlget.listen,80,8080)
  var %sockname = urlget.client. $+ $ticks
  if ($sock(%sockname)) return

  sockaccept %sockname
}

on *:sockread:urlget.client.*:{
  var %header

  if (!$sock($sockname).mark) {
    sockread %header
    while (%header != $null) {
      if ($regex(%header,/GET (\S+)/)) {
        var %request = $regml(1)
      }
      echo 3 -ag > %header
      if ($regex(%header,Content-Length: (\d+))) {
        hadd -m $sockname content-length $regml(1)
      }
      sockread %header
    }
    if ($sockbr) sockmark $sockname $true
  }

  if ($sock($sockname).mark) && ($sock($sockname).rq) {
    sockread &read

    while ($sockbr) {
      hinc $sockname content-read $sockbr
      echo 6 -agi2 > $bvar(&read,1-3000).text

      sockread &read
    }
  }

  if ($hget($sockname,content-length) == 0) || ($v1 == $hget($sockname,content-read)) {
    var %redirect, %data

    if      (/relative/sub2 == %request)  %data = success
    else if (/relative/sub isin %request) %redirect = ./sub2
    else if (/relative isin %request)     %redirect = /relative/sub
    else %data = failure

    noop $socket.respond($sockname,%data,%redirect)
  }
}

alias -l sockwrite {
  echo 12 -ag < $3-
  sockwrite $1-
}

alias -l socket.respond {
  var %sockname = $$1, %data = $2, %redirect = $3
  if ($3) sockwrite -n %sockname HTTP/1.1 302 Temporary Redirect
  else    sockwrite -n %sockname HTTP/1.1 200 OK
  if ($3) sockwrite -n %sockname Location: %redirect
  sockwrite -n %sockname Connection: close
  sockwrite -n %sockname Content-Length: $len(%data)
  sockwrite -n %sockname $+($crlf,%data)
}
Posted By: Khaled Re: $urlget bugs / discussion - 16/04/19 03:24 PM
Quote:
$urlget() does not process relative redirects properly.

Thanks for testing these out.

Fixing these has turned out to be more complicated than I expected. I am trying to use standard Windows APIs for URL parsing, but their limitations mean that I need to either write my own custom URL parsing routine, which is not advisable, or use a URL parsing library. Many of these exist, at differing levels of sophistication and size, ranging from tens to thousands of lines of code. In addition, it's hard to know how well-tested these libraries are. Some of them implement recent security fixes.

As for relative redirects, again, I really should be using rfc-compatible, established, tested code instead of writing my own. I will need to look into this.
Posted By: Khaled Re: $urlget bugs / discussion - 19/04/19 05:44 PM
Thanks for the test script. The latest beta implements several changes to $urlget() that should fix the issues mentioned in your post.
© mIRC Discussion Forums