
#266468 16/12/19 01:11 AM
Joined: Dec 2002
Posts: 252
Talon Offline OP
Fjord artisan
I've recently been toying with some HTML scraping using $urlget() and ran across a situation.

There are several parts of the document I need, all of them between the <head></head> tags, so I'd like to avoid processing the entire file. I'm using $read() to find the line so I can adjust the range of a /filter.

On certain pages, I got an error from /noop with this line:

Code
noop $read($urlget($1).target,w,*</head>*)


Code
-
* /noop: line too long (line 293, aliases.ini)
-


I don't really need the line itself, just $readn, to know where to stop my /filter with the -r flag for range.

The odd part is the workaround: I later wrapped the $read() in a bogus $regex() and it works without issue.

Code
noop $regex($read($urlget($1).target,w,*</head>*),//)


I just found it odd that there are identifiers that can handle larger line lengths than noop. It's basically like catting to null, literally no operation, so it should be able to handle this, since it's told to do nothing with the value.
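
For illustration, the workaround combined with $readn and a ranged /filter might look roughly like the sketch below. The alias name scrapehead and the output file head.txt are hypothetical, and the check for $readn being 0 when nothing matched is an assumption; the actual script was not posted.

```mirc
; Hypothetical alias: copies lines 1..N of the downloaded file, where N is
; the line containing </head>. $1 is the $urlget() id, as in the post.
alias scrapehead {
  var %in = $urlget($1).target
  ; Assumed output path, purely for illustration.
  var %out = $qt($mircdir $+ head.txt)
  ; Wrapping $read() in $regex() keeps its long return value off the
  ; command line that /noop receives.
  noop $regex($read(%in,w,*</head>*),//)
  ; Assumption: $readn is 0 when no line matched.
  if ($readn == 0) { return }
  ; -ff = file to file, -c = clear the output first, -r = line range n-n2
  filter -ffcr 1- $+ $readn $qt(%in) %out *
}
```

It would be called as /scrapehead <id> once the $urlget() transfer has completed.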

If you need a mock-up file for an example of what I'm doing to illustrate this behavior, let me know, I'll draft up some examples.

Joined: Dec 2002
Posts: 5,412
Hoopy frood
Thanks for your bug report. This behaviour does actually look right. What is happening is this:

In the non-regex call, the final command is /noop combined with the results of $read(). $read() is not returning an error (and never has for long strings). This combination exceeds the maximum line length allowed for a command. So an error is reported.

In the regex call, /noop is combined with nothing. The $read() is evaluated into a single parameter of maximum line length. This is passed directly to $regex() with // which results in an empty string. So the entire final command is just /noop.
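
The two cases can be sketched side by side. The alias names are hypothetical, and the comments only restate the explanation above about what the final command line contains; they are not a description of mIRC's internals.

```mirc
; Hypothetical alias names; $1 is the $urlget() id, as in the original post.

; Direct call: the text returned by $read() becomes part of the command
; line, so the final command is "noop <entire matched line>", which can
; exceed the maximum line length.
alias read_direct {
  noop $read($urlget($1).target,w,*</head>*)
}

; Wrapped call: the text is only a parameter of $regex(), and the short
; return value of $regex() is what ends up on the command line after /noop.
alias read_wrapped {
  noop $regex($read($urlget($1).target,w,*</head>*),//)
}
```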

Joined: Jul 2006
Posts: 4,145
Hoopy frood
$regex returns an integer, not an empty string (// or an empty regex always matches, so it returns 1), but your point remains the same and the behavior is correct.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Dec 2002
Posts: 252
Talon Offline OP
Fjord artisan
You missed the point. Obviously the regex returns a smaller value, 1, which is plenty short enough for noop. The point is the need for such a workaround, as noop can't handle a line length that the regex identifier can. I'll have to do some testing with capture groups, maybe /(.)/g, so the return is a match result larger than 1, and compare this to the actual line length in question to see where they differ in the size of the string processed.

If this is indeed the case, does it hold true for any and all identifiers? Are they automatically chomping a value to the maximum line length? Why can't noop follow this behavior as well? Thought noop was for when you need to do nothing with a return, hence no operation, much like linux/unix you'd cat to /dev/null for instance.

Joined: Dec 2002
Posts: 5,412
Hoopy frood
Quote
Thought noop was for when you need to do nothing with a return, hence no operation, much like linux/unix you'd cat to /dev/null for instance.

That is indeed what it is for. However, /noop is still a command and its parameters need to be evaluated. This means that a command line needs to be formed and if it is longer than the maximum allowed line length, it will report an error.

Joined: Aug 2003
Posts: 319
Pan-dimensional mouse
My obvious thought is the following:

Code
alias -l nil { return $null }

/noop $nil($read($urlget($1).target,w,*</head>*))


Would this solve the issue?

