#232780 21/06/11 09:57 PM
Joined: Apr 2010
Posts: 969
F
Hoopy frood
OP Offline
If you use $N[-[n]] in the replacement parameter of $regsubex, $N is treated as \N instead of as the Nth token in $1-.

There are two ways around this. The first is to use variables.
The second is to use []'s, which can be quite exploitable:
Code:
alias exp {
  tokenize 1 $!findfile($mircdirtest,*,0,return $!?!="See?")
  echo -a Input: $1

  ; This evaluates $1, then evaluates the contents of $1
  echo -a Subbed With []'s:    $regsubex(a,$1-,/^.*(See).*$/i,\1 = [ $1 ] )

  ;This treats $1 as though it were \1
  echo -a Subbed Without []'s: $regsubex(b,$1-,/^.*(See).*$/i,\1 = $1 )

  echo -a Correct Output: See = $!findfile($mircdirtest,*,0,return $!?!="See?")
}
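
For comparison, the first workaround (copying the token into a variable before calling $regsubex) might look like the sketch below. The alias name exp_var and the sample strings are made up for illustration, and the behaviour of the bare $1 is as reported in this thread rather than something verified separately.
Code:
alias exp_var {
  ; Sample input: "hello" becomes $1 and "world" becomes $2
  tokenize 32 hello world

  ; Copy the token into a local variable before $regsubex runs
  var %first = $1

  ; Per the report above, a bare $1 in the replacement acts like \1
  echo -a Bare token: $regsubex(abc, /(b)/, \1 = $1)

  ; The variable still holds the original token, and its content is not evaluated again
  echo -a Variable: $regsubex(abc, /(b)/, \1 = %first)
}

The cost is one extra variable for every token used in the replacement.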


I am SReject
My Stuff
Joined: Jul 2006
Posts: 4,145
W
Hoopy frood
Offline
I would also like to see this old issue fixed; the backreferences shouldn't use the same array as the one used for $N.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
I think there's a technical reason / limitation that $N is used internally during regex. It might be avoidable (read: it's most likely avoidable), but I'm not too sure it's worth avoiding. It might involve a lot more code complexity and introduce more bugs, simply to add a small extra convenience. I also don't see what's so bad about the "workarounds".

I actually see this as a feature, not a bug. Other languages like Perl put the regex match inside $N, although of course $N is dedicated to matches in those languages. In some sense, you can see $N inside regex as a tokenized list of the resulting matches. Of course this means wiping out the previous $N, but the behaviour is similar to $v1/$v2 inside nested ifs (or even more appropriately, $1- inside of $findfile()). You could see the replace part of $regsubex as a nested tokenization context for $N.
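
The $findfile() behaviour mentioned above can be seen in a short sketch like the following. The alias name list_txt and the *.txt wildcard are arbitrary choices for illustration. Inside the command parameter, $1- refers to the file that was just found rather than to the alias's own parameters.
Code:
alias list_txt {
  ; For every matching file, mIRC runs the command with $1- set to that file's path
  noop $findfile($mircdir, *.txt, 0, echo -a Found: $1-)
}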


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
argv0 #232789 22/06/11 05:01 AM
Joined: Jul 2006
Posts: 4,145
W
Hoopy frood
Offline
I just totally disagree.

Quote:
I think there's a technical reason / limitation that $N is used internally during regex
I'm really curious: what limitation or technical reason would make Khaled use the internal data of $N for \N, instead of some other "internal data"?

Quote:
It might involve a lot more code complexity and introduce more bugs, simply to add a small extra convenience
If the current implementation uses $N and works, where is the complexity, and why would changing it introduce more bugs?

Quote:
I also don't see what's so bad about the "workarounds".
Using [ ] isn't an option, since it would double-evaluate the content (and working around that problem would itself make the code very ugly). The same goes for the variable approach: it's fine if you use $1 once, but if you use $1 $2 $3 $4 $N you have to create a variable for each $N. That's not really a problem in itself, but having to use a workaround here means there's a problem somewhere in the current implementation, and that is what is so bad.

Quote:
I actually see this as a feature
What is the feature exactly, losing the previous tokenization? There's already \N to get the matches, so why would you use $N, or see $N as a feature here? How is that comparable to $v1 and $v2? It's completely normal to have a new value for $v1 and $v2 after each if statement; it's not normal at all for $N to be the same as \N in this case. The $findfile example is not comparable either: you can retrieve the value of the previous tokenization inside the command parameter and still get the path too (the help file clearly states that $1- will be filled with the path anyway, but using an identifier as the first token makes $N available, and $!N returns the path).

Last edited by Wims; 22/06/11 05:10 AM.

argv0 #232790 22/06/11 05:02 AM
Joined: Mar 2010
Posts: 57
W
Babel fish
Offline
Originally Posted By: argv0
I think there's a technical reason / limitation that $N is used internally during regex. It might be avoidable (read: it's most likely avoidable), but I'm not too sure it's worth avoiding. It might involve a lot more code complexity and introduce more bugs, simply to add a small extra convenience. I also don't see what's so bad about the "workarounds".


Without seeing any actual code, we are just making wild guesses. However, mIRC does use $N (and $+, although we can't really touch that one) internally, piping it through the interpreter to perform the substitutions. This was probably the easiest way to accomplish it. It was likely avoidable (and perhaps still is).

Originally Posted By: argv0
I actually see this as a feature, not a bug. Other languages like Perl put the regex match inside $N, although of course $N is dedicated to matches in those languages. In some sense, you can see $N inside regex as a tokenized list of the resulting matches. Of course this means wiping out the previous $N, but the behaviour is similar to $v1/$v2 inside nested ifs (or even more appropriately, $1- inside of $findfile()). You could see the replace part of $regsubex as a nested tokenization context for $N.


I have to strongly disagree with you on that one; it makes simple operations overly convoluted. A simple `return $regsubex(...., $1 foobar $2 foobar $3)', for example, has to be transformed into `var %a = $1, %b = $2, %c = $3 | return $regsubex(...., %a foobar %b foobar %c)' for no real reason other than the fact that $N is being forcefully overwritten outside our control (or perhaps that's what you call a feature). Most scripts out there resort to some variation of that, mostly by storing the entire replacement in one variable and using that variable as the substitution. This should absolutely not be necessary, unless of course: it's a feature!

Languages that use $N replacement have that as a dedicated feature; they don't alter the values of things against the scripter's will, i.e. they aid usability rather than hinder it.

Don't compare apples to oranges; they don't behave the same. $N is nothing like $v1/$v2. Those don't change values during the comparison itself, only after, which is exactly what these identifiers should be doing. In ((1 == 2) || ($v2 == 2)), we know $v2 is 2 and we also know it won't magically change to something we don't want it to.

Lastly, using evaluation brackets alone to place the content of $N before the substitution is completely out of the question, as $regsubex will evaluate it again, allowing mSL injection to take place. We *could* work around the workaround by telling mIRC not to evaluate the content of the identifiers we previously evaluated, via some hacky crap like $regsubex(foo, /(.)/, $( [ $1 ] ,0) ). Of course this still isn't bulletproof, as someone could easily enter something along the lines of ' $ip)(' and evade our trick.

Last edited by Wiz126; 22/06/11 05:37 AM.
Wims #232791 22/06/11 05:35 AM
Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
Originally Posted By: Wims
If the actual implementation use $N and is working, where is the complexity, and why would it introduces more bugs ?


It's hard to say, but judging by the use of $N, it's likely that mIRC uses some internal "tokenize" function on the resulting data, leading to $1- being filled. If so, mIRC is using this tokenized data internally to return the data for the external \N API. That means mIRC would have to change how it behaves internally, which can clearly lead to regressions that could even end up affecting $1- more globally, because one way of refactoring the code would mean changing how the internal tokenize function itself works.


Wiz126 #232792 22/06/11 05:49 AM
Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
Originally Posted By: Wiz126
Don't compare apples to oranges, they don't behave the same. $N is nothing like $v1/$v2. They don't change values during the comparisons itself, only after, which is exactly what these identifiers should be doing. ((1 == 2) || ($v2 == 2)), we know $v2 is 2 and we also know it won't magically change to something we don't want it to.


How is this an apples to oranges comparison? It's a situation where global state can be modified based on control flow. $v2 will "magically change" through a subsequent if statement the same way $1 will "magically change" through a subsequent tokenization context. The difference is you're not expecting $regsubex to create a new tokenization context-- but such a behaviour is not unique to $regsubex in mIRC; $findfile does the same thing. In short, there is precedent for $N- being wiped inside identifiers for whatever reason. In this case the reason is admittedly less useful to the user, and I certainly see it as a limitation, but I'm not convinced it's a behaviour that needs changing.

I'm dubious about your "convoluted" argument. Sure, there are some isolated situations where a $regsubex becomes more complicated than it could be if $N worked. Arguing about convenience syntax is a slippery slope though, and we could make that argument in many many places. How much easier would it be if we could specify nicknames in ON TEXT event as matchers-- or network names? Certainly easier-- is it worth the trouble? *shrug*.

How often does a situation really arise where you're using 3 or more $N values inside of a regex? Most likely situations like that are complicated enough already and have some kind of local variable setup phase, so throwing in a few new variables isn't a big deal.

Of course this is all as hypothetical as your example. We don't really know how often this bites people. My experience is completely anecdotal and just one small data point, but I can't remember this ever being a problem for me personally. In such a case, I'd opt for caution-- it seems smarter to choose a robust mIRC over the benefit of fixing the syntax in one specific isolated (potentially uncommon) situation.


Joined: Feb 2006
Posts: 546
J
Fjord artisan
Offline
the function of $regsubex(), as we all know, is to perform substitutions first then evaluate afterwards. this must have originally posed a problem:

Code:
$regsubex($chr(40), /(.)/g, $len(\1))


if substitutions were performed immediately, with no intermediate steps such as there now are, this would of course throw an error ("$len(()" -> invalid format). so, clearly something needs to be done to the substitution parameter before it gets passed to the interpreter to avoid double evaluations and other undesirable behaviour. we could suggest that a modification be made to the core evaluation function, such that any regsubex markers encountered as tokens will have their values resolved as appropriate. this way, for example, '$len(\1)' gets passed to the interpreter, and '\1', which would ordinarily be plaintext, would instead evaluate to the first backref.

this would solve the reported problem, but would require changes made to a core function which must already be massively complex. we can forgive Khaled for implementing it the way he has: no changes to the core function, and manipulating the substitution parameter in such a way that it can be passed to the existing evaluator and have it yield the correct output.

currently, regsubex markers are replaced from left to right as $1, $2, $3, etc. (with optional $+s to ensure correct output) and then the result is evaluated in an environment where each $N returns the value of the corresponding marker. this obviously stops any pre-existing $N in the substitution parm from evaluating correctly. in hindsight, we can groan about the limitations of this approach, but can we suggest a decent alternative? the truth is, there isn't really a much more sensible alternative that works with the current substitution model. obscure variable names? some hidden internal identifier that functions as $1- (eg. $`~(1) $`~(2)) :P?
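
As a rough illustration of the rewriting described above: this reflects jaytea's inference about the internals, not anything confirmed by Khaled, and the alias name marker_demo is hypothetical. A replacement with a marker sitting between literal characters would be handled roughly as the comments show.
Code:
; written by the scripter:       x\1y
; rewritten before evaluation:   x $+ $1 $+ y
; then evaluated with $1 temporarily set to the text captured by \1
alias marker_demo {
  echo -a $regsubex(abc, /(b)/, x\1y)
}

Any $1 the scripter typed into the replacement would be evaluated in that same temporary environment, which is why it picks up the capture instead of the original token.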

there is, however, a solution that doesn't involve much of a change to Khaled's existing code. we know that he iterates through the substitution parm looking for markers such as \1, \2, \0, \a, \n, etc. and maps one marker to one $N. why not include any pre-existing $N in this mapping? for example:

Code:
//tokenize 32 x | $regsubex(abc, /(a)/, \1 - $1)


then for the substitution parm, \1 gets converted to $1 as usual, but this time it takes $1 in the parm and turns it into $2, then sets up the sub parm for evaluation in an environment where $1 = <value of \1> and $2 = <value of original $1>. ie:

Code:
//tokenize 32 x | $regsubex(abc, /(a)/, \1 - $1)

; set up substitution parm and have $1 = a, $2 = x

... /!returnex $1 - $2



if Khaled wants to fix this problem with little additional work, i think this is the most comfortable approach to doing so.

edit: just realized an obvious drawback of my proposal ;( the replacing of regsubex markers is clearly different than the evaluation of $N (or identifiers in general) in a line of code. currently, a substitution parm equal to 'a\1a' constitutes a use of \1, but 'a$2a' should be treated as plaintext and shouldn't be replaced by 'a $+ $1 $+ a'. so it wouldn't be a simple addon to the existing method, but still manageable. but then there's '$eval($2, 0)' which would get converted to '$eval($1, 0)' and evaluated to '$1'. yes, a similar problem arises presently with such things as:

Code:
//echo -a $regsubex(a, /a/, $eval(\1, 0))


but i suspect '$eval(<code with $N>, 0)' is more commonly used than '$eval(<code with regsubex markers>, 0)'


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
jaytea #232795 22/06/11 06:11 AM
Joined: Dec 2002
Posts: 344
D
Pan-dimensional mouse
Offline
Assuming that you are correct about the implementation, then why not use some unused identifiers ($temp1, $temp2, $temp3 -- name doesn't matter) which only exist in limited scope as opposed to overriding the token identifiers?

Last edited by drum; 22/06/11 06:15 AM.
drum #232796 22/06/11 06:31 AM
Joined: Feb 2006
Posts: 546
J
Fjord artisan
Offline
Originally Posted By: drum
Assuming that you are correct about the implementation, then why not use some unused identifiers ($temp1, $temp2, $temp3 -- name doesn't matter) which only exist in limited scope as opposed to overriding the token identifiers?


yeah, that's what i meant by $`~(1) $`~(2) and such :P we can only speculate, but i assume $1- was just the simplest choice at the time, and one that involved little to no modification of other code.


jaytea #232798 22/06/11 12:27 PM
Joined: Jul 2006
Posts: 4,145
W
Hoopy frood
Offline
Quote:
but can we suggest a decent alternative? the truth is, there isn't really a much more sensible alternative that works with the current substitution model. obscure variable names? some hidden internal identifier that functions as $1- (eg. $`~(1) $`~(2)) :P?
Seems better to me too, than using $N.
To avoid having to create a new built-in $identifier, and to change the handling quickly, it could be something like $*N: it's currently not possible to create a custom identifier with that name, so Khaled could use that instead of $N.


Wims #232824 25/06/11 06:48 PM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
As jaytea said later though, using any identifier other than $N would involve fiddling with the core evaluation routine, something Khaled would rather avoid unless there's a very good reason. Maybe this is good enough a reason, but it's easy to be brave if you've never seen the beast (that is the scripting engine).


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
qwerty #232829 25/06/11 09:06 PM
Joined: Jul 2006
Posts: 4,145
W
Hoopy frood
Offline
I can understand that some bugs might involve too much work on some evil part of the engine, but we are not going to stop reporting bugs because they involve too much work, are we ?
I first answered saying that I would like to see this fixed because I know it means "fiddling with the core evaluation routine".
I know the chances of it being fixed are nil, but in the same way that Khaled might add a feature because people request it, he might fix a bug because people report it.
I'm just supporting the idea that this needs to be fixed; only Khaled decides whether it's a good enough reason to touch the code ;)


Wims #232831 25/06/11 09:26 PM
Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
Nobody is saying that these shouldn't be reported. I think qwerty is simply explaining why this might not be a good idea to touch, but more importantly, why it's not entirely a bug, but a limitation of the engine. No matter what Khaled chooses as a new sequence of characters in the changed eval code (if it gets touched), there will be a reserved sequence that is unusable-- this will inevitably cause problems for some users.

You might think $`~(1) will never be used by anybody, but eventually someone just might come right back to this very forum with a new "bug" saying that weird things happen when they use $`~(1) in their replacement.

Therefore the reserved sequence might as well be something that is common enough to at least be predictable. The user who ends up running into a bug with $`~(N) is going to be orders of magnitude more confused about what the problem is than a user trying to use $N, because at least there is precedent for $N being reserved in such identifiers. Burying these gotchas deeper in the engine doesn't solve the problem, it just hides it from *you*, and that inevitably makes things less predictable... and unpredictable languages are just *bad*.

I think Khaled should simply document that $N shouldn't be used within $regsubex, that at least will solve the confusion surrounding this engine limitation.


argv0 #232833 25/06/11 09:36 PM
Joined: Jul 2006
Posts: 4,145
W
Hoopy frood
Offline
Quote:
there will be a reserved sequence that is unusable-- this will inevitably cause problems for some users.

You might think $`~(1) will never be used by anybody, but eventually someone just might come right back to this very forum with a new "bug" saying that weird things happen when they use $`~(1) in their replacement.

Therefore the reserved sequence might as well be something that is common enough to at least be predictable. The user who ends up running into a bug with $`~(N) is going to be orders of magnitude more confused about what the problem is than a user trying to use $N, because at least there is precedent for $N being reserved in such identifiers. Burying these gotchas deeper in the engine doesn't solve the problem, it just hides it from *you*, and that inevitably makes things less predictable... and unpredictable languages are just *bad*.
And since the sequence must be an identifier, because it needs to be evaluated by the engine, I proposed something that is already unusable, $*N. That solution wouldn't cause problems for any user and makes nothing unpredictable, but yes, it involves more work.

Quote:
I think Khaled should simply document that $N shouldn't be used within $regsubex, that at least will solve the confusion surrounding this engine limitation.
Indeed

