#260851 25/06/17 02:23 PM
Wims (OP), Hoopy frood
Joined: Jul 2006, Posts: 4,149
As I mentioned in this thread, we don't have access to the full match, or its position, found by the PCRE engine. It's not needed most of the time, but in some cases it can be useful.

I'd use it for a bot that shows the capturing groups and what was matched: for example, !regex /b+a.*c/ bbbbadddc would display bbbbadddc with the full match marked.

My proposed syntax in that link is not so great, though: it doesn't let us get the position of the full match.
So I suggest $regmlex([name],M,[N],full) to return the full match, on which the .pos property could be used.
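For comparison, this is roughly what the bot use case looks like with an API that does expose the full match. A minimal Python sketch (Python's re module, not mIRC; full_matches and highlight_matches are hypothetical helper names) that lists every full match with a 1-based position, as the proposed .pos would, and wraps each match in the mIRC bold control code \x02 for display:

```python
import re

BOLD = "\x02"  # mIRC bold control code (Ctrl+B)


def full_matches(pattern, text):
    """Return (full match, 1-based position) for every match, like a /g regex."""
    return [(m.group(0), m.start() + 1) for m in re.finditer(pattern, text)]


def highlight_matches(pattern, text):
    """Wrap each full match in bold control codes so a bot can show it."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[last:m.start()])      # text before the match
        pieces.append(BOLD + m.group(0) + BOLD)   # the match itself
        last = m.end()
    pieces.append(text[last:])                    # text after the last match
    return "".join(pieces)


print(full_matches(r"b+a.*c", "bbbbadddc"))       # [('bbbbadddc', 1)]
print(full_matches(r"b+a", "xbbbay"))             # [('bbba', 2)]
print(repr(highlight_matches(r"b+a", "xbbbay")))  # 'x\x02bbba\x02y'
```

The highlighting only works because the start and end of each full match are known, which is exactly the information the proposed syntax would expose.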


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Nutrimatic drinks dispenser
Joined: Jun 2008, Posts: 6
I also support this idea.

Fjord artisan
Joined: Feb 2006, Posts: 546
I also add my support, because it can be useful, and scripting this to work in the general case is difficult but possible!

I finally found a way to do this that doesn't involve completely parsing the expression.

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {

  ; parse the full regex as mIRC does
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)

  ; isolate PCRE options from the start of the expression
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)

  ; validate expression
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }

  var %char, %exp

  ; find suitable placeholder char
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1

  ; add \K onto end of given expression
  ; first wrap entire expression in (?: ), adding \E in case of unterminated \Q
  %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))

  ; construct a second expression by transforming the result of subbing the placeholder char where the matches occurred
  ; run this against the result of subbing the placeholder into the matches of the \K-modified expression above
  ; and hey presto, you can find the matches with no ambiguity 
  noop $regex(final, $regsubex($1, %exp, %char), $+(/\Q, $replace($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))

  ; if it's a position we seek, we need to subtract all placeholders factored into the result
  if ($prop == pos) return $calc(1 + $regml(final, $3 1).pos - $3 1)
  returnex $regml(final, $3 1)
}


example usage:

Code:
//var -s %str = babababababa, %re = /ba(?=ba$)|baba(?=bababa$)/g | echo -eag $regsubex(%str, %re, X) - $regexm(%str, %re, 1).pos - $regexm(%str, %re, 2).pos


Not thoroughly tested, but I just wanted to get you your Christmas present quickly. The basic principle is to modify the expression slightly by adding \K on the end, then compare the result of substituting normally with the result of substituting with \K. That tells you all you need to figure out exactly which substring(s) matched!
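To make that principle concrete, here is a rough Python sketch of the recovery step (not mIRC code; recover_matches and the U+0800 placeholder are assumptions). Python's built-in re module has no \K, so the sketch emulates the \K substitution by keeping each match and inserting the placeholder right after it, which assumes the \K-modified expression finds the same matches as the original. Each placeholder in the normal substitution then becomes (.*?) plus the placeholder, each capture yields one full match, and its position is shifted back by the n-1 placeholders that precede it.

```python
import re

PLACEHOLDER = "\u0800"  # assumed absent from the subject, like %char in the alias


def recover_matches(pattern, subject, ph=PLACEHOLDER):
    """Recover (full match, 1-based position) pairs from two substitutions."""
    if ph in subject:
        raise ValueError("placeholder occurs in the subject")
    # Normal substitution: each match is replaced by the placeholder.
    normal = re.sub(pattern, lambda m: ph, subject)
    # Emulated \K substitution: each match is kept and the placeholder is
    # inserted right after it (Python's re has no \K).
    keep = re.sub(pattern, lambda m: m.group(0) + ph, subject)
    # Quote the literal pieces of `normal` and turn every placeholder into
    # (.*?) followed by the placeholder itself.
    pieces = normal.split(ph)
    rebuilt = "".join(re.escape(p) + "(.*?)" + re.escape(ph) for p in pieces[:-1])
    rebuilt += re.escape(pieces[-1])
    found = re.fullmatch(rebuilt, keep, re.DOTALL)
    if found is None:
        return []
    results = []
    for n in range(1, len(pieces)):
        # n - 1 placeholders precede the nth capture in `keep`.
        results.append((found.group(n), found.start(n) - (n - 1) + 1))
    return results


print(recover_matches(r"ba(?=ba$)|baba(?=bababa$)", "babababababa"))
# [('baba', 3), ('ba', 9)]
```

For the example usage above, the normal substitution alone gives baXbaXba, which does not say how long each match was; the comparison recovers baba at position 3 and ba at position 9.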


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Fjord artisan
Joined: Feb 2006, Posts: 546
Sorry, glaring oversight: the $replace() in the noop $regex(final, ...) line, fourth from the end, should of course be $replacecs().
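To see why the case-sensitive version matters, here is a small Python sketch of the quoting step alone (Python's re has no \Q...\E, so it only builds the pattern string; both function names are hypothetical). A literal \E in the text is rewritten as \E\\E\Q so it cannot close the quoted span early; a case-insensitive replacement like $replace() would also rewrite a literal \e, and the resulting pattern would then expect \E instead.

```python
import re


def quote_literal(text):
    # Case-sensitive, like $replacecs(): only an uppercase \E is rewritten as
    # \E (end quote) + \\E (escaped backslash, then E) + \Q (resume quoting).
    return "\\Q" + text.replace("\\E", "\\E\\\\E\\Q") + "\\E"


def quote_literal_ci(text):
    # Case-insensitive, like $replace(): a literal \e is rewritten too,
    # so the resulting pattern would match \E instead of \e.
    body = re.sub(r"\\E", lambda m: "\\E\\\\E\\Q", text, flags=re.IGNORECASE)
    return "\\Q" + body + "\\E"


print(quote_literal(r"a\Eb"))     # \Qa\E\\E\Qb\E
print(quote_literal(r"a\eb"))     # \Qa\eb\E
print(quote_literal_ci(r"a\eb"))  # \Qa\E\\E\Qb\E  (the \e was mangled)
```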

Wims (OP), Hoopy frood
Joined: Jul 2006, Posts: 4,149
@jaytea: I finally found the time to put this alias to use, but I quickly ran into an issue:

Code:
//echo -a $regexm(eaze ezrrazer zr5ze45ra5z5 t,/((\w+)\S+)*/Fg,1)
This returns nothing.

The debug output shows that the final pattern built in that code:
Code:
$+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u)
requires 8 placeholder characters (the $chr($r(2048, 55295)) char), but the string it is tested against contains only 5 of them, so it can never match.


@Khaled: could we hear from you about this? Is it on your to-do list somewhere?

Wims (OP), Hoopy frood
Joined: Jul 2006, Posts: 4,149
Also, regarding the suggested syntax: $regmlex([name],M,-1) would be a better way to return the full match of the Mth match.
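A rough Python model of that indexing, for illustration only: regmlex below is a hypothetical Python function, not the mIRC identifier, and it takes the pattern and text directly instead of a stored match name. M is the 1-based match number, N the 1-based capture group, N = -1 selects the full match, and the second element of the result plays the role of .pos.

```python
import re


def regmlex(pattern, text, m, n):
    """Hypothetical model of $regmlex(M, N): N = -1 selects the full match.

    Returns (value, 1-based position) or None when M or N is out of range.
    """
    matches = list(re.finditer(pattern, text))
    if not 1 <= m <= len(matches):
        return None
    match = matches[m - 1]
    if n == -1:
        group = 0                      # full match
    elif 1 <= n <= match.re.groups:
        group = n                      # ordinary capture group
    else:
        return None
    if match.start(group) == -1:       # group did not take part in this match
        return None
    return match.group(group), match.start(group) + 1


print(regmlex(r"(\d)(\d)", "a12b34", 2, -1))  # ('34', 5)
print(regmlex(r"(\d)(\d)", "a12b34", 2, 2))   # ('4', 6)
print(regmlex(r"(\d)(\d)", "a12b34", 3, -1))  # None
```

Using -1 keeps every positive N meaning "capture group N", so existing scripts would not change behaviour.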

Fjord artisan
Joined: Feb 2006, Posts: 546
So I forgot to post the solution to this. It turns out the problem was related to the "empty string with //g" issue that plagues PCRE v1's demo code. Here's the fix:

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {
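  ; parse the full regex as mIRC does
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)
  ; isolate PCRE options from the start of the expression
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)
  ; validate expression
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }
  var %char, %exp
  ; find suitable placeholder char
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1
  ; add \K onto end of given expression, wrapped in (?: ) with \E in case of unterminated \Q
  var %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))
  ; substitute the placeholder at the \K-modified matches (the trailing " ." is removed again with $left(, -2))
  var %str = $regsubex($1, %exp, %char) .
  ; fix for the "empty string with //g" issue: if %str holds fewer placeholders than there are matches,
  ; record which placeholders in the normal substitution are directly followed by another one,
  ; then insert that extra placeholder after the corresponding placeholder in %str
  if (!$pos(%str, %char, $regex($1, $2))) {
    noop $regex(check, $regsubex($1, $2, %char), / %char ( %char ?)/gxu)
    %str = $regsubex(fix, $left(%str, -2), / %char \K/gxu, $regml(check, \n)) .
  }
  ; build a literal pattern from the normal substitution, turning each placeholder into (.*?) followed by the placeholder,
  ; and run it against %str so that each capture holds one full match
  noop $regex(final, $left(%str, -2), $+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))
  ; for .pos, subtract the N-1 placeholders that precede the Nth capture
  if ($prop == pos) && ($3. isnum 1- $regml(final, 0)) return $calc(1 + $regml(final, $3).pos - $3)
  returnex $regml(final, $3 1)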
}
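On the "empty string with //g" issue: when a pattern can match the empty string, a global match loop has to decide what to do right after a match, and engines (and versions) do not all agree. A small Python illustration (Python's re, not mIRC's PCRE): since Python 3.7, an empty match directly after a non-empty one is also reported, so this example from the Python documentation returns -a-b--d- where earlier versions returned -a-b-d-. A \K-modified expression always reports an empty match at the end of each real match, which is presumably why the two substitutions in the alias could end up with different placeholder counts.

```python
import re

subject = "abxd"
pattern = "x*"  # can match the empty string

# Global substitution: every reported match, empty or not, gets a "-".
substituted = re.sub(pattern, "-", subject)
# The same matches as (start, end) spans; (3, 3) is the empty match that
# directly follows the non-empty match of "x" at (2, 3).
spans = [m.span() for m in re.finditer(pattern, subject)]

print(substituted)  # -a-b--d-  (Python 3.7+)
print(spans)        # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
# One placeholder per match, which is the invariant the alias relies on.
print(substituted.count("-") == len(spans))  # True
```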

Ameglian cow
Joined: Oct 2017, Posts: 47
I was looking for the same thing, and wondering why mIRC doesn't already include a $regexm() identifier.

It would be pretty useful if it could be included in the next version.

