Posted By: Wims Full match of a regex - 25/06/17 02:23 PM
As I mentioned in this thread, we don't have access to the full match made by the PCRE engine, or to its position. That isn't needed most of the time, but in some cases it can be useful.

I'd use it for a bot, to show the capturing group and what is matched, something like !regex /b+a.*c/ bbbbadddc would be shown as bbbbadddc for example.

My proposed syntax in that link is not so great though; it doesn't allow us to get the position of the full match.
So I suggest $regmlex([name],M,[N],full) to return the full match, where the .pos property could be used.
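
For comparison, here is a rough sketch of what "the full match and its position" means, written in Python rather than mIRC script because Python's re module exposes both directly. The helper name full_match is made up for this illustration, and Python's re is not the PCRE engine mIRC uses, so treat it only as a picture of the requested behaviour:

```python
import re

def full_match(pattern, text):
    """Return (matched_text, start_offset) for the first match, or None.

    Hypothetical helper for illustration; offsets are 0-based.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    # group(0) is the whole match, independent of any capturing groups
    return m.group(0), m.start()

result = full_match(r"b+a.*c", "xx bbbbadddc yy")
```

Here result holds the matched substring together with its offset, which is the pair of values the suggested syntax and its .pos property would expose.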
Posted By: digitok Re: Full match of a regex - 15/12/17 03:25 AM
I also support this idea.
Posted By: jaytea Re: Full match of a regex - 16/12/17 11:06 AM
i also add my support because it can be useful, and scripting this to work in the general case is difficult but possible!

i finally found a way to do this that doesn't involve completely parsing the expression

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {

  ; parse the full regex as mIRC does
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)

  ; isolate PCRE options from the start of the expression
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)

  ; validate expression
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }

  var %char, %exp

  ; find suitable placeholder char
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1

  ; add \K onto end of given expression
  ; first wrap entire expression in (?: ), adding \E in case of unterminated \Q
  %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))

  ; construct a second expression by transforming the result of subbing the placeholder char where the matches occurred
  ; run this against the result of subbing the placeholder into the matches of the \K-modified expression above
  ; and hey presto, you can find the matches with no ambiguity 
  noop $regex(final, $regsubex($1, %exp, %char), $+(/\Q, $replace($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))

  ; if it's a position we seek, we need to subtract all placeholders factored into the result
  if ($prop == pos) return $calc(1 + $regml(final, $3 1).pos - $3 1)
  returnex $regml(final, $3 1)
}


example usage:

Code:
//var -s %str = babababababa, %re = /ba(?=ba$)|baba(?=bababa$)/g | echo -eag $regsubex(%str, %re, X) - $regexm(%str, %re, 1).pos - $regexm(%str, %re, 2).pos


not thoroughly tested but i just wanted to get you your Christmas present quickly. the basic principle is to modify the expression slightly by adding \K on the end, then compare the result of substituting normally vs. with \K. that will tell you all you need to be able to figure out exactly which substring(s) matched!
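
The compare-two-substitutions idea can be sketched outside mIRC as well. The following is a loose Python illustration, not a port of the alias: Python's re has no \K, so a replacement callback stands in for it (which makes the exercise circular in Python, since the callback already sees the full match), and the function name full_matches and the fixed marker character are assumptions made for this example:

```python
import re

def full_matches(pattern, text, marker="\u2063"):
    """Recover every full match and its 0-based start by comparing two substitutions.

    Hypothetical helper; `marker` is an assumed placeholder char absent from `text`.
    """
    if marker in text:
        raise ValueError("marker must not occur in the text")
    rx = re.compile(pattern)
    # Substitution 1: each match is replaced by the marker (match text is lost).
    replaced = rx.sub(lambda m: marker, text)
    # Substitution 2: the marker is placed after each match (match text is kept).
    # This callback plays the role that the appended \K plays in the alias.
    kept = rx.sub(lambda m: m.group(0) + marker, text)
    # Turn substitution 1 into a pattern: literal pieces stay literal and each
    # marker becomes "capture lazily, then expect the marker".
    pieces = [re.escape(p) for p in replaced.split(marker)]
    final = ("(.*?)" + re.escape(marker)).join(pieces)
    m = re.fullmatch(final, kept, flags=re.DOTALL)
    if m is None:
        return []
    # Group i is preceded by i-1 markers in `kept`; subtract them for the offset.
    count = m.lastindex or 0
    return [(m.group(i), m.start(i) - (i - 1)) for i in range(1, count + 1)]

found = full_matches(r"ba(?=ba$)|baba(?=bababa$)", "babababababa")
```

With the example string above, the plain substitution alone is ambiguous because every unmatched piece is "ba"; running the derived pattern against the marker-after-match string is what pins down which substrings matched.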
Posted By: jaytea Re: Full match of a regex - 17/12/17 11:38 AM
sorry, glaring oversight: that $replace() on the 4th last line should of course be $replacecs()
Posted By: Wims Re: Full match of a regex - 30/03/18 10:18 PM
@jaytea: I finally found the time to put this alias to use, but I quickly found an issue:

Code:
//echo -a $regexm(eaze ezrrazer zr5ze45ra5z5 t,/((\w+)\S+)*/Fg,1)
returns nothing

Debugging shows that the final pattern built by this code:
Code:
$+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char)
results in something requiring 8 $chr($r(2048, 55295)) chars, but the input string being tested contains only 5 of them...


@Khaled: is it possible to hear from you about this? Is this on your to-do list somewhere?
Posted By: Wims Re: Full match of a regex - 31/03/18 08:05 AM
Also, for the suggested syntax, $regmlex([name],M,-1) would be better: it would return the full match of the Mth match.
Posted By: jaytea Re: Full match of a regex - 11/08/18 04:07 AM
so i forgot to post the solution to this.. turns out the problem was related to the "empty string with //g" issue that plagues PCREv1's demo code. here's the fix:

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }
  var %char, %exp
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1
  var %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))
  var %str = $regsubex($1, %exp, %char) .
  if (!$pos(%str, %char, $regex($1, $2))) {
    noop $regex(check, $regsubex($1, $2, %char), / %char ( %char ?)/gxu)
    %str = $regsubex(fix, $left(%str, -2), / %char \K/gxu, $regml(check, \n)) .
  }
  noop $regex(final, $left(%str, -2), $+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))
  if ($prop == pos) && ($3. isnum 1- $regml(final, 0)) return $calc(1 + $regml(final, $3).pos - $3)
  returnex $regml(final, $3 1)
}
Posted By: DooMaster Re: Full match of a regex - 19/01/21 03:01 PM
I was looking for the same thing, and wondering why mIRC doesn't already include a $regexm() identifier.

It would be pretty useful if it could be included in the next version.
© mIRC Discussion Forums