
Full match of a regex #260851 25/06/17 02:23 PM
Wims (OP)
Hoopy frood
Joined: Jul 2006
Posts: 3,729
As I mentioned in this thread, we don't have access to the full match found by the PCRE engine, nor to its position. It's not that useful most of the time, but in some cases it can be.

I'd use it for a bot, to show the capturing groups and what was matched: for example, !regex /b+a.*c/ bbbbadddc would be shown as bbbbadddc with the matched part highlighted.

The syntax I proposed in that link is not great, though: it doesn't let us get the position of the full match.
So I suggest $regmlex([name],M,[N],full) to return the full match, with the .pos property giving its position.
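For illustration only, here is a rough Python sketch of what such a lookup would return, with Python's re module standing in for mIRC's PCRE: the text of the Mth full match (match[0], regardless of any capturing groups) and its position, assumed to be 1-based like mIRC's other string positions. regml_full is a made-up name, not a real mIRC or Python identifier.

```python
import re


def regml_full(pattern, subject, m=1):
    """Hypothetical analogue of the proposed full-match identifier.

    Returns (text, position) of the Mth full match of `pattern` in `subject`,
    with a 1-based position, or None if there is no Mth match.
    """
    matches = list(re.finditer(pattern, subject))
    if not 1 <= m <= len(matches):
        return None
    found = matches[m - 1]
    # group(0) is the whole match, independent of any capturing groups.
    return found.group(0), found.start() + 1


# The bot example: the whole subject is the match.
print(regml_full(r"b+a.*c", "bbbbadddc"))  # ('bbbbadddc', 1)
# Capturing group 1 alone would give "bb"; the full match also covers the "a".
print(regml_full(r"(b+)a", "xbbax"))       # ('bba', 2)
```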


Looking for a good help channel about mIRC? Check #mircscripting @ irc.swiftirc.net
Re: Full match of a regex [Re: Wims] #261927 15/12/17 03:25 AM
digitok
Nutrimatic drinks dispenser
Joined: Jun 2008
Posts: 6
I also support this idea.

Re: Full match of a regex [Re: Wims] #261955 16/12/17 11:06 AM
jaytea
Fjord artisan
Joined: Feb 2006
Posts: 546
i also add my support, because it can be useful, and scripting this to work in the general case is impossible... difficult... but possible!

i finally found a way to do this that doesn't involve completely parsing the expression.

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {

  ; parse the full regex as mIRC does
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)

  ; isolate PCRE options from the start of the expression
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)

  ; validate expression
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }

  var %char, %exp

  ; find suitable placeholder char
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1

  ; add \K onto end of given expression
  ; first wrap entire expression in (?: ), adding \E in case of unterminated \Q
  %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))

  ; construct a second expression by transforming the result of subbing the placeholder char where the matches occurred
  ; run this against the result of subbing the placeholder into the matches of the \K-modified expression above
  ; and hey presto, you can find the matches with no ambiguity 
  noop $regex(final, $regsubex($1, %exp, %char), $+(/\Q, $replace($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))

  ; if it's a position we seek, we need to subtract all placeholders factored into the result
  if ($prop == pos) return $calc(1 + $regml(final, $3 1).pos - $3 1)
  returnex $regml(final, $3 1)
}
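The first $regex() call in the alias splits the expression into its m<delimiter> or / prefix, the body (up to the last occurrence of the delimiter) and the flags. As a reading aid, here is a rough Python equivalent of what that pattern captures; it is my interpretation of the regex, not mIRC's own parser, and split_mirc_regex is a hypothetical name.

```python
def split_mirc_regex(expr):
    """Return (delimiter, body, flags) roughly as the alias's first $regex()
    captures them; delimiter is None when there is no m<d> or / prefix."""
    if expr.startswith("m"):
        delim, rest = expr[1:2], expr[2:]
        if not delim:  # a lone "m": no delimiter, empty body
            return None, "", ""
    elif expr.startswith("/"):
        delim, rest = "/", expr[1:]
    else:
        return None, expr, ""  # no prefix: the whole string is the body
    # The body runs up to the LAST occurrence of the delimiter (the (?!.*\1)
    # lookahead); anything after it is the flags.
    end = rest.rfind(delim)
    if end == -1:  # unterminated: everything after the prefix is the body
        return delim, rest, ""
    return delim, rest[:end], rest[end + 1:]


print(split_mirc_regex("/ba(?=ba$)|baba(?=bababa$)/g"))
print(split_mirc_regex("m#a/b#i"))  # ('#', 'a/b', 'i')
```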


example usage:

Code:
//var -s %str = babababababa, %re = /ba(?=ba$)|baba(?=bababa$)/g | echo -eag $regsubex(%str, %re, X) - $regexm(%str, %re, 1).pos - $regexm(%str, %re, 2).pos


not thoroughly tested, but i wanted to get you your Christmas present quickly. the basic principle is to modify the expression slightly by adding \K at the end. since \K resets the start of the reported match, substituting with that version inserts the placeholder after each match instead of replacing it. compare that with the result of substituting normally, and you have all you need to figure out exactly which substring(s) matched!
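The principle can be tried outside mIRC. Below is a Python sketch under a few stated assumptions: Python's re has no \K, so the second substitution is emulated with a replacement callback that keeps each match and appends the placeholder after it (the effect the trailing \K has in the alias); the recovery pattern is anchored with re.fullmatch rather than matched unanchored; and full_matches is a made-up name. In real Python code you would simply use re.finditer; this only shows that the two substitutions carry enough information.

```python
import re


def full_matches(subject, pattern):
    """Recover (1-based position, text) of every full match of `pattern` in
    `subject` using only two global substitutions (the placeholder idea)."""
    # A placeholder that does not occur in the subject. The alias picks a random
    # char from 2048..55295; scanning that range in order works as well here.
    ch = next(chr(c) for c in range(2048, 55296) if chr(c) not in subject)

    # 1) Each match replaced by the placeholder, like $regsubex($1, $2, %char).
    replaced = re.sub(pattern, ch, subject)
    # 2) Each match kept and the placeholder appended after it. The alias gets
    #    this with a trailing \K; Python's re has no \K, so a callback is used.
    marked = re.sub(pattern, lambda m: m.group(0) + ch, subject)

    # Turn (1) into a pattern: literal text stays literal and every placeholder
    # becomes a lazy capture followed by the placeholder itself.
    recover = "".join(
        ("(.*?)" + re.escape(ch)) if c == ch else re.escape(c) for c in replaced
    )
    found = re.fullmatch(recover, marked, re.DOTALL)
    if found is None:
        raise ValueError("the two substitutions disagree")

    result = []
    for n in range(1, len(found.groups()) + 1):
        # Drop the n - 1 placeholders inserted before the nth capture and make
        # it 1-based; this equals the alias's $calc(1 + pos - N) for 1-based pos.
        result.append((found.start(n) - (n - 1) + 1, found.group(n)))
    return result


s, p = "babababababa", r"ba(?=ba$)|baba(?=bababa$)"
print(re.sub(p, "X", s))   # baXbaXba: the positions are ambiguous from this alone
print(full_matches(s, p))  # [(3, 'baba'), (9, 'ba')]
```

Because the placeholder never occurs in the subject, each lazy capture has to stop at exactly one inserted placeholder, which is what makes the recovery unambiguous.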


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Re: Full match of a regex [Re: jaytea] #261971 17/12/17 11:38 AM
jaytea
Fjord artisan
Joined: Feb 2006
Posts: 546
sorry, glaring oversight: the $replace() on the 4th-last line of code (the noop $regex(final, ...) line) should of course be $replacecs()

Re: Full match of a regex [Re: jaytea] #262761 30/03/18 10:18 PM
Wims (OP)
Hoopy frood
Joined: Jul 2006
Posts: 3,729
@jaytea: I finally found the time to put this alias to use, but I quickly found an issue:

Code:
//echo -a $regexm(eaze ezrrazer zr5ze45ra5z5 t,/((\w+)\S+)*/Fg,1)
returns nothing

Debugging shows that the final pattern built in that code:
Code:
$+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char)
results in a pattern requiring 8 of the placeholder chars ($chr($r(2048, 55295))), but the string it is matched against contains only 5 of them...


@Khaled: could we hear from you about this? Is it on your to-do list somewhere?


Re: Full match of a regex [Re: Wims] #262765 31/03/18 08:05 AM
Wims (OP)
Hoopy frood
Joined: Jul 2006
Posts: 3,729
Also, for the suggested syntax, $regmlex([name],M,-1) would be better: it would return the full match of the Mth match.


Re: Full match of a regex [Re: Wims] #263486 11/08/18 04:07 AM
jaytea
Fjord artisan
Joined: Feb 2006
Posts: 546
so i forgot to post the solution to this... it turns out the problem was related to the "empty string with //g" issue that plagues PCRE v1's demo code. here's the fix:

Code:
; $regexm(<string>, <regex> [, N])[.pos]

; Get the Nth value of match[0] (defaults to N = 1)
; Return its position with .pos

alias regexm {
  noop $regex(full, $2, /^(?|m(.)?|(/)|^)(.*?)(?:\1(?!.*\1)(.*)$)?$/usD)
  noop $regex(pcre, $regmlex(full, 1, 2), /^((?:(?!\((?:(?:MARK|PRUNE|SKIP|THEN|\*(?=:))(?::[^()]*)?|ACCEPT|COMMIT)\))\(\*.*?\))*)(.*)/us)
  if (!$regex( , $+(/, $regmlex(pcre, 1, 1), |, $regmlex(pcre, 1, 2), /))) {
    echo -eagc i * $!regexm: Invalid expression ( $+ $regerrstr $+ )
    return
  }
  var %char, %exp
  while ($chr($r(2048, 55295)) isin $1) /
  %char = $v1
  var %exp = $+(m, $regmlex(full, 1, 1), $regmlex(pcre, 1, 1), (?: $+ $regmlex(pcre, 1, 2) $+ \E)(?(R)|\K), $regmlex(full, 1, 1), $regmlex(full, 1, 3))
  var %str = $regsubex($1, %exp, %char) .
  if (!$pos(%str, %char, $regex($1, $2))) {
    noop $regex(check, $regsubex($1, $2, %char), / %char ( %char ?)/gxu)
    %str = $regsubex(fix, $left(%str, -2), / %char \K/gxu, $regml(check, \n)) .
  }
  noop $regex(final, $left(%str, -2), $+(/\Q, $replacecs($regsubex($1, $2, %char), \E, \E\\E\Q, %char, \E(.*?)\Q $+ %char), \E/u))
  if ($prop == pos) && ($3. isnum 1- $regml(final, 0)) return $calc(1 + $regml(final, $3).pos - $3)
  returnex $regml(final, $3 1)
}
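How the fix appears to work, going by the new lines (this reading is an inference, not something stated in the thread): with //g, the \K version of the expression can skip an empty match that immediately follows a non-empty one, so the plain substitution has two placeholders in a row where the marked string has only one. That would fit Wims's numbers (8 expected, 5 present). The `check` regex records, for every placeholder run in the plain substitution, whether a second placeholder follows, and the `fix` $regsubex re-inserts it after the corresponding placeholder in the marked string. The Python sketch below mirrors only that repair step; repair_marked is a hypothetical name, '#' stands in for the random placeholder, and the `seen` string is a hand-built guess at what the alias gets for Wims's example.

```python
import re


def repair_marked(replaced, marked, ch):
    """Sketch of the repair step in the fixed alias (hypothetical helper).

    replaced: subject with every match replaced by `ch` (plain substitution)
    marked:   subject with `ch` inserted after every match, possibly missing
              placeholders for some empty matches
    """
    p = re.escape(ch)
    # Like / %char ( %char ?)/gx: one entry per placeholder run, holding the
    # second placeholder if one directly follows, else "".
    extras = [m.group(1) for m in re.finditer(p + "(" + p + "?)", replaced)]
    queue = iter(extras)
    # Like / %char \K/gx with $regml(check, \n): after the nth placeholder in
    # `marked`, insert the nth recorded extra ("" once they run out).
    return re.sub(p, lambda m: m.group(0) + next(queue, ""), marked)


plain = "## ## ## #t#"                      # 8 placeholders
seen = "eaze# ezrrazer# zr5ze45ra5z5# #t#"  # assumed marked string: only 5
print(repair_marked(plain, seen, "#"))
# eaze## ezrrazer## zr5ze45ra5z5## #t#
```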
