Posted By: Wims $bfind().regex - 07/11/22 07:52 PM
mIRC uses PCRE built with the --with-match-limit option set to *around* 1000000. In theory that limit could already be reached with $regex and an input string of the maximum length, whatever that maximum has been over time, but in practice the limit was enough.
It's always possible to start a pattern with (*LIMIT_MATCH=n) to set a custom match limit for that call. But with binvar support and the convenient $urlget, the input passed to PCRE can now be so large that, in practice, even a simple pattern fails: the sheer number of characters/bytes, combined with backtracking, drives the count against the match limit up very quickly.
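To illustrate why input length alone can exhaust a match limit, here is a toy backtracking matcher in Python (not mIRC script, and not PCRE's real algorithm). It only understands literals, '.', and 'x*', and it counts one step per call to its internal match function, which is roughly the kind of quantity a match limit bounds. The names search and MatchLimitExceeded are made up for this sketch.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher uses more steps than allowed."""


def search(pattern, text, limit):
    """Unanchored search; returns (found, steps_used).

    Supports literals, '.', and 'x*' (tried shortest-first).
    Every call to match_here counts as one step against `limit`.
    """
    steps = 0

    def match_here(p, t):
        nonlocal steps
        steps += 1
        if steps > limit:
            raise MatchLimitExceeded(f"more than {limit} steps")
        if p == len(pattern):
            return True
        if p + 1 < len(pattern) and pattern[p + 1] == "*":
            i = t
            while True:
                # Try the rest of the pattern at every position the star reaches.
                if match_here(p + 2, i):
                    return True
                if i < len(text) and pattern[p] in (".", text[i]):
                    i += 1
                else:
                    return False
        if t < len(text) and pattern[p] in (".", text[t]):
            return match_here(p + 1, t + 1)
        return False

    # Retry from every start offset, as an unanchored regex search does.
    for start in range(len(text) + 1):
        if match_here(0, start):
            return True, steps
    return False, steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        found, used = search("a.*b", "a" * n, 10**7)
        print(n, found, used)
```

With the pattern a.*b and a text made only of 'a', the step count grows roughly with the square of the input length, so a limit that is plenty for a chat line can be used up quickly by a downloaded page.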

1) $bfind().regex does not report errors from PCRE the way $regex and $regsubex can. $bfind currently returns 0 on error, but it would be better if it returned a negative value like $regex does, since $bfind().regex returns the number of matches anyway.

2) I originally thought the default match limit (and recursion limit) should be increased for $bfind().regex only. That isn't possible unless mIRC starts inspecting the pattern and adds (*LIMIT_MATCH) at the beginning itself when the scripter hasn't already put one there. With custom delimiters and all the (*control verbs) that can be chained at the beginning of a pattern, I'm not too sure mIRC should be doing that. People will run into this issue, though.
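For what it's worth, here is a rough Python sketch (not mIRC script) of what "looking at the pattern" would involve. It assumes three input forms: /body/flags, m<delim>body<delim>flags, and a bare pattern. It treats any leading (*NAME) or (*NAME=digits) item as something to skip over and does no validation of those items. The function names are hypothetical, and this is not how mIRC parses patterns.

```python
import re

# One leading item such as (*UTF8) or (*LIMIT_MATCH=500); deliberately loose.
_LEADING_ITEM = re.compile(r"\(\*([A-Z_0-9]+)(?:=(\d+))?\)")


def ensure_match_limit(body, limit):
    """Prepend (*LIMIT_MATCH=limit) unless the leading items already set one."""
    pos = 0
    while (m := _LEADING_ITEM.match(body, pos)):
        if m.group(1) == "LIMIT_MATCH":
            return body  # respect the scripter's own limit
        pos = m.end()
    return f"(*LIMIT_MATCH={limit})" + body


def split_delimited(arg):
    """Split a regex argument into (prefix, body, suffix).

    Assumed forms: 'm@body@flags' (any non-alphanumeric delimiter),
    '/body/flags', or a bare pattern, which yields ('', arg, '').
    """
    if len(arg) >= 3 and arg[0] == "m" and not arg[1].isalnum() and not arg[1].isspace():
        delim, start = arg[1], 2
    elif arg.startswith("/"):
        delim, start = "/", 1
    else:
        return "", arg, ""
    end = arg.rfind(delim)
    if end < start:  # no closing delimiter: treat as a bare pattern
        return "", arg, ""
    return arg[:start], arg[start:end], arg[end:]


def with_limit(arg, limit=2_000_000):
    """Return `arg` with a match limit inserted at the start of its body."""
    prefix, body, suffix = split_delimited(arg)
    return prefix + ensure_match_limit(body, limit) + suffix


if __name__ == "__main__":
    print(with_limit("m@foo@gu"))
    print(with_limit("/(*UTF8)(*LIMIT_MATCH=500)x/"))
```

Even this small sketch has to guess at delimiters and walk past every leading item before it can decide anything, which is the reason for the hesitation above.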

Here is an example that grabs the 'active threads' page of this forum. I had to use (*LIMIT_MATCH=2000000), otherwise it doesn't work:

Code
alias getthreads {
  if ($urlget(https://forums.mirc.com/ubbthreads.php/activetopics/30/1,gb,&thread,threadcb) == 0) {
    .timergetmthreads off
    hfree thread
    return
  }
}

alias threadcb {
  var %p m@(*LIMIT_MATCH=2000000)<a href="/ubbthreads.php/forums/[^"]+">([^<]+)</a>\s+.*\s+.*\s+.*\s+.*\s+.*\s+.*\s+<span class="bold dblock" style="line-height:normal;font-size:100%;">\s*<img src="[^"]+" alt="" style="max-height:12px;">\s+<a href="(/ubbthreads.php/topics/(\d+)/[^"]+)" class="bold">(.*?)</a>\s*</span>\s*.*\s*.*?'username'>(.*?)<@gu
  ; run the regex over the downloaded page; %n is the number of matches
  var %n $bfind(&thread,0,%p,newbeta).regex,%save
  while (%n) {
    if (!$hget(fposts,$regmlex(newbeta,%n,3))) {
      ;announcing
      echo -sg $html2ascii($regmlex(newbeta,%n,1)) -- $regmlex(newbeta,%n,5) -- $html2ascii($regmlex(newbeta,%n,4)) -- https://forums.mirc.com/ $+ $gettok($regmlex(newbeta,%n,2),1--2,47) $+ /
      hadd fposts $regmlex(newbeta,%n,3) 1
      var %save 1
    }
    dec %n
  }
  if (%save) hsave fposts fposts
}
; placeholder: the real html2ascii is not provided, for simplicity
alias html2ascii returnex $1
© mIRC Discussion Forums