i assume you found that get_startchar() refers to the old value of start_offset that was passed in order to yield the current results (otherwise the infinite loop issue would not be resolved)?

regarding the examples that are now broken: this is (unfortunately?) how PCRE recommends they be handled, until such time as the Perl community defines exactly how \K should behave in all cases.

think of the expression "/(?=.\K)|a/g" against "abc". in the first round, the reported match is [1,0], i.e. it starts at 1 and ends at 0. the PCRE demo code throws a fit, since the match starts after it ends. ideally, we would want it to re-try the match at position 0, but this time avoiding that same [1,0] match and favouring the [0,1] match instead ("|a"). however, there is no way to instruct the engine to do this.
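to make that concrete, here is a rough python sketch. it does not call PCRE at all: candidates() and first_match() are hypothetical stand-ins i have written by hand for this one pattern, on the assumption that the engine tries the alternatives left to right and hands back only the first one that succeeds.

```python
def candidates(subject, pos):
    """Ways the pattern can match when the attempt begins at pos, best first."""
    # first alternative, (?=.\K): the lookahead needs one character (assumed
    # not to be a newline); \K inside it moves the reported start to pos + 1
    # while the match itself still ends at pos
    if pos < len(subject):
        yield (pos + 1, pos)
    # second alternative, a literal 'a'
    if subject.startswith("a", pos):
        yield (pos, pos + 1)


def first_match(subject, offset):
    """What the caller gets back: only the engine's preferred candidate."""
    for pos in range(offset, len(subject) + 1):
        for start, end in candidates(subject, pos):
            return (start, end)
    return None


start, end = first_match("abc", 0)
print((start, end))                # (1, 0)
print(start > end)                 # True: the match starts after it ends
print(list(candidates("abc", 0)))  # [(1, 0), (0, 1)]
```

the last line is the point: the [0,1] candidate exists at position 0, but the caller only ever sees [1,0] and has no option to ask for the next one.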

you're quite right that no online testers have got this right; that's because most of them use the PCRE demo code or slight modifications of it. Perl itself hangs on expressions such as "/(?<=\K.)/g" and also throws an error on "(?=.\K)", so there is no real precedent for "correct" behaviour. i have my own opinions, which differ from PCRE's upgraded implementation in the following ways:
  • support for "/(?=.\K)/g" and similar
  • avoid duplicate matches in expressions such as "/(?<=\K.)/g"
  • match 'b' in expressions such as "/(?<=\Ka)|b/g" against "abc"

the problem is that to fully implement these results, a small change needs to be made to the core match() routine. the measures taken against matching additional empty strings need to be revised: the NOTEMPTY_ATSTART and ANCHORED option combo needs to be dropped in favour of a new option ("NONADV") that can be toggled to indicate a "non-advancing" state, i.e. one in which the end of the previous match did not advance. since ideal g-option propagation relies solely on the end of the match rather than its start, it is the ends of the matches that should be studied in order to properly stave off infinite loops and avoid duplicate matches.
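a rough python sketch of how i read that idea, with everything in it being an assumption for illustration: there is no NONADV option in PCRE, and find_all() plus the three hand-written candidate generators are hypothetical stand-ins for the engine. the rule sketched is: the next attempt always begins at the end of the previous match, and if that match ended exactly where its attempt began, the next attempt refuses any match at that same offset that would again end there.

```python
def find_all(candidates, subject):
    """Global match driven only by match ends (hypothetical NONADV rule)."""
    results = []
    offset = 0       # always the end of the previous match
    nonadv = False   # hypothetical flag: previous match end did not advance
    while True:
        found = None
        for pos in range(offset, len(subject) + 1):
            for start, end in candidates(subject, pos):
                # under NONADV, skip a match at the starting offset whose
                # end would once more fail to move past that offset
                if nonadv and pos == offset and end == pos:
                    continue
                found = (pos, start, end)
                break
            if found:
                break
        if found is None:
            return results
        pos, start, end = found
        results.append((start, end))
        nonadv = (end == pos)
        offset = end


# each generator yields (start, end) for an attempt at pos, best first;
# subjects are assumed to contain no newlines

def lookahead_k_or_a(subject, pos):      # (?=.\K)|a
    if pos < len(subject):
        yield (pos + 1, pos)
    if subject.startswith("a", pos):
        yield (pos, pos + 1)

def lookbehind_k_dot(subject, pos):      # (?<=\K.)
    if pos >= 1:
        yield (pos - 1, pos)

def lookbehind_k_a_or_b(subject, pos):   # (?<=\Ka)|b
    if pos >= 1 and subject[pos - 1] == "a":
        yield (pos - 1, pos)
    if subject.startswith("b", pos):
        yield (pos, pos + 1)


print(find_all(lookahead_k_or_a, "abc"))     # [(1, 0), (0, 1), (2, 1), (3, 2)]
print(find_all(lookbehind_k_dot, "abc"))     # [(0, 1), (1, 2), (2, 3)]
print(find_all(lookbehind_k_a_or_b, "abc"))  # [(0, 1), (1, 2)]
```

under these assumptions the loop terminates, reports no match twice, and picks up the 'b' in the third case. a real implementation would have to make the same check inside match() so that the engine can backtrack into the other alternatives, which is why it cannot be done purely from the calling code.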

anyway, sorry, i'm rambling. the long and short of it is that these results you now see are to be expected.


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde