Capture groups are generally broken
Okay, I tried your example and the PCRE API pcre16_get_substring() returned the following:
text: abc
expr: /(a)|(b)|(c)/g
PCRE returns:
Match ( 1/ 1) : ( 0, 1): 'a'
Match ( 1/ 2) : (-1,-1): ''
Match ( 2/ 2) : ( 1, 2): 'b'
Match ( 1/ 3) : (-1,-1): ''
Match ( 2/ 3) : (-1,-1): ''
Match ( 3/ 3) : ( 2, 3): 'c'
$regex() returns: 3
$regml() returns: 6 and items:
1 : 1 : a
2 : 0 :
3 : 2 : b
4 : 0 :
5 : 0 :
6 : 3 : c
Note: PCRE seems to be dropping the unset groups that come after the last group that matched in each iteration. The above result is effectively "a", "-b", "--c" rather than the full "a-- -b- --c". I have not found a way of making it include the full nine items, as shown in the Perl and regex website tests below.
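The six-versus-nine difference can be reproduced outside mIRC. The following is only a sketch in Python's re module, not PCRE, and capture_lists is a name made up for this example. It assumes the short list comes from stopping at the last group that took part in each match, which is what PCRE's documented return count (one more than the highest-numbered group that was set) would give if it were used as the loop limit. The full list instead loops over the total number of groups in the pattern, which in PCRE would correspond to asking pcre_fullinfo() for PCRE_INFO_CAPTURECOUNT.

```python
import re

def capture_lists(pattern, text):
    """Return (trimmed, full) capture lists, one inner list per match."""
    regex = re.compile(pattern)
    trimmed, full = [], []
    for m in regex.finditer(text):
        # m.lastindex is the index of the last matched capturing group
        # (None if no group matched at all).
        last = m.lastindex or 0
        # Stop at the last group that matched: trailing unset groups are lost.
        trimmed.append([m.group(i) or "" for i in range(1, last + 1)])
        # Walk every group in the pattern: unset groups become empty strings.
        full.append([m.group(i) or "" for i in range(1, regex.groups + 1)])
    return trimmed, full

trimmed, full = capture_lists(r"(a)|(b)|(c)", "abc")
print(trimmed)  # [['a'], ['', 'b'], ['', '', 'c']]        -> 6 items
print(full)     # [['a', '', ''], ['', 'b', ''], ['', '', 'c']] -> 9 items
```

If this assumption about the cause is right, padding each match up to the pattern's group count would give the same nine items that Perl reports.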
For the same regex pattern, Perl returns:
my $text = "abc";
my @result = $text =~ /(a)|(b)|(c)/g;
for (my $i=1; $i <= @result; $i++) {
print "$i:" . ($result[$i-1] || "") . "\n";
}
1:a
2:
3:
4:
5:b
6:
7:
8:
9:c
The same test at rubular.com returns a similar result, as does a test at regexplanet.com, although regex101.com strips out empty matches, like pre-beta versions of mIRC.
The reason I made this change is that pre-beta mIRCs were not capturing the initial empty group in the following situation:
test {
var %text = :nick!ident@host.com PRIVMSG #testing :one two three
var %re = ^(@\S+ )*\x3A(([^\s!@]+)![^\s!@]+@[^\s]+) PRIVMSG (#\S+) (\x3A.+)$
noop $regex(test, %text, %re)
var %n = $regml(test, 0)
echo n: %n
var %m = 1
while (%m <= %n) {
echo 1 %m : $regml(test, %m).pos : $regml(test, %m)
inc %m
}
}
result in pre-beta:
n: 4
1 : 2 : nick!ident@host.com
2 : 2 : nick
3 : 30 : #testing
4 : 39 : :one two three
result in beta:
n: 5
1 : 0 :
2 : 2 : nick!ident@host.com
3 : 2 : nick
4 : 30 : #testing
5 : 39 : :one two three
The result in the beta, which returns an empty match for (@\S+ )*, is preferred. Testing the above with regex101.com, its match list is numbered from 2 to 5, so it also takes the empty match into account.
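For comparison, here is a rough Python sketch of both behaviours using the same text and pattern. It uses Python's re module rather than PCRE, and the captures function and its keep_unset parameter are invented for this example. Positions are converted to 1-based so that an unset group comes out as position 0, as in the beta output.

```python
import re

TEXT = ":nick!ident@host.com PRIVMSG #testing :one two three"
PATTERN = r"^(@\S+ )*\x3A(([^\s!@]+)![^\s!@]+@[^\s]+) PRIVMSG (#\S+) (\x3A.+)$"

def captures(pattern, text, keep_unset):
    """Return a list of (1-based position, text) pairs for the capture groups."""
    m = re.search(pattern, text)
    if m is None:
        return []
    out = []
    for i in range(1, m.re.groups + 1):
        if m.group(i) is None and not keep_unset:
            continue  # pre-beta style: unset groups are skipped and the rest shift down
        # m.start(i) is -1 for a group that did not take part in the match,
        # so adding 1 gives position 0 for an unset group.
        out.append((m.start(i) + 1, m.group(i) or ""))
    return out

pre_beta = captures(PATTERN, TEXT, keep_unset=False)
beta = captures(PATTERN, TEXT, keep_unset=True)
print(len(pre_beta), pre_beta[0])  # 4 (2, 'nick!ident@host.com')
print(len(beta), beta[0])          # 5 (0, '')
```

With keep_unset set, group numbers stay aligned with the parentheses in the pattern, so the nick is always item 3 whether or not the optional leading group matched.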
I see what you mean about workarounds written for pre-beta versions of mIRC being broken by the new behaviour. The only solution I can think of is to enable the new behaviour only with a new regex modifier/switch, which would allow it to be used in contexts such as $regex(), event definitions, settings in options dialogs, and so on. There is a fairly comprehensive list of regex modifiers at rexegg.com. We would have to decide on a letter that is not currently in use.
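As a rough illustration of how an opt-in switch could behave, here is a small Python sketch. The letter K is only a placeholder picked for this example (it is not an existing mIRC or PCRE modifier as far as this sketch is concerned, and the real letter is still to be decided), and split_expr and regml_items are made-up names. All other modifiers are ignored here.

```python
import re

KEEP_UNSET_FLAG = "K"  # placeholder letter for illustration only, not a real switch

def split_expr(expr):
    """Split '/pattern/mods' into (pattern, mods); a bare pattern has no mods."""
    if expr.startswith("/") and expr.count("/") >= 2:
        body, _, mods = expr[1:].rpartition("/")
        return body, mods
    return expr, ""

def regml_items(expr, text):
    """Return captured items; unset groups are kept only when the flag is present."""
    pattern, mods = split_expr(expr)
    m = re.search(pattern, text)
    if m is None:
        return []
    keep = KEEP_UNSET_FLAG in mods
    return [g or "" for g in m.groups() if keep or g is not None]

old_style = regml_items(r"/(x)?(y)/", "y")   # old behaviour: unset group dropped
new_style = regml_items(r"/(x)?(y)/K", "y")  # opt-in: unset group kept as empty
print(old_style)  # ['y']
print(new_style)  # ['', 'y']
```

Existing scripts that rely on the old numbering would keep working unchanged, since they would never pass the new letter.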