this result isn't actually limited to $regsubex(); it affects every function in mIRC that implicitly decodes UTF-8, e.g.:

Code:
//bset -t &a 1 $chr($base(D800, 16, 10)) | echo -a $len($bvar(&a, 1-).text)


= 3

the internal UTF-8 decoding function won't touch unpaired surrogates. would tweaking this be encroaching on the sanctity of unicode? clearly invalid UTF-8 is already being represented at some level, so perhaps decoding it as well as possible isn't such a tall order?
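to make the "= 3" concrete, here is a rough illustration in python (not mIRC script, and not mIRC's actual decoder). it shows that a lone surrogate takes 3 bytes when forced through a UTF-8 encoder, that a strict decoder refuses those bytes, and that leaving them undecoded gives a length of 3. treating "undecoded" as one character per byte is my assumption about what mIRC is doing internally, based only on the result above.

```python
# Illustration only: Python stand-in for the behaviour described above, not mIRC code.
lone = "\ud800"  # unpaired high surrogate, the same code point as $chr($base(D800, 16, 10))

# A strict UTF-8 encoder rejects lone surrogates; 'surrogatepass' emits the
# 3-byte sequence a lenient encoder would produce.
raw = lone.encode("utf-8", "surrogatepass")
print(len(raw))  # 3

# A strict decoder refuses those same bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Assumption: an "untouched" sequence is kept as one character per byte,
# modelled here with latin-1, which maps each byte to one character.
undecoded = raw.decode("latin-1")
print(len(undecoded))  # 3

# A lenient decoder would restore the single original code point instead.
lenient = raw.decode("utf-8", "surrogatepass")
print(len(lenient))  # 1
```

the last step is roughly what "decoded as well as possible" would look like: the length comes back as 1 instead of 3.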

btw, $regsubex() needs to encode (and later decode) the substitution parameter in order to play nice with the offset positions returned by PCRE (which only handles UTF-8 encoded strings). this seems necessary, and the observed bug is an unfortunate side effect.
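a small sketch of why the offsets matter, again in python rather than mIRC. matching on UTF-8 bytes yields byte offsets, which drift away from character offsets as soon as a multi-byte character appears before the match. the sample string and variable names are made up for illustration, and python's re module is only standing in for PCRE here.

```python
import re

# Hypothetical sample text containing multi-byte characters.
text = "héllo wörld"
data = text.encode("utf-8")  # what a byte-oriented regex engine actually sees

# Search on the encoded bytes, as an engine fed UTF-8 would.
m = re.search("wörld".encode("utf-8"), data)
byte_offset = m.start()

# Converting back to a character offset means decoding the prefix before the match.
char_offset = len(data[:byte_offset].decode("utf-8"))

print(byte_offset, char_offset)  # 7 6
```

the "é" occupies two bytes, so the byte offset is one ahead of the character offset. that conversion step is where bytes that can't be decoded cleanly, like a lone surrogate, can throw things off.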


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde