I would post some examples here, but this forum destroys Unicode characters.
I will note, however, that attempting this same trick with UTF-8 Plane 0 surrogate pairs behaves the same way, and requires an extra pass of $utfdecode() wrapped around the $regsubex().
We might be able to fix the inconsistency by enabling $utfdecode() to support Planes 1, 2, 3...
I do use $regsubex() to dice up BYTES regardless of encoding, so that I can handle them as BYTES.
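Since non-ASCII text does not survive posting here, this is an ASCII-only sketch of the underlying encoding facts, written with escapes so nothing gets mangled. It is Python, not mIRC script, and it says nothing about how $utfdecode() or $regsubex() behave internally. The helper names utf16_units and utf8_bytes are made up for illustration. It only shows that a Plane 0 codepoint fits in one UTF-16 unit, while a Plane 1 codepoint needs a surrogate pair in UTF-16 and four bytes in UTF-8.

```python
# ASCII-only illustration (Python, not mIRC scripting).
# Helper names are hypothetical, chosen for this sketch only.

def utf16_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a list of byte values."""
    return list(ch.encode("utf-8"))

bmp = "\u20ac"          # U+20AC, a Plane 0 (BMP) codepoint
astral = "\U0001F600"   # U+1F600, a Plane 1 codepoint

for label, ch in (("Plane 0", bmp), ("Plane 1", astral)):
    units = [hex(u) for u in utf16_units(ch)]
    octets = [hex(b) for b in utf8_bytes(ch)]
    print(label, "UTF-16 units:", units, "UTF-8 bytes:", octets)
```

Working on the list returned by utf8_bytes is the same idea as handling the data as BYTES regardless of encoding: each value is a single octet from 0 to 255, and no decoding step has been applied to it.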