I would attempt to post some examples here, but this forum destroys Unicode characters.

I will note, however, that attempting this same trick with UTF-8 Plane 0 surrogate pairs behaves the same way, and requires an extra pass of $utfdecode() wrapped around the $regsubex().

We might be able to fix the inconsistency by enabling $utfdecode() to support Planes 1, 2, 3...

I do use $regsubex() to dice up BYTES regardless of encoding, so that I can handle them as BYTES.
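
Since I can't paste the actual characters, here is a rough sketch in Python (not mIRC script, and not a claim about how $utfdecode() or $regsubex() work internally) of what "handling them as BYTES" looks like for a character outside Plane 0. The code point U+1F600 is just an example I picked, written as an escape so the forum can't mangle it.

```python
# Illustration only: Python stands in for the idea, not for the mIRC identifiers.
ch = "\U0001F600"  # an example Plane 1 code point, kept ASCII-only via an escape

# Dice the character into its raw UTF-8 bytes.
utf8_bytes = list(ch.encode("utf-8"))

# The same character as UTF-16 code units, which form a surrogate pair.
utf16 = ch.encode("utf-16-be")
surrogates = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print([hex(b) for b in utf8_bytes])   # four bytes
print([hex(u) for u in surrogates])   # two 16-bit units, both in the D800-DFFF range
```

The point is only that one character above Plane 0 is four UTF-8 bytes but two surrogate code units, so anything that decodes one layer at a time may need a second pass.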


Well. At least I won lunch.
Good philosophy, see good in bad, I like!