Thanks for your bug report. If you are passing a Unicode string, you should be using //u. Without //u there is no valid result for .pos in this case, because the string is processed as UTF-8 internally. The returned .pos must reference the original string, not a non-existent string. That said, I would prefer it if a match falling within the UTF-8 bytes that form a single Unicode character returned a .pos pointing to the start of that character. That would make more sense. This change will be in the next beta.
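To illustrate the mismatch, here is a minimal sketch in Python rather than in the scripting language discussed here. It assumes that, without //u, the position is counted in UTF-8 bytes instead of characters. The sample text and the helper `byte_to_char_offset` are hypothetical names made up for this example; they are not part of the regex routine.

```python
import re

text = "naïve café"
needle = "café"

# Character-aware search: the offset is counted in characters (code points).
char_pos = re.search(re.escape(needle), text).start()

# Byte-wise search on the UTF-8 encoding: the offset is counted in bytes.
data = text.encode("utf-8")
byte_pos = re.search(re.escape(needle.encode("utf-8")), data).start()


def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Map a UTF-8 byte offset to a character offset (hypothetical helper).

    If the offset lands on a continuation byte (bit pattern 10xxxxxx),
    step back to the lead byte so the result points at the start of
    that character.
    """
    while 0 < byte_offset < len(data) and (data[byte_offset] & 0xC0) == 0x80:
        byte_offset -= 1
    return len(data[:byte_offset].decode("utf-8"))


print(char_pos, byte_pos, byte_to_char_offset(data, byte_pos))
```

The two offsets differ because "ï" takes two bytes in UTF-8, so every later position is shifted by one. Stepping back over continuation bytes is one way to get the "start of that character" behaviour described above; the actual change in the beta may be implemented differently.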

Then again, I'd say that the regex routine should apply //u automatically when it sees a Unicode string. This was probably not done because it might have affected older scripts, although I can't see what the side effects would be at this point.