Your answer is about what I expected (multiple conversions, UTF encoding, limitations of the double-byte string type).
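I wonder if it's possible to detect an unpaired surrogate, wait for the rest of the pair, and then convert the sequence once it's complete. As far as I know, a single code point needs at most two UTF-16 code units (one surrogate pair, 4 bytes), so there would only ever be one more unit to wait for.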
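A rough sketch of that buffering idea, assuming the text arrives as chunks of raw UTF-16 code units (plain integers). The class and method names (`SurrogateBuffer`, `feed`, `finish`) are made up for illustration, and replacing unpaired surrogates with U+FFFD is my own choice, not something required:

```python
class SurrogateBuffer:
    """Hypothetical helper: decode UTF-16 code units that may be split across chunks."""

    def __init__(self):
        self.pending_high = None  # high surrogate still waiting for its low partner

    def feed(self, units):
        out = []
        for u in units:
            if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
                if self.pending_high is not None:
                    out.append("\uFFFD")  # previous high surrogate never got a partner
                self.pending_high = u  # hold it until the next unit arrives
            elif 0xDC00 <= u <= 0xDFFF:  # low (trailing) surrogate
                if self.pending_high is None:
                    out.append("\uFFFD")  # stray low surrogate
                else:
                    # Combine the pair into one supplementary code point.
                    cp = 0x10000 + ((self.pending_high - 0xD800) << 10) + (u - 0xDC00)
                    out.append(chr(cp))
                    self.pending_high = None
            else:  # ordinary BMP character
                if self.pending_high is not None:
                    out.append("\uFFFD")  # high surrogate followed by a non-surrogate
                    self.pending_high = None
                out.append(chr(u))
        return "".join(out)

    def finish(self):
        # Call at end of input: a leftover high surrogate is unpaired.
        if self.pending_high is not None:
            self.pending_high = None
            return "\uFFFD"
        return ""


buf = SurrogateBuffer()
text = buf.feed([0x48, 0xD83D])      # "H"; 0xD83D is held back
text += buf.feed([0xDE00, 0x69])     # pair completes to U+1F600, then "i"
text += buf.finish()
print(text.encode("utf-8"))          # safe to convert to UTF-8 now
```

If you're working with bytes rather than code units, Python's `codecs.getincrementaldecoder("utf-16-le")` does the same kind of buffering for you: it keeps an incomplete surrogate pair until the next chunk arrives.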