I'm afraid that most of your post is based on a fundamental misconception.
"This creates the 9-byte chloé.txt filename with the accented small-e, but writes a 10-byte filename to that same file, with the accented-e encoded into 2 bytes."
No, it creates a 9-character filename, where each character may have a value in the range 0-65535 (let's stick to Unicode plane 0 for simplicity here) rather than the 0-255 range of byte values. So in order to store such a string of characters (the filename) as a string of bytes (file data), a conversion must take place, and this necessarily requires more than one byte for some characters. The conversion that mIRC performs is to the standardized UTF-8 encoding of the string, in which each character value above 127 is indeed converted to a sequence of two or more bytes. As a result of this encoding, every filename can be stored in such a way that it can be converted back again without losing anything, and that is exactly what happens when you have mIRC read from your file later on.
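To make the character/byte distinction concrete, here is a minimal Python analogy (not mIRC script). It assumes mIRC's conversion is plain, standard UTF-8 as described above; the variable names are only for illustration.

```python
# A filename is a string of characters; storing it as file data means encoding it.
name = "chlo\u00e9.txt"          # "chloé.txt": é is U+00E9 (value 233)

data = name.encode("utf-8")      # characters -> bytes (what happens when writing)
print(len(name))                 # 9 characters
print(len(data))                 # 10 bytes: é becomes the two bytes C3 A9
print(data)                      # b'chlo\xc3\xa9.txt'

# Reading the file back decodes the bytes into the very same 9 characters.
restored = data.decode("utf-8")
print(restored == name)          # True
```

The extra byte is not redundant: it is what lets the decoder turn the data back into exactly the original characters.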
As such, your examples are misleading in that they use a character in the 128-255 value range, which erroneously suggests that the UTF-8 encoding is adding redundant bytes. Imagine a filename with a character in the 256-65535 range. How would you store such a filename as file data? You don't have to answer that, because mIRC and UTF-8 have already solved that for you. The only price to pay for that universal solution is that for characters in the 128-255 range, the UTF-8 encoding "looks" like it introduces unnecessary extra bytes, which it really doesn't.
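The point about characters above 255 can be sketched the same way. This is again a Python analogy rather than mIRC script, using Latin-1 as a stand-in for any one-byte-per-character scheme (an assumption for illustration) and "Łódź.txt" as an example filename.

```python
# How many UTF-8 bytes each plane-0 character needs, by value range.
for ch in ["A", "\u00e9", "\u0142", "\u20ac"]:   # A, é, ł, €
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s)")
# U+0041: 1, U+00E9: 2, U+0142: 2, U+20AC: 3

# A filename containing characters above 255: "Łódź.txt".
name = "\u0141\u00f3d\u017a.txt"

# One byte per character cannot work: no byte value exists for U+0141.
try:
    name.encode("latin-1")
    fits_in_one_byte = True
except UnicodeEncodeError:
    fits_in_one_byte = False

# UTF-8 stores it, and decoding returns exactly the same characters.
data = name.encode("utf-8")
restored = data.decode("utf-8")
print(fits_in_one_byte, len(name), len(data))   # False 8 11
print(restored == name)                         # True
```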
So when you say this..
"I'm not sure how to write the filenames to disk and not the utfencoding of the filenames."
..you're confusing "filename" with "encoding of a filename". The UTF-8 encoding of the filename is, in essence, the filename. What you call "the filename" here is a non-standard, codepage-based encoding of it that only happens to work for your specific example and not for filenames in general. If you want to use such a custom, non-universal encoding, then yes, you'll have to work with binary variables, and /bset (in particular with -ta) already gives you the tools you need to do that. Nothing in your post makes the case for why mIRC should make it easier to use such encodings. Generally speaking they're a bad idea, and at most they should be used for interoperability with other applications that do not support UTF-8 yet.
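Here is a rough Python illustration of why a codepage encoding "only happens to work" for this example. It assumes the codepage in question is Windows-1252 (cp1252), which is a guess for illustration; the post does not say which one.

```python
name = "chlo\u00e9.txt"
legacy = name.encode("cp1252")      # one byte per character: é -> 0xE9
print(len(legacy), legacy)          # 9 b'chlo\xe9.txt'

# Those bytes are not valid UTF-8 (0xE9 must start a multi-byte sequence),
# so a UTF-8 reader cannot recover the filename from them.
try:
    legacy.decode("utf-8")
    legacy_is_utf8 = True
except UnicodeDecodeError:
    legacy_is_utf8 = False

# And the codepage has no byte at all for many characters, e.g. Greek omega.
try:
    "\u03c9.txt".encode("cp1252")
    omega_fits = True
except UnicodeEncodeError:
    omega_fits = False

print(legacy_is_utf8, omega_fits)   # False False
```

In mIRC terms, producing bytes like `legacy` is the job of a binary variable, not of plain text handling.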
The same applies to the rest of your suggestions: the CRC/SHA-1/etc. identifiers all accept binary variables, so if you have a reason to hash a specific set of bytes rather than a simple string (which is then implicitly encoded as UTF-8), you can and should use a binary variable as input. It would be nice if DLLs had a way to accept and manipulate binary variables, but that's really an altogether different issue.
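The hashing point can be sketched in Python as well. The helpers `sha1_of_text` and `sha1_of_bytes` are hypothetical names, meant only to mirror "hash a string (implicitly UTF-8)" versus "hash a binary variable"; they are not mIRC identifiers, and cp1252 is again an assumed codepage.

```python
import hashlib

def sha1_of_text(s: str) -> str:
    # Hypothetical: a plain string is encoded as UTF-8 before hashing.
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

def sha1_of_bytes(b: bytes) -> str:
    # Hypothetical: hash exactly the given bytes (the binary-variable case).
    return hashlib.sha1(b).hexdigest()

text = "chlo\u00e9.txt"
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

print(sha1_of_text(text) == sha1_of_bytes(utf8_bytes))    # True
print(sha1_of_text(text) == sha1_of_bytes(cp1252_bytes))  # False
```

A hash is always computed over bytes, so "the hash of a string" depends on which encoding produced those bytes.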