Originally Posted By: Khaled
It has been a while since I researched this, but if I recall correctly, the only way to determine the encoding format of a file without a BOM is to scan the contents of the file and analyze it.


Well, there is no reliable way to detect the encoding of a UTF-16 file that has no BOM; by extension, there is no reliable way to tell that a BOM-less file is UTF-16 at all. Even the BOM itself isn't always a reliable indicator of the encoding, since the same leading bytes can also be valid text in another encoding.
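A minimal Python sketch of BOM sniffing shows why a BOM is a hint rather than proof. The helper `sniff_bom` is hypothetical (nothing like it exists in mIRC); it just checks the standard Unicode BOMs from Python's `codecs` module. The bytes `FF FE` that mark UTF-16LE also decode as perfectly valid Latin-1 text, and a BOM-less UTF-16 file gives the sniffer nothing to go on.

```python
import codecs

# Standard BOMs, checked longest first so UTF-32LE (FF FE 00 00) wins over
# UTF-16LE (FF FE). A UTF-16LE file whose first character is U+0000 would
# therefore be misread as UTF-32LE: yet another ambiguity.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Hypothetical helper: return the encoding a leading BOM suggests, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


utf16_body = "hi".encode("utf-16-le")

# With a BOM, UTF-16LE is recognised...
print(sniff_bom(codecs.BOM_UTF16_LE + utf16_body))  # utf-16-le

# ...but the same two bytes are legal Latin-1 text too (they read as "ÿþ").
latin1_text = (codecs.BOM_UTF16_LE + b"hi").decode("latin-1")
print(latin1_text == "\u00ff\u00fehi")  # True

# Without a BOM there is nothing to sniff.
print(sniff_bom(utf16_body))  # None
```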

The auto-detection method you propose would be extremely slow: it could potentially scan the entire file, and it would have to do so for every file. For most files (the non-UTF-16 ones), that means scanning ~1000 characters before every read, every time. Slow!
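For illustration only, here is one hypothetical heuristic of the kind being discussed (not necessarily the method Khaled had in mind): sample the first ~1000 bytes and look for NUL bytes at alternating offsets, which mostly-ASCII UTF-16 text produces. It shows both problems at once: every read has to load and scan the sample first, and text with no ASCII characters (Japanese here) defeats it entirely. The function name and the 90% threshold are assumptions.

```python
def guess_utf16_without_bom(data: bytes, sample_size: int = 1000):
    """Hypothetical heuristic: guess UTF-16 byte order from NUL-byte positions.

    Mostly-ASCII UTF-16LE text has a zero high byte at every odd offset
    (UTF-16BE: at every even offset). Returns "utf-16-le", "utf-16-be", or None.
    """
    sample = data[:sample_size]  # this sample is read and scanned on every call
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even_nuls = sum(1 for i in range(0, pairs * 2, 2) if sample[i] == 0)
    odd_nuls = sum(1 for i in range(1, pairs * 2, 2) if sample[i] == 0)
    threshold = 0.9 * pairs  # arbitrary cut-off (an assumption)
    if odd_nuls >= threshold and even_nuls < threshold:
        return "utf-16-le"
    if even_nuls >= threshold and odd_nuls < threshold:
        return "utf-16-be"
    return None


print(guess_utf16_without_bom("hello world".encode("utf-16-le")))  # utf-16-le
print(guess_utf16_without_bom("hello world".encode("utf-16-be")))  # utf-16-be
print(guess_utf16_without_bom("hello world".encode("utf-8")))      # None
# Non-ASCII UTF-16LE has no NUL bytes, so the heuristic misses it.
print(guess_utf16_without_bom("こんにちは".encode("utf-16-le")))     # None
```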

That's why I proposed an extra switch for all the $read/$fread commands to force a specific encoding. Auto-detection is basically impossible (or at least impractical), so the scripter should have to tell mIRC the encoding in these cases. Telling the runtime which encoding to read a file as is common practice in any language with robust encoding support. It's fine to assume UTF-8 by default, and fair to allow basic auto-detection, but for certain encodings we need a way to specify the encoding manually.
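For comparison, here is roughly what such a switch looks like in Python, where `open()` takes an `encoding` argument. The `read_text` helper is hypothetical (it is not mIRC's API and not how $read/$fread behave today): an explicit encoding always wins, a leading BOM gives basic auto-detection, and UTF-8 is the fallback.

```python
import codecs
import os
import tempfile


def read_text(path, encoding=None):
    """Hypothetical reader: explicit encoding > BOM detection > UTF-8 default."""
    if encoding is None:
        with open(path, "rb") as f:
            head = f.read(3)
        if head.startswith(codecs.BOM_UTF8):
            encoding = "utf-8-sig"  # strips the UTF-8 BOM while decoding
        elif head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            # Python's "utf-16" codec reads the BOM to pick the byte order and drops it.
            # UTF-32 BOMs are not handled in this sketch.
            encoding = "utf-16"
        else:
            encoding = "utf-8"  # default assumption when there is no BOM
    with open(path, encoding=encoding) as f:
        return f.read()


# Usage: BOM-less UTF-16LE bytes, read with and without an explicit encoding.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write("hi".encode("utf-16-le"))
    default_read = read_text(path)                       # wrongly assumed UTF-8
    forced_read = read_text(path, encoding="utf-16-le")  # scripter states the encoding

    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
    bom_read = read_text(path)                           # BOM detected

print(repr(default_read))  # 'h\x00i\x00'
print(forced_read)         # hi
print(bom_read)            # hi
```

Without the override, the BOM-less file silently decodes to text full of NUL characters; with it, the scripter gets the right result without any scanning cost.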