Binary Vars BUG

Re: Binary Vars BUG Khaled #230978 26/03/11 11:59 PM
argv0 Hoopy frood Joined: Oct 2003 Posts: 3,641 Montreal, QC, Canada
Originally Posted By: Khaled
It has been a while since I researched this, but if I recall correctly, the only way to determine the encoding format of a file without a BOM is to scan the contents of the file and analyze it.

Well, there is no reliable way to detect the proper encoding of a UTF-16 encoded file without a BOM, and by extension there is no reliable way to detect that a file is UTF-16 at all without one (similarly, even the BOM itself isn't always a valid way to detect an encoding format).

The auto-detection method you propose would be extremely slow, since it could potentially scan the entire file and would need to do this for every file. For most files (the non-UTF-16 ones) you would always be scanning ~1000 chars prior to reading, every time. Slow!

That's why I proposed an extra switch in all $read/$fread commands to force a specific encoding. Auto-detection is basically impossible (or at least impractical), so the scripter should have to tell mIRC in these cases. Telling the runtime what encoding to read a file as is fairly common in every language with robust encoding support. It's fine to assume UTF-8 as the default, and fair to allow basic auto-detection, but for certain encodings we need a way to enforce this manually.
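
To make the ambiguity concrete, here is a rough sketch in Python (not mIRC script, since the proposed switch does not exist there). The function name `decode_bytes` and its `forced`/`default` parameters are made up for illustration; only the `codecs` BOM constants and `bytes.decode` come from the standard library. It shows the order argued for above: an encoding forced by the caller wins, then basic BOM detection, then a UTF-8 default.

```python
import codecs

# BOM signatures, longest first, so the 4-byte UTF-32 LE mark is tested
# before the 2-byte UTF-16 LE mark that it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_bytes(data, forced=None, default="utf-8"):
    """Hypothetical helper: decode `data` as text.

    Priority: encoding forced by the caller, then a BOM, then `default`.
    """
    if forced is not None:
        # The caller knows best; any BOM handling is left to the named codec.
        return data.decode(forced)
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    return data.decode(default)


# The same BOM-less bytes decode without error in both byte orders,
# so scanning the content cannot prove which one was intended.
raw = "hi".encode("utf-16-le")  # b'h\x00i\x00', no BOM
as_le = decode_bytes(raw, forced="utf-16-le")
as_be = decode_bytes(raw, forced="utf-16-be")
print(repr(as_le), repr(as_be))
```

Both decodes succeed, and only the caller can say which result is the right one. The BOM table has a similar blind spot: the bytes FF FE 00 00 are the UTF-32 LE BOM, but they are also a UTF-16 LE BOM followed by a NUL character, which is one reason a BOM alone is not always conclusive.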
