It has been a while since I researched this, but if I recall correctly, the only way to determine the encoding format of a file without a BOM is to scan and analyze its contents.

In the case of a text file that uses UTF-16, BE or LE, and only stores characters from the Basic Latin and Latin-1 Supplement blocks of the Unicode table, you can check for alternating zero bytes. You will need to load the file, check for the BOM, and if it does not exist, continue reading the file as single-byte characters. If you come across a zero byte, the file is probably UTF-16 and you will need to start reading it again from the beginning as UTF-16 (whether it is BE or LE depends on whether the zero byte sits at an even or an odd offset).
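
To make that concrete, here is a rough Python sketch of the zero-byte check. The function name and return values are my own placeholders, and for simplicity it reads the whole file into memory and ignores surrogate pairs:

```python
def sniff_encoding(path):
    """Guess between single-byte text and BOM-less UTF-16 by scanning for zero bytes."""
    with open(path, "rb") as f:
        data = f.read()

    # A BOM settles the question immediately.
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"

    # Otherwise read the file as single-byte characters until a zero byte turns up.
    for i, byte in enumerate(data):
        if byte == 0:
            # For Basic Latin / Latin-1 characters the zero is the high byte:
            # UTF-16LE stores 'A' as 41 00 (zero at odd offsets),
            # UTF-16BE stores it as 00 41 (zero at even offsets).
            return "utf-16-le" if i % 2 == 1 else "utf-16-be"

    # No zero bytes: could be ANSI, or UTF-16 text that never touches
    # the U+00xx / U+xx00 code points -- there is no way to tell.
    return "ansi"
```

If the guess comes back as UTF-16, you would then re-read (or re-decode) the buffer from the beginning with that codec, e.g. `data.decode("utf-16-le")`.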

If there is an error in the file and the zero byte is not meant to be there (or if it was ANSI text saved with zero byte separators on purpose for some reason), loading it as UTF-16 results in garbage.

If there are no zero bytes, you will not be able to determine whether it is ANSI or UTF-16.

For example, if you come across a UTF-16 file with no BOM that contains thousands of characters in the range 0x0101 to 0x017F, every byte will be non-zero and you will not be able to tell whether it is ANSI or Unicode. However, if it contains any character whose UTF-16 encoding includes a zero byte (anything in the range 0x0000 to 0x00FF, where the high byte is zero, or code points such as 0x0100, 0x0200 and so on, where the low byte is zero), you can assume it is UTF-16.
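
A quick way to see the ambiguity (throwaway Python, using UTF-16LE byte order):

```python
# U+0101 U+0113 in UTF-16LE: no zero bytes, indistinguishable from four ANSI bytes.
sample = "\u0101\u0113".encode("utf-16-le")
print(sample)        # b'\x01\x01\x13\x01'
print(0 in sample)   # False -> the zero-byte check sees plausible ANSI

# Add a plain 'A' (U+0041) and its zero high byte gives the encoding away.
sample2 = "A\u0101".encode("utf-16-le")
print(sample2)       # b'A\x00\x01\x01'
print(0 in sample2)  # True -> assume UTF-16
```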

While the above method works for some text files (mostly Latin-based text only), I felt it was a little too unreliable and limited in scope, which is why I decided not to add support for it.