I have made a few changes in the next beta that co-ordinate CRT vs API calls relating to the lower/upper case identifiers you used in your examples.
These resolve some of the differences you point out. However, the Windows APIs still classify many characters in the way you describe above. You will need to look into this further to determine why this is the case. The best I can do is to use the APIs provided.
Regarding characters like the German Eszett, note that Unicode case mapping can be asymmetric. There is no guarantee that converting a letter from lower to upper and back to lower case will result in the same letter.
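As a rough illustration of that asymmetry, here is a small Python sketch. It uses Python's built-in Unicode case mapping rather than the Windows APIs or mIRC's own code, so the exact results from an API call may differ; the round_trip() helper is just a name made up for this example.

```python
# Round-trip case conversion with Python's built-in Unicode case mapping.
# Assumption: this is NOT the Windows API; it only illustrates the asymmetry.

def round_trip(text: str) -> str:
    """Convert text to upper case and then back to lower case."""
    return text.upper().lower()

eszett = "\u00df"           # LATIN SMALL LETTER SHARP S (the German Eszett)
upper = eszett.upper()      # full case mapping expands it to two letters
back = round_trip(eszett)   # lowering those two letters does not restore it

print(repr(eszett), "->", repr(upper), "->", repr(back))
print("round trip preserved the letter:", back == eszett)
```

The Eszett becomes "SS" when uppercased, and "SS" lowercases to "ss", so the original letter is lost on the way back.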
To make matters more complicated, the correct mapping of some Unicode characters/ranges depends on the locale as well as on transformation options (e.g. see LCMapStringEx()), so there is a lot more to it than just lower/upper case.
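The usual example of locale dependence is the Turkish dotted/dotless i. The Python sketch below hand-codes that one rule purely for illustration; lower_turkish() is a hypothetical helper, not something from mIRC or Windows. On Windows, my understanding is that this kind of behaviour is what you would request from LCMapStringEx() with a Turkish locale and linguistic casing, but treat that as an assumption and check the documentation.

```python
# Locale-independent lowercasing vs. a hand-written Turkish rule.
# Assumption: lower_turkish() is a hypothetical helper for illustration only.

def lower_turkish(text: str) -> str:
    """Lowercase text using the Turkish rule for dotted and dotless i."""
    # U+0130 (capital I with dot above) lowers to plain "i";
    # plain capital "I" lowers to U+0131 (dotless i).
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

word = "ISTANBUL"
default_lower = word.lower()          # default mapping: "I" -> "i"
turkish_lower = lower_turkish(word)   # Turkish mapping: "I" -> dotless i

print(default_lower)
print(turkish_lower)
print("same result:", default_lower == turkish_lower)
```

The same input gives two different lower case strings depending on which rule is applied, which is why a single locale-independent mapping cannot be correct for everyone.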
In addition, although mIRC uses UTF-16, and Windows itself uses UTF-16, which means API calls generally handle surrogate pairs/planes, that does not mean these are handled in all contexts. While mIRC was changed to use UTF-16, there are many places where surrogate pairs can be split while parsing text, which is where work still needs to be done.
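To show what splitting a surrogate pair means in practice, here is a minimal Python sketch that works on raw UTF-16 code units. It is not mIRC's parsing code; the helper names (utf16_units, safe_split and so on) are invented for this example, and backing up by one unit is only one possible way of avoiding the split.

```python
import struct

def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def units_to_text(units: list) -> str:
    """Decode code units, replacing lone surrogates with U+FFFD."""
    data = struct.pack("<%dH" % len(units), *units)
    return data.decode("utf-16-le", errors="replace")

def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF

def safe_split(units: list, index: int):
    """Split at index, backing up one unit if index falls inside a pair."""
    if (0 < index < len(units)
            and is_high_surrogate(units[index - 1])
            and is_low_surrogate(units[index])):
        index -= 1
    return units[:index], units[index:]

text = "a\U0001F600b"            # U+1F600 is outside the BMP
units = utf16_units(text)        # four code units for three characters

# Naive split at code unit 2 lands between the high and low surrogate.
naive_left, naive_right = units[:2], units[2:]
naive_joined = units_to_text(naive_left) + units_to_text(naive_right)

# The guarded split keeps the pair together on the right-hand side.
left, right = safe_split(units, 2)
safe_joined = units_to_text(left) + units_to_text(right)

print([hex(u) for u in units])
print("naive split preserved text:", naive_joined == text)
print("safe split preserved text:", safe_joined == text)
```

Any code that cuts, truncates or indexes text by code unit needs a check along these lines, which is why each such place has to be found and fixed individually.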