Page 2 of 2
Joined: Dec 2002
Posts: 5,411
Hoopy frood
Quote:
do you think adding a property to $regsubex is a bad idea?

Although I have done this before, I have never really liked using a property to change the behaviour of an identifier, because it precludes the use or later addition of other properties. It would be better to find an alternative method. Custom regex modifiers, S and F, have already been added in the past for other uses, so a better option would probably be to add another custom regex modifier.
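To make the trade-off concrete: a modifier travels inside the regex string itself, so the identifier's property slot stays free. Below is a minimal Python sketch, assuming mIRC-style `/pattern/modifiers` syntax. The function name `split_modifiers` and the letter `W` used as a Unicode switch are hypothetical, not mIRC's actual parser or flags.

```python
def split_modifiers(regex, custom):
    """Split a '/pattern/mods' string into (pattern, engine_mods, custom_mods).

    `custom` is the set of modifier letters the host application handles
    itself (for example S and F) instead of passing them on to PCRE.
    """
    end = regex.rfind("/")
    if not regex.startswith("/") or end == 0:
        # No delimiters: the whole string is the pattern, with no modifiers.
        return regex, "", ""
    pattern, mods = regex[1:end], regex[end + 1:]
    engine_mods = "".join(m for m in mods if m not in custom)
    custom_mods = "".join(m for m in mods if m in custom)
    return pattern, engine_mods, custom_mods


# Hypothetical: 'W' stands in for a new "wide/Unicode" modifier.
CUSTOM = {"S", "F", "W"}
print(split_modifiers("/a+b/iSW", CUSTOM))  # ('a+b', 'i', 'SW')
```

Because the new letter lives in the pattern string, the identifier's syntax and any existing properties are left untouched.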

Update: I have decided to defer support for this to a future version. Changing the current routines to support both ANSI and Unicode PCRE calls at the same time requires far more changes than supporting just one or the other. So the next beta will revert to ANSI PCRE for now.

Last edited by Khaled; 26/06/18 10:51 AM.
Joined: Feb 2003
Posts: 2,812
Hoopy frood
What if we just allow the $chr/asc() and $utfxcode() identifiers to support characters beyond U+FFFF?

That way, if you are doing something really goofy in $regsubex that most people would never do, we'll have the proper tools to deal with it.
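In a 16-bit string model, an extended $chr() could only hand such a character back as two UTF-16 code units, a surrogate pair. Below is a minimal Python sketch of the standard UTF-16 encoding step, using code point 128287 (U+1F51F) from this thread. The name `to_utf16_units` is illustrative, not a mIRC identifier.

```python
def to_utf16_units(cp):
    """Encode one Unicode code point as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single 16-bit unit
    v = cp - 0x10000               # 20 significant bits remain
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return [high, low]


print([hex(u) for u in to_utf16_units(128287)])  # ['0xd83d', '0xdd1f']
```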


Well. At least I won lunch.
Good philosophy, see good in bad, I like!
Joined: Dec 2002
Posts: 5,411
Hoopy frood
Quote:
What if we just allow the $chr/asc() and $utfxcode() identifiers to support characters beyond U+FFFF?

How would this be implemented? All of the Windows APIs support only 16-bit characters. All of mIRC's features, storage methods, routines, commands, identifiers, etc. use 16-bit characters and process them that way.
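One way to see why every routine is affected: any code that walks a 16-bit array has to recognise surrogate pairs itself, or it will see half a character. Below is a minimal Python sketch of what a surrogate-aware, $asc()-style lookup would have to do. The name `code_point_at` is hypothetical.

```python
def code_point_at(units, i):
    """Return (code_point, units_consumed) starting at index i of a UTF-16 array."""
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
        nxt = units[i + 1]
        if 0xDC00 <= nxt <= 0xDFFF:
            # Valid pair: recombine the two 10-bit halves.
            return 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00), 2
    # BMP character, or an unpaired surrogate returned as-is.
    return u, 1


units = [0xD83D, 0xDD1F]          # U+1F51F stored as two 16-bit units
print(code_point_at(units, 0))    # (128287, 2)
print(code_point_at(units, 1))    # (56607, 1): a lone low surrogate
```

Indexing at position 1 lands in the middle of the character, which is what any purely 16-bit routine (length, substring, position) would do with it.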

Joined: Feb 2003
Posts: 2,812
Hoopy frood
I tried doing a bit of research on this, but it's not a strong area of mine. I know there's a solution, since AutoHotkey_L appears to handle supplementary-plane characters and surrogates just fine, even specifically with PCRE. It makes heavy use of the WideCharToMultiByte / MultiByteToWideChar functions.

https://github.com/Lexikos/AutoHotkey_L/search?q=WideCharToMultiByte

https://github.com/Lexikos/AutoHotkey_L/search?q=surrogate

https://github.com/Lexikos/AutoHotkey_L/search?q=0x10FFFF

These searches might give you some ideas, anyway. AHK does a lot of what mSL attempts to do.

Example: https://github.com/Lexikos/AutoHotkey_L/..._ord2utf8.c#L41
/* This file contains a private PCRE function that converts an ordinal character value into a UTF8 string. */
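For reference, converting an ordinal to UTF-8 is the standard bit-packing defined in RFC 3629. The Python sketch below is a generic version of that encoding, not the code in the linked file. `ord_to_utf8` is an illustrative name.

```python
def ord_to_utf8(cp):
    """Encode a code point (0..0x10FFFF) as UTF-8 bytes.

    Surrogate code points (U+D800..U+DFFF) are not rejected here.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"out of range: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])                                     # 1 byte: 0xxxxxxx
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])   # 2 bytes
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])                     # 3 bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])                         # 4 bytes: beyond U+FFFF


print(ord_to_utf8(128287).hex())  # f09f949f
```

The four-byte branch covers exactly the characters that a 16-bit array needs two units for.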


Joined: Jul 2006
Posts: 4,145
Hoopy frood
It's weird that you say that after acknowledging that mIRC doesn't support it.

AutoHotkey is not relevant. There is no denying that an application can correctly give users a way to use and generate these characters and provide, say, a string library, as mIRC does for the BMP. The issue is that, as mentioned, all functions in mIRC deal with UTF-16. Suppose $chr() were extended and nothing else were done. Given that the character in this thread is code point 128287, $len($chr(128287)) would return the number of elements in the array of 16-bit units, just as it does right now when you combine two surrogates. That count would be 2, one per surrogate, which is problematic.
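The length mismatch can be reproduced outside mIRC. In the Python sketch below, `len()` on a `str` counts code points (the 32-bit view), while counting UTF-16 code units models a 16-bit array like mIRC's. `utf16_len` is an illustrative helper, not a mIRC identifier.

```python
def utf16_len(s):
    """Number of 16-bit code units needed to store s (what a UTF-16 array holds)."""
    return len(s.encode("utf-16-le")) // 2  # no BOM with the -le codec


ch = chr(128287)          # U+1F51F, the character from this thread
print(len(ch))            # 1: one code point
print(utf16_len(ch))      # 2: two surrogates, the count $len() would give per the post
print(utf16_len("a" + ch + "b"), len("a" + ch + "b"))  # 4 3
```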


The solution would be to use 32-bit characters, but Khaled already decided against that when he made mIRC Unicode-compatible, by choosing the 16-bit design.

I'm also sad that we can't deal with characters from other planes in scripts. :(


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Feb 2003
Posts: 2,812
Hoopy frood
It's funny how Ouims says AutoHotkey is not relevant, when it's exactly the opposite of that; it's super relevant. lolz.

Yeah, let's ignore the fact that they're both written in the same language, compiled by the same tools, do the same things, interpret the same scripts, and interact with the same functions and libraries. But, no, not relevant. *laughing-emoji*


Joined: Jul 2006
Posts: 4,145
Hoopy frood
Language and compiler have nothing to do with it. mIRC internally uses an array of 16-bit units, so in our scripts we can only deal with code points up to 65535. We have seen that this doesn't prevent the 16-bit APIs used by mIRC (and likely by AutoHotkey, if you think it's relevant) from handling surrogates themselves, but that is different from exposing those characters to a script. AutoHotkey probably uses a 32-bit array, letting you 'control' 32-bit values in your script, and when it is about to call the same 16-bit API/function as mIRC, it just converts to UTF-16.
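The design described here, storing one 32-bit value per code point and converting to UTF-16 only at the API boundary, can be sketched as follows. This is a hypothetical Python class, not AutoHotkey's or mIRC's actual implementation. `array('L')` provides unsigned slots of at least 32 bits.

```python
from array import array


class CodePointString:
    """Hypothetical script-level string: one 32-bit slot per code point."""

    def __init__(self, text):
        self._cps = array("L", (ord(c) for c in text))  # 'L' is at least 32 bits

    def __len__(self):
        return len(self._cps)  # counts characters, not UTF-16 units

    def code_point(self, i):
        return self._cps[i]

    def to_utf16_units(self):
        """Convert only when handing the text to a 16-bit (UTF-16) API."""
        raw = "".join(map(chr, self._cps)).encode("utf-16-le")
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

    @classmethod
    def from_utf16_units(cls, units):
        """Convert text coming back from a 16-bit API; raises on unpaired surrogates."""
        raw = b"".join(u.to_bytes(2, "little") for u in units)
        return cls(raw.decode("utf-16-le"))


s = CodePointString("a" + chr(128287) + "b")
print(len(s), hex(s.code_point(1)))          # 3 0x1f51f
print([hex(u) for u in s.to_utf16_units()])  # ['0x61', '0xd83d', '0xdd1f', '0x62']
```

The script only ever sees whole code points, and the surrogate pair exists only in the converted buffer passed to the 16-bit API.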


