Not bad ideas, but they don't belong in this thread about binvars and regex.

To stay on topic, you said:

Quote
While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position
This is an extremely important point, and I've been pointing it out for a long time: mIRC does not provide the full match position (or its value, for that matter), and it really needs to, because some regexes require you to use captures in ways that are not compatible with this workaround. So despite the addition of ().bytepos in the next beta, the problem of getting the correct position from $bfind().regex won't be fixed. So yeah, Khaled, please add the full match result and position to the $regml*() identifiers.

Another thing is that $bfind().regex encodes the binary variable's content to UTF-8.
This is wanted by someone coming from $regex() who is hitting the line length limit: someone using $regex(data,/\w+/u) wants their data to be UTF-8 encoded in $bfind(&data,1,/\w+/u).regex
However, it is not wanted by someone who wants control over the bytes: //bset &v 1 233 | echo -ag $bfind(&v,1,\xa9).regex currently returns 1, because 233 gets encoded to 195 169, and a9 is 169.
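
To make the 233 example concrete outside of mIRC, here is a minimal Python sketch. It assumes the encoding step treats each byte of the binvar as a code point and re-encodes it as UTF-8 (which is consistent with 233 becoming 195 169). Python's re module stands in for PCRE here, and the variable names are just for illustration.

```python
import re

# The binvar content from the example: a single byte, 233 (0xE9).
raw = bytes([233])

# Assumption: each byte is treated as a code point and re-encoded as UTF-8,
# which is what turns 233 into the two bytes 195 169.
encoded = raw.decode("latin-1").encode("utf-8")

# Count matches of the byte 0xA9 (169) in both versions of the data.
matches_raw = len(re.findall(rb"\xa9", raw))
matches_encoded = len(re.findall(rb"\xa9", encoded))

print(list(encoded))     # [195, 169]
print(matches_raw)       # 0: the untouched data has no 0xA9 byte
print(matches_encoded)   # 1: the match only exists because of the encoding
```

So the match is found in data that the script never put in the binvar, which is the whole problem for byte-level searches.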

It doesn't make much sense to do a byte-based search with $regex etc., since mIRC sends UTF-8 to PCRE and will want to convert the result back to a 16-bit array; basically, it can be said that /u should have been enforced there.
But now that we have a way to use binary variables, it makes no sense to touch the bytes in the binvar when their whole purpose is to be left untouched by mIRC.

In any case, for backward compatibility reasons, we now need a way to make $bfind().regex not encode to UTF-8, but I'd rather see it changed, since it's new, so that it doesn't encode to UTF-8 by default, with something added to make it encode the data.
And this is regardless of /u usage: if I want to inspect bytes, I won't use /u, and if I want /u, I'll use it, but either way I still want to encode the bytes in the binvar myself.
For example, $utfencode could be extended with a property that makes it treat the first parameter as a binary variable if it starts with a '&', so that current usage of $utfencode() with text that starts with a & still works.

Edit: the just-released beta does not UTF-8 encode the binvar anymore, but there's still no good way to encode it if we want to.

Last edited by Wims; 29/09/20 08:31 PM.

#mircscripting @ irc.swiftirc.net == the best mIRC help channel