/breplace - multibytes replacement - mIRC Discussion Forums


/breplace lacks multibyte replacement. It probably wasn't much of a problem before, but with Unicode I believe it is going to become more common: we can't replace one UTF-8 byte sequence with another. Replacing (all) é with è in a binvar has to be done manually with $bfind in a loop, since we can't replace each byte separately.

There are multiple ways to extend /breplace to do this. To spare the scripter from having to specify how many bytes are to be replaced, and to let mIRC quickly tell the two byte lists apart, commas could be used:

Quote:

/breplace <&binvar> <oldvalue> [oldvalueN], <newvalue>
/breplace &in 195 169,195 168

This also allows a single byte to be replaced with multiple bytes, and vice versa.
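As a concrete illustration of the comma syntax, here is a minimal model of what `/breplace &in 195 169,195 168` would do. It is written in Python because mIRC script only runs inside the client, and it treats the binvar as a `bytearray`. The helpers `parse_byte_list` and `breplace_multi` are hypothetical names for this sketch. They mirror the proposed behaviour and are not mIRC's implementation.

```python
def parse_byte_list(text: str) -> bytes:
    """Turn a space-separated list like '195 169' into raw bytes."""
    values = [int(tok) for tok in text.split()]
    if not values or any(not 0 <= v <= 255 for v in values):
        raise ValueError(f"invalid byte list: {text!r}")
    return bytes(values)


def breplace_multi(binvar: bytearray, spec: str) -> int:
    """Model of the proposed '/breplace <&binvar> <old bytes>,<new bytes>'.

    Every non-overlapping occurrence of the old byte sequence is replaced
    in place by the new one; the number of replacements is returned.
    A spec without a comma raises ValueError when it is unpacked.
    """
    old_text, new_text = spec.split(",", 1)
    old, new = parse_byte_list(old_text), parse_byte_list(new_text)
    count = binvar.count(old)
    binvar[:] = binvar.replace(old, new)
    return count


# 'café' in UTF-8 ends with the two bytes 195 169 (é)
data = bytearray("café".encode("utf-8"))
breplace_multi(data, "195 169,195 168")  # é -> è, same length: 'cafè'
breplace_multi(data, "195 168,101")      # è -> 'e', two bytes become one: 'cafe'
```

The second call shows the length change mentioned above: the binvar shrinks by one byte.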

mIRC can work this out by looking for a comma beforehand, but if that's not desirable, a new switch can always be added:

Quote:

/breplace -m <&binvar> <oldvalue> [oldvalueN], <newvalue>

That way the first list of bytes can be stored as it's parsed, and an error is returned if no comma is found or if the number of bytes in each list is different.
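The argument check for the -m form can be sketched as follows. This is a hypothetical Python model (`parse_m_switch` is an illustrative name): it collects the bytes before the comma, then raises an error under the two conditions described for -m, a missing comma or byte lists of different lengths. It also rejects an empty find list, which is an added assumption.

```python
def parse_m_switch(args: str) -> tuple[bytes, bytes]:
    """Model of the argument check for the proposed '/breplace -m'."""
    head, sep, tail = args.partition(",")
    if not sep:
        raise ValueError("/breplace -m: no comma found")
    # int() rejects non-numeric tokens; bytes() rejects values outside 0-255
    old = bytes(int(tok) for tok in head.split())
    new = bytes(int(tok) for tok in tail.split())
    if not old:
        raise ValueError("/breplace -m: empty find list")
    if len(old) != len(new):
        raise ValueError("/breplace -m: byte lists differ in length")
    return old, new


old, new = parse_m_switch("195 169,195 168")  # é and è as UTF-8 byte pairs
```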

A few things I'd add here, as someone who uses this command rather extensively:

Add support for multibyte values in multiple encodings, so that the user need only specify the Unicode code point and mIRC does the work of finding/replacing the encoded values according to a switch.

So /breplace -u might support all multibyte Unicode forms, such as UTF-8, UTF-16, and Windows DBCS (double byte). When a match is found in any of the supported encodings, the replacement is written in the same encoding. This may mean the variable has to shrink or grow. If -u8, -u16 or -ud is specified, only that single encoding is used. -u would default to UTF-8 encoding when the replacement value is > 127 AND the find value is ASCII only, and -un would never encode replacement values between 128 and 255, leaving them as raw bytes (as the /raw command does).
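A rough Python model of the -u idea: the scripter passes code points, and each encoded form of the find value is replaced by the same encoding of the replacement. Two assumptions apply. Only UTF-8 and UTF-16LE (what Windows tools usually call "Unicode") are modeled, and real DBCS code pages such as Shift-JIS are left out. Passing a single encoding stands in for -u8/-u16. The function name `breplace_unicode` is hypothetical.

```python
def breplace_unicode(
    data: bytes,
    find: int,
    repl: int,
    encodings: tuple[str, ...] = ("utf-16-le", "utf-8"),
) -> bytes:
    """Replace code point `find` with `repl` in every listed encoding.

    Single left-to-right pass: at each offset the encodings are tried in
    the order given, and a match is replaced with the same encoding of
    `repl`, so the buffer may grow or shrink. ASCII code points are
    inherently ambiguous ('a\\x00' may be UTF-16LE 'a' or UTF-8 'a' + NUL);
    this sketch simply lets the first listed encoding win.
    """
    pairs = [(chr(find).encode(c), chr(repl).encode(c)) for c in encodings]
    out = bytearray()
    i = 0
    while i < len(data):
        for old, new in pairs:
            if data.startswith(old, i):
                out += new
                i += len(old)
                break
        else:  # no encoding matched here: copy one byte unchanged
            out.append(data[i])
            i += 1
    return bytes(out)


# é once as UTF-8 (C3 A9) and once as UTF-16LE (E9 00)
src = "é".encode("utf-8") + b"|" + "é".encode("utf-16-le")
both = breplace_unicode(src, 0xE9, 0x20AC)               # é -> € in both encodings
utf8_only = breplace_unicode(src, 0xE9, 0xE8, ("utf-8",))  # like -u8
```

Replacing é with € grows the UTF-8 form from 2 to 3 bytes while the UTF-16LE form stays at 2, which is the shrink/grow case mentioned above.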

I like your idea of using commas. If no commas are present, treat each space-delimited value as alternating find/replace, as before. If a comma is present, treat the whole argument list as comma-delimited substrings alternating find/replace. This also permits insertions and deletions: a match is deleted if paired with an empty replacement value, or multiple bytes are inserted in its place. A find value could also be a whole series of bytes or Unicode code points, as you describe above.
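The two parsing modes can be modeled in Python like this. Without a comma, space-separated single bytes alternate find/replace. With a comma, each comma-separated field is a whole byte sequence, and an empty replacement field deletes the match. The names `parse_pairs` and `breplace_pairs` are hypothetical. Applying the pairs one after another, as $replace does, is an assumption of this sketch.

```python
def parse_pairs(args: str) -> list[tuple[bytes, bytes]]:
    """Split /breplace arguments into (find, replace) byte pairs."""
    if "," not in args:
        # classic form: every space-separated value is one byte
        fields = [bytes([int(tok)]) for tok in args.split()]
    else:
        # comma form: every field is a byte sequence, possibly empty
        fields = [bytes(int(tok) for tok in f.split()) for f in args.split(",")]
    if len(fields) % 2:
        raise ValueError("find/replace values must come in pairs")
    pairs = list(zip(fields[0::2], fields[1::2]))
    if any(not find for find, _ in pairs):
        raise ValueError("a find value cannot be empty")
    return pairs


def breplace_pairs(data: bytes, args: str) -> bytes:
    # Assumption: pairs are applied in order, each over the previous result.
    for find, repl in parse_pairs(args):
        data = data.replace(find, repl)
    return data


# classic mode: '-' (45) becomes '_' (95)
underscored = breplace_pairs(b"a-b", "45 95")
# comma mode: é -> 'e' + combining acute (insertion), '!' (33) -> nothing (deletion)
decomposed = breplace_pairs("café!".encode("utf-8"), "195 169,101 204 129,33,")
```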

If possible, also add support for plain-text substrings, either detected automatically or enabled with a -t switch. Plain-text strings could also use the -u switches and comma delimitation, with $chr(44) for literal commas in text strings.
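A text-mode variant might look like this hypothetical model of a -t switch (`breplace_text` is an illustrative name). The fields are split on commas first, then the literal text `$chr(44)` inside a field is turned into a comma. That ordering is an assumption, since inside mIRC the identifier would normally be evaluated before the command ever sees it. Each field is encoded as UTF-8, standing in for the default -u behaviour.

```python
COMMA_ESCAPE = "$chr(44)"


def breplace_text(data: bytes, args: str) -> bytes:
    """Comma-separated text fields alternate find/replace (UTF-8)."""
    fields = [f.replace(COMMA_ESCAPE, ",") for f in args.split(",")]
    if len(fields) % 2:
        raise ValueError("find/replace fields must come in pairs")
    for find, repl in zip(fields[0::2], fields[1::2]):
        if not find:
            raise ValueError("a find field cannot be empty")
        data = data.replace(find.encode("utf-8"), repl.encode("utf-8"))
    return data


# ", " -> ";" (literal comma via the escape), then "World" -> "Wörld"
result = breplace_text("Hello, World".encode("utf-8"), "$chr(44) ,;,World,Wörld")
```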

These suggestions shouldn't deviate far from the purpose of /breplace, and would go a long way toward supporting international languages and Unicode double-byte file/memory strings, and toward easier editing of file format headers, TCP protocols, etc.

It's no regex, but valuable nonetheless.

As /breplace is a command dealing with byte replacement, I think it would be better to keep it that way. It could reinterpret a number if it's > 255, but in that case it should always encode it as UTF-8. I don't think supporting other encodings makes much sense; mIRC doesn't do that in other areas...

That being said, replacing in a binvar using text rather than byte numbers would certainly be handy, but I think it would be best implemented as an identifier: $breplace(&binvar,text,replace,text1,replace1,[N]), returning the number of replacements made, where N would limit it to N replacements.
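The suggested identifier's contract (edit the binvar, return the replacement count, optionally stop after N) can be modeled in Python. `breplace_ident` is a hypothetical name. N becomes the keyword `n` so a search text like "3" is not mistaken for a limit. The text pairs are assumed to share one budget of N and to be applied in order.

```python
def breplace_ident(binvar: bytearray, *pairs: str, n: int = 0) -> int:
    """Model of the suggested $breplace(&binvar,text,replace,...,[N]).

    Pairs are applied in order; n > 0 caps the total number of
    replacements. The binvar is edited in place and the count returned.
    """
    if len(pairs) % 2:
        raise ValueError("text/replace arguments must come in pairs")
    data = bytes(binvar)
    total = 0
    for text, repl in zip(pairs[0::2], pairs[1::2]):
        old, new = text.encode("utf-8"), repl.encode("utf-8")
        if not old:
            raise ValueError("search text cannot be empty")
        found = data.count(old)  # non-overlapping matches, left to right
        if n > 0:
            found = min(found, n - total)
        if found:
            data = data.replace(old, new, found)  # replaces the first `found`
            total += found
        if n > 0 and total >= n:
            break
    binvar[:] = data
    return total


buf = bytearray(b"a.b.c.d")
made = breplace_ident(buf, ".", "-", n=2)  # only the first two dots change
```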

One more thing: a switch for /breplace could be used to get a $replacex-like feature.
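One way to get $replacex-like behaviour, where bytes that were already inserted are never replaced again, is a single pass over a regex alternation of all find values. This Python sketch is an assumption about how such a switch could behave, not mIRC's algorithm. `breplace_x` is a hypothetical name, and when several find values match at the same offset, the first pair listed wins.

```python
import re


def breplace_x(data: bytes, pairs: list[tuple[bytes, bytes]]) -> bytes:
    """Replace all find values simultaneously in one left-to-right pass."""
    table: dict[bytes, bytes] = {}
    for find, repl in pairs:
        if not find:
            raise ValueError("a find value cannot be empty")
        table.setdefault(find, repl)  # first pair wins for duplicate finds
    # alternation keeps the pair order, so earlier finds win at a given offset
    pattern = re.compile(b"|".join(re.escape(f) for f in table))
    return pattern.sub(lambda m: table[m.group(0)], data)


swapped = breplace_x(b"abba", [(b"a", b"b"), (b"b", b"a")])
sequential = b"abba".replace(b"a", b"b").replace(b"b", b"a")
```

Swapping two bytes shows the difference: the single pass gives `b"baab"`, while chaining ordinary replacements collapses everything to `b"aaaa"`.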

AFAIK, UTF-8 is rarely used in a host of areas where binary variables would be used, such as when reading binary data from files. Most Unicode text is encoded as DBCS, not UTF-8.

For example, see any hex editor that offers "Include unicode strings in search": they mean double-byte characters, not UTF-8.
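What such a hex-editor search does can be sketched in Python: look for the text both as plain bytes and as UTF-16LE, where each ASCII character is followed by a zero byte. This is why a plain byte search misses it. The helper names `find_all` and `search_text` are hypothetical, and treating "Unicode" as UTF-16LE is an assumption about those editors.

```python
def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every 0-based offset where needle occurs, overlaps included."""
    hits, i = [], data.find(needle)
    while i != -1:
        hits.append(i)
        i = data.find(needle, i + 1)
    return hits


def search_text(data: bytes, text: str) -> dict[str, list[int]]:
    """Search for text as UTF-8 and as UTF-16LE ('Unicode' strings)."""
    return {
        "utf-8": find_all(data, text.encode("utf-8")),
        "utf-16-le": find_all(data, text.encode("utf-16-le")),
    }


# a small blob holding "name" once in UTF-16LE and once as plain bytes
blob = b"ID" + "name".encode("utf-16-le") + b"\x00name"
hits = search_text(blob, "name")
```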

I found this in my aliases:

Code:

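alias breplacestr {
  ;noop $breplacestr(&input,searchbyte1 searchbyte2,replacebytes)
  ;noop $breplacestr(&input,195 169,226 128 162) replaces the byte sequence 195 169 (the é character) with 226 128 162 (the • character)
  ;noop $breplacestr(&input,195 169,32) replaces é with a space
  ; %a = current search position, %p = position of the latest match, %l = length of the search sequence
  var %a 1,%p,%l $numtok($2,32)
  while ($bfind($1,%a,$2)) {
    %p = $v1
    ; copy the bytes between the previous match and this one (nothing to copy if they are adjacent)
    if (%p > %a) bcopy &breplacestr -1 $1 %a $calc(%p - %a)
    ; append the replacement bytes
    bset &breplacestr -1 $3
    ; resume searching right after the matched sequence
    %a = $calc(%p + %l)
  }
  ; &breplacestr only exists if at least one match was found
  if ($bvar(&breplacestr,0)) {
    ; copy whatever follows the last match
    if (%a <= $bvar($1,0)) bcopy &breplacestr -1 $1 %a -1
    ; overwrite the input binvar with the rebuilt one
    bunset $1
    bcopy $1 1 &breplacestr 1 -1
    bunset &breplacestr
  }
}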

Just posting it in case people are interested, and to illustrate the amount of work involved: it's not hard to write such a script, but that while loop is typically inefficient. Maybe $bfind(&bvar,pos,text,/command|@win) could be a thing, with $1 filled in as the position. Regardless, I still think /breplace should be extended.
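The callback form of $bfind could behave like this Python model. It calls a function once per match with the 1-based position, the value mIRC would put in $1, and returns the match count. `bfind_each` is a hypothetical name, and the non-overlapping matches and the 1-based start position are assumptions chosen to match how the loop above advances.

```python
from typing import Callable


def bfind_each(data: bytes, pos: int, needle: bytes,
               callback: Callable[[int], None]) -> int:
    """Model of the suggested $bfind(&bvar,pos,text,/command|@win)."""
    if not needle:
        raise ValueError("search value cannot be empty")
    count = 0
    i = data.find(needle, pos - 1)  # convert 1-based pos to a 0-based index
    while i != -1:
        callback(i + 1)  # report the 1-based position, like $1
        count += 1
        i = data.find(needle, i + len(needle))  # skip past this match
    return count


positions: list[int] = []
matches = bfind_each("é and é".encode("utf-8"), 1, b"\xc3\xa9", positions.append)
```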