
regex with binvar

Posted By: Wims

regex with binvar - 27/04/17 03:28 PM

While this may have been suggested in the past, I don't think it was pointed out why this is needed.
One simple example that pushed me to write this: the default mIRC log file tool does not let me search with a regex. I could suggest that here, but in the meantime I still need the functionality.
My log files are much bigger than 4150 characters, of course (so $regex is not an option).
You may be wondering why I am not using /filter, for example. Well, that leads to the problem: how do I correctly search multiple lines at once? I simply cannot.
Concrete example: I want to find someone saying 'hash table' in my log, but only when the next line contains, let's say, 'efficiency'.
In PCRE, that would be something like:

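/^TIMESTAMP NICK .*?hash table.*\nTIMESTAMP NICK .*?efficiency/m

Where TIMESTAMP and NICK stand for patterns matching my timestamp format and nick decoration. With /m, ^ matches at the start of every line, and without /s, .* cannot run past the end of the 'hash table' line, so the second part has to match on the very next line.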
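For data that fits within the line-length limit, this kind of two-line pattern already works with $regex on a string holding the lines joined with $lf. The sketch below is only an illustration: the alias name pairtest, the sample lines, and the [HH:MM] <nick> format standing in for TIMESTAMP and NICK are all assumptions, not part of the original post.

```mirc
; hypothetical alias: returns 1 if a "hash table" line is directly followed by an "efficiency" line
alias pairtest {
  ; two sample log lines joined with a line feed (assumed [HH:MM] <nick> format)
  var %log = [12:00] <Wims> a hash table question $+ $lf $+ [12:01] <Wims> it is about efficiency
  ; /m: ^ matches after every line feed; no /s: . never matches the line feed itself
  return $regex(%log,/^\[\d\d:\d\d\] <[^>]+> .*?hash table.*\n\[\d\d:\d\d\] <[^>]+> .*?efficiency/m)
}
```

Running //echo -a $pairtest should print 1. The whole point of the request is that this stops working once the log no longer fits in a single string.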
Of course, it's possible to work around this by calling /filter -nk to find the first line and then checking that the next line has what we want (that would probably need a $read call inside the custom alias called by /filter, which is terrible!).
I also think this is long overdue; I don't see why it wasn't added before.
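A somewhat less painful variant of that workaround reads the file once, sequentially, with /fopen and $fread, remembering the previous line, so neither /filter nor $read is needed. This is a sketch only: the alias name findpair is hypothetical, and the TIMESTAMP/NICK parts of the pattern are left out for brevity.

```mirc
; hypothetical alias: /findpair <logfile>
; echoes every line containing "hash table" whose next line contains "efficiency"
alias findpair {
  if (!$isfile($1-)) { echo -a * findpair: no such file $1- | return }
  var %prev
  .fopen findpair $qt($1-)
  while (!$feof) && (!$ferr) {
    var %line = $fread(findpair)
    ; compare the previous line with the current one
    if ($regex(%prev, /hash table/i)) && ($regex(%line, /efficiency/i)) echo -a %prev
    var %prev = %line
  }
  .fclose findpair
}
```

It still inspects one line at a time, which is exactly the limitation a regex over a whole binvar would remove.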

I was thinking about improving $bfind in this regard: $bfind(&bvar,pattern) seems fine. If the pattern is only a number, users can wrap it in delimiters, so existing scripts shouldn't break.

However, this only covers searching; I suppose replacing would also be great.

$breplace(&binvar,a,o,more,replacement) - no regex; equivalent to /breplace, but works with strings
$breplace(&binvar,/(a)(.)/g,o \n \1).regex - regex replacement; behaves the same as $regsubex, but, like $regsub, returns the number of replacements made, since returning the &binvar wouldn't be useful.

$breplacex would also be a thing, then.

These are just quick ideas and syntax I came up with; they could be different as long as the functionality is there.
Posted By: rockcavera

Re: regex with binvar - 20/11/17 12:05 AM

It would be very interesting to have an identifier that takes a &binvar as input and applies a regex to it, not only for replacement, but also for searching, since $bfind does not work with &binvar as input.
Posted By: Wims

Re: regex with binvar - 20/11/17 11:07 AM

Thanks for the support!
Quote: "since $bfind does not work with &binvar as input"
$bfind does support a binvar as input.
Posted By: rockcavera

Re: regex with binvar - 20/11/17 08:43 PM

Thanks for correcting me, Wims.

Actually I wanted to mention that $bfind does not work with regex.

Disregard the final part "... since $bfind does not work with &binvar as input."
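For reference, what $bfind can already do on a binvar is a literal search over a whole file, with no line-length limit. A minimal sketch, assuming /bread's start offset is 0-based; the alias name logfind is hypothetical:

```mirc
; hypothetical alias: returns the byte position of the first "hash table" in a file, or 0
alias logfind {
  var %file = $1-
  if (!$isfile(%file)) return 0
  ; load the whole file into &log (assumes offset 0 is the first byte)
  bread $qt(%file) 0 $file(%file).size &log
  ; literal text search: no line-length limit, but no pattern and no multi-line logic
  return $bfind(&log, 1, hash table).text
}
```

This finds 'hash table' anywhere in the file, but it cannot express "and the next line contains 'efficiency'", which is what a regex-capable $bfind would add.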
Posted By: digitok

Re: regex with binvar - 15/12/17 03:26 AM

I also support this idea.
Posted By: kikuchi

Re: regex with binvar - 15/12/17 04:42 PM

I also support this suggestion.
Posted By: FroggieDaFrog

Re: regex with binvar - 21/01/20 01:08 PM

Bumping this thread, as I'm in need of this feature.

I'm currently working on an HTTP implementation. Within that implementation, I need to verify that header values are formatted correctly. A header's value is stored in a bvar and can exceed the ~8k string-length limit.

I need to check whether the header's value only contains bytes in the ASCII range (1-126). Currently this requires a (slow) loop for what amounts to an OR check over the byte values:
alias isAscii {
  ; $1 = name of the &binvar to check
  var %x = 0, %len = $bvar($1, 0)
  if (!%len) {
    return $false
  }
  while (%x < %len) {
    inc %x
    ; byte 0 or anything above 126 is outside the allowed range
    if ($bvar($1, %x) == 0 || $v1 > 126) {
      return $false
    }
  }
  return $true
}

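Until something like that exists, one way to avoid visiting every byte is to search for each forbidden byte value with $bfind instead, which costs at most 130 searches (0 and 127-255) regardless of the header's length. A sketch, assuming $bfind treats a numeric third parameter, including 0, as a byte value; the alias name isAsciiFast is hypothetical:

```mirc
; hypothetical alias: same result as isAscii, but searches for forbidden bytes
; instead of looping over every byte of the header value
alias isAsciiFast {
  if (!$bvar($1, 0)) return $false
  ; a NUL byte anywhere makes the value invalid
  if ($bfind($1, 1, 0)) return $false
  ; so does any byte from 127 to 255
  var %b = 127
  while (%b <= 255) {
    if ($bfind($1, 1, %b)) return $false
    inc %b
  }
  return $true
}
```

A single regex such as /[^\x01-\x7E]/ run directly on the binvar would replace all 130 searches with one call.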
To amend Wims's suggestion, what I'd like to see is:
$bfind(&binvar, start-position, [end-position], [name], pattern).regex
  Returns the starting position of the first found match

  &binvar
    The bvar to search

  start-position
    The position at which the search should begin
    Must be an integer value

  end-position - Optional
    The position at which the search should stop
    Must be an integer value

  name - Optional
    The regex name to use when referencing the match list via $regml()
    Must not be a numerical value

  pattern
    The regex pattern

  .regex
    Indicates a regex pattern has been specified

$breplace(&binvar, substring, newstring...)
$breplace(&binvar, substring, newstring...).cs
  Performs a text-based in-place substitution on a binary variable
  Returns the number of substitutions made

  If .cs is specified, the search will be case-sensitive

$breplace([name], &binvar, pattern, subtext).regex
  Performs a regex-based in-place substitution on a binary variable
  Returns the number of substitutions made

  You can assign a name to a $breplace().regex call which you can use later in $regml() to retrieve the list of matches.

$regml([name], n, [&binvar])
$regmlex([name], m, n, [&binvar])
    Similar to the current implementation except the result is output to the specified &binvar
    If outputting to a &binvar, the length of the bvar is returned

I have chosen a new identifier over altering $replace/$replacecs and $regsubex because the end result differs. With the current implementations, the replacement builds a new string, which is returned once the substitutions have finished. What I'd like to see is the substitutions being performed in place.
Posted By: maroon

$bvar().hex /bset|/breplace -x hex - 23/08/20 03:22 AM

Support for $bvar hex output and hex input for binary commands would be helpful when using $bfind().regex.

Normally $bfind returns the position of the match, but the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position. However, to make a regex match in a binary string that doesn't contain UTF-8 text, you need to use \xNN, where NN is a hex value 00-FF.
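In practice, the capture-group workaround looks something like the one-liner below. It is a sketch that assumes $bfind().regex returns the match count and fills $regml() the way $regex does, as described in this post; identifiers on a line are evaluated left to right, so $regml(1).pos is read after the search has run.

```mirc
//bset -t &v 1 GET /index.html HTTP/1.1 | echo -a matches: $bfind(&v, 1, /(HTTP)/).regex position: $regml(1).pos
```

For bytes that are not text, the group would contain \xNN escapes instead, for example /(\xFF)/.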

Also, if /bset and /breplace supported replacement strings written in the same hex notation, that would make things simpler and avoid mistakes.

Perhaps a .hex prop for $bvar to change the output to 2-digit hex, and a -x switch so the /bcommands would see the byte values (but not the position) as hex.

Here are some scripted aliases that roughly do what I'm suggesting.

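alias bset {
  if ($isid) { echo -sc info * No such identifier: $bset | halt }
  if (-*x* iswm $1) {
    if (t isin $1) { echo -sc info /bset don't use -x AND -t | halt }
    ; left-pad odd-length hex tokens with a 0, then convert each 2-digit pair to decimal
    var %hex = $regsubex(pad,$4-,/\b([0-9a-f](?:[0-9a-f]{2})*)\b/gi,0\1)
    ; strip x from the switches before handing over to the built-in /bset
    var %sw = $remove($1,x)
    !bset $iif(%sw != -,%sw) $2-3 $regsubex(conv,%hex,/([0-9a-f]{2})/gi,$base(\t,16,10) $+ $chr(32))
  }
  else !bset $1-
}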

//bset -x &v 1 61 62 63 | echo -a $bvar(&v,1-).text
result: abc

alias breplace {
  if ($isid) { echo -sc info * No such identifier: $breplace | halt }
  ; -x: convert each hex value to decimal before calling the built-in /breplace
  if ($1 == -x) !breplace $2 $regsubex(foo,$3-,/([0-9a-fA-F]+)/g,$base(\t,16,10))
  else !breplace $1-
}

//bset &v 1 1 2 3 255 4 5 6 | breplace -x &v ff fe | echo -a $bvar(&v,1-)
result: 1 2 3 254 4 5 6

alias bvar2hex {
  if ($2 == $null) var %range 1- | else var %range $2 | if ($3) var %range $+($2,-,$calc($2 +$3 -1))
  if ($1) return $regsubex($iif(&* iswm $1,$bvar($1,%range),$1),/(\d+)/g,$base(\t,10,16,2) $iif(dot isin $prop && (!$calc(\n % 8)),. $+ $chr(32)))
  echo -sc info *bvar2hex(list of base10's) *$bvar2hex(&binvar) *$bvar2hex(&binvar,N|N-|N1-N2) *$bvar2hex(&binvar,offset,length) [.dot] formats output into groups of 8
}

//bset -x &v 1 $sha1(abc) | echo -a $bvar2hex(&v,1-)
result: A9 99 3E 36 47 06 81 6A BA 3E 25 71 78 50 C2 6C 9C D0 D8 9D

The /bset alias allows hex values to be bunched together, but if a token has an odd number of digits, a '0' is padded in front of the first byte created from it.

//bset -x &v 1 abcde 1 2 34 5 | echo -a $bvar2hex(&v,1-)
result: 0A BC DE 01 02 34 05

To improve the readability of the hex output, I added a .dot option that puts a period after every 8th byte.

//bset &v 1 $regsubex(foo,$str(x,94),/x/g,$calc(32+ \n) $+ $chr(32)) | echo -a $bvar2hex(&v).dot

If you have the base10 byte values and just want the hex equivalents:

//echo -a $bvar2hex(00 11 22 33 44 55 66 77 88 99 111)
result: 00 0B 16 21 2C 37 42 4D 58 63 6F

It's hard for the $bvar2hex alias to estimate whether the output from range 1- fits within the line length, because each byte value obtained from $bvar is 1-3 digits long, while each hex value is always 2 digits. So, to be safe, you should probably limit the output to a range of $maxlenl / 4 bytes.
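Following that rule of thumb, a whole binvar can be dumped in chunks of at most $maxlenl / 4 bytes by calling the $bvar2hex alias with an offset and a length. A minimal sketch; the alias name bvardump is hypothetical:

```mirc
; hypothetical alias: echoes an entire &binvar as hex, at most $maxlenl / 4 bytes per line
alias bvardump {
  var %size = $bvar($1, 0), %pos = 1
  var %chunk = $int($calc($maxlenl / 4))
  while (%pos <= %size) {
    ; never ask for bytes past the end of the binvar
    var %len = $calc(%size - %pos + 1)
    if (%len > %chunk) var %len = %chunk
    echo -a $bvar2hex($1, %pos, %len)
    inc %pos %chunk
  }
}
```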
Posted By: Wims

Re: $bvar().hex /bset|/breplace -x hex - 29/09/20 04:14 PM

Not bad ideas, but they don't belong in this thread about binvars and regex.

To stay on topic, you said:

While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position
This is an extremely important point, and I've been pointing it out for a long time: mIRC does not provide the position of the full match (or even its value), and it really needs to, because some patterns use captures in ways that are not compatible with this workaround. So despite the addition of .bytepos in the next beta, getting the correct position for $bfind().regex remains unsolved. So yeah, Khaled, please add the full-match value and position to the $regml*() identifiers.
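One case where wrapping the pattern in a capture group reports the wrong position is \K, which resets the start of the reported match. In the one-liner below (plain $regex, used only for illustration), the full match is bar, starting at position 5, but the only position mIRC exposes is that of group 1, which spans foobar and starts at the f in position 2:

```mirc
//echo -a $regex(xfoobar, /(foo\Kbar)/) $regml(1) $regml(1).pos
```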

Another thing is that $bfind().regex encodes the binary variable's content to UTF-8.
This is desirable for someone coming from $regex() with line-length issues: someone using $regex(data,/\w+/u) wants their data to be UTF-8 encoded in $bfind(&data,1,/\w+/u).regex.
However, it is not wanted by someone who wants control over the bytes: //bset &v 1 233 | echo -ag $bfind(&v,1,\xa9).regex currently returns 1 because 233 gets encoded to 195 169, and A9 is 169.
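The arithmetic behind that: code point 233 (U+00E9) needs two bytes in UTF-8, with the top 5 of its 11 bits going into a 110xxxxx byte and the low 6 bits into a 10xxxxxx byte:

```latex
\begin{aligned}
233 &= 11101001_2 = 00011\,101001_2 \quad \text{(11 bits, split 5 + 6)}\\
\text{byte}_1 &= 110\,00011_2 = 192 + \lfloor 233/64 \rfloor = 195\\
\text{byte}_2 &= 10\,101001_2 = 128 + (233 \bmod 64) = 169 = \mathrm{A9}_{16}
\end{aligned}
```

So \xa9 matches the second byte of the encoded data, not the original byte 233 (0xE9).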

It doesn't make much sense to do byte-based searches with $regex etc., since mIRC sends UTF-8 to PCRE and converts the result back to a 16-bit array; basically, one could argue that /u should have been enforced there.
But now that we can use binary variables, it makes no sense to alter the bytes in a binvar when the whole point of a binvar is to be left untouched by mIRC.

In any case, for backward-compatibility reasons, we now need a way to make $bfind().regex not encode to UTF-8. But since the feature is new, I'd rather see it changed so that it doesn't encode to UTF-8 by default, with something added to make it encode the data when wanted.
And this is regardless of /u usage: if I want to inspect bytes, I won't use /u; otherwise I'll use /u, but I still want to encode the bytes in the binvar myself.
For example, $utfencode could be extended with a property to take the first parameter as a binary variable if it starts with a '&', so that current usage of $utfencode() with text that starts with a & still works.

Edit: the just-released beta no longer UTF-8 encodes the binvar, but there's still no good way to encode it if we want to.
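Until there is a built-in way, a binvar can be UTF-8 encoded by script, treating each byte as a code point 0-255 (the same mapping that turns 233 into 195 169). This is a slow, byte-by-byte sketch; the alias name butfencode and its (source, destination) parameter order are hypothetical:

```mirc
; hypothetical alias: $1 = source &binvar, $2 = destination &binvar
; treats each source byte as a code point 0-255 and writes its UTF-8 form to $2
; returns the length of $2
alias butfencode {
  var %i = 0, %len = $bvar($1, 0), %out = 0
  bunset $2
  while (%i < %len) {
    inc %i
    var %b = $bvar($1, %i)
    if (%b < 128) {
      ; 0-127: one byte, unchanged
      inc %out
      bset $2 %out %b
    }
    else {
      ; 128-255: 110xxxxx 10xxxxxx
      inc %out 2
      bset $2 $calc(%out - 1) $calc(192 + $int($calc(%b / 64))) $calc(128 + (%b % 64))
    }
  }
  return %out
}
```

For example, with &a holding 97 233, $butfencode(&a, &b) should return 3 and leave 97 195 169 in &b.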
© 2021 mIRC Discussion Forums