
regex with binvar #260478 27/04/17 03:28 PM
Joined: Jul 2006 Posts: 4,020 France W Wims OP Hoopy frood
While this may have been suggested in the past, I don't think anyone pointed out why it is needed.

One simple example pushed me to write this: mIRC's default log file tool does not let me search with a regex. I could suggest that here too, but in the meantime I still need the functionality. My log files are, of course, much bigger than 4150 characters, so $regex is not an option.

You may be wondering why I am not using /filter, for example. That leads to the real problem: how do I correctly search multiple lines at once? I simply cannot.

Concrete example: I want to find someone saying 'hash table' in my log, but only when the next line contains, let's say, 'efficiency'. In PCRE that would be something like:

/^TIMESTAMP NICK .*?hash table.*\nTIMESTAMP NICK .*?efficiency/s

where TIMESTAMP and NICK handle the format of my timestamp and nick decoration.

Of course it's possible to work around this by calling /filter -nk to find the first line and then checking that the next line has what we want (this would probably need a $read call inside the custom alias called by /filter, which is terrible!).

I also think this is long overdue; I don't see why it wasn't added before.

I was thinking about improving $bfind in this regard. $bfind(&bvar,pattern) seems fine: if the pattern is only a number, users can wrap it in delimiters, which shouldn't break existing scripts. However, this is only good for searching, and I suppose replacing would also be great:

$breplace(&binvar,a,o,more,replacement) - no regex replacement; equivalent to /breplace but works with strings.

$breplace(&binvar,/(a)(.)/g,o \n \1).regex - regex replacement; behaves the same as $regsubex with its new behavior regarding $regsub: it returns the number of replacements made, since returning the &binvar wouldn't be useful.

$breplacex would also be a thing, then.

These are just quick ideas/syntax I came up with; it could be different as long as the functionality is there.
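For illustration only, here is a minimal Python sketch (not mIRC script) of the kind of two-line search described above, run over a whole buffer instead of line by line. The log excerpt, the timestamp and nick formats, and the names `LOG`, `PREFIX`, `PAIR` and `find_pairs` are all assumptions made up for the example. Unlike the pattern in the post, the sketch leaves dot-all mode off so that each half of the pattern stays on its own line.

```python
import re

# Hypothetical log excerpt; the timestamp and nick formats are assumptions.
LOG = (
    b"[12:00:01] <alice> I moved everything into a hash table yesterday\n"
    b"[12:00:09] <bob> did that help efficiency at all?\n"
    b"[12:03:44] <carol> a hash table is overkill here\n"
    b"[12:04:02] <dave> lunch?\n"
)

# Stands in for "TIMESTAMP NICK " in the pattern from the post.
PREFIX = rb"\[\d\d:\d\d:\d\d\] <[^>]+> "

# re.M lets ^ match at the start of every line. Dot-all is not set, so "."
# cannot cross a newline and the match covers exactly two adjacent lines.
PAIR = re.compile(
    rb"^" + PREFIX + rb".*?hash table.*\n" + PREFIX + rb".*?efficiency",
    re.M,
)


def find_pairs(buf: bytes) -> list[int]:
    """Return the byte offset of every matching pair of adjacent lines."""
    return [m.start() for m in PAIR.finditer(buf)]


print(find_pairs(LOG))
```

Only the first pair of lines matches here: the second 'hash table' line is followed by a line without 'efficiency'.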
#mircscripting @ irc.swiftirc.net == the best mIRC help channel

Re: regex with binvar Wims #261729 20/11/17 12:05 AM
Joined: Aug 2016 Posts: 57 Brazil R rockcavera Babel fish
It would be very interesting to have an identifier that takes a &binvar as input and applies a regex to it, not only for replacement but also for search, since $bfind does not work with &binvar as input.

Re: regex with binvar rockcavera #261730 20/11/17 11:07 AM
Joined: Jul 2006 Posts: 4,020 France W Wims OP Hoopy frood
Thanks for the support!

Quote:

since $bfind does not work with &binvar as input

$bfind does support binvar as input.

Last edited by Wims; 20/11/17 02:50 PM.

Re: regex with binvar Wims #261731 20/11/17 08:43 PM
Joined: Aug 2016 Posts: 57 Brazil R rockcavera Babel fish
Thanks for correcting me, Wims. What I actually meant is that $bfind does not work with regex. Disregard the final part, "... since $bfind does not work with &binvar as input."

Re: regex with binvar Wims #261929 15/12/17 03:26 AM
D digitok
I also support this idea.

Re: regex with binvar Wims #261944 15/12/17 04:42 PM
Joined: Jul 2013 Posts: 26 Belgique K kikuchi Ameglian cow
I also support this suggestion.

Re: regex with binvar Wims #266720 21/01/20 01:08 PM
Joined: Apr 2010 Posts: 964 USA FroggieDaFrog Hoopy frood

Bumping this thread as I'm in need of this feature.

I'm currently working on an HTTP implementation. Within that implementation I need to verify that header values are formatted correctly. A header's value is stored in a bvar and can exceed the ~8k string-length limit that is imposed.

I need to check that the header's value only contains bytes in the ASCII range (1-126). Currently this requires a (slow) loop for what amounts to an OR check of values:

Code

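alias isAscii {
  ; An empty (or missing) bvar is rejected
  if (!$bvar($1, 0)) {
    return $false
  }

  ; Start at 0 so the first /inc lands on byte 1
  ; (starting at 1 would skip the first byte)
  var %x = 0, %len = $bvar($1, 0)
  while (%x < %len) {
    inc %x
    ; Reject NUL and anything above 126
    if ($bvar($1, %x) == 0 || $v1 > 126) {
      return $false
    }
  }

  return $true
}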
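For comparison, this is roughly what the same check looks like when a regex can be applied to raw bytes directly. It is a Python sketch for illustration only, not mIRC script; the names `ASCII_1_126` and `is_ascii` are made up for the example.

```python
import re

# One or more bytes, each in the range 0x01-0x7E (decimal 1-126).
ASCII_1_126 = re.compile(rb"[\x01-\x7e]+")


def is_ascii(buf: bytes) -> bool:
    """True if buf is non-empty and every byte is in the range 1..126."""
    return ASCII_1_126.fullmatch(buf) is not None


print(is_ascii(b"text/html; charset=utf-8"))
print(is_ascii(b"caf\xc3\xa9"))
```

The whole loop collapses into a single character-class match, which is the kind of call a regex-enabled $bfind would make possible on a bvar.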

To amend Wims's suggestion, what I'd like to see is:

Code

$bfind(&binvar, start-position, [end-position], [name], pattern).regex
  Returns the starting position of the first found match

  &binvar
    The bvar to search

  start-position
    The position at which the search should begin
    Must be an integer value

  end-position - Optional
    The position at which the search should stop
    Must be an integer value

  name - Optional
    The regex-name to use when referencing the match list via $regml()
    Must not be a numerical value

  pattern
    The regex pattern

  .regex
    Indicates a regex pattern has been specified


$breplace(&binvar, substring, newstring...)
$breplace(&binvar, substring, newstring...).cs
  Performs a text-based in-place substitution on a binary variable
  Returns the number of substitutions made

  If .cs is specified, the search will be case-sensitive


$breplace([name], &binvar, pattern, subtext).regex
  Performs a regex-based in-place substitution on a binary variable
  Returns the number of substitutions made

  You can assign a name to a $breplace().regex call which you can use later in $regml() to retrieve the list of matches.


$regml([name], n, [&binvar])
$regmlex([name], m, n, [&binvar])
  Similar to the current implementation except the result is output to the specified &binvar
  If outputting to a &binvar, the length of the bvar is returned

The reason I have chosen a new identifier over altering $replace/cs and $regsubex is that the end result differs. With the current implementations, the replace creates a new string and, once the substitutions have finished, the new string is returned. The functionality I'd like to see is substitutions being performed in place.
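To make the proposed semantics concrete, here is a small Python sketch (illustration only, not mIRC script) of a positional regex search and an in-place regex replace that returns a count. The function names `bfind_regex` and `breplace_regex`, the 1-based positions, and the return value of 0 for "not found" are assumptions for the example, not a statement of how mIRC implements anything.

```python
import re


def bfind_regex(buf, pattern, start=1, end=None):
    """Return the 1-based position of the first match within [start, end], or 0."""
    stop = len(buf) if end is None else end
    # search() takes 0-based pos (inclusive) and endpos (exclusive)
    m = re.compile(pattern).search(buf, start - 1, stop)
    return m.start() + 1 if m else 0


def breplace_regex(buf: bytearray, pattern: bytes, repl: bytes) -> int:
    """Regex-substitute inside buf itself and return the number of replacements."""
    new, count = re.subn(pattern, repl, buf)
    buf[:] = bytes(new)  # overwrite the caller's buffer instead of returning a copy
    return count


data = bytearray(b"banana")
print(breplace_regex(data, rb"a(.)", rb"o\1"), data)
print(bfind_regex(b"GET / HTTP/1.1", rb"HTTP/\d"))
```

Because the caller's buffer is modified, the only useful return value is the number of substitutions, which is the design argued for above.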

Last edited by FroggieDaFrog; 21/01/20 02:30 PM.

$bvar().hex /bset|/breplace -x hex FroggieDaFrog #267625 23/08/20 03:22 AM
Joined: Jan 2004 Posts: 2,081 M maroon Hoopy frood
Support for $bvar hex output and hex input for binary commands would be helpful when using $bfind().regex.

While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position. However, making a regex match in a binary string which doesn't contain UTF8 text requires using \xNN, where NN is a hex value 00-FF.

Also, if /bset and /breplace supported input of replacement strings using the same hex alphabet, that would make things simpler and avoid miss-steaks. Perhaps a .hex prop for $bvar to change the output to 2-digit hex, and a -x switch so the /bcommands would see the byte values (but not the position) as hex.

Here are some scripted aliases to sorta do what I'm suggesting.

Code

alias bset {
  if ($isid) { echo -sc info * No such identifier: $bset | halt }
  if (-* iswm $1) {
    if ((x isin $1) && (t isin $1)) { echo -sc info /bset don't use -x AND -t | halt }
    var %pattern /([0-9a-fA-F]{1,2})/g
    bset $1-3 $regsubex(foo,$4-,%pattern,$base(\t,16,10) $+ $chr(32))
  }
  else !bset $1-
}

//bset -x &v 1 61 62 63 | echo -a $bvar(&v,1-).text
result: abc

Code

alias breplace {
  if ($isid) { echo -sc info * No such identifier: $breplace | halt }
  if ($1 == -x) breplace $2 $regsubex(foo,$3-,/([0-9a-fA-F]+)/g,$base(\t,16,10))
  else !breplace $1-
}

//bset &v 1 1 2 3 255 4 5 6 | breplace -x &v ff fe | echo -a $bvar(&v,1-)
result: 1 2 3 254 4 5 6

Code

alias bvar2hex {
  if ($2 == $null) var %range 1- | else var %range $2 | if ($3) var %range $+($2,-,$calc($2 +$3 -1))
  if ($1) return $regsubex($iif(&* iswm $1,$bvar($1,%range),$1),/(\d+)/g,$base(\t,10,16,2) $iif(dot isin $prop && (!$calc(\n % 8)),. $+ $chr(32)))
  echo -sc info bvar2hex(list of base10's) $bvar2hex(&binvar) $bvar2hex(&binvar,N|N-|N1-N2) $bvar2hex(&binvar,offset,length) [.dot] formats output into groups of 8
}

//bset -x &v 1 $sha1(abc) | echo -a $bvar2hex(&v,1-)
result: A9 99 3E 36 47 06 81 6A BA 3E 25 71 78 50 C2 6C 9C D0 D8 9D

The /bset alias allows hex values to be bunched together, but if the token length is odd, a '0' is padded in front of the 1st output token created from it.

//bset -x &v 1 abcde 1 2 34 5 | echo -a $bvar2hex(&v,1-)
result: 0A BC DE 01 02 34 05

To improve readability of the hex output, I added an option to have a period following every 8th byte.

//bset &v 1 $regsubex(foo,$str(x,94),/x/g,$calc(32+ \n) $+ $chr(32)) | echo -a $bvar2hex(&v).dot

If you have the base10 byte values and just want the hex equivalents:

//echo -a $bvar2hex(00 11 22 33 44 55 66 77 88 99 111)
result: 00 0B 16 21 2C 37 42 4D 58 63 6F

It's hard for the $bvar2hex alias to estimate whether the output from range 1- fits within the line length, because each byte value obtained from $bvar has a variable length of 1-3 characters, while the hex output is fixed at 2 characters each. So, to be safe, it should probably limit output to a range of $maxlenl *1/4 values.
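The hex round trip described in this post can be sketched in Python as follows. This is an illustration only, not a port of the aliases: the names `hex_to_bytes` and `bytes_to_hex` and the `group` parameter are assumptions for the example, and the exact spacing around the period may differ from what the .dot property produces.

```python
def hex_to_bytes(tokens: str) -> bytes:
    """Parse space-separated hex tokens; an odd-length token gets a leading 0."""
    out = bytearray()
    for tok in tokens.split():
        if len(tok) % 2:
            tok = "0" + tok  # e.g. "abcde" is read as "0abcde"
        out += bytes.fromhex(tok)
    return bytes(out)


def bytes_to_hex(buf: bytes, group: int = 0) -> str:
    """Format buf as 2-digit uppercase hex, with '.' after every group-th byte."""
    parts = []
    for i, b in enumerate(buf, start=1):
        parts.append(f"{b:02X}")
        if group and i % group == 0:
            parts.append(".")
    return " ".join(parts)


print(bytes_to_hex(hex_to_bytes("abcde 1 2 34 5")))
print(bytes_to_hex(bytes(range(1, 10)), group=8))
```

Since every byte always takes two hex digits plus a separator, the output length is predictable from the byte count alone, which is the property the last paragraph of the post relies on.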

Re: $bvar().hex /bset\|/breplace -x hex maroon #267794 29/09/20 04:14 PM
Joined: Jul 2006 Posts: 4,020 France W Wims OP Hoopy frood
Not bad ideas, but they don't belong in this thread about binvar and regex.

To stay on topic, you said:

Quote

While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position

This is an extremely important point and I've been pointing it out for a long time: mIRC does not provide the fullmatch position (or its value, for that matter) and it really needs to, because some regexes require captures in ways that are not compatible with this workaround. So despite the addition of ().bytepos in the next beta, the problem of getting the correct position for $bfind().regex won't be fixed. So yeah, Khaled, please add the fullmatch result and position to the $regml*() identifiers.

Another thing is that $bfind().regex encodes the binary variable's content to utf8. This is wanted by someone coming from $regex() and hitting the line length limit: someone using $regex(data,/\w+/u) wants their data to be utf8 encoded in $bfind(&data,1,/\w+/u).regex.

However, this is not wanted by someone who wants control over the bytes. //bset &v 1 233 | echo -ag $bfind(&v,1,\xa9).regex currently returns 1, because 233 gets encoded to 195 169, and a9 is 169.

It doesn't make much sense to do byte-based searches with $regex etc. while mIRC is sending utf8 to PCRE and will want to convert back to a 16-bit array, and basically it can be said that /u should have been enforced there. But now that we have a way to use binary variables, it makes no sense to touch the bytes in the binvar when their whole purpose is to be left untouched by mIRC.

In any case, for backward compatibility reasons, we now need a way to make $bfind().regex not encode to utf8. Since it's new, I'd rather see it changed so that it doesn't encode to utf8 by default, with something added to make it encode the data. And this is regardless of /u usage: if I want to inspect bytes, I won't use /u; if I do, I'll use /u, but I still want to encode the bytes in the binvar myself. For example, $utfencode could be extended with a property to take the first parameter as a binary variable if it starts with a '&', so that current usage of $utfencode() with something that starts with a & still works.

Edit: the just released beta does not utf8 encode the binvar anymore, but there's still no good way to encode it if we want to.

Last edited by Wims; 29/09/20 08:31 PM.
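The byte 233 example can be reproduced outside mIRC. The Python sketch below is an illustration only: it treats the byte as the code point U+00E9 and UTF-8 encodes it, which is an assumption about what the implicit encoding step amounts to, and the variable names are made up for the example.

```python
import re

raw = bytes([233])  # the single byte 0xE9 from the example in the post

# Assumption: the implicit step treats the byte as U+00E9 and UTF-8 encodes it.
encoded = raw.decode("latin-1").encode("utf-8")

print(list(encoded))                       # the two bytes that replace 233
print(bool(re.search(rb"\xa9", raw)))      # search the untouched byte
print(bool(re.search(rb"\xa9", encoded)))  # search after encoding
print(bool(re.search(rb"\xe9", raw)))      # the byte actually stored
```

The pattern \xa9 only matches once the data has been encoded, because 0xA9 is the second byte of the UTF-8 sequence and never appears in the original buffer.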
