Joined: Jul 2006
Posts: 4,222
Hoopy frood
OP
This may have been suggested before, but I don't think anyone has explained why it is needed. One simple example that pushed me to write this: mIRC's built-in log viewer does not let me search with a regex. I could suggest that separately, but in the meantime I still need the functionality. My log files are, of course, much bigger than 4150 characters, so $regex is not an option. You may be wondering why I am not using /filter. That leads to the real problem: how do I correctly search across multiple lines at once? I simply cannot. A concrete example: I want to find someone saying 'hash table' in my log, but only when the next line contains, let's say, 'efficiency'. In PCRE that would be something like:
/^TIMESTAMP NICK .*?hash table.*\nTIMESTAMP NICK .*?efficiency/m
Here TIMESTAMP and NICK stand for the patterns matching my timestamp format and nick decoration. Of course it's possible to work around this by calling /filter -nk to find the first line and then checking that the next line contains what we want (this would probably need a $read call inside the custom alias called by /filter, which is terrible!). I also think this is long overdue; I don't see why it wasn't added before.
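To make the two-line search concrete, here is a minimal sketch in Python rather than mIRC script, so it can be run outside mIRC. The log format and the TIMESTAMP and NICK sub-patterns are assumptions made up for illustration, as is the find_pairs helper. It uses multiline mode so that ^ anchors at the start of every line while . stays within a single line.

```python
import re

# Hypothetical log format: "[HH:MM:SS] <nick> message"
TIMESTAMP = r"\[\d{2}:\d{2}:\d{2}\]"
NICK = r"<[^>\n]+>"

# A line containing 'hash table' immediately followed by a line
# containing 'efficiency'. MULTILINE lets ^ match at each line start.
PATTERN = re.compile(
    rf"^{TIMESTAMP} {NICK} .*?hash table.*\n{TIMESTAMP} {NICK} .*?efficiency",
    re.MULTILINE,
)

def find_pairs(log_text):
    """Return the character offset of every two-line match in the log."""
    return [m.start() for m in PATTERN.finditer(log_text)]

log = (
    "[10:00:01] <alice> I store it in a hash table\n"
    "[10:00:05] <bob> what about efficiency?\n"
    "[10:01:00] <carol> hash table again\n"
    "[10:01:30] <dave> unrelated\n"
)
print(find_pairs(log))
```

The whole log is searched as one buffer, which is exactly what a line-by-line tool such as /filter cannot do.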
I was thinking about extending $bfind for this. $bfind(&bvar,pattern) seems fine: if the pattern consists only of a number, users can wrap it in delimiters, so existing scripts shouldn't break.
However, this only covers searching; I suppose replacing would be great too.
$breplace(&binvar,a,o,more,replacement) - no regex; equivalent to /breplace but works with strings.
$breplace(&binvar,/(a)(.)/g,o \n \1).regex - regex replacement; behaves the same as $regsubex, except that, like $regsub, it returns the number of replacements made, since returning the &binvar wouldn't be useful.
$breplacex would also be a thing, then.
These are just quick ideas/syntax I came up with; it could be different as long as the functionality is there.
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2016
Posts: 59
Babel fish
It would be very interesting to have an identifier that takes a &binvar as input and runs a regex against it, not only for replacement but also for searching, since $bfind does not work with &binvar as input.
rockcavera #Scripts @ irc.VirtuaLife.com.br
Joined: Jul 2006
Posts: 4,222
Hoopy frood
OP
Thanks for the support!
"since $bfind does not work with &binvar as input"
$bfind does support &binvar as input.
Last edited by Wims; 20/11/17 02:50 PM.
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2016
Posts: 59
Babel fish
Thanks for correcting me, Wims.
Actually I wanted to mention that $bfind does not work with regex.
Disregard the final part "... since $bfind does not work with &binvar as input."
rockcavera #Scripts @ irc.VirtuaLife.com.br
Joined: Jun 2008
Posts: 6
Nutrimatic drinks dispenser
I also support this idea.
Joined: Jul 2013
Posts: 27
Ameglian cow
I also support this suggestion.
Joined: Apr 2010
Posts: 969
Hoopy frood
Bumping this thread as I'm in need of this feature.
I'm currently working on an HTTP implementation. Within that implementation I need to verify that header values are formatted correctly. A header's value is stored in a bvar and can be over the ~8k string-length limit imposed. I need to check whether the header's value contains only bytes in the ASCII range (1-126). Currently this requires a (slow) loop for what amounts to an OR check of values:
alias isAscii {
  ; An empty or missing bvar is not valid
  if (!$bvar($1, 0)) {
    return $false
  }
  ; Start at 0 so the first inc lands on byte 1
  var %x = 0, %len = $bvar($1, 0)
  while (%x < %len) {
    inc %x
    ; Reject NUL and anything above 126
    if ($bvar($1, %x) == 0 || $v1 > 126) {
      return $false
    }
  }
  return $true
}
To amend Wims's suggestion, what I'd like to see is:
$bfind(&binvar, start-position, [end-position], [name], pattern).regex
  Returns the starting position of the first found match
  &binvar
    The bvar to search
  start-position
    The position at which the search should begin
    Must be an integer value
  end-position - Optional
    The position at which the search should stop
    Must be an integer value
  name - Optional
    The regex name to use when referencing the match list via $regml()
    Must not be a numerical value
  pattern
    The regex pattern
  .regex
    Indicates a regex pattern has been specified
$breplace(&binvar, substring, newstring...)
$breplace(&binvar, substring, newstring...).cs
  Performs a text-based in-place substitution on a binary variable
  Returns the number of substitutions made
  If .cs is specified, the search will be case-sensitive
$breplace([name], &binvar, pattern, subtext).regex
  Performs a regex-based in-place substitution on a binary variable
  Returns the number of substitutions made
  You can assign a name to a $breplace().regex call, which you can use later in $regml() to retrieve the list of matches.
$regml([name], n, [&binvar])
$regmlex([name], m, n, [&binvar])
  Similar to the current implementation, except the result is output to the specified &binvar
  If outputting to a &binvar, the length of the bvar is returned
The reason I have chosen a new identifier over altering $replace/cs and $regsubex is that the end result differs. The current implementations create a new string and return it once the substitutions have finished. The functionality I'd like to see is substitutions being performed in place.
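For comparison, the loop-based isAscii check from this post collapses to a single pattern once a regex can be run directly over raw bytes. The following is a minimal sketch in Python rather than mIRC script, so it can be run outside mIRC; is_ascii and ASCII_ONLY are hypothetical names, and the sketch says nothing about how mIRC would implement the proposed identifiers.

```python
import re

# One or more bytes, each in the range 1-126 (0x01-0x7E)
ASCII_ONLY = re.compile(rb"[\x01-\x7e]+")

def is_ascii(data: bytes) -> bool:
    """True only if data is non-empty and every byte is within 1-126."""
    return ASCII_ONLY.fullmatch(data) is not None

print(is_ascii(b"text/html; charset=utf-8"))
print(is_ascii(b"caf\xc3\xa9"))
```

The pattern is compiled from a bytes literal, so it matches byte values as they are and no text encoding is involved.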
Last edited by FroggieDaFrog; 21/01/20 02:30 PM.
Joined: Jan 2004
Posts: 2,127
Hoopy frood
Support for $bvar hex output and hex input for binary commands would be helpful when using $bfind().regex.
While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position. However, making a regex match in a binary string that doesn't contain UTF8 text requires using \xNN, where NN is a hex value 00-FF.
Also, if /bset and /breplace supported input of replacement strings using the same hex alphabet, that would make things simpler and avoid miss-steaks. Perhaps a .hex prop for $bvar to change the output to 2-digit hex, and a -x switch so the /bcommands would see the byte values (but not the position) as hex.
Here are some scripted aliases to sorta do what I'm suggesting.
alias bset {
  if ($isid) { echo -sc info * No such identifier: $bset | halt }
  if (-* iswm $1) {
    if ((x isin $1) && (t isin $1)) { echo -sc info /bset don't use -x AND -t | halt }
    var %pattern /([0-9a-fA-F]{1,2})/g
    bset $1-3 $regsubex(foo,$4-,%pattern,$base(\t,16,10) $+ $chr(32))
  }
  else !bset $1-
}
//bset -x &v 1 61 62 63 | echo -a $bvar(&v,1-).text
result: abc
alias breplace {
  if ($isid) { echo -sc info * No such identifier: $breplace | halt }
  if ($1 == -x) breplace $2 $regsubex(foo,$3-,/([0-9a-fA-F]+)/g,$base(\t,16,10))
  else !breplace $1-
}
//bset &v 1 1 2 3 255 4 5 6 | breplace -x &v ff fe | echo -a $bvar(&v,1-)
result: 1 2 3 254 4 5 6
alias bvar2hex {
  if ($2 == $null) var %range 1- | else var %range $2 | if ($3) var %range $+($2,-,$calc($2 +$3 -1))
  if ($1) return $regsubex($iif(&* iswm $1,$bvar($1,%range),$1),/(\d+)/g,$base(\t,10,16,2) $iif(dot isin $prop && (!$calc(\n % 8)),. $+ $chr(32)))
  echo -sc info *bvar2hex(list of base10's) *$bvar2hex(&binvar) *$bvar2hex(&binvar,N|N-|N1-N2) *$bvar2hex(&binvar,offset,length) [.dot] formats output into groups of 8
}
//bset -x &v 1 $sha1(abc) | echo -a $bvar2hex(&v,1-)
result: A9 99 3E 36 47 06 81 6A BA 3E 25 71 78 50 C2 6C 9C D0 D8 9D
The /bset alias allows hex values to be bunched together, but if the token length is odd, a '0' is padded in front of the 1st output token created from it.
//bset -x &v 1 abcde 1 2 34 5 | echo -a $bvar2hex(&v,1-)
result: 0A BC DE 01 02 34 05
To improve readability of the hex output, I added an option to have a period following every 8th byte.
//bset &v 1 $regsubex(foo,$str(x,94),/x/g,$calc(32+ \n) $+ $chr(32)) | echo -a $bvar2hex(&v).dot
If you have the base10 byte values and just want the hex equivalents:
//echo -a $bvar2hex(00 11 22 33 44 55 66 77 88 99 111)
result: 00 0B 16 21 2C 37 42 4D 58 63 6F
It's hard for the $bvar2hex alias to estimate whether the output from range 1- fits within the line length, because each byte value obtained from $bvar has a variable length of 1-3 characters, while the hex output is fixed at 2 each. So, to be safe, it should probably limit output to a range of $maxlenl *1/4 values.
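The hex conventions described in this post (bunched hex tokens, a leading '0' for odd-length tokens, fixed 2-digit output) can be sketched in Python, which is used here only so the example runs outside mIRC. The function names hex_tokens_to_bytes and bytes_to_hex are hypothetical, and this is an illustration of the described behavior, not a port of the aliases above.

```python
def hex_tokens_to_bytes(text: str) -> bytes:
    """Parse space-separated hex tokens; odd-length tokens get a leading '0'."""
    out = bytearray()
    for token in text.split():
        if len(token) % 2:
            token = "0" + token          # 'abcde' becomes '0abcde'
        out += bytes.fromhex(token)      # two hex digits per byte
    return bytes(out)

def bytes_to_hex(data: bytes) -> str:
    """Format every byte as fixed-width, upper-case, 2-digit hex."""
    return " ".join(f"{b:02X}" for b in data)

print(bytes_to_hex(hex_tokens_to_bytes("abcde 1 2 34 5")))
```

Because every byte is rendered with exactly two hex digits, the output length is predictable, unlike the 1-3 digit decimal values returned by $bvar.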
Joined: Jul 2006
Posts: 4,222
Hoopy frood
OP
Not bad ideas, but they don't belong in this thread about binvars and regex. To stay on topic, you said:
"While normally $bfind returns the position of the match, the .regex prop makes it return the number of matches instead of a position within the binvar, so your regex pattern must use a capture group and you must use $regml([name,]N).pos to find that position"
This is an extremely important point and I've been pointing it out for a long time: mIRC does not provide the full-match position (or its value, for that matter) and it really needs to, because some regexes require captures in ways that are not compatible with this workaround. So despite the addition of ().bytepos in the next beta, the problem of getting the correct position for $bfind().regex won't be fixed. So yeah, Khaled, please add the full-match result and position to the $regml*() identifiers.
Another thing is that $bfind().regex encodes the binary variable's content to UTF-8. This is wanted by someone coming from $regex() who ran into the line length limit: someone using $regex(data,/\w+/u) wants their data to be UTF-8 encoded in $bfind(&data,1,/\w+/u).regex. However, it is not wanted by someone who wants control over the bytes: //bset &v 1 233 | echo -ag $bfind(&v,1,\xa9).regex currently returns 1 because 233 gets encoded to 195 169, and a9 is 169. It doesn't make much sense to do byte-based searches with $regex etc. while mIRC is sending UTF-8 to PCRE and will want to convert back to a 16-bit array; basically, it can be said that /u should have been enforced there. But now that we have a way to use binary variables, it makes no sense to touch the bytes in the binvar when their purpose is to be left untouched by mIRC. In any case, for backward compatibility, we now need a way to make $bfind().regex not encode to UTF-8, but I'd rather see it changed since it's new, so that it doesn't encode to UTF-8 by default, with something added to make it encode the data.
And this is regardless of /u usage: if I want to inspect bytes, I won't use /u; otherwise, I will, but I still want to encode the bytes in the binvar myself. For example, $utfencode could be extended with a property to take the first parameter as a binary variable if it starts with a '&', so that current usage of $utfencode() with something that starts with a & still works.
Edit: the just-released beta does not UTF-8 encode the binvar anymore, but there's still no good way to encode it if we want to.
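The encoding effect described in this post can be reproduced outside mIRC. The following Python sketch only illustrates the byte arithmetic (233 becoming 195 169 under UTF-8); it makes no claim about mIRC's internals, and the variable names are made up for the example.

```python
import re

raw = bytes([233])                   # the single byte set by /bset &v 1 233
encoded = chr(233).encode("utf-8")   # the same value after UTF-8 encoding

print(list(encoded))                              # [195, 169]
print(re.search(rb"\xa9", raw) is not None)       # False: raw data has no byte 0xA9
print(re.search(rb"\xa9", encoded) is not None)   # True: 0xA9 exists only after encoding
print(re.search(rb"\xe9", raw) is not None)       # True: 0xE9 is 233, the actual byte
```

A byte-oriented search for \xa9 can therefore only succeed if the data was encoded first, which is why a match on the untouched binvar would be surprising.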
Last edited by Wims; 29/09/20 08:31 PM.
#mircscripting @ irc.swiftirc.net == the best mIRC help channel