OP
Ameglian cow
Joined: Aug 2005
Posts: 31
Hi all,

It seems I'm stuck with a regex I'm trying to put in. I want to check a string of text for the occurrence of a string surrounded by two characters that are each one of -, _, ( or ).

Example: -HI- _HI_ (HI)

For those who might be confused about the '$gettok(%regxfix,%a,32)' part: %regxfix is a list of strings to check for, separated by spaces (hence the $gettok on ASCII 32). The list looks like: HI HELLO HO HALLO ELLOW ELLO

In my echoes I only check for the ones with 2 characters. I am aware of that; it's just for testing purposes.

For this I use the following:
if ($regex(%var,/(\- $+ $gettok(%regxfix,%a,32) $+ \-)|(\- $+ $gettok(%regxfix,%a,32) $+ \_)|(\_ $+ $gettok(%regxfix,%a,32) $+ \_)|(\_ $+ $gettok(%regxfix,%a,32) $+ \-)|(\( $+ $gettok(%regxfix,%a,32) $+ \))/g) != 0) { echo working }

To test this in an echo I use:
//echo -a $regex(blah blah -HI- there,/(\-[A-Z]{2}\-)|(\-[A-Z]{2}\_)|(\_[A-Z]{2}\_)|(\_[A-Z]{2}\-)|(\([A-Z]{2}\))/g)

So far all of that works. But now comes the tricky part: I want to add checks for an occurrence of -HI) or _HI). So I used the obvious extension:
if ($regex(%var,/(\- $+ $gettok(%regxfix,%a,32) $+ \-)|(\- $+ $gettok(%regxfix,%a,32) $+ \_)|(\_ $+ $gettok(%regxfix,%a,32) $+ \_)|(\_ $+ $gettok(%regxfix,%a,32) $+ \-)|(\_ $+ $gettok(%regxfix,%a,32) $+ \))|(\- $+ $gettok(%regxfix,%a,32) $+ \))|(\( $+ $gettok(%regxfix,%a,32) $+ \))/g) != 0) { echo working! }

Again an echo to test:
//echo -a $regex(blah blah -HI- there,/(\-[A-Z]{2}\-)|(\-[A-Z]{2}\_)|(\_[A-Z]{2}\_)|(\_[A-Z]{2}\-)|(\-[A-Z]{2}\))|(\([A-Z]{2}\))/g)

But that results in 0 and not 1 as it should. I can't find out why, so I'm hoping one of you will ;-)

Thanks a lot in advance!
Fjord artisan
Joined: Jan 2007
Posts: 259
You can use an alternation such as (HI|ELLO|ETC|ETC) to check for several different strings at once, e.g.:
alias matchtest {
if ($regex($1-,/(?:_|\-|\()(HI|HELLO|ELLO)(?:_|\-|\))/mg)) {
echo -a match: $regml(1), Total matches: $v1
}
else echo -a -.-
}
Then you can build the pattern from variables: /(?: $+ %sur1 $+ )( $+ $replace(%regfix,$chr(32),$chr(124)) $+ )(?: $+ %sur2 $+ )/mg
where the variables are:
%sur1 = _|-|\(
%sur2 = _|-|\)
%regfix = HI HELLO HO HALLO ELLOW ELLO
and then the final code should be something like:
alias matchtest {
if ($regex($1-,/(?: $+ %sur1 $+ )( $+ $replace(%regfix,$chr(32),$chr(124)) $+ )( $+ %sur2 $+ )/mg)) {
echo 4 -a match: $regml(1), Total matches: $v1
}
else {
echo -a No matches
}
}
However, in this code: '_' does not seem to be picked up for some reason.
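For comparison, here is a minimal sketch of the same idea outside mIRC, written in Python: the word list becomes one alternation, and the surrounding characters are written as character classes instead of escaped alternatives. The helper name `build_pattern` is made up for this illustration, and it is only an assumption that mIRC's regex engine treats these classes the same way.

```python
import re

def build_pattern(words: str) -> re.Pattern:
    """Turn a space-separated word list into one delimiter-aware regex."""
    # Join the words with "|" (the same job $replace(...,$chr(32),$chr(124)) does).
    alternatives = "|".join(re.escape(word) for word in words.split())
    # [-_(] is the opening delimiter, [-_)] the closing one; a leading "-"
    # inside a character class is a literal hyphen.
    return re.compile(r"[-_(](" + alternatives + r")[-_)]")

pattern = build_pattern("HI HELLO HO HALLO ELLOW ELLO")

# findall returns the captured word for every match.
print(pattern.findall("blah _HI_ and -HELLO) there"))
```

Because the delimiters sit in character classes, none of them needs a backslash except inside the alternation of words, which `re.escape` handles.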
Last edited by Kardafol; 09/01/07 05:11 PM.
Those who can, cannot. Those who cannot, can.
Fjord artisan
Joined: Jan 2007
Posts: 259
This seems to fix it:
alias matchtest {
if ($regex($1-,/((?: $+ %sur1 $+ )(?: $+ $replace(%regfix,$chr(32),$chr(124)) $+ )(?: $+ %sur2 $+ ))/mg)) {
echo 4 -a match: $regml(1), Total matches: $v1
}
else {
echo -a No matches
}
}
Variables:
%sur1 = |_|\-|\(
%sur2 = |_|\-|\)
%regfix = HI HELLO HO HALLO ELLOW ELLO
Last edited by Kardafol; 09/01/07 05:12 PM.
Ameglian cow
Joined: Sep 2003
Posts: 42
"Again an echo to test: //echo -a $regex(blah blah -HI- there,/(\-[A-Z]{2}\-)|(\-[A-Z]{2}\_)|(\_[A-Z]{2}\_)|(\_[A-Z]{2}\-)|(\-[A-Z]{2}\))|(\([A-Z]{2}\))/g) But that results in 0 and not 1 as it should... I can't find out why, so I'm hoping one of you guys will ;-)"

Try replacing the "(" and ")" you check for with $chr(40) and $chr(41):
//echo -a $regex(blah blah -HI- there,/(\-[A-Z]{2}\-)|(\-[A-Z]{2}\_)|(\_[A-Z]{2}\_)|(\_[A-Z]{2}\-)|(\-[A-Z]{2}\ $+ $chr(41) $+ )|(\ $+ $chr(40) $+ [A-Z]{2}\ $+ $chr(41) $+ )/g)

Now the result is 1. Hope that helps.
Vogon poet
Joined: Oct 2006
Posts: 166
Try this.
alias matchtext {
var %a = 1,%b = -x-._x-.(x-._x-._x_._x).(x-.(x_.(x),%r = $1
while $gettok(%b,%a,46) {
var %c = $ifmatch
echo 4 -a %a $+ . %c - $regex(%c,%r)
inc %a
}
}
/matchtext /(?:(\()[^()]+?(?(1)\))|(?:-|(_))[^-_]+?(?(2)_|-))/g
So you can simply make an identifier like this:
alias matcheq var %r = /(?:(\()[^()]+?(?(1)\))|(?:-|(_))[^-_]+?(?(2)_|-))/g | return $regex($1,%r)
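To see what the conditional groups in that pattern do, here is a rough Python sketch that runs the same expression (without the mIRC /.../g wrapper) over the same test strings as the alias. Python's `re` module also supports the `(?(1)...)` conditional syntax; whether it behaves identically to mIRC's engine in every corner case is an assumption.

```python
import re

# (?(1)\)) means: if group 1 (the opening parenthesis) took part in the match,
# require a closing parenthesis. (?(2)_|-) means: if group 2 (an opening
# underscore) took part, require "_", otherwise require "-".
PAIRED = re.compile(r"(?:(\()[^()]+?(?(1)\))|(?:-|(_))[^-_]+?(?(2)_|-))")

# The same dot-separated samples the alias walks through with $gettok(%b,%a,46).
samples = "-x-._x-.(x-._x-._x_._x).(x-.(x_.(x)".split(".")

results = {sample: bool(PAIRED.search(sample)) for sample in samples}
for sample, matched in results.items():
    print(sample, matched)
```

In this sketch only the samples whose opening and closing characters belong together (-x-, _x_ and (x)) are reported as matches.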
Kind Regards, blink
Vogon poet
Joined: Oct 2006
Posts: 166
It's not that hard to imagine. You could easily make a non-capturing group and collect all your match patterns within it, something like /(?:1st match|2nd match|3rd match)/g
Check the pattern below:
/(?:\-[^-]+?\-|\_[^_]+?\_|\([^()]+?\))/g
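A quick way to try that kind of grouped alternation outside mIRC is the Python sketch below. It uses the same three alternatives with the unnecessary backslashes dropped; the sample text is made up for the illustration.

```python
import re

# One non-capturing group holding three alternatives: -...-, _..._ and (...).
GROUPED = re.compile(r"(?:-[^-]+?-|_[^_]+?_|\([^()]+?\))")

text = "a -HI- b _HO_ c (HA) d -NO_"

# With no capturing groups, findall returns each whole match.
print(GROUPED.findall(text))
```

The trailing -NO_ is not reported, since each alternative insists on the same kind of character at both ends.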
Kind Regards, blink
OP
Ameglian cow
Joined: Aug 2005
Posts: 31
Thanks guys. I have now put the regex pattern into a separate variable and run the script as $regex(%input,%pattern). This seems to do the trick too.

Another small question remains, though. Example:
//var %pattern = /([-_][A-Z]{2}[-_])/g | echo -a $regex(-HI-HO-HA-,%pattern) $regml(1) $regml(2) $regml(3)

How can I get that regex to give me 3 results? It 'should' match the middle -HO- too, but it seems that once a character has been used for the first match, it can't be used again for the next one. Or am I missing something?
Fjord artisan
Joined: Jan 2007
Posts: 259
You could try something with $mid, using $regml().pos. For example: a while loop that runs through the string, removing everything before the occurrence and the occurrence itself as well. Something like:
var %rx = blah -HI--HI--HI- bla bla
while ($regex(%rx,regex)) {
var %mid = $calc($len($regml(1)) + $regml(1).pos + 1)
var %rx = $mid(%rx,%mid)
; save stuff here if needed (i.e. where it occurred, what occurred, etc.) or process it
}
; UNTESTED - you might have to remove the + 1 if it isn't working properly
Should work. Also, you can use $regsub. And if you want it to match strings that share the '-','_' or ')', '(':
var %rx = blah -HI--HI--HI- bla bla
while ($regex(%rx,regex)) {
var %mid = $calc($len($regml(1)) + $regml(1).pos + 1)
var %rx = - $+ $mid(%rx,%mid)
; save stuff here if needed (i.e. where it occurred, what occurred, etc.) or process it
}
However, if you use: -HO_HI-HO)HI) it would return: -HO_ -HI- -HO) -HI)
Last edited by Kardafol; 10/01/07 12:33 AM.
Vogon poet
Joined: Oct 2006
Posts: 166
You must capture the - or _ because you want to check whether the \1 at the end is the same character as the one at the start.
alias matchmid if ($regex($1,/(?<=(-|_))([^-_]+)\1/g) = 3) return $regml(4)
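The underlying reason the original pattern finds only two results is that a match consumes its characters: once -HI- is matched, its closing - is no longer available as the opening - of -HO-. Look-around assertions are zero-width and avoid this. As a hedged illustration in Python (not mIRC, and using a lookahead instead of the lookbehind in the alias), wrapping the whole pattern in a capturing lookahead reports overlapping matches:

```python
import re

# The lookahead (?=...) consumes nothing, so the scan advances one position
# at a time and a delimiter can close one match and open the next.
OVERLAPPING = re.compile(r"(?=([-_][A-Z]{2}[-_]))")

# The plain pattern, for comparison: each match consumes its delimiters.
CONSUMING = re.compile(r"[-_][A-Z]{2}[-_]")

print(CONSUMING.findall("-HI-HO-HA-"))    # ['-HI-', '-HA-']
print(OVERLAPPING.findall("-HI-HO-HA-"))  # ['-HI-', '-HO-', '-HA-']
```

Unlike the alias, this sketch does not require the closing character to be the same as the opening one; a backreference would be needed for that.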
Kind Regards, blink