
#28290 05/06/03 10:23 AM
Joined: May 2003
Posts: 730
ScatMan Offline OP
Hoopy frood
//var %x,%y = $regsub(<blah>a</blah>,/<.*>/g,,%x) | echo -a %x
It doesn't work. Does anyone know what the problem is with this?

#28291 05/06/03 01:03 PM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
The problem is that the .* part tries to match the maximum number of chars possible. So, in
<blah>a</blah>
the .* matches blah>a</blah, i.e. everything between the first < and the last >, so the pattern matches the entire string.

You can start by adding a ? right after the quantifier, i.e. make the pattern like this:
/<.+?>/g
The ? makes the quantifier ungreedy: quantifiers like * and + then consume the minimum (instead of the maximum) number of chars possible. Read more about this in the PCRE manual page. Another way would be this:
/<[^<>]+>/g
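
To compare the three patterns side by side, here is a minimal sketch in Python rather than mIRC script. It assumes Python's re module treats greedy and ungreedy quantifiers the same way PCRE does for these simple patterns; re.sub with an empty replacement stands in for $regsub with the /g modifier.

```python
import re

text = "<blah>a</blah>"

# Greedy: .* runs from the first "<" to the last ">", so everything is removed.
greedy = re.sub(r"<.*>", "", text)

# Ungreedy: .+? stops at the first ">" it can reach, so each tag matches separately.
lazy = re.sub(r"<.+?>", "", text)

# Negated class: [^<>]+ cannot cross a "<" or ">", so each match stays inside one tag.
negated = re.sub(r"<[^<>]+>", "", text)

print(repr(greedy))   # ''
print(repr(lazy))     # 'a'
print(repr(negated))  # 'a'
```

Both alternatives leave only the "a" between the tags, while the original greedy pattern leaves nothing.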

However, removing html tags perfectly is more complicated: tags that span across lines (or their closing tags do), tags with attributes etc. Making a regex to cover all cases is impossible, since you can't use $regex/$regsub for multiline matches. But for strings that contain both tags in one line and do not contain attributes, quotes etc the way I suggested should be enough.
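
The attribute and multiline caveats are easy to reproduce. The sketch below is again Python, under the assumption that its re module matches PCRE for this pattern. Both input strings are made up for illustration: one has a ">" inside a quoted attribute value, the other has an opening tag split across two lines and is processed one line at a time.

```python
import re

TAG = re.compile(r"<[^<>]+>")

# A ">" inside a quoted attribute value ends the match too early.
with_attr = '<a title="x>y">link</a>'
attr_result = TAG.sub("", with_attr)
print(repr(attr_result))  # 'y">link'

# Line-by-line processing: the opening tag is split over two lines,
# so neither of its halves is matched and both survive.
split_tag = '<a\nhref="u">link</a>'
line_results = [TAG.sub("", line) for line in split_tag.split("\n")]
print(line_results)  # ['<a', 'href="u">link']
```

In both cases pieces of the tag leak into the output, which is why the simple pattern is only suitable for plain tags that sit on one line.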


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
#28292 05/06/03 01:10 PM
ScatMan Offline OP
Thanks :)
Another thing, if you can: I don't understand the difference between ? and ??. I read the manual, and it says that
? prefers 1
and ?? prefers 0
but that's not helping, I can't see the difference.
Can you explain it to me?



#28293 05/06/03 06:02 PM
Q
I don't think I can explain it better than man.txt but I'll give it a shot. The first thing you have to understand is that the two ?'s in ?? do not do the same thing. When the "?" character follows a subpattern (which could be (?:abc|d), [a-z], . or whatever), it tells regex to match that subpattern "zero or one time". I.e., $regex(a,/^ab?/) returns 1, because "ab?" matches "a once and b zero or one time". This means it matches either "a" or "ab".
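
As a quick check of "zero or one time", here is a small sketch in Python (an assumption: its re module handles ? the same way PCRE does here). A successful re.search corresponds to $regex() returning 1; the four input strings are just illustrative.

```python
import re

pattern = re.compile(r"^ab?")

def matched(text):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

# "b?" allows the b to be absent or to appear once, never twice.
for text in ("a", "ab", "abb", "b"):
    print(text, "->", matched(text))
```

The input "abb" still matches, but only its first two characters are consumed, and "b" fails because the leading "a" is required.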

But this poses another question: which alternative is tried first against the input string, "a" or "ab"? In other words, does $regex() try to see if "ab" matches the input and, when this fails, THEN try "a"? Or the opposite? The answer is that it depends on whether the second "?" was used. If not, $regex() tries "ab" first and if it doesn't match, it tries "a". This is the default behaviour of quantifiers (such as ?, *, +, {4,} etc): they try to match as much as possible. But if a "?" is appended to the quantifier, it makes the quantifier match as little as possible. This means that $regex() first tries "a" alone, and only if the rest of the pattern cannot match from there does it go back and try "ab". Here's an example to see the difference:

//!.echo -q $regex(ab,/^(ab?)/) | echo -s $regml(1)
This echoes "ab". This means that "ab" was tried first and succeeded.

//!.echo -q $regex(ab,/^(ab??)/) | echo -s $regml(1)
this one echoes "a". This means that $regex() first tried "a" (because of the 2nd "?"), it saw that it matches and stopped, ie didn't try "ab".

//!.echo -q $regex(ab,/^(ab??)$/) | echo -s $regml(1)
this one echoes "ab", even though the 2nd "?" was used. That's because the "^" and "$" around (ab??) tell $regex() that the pattern must match the entire input string. But "a" doesn't match the entire string (which is ab) so $regex() then tries "ab", that matches.
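
For anyone who wants to check the three cases outside mIRC, here is the same experiment as a Python sketch. It assumes Python's re module behaves like PCRE for ? and ??; the helper first_group is a made-up name that plays the role of $regml(1).

```python
import re

def first_group(pattern, text):
    """Return capture group 1, the counterpart of $regml(1), or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(first_group(r"^(ab?)", "ab"))    # ab  (greedy: takes the b)
print(first_group(r"^(ab??)", "ab"))   # a   (ungreedy: leaves the b)
print(first_group(r"^(ab??)$", "ab"))  # ab  (ungreedy, but $ forces the b)
```

The third call shows that ?? only changes which alternative is tried first; the rest of the pattern still decides which one is kept.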


#28294 06/06/03 12:11 AM
ScatMan Offline OP
Hmm, I think I get it, but I'm not sure.
If you can give me a small example that uses ?? (something useful, so I can understand it better), I'd appreciate it.
Thanks.





