Page 1 of 3 1 2 3
#231572 24/04/11 02:28 PM
Joined: Dec 2010
Posts: 26
L
LMN Offline OP
Ameglian cow
I have a script that uses sockets, but I always run into the same trouble. For example, take this piece of HTML:
<span class="title">Just A random text</span>
How can I get the piece of text by using regex or something else?

Joined: Jun 2007
Posts: 933
5
Hoopy frood
try...
Code:
/noop $regex(title,%string,/<span class="title">(.+)<\/span>/ig)

...and then using...
Code:
$regml(title,N)

...to retrieve the title, where N is 1, 2, 3, etc. for the first match, second match, etc.

Joined: Dec 2010
Posts: 26
L
LMN Offline OP
Ameglian cow
It works :)

Another question:
I need a timer to sockclose my socket, because otherwise it's closed before the information is received. But if I use:
Code:
/timer 1 1 /sockclose [name]

there will be something like 700 timers trying to close the socket. Can someone help me with this problem?

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Edit: I'm not sure why 5618 used "title" and the /g modifier when they aren't really needed.

Last edited by Tomao; 24/04/11 06:25 PM.
Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Originally Posted By: LMN
Another question:
I need a timer to sockclose my socket, because otherwise it's closed before the information is received. But if I use:
Code:
/timer 1 1 /sockclose [name]

there will be something like 700 timers trying to close the socket. Can someone help me with this problem?
Under your sockopen event, add
Code:
sockwrite -nt $sockname Connection: close
below your "Host:" line. This asks the server to close the connection once the information has been sent, so you don't need a timer.
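To make the header layout concrete outside mIRC, here is a minimal sketch in Python of what such a request looks like on the wire. The host `example.com`, the path `/news`, and the use of HTTP/1.1 are assumptions for illustration only; the thread does not say which site or protocol version the script talks to.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal GET request that asks the server to close when done."""
    header_lines = [
        f"GET {path} HTTP/1.1",   # request line (HTTP/1.1 is an assumption)
        f"Host: {host}",          # the "Host:" line mentioned above
        "Connection: close",      # placed directly below "Host:"
        "",                       # blank line that ends the headers
        "",
    ]
    return "\r\n".join(header_lines).encode("ascii")


# Hypothetical host and path, used only to show the resulting text.
request = build_request("example.com", "/news")
print(request.decode("ascii"))
```

With this header the server is expected to close the connection itself after the response, so the client can react to the close instead of guessing a delay.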

Joined: Oct 2004
Posts: 8,330
Hoopy frood
As long as you put a /sockclose command at the very end of your socket script and not anywhere else, it should not close before you're done.

There are also $htmlfree identifiers floating around (including on this forum) that strip all of the HTML tags from a piece of text, so you don't need a different regex for each tag.


Invision Support
#Invision on irc.irchighway.net
Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
You need to make your regex ungreedy by changing ".+" to ".+?" like this:

Code:
//noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig)


You can see the difference between the two by typing these two commands:

Code:
//var %string = <span class="title">abc</span><span class="anotherspan">def</span> | noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig) | echo -a $regml(title,1)
//var %string = <span class="title">abc</span><span class="anotherspan">def</span> | noop $regex(title,%string,/<span class="title">(.+)<\/span>/ig) | echo -a $regml(title,1)


Mine (ungreedy): abc
Yours (greedy): abc</span><span class="anotherspan">def

A greedy regex matches as much data as it possibly can, while an ungreedy regex matches as little data as it possibly can. This is why yours matches up to the final </span> and mine only matches up to the first </span> :)
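For anyone who wants to check the same behaviour outside mIRC, here is a minimal sketch in Python, whose `re` module uses the same `.+` and `.+?` quantifiers. It reuses the test string from the two commands above; the variable names are just for illustration.

```python
import re

html = '<span class="title">abc</span><span class="anotherspan">def</span>'

# Greedy: (.+) grabs as much as it can, so it runs to the LAST </span>.
greedy = re.search(r'<span class="title">(.+)</span>', html, re.IGNORECASE)

# Ungreedy: (.+?) grabs as little as it can, so it stops at the FIRST </span>.
lazy = re.search(r'<span class="title">(.+?)</span>', html, re.IGNORECASE)

print("greedy:", greedy.group(1))
print("lazy:  ", lazy.group(1))
```

The greedy capture contains the closing tag and the whole second span, while the ungreedy one contains only `abc`.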

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
hixxy, if you already make it un-greedy, why do you keep the /g modifier? I believe you can use the /U modifier as well:
Code:
/<span class="title">(.+)<\/span>/iU

Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
I left /g in there so that it will support multiple span tags on one line, but you would need to loop through $regml(title,N)
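The idea of looping over every match can be sketched outside mIRC as well. The Python below is only an analogy: `findall` plays the role of /g plus the loop over $regml(title,N), and the sample line with three title spans is made up for the example.

```python
import re

TITLE_RE = re.compile(r'<span class="title">(.+?)</span>', re.IGNORECASE)

# Made-up line with several title spans on it.
line = ('<span class="title">text1</span>'
        '<span class="title">text2</span>'
        '<span class="title">text3</span>')

# Without a global search only the first match is available.
first_only = TITLE_RE.search(line).group(1)

# A global search returns every match, in order.
all_titles = TITLE_RE.findall(line)

for n, title in enumerate(all_titles, start=1):
    print(n, title)
```

Here `first_only` is just the first title, while `all_titles` holds all three, which is the difference /g makes when several spans share a line.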

Joined: Oct 2004
Posts: 8,330
Hoopy frood
To clarify, there are two different "greedy" at work here. Yours, Tomao, would be greedy in that it would take everything until the final matching text. So, as hixxy mentioned, you would capture everything to the last </span>. If you only have one on a line, that's not going to hurt anything, but if there are multiple </span> on a line, you will get unwanted data. You also won't be able to get $regml(2) because there will only ever be one match.

With hixxy's, the greedy means to accept multiple matches. The match only goes to the first </span>. Then, the next match, if there is one, will go to the next one and so on. So you can then use $regml(N) to get multiple matches if there is more than one on the line.

One other note... /g really shouldn't ever be needed in this specific case. Normally, there should never be more than one <title> tag on a page. But it's a good thing to understand because it is very useful for other situations.


Invision Support
#Invision on irc.irchighway.net
Joined: Jul 2006
Posts: 4,193
W
Hoopy frood
You're not clarifying anything, rather the opposite.
There are not two different 'greedy' here, greedy has only one meaning:
Quote:
A greedy regex matches as much data as it possibly can, while an ungreedy regex matches as little data as it possibly can.
that's all.

Tomao: /g has nothing to do with greedy, and /U only makes the whole expression ungreedy by default, so that .+? would now represent a greedy search.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Although "greedy" is probably a better word to describe the behaviour of the /g modifier in regex, it actually means "global" - this is probably so as not to confuse it with greedy matching.

We're not matching <title>, we're matching a span tag with the class name of title: <span class="title"></span> and, as far as I know, you can have as many of those as you want on a page and it will still be valid HTML. Correct me if I'm wrong :)

Joined: Oct 2004
Posts: 8,330
Hoopy frood
@Wims: Ok, /g is global. I've heard it called greedy from enough people that I thought that was the correct name for it. Anyhow, take out the "greedy" from my previous post and it still explains the difference between the two things. There are two "things" going on rather than two kinds of "greedy".

@hixxy: You're right. I was thinking <title> rather than <span...>. It's been a long time since I've done much with HTML and most of that was before <span> was used very much. So I forgot how it was being used. Oops.

I don't think a well-written site is likely to have more than one on a single line, though. There may be some valid reasons for it, but normally I don't think it would be necessary. Anyhow, /g isn't going to hurt anything whether it is needed or not. :)


Invision Support
#Invision on irc.irchighway.net
Joined: Dec 2010
Posts: 26
L
LMN Offline OP
Ameglian cow
There are 8 matches in total. But instead of $regml(title,2), $regml(title,3), etc. returning the second match, the third match, and so on, I just get a new $regml(title,1) eight times.
I am now using the
Code:
//noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig)
code.
Can anyone help me get the second match as $regml(2), the third match as $regml(3), etc., instead of 8 different values of $regml(1)?

Joined: Oct 2004
Posts: 8,330
Hoopy frood
That works fine for what it is meant to do. If you have the following as the %string-

Code:
<span class="title">text1</span><span class="title">text2</span><span class="title">text3</span>


And then use that $regex, then:

$regml(title,1) = text1
$regml(title,2) = text2
$regml(title,3) = text3

If you give us your %string that has 8 matches in it, we can tell you if there's an issue.


Invision Support
#Invision on irc.irchighway.net
Joined: Dec 2010
Posts: 26
L
LMN Offline OP
Ameglian cow
It's in Dutch, so you won't understand much of the text:
Code:
 /noop $regex(FCKnudde,%sockreader,/<span class="title">(.+?)<\/span>/ig) | if ($regml(FCKnudde,1) != $null) { /echo -a regml 1: $regml(FCKnudde,1) $+ , regml 2: $regml(FCKnudde,2) $+ , regml 3: $regml(FCKnudde,3) $+ , regml 4: $regml(FCKnudde,4) }

Results in:
Code:
regml 1: Ferguson lyrisch over Neuer, regml 2: , regml 3: , regml 4:
regml 1: Ajax laat Dost vallen en wil Matavz, regml 2: , regml 3: , regml 4:
regml 1: Bayern wordt steeds gezelliger, regml 2: , regml 3: , regml 4:
regml 1: Real Madrid verslaat Barcelona, regml 2: , regml 3: , regml 4:
regml 1: Van der Wiel voor 10 miljoen naar Bayern, regml 2: , regml 3: , regml 4:
regml 1: Van den Boog: Gaat verdomme goed met Ajax!, regml 2: , regml 3: , regml 4:
regml 1: Magische aanpak Andries Jonker werkt, regml 2: , regml 3: , regml 4:
regml 1: AZ jat 4e plek van ADO, regml 2: , regml 3: , regml 4:

Joined: Jun 2007
Posts: 933
5
Hoopy frood
That's probably because there are (at least) 8 /sockreads, each of which is run through /noop $regex() separately.
So this has nothing to do with $regml(). It depends on how many /sockreads happen before the receive buffer is empty, and on which of those lines contain <span class="title">...</span>.

Joined: Dec 2010
Posts: 26
L
LMN Offline OP
Ameglian cow
But can I change that or is that impossible?

Joined: Jun 2007
Posts: 933
5
Hoopy frood
Of course it can be changed. But what *do* you want the script to show? And you'll need to post the entire script.

Joined: Oct 2004
Posts: 8,330
Hoopy frood
In addition to your answers to 5618, $regml(title,N) where N is a number will only populate if everything is in a single variable/line. If they are all on separate lines (separate /sockreads), then there is only ever 1 match each time $regex() is called. That means $regml(title,2) and up will be $null.

If you want them all on one line (which may end up making the line too long to display), then you'd want to set a variable and add to it. Because each call to $regex() only produces one match, the new title is always $regml(title,1):

sockread #1 sets variable to $regml(title,1)
sockread #2 sets variable to variable, $regml(title,1)
etc.

In the end, you have a variable with all of the titles listed.
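The accumulate-as-you-go idea can be sketched outside mIRC too. The Python below is a rough analogy rather than a socket script: each string in `received` stands in for one /sockread, and the `<div>` and `<p>` markup around the titles is invented for the example (only the two headlines come from the output posted earlier).

```python
import re

TITLE_RE = re.compile(r'<span class="title">(.+?)</span>', re.IGNORECASE)


def collect_titles(lines):
    """Run the regex on each line separately and keep every title found."""
    titles = []
    for line in lines:
        # Each line is matched on its own, like one $regex() call per /sockread.
        titles.extend(TITLE_RE.findall(line))
    return titles


# Stand-ins for the lines read from the socket (surrounding markup is hypothetical).
received = [
    '<div><span class="title">Ferguson lyrisch over Neuer</span></div>',
    '<p>no title on this line</p>',
    '<div><span class="title">AZ jat 4e plek van ADO</span></div>',
]

print(", ".join(collect_titles(received)))
```

No single line yields more than one match here, which mirrors why $regml(title,2) stays empty, yet the accumulated list still ends up with every title.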


Invision Support
#Invision on irc.irchighway.net