|
Joined: Dec 2010
Posts: 26
Ameglian cow
|
OP
I have a script that uses sockets, but I always run into the same problem. For example, given this piece of HTML code:
<span class="title">Just A random text</span>
How can I get the piece of text by using regex or something else?
|
|
|
|
Joined: Jun 2007
Posts: 933
Hoopy frood
Try...
/noop $regex(title,%string,/<span class="title">(.+)<\/span>/ig)
...and then use...
$regml(title,N)
...to retrieve the title, where N is 1, 2, 3, etc. for the first match, second match, etc.
|
|
|
|
Joined: Dec 2010
Posts: 26
Ameglian cow
|
OP
It works. Another question: I need a timer to sockclose my socket, because otherwise it's closed before the information is received. But if I use:
/timer 1 1 /sockclose [name]
there will be something like 700 timers closing the socket. Can someone help me with this problem?
|
|
|
|
Joined: Jul 2007
Posts: 1,129
Hoopy frood
Edit: I'm not sure why 5618 used "title" and the /g modifier when they aren't really needed.
Last edited by Tomao; 24/04/11 06:25 PM.
|
|
|
|
Joined: Jul 2007
Posts: 1,129
Hoopy frood
"Another question: I need a timer to sockclose my socket, because otherwise it's closed before the information is received. But if I use: /timer 1 1 /sockclose [name] there will be something like 700 timers closing the socket. Can someone help me with this problem?"
Under your sockopen event, add
sockwrite -nt $sockname Connection: close
under your "Host:" line. This will prompt the connection to be closed after the info is received.
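A minimal sketch of where that line could go, assuming a hypothetical socket named news and the placeholder host www.example.com (neither comes from the original script):

```mirc
; Hypothetical names: socket "news", host www.example.com, path /
on *:sockopen:news:{
  ; Stop if the connection attempt failed
  if ($sockerr) { return }
  sockwrite -nt $sockname GET / HTTP/1.1
  sockwrite -nt $sockname Host: www.example.com
  ; Ask the server to close the connection once the response is sent
  sockwrite -nt $sockname Connection: close
  ; Blank line that ends the HTTP request headers
  sockwrite -nt $sockname $crlf
}
```

With this header the server should end the connection itself once the response has been sent, so no /timer is needed to call /sockclose.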
|
|
|
|
Joined: Oct 2004
Posts: 8,330
Hoopy frood
As long as you put a /sockclose command at the very end of your socket script and not anywhere else, it should not close before you're done.
There are also $htmlfree identifiers floating around (including on this forum), which strip all of the HTML tags from text, so you don't need a different regex for each tag.
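As a rough idea of what such an identifier does, here is a minimal sketch. It is not the exact version posted on the forum, and it assumes each tag opens and closes within the same piece of text:

```mirc
; Sketch of an $htmlfree-style identifier (an assumption, not the forum's version)
alias htmlfree {
  var %out
  ; Remove every <...> tag; /g repeats the substitution for all tags in the text
  noop $regsub($1-,/<[^>]*>/g,$null,%out)
  return %out
}
```

For example, //echo -a $htmlfree(<span class="title">Just A random text</span>) should echo only the text between the tags.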
Invision Support #Invision on irc.irchighway.net
|
|
|
|
Joined: Sep 2005
Posts: 2,881
Hoopy frood
You need to make your regex ungreedy by changing ".+" to ".+?" like this:
//noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig)
You can see the difference between the two by typing these two commands:
//var %string = <span class="title">abc</span><span class="anotherspan">def</span> | noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig) | echo -a $regml(title,1)
//var %string = <span class="title">abc</span><span class="anotherspan">def</span> | noop $regex(title,%string,/<span class="title">(.+)<\/span>/ig) | echo -a $regml(title,1)
Mine (ungreedy): abc
Yours (greedy): abc</span><span class="anotherspan">def
A greedy regex matches as much data as it possibly can, while an ungreedy regex matches as little data as it possibly can. This is why yours matches up to the final </span> and mine only matches up to the first </span>.
|
|
|
|
Joined: Jul 2007
Posts: 1,129
Hoopy frood
hixxy, if you've already made it ungreedy, why do you keep the /g modifier? I believe you can use the /U modifier as well:
/<span class="title">(.+)<\/span>/iU
|
|
|
|
Joined: Sep 2005
Posts: 2,881
Hoopy frood
I left /g in there so that it will support multiple span tags on one line, but you would need to loop through $regml(title,N)
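A minimal sketch of such a loop, using a made-up test string and a hypothetical alias name (showtitles):

```mirc
; Hypothetical alias name: showtitles. Run it with /showtitles
alias showtitles {
  var %string = <span class="title">abc</span><span class="title">def</span>
  ; /g stores every match; (.+?) stops at the first </span> after each opening tag
  noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig)
  var %i = 1
  ; $regml(title,0) is the number of captured strings
  while (%i <= $regml(title,0)) {
    echo -a Match %i $+ : $regml(title,%i)
    inc %i
  }
}
```

This only finds more than one match when all the span tags are in the same %string.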
|
|
|
|
Joined: Oct 2004
Posts: 8,330
Hoopy frood
To clarify, there are two different "greedy" at work here. Yours, Tomao, would be greedy in that it would take everything until the final matching text. So, as hixxy mentioned, you would capture everything to the last </span>. If you only have one on a line, that's not going to hurt anything, but if there are multiple </span> on a line, you will get unwanted data. You also won't be able to get $regml(2) because there will only ever be one match.
With hixxy's, the greedy means to accept multiple matches. The match only goes to the first </span>. Then, the next match, if there is one, will go to the next one and so on. So you can then use $regml(N) to get multiple matches if there is more than one on the line.
One other note... /g really shouldn't ever be needed in this specific case. Normally, there should never be more than one <title> tag on a page. But it's a good thing to understand because it is very useful for other situations.
|
|
|
|
Joined: Jul 2006
Posts: 4,193
Hoopy frood
You're not clarifying anything, rather the opposite. There are not two different kinds of 'greedy' here; greedy has only one meaning: a greedy regex matches as much data as it possibly can, while an ungreedy regex matches as little data as it possibly can. That's all.
Tomao: /g has nothing to do with greedy, and /U only makes the whole expression ungreedy by default, so that .+? would now represent a greedy search.
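A quick way to see the /U behaviour for yourself, sketched with a hypothetical alias name (utest) and the same test string hixxy used:

```mirc
; Hypothetical alias name: utest. Run it with /utest
alias utest {
  var %string = <span class="title">abc</span><span class="anotherspan">def</span>
  ; Under /U a plain .+ is ungreedy: it stops at the first </span>
  noop $regex(plain,%string,/<span class="title">(.+)<\/span>/iU)
  echo -a U with .+ gives: $regml(plain,1)
  ; Under /U the ? flips .+? back to greedy: it runs to the last </span>
  noop $regex(flipped,%string,/<span class="title">(.+?)<\/span>/iU)
  echo -a U with .+? gives: $regml(flipped,1)
}
```

The first echo should show only abc, while the second should run on to the final </span>, which is the opposite of what the same two patterns do without /U.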
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
|
|
|
|
Joined: Sep 2005
Posts: 2,881
Hoopy frood
Although "greedy" is probably a better word to describe the behaviour of the /g modifier in regex, it actually means "global" - this is probably so as not to confuse it with greedy matching.
We're not matching <title>, we're matching a span tag with the class name of title: <span class="title"></span> and, as far as I know, you can have as many of those as you want on a page and it will still be valid HTML. Correct me if I'm wrong.
|
|
|
|
Joined: Oct 2004
Posts: 8,330
Hoopy frood
@Wims: Ok, /g is global. I've heard it called greedy from enough people that I thought that was the correct name for it. Anyhow, take out the "greedy" from my previous post and it still explains the difference between the two things. There are two "things" going on rather than two kinds of "greedy".
@hixxy: You're right. I was thinking <title> rather than <span...>. It's been a long time since I've done much with HTML and most of that was before <span> was used very much. So I forgot how it was being used. Oops. I don't think a well written site is likely to have more than one on a single line, though. There may be some valid reasons for it, but normally I don't think it would be necessary. Anyhow, /g isn't going to hurt anything whether it is needed or not.
|
|
|
|
Joined: Dec 2010
Posts: 26
Ameglian cow
|
OP
There are 8 matches in total. But instead of $regml(title,2), $regml(title,3), etc. showing the second match, third match, etc., I just get a new $regml(title,1) 8 times. I am now using this code:
//noop $regex(title,%string,/<span class="title">(.+?)<\/span>/ig)
Can anyone help me get the second match as $regml(2), the third match as $regml(3), etc., instead of 8 different values of $regml(1)?
|
|
|
|
Joined: Oct 2004
Posts: 8,330
Hoopy frood
That works fine for what it is meant to do. If you have the following as the %string:
<span class="title">text1</span><span class="title">text2</span><span class="title">text3</span>
and then use that $regex, then:
$regml(title,1) = text1
$regml(title,2) = text2
$regml(title,3) = text3
If you give us your %string that has 8 matches in it, we can tell you if there's an issue.
|
|
|
|
Joined: Dec 2010
Posts: 26
Ameglian cow
|
OP
It's in Dutch, so you won't understand much of the text:
/noop $regex(FCKnudde,%sockreader,/<span class="title">(.+?)<\/span>/ig) | if ($regml(FCKnudde,1) != $null) { /echo -a regml 1: $regml(FCKnudde,1) $+ , regml 2: $regml(FCKnudde2,2) $+ , regml 3: $regml(FCKnudde,3) $+ , regml 4: $regml(FCKnudde,4)
Results in:
regml 1: Ferguson lyrisch over Neuer, regml 2: , regml 3: , regml 4:
regml 1: Ajax laat Dost vallen en wil Matavz, regml 2: , regml 3: , regml 4:
regml 1: Bayern wordt steeds gezelliger, regml 2: , regml 3: , regml 4:
regml 1: Real Madrid verslaat Barcelona, regml 2: , regml 3: , regml 4:
regml 1: Van der Wiel voor 10 miljoen naar Bayern, regml 2: , regml 3: , regml 4:
regml 1: Van den Boog: Gaat verdomme goed met Ajax!, regml 2: , regml 3: , regml 4:
regml 1: Magische aanpak Andries Jonker werkt, regml 2: , regml 3: , regml 4:
regml 1: AZ jat 4e plek van ADO, regml 2: , regml 3: , regml 4:
|
|
|
|
Joined: Jun 2007
Posts: 933
Hoopy frood
That's probably because there are (at least) 8 /sockreads, each of which is run through /noop $regex() separately. So this has nothing to do with $regml(), but rather with how many /sockreads it takes to empty the receive buffer and how the <span class="title">...</span> tags are spread across them.
|
|
|
|
Joined: Dec 2010
Posts: 26
Ameglian cow
|
OP
But can I change that or is that impossible?
|
|
|
|
Joined: Jun 2007
Posts: 933
Hoopy frood
Of course it can be changed. But what *do* you want the script to show? And you'll need to post the entire script.
|
|
|
|
Joined: Oct 2004
Posts: 8,330
Hoopy frood
In addition to your answers to 5618, $regml(title,N) where N is a number will only populate if everything is in a single variable/line. If they are all on separate lines (separate /sockreads), then there is only ever 1 match each time $regex() is called. That means $regml(title,2) and up will be $null.
If you want them all on one line (which may end up making the line too long to display), then you'd want to set a variable and add to it...
sockread #1 sets variable to $regml(title,1)
sockread #2 sets variable to variable, $regml(title,1)
etc. (each /sockread is matched on its own, so the new title is always in $regml(title,1))
In the end, you have a variable with all of the titles listed.
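A rough sketch of that idea, assuming a hypothetical socket named news, a global variable %titles, and at most one title per line:

```mirc
; Hypothetical names: socket "news", global variable %titles
on *:sockread:news:{
  if ($sockerr) { return }
  var %line
  sockread %line
  ; $sockbr is 0 once the receive buffer has no more complete lines
  while ($sockbr) {
    ; Each line is matched separately, so only $regml(title,1) is filled here
    if ($regex(title,%line,/<span class="title">(.+?)<\/span>/i)) {
      if (%titles == $null) { set %titles $regml(title,1) }
      else { set %titles %titles $+ , $regml(title,1) }
    }
    sockread %line
  }
}

on *:sockclose:news:{
  ; The server has closed the connection; show what was collected
  echo -a Titles: %titles
  unset %titles
}
```

It is probably worth running /unset %titles before opening the socket as well, so an interrupted earlier run does not leave old titles in the list.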
|
|
|
|
|