Sockread, parsing loooong lines.. help :( #162615 20/10/06 06:53 PM
Karas
Hi guys, I have to read and parse a long line from a website (I hate people who don't use \n :S). The line looks something like this: T1[item1a]T2[item1b]T3[item1c]T4[item1d]T1[item2a]T2[item2b]T3[item2c]T4...

Quote:
<tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item1a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item1b]</td><td class="verdana">blah</td><td class="verdana">[item1c]</td><td class="verdana">[item1d]</td></tr><tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item2a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item2b]</td><td class="verdana">blah</td><td class="verdana">[item2c]</td><td class="verdana">[item2d]</td></tr>...

This string is more than 950 characters long. Well, I have this script for now:

Code:
on *:sockread:sada:{
  if ($sockerr) return
  var %t, %re = /regular expression/
  sockread %t
  while ($sockbr) {
    if ($regex(%t, %re)) echo -a >> $regml(1) - $regml(2) - $regml(3) - $regml(4)
    sockread %t
  }
}

Sadly this isn't working like I want: the script runs the regex on a 950-character string only, and not on the whole line, so I'm losing data.

What can I do? :I

[Sorry for the bad English.. I hope you'll understand what my problem is, lol]

Re: Sockread, parsing loooong lines.. help :( #162616 20/10/06 09:49 PM
Riamus2 Hoopy frood Joined: Oct 2004 Posts: 8,061 MA, USA
You will need to store the data in a binary variable. /help binary variables
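
A minimal sketch of that idea, reusing the socket name sada from the script above. It assumes the whole response can be appended to a temporary file and parsed later; the file name sada.tmp and the variable name &chunk are made up for illustration.

```mirc
on *:sockread:sada:{
  if ($sockerr) return
  ; Read raw bytes into a binary variable instead of a %variable,
  ; so the data does not have to fit into a single text line.
  sockread &chunk
  while ($sockbr) {
    ; sada.tmp is a hypothetical file name; -1 -1 appends the whole chunk.
    bwrite sada.tmp -1 -1 &chunk
    sockread &chunk
  }
}
```

Because /bwrite with -1 appends, the file probably needs to be deleted before each new request. Anything the server sends, including the HTTP headers, would end up in the file as well.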

Re: Sockread, parsing loooong lines.. help :( #162617 20/10/06 10:02 PM
Karas
Quote:
You will need to store the data in a binary variable. /help binary variables

Yeah, but then? How can I split the line to retrieve what I'm looking for? :I

Re: Sockread, parsing loooong lines.. help :( #162618 20/10/06 10:34 PM
Riamus2 Hoopy frood Joined: Oct 2004 Posts: 8,061 MA, USA
You should be able to work with the binary variable just like a normal variable. It's just able to be longer than normal. I'll let someone with more experience using them explain how best to do it with sockets. I've always just avoided such sites in my socket scripts.

Re: Sockread, parsing loooong lines.. help :( #162619 21/10/06 10:30 AM
Karas
lol.. I can't work with this variable because it is too long for the other commands :S dunno

Re: Sockread, parsing loooong lines.. help :( #162620 21/10/06 10:53 AM
FiberOPtics Hoopy frood Joined: Feb 2004 Posts: 2,013 Leuven, Belgium
Just read the section on binvars, that's all you need to know really. You'll use $bfind to find the position of a certain string near the part you actually want to retrieve. Then you'll use $bvar(&binvar,%position,N) to retrieve a chunk of the data and use regular tools on it, like $regex or whatever suits your purpose. After this, you'll increase the position, either by issuing another $bfind for the next occurrence of the search string, or by whatever other means.

I recently made something to parse an insanely long HTML page. Here's an example (edited):

Code:
var %pos = 1, %t
bread tmp 0 %size &in
while ($bfind(&in,%pos,id="thread_title_)) {
  %pos = $v1
  %t = $regex(a,$bvar(&in,%pos,930).text,/...../)
  inc %pos $regml(a,5).pos
  %t = $regex(b,$bvar(&in,%pos,930).text,/..../)
  ; do something with $regml(a,N) and $regml(b,N)
}

What I do here is look for the string id="thread_title_ in the binvar. If it is found, $bfind returns its position. I use this position together with the $bvar identifier to take a chunk by issuing $bvar(&in,%pos,930).text, and run a regex on that chunk to fill some $regmls. I know that there are more matches a little further on, but of course they will be outside the 930-character range. So I increment %pos by the position of the last captured match and take another chunk. After that, I know the next match will be very far away, so I use $bfind again and set %pos to its result. Then I again take a chunk, run a regex on it, increase the position, take another chunk, and use $bfind...

How you organise your code will be different for each case. For this HTML I knew that I would always have matches moderately close together, but groups of matches far from each other in the page as a whole. I use $bfind to hop between groups, and once I'm in a group I just take chunks and manually increase the position. It could be different for other HTML, and you may need to use more $bfinds etc., but you should understand the gist of my post.
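
A rough sketch of the same approach applied to the table rows from the first post. It assumes the page has already been saved to a hypothetical file sada.tmp (for example with /bwrite from the sockread event), that every row starts with <tr, and that one row fits within 900 characters. The regex is only a placeholder that captures the first item of a row; the names %file, %re, %n and the regex name row are made up for illustration.

```mirc
on *:sockclose:sada:{
  ; sada.tmp is a hypothetical file that holds the downloaded page.
  var %file = sada.tmp
  if (!$isfile(%file)) return
  ; Load the whole file into the binary variable &in.
  bread %file 0 $file(%file).size &in
  var %pos = 1, %n
  ; Placeholder pattern: captures the link text of the first cell only.
  var %re = /title="[^"]*">([^<]*)<\/a>/
  ; $bfind returns 0 when no further row start is found, which ends the loop.
  while ($bfind(&in,%pos,<tr).text) {
    %pos = $v1
    ; Take a chunk that starts at the row and run the regex on it.
    %n = $regex(row,$bvar(&in,%pos,900).text,%re)
    if (%n) echo -a >> $regml(row,1)
    ; Step past this row start so the next $bfind finds the following row.
    inc %pos
  }
  .remove %file
}
```

Here $bfind alone does the hopping, because each row is its own group. To get all four items, the pattern would need more capture groups, or the chunk could be advanced with $regml(row,N).pos as in the example above.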

Re: Sockread, parsing loooong lines.. help :( #162621 21/10/06 12:51 PM
Karas
thx a lot, it works
