Sockread: parsing looong lines... help :(

Hi guys, I have to read and parse a long line from a website (I hate people who don't use \n :S). The line looks something like this:
T1[item1a]T2[item1b]T3[item1c]T4[item1d]T1[item2a]T2[item2b]T3[item2c]T4...

Quote:

<tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item1a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item1b]</td><td class="verdana">blah</td><td class="verdana">[item1c]</td><td class="verdana">[item1d]</td></tr><tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item2a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item2b]</td><td class="verdana">blah</td><td class="verdana">[item2c]</td><td class="verdana">[item2d]</td></tr>...

This string is more than 950 characters long.

Here's the script I have so far:

Code:

on *:sockread:sada:{
  if ($sockerr) return
  var %t, %re = /regular expression/  
  sockread %t
  while ($sockbr) {
    if ($regex(%t, %re)) echo -a >> $regml(1) - $regml(2) - $regml(3) - $regml(4)
    sockread %t
  }
}

Sadly, this doesn't work the way I want: the script only runs the regex on the first 950 characters, not on the whole line, so I'm losing data.

What can I do? :I

[Sorry for the bad English, I hope you understand my problem, lol]

You will need to store the data in a binary variable.

/help binary variables
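A minimal sketch of what that could look like, assuming the socket is still named sada and the page is fetched with a plain HTTP/1.0 GET. The alias name sada.get, the host www.example.com and the file sada.tmp are hypothetical. Binary variables only live until the current event finishes, so each chunk read in the sockread event is appended to a file, and the whole page is loaded back into one binvar once the server closes the connection.

```mirc
; Hypothetical names: alias sada.get, host www.example.com, file sada.tmp
alias sada.get {
  ; Start fresh: drop data left over from an earlier download
  if ($isfile(sada.tmp)) .remove sada.tmp
  if ($sock(sada)) sockclose sada
  sockopen sada www.example.com 80
}

on *:sockopen:sada:{
  if ($sockerr) return
  sockwrite -n $sockname GET / HTTP/1.0
  ; The extra $crlf ends the request headers with a blank line
  sockwrite -n $sockname Host: www.example.com $+ $crlf
}

on *:sockread:sada:{
  if ($sockerr) return
  ; Read raw bytes into a binvar, so no line-length limit applies
  sockread &chunk
  while ($sockbr) {
    ; Append the bytes to the file (S = -1: append, N = -1: all bytes)
    bwrite sada.tmp -1 -1 &chunk
    sockread &chunk
  }
}

on *:sockclose:sada:{
  ; The whole response (HTTP headers included) is now on disk
  bread sada.tmp 0 $file(sada.tmp).size &in
  echo -a Received $bvar(&in,0) bytes
}
```

Type /sada.get to start. The saved data includes the HTTP response headers, so any parsing should skip or ignore them.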

Yeah, but then what?
How can I split the line to retrieve what I'm looking for? :I

You should be able to work with the binary variable just like a normal variable; it can just hold more data. I'll let someone with more experience using them explain how best to do it with sockets. I've always just avoided such sites in my socket scripts.

lol, I can't work with this variable because it's too long for the other commands :S

dunno

Just read the section on binvars; that's really all you need to know.

You'll use $bfind to find the position of a certain string near the part you actually want to retrieve. Then you'll use $bvar(&binvar,%position,N).text to take a chunk of the data starting there as text, and use the usual tools on it, like $regex or whatever suits your purpose. After this, you advance the position, either with another $bfind for the next occurrence of the search string or by some other means.
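To see what these identifiers actually return, here is a tiny hypothetical alias (bindemo) that builds a small binvar by hand with /bset -t instead of reading it from a socket. It finds a string, takes a bounded chunk as text, then searches again from just past the first hit.

```mirc
; Hypothetical alias bindemo: shows what $bfind and $bvar().text return
alias bindemo {
  ; Build a small binvar by hand instead of downloading a page
  bset -t &in 1 xx<td>A</td><td>B</td>
  ; $bfind returns the 1-based byte position of the first match
  var %pos = $bfind(&in,1,<td>).text
  echo -a %pos
  ; $bvar(&binvar,N,M).text returns M bytes from position N as plain text
  echo -a $bvar(&in,%pos,10).text
  ; Search again from just past the previous hit to reach the next one
  echo -a $bfind(&in,$calc(%pos + 1),<td>).text
}
```

Running /bindemo should echo 3, then <td>A</td>, then 13. Without the .text property, $bvar returns the bytes as space-separated ASCII values instead of text.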

I recently wrote something to parse an insanely long HTML page; here's an example (edited):

Code:

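var %pos = 1, %t, %size = $file(tmp).size
; load the whole saved page (file "tmp") into one binvar
bread tmp 0 %size &in
; hop to the next group of matches
while ($bfind(&in,%pos,id="thread_title_)) {
  %pos = $v1
  ; run the first regex on a 930-char chunk starting at the match
  %t = $regex(a,$bvar(&in,%pos,930).text,/...../)
  ; move up to the last captured match and take the next chunk
  inc %pos $regml(a,5).pos
  %t = $regex(b,$bvar(&in,%pos,930).text,/..../)
  ; do something with $regml(a,N) and $regml(b,N)
}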

What I do here is look for the string id="thread_title_ in the binvar. If it is found, $bfind returns its position. I use this position with $bvar to take a chunk, $bvar(&in,%pos,930).text, and run a regex on it to fill some $regml values. I know there are more matches a little further on, but they fall outside the 930-character range. So I increase %pos by the position of the last captured match and take another chunk.

After that, I know the next match will be far away, so I use $bfind again and set %pos to its result. Then I take a chunk again, run the regex on it, increase the position, take another chunk, and use $bfind...

How you organise your code will be different in each case. For this HTML I knew the matches within a group would always be fairly close together, but the groups themselves would be far from each other in the page. I use $bfind to hop between groups, and once I'm in a group I just take chunks and increase the position manually. Other HTML may need a different approach, for example more $bfind calls, but this should give you the gist.
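Applied to the rows from the first post, the same hop-and-chunk idea might look like the sketch below. It assumes the page has already been saved to a file named sada.tmp (hypothetical, as is the alias name sada.parse), that every row has the six <td class="verdana"> cells shown in the sample, and that one row is short enough to fit in a single $bvar(...).text call. $bfind hops to each <tr bgcolor= and to the following </tr>, and the regex runs once per row with the /g modifier, so $regml(row,1) to $regml(row,6) hold the six cells in order.

```mirc
; Hypothetical alias sada.parse: parses the sample rows out of the saved page.
; Assumes every row has the sample's six <td class="verdana"> cells.
alias sada.parse {
  bread sada.tmp 0 $file(sada.tmp).size &in
  ; One capture per cell: its text, skipping an optional <a ...> wrapper
  var %pos = 1, %end, %row, %n = 0, %re = /<td class="verdana">(?:<a[^>]*>)?([^<]*)/g
  ; Hop from row to row with $bfind
  while ($bfind(&in,%pos,<tr bgcolor=).text) {
    %pos = $v1
    %end = $bfind(&in,%pos,</tr>).text
    ; No closing tag: the page was cut off, so stop
    if (!%end) break
    ; One row is a few hundred bytes, small enough for $bvar().text
    %row = $bvar(&in,%pos,$calc(%end - %pos)).text
    ; Cells 1, 3, 5 and 6 hold the items; cells 2 and 4 are "#" and "blah"
    if ($regex(row,%row,%re) >= 6) {
      echo -a >> $regml(row,1) - $regml(row,3) - $regml(row,5) - $regml(row,6)
      inc %n
    }
    ; Continue searching after this row
    %pos = %end
  }
  return %n
}
```

Called as $sada.parse, it echoes one line per row and returns the number of rows it matched. Rows without six matching cells are skipped rather than parsed.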

Thx a lot, it works!