Joined: Jul 2005
Posts: 40
K
Karas Offline OP
Ameglian cow
Hi guys, I have to read and parse a long line from a website (I hate people who don't use \n :S). The line looks something like this:
T1[item1a]T2[item1b]T3[item1c]T4[item1d]T1[item2a]T2[item2b]T3[item2c]T4...

Quote:
<tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item1a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item1b]</td><td class="verdana">blah</td><td class="verdana">[item1c]</td><td class="verdana">[item1d]</td></tr><tr bgcolor="#ff0000"><td class="verdana"><a href="?link" title="blahblah">[item2a]</a></td><td class="verdana"><a href="?link2">#</a></td><td class="verdana">[item2b]</td><td class="verdana">blah</td><td class="verdana">[item2c]</td><td class="verdana">[item2d]</td></tr>...


This string is more than 950 characters long.

Here is my script so far:
Code:
on *:sockread:sada:{
  if ($sockerr) return
  var %t, %re = /regular expression/  
  sockread %t
  while ($sockbr) {
    if ($regex(%t, %re)) echo -a >> $regml(1) - $regml(2) - $regml(3) - $regml(4)
    sockread %t
  }
}


Sadly, this isn't working the way I want. The script runs the regex only on the first 950 characters and not on the whole line, so I'm losing data :(

What can I do? :I

[Sorry for the bad English. I hope you can understand what my problem is, lol]


j0k3r @ k4s.ch
Joined: Oct 2004
Posts: 8,330
Hoopy frood
You will need to store the data in a binary variable.

/help binary variables
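
A minimal sketch of what that could look like for the sockread event in the first post, assuming the socket name sada from that script and a hypothetical temp file named tmp. Each chunk is read into a binary variable and appended to the file, so nothing is cut off at the %variable length limit:

```mirc
on *:sockread:sada:{
  if ($sockerr) return
  ; read raw bytes into a binary variable instead of a %variable
  sockread &chunk
  while ($sockbr) {
    ; -1 -1 = append the whole binvar to the end of the file
    ; "tmp" is a hypothetical file name
    bwrite tmp -1 -1 &chunk
    sockread &chunk
  }
}
```

Note that the file will also contain the HTTP headers, and it should be deleted before each new request. Parsing can then start once the socket has closed, for example from an on sockclose event.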


Invision Support
#Invision on irc.irchighway.net
Joined: Jul 2005
Posts: 40
K
Karas Offline OP
Ameglian cow
Quote:
You will need to store the data in a binary variable.

/help binary variables


Yeah, but then what?
How can I split the line to retrieve what I'm looking for? :I


j0k3r @ k4s.ch
Joined: Oct 2004
Posts: 8,330
Hoopy frood
You should be able to work with the binary variable much like a normal variable; it can just be longer than normal. I'll let someone with more experience using them explain how best to do it with sockets. I've always just avoided such sites in my socket scripts. :D


Invision Support
#Invision on irc.irchighway.net
Joined: Jul 2005
Posts: 40
K
Karas Offline OP
Ameglian cow
lol.. I can't work with this variable because it is too long for the other commands :S

dunno :/


j0k3r @ k4s.ch
Joined: Feb 2004
Posts: 2,019
Hoopy frood
Just read the section on binvars, that's all you need to know really.

You'll use $bfind to find the position of a certain string near the part you actually want to retrieve. Then you'll use $bvar(&binvar,%position,N).text to retrieve a chunk of N bytes from that position and use regular tools on it, like $regex or whatever suits your purpose. After this, you'll increase the position, either by issuing another $bfind for the next occurrence of the search string, or by whatever other means.

I've recently made something to parse an insanely long HTML page; here's an example (edited):

Code:
var %pos = 1, %t
bread tmp 0 %size &in
while ($bfind(&in,%pos,id="thread_title_)) {
  %pos = $v1
  %t = $regex(a,$bvar(&in,%pos,930).text,/...../)
  inc %pos $regml(a,5).pos
  %t = $regex(b,$bvar(&in,%pos,930).text,/..../)
  ; do something with $regml(a,N) and $regml(b,N)
}

What I do here is look for the string id="thread_title_ in the binvar. If it is found, $bfind returns its position. I use this position with the $bvar identifier to take a chunk, $bvar(&in,%pos,930).text, and run a regex on it to fill some $regmls. I know there are more matches a little further on, but of course they will be outside the 930-character range. So I increment %pos by the position of the last captured match and take another chunk.

After that, I know the next match will be very far, so I use $bfind again and set the %pos to that. Then again take a chunk, do regex on it, increase the position, take another chunk, and use $bfind...

How you organise your code will be different for each case, I knew for this html that I would always have matches moderately close together, but groups of matches far from each other in the entire html. I use $bfind to hop in between groups, and once I'm in a group I just take chunks and manually increase the position. It could be different for other html, you may need to use more $bfinds etc. but you should understand the gist of my post.
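
Applied to the table rows from the first post, the same idea might be sketched as below. The alias name parse_rows, the file name tmp, the 900-byte chunk size and the regex are all assumptions for illustration. The regex only captures the text of the first link in each row and would need extending for the other cells.

```mirc
alias parse_rows {
  ; hypothetical alias: assumes the page was already saved to the file "tmp"
  bread tmp 0 $file(tmp).size &in
  var %pos = 1, %len, %chunk
  ; illustrative pattern: text of the first <a ...>...</a> in the chunk
  var %re = /<a\s[^>]*>([^<]*)<\/a>/
  ; .text forces $bfind to treat the search value as text
  while ($bfind(&in,%pos,<tr).text) {
    %pos = $v1
    ; take at most 900 bytes, without reading past the end of the binvar
    %len = $calc($bvar(&in,0) - %pos + 1)
    if (%len > 900) %len = 900
    %chunk = $bvar(&in,%pos,%len).text
    if ($regex(row,%chunk,%re)) echo -a >> $regml(row,1)
    ; step past this "<tr" so the next $bfind finds the following row
    inc %pos
  }
}
```

Since each row here starts with the same tag, one $bfind per row is enough. The extra "take another chunk" step is only needed when a single row is longer than the chunk size.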


Gone.
Joined: Jul 2005
Posts: 40
K
Karas Offline OP
Ameglian cow
thx a lot, it works ;)


j0k3r @ k4s.ch
