
j0k3rr (OP)
Joined: Dec 2007
Posts: 2
Hi guys, I have to parse something like this from a web page:

Quote:
<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>

<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

Yes, it's a simple table. The problem is that I can't get it all on one line because it's too long, so... any ideas? :)
I'm only interested in the bold text. The structure is always the same, but there are other things on the page too, so first I have to pick out only the <tr>...</tr> blocks and then the text inside them. Waah.

Is it possible to use regex in this case? Any other methods?

Thanks :)
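A minimal sketch of the two-step idea above (first isolate each `<tr>...</tr>` block, then pull out the links and the `<th>` text). It is written in Python only so it can be run anywhere; Python's `re` module behaves like PCRE for the features used here. It assumes the whole page source is already available as one string, which is exactly what mIRC's line-based reads don't give you. The `s` flag (`re.S`) lets `.` cross line breaks so a row can span several lines. `SAMPLE` and `extract_rows` are hypothetical names.

```python
import re

# Hypothetical sample: the table row quoted above, plus unrelated page content.
SAMPLE = """<p>other stuff on the page</p>
<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>
<tr> same thing blahblah </tr>
"""

# Step 1: grab each <tr>...</tr> block; re.S (PCRE's s modifier) lets . match newlines.
ROW_RX = re.compile(r"<tr>(.*?)</tr>", re.S)
# Step 2: pull the interesting pieces out of one row.
HREF_RX = re.compile(r'<a href="([^"]+)">')
CELL_RX = re.compile(r"<th>([^<]+)</th>")


def extract_rows(html):
    """Return a list of (links, cells) tuples, one per <tr> block."""
    rows = []
    for block in ROW_RX.findall(html):
        links = HREF_RX.findall(block)
        # Strip surrounding whitespace from each cell's text.
        cells = [c.strip() for c in CELL_RX.findall(block)]
        rows.append((links, cells))
    return rows


if __name__ == "__main__":
    for links, cells in extract_rows(SAMPLE):
        print(links, cells)
```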

Joined: May 2007
Posts: 89
Personally, I've never been able to make a multiline regexp match anything in mIRC, even though the appropriate modifier exists in the PCRE library. To be more precise, the /m modifier makes ^ and $ match at the start and end of every line, that is, right after and right before each \n character, instead of only at the start and end of the whole string. In principle, then, such a pattern should match every line in the string given to the regular expression.
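For reference, a short Python sketch of what the m modifier changes; Python's `re.M` has the same effect on `^` and `$` as PCRE's /m. Without it, `^` and `$` anchor only at the start and end of the whole string; with it, they also anchor right after and right before each `\n`. The sample text is made up.

```python
import re

text = "first one\nsecond two\nthird three"

# Without the m flag, ^ only matches at the very start of the string.
no_m = re.sub(r"^", "> ", text)
# With re.M (PCRE's m modifier), ^ also matches right after every \n.
with_m = re.sub(r"^", "> ", text, flags=re.M)

# $ behaves the same way at the other end of each line.
last_words_no_m = re.findall(r"(\w+)$", text)
last_words_with_m = re.findall(r"(\w+)$", text, flags=re.M)

if __name__ == "__main__":
    print(repr(no_m))
    print(repr(with_m))
    print(last_words_no_m, last_words_with_m)
```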

The problem is that mIRC has no way to do a 'multiline' read. Maybe it can be done with a binary variable; I've never used them, but I'll probably start soon :s if I keep looking for a solution.

Nevertheless, you could still use regexps to parse the web page's source code line by line. Just show us exactly the piece of source you want to parse. There may be other solutions besides multiline regexps, maybe binary variables ;)

Best regards


tropnul
Joined: Jul 2006
Posts: 4,149
Binary variables *work* like global/local variables...
You can add as many lines as you want to one, but I think there's an issue: when you try to use $regex or an equivalent on its contents, mIRC will probably give an error because the text can be longer than mIRC's maximum string length. Parsing this line by line is the best solution, IMO.
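A minimal sketch of the line-by-line approach, in Python for illustration (in mIRC the loop would go over the file with `$read` or `/filter` instead). It assumes, as in the sample above, that a row opens on a line containing `<tr>` and closes on a line containing `</tr>`. Only the extracted values are carried from one line to the next, so no single string ever has to hold the whole row. `parse_lines` is a hypothetical name.

```python
import re

HREF_RX = re.compile(r'<a href="([^"]+)">')
CELL_RX = re.compile(r"<th>([^<]+)</th>")


def parse_lines(lines):
    """Collect (links, cells) for each <tr>...</tr> block, one line at a time.

    Only the extracted values are kept between lines, never the whole row,
    so each regex only ever sees a single short line.
    """
    rows = []
    in_row = False
    links, cells = [], []
    for line in lines:
        if "<tr>" in line:
            # A new row starts: reset the per-row buffers.
            in_row = True
            links, cells = [], []
        if in_row:
            links.extend(HREF_RX.findall(line))
            cells.extend(c.strip() for c in CELL_RX.findall(line))
        if in_row and "</tr>" in line:
            # Row finished: store what we found and wait for the next <tr>.
            rows.append((links, cells))
            in_row = False
    return rows


if __name__ == "__main__":
    page = [
        "<p>unrelated</p>",
        '<tr><th> <a href="LINK1">blablabla</a>',
        '<a href="LINK2">blablabla</a>',
        '<a href="LINK3">blablabla</a></th>',
        "<th>SOMETHING1</th>",
        "<th>SOMETHING2</th>",
        "</tr>",
    ]
    print(parse_lines(page))
```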


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
j0k3rr (OP)
Joined: Dec 2007
Posts: 2
Yeah, thanks, that's what I did ;)
Solved.

Joined: Apr 2004
Posts: 759
You can use the /m modifier, which is actually needed if there's a $lf in the line:

Code:
//var %t = hello you $lf there | echo -a %t | echo -a $regex(%t,/^(.*?)$/) $regml(1) -- $regex(%t,/^(.*?)$/mg) $regml(1) & $regml(2) 
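The same experiment as the one-liner above, sketched in Python (`re.M` stands in for /m, and `t` mirrors `%t`). The length of each result list corresponds to the number of matches the two `$regex()` calls report.

```python
import re

# Same string as %t above: "hello you", a line feed ($lf), then "there".
t = "hello you \n there"

# No m flag: ^ and $ anchor to the whole string, and . cannot cross the \n,
# so ^(.*?)$ cannot match at all.
single = re.findall(r"^(.*?)$", t)
# With re.M: ^ and $ also anchor around the \n, so each line is one match.
multi = re.findall(r"^(.*?)$", t, flags=re.M)

if __name__ == "__main__":
    print(len(single), single)
    print(len(multi), multi)
```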


$maybe
Joined: May 2007
Posts: 89
Concerning multiline matching...

I've tried this function.

Code:
Alias MultilineRegexp {
  Var %rx.parser = /^(<tr>.*?(<a.href="([^"]+)">[^<]+<\/a>|<th>([^<]+)<\/th>).*?<\/tr>)$/mg
  Var %rx.inf = inrx.txt , %rx.outf = outrx.txt
  Filter -ffg %rx.inf %rx.outf %rx.parser
}


Where inrx.txt contains:

<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

But still, nothing happens. Am I doing something wrong?
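A small Python simulation of why this returns nothing, assuming, as /filter does, that the pattern is tested against each line on its own. No single line holds `<tr>`, a link or `<th>` cell, and `</tr>` all together, so no line matches. Only when the whole text is handed to the regex at once, with both the m and s flags (`re.M | re.S` in Python), can the pattern span the first row. `filter_like` is a hypothetical helper, not a mIRC function.

```python
import re

# tropnul's pattern, without the PCRE /.../mg delimiters (no \/ escaping needed).
PATTERN = r'^(<tr>.*?(<a.href="([^"]+)">[^<]+</a>|<th>([^<]+)</th>).*?</tr>)$'

LINES = [
    '<tr><th> <a href="LINK1">blablabla</a>',
    '<a href="LINK2">blablabla</a>',
    '<a href="LINK3">blablabla</a></th>',
    "<th>SOMETHING1</th>",
    "</tr>",
    "<tr> same thing blahblah </tr>",
]


def filter_like(lines, pattern):
    """Keep the lines that match on their own, the way /filter -g tests each line."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


per_line = filter_like(LINES, PATTERN)

# Whole buffer: m lets ^/$ sit at line boundaries, s lets .*? cross them.
whole = re.search(PATTERN, "\n".join(LINES), flags=re.M | re.S)

if __name__ == "__main__":
    print(per_line)
    print(whole.group(1) if whole else None)
```

Even then, one match captures only the first link (group 3 holds `LINK1`), since a capture group keeps a single value per match.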


tropnul
Joined: Apr 2004
Posts: 759
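/filter still reads one line at a time (splitting on $lf), so the regex is only ever tested against a single line. The only time /mg becomes interesting is when you have a buffer that contains $lf characters and you want to match each line within that buffer.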

Code:
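alias filt { 
  ; write "hello" + $lf + "you there" to a fresh file
  .fopen -no t t.txt
  .fwrite t hello $+ $lf $+ you there
  .fclose t
  ; /filter copies the file into @t line by line, so @t shows two lines
  window @t
  filter -fw t.txt @t *
  ; /bread reads the raw bytes (including the $lf) into the binary variable &t
  bread t.txt 0 30 &t 
  var -s %x = $bvar(&t,1-).text
  ; with /mg, ^ and $ match around the $lf: one match per line
  noop $regex(%x,/^(.*?)$/mg)
  echo -a Multiline matching matched $regml(0) lines: $regml(1) & $regml(2)
  ; without /m, ^ and $ only match at the start and end of the whole buffer
  noop $regex(%x,/^(.*?)$/)
  echo -a Singleline matching matched $regml(0) lines:  $regml(1)
  .remove t.txt 
}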


If you don't use /mg on this buffer, ^ and $ match the start and end of the entire buffer instead of each line within it. And because . does not match $lf, ^(.*?)$ without /m finds no match at all once the string contains a $lf.
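A rough Python counterpart of the alias above, as a sketch: the file is written with a line feed in it, read back as raw bytes (the counterpart of `/bread` into `&t`), decoded, and matched with and without the m flag. The third call adds only the s flag (`re.S`) to show the other half of the point: without m, `^` and `$` anchor to the entire buffer, so the single match is the whole buffer. Using `tempfile` instead of a fixed `t.txt` is this sketch's own choice.

```python
import os
import re
import tempfile

# Write "hello<LF>you there" to a temporary file, like the /fopen + /fwrite lines.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\nyou there")

try:
    # Read the raw bytes back, like /bread into the binary variable &t.
    with open(path, "rb") as f:
        buffer = f.read().decode("ascii")

    # /mg: one match per line in the buffer.
    multiline = re.findall(r"^(.*?)$", buffer, flags=re.M)
    # No m: ^...$ must span the whole buffer, and . stops at the \n, so no match.
    singleline = re.findall(r"^(.*?)$", buffer)
    # No m but with s: . may cross the \n, so the one match is the entire buffer.
    whole_buffer = re.findall(r"^(.*?)$", buffer, flags=re.S)
finally:
    # Clean up, like .remove t.txt
    os.remove(path)

if __name__ == "__main__":
    print(len(multiline), multiline)
    print(len(singleline), singleline)
    print(whole_buffer)
```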


$maybe
Joined: May 2007
Posts: 89
Thank you very much for this clarification.


tropnul
