regex to parse... multiline ? :I #191335 06/12/07 09:01 PM
j0k3rr (OP)
Bowl of petunias
Joined: Dec 2007
Posts: 2
Hi guys, I have to parse something like this from a web page:

Quote:
<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>

<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

Yes, it's a simple table. The problem is that I can't get it all on one line because it's too long. Any ideas? :)
I'm only interested in the bold text. The structure is always the same, but there are other things on the page too, so I first have to pick out only the <tr>..</tr> blocks and then the text inside them.

Is it possible to use regex in this case? Are there other methods?

Thanks :)

Re: regex to parse... multiline ? :I [Re: j0k3rr] #191362 07/12/07 05:23 PM
TropNul
Babel fish
Joined: May 2007
Posts: 89
Personally, I've never been able to make a multiline regexp match anything in mIRC, even though the appropriate modifier exists in the PCRE library. To be more precise, with the /m modifier, ^ and $ match right after and right before each \n inside the string, rather than only at the very start and end of the string. That said, the pattern should in principle be able to match every line passed to the regular expression.

The problem is that mIRC has no method of doing a 'multiline' read. Maybe that can be done with a binary variable; I've never used them, but I'll have to start soon :s if I keep looking for a possible solution.

Nevertheless, you could still use regexps to parse the web page's source code line by line. Just show us the exact piece of source that you wish to parse. There may be other solutions (maybe binary variables ;) ) besides multiline-enabled regexps.

Regards


tropnul
Re: regex to parse... multiline ? :I [Re: TropNul] #191365 07/12/07 05:44 PM
Wims
Hoopy frood
Joined: Jul 2006
Posts: 3,616
Binary variables *work* like global/local variables...
You can add as many lines as you want to one, but I think there is an issue here: when you try to use $regex or an equivalent on the variable's contents, I think mIRC will give an error because the string could be too long. Parsing the page line by line is the best solution, in my opinion.
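
A minimal sketch of the line-by-line approach, assuming the page source has already been saved to a local file. The file name page.html and the alias name parseRows are hypothetical, and the patterns assume at most one link or cell per line, as in the quoted sample.

```mirc
; Sketch only: page.html is a hypothetical local copy of the page source.
alias parseRows {
  var %f = page.html, %i = 1, %n = $lines(%f), %inrow = 0
  while (%i <= %n) {
    ; n = do not evaluate the line, t = treat the first line as plain text
    var %line = $read(%f,nt,%i)
    ; remember whether we are inside a <tr> ... </tr> block
    if ($regex(%line,/<tr>/i)) { var %inrow = 1 }
    if (%inrow) {
      if ($regex(%line,/<a href="([^"]+)">/)) { echo -a link: $regml(1) }
      if ($regex(%line,/<th>([^<]+)<\/th>/)) { echo -a cell: $regml(1) }
    }
    if ($regex(%line,/<\/tr>/i)) { var %inrow = 0 }
    inc %i
  }
}
```

Because each $regex call only ever sees one line, the string stays short and no multiline modifier is needed. $read reopens the file on every call, so for large pages a file handle (/fopen with $fread) would probably be faster.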


Looking for a good help channel about mIRC? Check #mircscripting @ irc.swiftirc.net
Re: regex to parse... multiline ? :I [Re: j0k3rr] #191367 07/12/07 06:15 PM
j0k3rr (OP)
Bowl of petunias
Joined: Dec 2007
Posts: 2
Yeah, thanks, that's what I did ;)
Solved.

Re: regex to parse... multiline ? :I [Re: j0k3rr] #191369 07/12/07 06:42 PM
Mpdreamz
Hoopy frood
Joined: Apr 2004
Posts: 759
You can use the /m modifier, which is actually needed if there's a $lf in the line.

Code:
//var %t = hello you $lf there | echo -a %t | echo -a $regex(%t,/^(.*?)$/) $regml(1) -- $regex(%t,/^(.*?)$/mg) $regml(1) & $regml(2) 


$maybe
Re: regex to parse... multiline ? :I [Re: Mpdreamz] #191654 13/12/07 03:06 PM
TropNul
Babel fish
Joined: May 2007
Posts: 89
Concerning multiline matching...

I've tried this function.

Code:
Alias MultilineRegexp {
  Var %rx.parser = /^(<tr>.*?(<a.href="([^"]+)">[^<]+<\/a>|<th>([^<]+)<\/th>).*?<\/tr>)$/mg
  Var %rx.inf = inrx.txt , %rx.outf = outrx.txt
  Filter -ffg %rx.inf %rx.outf %rx.parser
}


Where inrx.txt contains:

<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

But still, nothing happens. Am I doing something wrong?


tropnul
Re: regex to parse... multiline ? :I [Re: TropNul] #191670 13/12/07 07:48 PM
Mpdreamz
Hoopy frood
Joined: Apr 2004
Posts: 759
/filter still reads the file line by line (splitting on $lf), so the pattern never sees more than one line at a time. The only time /mg gets interesting is when you have a buffer containing $lf and want to match each line within that buffer.

Code:
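alias filt {
  ; create t.txt containing two lines separated by a bare $lf
  .fopen -no t t.txt
  .fwrite t hello $+ $lf $+ you there
  .fclose t
  ; /filter reads the file per line, so @t receives the lines separately
  window @t
  filter -fw t.txt @t *
  ; read the raw bytes back into a binary variable, $lf included
  bread t.txt 0 30 &t
  var -s %x = $bvar(&t,1-).text
  ; with /mg, ^ and $ anchor at every line inside %x
  noop $regex(%x,/^(.*?)$/mg)
  echo -a Multiline matching matched $regml(0) lines: $regml(1) & $regml(2)
  ; without /m, ^ and $ anchor only at the start and end of %x
  noop $regex(%x,/^(.*?)$/)
  echo -a Singleline matching matched $regml(0) lines: $regml(1)
  .remove t.txt
}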


If you don't use /mg on this buffer, ^ and $ will match the start and end of the entire buffer instead of each line within that buffer. However, mIRC doesn't like ^$ without /mg if there's a $lf in the string.
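
The same buffer idea could, in principle, be pushed further to capture a whole <tr>..</tr> block that spans several lines. The following is a rough sketch under stated assumptions: rows.html and rowsFromBuffer are hypothetical names, and the 900-byte read is an assumed conservative size chosen to stay below mIRC's variable length limit, which depends on the version.

```mirc
; Sketch only: rows.html is a hypothetical file; 900 bytes is an assumed safe size.
alias rowsFromBuffer {
  bread rows.html 0 900 &buf
  var %text = $bvar(&buf,1-).text
  ; /s lets . match $lf too, so one match can span several lines; /g collects every row
  var %n = $regex(rows,%text,/<tr>(.*?)<\/tr>/sg)
  var %i = 1
  while (%i <= %n) {
    ; flatten line breaks so each row echoes as a single line
    echo -a row %i $+ : $replace($regml(rows,%i),$cr,$chr(32),$lf,$chr(32))
    inc %i
  }
}
```

Here the /s modifier does the work rather than /m, since the aim is for . to cross line breaks, not for ^ and $ to anchor at each line. A row cut off at the end of the 900 bytes is simply not matched, so a real page would have to be read in chunks, which is one more reason the line-by-line approach is usually simpler.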


$maybe
Re: regex to parse... multiline ? :I [Re: Mpdreamz] #191698 14/12/07 02:11 AM
TropNul
Babel fish
Joined: May 2007
Posts: 89
Thank you very much for this clarification.


tropnul