Posted By: j0k3rr regex to parse... multiline ? :I - 06/12/07 09:01 PM
Hi guys, I have to parse something like this from a web page:

Quote:
<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>

<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

Yes, it's a simple table. The problem is that I can't get it all on one line because it's too long, so... any ideas?
I'm only interested in the bold text (the LINK and SOMETHING values). The structure is always the same, but there are other things on the page too, so first I have to pick out only the <tr>...</tr> blocks and then the text inside them. Waah.

Is it possible to use a regex in this case? Are there other methods?

ty
Posted By: TropNul Re: regex to parse... multiline ? :I - 07/12/07 05:23 PM
Personally, I've never been able to make a multiline regexp match anything in mIRC, even though the PCRE library provides the appropriate modifier. To be more precise, with the /m modifier ^ and $ no longer match only at the start and end of the whole string: they also match just after and just before every \n character, i.e. at the start and end of each line. So, in principle, it should match every line given to the regular expression.

The problem is that mIRC has no way of doing a 'multiline' read. Maybe that can be done with a binary variable, but I've never used them. I'll probably start soon :s if I keep looking for a possible solution.

Nevertheless, you could still use regexps to parse the web page's source code 'line by line'. Just show us exactly the piece of source you want to parse. There may be other solutions than multiline-enabled regexps (maybe binary variables).

Regards
Posted By: Wims Re: regex to parse... multiline ? :I - 07/12/07 05:44 PM
Binary variables *work* like global/local variables...
You can add as many lines as you want to one, but I think there's an issue here: when you try to use $regex or an equivalent on the variable's contents, I think mIRC will give an error because the string could exceed mIRC's maximum string length. Parsing this line by line is the best solution imo.
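A minimal sketch of that line-by-line approach, assuming the page source has already been saved to a local file. The alias name parsetable and the file name page.htm are hypothetical, and the two regexes only cover the exact markup from the first post. It remembers whether it is inside a <tr>...</tr> block and pulls the href values and the <th> cell texts out of each line with named $regex calls:

```mirc
; Hypothetical alias: parsetable. Assumes the page source was saved to
; page.htm (hypothetical file name) and uses the markup from the first post.
alias parsetable {
  var %file = page.htm
  var %n = $lines(%file)
  var %i = 1
  ; 1 while we are between <tr> and </tr>, 0 otherwise
  var %intr = 0
  while (%i <= %n) {
    ; n = don't evaluate the line, t = treat line 1 as normal text
    var %line = $read(%file,nt,%i)
    if (<tr> isin %line) var %intr = 1
    if (%intr == 1) {
      ; every href="..." value on this line (LINK1, LINK2, ...)
      var %c = $regex(href,%line,/<a href="([^"]+)">/g)
      var %j = 1
      while (%j <= %c) {
        echo -a link: $regml(href,%j)
        inc %j
      }
      ; every <th>text</th> cell on this line (SOMETHING1, SOMETHING2, ...)
      var %c = $regex(cell,%line,/<th>([^<]+)<\/th>/g)
      var %j = 1
      while (%j <= %c) {
        echo -a cell: $regml(cell,%j)
        inc %j
      }
    }
    if (</tr> isin %line) var %intr = 0
    inc %i
  }
}
```

Run it with /parsetable. Because each line is matched on its own, a value split across two lines would be missed; with the markup shown in the first post every LINK and SOMETHING value sits on a single line, so that doesn't happen.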
Posted By: j0k3rr Re: regex to parse... multiline ? :I - 07/12/07 06:15 PM
Yeah, thx, I did that.
solved
Posted By: Mpdreamz Re: regex to parse... multiline ? :I - 07/12/07 06:42 PM
You can use the /m modifier, which is actually needed if there's a $lf in the line:

Code:
//var %t = hello you $lf there | echo -a %t | echo -a $regex(%t,/^(.*?)$/) $regml(1) -- $regex(%t,/^(.*?)$/mg) $regml(1) & $regml(2) 
Posted By: TropNul Re: regex to parse... multiline ? :I - 13/12/07 03:06 PM
Concerning multiline matching...

I've tried this function.

Code:
Alias MultilineRegexp {
  Var %rx.parser = /^(<tr>.*?(<a.href="([^"]+)">[^<]+<\/a>|<th>([^<]+)<\/th>).*?<\/tr>)$/mg
  Var %rx.inf = inrx.txt , %rx.outf = outrx.txt
  Filter -ffg %rx.inf %rx.outf %rx.parser
}


Where inrx.txt contains:

<tr><th> <a href="LINK1">blablabla</a>
<a href="LINK2">blablabla</a>
<a href="LINK3">blablabla</a></th>
<th>SOMETHING1</th>
<th>SOMETHING2</th>
<th>SOMETHING3</th>
<th>SOMETHING4</th>
<th>SOMETHING5</th>
</tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>
<tr> same thing blahblah </tr>

But still, nothing happens. Am I doing something wrong?
Posted By: Mpdreamz Re: regex to parse... multiline ? :I - 13/12/07 07:48 PM
/filter still reads one line at a time (splitting on $lf), so each line is matched separately. The only time /mg gets interesting is when you have a single buffer that contains $lf characters and you want to match each line within that buffer:

Code:
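alias filt {
  ; write a two-line test file: "hello" $lf "you there"
  .fopen -no t t.txt
  .fwrite t hello $+ $lf $+ you there
  .fclose t
  ; /filter splits the file on $lf, so @t shows two separate lines
  window @t
  filter -fw t.txt @t *
  ; read the raw bytes instead, so both lines end up in one string
  bread t.txt 0 30 &t
  var -s %x = $bvar(&t,1-).text
  ; with /mg, ^ and $ match around each $lf: one match per line
  noop $regex(%x,/^(.*?)$/mg)
  echo -a Multiline matching matched $regml(0) lines: $regml(1) & $regml(2)
  ; without /m, ^(.*?)$ has to span the whole buffer and fails on the $lf
  noop $regex(%x,/^(.*?)$/)
  echo -a Singleline matching matched $regml(0) lines:  $regml(1)
  .remove t.txt
}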


If you don't use /mg on this buffer, ^ and $ match only at the start and end of the entire buffer instead of at the start and end of each line within it. And without /m, ^(.*?)$ simply fails when there's a $lf in the middle of the string, because . doesn't match $lf by default.
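For small files the binary-variable route can also handle TropNul's case. A minimal sketch, assuming inrx.txt is small enough that $bvar(&buf,1-).text fits in a single mIRC string (otherwise you hit the string-length problem Wims mentioned) and assuming your mIRC version accepts PCRE's /s (dotall) modifier. The alias name bufparse is hypothetical. Instead of /m it uses /s, so that . also matches $cr and $lf and one <tr>(.*?)<\/tr> match can span several lines; each row is then searched with separate named $regex calls:

```mirc
; Hypothetical alias: bufparse. Reads all of inrx.txt into one binary
; variable so a single regex can see several lines at once.
; Assumes the file fits in one mIRC string and that /s (dotall) is supported.
alias bufparse {
  var %f = inrx.txt
  if (!$isfile(%f)) { echo -a %f not found | return }
  bread %f 0 $file(%f).size &buf
  var %buf = $bvar(&buf,1-).text
  ; /s lets . match $cr and $lf, so one <tr>...</tr> match may span lines
  var %rows = $regex(row,%buf,/<tr>(.*?)<\/tr>/gs)
  var %i = 1
  while (%i <= %rows) {
    var %row = $regml(row,%i)
    ; pull the href values out of this one row
    var %c = $regex(href,%row,/<a href="([^"]+)">/g)
    var %j = 1
    while (%j <= %c) {
      echo -a row %i link: $regml(href,%j)
      inc %j
    }
    ; pull the <th> cell texts out of this one row
    var %c = $regex(cell,%row,/<th>([^<]+)<\/th>/g)
    var %j = 1
    while (%j <= %c) {
      echo -a row %i cell: $regml(cell,%j)
      inc %j
    }
    inc %i
  }
}
```

Naming each $regex call (row, href, cell) keeps the row matches available through $regml(row,N) while the inner regexes run.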
Posted By: TropNul Re: regex to parse... multiline ? :I - 14/12/07 02:11 AM
Thank you very much for this clarification.
© mIRC Discussion Forums