parsing #170686 12/02/07 09:51 PM
Joined: Oct 2005
Posts: 827
P
pouncer Offline OP
Hoopy frood
This is the HTML in the page I want to grab info from:

td class=StyleTahoma><a href='chatroom.php?rm=helpdesk'>helpdesk</a><br><a href='chatroom.php?rm=irchat'>ircchat</a><br><a href='chatroom.php?rm=lebs'>lebs</a><br><a href='chatroom.php?rm=domcat'>domcat</a><br></td></tr></table>

It tells me I'm in 4 rooms: helpdesk, irchat, lebs, domcat.

Could someone show me how I can extract this info from the string?
Thanks.

Re: parsing [Re: pouncer] #170689 12/02/07 10:01 PM
Joined: Apr 2006
Posts: 400
K
Kurdish_Assass1n Offline
Fjord artisan
See /help $remove. I would do it for you, but I don't know whether the HTML always looks exactly like the code above, character for character. I'd bet regex would be able to do it, but I don't know how to use it.

Last edited by Kurdish_Assass1n; 12/02/07 10:01 PM.

-Kurdish_Assass1n
Re: parsing [Re: Kurdish_Assass1n] #170691 12/02/07 11:02 PM
Joined: Oct 2005
Posts: 827
P
pouncer Offline OP
Hoopy frood
Yeah, I thought of regex too, but I'm not sure how to do it myself.

Hope there is someone else out there who can help.

Re: parsing [Re: pouncer] #170693 12/02/07 11:19 PM
Joined: Oct 2004
Posts: 8,327
Riamus2 Offline
Hoopy frood
You can use the $htmlfree alias. There are a few different ones out there, but this is the first one I found by searching the forum:

Code:
alias -l htmlfree {
  var %x, %i = $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,$chr(32),%x), %x = $remove(%x,&nbsp;)
  return %x
}


Then, you use $htmlfree(%variable) and it will remove all HTML coding. Note that the $htmlfree alias works from within the script it's included with and not anywhere else. If you want it to work from the command line or in other scripts, then remove the -l from the first line.

That's a slightly modified version that puts a space where each tag was, so the words stay separated. Normally this sort of alias replaces the tags with $null instead of $chr(32).
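
As a rough sketch of how that could be applied to the string from the first post: once the tags are gone, the room names are just space-separated tokens, so $numtok and $gettok can walk through them. The alias name listrooms is made up for illustration, and it assumes it sits in the same script file as htmlfree (because of the -l).

Code:
alias listrooms {
  ; Hypothetical helper: strip the tags, leaving the room names separated by spaces
  var %clean = $htmlfree($1-)
  var %i = 1
  echo -a In $numtok(%clean,32) rooms:
  ; Token separator 32 is the space character
  while (%i <= $numtok(%clean,32)) {
    echo -a Room %i $+ : $gettok(%clean,%i,32)
    inc %i
  }
}

Call it as /listrooms followed by the HTML line. A room name that itself contains a space would be split into two tokens, so this only suits names like the ones shown.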


Invision Support
#Invision on irc.irchighway.net
Re: parsing [Re: pouncer] #170769 14/02/07 05:37 AM
Joined: Mar 2006
Posts: 8
A
astigmatik Offline
Nutrimatic drinks dispenser
DISCLAIMER: I have no mirc here, and I'm not a regex expert, but this might just work:

Code:
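alias extract_channels {
  ; Capture only the text inside each <a ...>...</a> link; /g collects every match
  var %r = /<a[^>]*>([^<]*)<\/a>/g
  noop $regex($1-,%r)
  ; Declare %result locally so it does not end up as a global variable
  var %x = 1, %result
  while (%x <= $regml(0)) {
    ; 44 is the comma, so the names come back comma-separated
    %result = $addtok(%result,$regml(%x),44)
    inc %x
  }
  return %result
}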

Re: parsing [Re: pouncer] #170774 14/02/07 10:17 AM
Joined: Oct 2006
Posts: 166
B
b1ink Offline
Vogon poet
You can use the htmlstrip thingy and a parser. Something like this:
Code:
alias htmlchans {
  var %a = $1
  noop $regsub(%a,/(?:^[^<]*?>|<[^>]*?>|<[^>]*$)/g,.,%a)
  noop $regsub(%a,/(?:^\.+|(?<=\.)\.+|\.+$)/g,$null,%a)
  return $iif($2,$gettok(%a,$2,46),$numtok(%a,46) - %a)
}

$htmlchans(string[,N])
N refers to the Nth channel (it's optional).
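
A quick way to try it, assuming %html is a variable that already holds the line from the page (both %html and the alias name testchans are hypothetical):

Code:
alias testchans {
  ; %html must already hold the raw HTML line, set wherever you read the page
  ; Without N: the number of rooms, then the dot-separated names
  echo -a $htmlchans(%html)
  ; With N: only the Nth name
  echo -a $htmlchans(%html,2)
}

Since the names are joined with dots (token separator 46), a room name containing a dot would be split in two.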


Kind Regards, blink