#170494 09/02/07 07:36 PM
Joined: Jan 2007
Posts: 1,156
DJ_Sol Offline OP
Hoopy frood
Hi! I have this HTML stripper a friend made for me. It's pretty cool, but it has a few flaws, and I wondered if anyone could help me.

The problem is that it removes any text after a comma. For instance:

Testing, and more testing.

This would return: Testing

Can someone either help me with this or provide a better HTML stripper? Thanks!

Code:
hst {
  ;----Basics by an unknown author (thanks a lot!!)
  var %x, %y = $regsub($replace($1,<br>,$lf),/^[^<]*>|<[^>]*>|<[^>]*$/g,,%x), %ht = html $+ $ticks, %bd = body $+ $ticks
  if ($prop == rem) { !.echo -q $regsub(%x,/&\S+?;/g,,%x) | return %x }
  if (!$regex(%x,/&\S+?;/)) return %x
  ;%x = $replace(%x,)
  .comopen %ht htmlfile
  if ($comerr) return %x
  %x = $com(%ht,write,1,bstr*,$+(<html><body>,%x,</body></html>))
  ; Bind the document body to the COM name held in %bd, then request its innerText
  %x = $com(%ht,body,2,dispatch* %bd) $com(%bd,innertext,2)
  %x = $com(%bd).result
  if ($com(%bd)) .comclose %bd
  if ($com(%ht)) .comclose %ht
  return %x
}



DJ_Sol #170496 09/02/07 07:44 PM
Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Offline
Code:
alias hst { return $regsubex($1,/<.+?>/g,) }

DJ_Sol #189213 04/11/07 11:18 PM
Joined: Feb 2004
Posts: 2,019
Hoopy frood
Offline
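The reason it cuts off the text is that you're not calling it correctly. You can't pass a literal comma directly to an identifier: the comma is treated as a parameter separator, so the text is split into multiple parameters ($1, $2, etc.) and the snippet only looks at $1.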

Try this:

//var %x = Testing, and more testing | echo -a $hst(%x)

Using the variable avoids the problem: the variable's contents, commas included, reach the alias as a single parameter ($1).
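
If you want to see the splitting for yourself, here is a minimal sketch. The alias name showparams is made up for this illustration and isn't part of the snippet. It just reports how many parameters it received and what the first one was.

```mirc
; Hypothetical helper for illustration only (not part of the hst snippet).
; $0 is the number of parameters received, $1 is the first parameter.
alias showparams { return count= $+ $0 first= $+ $1 }

; Then, from an editbox:
; literal comma, so the text is split before the alias ever sees it
//echo -a $showparams(Testing, and more testing)
; same text passed through a variable, so it arrives as one parameter
//var %x = Testing, and more testing | echo -a $showparams(%x)
```

The first call should report two parameters with only "Testing" in $1, while the second should report a single parameter holding the whole sentence. That's the same thing that happens to $hst.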

For simple HTML stripping (as opposed to converting HTML entities, which is the main purpose of the snippet), use the .rem property:

$hst(%var).rem
You can find the original code here.
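
As a rough sketch of what .rem does, assuming the hst alias from the first post is loaded, you could try something like the line below. The sample markup is made up for the example.

```mirc
; Illustrative input only; the markup and text are invented for this example.
; With .rem the alias strips the tags, then deletes entities such as &amp;
; with a regex instead of converting them through the htmlfile COM object.
//var %x = <b>Tom &amp; Jerry</b> | echo -a $hst(%x).rem
```

This should echo the text with both the tags and the &amp; entity removed, rather than with the entity turned into an ampersand.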

My first post of 2007... better late than never ;)

Last edited by FiberOPtics; 04/11/07 11:35 PM.

Gone.
