This sort of problem always makes me think of this cartoon:
Even if we force recursion, [a] is all we'll get.
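alias htmlfree {
  ; strip complete tags plus partial tags cut off at either end, then turn non-breaking space entities into spaces
  var %x, %i = $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,$null,%x), %x = $replace(%x,&nbsp;,$chr(32))
  ; angle brackets are left over, so keep stripping until the text stops changing
  if ($regex(%x,/<|>/)) {
    while (%x != $htmlfree.recurse(%x)) {
      echo -a %x
      %x = $v2
    }
  }
  return %x
}
alias htmlfree.recurse return $htmlfree($1);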
//echo -a $htmlfree(<p> <p /> <sup id="en-NIV-26127" class="vnum" value='16'>16</sup>Here is normal text.<sup class='footnote' value='[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]'>[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]</sup>)
I opted for lazy recursion rather than taking a crack at PCRE's (?R).
first iteration =>
16Here is normal text.a]'>[a]
second iteration =>
[a]
In cases like this, cracking out your own parser is needed, as is done elsewhere too. In mIRC this would be a huge performance hit, though, so it might be better to delegate the work:
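alias nohtml {
  if (!$1) return
  ; htmlfile is the COM ProgID for an MSHTML document
  .comopen h htmlfile
  ; the com calls are evaluated first: write the markup, fetch the body as b, read its innerText, then h is closed
  .comclose h $com(h,write,1,bstr,$1) $com(h,body,3,dispatch* b) $com(b,innertext,3)
  var %x = $com(b).result
  .comclose b
  return %x
}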
//echo -a $nohtml(<p> <p /> <sup id="en-NIV-26127" class="vnum" value='16'>16</sup>Here is normal text.<sup class='footnote' value='[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]'>[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]</sup>)
returns
16Here is normal text.[a]
It connects to MSHTML.HTMLDocument (the htmlfile COM object), which should be available on Windows 95 and up.
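For anyone curious what the "own parser" route looks like outside mIRC, here is a rough Python sketch, not a real HTML parser. `regex_strip` repeats the same pattern as the htmlfree regex until nothing changes, and `scan_strip` is a hypothetical character-by-character scanner that ignores `<` and `>` while inside a quoted attribute value. Both function names are mine, and the scanner assumes quotes only ever appear inside tags as attribute delimiters; comments, scripts and a stray `<` in the text are not handled.

```python
import re

# The test string from the post.
SAMPLE = """<p> <p /> <sup id="en-NIV-26127" class="vnum" value='16'>16</sup>Here is normal text.<sup class='footnote' value='[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]'>[<a href="#fen-NIV-26127a" title="See footnote a">a</a>]</sup>"""

# Same pattern as the htmlfree alias: whole tags plus tags cut off at either end.
NAIVE_TAG = re.compile(r"(^[^<]*>|<[^>]*>|<[^>]*$)")


def regex_strip(text):
    """Repeat the regex pass until the text stops changing."""
    while True:
        stripped = NAIVE_TAG.sub("", text)
        if stripped == text:
            return text
        text = stripped


def scan_strip(text):
    """Drop tags with a tiny state machine that respects quoted attribute values."""
    out = []
    in_tag = False
    quote = None  # the quote character we are currently inside, if any
    for ch in text:
        if in_tag:
            if quote:
                if ch == quote:
                    quote = None      # closing quote ends the attribute value
            elif ch in "\"'":
                quote = ch            # opening quote: < and > are now plain data
            elif ch == ">":
                in_tag = False        # an unquoted > really ends the tag
        elif ch == "<":
            in_tag = True
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(regex_strip(SAMPLE))
    print(scan_strip(SAMPLE).strip())
```

The point of the comparison: the regex version eats the real text along with the leftover attribute fragment, while the scanner keeps it because it never treats the `>` inside `value='...'` as the end of the tag.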
EDIT: Oh, and someone name and shame the guy for nesting HTML tags within attribute values!