Is there a particular reason why my $nohtml falls short for you?
I suggested it as a supplement rather than a complete replacement. You could still do a $regex check to determine whether the sockread data is something you'd want to $nohtml.
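Something along these lines is what I mean. Treat it as a rough sketch rather than finished code: the socket name footer and the class="footer" pattern are placeholders I made up for illustration, and it assumes the $nohtml alias is already loaded in your remotes.

```
on *:SOCKREAD:footer:{
  if ($sockerr) { return }
  var %line
  sockread %line
  while ($sockbr) {
    ; cheap $regex gate: only lines that mention the (assumed) footer class get stripped
    if ($regex(%line,/class="footer"/i)) { echo -a $nohtml(%line) }
    sockread %line
  }
}
```

The $regex here only decides whether a line is worth handing to $nohtml; it does not try to parse the markup itself.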
The cartoon jokes about a situation almost identical to this one: parsing nested parentheses is not something you'd do with a finite state automaton (regex). Or, as the famous saying goes, "I know, I’ll use regular expressions. Now I have two problems". That saying gets applied too often, but in this situation it surely holds up.
Your best bet at parsing this is the IHTMLDocument3 COM interface (also available on Windows 95 and up), which already deals with all kinds of invalid data. I could whip something up that queries all the spans or divs with the class name 'footer', reads innerText on each of them, and calls a callback alias you supply with that text as $1-.
I'd like to hear your reasons for dismissing a COM-based approach first, though, before I spend my time on it.
The COM approach guarantees that the text you receive is what you see in the browser; no regular expression will get you anywhere near the same coverage.
You could of course, as genius_at_work suggested, make it work in this specific case with a decent amount of work, but it would still be a very quirky parser that is too easily broken.
Even if speed matters, the COM approach shouldn't be any slower, and it may well be faster.