Parsing HTML the proper way - existing solutions? (Scripts & Popups forum)

Re: Parsing HTML the proper way - existing solutions? #236303 17/02/12 01:34 AM
pball (Fjord artisan), Joined: Nov 2009, Posts: 295
I've made my fair share of website parsing scripts, so I'll throw down some tips and tricks I use, though this might not be exactly what you're looking for.

The main way I parse HTML is with regex. This method, combined with how mIRC reads a website's code, has its limitations though. It works great for websites that have "clean" code, where the beginning and end tags are on the same line as the info you want (since mIRC gets the code line by line). EX:

Code:
if ($regex(%fml.result,/<p><a href=".*?" class="fmllink">(.*?)</a></p>/)) {

Just strip the HTML code from the match and voilà, you have a funny FML story.

The second method I've used (though I really dislike it) is to find a tag with an isin check. This method is for sites that are "messy" and have the beginning and end tags on separate lines. After the tag is found, a new line is read and a loop starts that stores each line into a variable until the end tag is found. Using a loop can be dangerous and has screwed up for me many times. EX:

Code:
if (<tr><td>Temperature</td> isin %w.read) {
  set %w.temp $null
  sockread %w.read
  while (</td> !isin %w.read) {
    set %w.temp %w.temp %w.read
    sockread %w.read
  }
}

And my favorite way to parse HTML inside mIRC is to not use mIRC to parse the HTML. I've been making C# programs that you just /run, like google.exe search%20terms%20here, while mIRC leaves a local socket open that the C# program sends the already parsed info back to. This is great for many reasons: C# is faster than mIRC, mIRC isn't frozen while it waits for the website or the C# program to respond, and in C# you can get the whole page in a single var and use more complex regex. I use this method for anything that doesn't have simple, clean HTML code, like Google or Wunderground.

I'm not really sure if this is the kind of info you're wanting to discuss, but it's what I know. I also want to say I've never used or come across a script that writes to a file and searches it.
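To show why having the whole page in one variable helps with "messy" markup, here is a minimal sketch. It is written in Python rather than C# or mIRC script, purely as an illustration and not as the program described above. The page fragment, the pattern and the function names are all made up for the example. The same pattern is tried once per line (roughly what a line-by-line reader sees) and once against the full page text.

```python
import re

# Hypothetical page fragment: the label and its value sit on separate lines,
# like the "messy" weather page described in the post.
PAGE = """<table>
<tr><td>Temperature</td>
<td>
  41 F
</td></tr>
</table>"""

# Non-greedy match from the label cell to the end of the next cell.
# re.DOTALL lets "." cross newlines, which a per-line check cannot do.
PATTERN = re.compile(r"<td>Temperature</td>\s*<td>(.*?)</td>", re.DOTALL)


def match_line_by_line(page):
    """Apply the pattern to each line separately, as a line-based reader would."""
    for line in page.splitlines():
        m = PATTERN.search(line)
        if m:
            return m.group(1).strip()
    return None


def match_whole_page(page):
    """Apply the pattern to the whole page held in one string."""
    m = PATTERN.search(page)
    return m.group(1).strip() if m else None


if __name__ == "__main__":
    print(match_line_by_line(PAGE))  # None: no single line holds both tags
    print(match_whole_page(PAGE))    # 41 F
```

The per-line attempt fails because the opening and closing tags never share a line, which is the case the isin loop above works around. With the full text available, one non-greedy pattern does the job without a loop.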

Entire Thread
Subject	Posted By	Posted
Parsing HTML the proper way - existing solutions?	Anonymous	16/02/12 11:40 PM
Re: Parsing HTML the proper way - existing solutions?	pball	17/02/12 01:34 AM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	17/02/12 04:24 AM
Re: Parsing HTML the proper way - existing solutions?	argv0	17/02/12 03:27 AM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	17/02/12 04:43 AM
Re: Parsing HTML the proper way - existing solutions?	argv0	17/02/12 05:20 AM
Re: Parsing HTML the proper way - existing solutions?	jaytea	17/02/12 12:03 PM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	18/02/12 09:40 PM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	20/02/12 10:35 PM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	20/02/12 10:51 PM
Re: Parsing HTML the proper way - existing solutions?	jaytea	22/02/12 09:16 AM
Re: Parsing HTML the proper way - existing solutions?	Anonymous	23/02/12 01:40 PM