Posted By: praetorian_ SAX XML Parser - 12/11/04 11:25 PM
I know this has been brought up before, but nobody has mentioned the relatively easy-to-implement SAX parser.

Most of the objections to XML support that I have read on this board concern the complexity of data access, and I agree. The type of parser previously discussed was a DOM (Document Object Model) parser. As one might guess, a DOM parser is extremely difficult to implement in a language that is not object-oriented, such as mIRC scripting, because each node is treated as an object, as are the children of that node, the children of those nodes, etc.

With a SAX parser, the file is parsed from tag to tag, firing events for opening and closing tags, attributes, data, etc. This is relatively easy to implement. A SAX parser only needs to move through the file from <tag> to <tag>; anything between the tags is treated as data (whitespace characters are mostly ignored). It is up to the scripter/programmer to handle the data as it arrives through the events. Here is an example of how such a parser would work:

Code:
<item id="1">
	<name>Hat</name>
	<price>$5</price>
	<description>A warm hat.</description>
</item>


When this XML is parsed the parser would fire these events:

Code:
tagopen: item
attribute: id; 1
tagopen: item\name
data: Hat 
tagclose: item\name
tagopen: item\price
data: $5
tagclose: item\price
tagopen: item\description
data: A warm hat.
tagclose: item\description
tagclose: item
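
For anyone who wants to see that event stream produced by a real SAX parser, here is a minimal sketch in Python using the standard-library xml.sax module. It is not the mIRC parser described in this post. The names EventLogger and sax_events are made up for illustration, and the backslash-joined paths are built by the handler itself to match the listing above.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX events in the same shape as the listing above."""

    def __init__(self):
        super().__init__()
        self.events = []  # recorded event strings, in firing order
        self.path = []    # stack of currently open tag names
        self.text = []    # character data seen since the last tag

    def _flush(self):
        # characters() may be called several times for one run of text,
        # so buffer the pieces and emit one data event per run.
        data = "".join(self.text).strip()
        self.text = []
        if data:  # whitespace-only runs between tags are ignored
            self.events.append("data: " + data)

    def startElement(self, name, attrs):
        self._flush()
        self.path.append(name)
        self.events.append("tagopen: " + "\\".join(self.path))
        for key in attrs.getNames():
            self.events.append("attribute: %s; %s" % (key, attrs.getValue(key)))

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush()
        self.events.append("tagclose: " + "\\".join(self.path))
        self.path.pop()


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded event strings."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


SAMPLE = """<item id="1">
	<name>Hat</name>
	<price>$5</price>
	<description>A warm hat.</description>
</item>"""

if __name__ == "__main__":
    print("\n".join(sax_events(SAMPLE)))
```

The handler never builds a tree. It only keeps a stack of open tag names, which is why this style of parser is practical in a language without objects.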


Such a parser in mIRC would not be a "total" solution (it would be difficult to modify attributes or data), but it would provide an easy way to parse all sorts of data, such as RSS feeds, XHTML-compliant websites, Microsoft documents (the newer formats written in XML) and so on.

I have implemented a SAX parser in mIRC scripting and it is extremely handy. However, it has some quirks that cannot be overcome: speed (2200 lines in ~10 seconds, with the output formatted and /echo'ed into the status window), data length (my parser works around this by sending long data in several data events, chunked into 800-byte strings) and a strange bug involving &binvars and /signal -n.
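
The chunking mentioned above can be sketched roughly like this in Python. This is an illustration only, not the mIRC code. The name chunk_data is made up, and the default of 800 is simply the figure quoted in this post.

```python
def chunk_data(data, size=800):
    """Split a byte string into consecutive pieces of at most `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the data in strides of `size`; the last piece may be shorter.
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    long_text = b"x" * 2000
    for piece in chunk_data(long_text):
        # A parser would fire one data event per piece here.
        print("data event with", len(piece), "bytes")
```

Note that splitting on raw bytes can cut a multi-byte character in half, so a handler that cares about text encoding would have to rejoin the pieces before decoding.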

I will be happy to provide my source code (either in mIRC scripting or pseudocode) to ease the implementation of the parser, if anyone is interested.
Posted By: bamaboy1217 Re: SAX XML Parser - 13/11/04 08:25 PM
I like the idea. You could easily make aliases like:

%tag = $mxml_opentag(parent,child,child)
%attrib = $mxml_getattrib(%tag,something)
$mxml_setattrib(%tag,something,new text)
var %newtag = $mxml_addtag(%tag,tag,[data],[something to indicate before or after, if that's really necessary])

It would be easy to make an XML parser like that, just using handles, basically in the form of variables, exactly how PHP does SQL queries and such :-p
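
To make the handle idea concrete, here is a rough Python sketch of the same four operations. Every name in it (HandleStore, open_tag, get_attrib, set_attrib, add_tag) is hypothetical and only mirrors the aliases suggested above. It also assumes the whole document is loaded into memory with xml.etree.ElementTree, so it is closer to a small DOM than to a SAX parser.

```python
import xml.etree.ElementTree as ET


class HandleStore:
    """Hands out integer handles that stand in for XML elements."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.handles = {}  # handle number -> element
        self.next_id = 1

    def _register(self, element):
        handle = self.next_id
        self.next_id += 1
        self.handles[handle] = element
        return handle

    def open_tag(self, *path):
        """Return a handle for parent,child,child... or 0 if not found."""
        if not path or path[0] != self.root.tag:
            return 0
        element = self.root
        for name in path[1:]:
            element = element.find(name)  # first direct child with that tag
            if element is None:
                return 0
        return self._register(element)

    def get_attrib(self, handle, name):
        return self.handles[handle].get(name)

    def set_attrib(self, handle, name, value):
        self.handles[handle].set(name, value)

    def add_tag(self, handle, tag, data=None):
        """Append a child tag (always as the last child) and return its handle."""
        child = ET.SubElement(self.handles[handle], tag)
        child.text = data
        return self._register(child)


if __name__ == "__main__":
    store = HandleStore('<item id="1"><name>Hat</name></item>')
    tag = store.open_tag("item")
    store.set_attrib(tag, "id", "2")
    store.add_tag(tag, "price", "$5")
    print(ET.tostring(store.root, encoding="unicode"))
```

The script only ever holds a number, which is what makes this style workable in a language without objects. The optional before/after position argument is left out of this sketch.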
Posted By: Artwerks Re: SAX XML Parser - 13/11/04 10:37 PM
Well, sax (a French coder) has actually coded a DLL for use with mIRC that does XML parsing with both DOM and SAX parsers. Unfortunately for you, the documentation is only available in French, but if you ask on scriptsdb, someone might want to translate it... I'd do it, like I did with a few addons, but I actually don't have any time right now.

Edit: Oops... here's the link.
Posted By: praetorian_ Re: SAX XML Parser - 14/11/04 05:43 PM
I have seen and used his SAX parser DLL (actually, I wrote the review for it on mircscripts.org :)) and had seen that he made a DOM parser, but as you said, the documentation is in French, and I haven't bothered trying to translate it because online translation tools typically don't produce the most accurate results.

The problem with DLLs and scripted parsers is that people wishing to parse XML in scripts and addons must distribute them along with their other files, which is why I personally would like to see a basic parser implemented in mIRC.
Posted By: Jae Re: SAX XML Parser - 24/11/04 04:53 PM
I don't see why someone doesn't make it easier and simply do something where you type:

item ."1" \name : NEW_DATA

where the . shows that the next bit is an attribute, the \ denotes a tag, and the : shows that the rest is data, or something to the effect of the above.

That would make simple one-line data manipulation possible,
similar to how CSS can be applied to an element in a web page using a combination of +, ., >, etc. to refer to an element in the tree with a kind of logical string.
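
As a rough illustration of how such a one-line selector could be split into steps, here is a small Python sketch. The grammar is a guess at the syntax shown above: a bare word or \word names a tag, ."value" attaches an attribute value to the preceding tag, and everything after the first : is the new data. The name parse_selector and the dictionary layout are made up for this example.

```python
import re

# One token: ."value" (attribute value), \name (child tag) or a bare name (tag).
TOKEN = re.compile(r'\s*(?:\.\s*"([^"]*)"|\\\s*([\w-]+)|([\w-]+))')


def parse_selector(line):
    """Split a line like: item ."1" \\name : NEW_DATA into (steps, data)."""
    # Limitation of this sketch: a colon inside an attribute value is not supported.
    selector, sep, data = line.partition(":")
    selector = selector.strip()
    steps = []
    pos = 0
    while pos < len(selector):
        match = TOKEN.match(selector, pos)
        if match is None:
            raise ValueError("cannot parse selector at position %d" % pos)
        attr_value, child, tag = match.groups()
        if attr_value is not None:
            if not steps:
                raise ValueError("attribute value must follow a tag")
            steps[-1]["attr_value"] = attr_value
        else:
            steps.append({"tag": child or tag, "attr_value": None})
        pos = match.end()
    return steps, (data.strip() if sep else None)


if __name__ == "__main__":
    print(parse_selector('item ."1" \\name : NEW_DATA'))
```

The resulting steps could then be walked against whatever structure holds the XML. That part is left out because it depends on how the data is stored.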

Does anyone out there understand CSS and what I am trying to say? Please help expand on it so that it makes more sense to those who are unfamiliar with CSS.


Cheers :)
© mIRC Discussion Forums