Joined: Dec 2002
Posts: 17
KyD Offline OP
Pikka bird
Code:
/(?:([\x24](?>[a-zA-Z0-9\.\_\-]+)(?(?=[\x28](?![\x29])).))|([\x25](?>[a-zA-Z0-9\.\_\-]+))|((?>[^\x20\x2c\x28\x29]+)(?(?=[\x28](?![\x29])).))|([\x29](?(?=\.[a-zA-Z0-9\_\-]).(?>[a-zA-Z0-9\_\-]+)))|(?(?=[\x28](?![\x29]))(.)|[\x28][\x29]\.(?>[a-zA-Z0-9\_\-]+)))(?=(?:(?![\x2c\x20]).)*?(?:[\x2c\x20]|$))/g
let me explain.

this is a regular expression that can be used to 'tokenize' statements in the mIRC scripting language.
for those of you less familiar with the lexical analysis step of translating a language:
you need to tokenize the program text before you can do any kind of syntactical analysis.
the choice of tokens should depend on how the language is designed,
but a good guideline is to find where you can add or remove whitespace from
the code without changing its meaning. those places are the boundaries between tokens.

using mirc's $regex to match the following line of example code:
Code:
if ($foo(bar,%var,$func).prop) { command $special().noproperty }


results in the following backreferences in $regml:
1 if
2 (
3 $foo(
4 bar
5 %var
6 $func
7 ).prop
8 )
9 {
10 command
11 $special
12 }
13

you can see it just runs through the code and returns any of the following kinds of tokens:
$foo( - identifier (starting with $), with a non-empty parameter set (opening parenthesis at end)
$special - identifier with no parameters or an empty set of parameters
%var - variables start with %
).prop - properties for (possible) identifiers with a non-empty parameter set, starting with ).
{ } - grouping statements
( ) - logical grouping
bar - other text, possible (key)words
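
for readers who want to experiment outside mirc, here is a minimal sketch of the same token classes in python. this is an assumption-laden port, not the regex from the post: python's re module has no lookahead conditionals, so each conditional is rewritten as an optional group, the atomic groups are dropped, and the trailing lookahead is left out because it can always be satisfied on a single line. the group names (identifier, variable, text, close, open) and the tokenize function are made up for this illustration.

```python
import re

# Hypothetical Python port of the token classes described above.
# Group names are invented for this sketch; they are not part of the original regex.
TOKEN_RE = re.compile(r"""
    (?P<identifier>\$[a-zA-Z0-9._-]+(?:\((?!\)))?)   # $foo, or $foo( when the paren is not empty
  | (?P<variable>%[a-zA-Z0-9._-]+)                   # %var
  | (?P<text>[^\x20,()]+(?:\((?!\)))?)               # any other run of text: if, {, command, ...
  | (?P<close>\)(?:\.[a-zA-Z0-9_-]+)?)               # closing paren, optionally with .prop
  | (?P<open>\((?!\)))                               # opening paren of a non-empty group
  | \(\)\.[a-zA-Z0-9_-]+                             # empty parens plus property: consumed, not captured
""", re.VERBOSE)


def tokenize(line):
    """Return (kind, text) pairs for every captured token in one line of script."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        # lastgroup is None for the uncaptured "().prop" alternative, which is skipped.
        if match.lastgroup is not None:
            tokens.append((match.lastgroup, match.group(match.lastgroup)))
    return tokens


if __name__ == "__main__":
    example = "if ($foo(bar,%var,$func).prop) { command $special().noproperty }"
    for kind, text in tokenize(example):
        print(kind, text)
```

spaces and commas never match any alternative, so finditer simply steps over them, which is how they act as separators here. whether this port agrees with mirc's $regex on every input has not been checked; it is only meant to make the token classes easier to play with.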

what can you use this for?
this tokenizer returns uniquely identifiable tokens, so that you can derive syntax from them.
using this tokenizer one could easily make a syntax checking script, e.g. for checking that
all parentheses are matched (just by keeping count of the ( and )'s). you can also derive
information about which variables are used, which identifiers are being called, the number
of parameters they get, which $prop they're given etc. all very useful for debugging/analyzing,
generating a parser, making syntax highlighting etc.
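
the parenthesis count and the variable/identifier lookup mentioned above can be sketched in a few lines of python. this is only an illustration under assumptions: the input is a plain list of token strings shaped like the token kinds listed earlier (hard-coded here for the example line), and the function names parens_balanced and summarize are invented for this sketch.

```python
def parens_balanced(tokens):
    """Count opening and closing parentheses carried by the tokens."""
    depth = 0
    for tok in tokens:
        if tok.endswith("("):        # "(" on its own, or attached as in "$foo("
            depth += 1
        elif tok.startswith(")"):    # ")" on its own, or with a property as in ").prop"
            depth -= 1
            if depth < 0:            # a closing paren appeared before its opener
                return False
    return depth == 0


def summarize(tokens):
    """Collect variables, identifiers and properties from the token strings."""
    return {
        "variables": [t for t in tokens if t.startswith("%")],
        "identifiers": [t.rstrip("(") for t in tokens if t.startswith("$")],
        "properties": [t[2:] for t in tokens if t.startswith(").")],
    }


if __name__ == "__main__":
    # Token strings for the example line, written out by hand for this sketch.
    tokens = ["if", "(", "$foo(", "bar", "%var", "$func",
              ").prop", ")", "{", "command", "$special", "}"]
    print(parens_balanced(tokens))
    print(summarize(tokens))
```

because an identifier with an empty parameter set comes out as a bare $special token, its () never reaches the counter, so only non-empty groups have to balance.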

additional notes:
- i used hex values in the regex because they read easier and look cleaner than escaped ascii characters imho
- i used atomic grouping as an effective aid against backtracking in the regex,
this may make it look redundantly long when in fact it is more efficient.
the lookarounds are used to get the additional info.
- the example line has 12 backreferences and 13 matches (the last denoting the end of the line)
- you can see that the property 'noproperty' of the $special().noproperty identifier is discarded
because it is called with an empty parameter set (as does the mirc interpreter).
- i restricted the set of input symbols a little to avoid ambiguity, and for the sake of clarity
- this regex took me about 30-40 hours to make


thought it was worth sharing, hope some will find it useful, i know i do (:

Joined: Feb 2004
Posts: 714
Z
Hoopy frood
Offline
If I knew regex I'd be able to say something clever and/or somehow useful. But I don't... :P

Sounds very good in theory, though :) GJ!


"All we are saying is give peace a chance" -- John Lennon
Joined: Aug 2004
Posts: 1
S
Mostly harmless
Offline
This would be very relevant to anyone who plans to help Armada out with mIRC Syntax Coloring and EditPad Pro. I would do it myself if I wasn't so busy with school and such. :(

