Originally Posted By: qwerty

The key idea you're missing here is backtracking; when a particular way of matching each * with a substring fails, a different combination is tried, and the number of such combinations is exponential to the number of *'s.

Not at all, the problem mostly occurs when the pattern does not match the string in any way, so that the routine has to try all possible "assignments" of *'s.

You should probably Google these issues, as I suspect there's not much I could say to convince you. The problem is central to regular expressions implementations too (pcre.txt, section "ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS" describes a similar case). An analogous situation in regex is this:

//echo -ag $regex($str(a,39),/^.*.*.*.*.*.*\d$/)

\d matches a digit (0 to 9), so this pattern will never match a string of a's. Even a single call to the above takes a noticeably long time to complete. Removing a ".*" makes it much faster; adding a ".*" makes it return -8, which is the error code meaning "too many attempts to match". Regular expressions in mirc are supported through a 3rd-party library, PCRE (my point being, different author). PCRE contains an optimisation, ie if you change \d to a literal, say b, $regex fails very quickly. This is the sort of 'early termination' checks I mentioned in my previous post. Khaled could do something like that in his wildcard matching routine, but of course that would only cover a subset of problem cases.

I'm unsure why you now talk about regular expressions in a case where only simple wildcards symbols are used, actually only 1, of the 3 existing,a "*".
The only symbols that the parser gives information how to handle it are those wildcards, so how does that 'backtracking' apply then? To use your words: why does it "try a different combination"? And what does that "combination" combine?
At any stage in the method I described, there is a pointer in the text to match (in my case what is right of the iswm operator), everything before the pointer is 'matched', so why would it need to re-use text that is before that pointer position? Let alone so many times that mIRC needs 5 sec to know a 44 characters wide wildcard text does not match a literal text?

Last edited by RRX; 28/05/07 10:06 AM.