Joined: Jun 2003
Posts: 19
Hi

Here is what I have (profanity10 is an alias name):
on *:text:*:#:{
  if ($istok($strip($1-),penis,32)) { profanity10 }
}

I'd like to catch it even if the word is entered with punctuation or special characters mixed in, e.g. p*nis, P*n/i*s, etc.

Please help.
One solution I have in mind is to remove those characters from $1- before evaluating it, but how do I do that? Please help.

Why can't I find any good book on mIRC scripting? The mIRC help file is not good at all for a newcomer like me.

Also, how do you loop through all the items in an array (hash table)? How do you find the lower/upper bounds of an array in mIRC?

Thanks

Joined: Nov 2003
Posts: 2,327
if (*p*e*n*i*s* iswm $strip($1-)) { profanity10 }


New username: hixxy
Joined: Dec 2002
Posts: 2,962
For the first question, the answer is a regular expression:
Code:
if ($regex($1-,/(?:[^\w\s]|_)+p(?:[^\w\s]|_)+e(?:[^\w\s]|_)+n(?:[^\w\s]|_)+i(?:[^\w\s]|_)+s(?:[^\w\s]|_)+/S)) { profanity10 }

Note: There's no need to use $strip() around $1- in this case, because it's done by the regular expression when the /S modifier is used.
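For anyone testing patterns outside mIRC, here is a rough Python approximation of what the S modifier contributes. The helper strip_codes is hypothetical, and its list of control codes (colour \x03 with optional digits, bold \x02, plain \x0f, reverse \x16, italic \x1d, underline \x1f) is an assumption rather than mIRC's exact $strip() rules. It shows a colour code typed in the middle of the word defeating the first pattern unless the codes are stripped first:

```python
import re

# Hypothetical approximation of $strip() / the S modifier (assumed codes,
# not mIRC's exact rules): colour \x03 with optional "fg[,bg]" digits, plus
# bold \x02, plain \x0f, reverse \x16, italic \x1d and underline \x1f.
CONTROL_CODES = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")

def strip_codes(text: str) -> str:
    return CONTROL_CODES.sub("", text)

# The first pattern from the post, without the S modifier. SEP is one
# separator: not alphanumeric/whitespace, or a literal underscore.
SEP = r"(?:[^\w\s]|_)"
FIRST_PATTERN = re.compile(SEP + "+" + (SEP + "+").join("penis") + SEP + "+")

line = "*p\x0304*e*n*i*s*"   # a colour code typed in the middle of the word
print(bool(FIRST_PATTERN.search(line)))                # False: the digits block it
print(bool(FIRST_PATTERN.search(strip_codes(line))))   # True once codes are stripped
```

Because every separator group in this first pattern uses +, it also needs a separator before p and after s, so the bare word "penis" is not matched at all.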

As for hash tables, to use them efficiently you should access items by their names instead of by their indexes. However, sometimes using the indexes is necessary, in which case the lowest index will always be 1 and the highest can be found with $hget(hash_table_name, 0).item
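mIRC script can't be run outside the client, so this is only a Python analogy of index-based access. A dict stands in for the hash table, and the hypothetical hget_item() mimics $hget(table, N).item: N = 0 gives the item count and 1..count give item names. The index order of a real mIRC table is not guaranteed; Python's insertion order is used here only to keep the output predictable.

```python
# A dict stands in for a mIRC hash table (an analogy, not mIRC code).
badwords = {"penis": 1, "spam": 2, "flood": 3}

def hget_item(table: dict, n: int):
    """Hypothetical mimic of $hget(table, N).item.

    n == 0 returns the number of items; 1 <= n <= count returns the Nth
    item name (1-based); anything else returns None.
    """
    if n == 0:
        return len(table)
    if 1 <= n <= len(table):
        return list(table)[n - 1]   # walking by position costs O(n) here
    return None

# Preferred: look an item up directly by its name.
print(badwords["spam"])              # 2

# When you must go by index, the bounds are 1 and hget_item(table, 0).
for i in range(1, hget_item(badwords, 0) + 1):
    name = hget_item(badwords, i)
    print(i, name, badwords[name])
```

The positional lookup has to walk the table, which is one reason name-based access is the efficient choice.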

Last edited by starbucks_mafia; 11/06/04 11:54 PM.

Spelling mistakes, grammatical errors, and stupid comments are intentional.
Joined: Dec 2002
Posts: 2,962
Using that would give a lot of false positives: saying 'Presenting the world famous Flying Arnie', or anything else containing the characters p, e, n, i, and s in that order, would trigger it.
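To see the false positive concretely, here is a small Python check that uses fnmatch as a stand-in for iswm. This is an assumption: fnmatch also treats [ ] specially, which mIRC's wildcards do not, but the mask here contains none. The iswm() helper is hypothetical and lowercases both sides because mIRC's iswm is case-insensitive:

```python
from fnmatch import fnmatchcase

def iswm(mask: str, text: str) -> bool:
    """Hypothetical stand-in for mIRC's case-insensitive iswm operator."""
    return fnmatchcase(text.lower(), mask.lower())

MASK = "*p*e*n*i*s*"
print(iswm(MASK, "Presenting the world famous Flying Arnie"))  # True (false positive)
print(iswm(MASK, "p*e^n$i@s"))                                 # True
print(iswm(MASK, "hello world"))                               # False
```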


Joined: Nov 2003
Posts: 2,327
Yeah, I read it wrong. I presumed he meant p, e, n, i, s could be anywhere in the word; I guess he meant p!",\/e,*nis or similar.


Joined: Jan 2003
Posts: 2,523
For this purpose, the relatively new subroutine feature of PCRE may come in handy:
Code:
/((?:[^\w\s]|_)+)p(?1)e(?1)n(?1)i(?1)s(?1)/Si
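Python's re module doesn't support PCRE's (?1) subroutine calls, so this sketch builds the equivalent pattern with each (?1) expanded by hand into a copy of group 1's body. The expand() helper is hypothetical. Because the group uses +, at least one separator is required before p, between each pair of letters, and after s:

```python
import re

GROUP1 = r"(?:[^\w\s]|_)+"   # body of group 1, ((?:[^\w\s]|_)+)

def expand(word: str, sep: str) -> str:
    """Hypothetical helper: sep w sep o sep ... sep, i.e. (sep)w(?1)o(?1)...(?1)."""
    return sep + sep.join(word) + sep

PATTERN = re.compile(expand("penis", GROUP1), re.IGNORECASE)   # i modifier

print(bool(PATTERN.search("-P-e-N-i-S-")))   # True
print(bool(PATTERN.search("p.e.n.i.s")))     # False: needs a separator before p and after s
```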


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Jun 2003
Posts: 19
Hi

Thank you very much for your quick and wise responses. I'm sorry if I didn't make it clear. I guess what I need is to catch any punctuation or special characters between the letters of the word penis, e.g. p*e^n$i@s/ etc. If I can catch only those and leave the letters a-z alone, it would serve the purpose.

It would also help greatly, and I would be very thankful, if you could explain the $regex line and the purpose of each character used in it. Thanks very much.

Joined: Feb 2004
Posts: 714
If you want it to detect exactly one character between each pair of letters, you can also use this:
Code:
if (* p?e?n?i?s * iswm $strip($1-)) { profanity10 }
This way it will only detect the word when a single character sits between the letters, like p.e.n.i.s or p-e-n-i-s.
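A quick Python check of the ? wildcard, again using fnmatchcase with lowercasing as a hypothetical stand-in for mIRC's case-insensitive iswm. It also shows a side effect of the spaces in the mask: the word is only caught when it has a space on each side, so it is missed at the very start or end of a line:

```python
from fnmatch import fnmatchcase

def iswm(mask: str, text: str) -> bool:
    """Hypothetical stand-in for mIRC's case-insensitive iswm operator."""
    return fnmatchcase(text.lower(), mask.lower())

MASK = "* p?e?n?i?s *"   # each ? is exactly one character

print(iswm(MASK, "you p.e.n.i.s head"))    # True
print(iswm(MASK, "you p..e.n.i.s head"))   # False: two characters between p and e
print(iswm(MASK, "p-e-n-i-s head"))        # False: no space before p
print(iswm(MASK, "hello p-e-n-i-s"))       # False: no space after s
```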

Hope this helps :)
Zyzzy.

Last edited by Zyzzyx26; 12/06/04 12:14 PM.

"All we are saying is give peace a chance" -- John Lennon
Joined: Dec 2002
Posts: 2,962
Thanks, I had a feeling that it was possible but I couldn't remember how.


Joined: Apr 2003
Posts: 701
Thanks for telling (or reminding ;) ) us about that subroutine thingy :)
Just one suggestion for the regex:
/((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si
With * instead of +, this also matches p--°en__.;is and things like that.
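The effect of switching + to * can be checked in Python, where the (?1) calls have to be expanded by hand because re lacks subroutine calls. Zero separators are now allowed, so the plain word matches too. Since nothing marks where the word starts or ends, the letters also match inside a longer word:

```python
import re

SEP = r"(?:[^\w\s]|_)*"   # body of ((?:[^\w\s]|_)*); each (?1) expands to this
PATTERN = re.compile(SEP + SEP.join("penis") + SEP, re.IGNORECASE)

print(bool(PATTERN.search("p--°en__.;is")))   # True
print(bool(PATTERN.search("penis")))          # True: zero separators allowed
print(bool(PATTERN.search("Spenish")))        # True: also matches inside other words
```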

Joined: Dec 2002
Posts: 2,962
OK, here's the updated version (using qwerty's adaptation plus another change so it matches properly):
Code:
if ($regex($1-, /((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si)) { profanity10 }


Now to try and explain the regular expression /((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si...
First off, I'll point out that my 'explanation' almost certainly won't explain anything unless you already know regular expressions at least a little bit, and even if you do know them it probably still won't explain much. Regular expressions are a very powerful tool; unfortunately, they're incredibly hard for people to understand, and trying to explain a full-blown expression to someone who doesn't already know regular expressions in general is not a good idea. You're better off using Google to find a regular expression tutorial, then coming back and reading my explanation when you're comfortable with them. Anyway, here it is:

  • / and / (the delimiters)
    The /'s mark the beginning and end of the actual expression.
  • Si (after the closing /)
    Since these are outside the /'s, they are treated as modifiers. Modifiers are like switches: they change how the entire expression behaves. The S modifier means that control codes are stripped from the text before it's compared (which is why we don't need to use $strip()). The i modifier means that the match is case-insensitive, so it will match 'penis', 'pENIS', 'p*&En>i*S', or any other variation in letter case.
  • ( ) (the outer parentheses)
    The parentheses create a subpattern, which groups the expression and also means that whatever the expression inside matches is captured and can be retrieved and used later.
  • (?: ) (the inner parentheses)
    These inner parentheses again create a subpattern to group the expression within them; however, the ?: after the opening parenthesis means that what's inside is not captured. This makes the expression more efficient, since we don't need to retrieve what that part matches.
  • [^\w\s]
    The brackets ([ ]) create a character set, which matches if any of the characters it contains appears at that position in the text. However, the ^ means that the set is negated: it will match any character that it doesn't contain. \w and \s are metacharacters that each represent a group of characters (kind of like special built-in character sets). \w represents the characters a to z, A to Z, 0 to 9, and underscore (_). \s represents all whitespace characters, such as a regular space, tab, and so on. So in total, [^\w\s] means 'match a character which is not alphanumeric, an underscore, or whitespace'.
  • |_
    The | basically means 'or': match either the expression to the left or the expression to the right. So in the wider context of this particular expression, it means 'match [^\w\s] (a character which is not alphanumeric, an underscore, or whitespace) or match _ (a literal underscore)'. Put into a single sentence, it means 'match a character which is not alphanumeric or whitespace'.
  • * (after the inner group)
    The * is a repetition quantifier. It means 'match the expression preceding it zero or more times' (any number of times). The expression directly preceding it is (?:[^\w\s]|_) (this is why the parentheses were used to group that expression), so combining the two meanings we get 'match any number of characters which are not alphanumeric or whitespace'.
  • p e n i s
    Each of these characters is taken literally, and you can replace them with any alphanumeric characters to match whatever word you choose.
  • (?1)
    Each (?1) simply means 'apply subpattern number 1 (the first subpattern defined) here', the first subpattern being ((?:[^\w\s]|_)*). Basically, the expression behaves as if ((?:[^\w\s]|_)*) were written in each of those places.
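The separator part, (?:[^\w\s]|_), can be probed one character at a time. This Python sketch confirms that punctuation and the underscore count as separators while letters, digits, and whitespace do not. Python's \w and \s are Unicode-aware, but for the ASCII characters tested here they agree with the descriptions above; is_separator() is a hypothetical helper:

```python
import re

# One separator: not a word character or whitespace, or else a literal "_".
SEPARATOR = re.compile(r"(?:[^\w\s]|_)")

def is_separator(ch: str) -> bool:
    """Hypothetical helper: does this single character count as a separator?"""
    return SEPARATOR.fullmatch(ch) is not None

for ch in ["*", "_", "@", "a", "Z", "7", " ", "\t"]:
    print(repr(ch), is_separator(ch))
```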


You probably didn't learn anything from that, but it took me an age to write out so just look at the pretty colours anyway.


Joined: Dec 2002
Posts: 2,962
This is one of the few times where I'm tempted to say 'LOL'. Thanks for the suggestion. As you can see from my latest (very, very long) post, I've used the * repetition quantifier in the latest expression. Believe it or not, I actually did that before I saw your post; it's just taken me almost an hour and a quarter to write out the entire regex explanation :O


Joined: Feb 2004
Posts: 2,019
Quote:
You probably didn't learn anything from that, but it took me an age to write out so just look at the pretty colours anyway.


LOL.

Well, even if he didn't understand much, it's still useful for those interested in learning about regular expressions. And I think it's a good habit to explain the code you write for other users; I usually try to as well.

And yes, the colors are also pretty to look at :P


Greets



Gone.
Joined: Nov 2003
Posts: 2,327
Couldn't [^\w\s] be written as \W\S?


Joined: Dec 2002
Posts: 2,962
No, because that would mean 'a non-alphanumeric character followed by a non-whitespace character'. To make it a single character you would have to write (?:\W|\S), but that has the same problem as [\W\S]: it matches every character.

If you mean whether it could be written as [\W\S], then that would match every character. Imagine \W replaced with all the characters that aren't alphanumeric (which means all whitespace and punctuation characters), and \S replaced with all non-whitespace characters (which includes all punctuation and alphanumeric characters): you're left with every character inside the character set (including the punctuation characters twice).
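The difference between these candidates is easy to check in Python, whose re module treats \W, \S, and character sets the same way for these ASCII samples. Each pattern is tested with fullmatch against single characters, plus one two-character string for \W\S; accepted() is a hypothetical helper:

```python
import re

SAMPLE = "a_* "   # a letter, an underscore, punctuation, a space

candidates = {
    r"[^\w\s]": re.compile(r"[^\w\s]"),
    r"\W\S": re.compile(r"\W\S"),           # two characters in a row
    r"[\W\S]": re.compile(r"[\W\S]"),
    r"(?:\W|\S)": re.compile(r"(?:\W|\S)"),
}

def accepted(rx) -> list:
    """Which single characters of SAMPLE does the pattern match on its own?"""
    return [ch for ch in SAMPLE if rx.fullmatch(ch)]

for name, rx in candidates.items():
    print(name, accepted(rx))

# \W\S only ever matches a pair, e.g. punctuation followed by a letter.
print(bool(candidates[r"\W\S"].fullmatch("*a")))   # True
```

Only [^\w\s] singles out the punctuation character; the two "either" forms accept everything, including the letter and the space.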

Is it me or did I just completely overuse the word 'characters'?


Joined: Apr 2003
Posts: 701
/((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si

The (?:[^\w\s]|_) part means everything but letters, digits, and whitespace.
Here are some other ways to write it:
[^[:alnum:]\s]
[^a-z0-9\s]
Notice I didn't include A-Z, since the i modifier is enabled anyway.
[^[:alpha:]\d\s]
[^[:alpha:][:digit:][:space:]]
Well, [:space:] and \s are not identical, but they're close enough for this :)

I removed the (?:...|_) alternation; character classes are normally faster than | (or).
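For ASCII text, the alternation and the shorter sets should accept exactly the same characters. Python's re has no POSIX [:alnum:] classes, so this sketch only compares the a-z/0-9 and a-z/\d forms against the original, over code points 0 to 127; accepts() is a hypothetical helper:

```python
import re

ORIGINAL = re.compile(r"(?:[^\w\s]|_)", re.IGNORECASE)
AZ_09 = re.compile(r"[^a-z0-9\s]", re.IGNORECASE)    # i flag covers A-Z
AZ_D = re.compile(r"[^a-z\d\s]", re.IGNORECASE)

def accepts(rx, ch: str) -> bool:
    return rx.fullmatch(ch) is not None

# Any ASCII character on which the three patterns disagree.
mismatches = [
    chr(c) for c in range(128)
    if len({accepts(rx, chr(c)) for rx in (ORIGINAL, AZ_09, AZ_D)}) != 1
]
print(mismatches)   # []
```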

For more info about these, search the internet for pcre.txt (the documentation for PCRE, the regex library used in mIRC). It contains a lot of unneeded info, but all the useful stuff is in there too :)
For the [:blah:] stuff, search the file for POSIX CHARACTER CLASSES.


PS: starbucks_mafia, allow me to say yipes@1h15m :O :D

Joined: Jan 2003
Posts: 2,523
Very good points :) Character classes are indeed faster than alternation, as long as the two patterns are comparable in length (remember that mIRC has to parse the string before passing it to PCRE). I guess the fastest alternative would be ([^a-z\d\s]*).


Joined: Dec 2002
Posts: 2,962
A little late with my reply here, but I thought I might as well say this.

The reason I used \w instead of a-z0-9 is that I'm more used to writing regexes in environments where the character set in use can vary. Typically \w adapts to the character set and matches whichever characters it defines as alphanumeric, whereas a-z0-9 is always just those 36 characters. Of course this doesn't make much difference in mIRC at this point; it was just force of habit. However, I guess it doesn't hurt to have some future-proofing for when mIRC supports Unicode.
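Python 3's re module (not PCRE, and not mIRC) illustrates the same trade-off: \w follows Unicode by default, an explicit a-z0-9 set stays fixed even with case-insensitivity, and the re.ASCII flag narrows \w back to ASCII:

```python
import re

word = "\u00e9"   # "é", a letter outside a-z

print(bool(re.fullmatch(r"\w", word)))                # True: Unicode-aware \w
print(bool(re.fullmatch(r"[a-z0-9]", word, re.I)))    # False: the explicit set is fixed
print(bool(re.fullmatch(r"\w", word, re.ASCII)))      # False: \w limited to ASCII
```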


Joined: Jun 2003
Posts: 19
Hi
Thank you all very much for your help, especially starbucks_mafia for spending so much time explaining $regex. I used it, but it doesn't work for my purposes.
Your code:

if ($regex($1-, /((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si)) { profanity10 }

will catch even someone who misspells SPANISH as SPENISH, which is not acceptable. I came up with a solution using my limited knowledge of mIRC; maybe it will help someone else like me.

var %ln = $1-
if ($istok($strip($remove(%ln,*,.,%,/,\,+,-,_,@,!,$,^,~,`,)),penis,32)) { profanity10 }

The above line works fine for my purposes. I have no idea why $remove won't take $1- directly, so I had to assign it to %ln. I had quite a few bad words to manage, and typing all of them was really hard and time-consuming, so I created a small .EXE to generate the code. If someone needs the .exe for code generation, please let me know.
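For comparison, here is a rough Python sketch of the same idea: remove the listed punctuation, split on spaces, and compare whole words case-insensitively, as $istok does. The function name has_bad_token is hypothetical; the character list mirrors the one passed to $remove above:

```python
# Characters the post passes to $remove (the same list, in the same order).
REMOVE_CHARS = "*.%/\\+-_@!$^~`"

def has_bad_token(line: str, word: str = "penis") -> bool:
    """Hypothetical helper: strip the listed punctuation, then look for
    `word` as a whole space-separated token, ignoring case like $istok."""
    cleaned = line.translate(str.maketrans("", "", REMOVE_CHARS))
    return word.lower() in (tok.lower() for tok in cleaned.split(" "))

print(has_bad_token("you p*e^n$i@s"))      # True
print(has_bad_token("I speak SPENISH"))    # False: not a whole token
print(has_bad_token("p(e)nis"))            # False: ( and ) are not in the list
```

Matching whole tokens is what keeps SPENISH from triggering; the trade-off is that any separator missing from the list slips through.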

sahmed01

Joined: Dec 2002
Posts: 2,962
Ahh yes, good point. This should work correctly now:
Code:
if ($regex($1-, /(?:^|\s)((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)(?:\s|$)/Si)) { profanity10 }

The (?:^|\s) and (?:\s|$) parts have been added. They simply check that there is a whitespace character or the beginning of the string before the word, and a whitespace character or the end of the string after it.
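A Python sketch of the anchored version, with the (?1) calls expanded by hand since Python's re lacks subroutine calls. It checks the SPENISH case alongside a few others:

```python
import re

SEP = r"(?:[^\w\s]|_)*"   # group 1's body; each (?1) is expanded to this
PATTERN = re.compile(
    r"(?:^|\s)"                         # start of line or whitespace before the word
    + SEP + SEP.join("penis") + SEP
    + r"(?:\s|$)",                      # whitespace or end of line after it
    re.IGNORECASE,                      # the i modifier
)

for line in ["I speak SPENISH", "what a p*e^n$i@s", "p.e.n.i.s!", "penises"]:
    print(repr(line), bool(PATTERN.search(line)))
```

SPENISH and "penises" are no longer caught, while the obfuscated forms still are, including trailing punctuation such as "!".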


Joined: Jun 2003
Posts: 19
Hi

Thank you, starbucks_mafia. I will test the code and report back soon.

sahmed01
