OK here's the updated version (using qwerty's adaptation plus another change so it matches properly):
if ($regex($1-, /((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si)) { profanity10 }
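If you want to experiment with the pattern outside mIRC, here's a rough Python sketch of the same check. Python's built-in re module doesn't support (?1) subroutine calls, so the sketch writes the separator group out in full around each letter. re.IGNORECASE stands in for the i modifier. There's no equivalent of the S modifier, so strip_control_codes() is a hypothetical helper. The set of control codes it removes (\x02 bold, \x03 colour with optional numbers, \x0f reset, \x16 reverse, \x1d italic, \x1f underline) is my assumption, not mIRC's own implementation.

```python
import re

# Separator subpattern from the mIRC expression: a run of characters that are
# not letters, digits or whitespace (the underscore is added back by "|_").
SEP = r"(?:[^\w\s]|_)*"

# Python's re has no (?1) subroutine calls, so the separator is written out
# before, between and after the letters instead. re.IGNORECASE plays the
# role of the i modifier.
PATTERN = re.compile(SEP + SEP.join("penis") + SEP, re.IGNORECASE)

# Hypothetical stand-in for the S modifier (an assumption, not mIRC's code):
# drop \x03 colour codes with optional "fg,bg" numbers, plus bold (\x02),
# reset (\x0f), reverse (\x16), italic (\x1d) and underline (\x1f).
CONTROL_CODES = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_control_codes(text: str) -> str:
    return CONTROL_CODES.sub("", text)


def is_profane(text: str) -> bool:
    # Rough equivalent of: if ($regex($1-, /.../Si)) { profanity10 }
    return PATTERN.search(strip_control_codes(text)) is not None


for line in ["p*&En>i*S", "p_e_n_i_s", "\x1fpe\x1fnis", "p e n i s", "happiness"]:
    print(repr(line), is_profane(line))
```

The first three samples print True and the last two print False. Whitespace between the letters breaks the match, and "happiness" never contains the letters in order.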
Now to try and explain the regular expression
/((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si...
First off, I'll point out that my 'explanation' almost certainly won't explain anything unless you already know regular expressions at least a little bit, and even if you do know regular expressions, it probably still won't explain much. Regular expressions are a very powerful tool; unfortunately, they're also incredibly hard for people to understand, and trying to explain a full-blown expression to someone who doesn't already know how regular expressions work is not a good idea. You're better off using Google to find a regular expression tutorial and then coming back to read my explanation when you're comfortable with them. Anyway, here it is:
- [color:blue]/[/color]((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)[color:blue]/[/color]Si
The /'s there mark the beginning and end of the actual expression.
- /((?:[^\w\s]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/[color:blue]Si[/color]
Since these are outside of the /'s, they are treated as modifiers. Modifiers are like switches: they change how the entire expression behaves. The S modifier means that control codes (bold, colour, underline and so on) are stripped from the text before it's compared, which is why we don't need to use $strip(). The i modifier means that the match is case-insensitive, so it will match 'penis', 'pENIS', 'p*&En>i*S' or any other variation on letter case.
- /[color:blue]([/color](?:[^\w\s]|_)*[color:blue])[/color]p(?1)e(?1)n(?1)i(?1)s(?1)/Si
The outer parentheses (( )) create a subpattern, which groups the expression within them and also means that whatever that expression matches is captured and can be retrieved and used later. Since it's the first capturing subpattern in the expression, it's subpattern number 1.
- /([color:blue](?:[/color][^\w\s]|_[color:blue])[/color]*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si
These inner parentheses again create a subpattern to group the expression within them. However, the ?: after the opening parenthesis means that what's inside is not captured (and the group doesn't get a number), which makes the expression slightly more efficient since we never need to retrieve what that part matches.
- /((?:[color:blue][[color:red]^\w\s][/color][/color]|_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si
The brackets ([ ]) create a character set, which matches if any one of the characters it contains appears at that position in the text. However, the ^ right after the opening bracket means that the set is negated, so it matches any single character that it doesn't contain. \w and \s are metacharacters; they each represent a group of characters (kind of like special built-in character sets). \w represents the characters a to z, A to Z, 0 to 9, and underscore (_). \s represents whitespace characters such as a regular space, tab, and so on. So in total [[color:red]^\w\s][/color] means 'match a single character which is not alphanumeric, an underscore, or whitespace'.
- /((?:[^\w\s][color:blue]|[/color]_)*)p(?1)e(?1)n(?1)i(?1)s(?1)/Si
The | basically means 'or': match either the expression to its left or the expression to its right. So in the wider context of this particular expression it means 'match [color:green][^\w\s] (which in turn means 'a character which is not alphanumeric, an underscore, or whitespace') or match _ (which is a literal underscore)[/color]'. The _ alternative is needed because \w includes the underscore, so [^\w\s] on its own would never match one. To put that into a single sentence, it means 'match a character which is not alphanumeric or whitespace'.
- /((?:[^\w\s]|_)[color:blue]*[/color])p(?1)e(?1)n(?1)i(?1)s(?1)/Si
The * is a repetition quantifier. It means 'match the expression preceding it zero or more times' (any number of times). The expression directly preceding it is (?:[^\w\s]|_) (this is why the parentheses were used to group that expression), so to combine these two meanings we get 'match any number of characters which are not alphanumeric or whitespace'.
- /((?:[^\w\s]|_)*)[color:blue]p[/color](?1)[color:blue]e[/color](?1)[color:blue]n[/color](?1)[color:blue]i[/color](?1)[color:blue]s[/color](?1)/Si
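Each of those characters is taken literally, and you can replace them with any alphanumeric characters to match any word you choose, as long as you keep a (?1) after each one.
- /((?:[^\w\s]|_)*)p[color:blue](?1)[/color]e[color:blue](?1)[/color]n[color:blue](?1)[/color]i[color:blue](?1)[/color]s[color:blue](?1)[/color]/Si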
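Each of those (?1)'s simply means 'apply subpattern number 1 (the first subpattern defined) here', the first subpattern being ((?:[^\w\s]|_)*). Basically, this means that the expression behaves as if ((?:[^\w\s]|_)*) had been written out in each of those places. Note that (?1) reuses the pattern, not the text it matched; a backreference such as \1 would instead force every gap to contain exactly the same characters as the first one.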
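The separator pieces from the walkthrough can be tried on their own in Python, whose re module treats [^\w\s], | and * the same way for plain ASCII text. This sketch checks which single characters each version of the separator accepts, shows why the _ alternative is needed, and shows that only the outer parentheses create a numbered group. The candidate characters are just examples I picked.

```python
import re

# [^\w\s]: one character that is not a letter, digit, underscore or whitespace.
char_set = re.compile(r"[^\w\s]")
# (?:[^\w\s]|_): the same set with the underscore added back by the "|_" branch.
separator = re.compile(r"(?:[^\w\s]|_)")
# ((?:[^\w\s]|_)*): zero or more separators, captured by the outer group.
run = re.compile(r"((?:[^\w\s]|_)*)")

candidates = ["a", "Z", "9", "_", "*", "&", ">", " ", "\t"]

print([c for c in candidates if char_set.fullmatch(c)])   # ['*', '&', '>']
print([c for c in candidates if separator.fullmatch(c)])  # ['_', '*', '&', '>']

# Only the outer parentheses are numbered; the (?:...) group is not.
print(run.groups)                                  # 1
print(run.match("*&_>rest").group(1))              # *&_>
# * also accepts zero repetitions, so group 1 can be empty.
print(repr(run.match("rest").group(1)))            # ''
```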
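Swapping in a different word, and the difference between (?1) and a backreference, can also be sketched in Python. word_pattern() below is a hypothetical helper (not part of mIRC) that writes the separator group out around each letter, which is the typing the (?1) calls save you. re.escape is only there in case a word contains symbols. For contrast, backref uses \1, which Python does support: it repeats the exact text group 1 captured, and since group 1 is the run before the p, every gap has to copy that leading run.

```python
import re

SEP = r"(?:[^\w\s]|_)*"


def word_pattern(word: str) -> re.Pattern:
    """Hypothetical helper: case-insensitive pattern for `word` that allows
    separator runs before, between and after its letters (the job (?1)
    does in the mIRC expression)."""
    body = SEP.join(re.escape(ch) for ch in word)
    return re.compile(SEP + body + SEP, re.IGNORECASE)


# Backreference version for comparison: \1 must repeat the exact text that
# group 1 (the run before the p) captured, rather than reusing its pattern.
backref = re.compile(r"((?:[^\w\s]|_)*)p\1e\1n\1i\1s\1", re.IGNORECASE)

expanded = word_pattern("penis")
print(bool(expanded.search("p.e-n_i*s")))               # different gaps: True
print(bool(backref.search("p.e-n_i*s")))                # different gaps: False
print(bool(backref.search("p.e.n.i.s")))                # group 1 is empty: False
print(bool(backref.search(".p.e.n.i.s.")))              # every gap is ".": True
print(bool(word_pattern("hello").search("H-e_L.l*o")))  # any word works: True
```

This is why the original expression uses (?1) rather than \1: each gap gets to contain its own mix of symbols.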
You probably didn't learn anything from that, but it took me an age to write out, so just look at the pretty colours anyway.