Simo (OP)
Ameglian cow
Joined: Nov 2021
Posts: 48
Hello gents,

I was wondering whether there is a way to collect all of the bad channels that were actually detected and use them in the kick message, since $regml(1) only returns the part that matched the pattern.

Code

Raw 319:*: {
  if ($regex( $+ $regsubex($remove($3-,~,&,@,%,+,$chr(35)),/(.)\1+/g,\1) $+ ,/([fs]uck|s[3e]x|h.*rd.*c.*ck|l[0o]v[3e]r|Pr[3e]gn.*t|b[l1i\x7C]g.*m[3e][l1i\x7C][l1i\x7C].*ns|ba[l1i\x7C].*s.*p|k.*nky.*d.*sir.*|m[il1][il1]ky.*b[0o][0o]b|Pant[il1]|R[0o]l[3e]P[l1i].*Y|Er[o0]t.*c|[3e]r[3e]ct[l1i\x7C][0o]n|xxx|d.*min.*nt|[csf]uck|s[3e][k]s|$&
    $+ [lI\x7C][ae3][ae3]th[3e]r|[l1i\x7C]nj[3e]ct[l1i\x7C][0o]n|m[3ea]r[1il\x7C][3e]d|masag[3e]|s[3e]x|m[a3e]s[s][a3e]g[3e]|p[0o]rn|m.*shr[0o][0o]m|w[[1li\x7c]f[3e]|drunk|b[0o]dy.*m.*ss.*g[3e]|fr.*m.*(b.*h.*nd)|t.*n.*ght.*(w.*f.*|g[il\x7c]r.*|w.*m.*n)|d[il\x7c]ck|cucumb[3e]r|$&
    $+  my.*(h[0o]t|s.*x|pr[3e]gnt|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)|(h[0o]t|pr[3e]gnt|badw[0o]rd|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m).*f[ae3]nt[ae3]sy)/i)) {  
    VAR %bchans = $regml(1)
    VAR %t = $comchan($2,0)
    WHILE (%t) {
      var %chanz1 = $comchan($2,%t) 
      IF ($nick(%chanz1,$me,@&~%)) {
        mode %chanz1 +b $address($2,4)  
        kick %chanz1 $2 your on a banned channel ( $+  $+($chr(35),*,%bchans,*) $+ ) leave it and rejoin.
      }
      DEC %t
    }
  }
}





Last edited by Simo; 14/08/22 02:15 AM.

Wims
Hoopy frood
Joined: Jul 2006
Posts: 3,944
Hey,

Your main regex looks for various sub-expressions that are not anchored to any position or character. For example, [fs]uck matches only 'suck' in #popsicles_sucking, not the whole channel name. Some of the sub-expressions also use quantifiers such as .*, which is slow and, in this case, incorrect.

Quantifiers are greedy by default in regex: the expression "a.*b" on the string "a01b23b" matches "a01b23b", not "a01b". This is very important. You can make a quantifier non-greedy by adding a ? after it: "a.*?b" matches only "a01b".
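
A quick way to check this from the mIRC editbox is sketched below. The string and the two patterns are the ones from the paragraph above; the %s variable and the regex names "greedy" and "lazy" are only illustrative.

```mirc
//var %s = a01b23b | noop $regex(greedy,%s,/(a.*b)/) $regex(lazy,%s,/(a.*?b)/) | echo -a greedy: $regml(greedy,1) lazy: $regml(lazy,1)
```

This should echo "greedy: a01b23b lazy: a01b".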

Using * on the dot '.' is specifically bad here. Suppose you want to ban "badXXchannel", where XX stands for any string, and you write "bad.*channel": it would also match, for example, "#thisisabad #chan1 #more #whateverchannel", for the same reason as above, whereas you want it to look inside a single channel name, not across the whole list.

To limit each alternative to a single channel name, replace every .* with [^ ]*, which means any character except a space; that way the match stops where the next channel name starts.
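
As a small sketch of that difference, the line below runs both forms against the example list (with the # already removed, as the script does with $remove). The %chans variable name is just a placeholder.

```mirc
//var %chans = thisisabad chan1 more whateverchannel | echo -a dot-star: $regex(%chans,/bad.*channel/) vs not-space: $regex(%chans,/bad[^ ]*channel/)
```

This should echo "dot-star: 1 vs not-space: 0": only the .* form matches across several channel names.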

To avoid matching in the middle of a channel name, and because you're looking for the exact channel name, you also need to anchor your expression between the start of the string or a space and the end of the string or a space.
Also, it looks like you're using () not to capture but to group, in which case (?:) should be used instead.
Finally, /g must be used to match all of the bad channels, and $regml() then returns all of them: we use one pair of () to capture what we want, and (?:) wherever we need to group without capturing.
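
Here is a minimal sketch of the /g plus $regml() idea on its own, separate from the full script. The alias name badchans_demo, the regex name "demo" and the words "bad" and "evil" are made up for illustration. It is also simplified: instead of lookarounds it relies on the greedy [^ ]* expanding to the edges of each name.

```mirc
; Hypothetical demo alias: returns every space-separated name in $1- that contains bad or evil
alias badchans_demo {
  ; g = keep searching after the first hit, i = ignore case
  ; the outer () captures the whole name, (?:) only groups the alternatives
  noop $regex(demo,$1-,/([^ ]*(?:bad|evil)[^ ]*)/gi)
  var %i = 1, %out
  while (%i <= $regml(demo,0)) {
    %out = %out $regml(demo,%i)
    inc %i
  }
  return %out
}
```

With that loaded, //echo -a $badchans_demo(test mybadchan evilroom ok) should echo "mybadchan evilroom".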



Code
Raw 319:*: {
  if ($regex(save, $+ $regsubex($remove($3-,~,&,@,%,+,$chr(35)),/(.)\1+/g,\1) $+ ,/(?<=^| )([fs]uck|s[3e]x|h[^ ]*rd[^ ]*c[^ ]*ck|l[0o]v[3e]r|Pr[3e]gn[^ ]*t|b[l1i\x7C]g[^ ]*m[3e][l1i\x7C][l1i\x7C][^ ]*ns|ba[l1i\x7C][^ ]*s[^ ]p|k[^ ]*nky[^ ]*d[^ ]*sir[^ ]*|m[il1][il1]ky[^ ]*b[0o][0o]b|Pant[il1]|R[0o]l[3e]P[l1i][^ ]*Y|Er[o0]t[^ ]*c|[3e]r[3e]ct[l1i\x7C][0o]n|xxx|d[^ ]*min[^ ]*nt|[csf]uck|s[3e][k]s|$&
 $+ [lI\x7C][ae3][ae3]th[3e]r|[l1i\x7C]nj[3e]ct[l1i\x7C][0o]n|m[3ea]r[1il\x7C][3e]d|masag[3e]|s[3e]x|m[a3e]s[s][a3e]g[3e]|p[0o]rn|m[^ ]*shr[0o][0o]m|w[[1li\x7c]f[3e]|drunk|b[0o]dy[^ ]*m[^ ]*ss[^ ]*g[3e]|fr[^ ]*m[^ ]*(?:b[^ ]*h[^ ]*nd)|t[^ ]*n[^ ]*ght[^ ]*(?:w[^ ]*f[^ ]*|g[il\x7c]r[^ ]*|w[^ ]*m[^ ]*n)|d[il\x7c]ck|cucumb[3e]r|$&
 $+ my[^ ]*(?:h[0o]t|s[^ ]*x|pr[3e]gnt|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)|(h[0o]t|pr[3e]gnt|badw[0o]rd|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)[^ ]*f[ae3]nt[ae3]sy)(?= |$)/ig)) {
    VAR %bchans = $regsubex($str(#a $+ $chr(32),$regml(save,0)),/a/g,$regml(save,\n))
    VAR %t = $comchan($2,0)
    WHILE (%t) {
      var %chanz1 = $comchan($2,%t) 
      IF ($nick(%chanz1,$me,@&~%)) {
        mode %chanz1 +b $address($2,4)  
        kick %chanz1 $2 your on a banned channel ( $+ %bchans $+ ) leave it and rejoin.
      }
      DEC %t
    }
  }
}
Untested.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel

Simo (OP)
Thanks for the reply, Wims, much appreciated. I tried the code you posted, but it didn't seem to trigger, and there was no error either. Perhaps I overcomplicated it a bit.

Let's say we keep it simple like this before we add more words to look for:

Code

Raw 319:*: {
  if ($regex($3-,/([fs]uck|s[3e]x)/i)) {  
    VAR %bchans = $regml(1)
    VAR %t = $comchan($2,0)
    WHILE (%t) {
      var %chanz1 = $comchan($2,%t) 
      IF ($nick(%chanz1,$me,@&~%)) {
        mode %chanz1 +b $address($2,4)  
        kick %chanz1 $2 your on a banned channel ( $+  $+($chr(35),*,%bchans,*) $+ ) leave it and rejoin.
      }
      DEC %t
    }
  }
}




Last edited by Simo; 14/08/22 08:26 AM.

maroon
Hoopy frood
Joined: Jan 2004
Posts: 1,835
Your last post makes it sound like you're willing to start from the ground floor.

Well, for one thing, you should confirm what your goals include. If this is to prevent people from joining your channel if they're also in #badword, they can just make that channel +s and keep joining it, unless this code is being run by an omniscient oper.

I don't see a problem with only returning the 1st match. When it comes to ban filters, you're better off giving less info to bad guys. If they get kicked from your channel for being in a bad channel, unless you made a bad match against #starship_essex, they know which channel(s) got them kicked. Giving more info to the spammers just makes it easier for them to evade bans. Lotsa luck asking libera for the content of their badword filter.

Though, as something that's echoed to your own window, showing the $regml(1) match is perfectly fine, since it helps you debug your code if you see you're banning people for the wrong things.

https://forums.mirc.com/ubbthreads.php/topics/264489/bad-word-in-txt-file

This is another design that is a little easier to keep up with than joining everything together in a single list. When using $hfind, you can have a table where each pattern is kept separate, and then you can have each pattern matched against $3- individually, without looping through each pattern separately and without stacking them all into one gigantic list that has the potential of getting even longer than what you have now, and is probably already getting hard to maintain.

var %match_against_item_name = $hfind(tablename,$remove($3-,~,&,@,%,+,$chr(35)),1,R)
var %match_against_item_value = $hfind(tablename,$remove($3-,~,&,@,%,+,$chr(35)),1,R).data

$hfind always returns the item name of the match (the .data property only makes it search the data values instead of the item names), but you can pass that name to $hget(tablename,%match_against_item_name) to get the item's value.
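
As a rough sketch of that lookup, the line below uses a throwaway table called demo with one pattern stored as the item name. The table name, the %hit variable and the label_sex value are placeholders, not part of anyone's script.

```mirc
//hfree -w demo | hadd -m demo s[3e]x label_sex | var %hit = $hfind(demo,starship_essex,1,R) | echo -a pattern: %hit value: $hget(demo,%hit)
```

This should echo "pattern: s[3e]x value: label_sex", which also shows the #starship_essex false positive mentioned earlier.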

Using hashtables as your wordlist also has the advantage that it's easier for you to show a debug message to yourself which identifies which was the pattern that found a match against their channels-list, which can be a great help when your ban list grows enormously or you have several patterns very similar to each other. It's easy to have a rogue pattern accidentally banning people for being in a channel where the last letter is 'x'.

/hadd tablename regex_pattern anything_not_blank_or_zero
/hadd tablename anything_unique regex_pattern
/hadd tablename regex_pattern wildcard_to_be_exempted_from_match

... which determines whether you want to have $3- matched against the itemname or the datavalue.

If you're going to have a regex pattern containing a space, then you obviously can't have the pattern be the item-name:

fail:
/hadd tablename h[^ ]*rd[^ ]*c[^ ]*ck label

And to those who were giving me a hard time for using \x20 instead of a literal space...

ok:
/hadd tablename unique.item h[^ ]*rd[^ ]*c[^ ]*ck
/hadd tablename h\S*rd\S*c\S*ck label
/hadd tablename h[^\x20]*rd[^\x20]*c[^\x20]*ck label

This hashtable of patterns could be saved to disk and reloaded with /hsave and /hload. The default method of writing hashtables to disk is to put everything on its own line, with itemnames on the odd lines and their value as the even line following them. It's easier to read if you store in the ini item=data format, but you can't do that if your itemname (regex pattern) will contain the '=' character.

At first you may not think you're having a problem with an itemname containing a space, but that's only because mIRC is cheating. If you /hsave -i to disk from table1 then /hload -i from that same disk file into table2, mIRC doesn't actually load it from disk, at least for small files. Instead, it sees that it just hsave'ed table1 to that disk file, and it instead just clones from memory instead of actually parsing what's actually on disk. That's why table2 and table3 are both /hload'ing from 2 different files having identical content but end up with different results. And, even if it looks today like /hload -i is able to load itemnames from disk containing '=', it won't happen after restarting mIRC.

//hfree -w table* | .remove test.dat | hadd -m table1 it=em da=ta | hsave -i table1 test.dat | hload -im table2 test.dat | echo -a read: $read(test.dat,nt,2) vs readini.item: $ini(test.dat,hashtable,1) readini.data: $readini(test.dat,hashtable,$ini(test.dat,hashtable,1)) vs table2item: $hget(table2,1).item table2data: $hget(table2,1).data | write -c test3.dat $+([hashtable],$crlf,it=em=da=ta) | hload -im table3 test3.dat | echo -a table3item: $hget(table3,1).item table3data: $hget(table3,1).data md5: $md5(test.dat,2) same as $md5(test3.dat,2)

If you save/load in -i ini format, the itemnames will get sorted seemingly randomly, unless you /hload -im1 to have 1 bucket, but that only has limited help, because the order of existing items gets flipped each time you load-then-save, and then newly added items get inserted at the top in reverse order of how you added them:

//hfree -w table | .remove test.dat | hadd -m1 table item1 data1 | hadd table item2 data2 | hadd table item3 data3 | hsave -i table test.dat | echo -s ===1 | filter -fs test.dat * | hfree table | hload -im1 table test.dat | hadd table newitem1 newdata1 | hadd table newitem2 newdata2 | hsave -i table test.dat | echo -s ===2 | filter -fs test.dat *

So now let's create your hashtable with a couple of bad words, and you can test them against a sample string. You need to test against both false positives and false negatives.

//tokenize 32 a b #channel1 #starship_essex | hfree -w badchans | hadd -m1 badchans h\S*rd\S*c\S*ck label | hadd badchans [fs]u(ck|k|q) label | hadd badchans s[3e]x label | echo -a match: $hfind(badchans,$3-,1,R) vs $regex($3-,$chr(40) $+ $hfind(badchans,$3-,1,R) $+ $chr(41)) $regml(1)

Note that the pattern-match is what you don't want to give them, because I can assure you there are many ways someone can further obfuscate their bad words once they know the pattern. While you can further accelerate the arms race with them, you start risking false positives against #road_to_success etc. They don't need to know what's been matched, that's for you to know and them to find out. And if you echo the match pattern to yourself along with the string that it matched, you can see if you're matching things you shouldn't.

Wims
Yeah after posting I realized I forgot to include something.

Code
Raw 319:*: {
  var %pattern (?:[fs]uck|s[3e]x|h[^ ]*rd[^ ]*c[^ ]*ck|l[0o]v[3e]r|Pr[3e]gn[^ ]*t|b[l1i\x7C]g[^ ]*m[3e][l1i\x7C][l1i\x7C][^ ]*ns|ba[l1i\x7C][^ ]*s[^ ]p|k[^ ]*nky[^ ]*d[^ ]*sir[^ ]*|m[il1][il1]ky[^ ]*b[0o][0o]b|Pant[il1]|R[0o]l[3e]P[l1i][^ ]*Y|Er[o0]t[^ ]*c|[3e]r[3e]ct[l1i\x7C][0o]n|xxx|d[^ ]*min[^ ]*nt|[csf]uck|s[3e][k]s|$&
    $+ [lI\x7C][ae3][ae3]th[3e]r|[l1i\x7C]nj[3e]ct[l1i\x7C][0o]n|m[3ea]r[1il\x7C][3e]d|masag[3e]|s[3e]x|m[a3e]s[s][a3e]g[3e]|p[0o]rn|m[^ ]*shr[0o][0o]m|w[[1li\x7c]f[3e]|drunk|b[0o]dy[^ ]*m[^ ]*ss[^ ]*g[3e]|fr[^ ]*m[^ ]*(?:b[^ ]*h[^ ]*nd)|t[^ ]*n[^ ]*ght[^ ]*(?:w[^ ]*f[^ ]*|g[il\x7c]r[^ ]*|w[^ ]*m[^ ]*n)|d[il\x7c]ck|cucumb[3e]r|$&
    $+ my[^ ]*(?:h[0o]t|s[^ ]*x|pr[3e]gnt|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)|(h[0o]t|pr[3e]gnt|badw[0o]rd|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)[^ ]*f[ae3]nt[ae3]sy)


  if ($regex(save,$regsubex($remove($3-,~,&,@,%,+,$chr(35)),/(.)\1+/g,\1), /(?<=^| )(?|([^ ]*? $+ %pattern $+ [^ ]*)|()[^ ]+)(?=()(?: |$))/Fig)) {
    VAR %3- $3-,%bchans = $remove($regsubex($str(a $+ $chr(32),$regml(save,0)),/a/g,$iif(\n !& 1 && $regml(save,$calc(\n -1)) != $null && $gettok(%3-,$calc(\n / 2),32) != $null,$v1)),~,&,@,%,+)
    VAR %t = $comchan($2,0)
    WHILE (%t) {
      var %chanz1 = $comchan($2,%t) 
      IF ($nick(%chanz1,$me,@&~%)) {
        mode %chanz1 +b $address($2,4)  
        kick %chanz1 $2 your on a banned channel ( $+ %bchans $+ ) leave it and rejoin.
      }
      DEC %t
    }
  }
}
I tested this code quickly; it should work. maroon's comment is worth looking at if you want a clean script and not a mess.


Simo (OP)
Thanks, maroon.

Wims, I tried your last posted code. It detects only one bad channel and doesn't collect all of the bad channels the user is in, so I can't tell them to part all of those channels.

Last edited by Simo; 14/08/22 11:35 PM.

Wims
Can you give examples of channel names it didn't work for?


maroon
Please clarify "detects only 1 bad channel". I can't tell from your description what the problem is, since you didn't give examples. Does that mean there are some bad channels it won't match if they're the only bad channel in the string? Or just that it's only showing the 1st match? Or that it's matching some good channels it shouldn't? You don't need to run this inside the whois reply to test your regex. If the regex is using $3-, then make a dummy string to test it:

//tokenize 32 parm1 parm2 #list #of #channels | echo -a $regex($3-,pattern)

I still don't think you need to bend over backwards to give extra info to spammers. Most people won't be in a bad channel, and if someone is, the vast majority of those would be in just 1 bad channel. I see no reason to add complexity to the script in order to make things easier for the worst-of-the-worst being in multiple bad channels. If someone is in channels they shouldn't be in, they know which ones they are without your message telling them.

Simo (OP)
In this case we want to inform our users, maroon, rather than have them guess which channel to part.

Wims, for example: if a nick is on two bad channels, #sdsfucksd and #sfkjssdsexssf, and one normal channel, #test, then when I do a whois on the nick I get:

Nick your on a banned channel ( #sfkjssdsexssf) leave it and rejoin.


rather than:


Nick your on a banned channel(s) ( #sdsfucksd #sfkjssdsexssf ) part that/those channel(s) and rejoin here on #test

Last edited by Simo; 15/08/22 09:02 AM.

maroon
This didn't answer my questions, nor did it do any of the simple testing I suggested. Is it displaying only the last match? Or is it completely not seeing the 1st match if it's the only one there? If you swap the order of the channels, does the other one appear instead?

Wims
OK, there were a couple of issues; this should work:

Code
Raw 319:*: {
  ; All inner groups are non-capturing so that $regml only receives whole channel names
  var %pattern (?:[fs]uck|s[3e]x|h[^ ]*rd[^ ]*c[^ ]*ck|l[0o]v[3e]r|Pr[3e]gn[^ ]*t|b[l1i\x7C]g[^ ]*m[3e][l1i\x7C][l1i\x7C][^ ]*ns|ba[l1i\x7C][^ ]*s[^ ]*p|k[^ ]*nky[^ ]*d[^ ]*sir[^ ]*|m[il1][il1]ky[^ ]*b[0o][0o]b|Pant[il1]|R[0o]l[3e]P[l1i][^ ]*Y|Er[o0]t[^ ]*c|[3e]r[3e]ct[l1i\x7C][0o]n|xxx|d[^ ]*min[^ ]*nt|[csf]uck|s[3e][k]s|$&
    $+ [lI\x7C][ae3][ae3]th[3e]r|[l1i\x7C]nj[3e]ct[l1i\x7C][0o]n|m[3ea]r[1il\x7C][3e]d|masag[3e]|s[3e]x|m[a3e]s[s][a3e]g[3e]|p[0o]rn|m[^ ]*shr[0o][0o]m|w[1li\x7c]f[3e]|drunk|b[0o]dy[^ ]*m[^ ]*ss[^ ]*g[3e]|fr[^ ]*m[^ ]*(?:b[^ ]*h[^ ]*nd)|t[^ ]*n[^ ]*ght[^ ]*(?:w[^ ]*f[^ ]*|g[il\x7c]r[^ ]*|w[^ ]*m[^ ]*n)|d[il\x7c]ck|cucumb[3e]r|$&
    $+ my[^ ]*(?:h[0o]t|s[^ ]*x|pr[3e]gnt|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)|(?:h[0o]t|pr[3e]gnt|badw[0o]rd|s[1li\x7c]s|w[1li\x7c]f[3e]|m[0ou]m)[^ ]*f[ae3]nt[ae3]sy)


  noop $regex(save,$regsubex($remove($3-,~,&,@,%,+,$chr(35)),/(.)\1+/g,\1), /(?<=^| )(?|([^ ]*? $+ %pattern $+ [^ ]*)|()[^ ]+)(?= |$)/Fig)
  VAR %3- $3-,%bchans = $remove($regsubex($str(a $+ $chr(32),$regml(save,0)),/a/g,$iif($regml(save,\n) != $null && $gettok(%3-,\n,32) != $null,$v1)),~,&,@,%,+)

  if ($numtok(%bchans,32)) {
    VAR %t = $comchan($2,0)
    WHILE (%t) {
      var %chanz1 = $comchan($2,%t)
      IF ($nick(%chanz1,$me,@&~%)) {
        mode %chanz1 +b $address($2,4)  
        kick %chanz1 $2 your on a banned channel ( $+ %bchans $+ ) leave it and rejoin.
      }
      DEC %t
    }
  }
}


Simo (OP)
Thank you for your reply, Wims. I tried your last posted code, but I still have the same issue.

Wims
I tested the code with the two channels you gave, trying various orders in $3- from the raw event (two bad channels and three good ones), and it always worked perfectly for me on 7.69.

If you want more help, you'll have to come to IRC.


Simo (OP)
Thanks, Wims, much appreciated. I'll test it some more to see if I can get it to work.

Last edited by Simo; 16/08/22 09:42 PM.
