Re: Detect exact bad channels raw 319 Simo #270642 14/08/22 02:29 PM
Joined: Jan 2004 Posts: 2,081 M maroon Hoopy frood
Your last post makes it sound like you're willing to start from the ground floor. Well, for one thing, you should confirm what your goals include. If this is to prevent people from joining your channel if they're also in #badword, they can just make that channel +s and keep joining it, unless this code is being run by an omniscient oper.

I don't see a problem with only returning the 1st match. When it comes to ban filters, you're better off giving less info to bad guys. If they get kicked from your channel for being in a bad channel, then unless you made a bad match against #starship_essex, they know which channel(s) got them kicked. Giving more info to the spammers just makes it easier for them to evade bans. Lotsa luck asking libera for the content of their badword filter. Though, as something that's echoed to your own window, showing the $regml(1) match is perfectly fine, since it helps you debug your code if you see you're banning people for the wrong things.

https://forums.mirc.com/ubbthreads.php/topics/264489/bad-word-in-txt-file

This is another design that is a little easier to keep up with than joining everything together in a single list. When using $hfind, you can have a table where each pattern is kept separate, and each pattern gets matched against $3- individually, without you looping through the patterns yourself and without stacking them all into one gigantic list that can grow even longer than what you have now, and is probably already getting hard to maintain.

var %match_against_item_name = $hfind(tablename,$remove($3-,~,&,@,%,+,$chr(35)),1,R)
var %match_against_item_value = $hfind(tablename,$remove($3-,~,&,@,%,+,$chr(35)),1,R).data

The R switch treats the table's entries as the regex patterns, and the .data property switches the search from the item names to the data values. Either way, $hfind always returns the item name of the match, but you can use $hget(tablename,%match_against_item_name) to return that item's value.
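To make the table idea concrete, here is a minimal sketch for a remote script file. The alias name hfind_demo, the table name demo_badchans, the w[a4]rez pattern, the labels and the sample channels are all made up for illustration, so treat it as a starting point and not a finished filter. It stores each regex pattern as an item name, asks $hfind for the first pattern that hits the channels-list, then looks up that item's label with $hget.

```mirc
; Hypothetical demo: alias, table name, labels and channels are illustrative only.
alias hfind_demo {
  ; start from an empty table (-w avoids an error if it doesn't exist yet)
  hfree -w demo_badchans
  ; item name = regex pattern, data = a label for your own debugging
  hadd -m demo_badchans s[3e]x label_sex
  hadd demo_badchans w[a4]rez label_warez
  ; a sample channels-list with the status prefixes already removed
  var %chans = #help #free_warez
  ; R = treat each item name as a regex and test it against %chans
  var %pattern = $hfind(demo_badchans,%chans,1,R)
  if (%pattern != $null) echo -a matched pattern: %pattern label: $hget(demo_badchans,%pattern)
  else echo -a no pattern matched %chans
}
```

Run it with /hfind_demo. The echo goes only to your own active window, so the pattern and label stay on your side.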
Using hashtables as your wordlist also has the advantage that it's easier to show yourself a debug message identifying which pattern matched their channels-list, which is a great help when your ban list grows enormous or you have several patterns that are very similar to each other. It's easy to have a rogue pattern accidentally banning people for being in a channel where the last letter is 'x'.

/hadd tablename regex_pattern anything_not_blank_or_zero
/hadd tablename anything_unique regex_pattern
/hadd tablename regex_pattern wildcard_to_be_exempted_from_match

... which of these you use determines whether you want to have $3- matched against the itemname or the datavalue. If you're going to have a regex pattern containing a space, then you obviously can't have the pattern be the item-name:

fail:
/hadd tablename h[^ ]rd[^ ]c[^ ]ck label

And to those who were giving me a hard time for using \x20 instead of a literal space...

ok:
/hadd tablename unique.item h[^ ]rd[^ ]c[^ ]ck
/hadd tablename h\Srd[^\s]c[^\x20]ck label
/hadd tablename h\Srd[^\x20]c[^\x20]ck label

This hashtable of patterns can be saved to disk and reloaded with /hsave and /hload. The default method of writing hashtables to disk puts everything on its own line, with itemnames on the odd lines and their values on the even lines following them. It's easier to read if you store in the ini item=data format, but you can't do that if your itemname (regex pattern) will contain the '=' character.

At first you may not think you're having a problem with an itemname containing '=', but that's only because mIRC is cheating. If you /hsave -i to disk from table1 then /hload -i from that same disk file into table2, mIRC doesn't actually load it from disk, at least for small files. Instead, it sees that it just hsave'ed table1 to that disk file, and it clones from memory rather than parsing what's actually on disk.
That's why table2 and table3 in this test are both /hload'ing from 2 different files having identical content but end up with different results. And even if it looks today like /hload -i is able to load itemnames containing '=' from disk, it won't happen after restarting mIRC.

//hfree -w table* | .remove test.dat | hadd -m table1 it=em da=ta | hsave -i table1 test.dat | hload -im table2 test.dat | echo -a read: $read(test.dat,nt,2) vs readini.item: $ini(test.dat,hashtable,1) readini.data: $readini(test.dat,hashtable,$ini(test.dat,hashtable,1)) vs table2item: $hget(table2,1).item table2data: $hget(table2,1).data | write -c test3.dat $+([hashtable],$crlf,it=em=da=ta) | hload -im table3 test3.dat | echo -a table3item: $hget(table3,1).item table3data: $hget(table3,1).data md5: $md5(test.dat,2) same as $md5(test3.dat,2)

If you save/load in -i ini format, the itemnames get sorted seemingly randomly, unless you /hload -im1 to have 1 bucket. That only helps so much, because the order of existing items gets flipped each time you load-then-save, and newly added items get inserted at the top in reverse order of how you added them:

//hfree -w table | .remove test.dat | hadd -m1 table item1 data1 | hadd table item2 data2 | hadd table item3 data3 | hsave -i table test.dat | echo -s ===1 | filter -fs test.dat * | hfree table | hload -im1 table test.dat | hadd table newitem1 newdata1 | hadd table newitem2 newdata2 | hsave -i table test.dat | echo -s ===2 | filter -fs test.dat *

So now let's create your hashtable with a couple of bad words, and you can test them against a sample string.
You need to test against both false positives and false negatives:

//tokenize 32 a b #channel1 #starship_essex | hfree -w badchans | hadd -m1 badchans h\Srd[^\x20]c[^\x20]ck label | hadd badchans [fs]u(ck|k|q) label | hadd badchans s[3e]x label | echo -a match: $hfind(badchans,$3-,1,R) vs $regex($3-,$chr(40) $+ $hfind(badchans,$3-,1,R) $+ $chr(41)) $regml(1)

Note that the matching pattern is what you don't want to give them, because I can assure you there are many ways someone can further obfuscate their bad words once they know the pattern. While you can further accelerate the arms race with them, you start risking false positives against #road_to_success etc. They don't need to know what's been matched, that's for you to know and them to find out. And if you echo the match pattern to yourself along with the string that it matched, you can see if you're matching things you shouldn't.
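If you'd rather not eyeball one-liners every time you add a pattern, a small self-test along these lines might help. This is only a sketch for a remote script file: the alias names badchans_selftest and badchans_check and the sample channels-lists are hypothetical, and the table holds just two of the patterns from above. Each check says whether a match is expected, so a wrong hit gets reported as a false positive and a miss as a false negative.

```mirc
; Hypothetical self-test: alias names and sample channels are illustrative only.
alias badchans_selftest {
  hfree -w badchans
  hadd -m badchans s[3e]x label
  hadd badchans [fs]u(ck|k|q) label
  ; a match is expected here (a miss would be a false negative)
  badchans_check 1 #channel1 #sex_chat
  ; no match is wanted here (a hit would be a false positive)
  badchans_check 0 #channel1 #starship_essex
  badchans_check 0 #road_to_success
}

; $1 = 1 if a match is expected, 0 if not. $2- = the channels-list to test
alias badchans_check {
  var %pattern = $hfind(badchans,$2-,1,R)
  var %hit = $iif(%pattern != $null,1,0)
  if (%hit == $1) echo -a ok: $2-
  elseif (%hit) echo -a FALSE POSITIVE: $2- was hit by %pattern
  else echo -a FALSE NEGATIVE: $2- was not caught
}
```

Since s[3e]x also matches the "sex" inside "essex", I'd expect the #starship_essex line to be flagged as a false positive, which is exactly the kind of thing you want to find out before the pattern goes live. All of it is echoed to your own window only.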

