
Joined: Nov 2006
Posts: 1,559
H
Hoopy frood
The main idea is that a set of separate expressions is far easier to read and edit than a single, condensed regular expression, in case you want to find/remove/add/... a specific bad-definition.
It would further allow for different switches per expression, or, for example, for scripting some kind of dialog GUI around it (for viewing/editing the current bad-definitions).

For the actual checking routine, you'd need one single "if $hfind" check like in Wims' example, which is on par with the single "if $regex" check in your examples. (The items may be added to the table e.g. on start.)
Although mIRC will of course loop over the hashed items internally, this will process faster than e.g. a $*tok-based check on a token list, which requires a custom loop.

If you got the idea by now, you can easily modify your script for benching purposes... it won't process faster than your last script, but rather add some flexibility to it. And as stated, this is only a suggestion, to be of use if speed isn't everything to you. :)
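
A minimal sketch of the idea above, assuming a hypothetical table named badexp that holds one expression per item. The table name, item names and patterns are placeholders and not taken from anyone's actual script:

```mirc
on *:START: {
  if (!$hget(badexp)) hmake badexp 10
  ; one item per bad-definition, so each can be viewed, changed or removed on its own
  ; note that every expression carries its own switches
  hadd badexp foo /\bfo+\b/iS
  hadd badexp bar /\bb(a|4)r\b/i
  hadd badexp baz /ba+z/iS
}
on @*:TEXT:*:#: {
  ; R: the stored data are regular expressions, matched against the text
  ; N = 1 returns the name of the first item whose expression matches
  if ($hfind(badexp,$1-,1,R).data) {
    ban -ku600 # $nick 2 Ten minute ban for bad language
  }
}
```

With this layout, dropping one definition is a single /hdel badexp bar, and changing one is a single /hadd on the same item name.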

Joined: Nov 2009
Posts: 117
Vogon poet
Yeah I see the main idea but what I'm not seeing
is a functional script yet.. I'm thinking in the
time it takes you to script up the fancy dialog box
/hsave /hload and all the supporting structure we
could just as easily go ahead and add the fifty or
so words we want to check for and be done.. after
you do all that with the dialog and all in the end
after you add all the words it will just be sitting
there hogging up space with nothing to do.. But,
please proceed..

Joined: Jan 2007
Posts: 1,156
D
Hoopy frood
Hey, I think you lost sight of the intention of this post. A person has a script and needs help with it.

Joined: Nov 2009
Posts: 117
Vogon poet
I haven't lost sight.. I solved the problem
seven or eight posts back.. this is all about
some theoretical improvement on some perceived
problem that has yet to make sense to me...

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
gooshie, I think I sort of got what Horstl meant by storing the regex in a hash table and using the R switch of $hfind, after which $regml(1) refers to the match:

Basically add the whole regex string to a hash table:
Code:
/hadd -m pmban pmban /(bitch|bastard|cunt|cock|fuc?k|<censored>|h(0|o)rn(ie|y)|nigger|\btwat\b|whore|penis|shit|\bcibai\b|di ?ck|pussy-?|fak you|fck|slut)/iS
Then use:
Code:
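; save the table to disk when mIRC closes
on *:EXIT: {
  hsave pmban pmban.hsh
}
; recreate the table on startup and reload the saved expression
on *:START: {
  if (!$hget(pmban)) hmake pmban 100
  if ($isfile(pmban.hsh)) hload pmban pmban.hsh
}
on @*:TEXT:*:#: {
  ; R = treat the stored data as regular expressions and match them against the text
  if ($hfind(pmban,$1-,0,R).data) {
    ban -ku600 # $nick 2 Ten minute ban for foul language: $regml(1)
  }
}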
A hash table should be able to store an unlimited number of regex patterns in its data, so you needn't worry about the word list growing too long, as you would with a line in a remote script.

Joined: Jan 2007
Posts: 1,156
D
Hoopy frood
In my experience, variables are actually faster than a hash table when storing a small value such as an integer.

However when storing and accessing large amounts of data, hash tables are better.
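
For anyone who wants to check this on their own machine, here is a rough timing sketch. The alias name benchstore, the table name benchtab and the variable %bench.val are all made up for illustration, and the numbers will differ between mIRC versions and machines:

```mirc
; /benchstore [N] - time N writes to a global variable against N writes to a hash table item
alias benchstore {
  var %n = $iif($1 isnum,$1,50000), %i = 1, %t = $ticks
  while (%i <= %n) {
    set %bench.val %i
    inc %i
  }
  echo -a * set variable x $+ %n $+ : $calc($ticks - %t) ms
  if (!$hget(benchtab)) hmake benchtab 10
  %i = 1
  %t = $ticks
  while (%i <= %n) {
    hadd benchtab val %i
    inc %i
  }
  echo -a * hadd item x $+ %n $+ : $calc($ticks - %t) ms
  ; clean up the throwaway table and variable
  hfree benchtab
  unset %bench.val
}
```

This only measures writing a small integer; it says nothing about searching or about large amounts of data.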

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
While hash tables have no limit as to how much info you can store, variables, on the contrary, have a limit.
Nevertheless, I agree with you about using vars when it comes to storing small integers.

Joined: Nov 2006
Posts: 1,559
H
Hoopy frood
The length limit per item is similar to the length limit of a variable (name of table + name of item + data <= ~4140; name of variable + data <= ~4140).

My suggestion was about versatility only: not using a "block" expression. And I'd rather not use backreferences (as they will slow down processing, whatever method you use).
Maybe you don't have 20 but 200 bad "words"... The length limit aside, if you have a block expression and there's e.g. some misfire, it may be hard to find the erroneous chunk, especially without backreferences, in a multi-line, $&-combined muddle.
If, on the other hand, your items are discrete expressions (or manageable "chunks"), you can loop $hfind(table,<text>,N,R) for debugging. Depending on your regex skills, this can be quite handy, even if you don't intend to add/modify the expression(s) frequently, and don't intend to create a dialog or the like.
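
A sketch of such a debugging loop, assuming the expressions are stored as item data in a hypothetical table named badexp (the alias name badtest is made up as well):

```mirc
; /badtest <text> - list every item whose stored expression matches <text>
alias badtest {
  ; N = 0 returns the number of matching items
  var %n = $hfind(badexp,$1-,0,R).data, %i = 1
  echo -a * %n expression(s) matched: $1-
  while (%i <= %n) {
    ; N = %i returns the name of the %i-th matching item
    var %item = $hfind(badexp,$1-,%i,R).data
    echo -a * %i $+ : item %item holds $hget(badexp,%item)
    inc %i
  }
}
```

Typing /badtest followed by the line that misfired then shows which discrete expression was responsible.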

It's not a suggestion for those who cannot afford to spend a few kB of memory (*sniff*) or sacrifice one or two $ticks... (*whimper*) *smirk*

Joined: Jul 2008
Posts: 236
S
Fjord artisan
Please explain what makes you think hash tables have no limit, Tomao. Hash tables are, like everything, inherently limited by implementation. If your OS has no swap file, then they're limited by your available RAM. If your machine has no swap file and no available RAM, then I assume it will fail to add new items. The items within hash tables appear to be limited to 4096 bytes, much like variables.

I would suggest gooshie's first solution, alongside Tomao's suggestion, with the addition of a second file which stores the words, one per line. Words would be added using the /add_badword alias, and removed by hand (I can't be bothered!). If speed becomes an issue (thooouuusands of items here), one could then quite easily take the TRE regular expression module (which can compile regular expressions at runtime), write a cheap & dirty wrapper and compile. The resulting code would be fine for millions of items, assuming enough system resources are present :) I was tempted to write code for this, but this message is getting lengthy and it would be a good learning experience for others... and hey, I'm here if anyone who wants to give it a shot runs into any problems :)

Code:
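alias conf_badword {
  set %badwords_plaintext badwords.txt
  set %badwords_table badwords.tab
}

alias add_badword {
  ; items are named 1, 2, 3, ... so the item count is also the name of the newest item
  var %item = $hget(badwords,0).item
  ; fetch that item by name, then strip the leading "/" and the trailing "/iS"
  var %data = $left($right($hget(badwords,%item),-1),-3)

  conf_badword

  ; leave more than enough space for the max 4096 per anything...
  ; (also start a fresh item while the table is still empty)
  if ((!%item) || ($len(%data) >= 2048)) {
    hadd -m badwords $calc(%item + 1) $+(/,$1,/iS)
  }
  else {
    hadd -m badwords %item $+(/,%data,|,$1,/iS)
  }

  write %badwords_plaintext $1
  hsave badwords %badwords_table
}

; create the table on startup and reload any saved expressions
on *:START: {
  conf_badword
  if (!$hget(badwords)) hmake badwords 100
  if ($isfile(%badwords_table)) hload badwords %badwords_table
}
on ^*:OPEN:?:pmkban $1-
on *:TEXT:*:?:pmkban $1-
on *:ACTION:*:?:pmkban $1-
alias -l pmkban {
  ; with N = 0, $hfind returns the number of matching items (0 if none)
  if ($hfind(badwords,$1-,0,R).data) {
    var %c, %i = 1
    while $comchan($nick,%i) {
      %c = $v1
      if $nick(%c,$me,~&@%) {
        ban -ku600 %c $nick 2 Ten minute ban for language in private message!
      }
      inc %i
    }
    .ignore -u600 $nick 2
  }
}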

Last edited by s00p; 29/12/09 04:45 PM.
Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
I'm afraid I was ill-informed, or misunderstood, about hash tables being able to store an unlimited number of items regardless of the N you choose. I didn't think of the OS swap file or the RAM involved when I posted...

Joined: Jul 2008
Posts: 236
S
Fjord artisan
Offline
Fjord artisan
S
Joined: Jul 2008
Posts: 236
That's OK, mate... The help file makes the same mistake, so I understand where you might get this idea from ;) I'd put it in the suggestions, but I don't think it'd make it anywhere because... well, the help file is a low-priority thing, I guess.

Joined: Oct 2004
Posts: 8,330
Hoopy frood
(Not specifically to you, Horstl)

Although you have a limit on size in a hash table, you can store a binary value to the hash table if you need more data on each item. This does, however, limit searching if you don't know which item to look in ($hfind doesn't help if you're using binary values). I didn't go through the entire thread, but just from the last posts, I don't think this was mentioned. Whether or not that's any use here, or if it's faster to split the items, I'm not sure. Just thought I'd throw it out for anyone who didn't know they can do that.
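
For anyone who hasn't done this before, a small sketch of storing and reading back a binary value. The alias name blobdemo, the table name bigdata and the item name blob are hypothetical, and the short text here merely stands in for data that would be too long for a normal item:

```mirc
alias blobdemo {
  if (!$hget(bigdata)) hmake bigdata 10
  ; fill a binary variable with text, starting at byte position 1
  bset -t &in 1 this text could be far longer than a normal item allows
  ; -b stores the contents of the binary variable as the item's data
  hadd -b bigdata blob &in
  ; copy the item back out into another binary variable
  noop $hget(bigdata,blob,&out)
  echo -a * blob holds $bvar(&out,0) bytes: $bvar(&out,1,$bvar(&out,0)).text
}
```

You still have to know the item name (blob here) to read the value back, which is the searching limitation mentioned above.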


Invision Support
#Invision on irc.irchighway.net