
Check words of sentence to textfile #252342 04/04/15 11:55 PM
FRAG_B
Hi, I want to make a script that checks whether any word of an incoming line of text appears on any line of a text file. I've been looking for the correct way to use $read for this but haven't found it yet. Can anyone help, please?

For example, the script should find a match if the text line is "this is a random sentence" and the text file contains:

word
word
word
word
word
random
word
word

Edit: for now I fixed it with a while loop that checks every word of the sentence, but I'd like to know if there's a better way to do it.

var %z = 1
while ($ [ $+ [ %z ] ] != $null) {
  if ($read(textfile.txt,tnw,$ [ $+ [ %z ] ]) != $null) { msg $chan test }
  inc %z
}

Last edited by FRAG_B; 05/04/15 12:25 AM.

Re: Check words of sentence to textfile #252349 05/04/15 03:20 AM
Sakana
Avoid using a while loop here; it will be a disaster. There's also no real need to keep the words in a .txt file, though that's more of a preference thing. I'd prefer to just use a variable for the words and keep it in the code itself.

You can use the isin operator:

Code:

var %sentence = the man on the moon
if (man isin %sentence) { echo -a true }

But it's not optimal. It doesn't care about word boundaries, so it would also match the "man" in "manfield".

What you really need is a regular expression. RegEx is somewhat complicated, but also extremely useful. The code below is an example of the regex pattern you'll need:

Code:

alias test {
  var %words = potatoes|banana man|haumph|rawr|etc
  var %sentence = I am the Banana Man and I like potatoes. Haumph !
  echo -a $regex(%sentence,/\b( $+ %words $+ )\b/gi) matches
}
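To make the word-boundary point concrete, here is a minimal sketch comparing the two tests on a sentence that contains "manfield" but not the word "man". The alias name wordcheck_demo is made up for illustration; only isin, $regex and \b come from the post above.

```mirc
alias wordcheck_demo {
  var %sentence = the manfield on the moon
  ; isin is a plain substring test, so "man" is found inside "manfield"
  if (man isin %sentence) { echo -a isin matched }
  ; \b requires a word boundary on both sides, so "man" inside "manfield" does not count
  echo -a whole-word matches: $regex(%sentence,/\bman\b/i)
}
```

Running /wordcheck_demo should echo the isin line and then report 0 whole-word matches, since $regex returns the number of matches it found.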

Re: Check words of sentence to textfile #252353 05/04/15 12:26 PM
FRAG_B
Thanks, I'll take a look at RegEx. I would usually use variables, but in this case there are hundreds of words in the text file and I want to be able to add new ones easily. Keeping them all in variables in my script would look a bit messy, and adding new ones wouldn't be as easy.

Re: Check words of sentence to textfile #252408 08/04/15 03:58 PM
OrFeAsGr (Fjord artisan) Joined: Feb 2015 Posts: 241 Greece

Code:

on *:text:*:#: {
  var %x = 1
  while (%x <= $numtok($strip($1-),32)) {
    var %y = 1
    while (%y <= $lines(textfile.txt)) {
      if ($read(textfile.txt, %y) == $gettok($strip($1-),%x,32)) || ($read(textfile.txt, %y) == $gettok($strip($1-),$+(%x,-,$numtok($read(textfile.txt, %y),32)),32)) { msg $chan Test!!! }
      inc %y
    }
    inc %x
  }
}

Tested. It works for sentences as well: it will work if the textfile.txt line is "test", but it will also work if the line is more than one word.

Re: Check words of sentence to textfile OrFeAsGr #252409 08/04/15 07:01 PM
Wims (Hoopy frood) Joined: Jul 2006 Posts: 4,020 France
OP stated he had hundreds of words to look for, and Sakana pointed out that a while loop would be a disaster. Yet you come up with two nested while loops, even though the OP already had code with only one loop (the second, nested loop being handled by $read itself). Your code is even less efficient: you call the same $read twice, and you don't use $read to do the second loop. Your $read also doesn't have the 'n' switch, so the line read will be evaluated, resulting in possible exploits.

@FRAG_B: A more efficient way is to avoid the while loop. You can keep the list of words in the text file, but you should convert the file into one long string representing a regular expression, to be used as Sakana suggested:

Code:

on *:start:filetovar
alias add_word {
  if (!$read(listofwords.txt,tnw,$1)) {
    write listofwords.txt $1
    filetovar
  }
}
alias del_word {
  if ($read(listofwords.txt,tnw,$1)) {
    write -dl $readn listofwords.txt
    filetovar
  }
}
alias list_word loadbuf -a listofwords.txt
alias filetovar {
  set %words_reg /\b(
  var %a 1
  while ($read(listofwords.txt,tn,%a) != $null) {
    set %words_reg $+(%words_reg,\Q,$replacecs($v1,\E,\E\\E\Q),\E|)
    inc %a
  }
  set %words_reg $left(%words_reg,-1) $+ )\b/Si
}
on $*:text:%words_reg:#chan:msg $chan You matched the word $regml(1) in my file!

We're still using a loop to convert the file to a variable, which is slow, but that's done when adding or deleting a word, not when the on text event triggers, and that's where we need to be fast. Of course that while loop could also be improved, but that's beside the point here.

Use /add_word <word> to add a word, /del_word <word> to delete a word, and /list_word to display the list in the active window. If you edit the text file manually, simply run /filetovar to update the variable containing the regular expression.

Note that this method has one problem: it is limited in the number of words you can have in the text file. If you reach the limit (you'll get a "/set: line too long" error), there are other ways to make it better.

Last edited by Wims; 08/04/15 10:36 PM.

#mircscripting @ irc.swiftirc.net == the best mIRC help channel

Re: Check words of sentence to textfile Wims #252440 11/04/15 03:16 PM
OrFeAsGr
I'm having a hard time understanding why while seems so slow to you, and such a disaster. I use it in my bot with a .txt that has many lines; it was tested on big messages and it scanned them in a few milliseconds. In the time I've been scripting (about a year) I've rarely seen while lag, and then only for 1-3 seconds. But hey, if regex does the job, whatever.

Re: Check words of sentence to textfile OrFeAsGr #252477 15/04/15 10:48 PM
Wims
Hey,

The while statement isn't slow just for me; it's slow for everyone. The mIRC scripting language isn't really fast, and a while statement is slow and should be avoided when possible. The code you suggested has two nested while loops, which is going to be slow. Moreover, instead of letting $read search the whole file, you read it line by line. Each time $read is called, mIRC opens the file (/fopen), parses it to reach the Nth line (calling /fseek -n, N times), closes the file (/fclose) and returns the result. All of these operations access the file system, which is super slow. Also, you're using $lines() in the condition of the while loop instead of putting that value in a %variable. $lines() involves the same operations as $read, and it is re-evaluated on every iteration even though its value won't change.

Of course, if you have 5-6 tokens in $1- and 5-6 lines in the file, even the slowest code will run rather quickly. But put 80 words in this text file and it will start being an issue, and I'm not even talking about 200-300 lines, where mIRC would just start freezing while it does the job. The more lines you have, the longer it takes.

Regex lets you match several strings at once, so you don't have to script the 'match' yourself. Everything is concatenated to form one regular expression, and this way you avoid both loops. The loop over the lines in the file is handled because the regular expression itself says "I want to match this, or this, or this, etc.", where "this" is each line in the file. The loop over the tokens in $1- is handled by the way regex works.

You have probably rarely seen a while loop lag mIRC because it's not every day that you need to loop over a lot of lines. That typically only happens when a database is involved, and you just don't have a huge database.

Anyway, the OP had already been using a single while loop (the second one being handled by $read itself, which is not how you used $read!) and said he was looking for a better way. Since only the execution speed of the code can count as better in this case, your two nested while loops clearly aren't better. Note that even a simple suggestion like using the undocumented $* would have been a better solution here: it's faster than a while loop and could be used on $1- to avoid the loop over the tokens.

Quote:
since i use it in my bot with a .txt that has many lines and it was tested in big messages and it scanned them in some milliseconds.

How many tokens, and how many lines in the file?
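As a rough illustration of two of the points above (compute a loop bound once, and let a single $read call scan the whole file), here is a minimal sketch. It assumes textfile.txt holds one word per line, as in the OP's script; the variable names %text, %total and %i are made up for the example, and it is not a replacement for the regex approach.

```mirc
on *:text:*:#:{
  var %text = $strip($1-)
  ; count the tokens once, instead of calling $numtok() in the loop condition
  var %total = $numtok(%text, 32), %i = 1
  while (%i <= %total) {
    ; one $read with the w switch scans the whole file for this word
    ; (n = do not evaluate the line, t = treat the first line as plain text)
    ; caveat: w is a wildcard match, so a token containing * or ? acts as a wildcard
    if ($read(textfile.txt, ntw, $gettok(%text, %i, 32)) != $null) {
      msg $chan test
      ; stop at the first hit so the file is not scanned again
      break
    }
    inc %i
  }
}
```

This still costs one file scan per token, so it only removes the inner scripted loop and the repeated $lines() call; it does not make the loop over the tokens go away.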

Re: Check words of sentence to textfile #252488 17/04/15 02:16 PM
FroggieDaFrog (Hoopy frood) Joined: Apr 2010 Posts: 964 USA

We can do a little better than looping over the user's text or reading through the text file searching for matches.

Both are very slow: mIRC's while loops are slow, and file reads are even slower.

As Wims stated, depending on how many words you have in the text file, you could simply use a single regex to match.

Since you stated your word list is quite large, that isn't an option. So what to do? Well, there is a solution that (1) doesn't require looping and (2) doesn't need file reads after mIRC has started: hashtables with the use of $hfind().

Using either $hfind(table, text, 1, W) or $hfind(table, text, 1, R), mIRC treats the hashtable's item names as wildcard or regex patterns and matches them against the text for us, which is far faster than looping in mSL.

To use this, though, you'll need to convert your textfile.txt into a hashtable:

Code:

; converts the .txt file to a hashtable for use
; usage: /textfile2hashtable filepath/filename.txt
alias textfile2hashtable {

  ; if the hashtable 'word_list' doesn't exist, create it
  if (!$hget(word_list)) {
    hmake word_list 10
  }

  ; get the number of lines in the file, and create an iterator variable and a few temporary variables
  var %e = $lines($1-), %i = 0, %line, %item

  ; start looping
  while (%i < %e) {
    inc %i

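    ; read line %i from the text file (n = do not evaluate it, t = first line is plain text)
    %line = $read($1-, nt, %i)

    ; sanitize the line that was read (the line itself, not the filename in $1-)
    ; note: inside \Q...\E PCRE takes \\ and \x20 literally, so this is only reliable for single words without backslashes
    %item = /\b\Q $+ $replace(%line, \, \\, $chr(32), \x20) $+ \E\b/i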

    ; add the sanitized line to the hash table
    hadd word_list %item %line
  }

  ; once the looping is finished, save the hashtable to the file that load_words reads
  hsave -o word_list $qt($scriptdirword_list.dat)
}

Once converted, we can modify wims's script to make use of it

Code:

; when mIRC starts, call /load_words
on *:START:{
  load_words
}

; alias to load word_list.dat into an in-memory hashtable
alias load_words {
  ; free the hashtable if it exists
  if ($hget(word_list)) hfree word_list

  ; create the hashtable
  hmake word_list 10

  ; load entries from <scriptdir>/word_list.dat
  hload word_list $qt($scriptdirword_list.dat)
}

; alias that adds words to the list
alias add_word {

  ; if the hashtable doesn't exist, create it
  if (!$hget(word_list)) load_words
  
  ; store the 'word' in a variable
  var %word = $1-
  
  ; store the 'word' under a sanitized version (/\b\Qword_here\E\b/i)
  hadd -m word_list /\b\Q $+ $replace(%word, \, \\, $chr(32), \x20) $+ \E\b/i %word
  
  ; save the hashtable
  hsave -o word_list $qt($scriptdirword_list.dat)
}

; alias to remove words from the list
alias del_word {
  ; if the hashtable doesn't exist, load the word_list
  if (!$hget(word_list)) load_words
  
  ; remove the sanitized version of the word from the hashtable
  hdel word_list /\b\Q $+ $replace($1-, \, \\, $chr(32), \x20) $+ \E\b/i
  
  ; save the hashtable
  hsave -o word_list $qt($scriptdirword_list.dat)
}

; on text event
on *:TEXT:*:#:{

  ; search the hashtable: each item name is a regex that is tested against the incoming text
  if ($hfind(word_list, $1-, 1, R)) {
    ;; text found
  }
}
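For reference, the parameter order of the lookup is $hfind(table, text, N, M), where the R switch tells mIRC to treat the item names as regular expressions. The sketch below shows that on a throwaway table; the alias and table name hfind_demo, and the two sample entries, are made up for illustration and are assumed not to clash with anything already loaded.

```mirc
alias hfind_demo {
  ; start from an empty throwaway table
  if ($hget(hfind_demo)) hfree hfind_demo
  hmake hfind_demo 10
  ; item name = regex pattern, data = the plain word
  hadd hfind_demo /\brandom\b/i random
  hadd hfind_demo /\bpotatoes\b/i potatoes
  var %text = this is a random sentence
  ; R = treat item names as regexes and test them against %text; 1 = first matching item
  var %item = $hfind(hfind_demo, %text, 1, R)
  if (%item != $null) { echo -a matched word: $hget(hfind_demo, %item) }
  else { echo -a no match }
  hfree hfind_demo
}
```

Because $hfind returns the matching item name, $hget(table, item) can then be used to recover the original word stored as that item's data.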

Last edited by FroggieDaFrog; 17/04/15 02:17 PM.

Re: Check words of sentence to textfile OrFeAsGr #252489 17/04/15 03:20 PM
FroggieDaFrog

While loops are slow, file reads are slower, and the two together are slower still.

To prove just how slow while loops are:

Code:

alias benchNoLoop {
  !var %y = 5
  !var %ticks = $ticks
  !dec %y
  !dec %y
  !dec %y
  !dec %y
  !dec %y
  !inc %benchNoLoop $calc($ticks - %ticks)
}
alias benchLoop {
  !var %y = 5
  !var %ticks = $ticks
  !while (%y) {
    !dec %y
  }
  !inc %benchLoop $calc($ticks - %ticks)
}

alias TestLoop {
  set -u0 %benchNoLoop 0
  set -u0 %benchLoop 0
  var %x = $1
  while (%x) {
    .benchLoop
    .benchNoLoop
    dec %x
  }
  echo -a -
  echo -a Test: $1
  echo -a NoLp: %benchNoLoop $+ ms ( $+ $round($calc(%benchNoLoop / $1),3) per test)
  echo -a Loop: %benchLoop $+ ms ( $+ $round($calc(%benchLoop / $1),3) per test)
  echo -a Diff: $calc(%benchLoop - %benchNoLoop) $+ ms ( $+ $round($calc(%benchLoop / %benchNoLoop),3) times faster)
}

Using a clean mIRC that is offline with no scripts but the above loaded, issuing /testloop 1000000 produces:

Code:

Test: 1000000
NoLp: 35469ms (0.035 per test)
Loop: 72987ms (0.073 per test)
Diff: 37518ms (2.058 times faster)

That's twice the amount of time to issue the same number of commands from within a loop versus not looping at all.

Last edited by FroggieDaFrog; 17/04/15 03:20 PM.

Re: Check words of sentence to textfile FroggieDaFrog #252491 18/04/15 11:59 AM
Raccoon (Hoopy frood) Joined: Feb 2003 Posts: 2,737
Eh. I don't like suggesting to people that while loops are slow. They really aren't. Bad coding practices, like unnecessary loops, are slow, but the looping mechanism itself isn't "slow" relative to any other aspect of mIRC's scripting engine. Rather than telling people that while loops are slow, tell them that they can find a better solution that doesn't require a loop or doesn't need to loop as much. Multiple $reads and /writes are slow, so use $fread and /fwrite in while loops.

Well. At least I won lunch. Good philosophy, see good in bad, I like!
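A minimal sketch of the $fread suggestion, assuming textfile.txt has one word per line: the file is opened once with /fopen and then read sequentially, instead of calling $read (which reopens the file) for every line. The alias name findline and the handle name findline_handle are made up for the example.

```mirc
; hypothetical usage: //echo -a $findline(random)
; returns the number of the first line equal to $1, or 0 if there is none
alias findline {
  var %h = findline_handle, %n = 0
  ; close a stale handle left over from an earlier run
  if ($fopen(%h)) .fclose %h
  .fopen %h textfile.txt
  ; /fopen does not halt the script on failure, so check for an error and clean up
  if ($ferr) { .fclose %h | return 0 }
  while (!$fopen(%h).eof) {
    inc %n
    ; $fread returns the next line of the open file; == compares case-insensitively
    if ($fread(%h) == $1) { .fclose %h | return %n }
  }
  .fclose %h
  return 0
}
```

This still loops in script, so it addresses only the repeated file opening, not the cost of the loop itself.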

Re: Check words of sentence to textfile Raccoon #252493 18/04/15 01:01 PM
FroggieDaFrog
If you look above the benchmark post, I gave a suggestion that doesn't use looping or file reads after startup.
