#264489 - 06/12/18 08:54 PM Bad word in txt file
Doctor_Souza Offline
Pikka bird

Registered: 16/02/17
Posts: 18
Hi guys, how can I check whether a word from a message exists in a txt file?
I used a "while" loop to check, line by line, whether each word in the file appears anywhere in the text: "if ($read(scripts\blockp.txt,%i) isin $1-) {". But if the word is part of another word, it still counts as a match. I only want it to match when the word stands alone, not when it is part of another word.

Thank you guys!!!

#264491 - 06/12/18 10:56 PM Re: Bad word in txt file [Re: Doctor_Souza]
maroon Offline
Hoopy frood

Registered: 12/01/04
Posts: 969
You're doing things the hard way. If you keep the bad word list in a text file and read from disk 40 times to check 40 bad words, that will be very slow because of the disk access. You'd be much better off putting your bad words into a hash table and using $hfind, which lets a single command check all the bad words against all the words in the message at once, without looping 40 times to check 40 words.

However, to answer your question, you're doing the equivalent of:

Code:
//tokenize 32 word1 word2 word3 | if (word isin $1-) echo -a match $v1


which finds the string within the word, and gives you a match that you do not want.

If you want to find out whether the sentence contains your search word as an entire word, you can use the string token functions.

Code:
//tokenize 32 word3 word1 word2 | if ($findtok($1-,word2,1,32)) echo -a match $gettok($1-,$v1,32)



However, this would fail to match if the word touched a comma or other punctuation.
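For a single search word, one way around the punctuation problem is $regex with \b word boundaries. This is only a minimal sketch: the sample sentence and the search words "trains" and "rain" are made up for illustration. The first line should echo, because "trains," still has a word boundary before the comma. The second should also echo, because "rain" only appears inside "trains" and never as a word of its own.

```mirc
//tokenize 32 planes, trains, and automobiles | if ($regex($1-,/\btrains\b/i)) echo -a whole-word match for trains
//tokenize 32 planes, trains, and automobiles | if (!$regex($1-,/\brain\b/i)) echo -a no whole-word match for rain
```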

In addition to making all the comparisons in 1 command, $hfind lets you use regular expressions to find specific matches.

Code:
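alias hfind_test {
  ; sample message to search
  tokenize 32 sentence containing planes, trains, and automobiles and other brain words
  echo -a message is $1-
  ; start from a clean table; -m creates it on the first hadd
  hfree -w test
  ; item name = regular expression, data = optional memo
  hadd -m test \brain\b rain
  hadd test \blanes?\b lane
  hadd test \bauto\b auto
  ; R = treat the item names as regular expressions matched against the text
  var %a = $hfind(test,$1-,1,R)
  if (%a) echo -a first match: %a containing $hget(test,%a)
}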


This creates a hash table with 3 search items, where the name of each item is a regular expression. The data is just an optional memo about the item. If you run this alias as written, it finds no matches, because none of the regular expressions match the message.

The item named \brain\b does not match "brain" because its leading 'b' belongs to the \b symbol, which means "word boundary", so the pattern only matches "rain" as a complete word. However, if you add a space to change "automobiles" into "auto mobiles", it finds a match, because the \b at the front and back make the match happen only when 'auto' is a complete word.

Adding a space between the "p" and "l" in "planes," also creates a match, because the "?" makes the preceding 's' optional (0 or 1 occurrences), so that item matches either of the words lane or lanes.

Just like the "?" means something different in regex than in wildcard, the * also works differently in regex than in wildcard.
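To illustrate the difference, here is a small sketch comparing the same pattern text as a wildcard (with iswm) and as a regex (with $regex). The pattern lan*s and the word "lanes" are only examples. As a wildcard, * stands for any run of characters, so lan*s matches lanes. As a regex, * means "zero or more of the previous character", so /^lan*s$/ would match "las" or "lans" but not "lanes". The line should therefore report yes for the wildcard and no for the regex.

```mirc
//echo -a wildcard: $iif(lan*s iswm lanes,yes,no) regex: $iif($regex(lanes,/^lan*s$/),yes,no)
```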

regex101.com is a good site for testing your regular expressions to see if they match what they should match, but also do not match things they should not match.

Update:

When you're reading from disk, you should prevent $read from treating a number on line 1 as the number of lines in the text file (the t switch), and from evaluating any $identifier or %variable the line contains (the n switch). $read(scripts\blockp.txt,%i) should have been $read(scripts\blockp.txt,nt,%i)

Also, when using regex, the default is to be case-sensitive, so the above example would not find Auto if the search term had all lower-case letters. There's also the problem where color codes can sometimes prevent matches that should happen. To avoid those, you can substitute

$hfind(test,$1-,1,R)
with
$hfind(test,$lower($strip($1-)),1,R)

Otherwise, to handle the case-sensitivity you would need to add the case-insensitive flag to every search term, changing \brain\b to /\brain\b/i
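Putting the pieces together, the word list could be loaded from the text file into a hash table once, and each message then checked with a single $hfind. This is a rough sketch, not a finished script: the alias names blockp_load and blockp_check and the table name blockp are made up here, and it assumes every line of scripts\blockp.txt holds one plain word with no spaces or regex special characters.

```mirc
alias blockp_load {
  ; hypothetical alias and table name; assumes one plain word per line,
  ; with no spaces or regex special characters
  hfree -w blockp
  hmake blockp 10
  var %i = 1, %total = $lines(scripts\blockp.txt)
  while (%i <= %total) {
    ; n = do not evaluate the line, t = treat line 1 as plain text
    var %word = $read(scripts\blockp.txt,nt,%i)
    ; item name becomes a whole-word regex such as \bword\b
    if (%word != $null) hadd blockp $+(\b,$lower(%word),\b) %word
    inc %i
  }
}

alias blockp_check {
  ; returns the matching item name, or $null when nothing matches
  return $hfind(blockp,$lower($strip($1-)),1,R)
}
```

After running /blockp_load once, something like //echo -a $blockp_check(some message text) shows the matching item, if there is one. The file is still read line by line, but only when the table is loaded, not for every message.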


Edited by maroon (07/12/18 12:46 AM)
