Bad word in txt file #264489 06/12/18 08:54 PM
Joined: Feb 2017 Posts: 24 Doctor_Souza OP Ameglian cow
Hi guys, how can I check whether a word from a message exists in a txt file? I used a "while" loop to check, line by line, whether each word in the file appears anywhere in the text:

"if ($read(scripts\blockp.txt,%i) isin $1-) {"

But if the word is part of another word, it still counts as a match. I want it to match only when the word stands alone, not when it is part of another word. Thank you guys!!!

Re: Bad word in txt file Doctor_Souza #264491 06/12/18 10:56 PM
Joined: Jan 2004 Posts: 2,127 maroon Hoopy frood
You're doing things the hard way. If you keep your bad word list in a text file and then read from disk 40 times to check 40 bad words, that will be very slow because of the disk access. You'd be much better off putting your bad words into a hash table and using $hfind, which lets you check all bad words against all words in the message with 1 command, without needing to loop 40 times to check 40 words.

However, to answer your question, you're doing the equivalent of:

Code:
    //tokenize 32 word1 word2 word3 | if (word isin $1-) echo -a match $v1

which finds the string inside a longer word and gives you a match that you do not want. If you want to find whether the sentence contains your search word as an entire word, you can use the string token functions.

Code:
    //tokenize 32 word3 word1 word2 | if ($findtok($1-,word2,1,32)) echo -a match $gettok($1-,$v1,32)

However, this has the problem that it won't match if the word touches a comma or other punctuation.

In addition to making all the comparisons in 1 command, $hfind lets you use regular expressions to find specific matches.

Code:
    alias hfind_test {
      tokenize 32 sentence containing planes, trains, and automobiles and other brain words
      echo -a message is $1-
      hfree -w test
      hadd -m test \brain\b rain
      hadd test \blanes?\b lane
      hadd test \bauto\b auto
      var %a = $hfind(test,$1-,1,R)
      if (%a) echo -a first match: %a containing $hget(test,%a)
    }

This creates a hash table with 3 search items, where the name of each item is a regular expression. The data is just an optional memo about the item. If you run this alias, it does not find any matches, because none of the regular expressions match the message. The item named \brain\b does not match "brain" because the 'b' is part of the \b symbol, which means "word boundary", so the pattern is really looking for the whole word "rain".
However, if you add a space to change "automobiles" into "auto mobiles", then it finds a match, because the \b at the front and back cause the match to happen only when 'auto' is a complete word. Adding a space between the "p" and "l" in "planes," also creates a match, because the "?" makes the term match when there is 0-or-1 's' in the word, so it matches either of the words lane or lanes.

Just like the "?" means something different in regex than in wildcard, the * also works differently in regex than in wildcard. regex101.com is a good site for testing your regular expressions to see that they match what they should match, and also do not match things they should not match.

Update: When you're reading from disk, you should prevent $read from treating a number on line 1 as if it's the number of lines in the text file, and avoid evaluating any $identifier or %variable the line contains.

$read(scripts\blockp.txt,%i) should have been $read(scripts\blockp.txt,nt,%i)

Also, when using regex, the default is to be case-sensitive, so the above example would not find Auto if the search term had all lower-case letters. There's also the problem where color codes can sometimes prevent matches that should happen. To avoid both, you can substitute

$hfind(test,$1-,1,R) with $hfind(test,$lower($strip($1-)),1,R)

Otherwise, to match regardless of case you would need to change all your search terms, from \brain\b to /\brain\b/i

Last edited by maroon; 07/12/18 12:46 AM.
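Putting the advice in this reply together, a minimal sketch might look like the following. It is untested and makes several assumptions: scripts\blockp.txt holds one bad word per line, the words contain no spaces and no regex metacharacters, and the names badwords (hash table) and badwords_load (alias) are made up for illustration and do not come from the thread.

```mirc
; Hypothetical helper: read the word list from disk ONCE and store each
; word as a whole-word regex item (\bword\b) in a hash table.
alias badwords_load {
  var %file = scripts\blockp.txt
  hfree -w badwords
  hmake badwords 10
  var %i = 1, %n = $lines(%file)
  while (%i <= %n) {
    ; n = do not evaluate the line, t = treat line 1 as plain text
    var %w = $lower($read(%file,nt,%i))
    ; skip empty lines and entries with spaces (item names cannot contain spaces)
    if ((%w != $null) && ($numtok(%w,32) == 1)) hadd badwords $+(\b,%w,\b) %w
    inc %i
  }
}

; Check each channel message against every item with a single $hfind call.
on *:TEXT:*:#:{
  ; load the table the first time it is needed
  if (!$hget(badwords)) badwords_load
  ; strip color codes and lower-case the message so it compares
  ; against the lower-cased search terms
  var %hit = $hfind(badwords,$lower($strip($1-)),1,R)
  if (%hit) echo -a matched bad word: $hget(badwords,%hit)
}
```

Because the items are plain regular expressions, a word in the file that contains characters such as . or * would be interpreted as regex syntax, so those entries would need escaping before being added.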