Joined: Mar 2007
Posts: 139
S
Solo1 Offline OP
Vogon poet
Is there a better and faster way of doing this? This snippet crashes if there are too many lines in the specified text file.

Code:
alias read {
  if (!$isfile($$1 $+ .txt)) { echo -a error: file $$1 $+ .txt not found. | return }
  echo -a $timestamp Counting words of $$1 $+ .txt
  var %counter = 1, %wordcount
  while ($read($$1 $+ .txt,%counter)) {
    inc %wordcount $numtok($read($$1 $+ .txt,%counter),32)
    inc %counter
  }
  echo -a $timestamp The file $$1 $+ .txt has %wordcount words in it.
}

Joined: Jan 2007
Posts: 259
K
Fjord artisan
Offline
You could use /fopen and $fread.
/help File Handling
You should remember to use /fclose at the end, as your snippet will most likely fail if you leave the handle open.
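For reference, the /fopen route could look roughly like the sketch below. The alias name and the handle name "wc" are made up for this example, and it assumes the file name is passed without the .txt extension, as in the original snippet. Very long lines may still be split by mIRC's line reading, so treat the count as approximate for such files.

```mirc
; Sketch only: "wordcount_fread" and the handle name "wc" are hypothetical names.
alias wordcount_fread {
  var %file = $1 $+ .txt, %words = 0
  if (!$isfile(%file)) { echo -a error: file %file not found. | return }
  ; close a handle left open by an earlier run that stopped on an error
  if ($fopen(wc)) .fclose wc
  .fopen wc $qt(%file)
  ; a failed /fopen leaves the handle in an error state, so close it before returning
  if ($ferr) { echo -a error: could not open %file | .fclose wc | return }
  ; each $fread call returns the next line; stop at end of file or on a read error
  while (!$feof && !$ferr) {
    inc %words $numtok($fread(wc),32)
  }
  .fclose wc
  echo -at The file %file has %words words in it.
}
```

Usage would be /wordcount_fread versions for versions.txt. The file is opened once and read front to back, which is the point of using a handle instead of repeated $read calls.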


Those who can, cannot. Those who cannot, can.
Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Offline
/filter with the -k switch will be the fastest and easiest method (no looping necessary).

Code:
alias read { 
  if (!$isfile($1.txt)) { 
    echo -a error: file $1.txt not found.
    return
  }
  set -u %wordcount 0
  echo -at Counting words of $1.txt
  filter -fk $1.txt _count
  echo -at The file $1.txt has %wordcount words in it.
}
alias -l _count { inc %wordcount $numtok($1,32) }


I ran this on the complete versions.txt file from this website (documents all changes in mIRC since the 1st alpha release), and it took a second:

Quote:

(19:23:15) Counting words of versions.txt
(19:23:16) The file versions.txt has 76702 words in it.


It's also worth noting that the $N identifiers can have text added to them without the need for $+, which is why the code above uses $1.txt rather than $1 $+ .txt.

Edit:

Kardafol,

It won't fail if you don't use /fclose, but it will leave an unnecessary file handle open, which hogs resources. Without error checking, the snippet will also give an error the next time it is used, because a file handle with the given name is already open, unless a dynamic handle name is used.

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Here's a fast method:
Code:
alias wordsinfile {
  tokenize 62 $shortfn($1)
  bread -c $1 0 $file($1) &a
  breplace &a 32 10
  bwrite $1.temp 0 -1 &a
  filter -ffx $1.temp nul $crlf
  .remove $1.temp
  $iif($isid,return,echo -at) $filtered
}
What this does is turn spaces into newlines and store the result in a temporary file. The lines of that file are then counted, excluding blank ones.

Edit: hixxy posted in the meantime, but it's always good to have options

Last edited by qwerty; 13/03/07 07:32 PM.

/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Jan 2007
Posts: 259
K
Fjord artisan
Offline
That's what I meant. You might forget to close the handle, and then the snippet fails when it runs again if you haven't implemented a check such as if ($fopen(handle)) { .fclose handle }.


Those who can, cannot. Those who cannot, can.
Joined: Mar 2007
Posts: 139
S
Solo1 Offline OP
Vogon poet
Thank you fellas, fantastic solutions. qwerty, your script returns different results, and for a really big file I get an error: /bread: invalid parameters (line 147, aliases.ini)

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Interesting observations, do you have a small example file handy on which my alias gives a different word count? I tried with the full versions.txt and got the same number as hixxy's alias.

I'm also unsure about the /bread error. How big was the file that gave that error? If you put an /echo in front of the /bread command in my alias and test with that file, what does mirc print?

Edit: I just thought of a reason for a different word count: if the file contains very long lines, mirc's text reading routines seem to split them in lines of around 1000 chars at most. If the split happens in the middle of a word, mirc will report more words than there actually are, so in such cases, my alias may in fact give more accurate results as it uses a different method (it still won't be accurate if there is a single "word" of length >1000 in the file).

Last edited by qwerty; 14/03/07 12:37 AM.

/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Mar 2007
Posts: 139
S
Solo1 Offline OP
Vogon poet
qwerty, the file I have has tens of thousands of lines in it. When I add the echo in front of bread I get:
bread -c C:\Program Files\mIRC\version 0 &a
-
* /breplace: invalid parameters (line 129, aliases.ini)
-

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
How are you using my alias, as a /command or as a custom $identifier? If it's the former, I would assume the error is caused because I used $1 instead of $1- in $shortfn() (the alias was originally written as a custom identifier), but that doesn't explain the fact that it echoes "C:\Program Files\...", ie that there's a space in there. You can try changing this part:
tokenize 62 $shortfn($1)
to this:
tokenize 62 $shortfn($1-)
but I suspect the problem is elsewhere (not related to my alias). Are you perhaps using it as an identifier and with a filename that contains a comma? Can you paste the full path of that file? (including the filename itself).


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Mar 2007
Posts: 139
S
Solo1 Offline OP
Vogon poet
qwerty, I followed your suggestion and now have no problems. I have never scripted mIRC before, so I may be making a fool of myself. I tried to make an alias that would count the chars in a text file and came up with two attempts. I was hoping you would find something faster.

Here is the first attempt.

Code:
alias charcount {
  if (!$isfile($$1.txt)) return
  echo -at counting characters of $$1
  var %counter = 1, %chars
  while (%counter <= $lines($1.txt)) {
    inc %chars $len($remove($read($1.txt,%counter),$chr(32)))
    inc %counter
  }
  echo -at File $$1.txt has %chars characters in it.
}


After my previous experience I realized that mIRC while loops can be a messy solution. I was also so amazed at the almost instant speed of /filter that I looked it up and saw how the -k switch, as hixxy showed, calls an alias for each line of output. So I tried something else, following hixxy's previous suggestion:

Code:
alias charcount1 {
  if (!$isfile($$1.txt)) return
  set -u %charcount 0
  echo -at counting characters of $1.txt
  filter -fk $1.txt char
  echo -at The file $1.txt has %charcount characters in it.
}
alias -l char inc %charcount $len($remove($1,$chr(32)))


I was wondering if you or anyone else could perfect it, with an attached explanation.
Thanks

Perhaps an interesting error: when using the first script on a file the size of versions.txt, I got this:

Invalid parameters: $regsubex (line 128, aliases.ini)

edit: I wonder if it may have to do with the fact that $regsubex is mentioned in the file versions.txt.

i also noticed when changing the first script ever so slightly to...

Code:
alias charcount {
  if (!$isfile($$1.txt)) return
  echo -at counting characters of $$1
  var %counter = 1, %chars
  while ($read($1.txt,%counter)) {
    inc %chars $len($remove($ifmatch,$chr(32)))
    inc %counter
  }
  echo -at File $$1.txt has %chars characters in it.
}


it only counted 20 chars and stopped, as there is a blank line after the first line. It seems that $read is not true for a blank line even though it is a line of a larger file.

I also noticed that using a while loop on a file with 1153 lines gave a slightly different result from using /filter. I wonder whether a loop that loops too many times loses count.

Regarding the word count, I also came across another observation: I used three different scripts on the same file and got different counts.

test1 using

Code:
alias wordcount {
  if (!$isfile($$1.txt)) { echo -a error: file $$1 $+ .txt not found. | return }
  echo -at Counting words of $$1 $+ .txt
  var %counter = 1, %wordcount
  while (%counter <= $lines($1.txt)) {
    inc %wordcount $numtok($read($$1.txt,%counter),32)
    inc %counter
  }
  echo -at The file $$1.txt has %wordcount words in it.
}


result:
[01:41:23] Counting words of test.txt
[01:41:30] The file test.txt has 51409 words in it.

Admittedly a very slow method.

Test 2 using hixxy's method...
Code:
alias wordcount1 { 
  if (!$isfile($1.txt)) { 
    echo -a error: file $1.txt not found.
    return
  }
  set -u %wordcount 0
  echo -at Counting words of $1.txt
  filter -fk $1.txt _count
  echo -at The file $1.txt has %wordcount words in it.
}
alias -l _count { inc %wordcount $numtok($1,32) }


Result
[01:44:54] Counting words of test.txt
[01:44:54] The file test.txt has 51049 words in it.

Notice how I get a different number.

Test 3 using qwerty's method...

Code:
alias wordsinfile {
  tokenize 62 $shortfn($1-)
  bread -c $1 0 $file($1) &a
  breplace &a 32 10
  bwrite $1.temp 0 -1 &a
  filter -ffx $1.temp nul $crlf
  .remove $1.temp
  $iif($isid,return,echo -at) $filtered
}


result
[01:47:30] 51409

This is the fastest, and I get the same number as with the first script.

Why do I get a different number using hixxy's method?

Last edited by Solo1; 15/03/07 01:50 AM.
Joined: Jan 2003
Posts: 1,063
D
Hoopy frood
Offline
Originally Posted By: qwerty

Edit: I just thought of a reason for a different word count: if the file contains very long lines, mirc's text reading routines seem to split them in lines of around 1000 chars at most. If the split happens in the middle of a word, mirc will report more words than there actually are, so in such cases, my alias may in fact give more accurate results as it uses a different method (it still won't be accurate if there is a single "word" of length >1000 in the file).


If it ain't broken, don't fix it!
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Quote:
Invalid parameters: $regsubex (line 128, aliases.ini)

edit: I wonder if it may have to do with the fact that $regsubex is mentioned in the file versions.txt.

Exactly. $read(file.txt,5) (for example) returns the 5th line in a file, but after evaluating its contents. Thus any variables/identifiers in the line will be evaluated. To prevent that, you can use the n switch: $read(file.txt,n,5).
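As a rough illustration, here is the first charcount attempt with only the n switch added (plus the line count stored once in %lines, a made-up variable name). This is a sketch under the assumption that nothing else in the alias needs to change:

```mirc
alias charcount {
  if (!$isfile($1.txt)) return
  echo -at counting characters of $1.txt
  ; %lines is a hypothetical helper variable: count the lines once, up front
  var %counter = 1, %chars = 0, %lines = $lines($1.txt)
  while (%counter <= %lines) {
    ; the n switch returns the line as plain text, so a $regsubex or %var
    ; that happens to appear in the file is not evaluated
    inc %chars $len($remove($read($1.txt,n,%counter),$chr(32)))
    inc %counter
  }
  echo -at File $1.txt has %chars characters in it.
}
```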

Quote:
it only counted 20 chars and stopped, as there is a blank line after the first line. It seems that $read is not true for a blank line even though it is a line of a larger file.

Correct. An empty string (ie $null) in an if/while condition in mirc is considered FALSE. One way around that for this case is what you already did before with while (%i <= $lines($1.txt)). Another would be to also check $readn, which returns the line number of the last $read() call. Even if the line was blank and $read() returned $null, $readn would be filled with the non-zero line number, so you could check that.
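A minimal sketch of the $readn check, assuming $readn is 0 whenever $read() is asked for a line past the end of the file. The loop structure is only one way to write it:

```mirc
alias charcount {
  if (!$isfile($1.txt)) return
  var %counter = 1, %chars = 0
  while (1) {
    ; a blank line adds 0 here: $read returns $null and $len($null) is 0
    inc %chars $len($remove($read($1.txt,n,%counter),$chr(32)))
    ; $readn is the line number just read, or 0 if there was no such line
    if ($readn == 0) break
    inc %counter
  }
  echo -at File $1.txt has %chars characters in it.
}
```

Blank lines no longer end the loop early, because the exit test looks at $readn rather than at the text that $read() returned.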

Quote:
I also noticed that using a while loop on a file with 1153 lines gave a slightly different result from using /filter. I wonder whether a loop that loops too many times loses count.

No, loops loop exactly as many times as you tell them :)

Regarding the different word count in hixxy's alias, I can't see why that happens in this case. The theory of long lines doesn't seem to apply here because $read() should agree with hixxy's alias (not with mine) and both should be giving a larger word count than mine.

About the slow performance of the $read() approach, this is happening for two reasons:

1. Each $read(file.txt,N) call makes mirc
  • Open the file
  • Scan through the file starting from the beginning until it finds the Nth line
  • Close the file
The second step above is what takes the most time of course. Looping through a 5-line file would make mirc consider 1+2+3+4+5 = 15 lines. For an N-line file in general, mirc considers N(N+1)/2 lines, so thousands become millions.

2. Having $lines() inside the while condition makes mirc evaluate $lines() over and over again. As mirc again has to go through all lines to find how many there are, this can get slow. To get around that, simply calculate the number of lines once and store the result, ie instead of this:
Code:
  var %counter = 1, %wordcount
  while (%counter <= $lines($1.txt))
you could do this:
Code:
  var %counter = 1, %wordcount, %lines = $lines($1.txt)
  while (%counter <= %lines)
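
To put a number on point 1 above, N(N+1)/2 can be worked out with $calc for the 1153-line file mentioned earlier, and for a 10000-line file as a second, made-up example:

```mirc
; typed in any mIRC editbox
//echo -a $calc(1153 * 1154 / 2)
; prints 665281
//echo -a $calc(10000 * 10001 / 2)
; prints 50005000
```

So a file ten times longer costs roughly a hundred times more line reads with the $read() loop.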


hixxy's approach with /filter is much faster because /filter opens the file once, goes through it line by line (each line is considered only once), then closes the file.

My method is even faster because no external alias is called for each line: all counting is done "internally". Even though it may look slower (as a temporary file is written and then deleted), the time saved by avoiding repeated alias calls is greater than the time spent writing the temp file. Generally, keep in mind that mirc's internal routines are extremely fast (mirc is written in C++) but its scripting routines are much slower. So whenever you have a choice between using a single mirc command for something and using a bunch of aliases or a while loop, choose the former.
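
If you want to compare the approaches on your own file, a small timing wrapper along these lines should do. The alias name timeit is made up for this sketch, and it assumes the alias being timed can be called as a /command from where timeit is stored:

```mirc
; Sketch only: "timeit" is a hypothetical helper, not a built-in command.
alias timeit {
  var %start = $ticks
  ; run the command line that was passed in
  $1-
  echo -a $1 took $calc($ticks - %start) ms
}
```

For example, /timeit wordsinfile versions.txt followed by /timeit read versions would print the elapsed milliseconds for each method.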


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Mar 2007
Posts: 139
S
Solo1 Offline OP
Vogon poet
Thank you, qwerty and hixxy. A lot of knowledge gained about this scripting language.

