shawnrgr
I'm working on a good "findline" alias. I've found a few, but they seem to be slow. [example: $findline(file.txt, *string*, 2)] Some use /filter to dump all the matches into a temp window, and some use /fseek and the other /f* functions. Which would be faster for finding a line in a file, /fseek or /filter? And if /fseek is probably faster, is it bad to constantly /fopen and /fclose a file? I'm talking probably 50+ times a minute.
Joined: Dec 2002
Posts: 2,884
Hoopy frood
/f* will be much faster than /filter, and no, it's not a good idea to continuously open/close a file like that. If you're accessing it that often, you should simply load it into a hash table and save yourself a whole load of trouble.
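A rough sketch of the hash table approach (table and file names are just examples; the switches here are from memory, so check /hload and $hfind in the help file):

```
; load file.txt into a hash table, one item per line
; (-n loads a plain text file with line numbers as item names)
alias loadlines {
  if ($hget(lines)) { hfree lines }
  hmake lines 100
  hload -n lines file.txt
}
; $getline(*string*,N) returns the Nth line whose text matches the wildcard
; (the W switch makes $hfind wildcard-match item data and return the item name)
alias getline {
  return $hget(lines, $hfind(lines, $1, $2, W))
}
```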
shawnrgr
Yeah, but the file is constantly updated, probably at least twice every 10 seconds. So wouldn't I have to reload the file anyway?
Joined: Dec 2002
Posts: 2,884
Hoopy frood
Well if the file is updated by mIRC scripts then you can just edit them to update the hash tables instead. If it's some other program then you're stuck with /f* I'm afraid.
Joined: Jan 2003
Posts: 2,125
Hoopy frood
Well, I'd expect /f* to be faster than /filter-ing to a hidden window, but not much faster. In both cases the file is opened/closed once and all searches are done in memory.
The main difference that makes /f* faster is that /filter reads the entire file and dumps the lines that match $2 into the window, whereas the /f* commands don't need to scan to the end of the file, only to the Nth matching line. But this speed difference shrinks as N in $findline(file,*wildstring*,N) grows. For N = <total number of matches> the difference is almost negligible: /fseek would have to scan the entire file, so it would read essentially the same amount of data as /filter.
However, no matter what you, I, etc. think is faster, the best way to make sure is to benchmark, and that's what the original poster should do.
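Along those lines, a minimal benchmark in mIRC is just a matter of comparing $ticks before and after the call (using the $findline example from the original post; substitute whatever alias you're testing):

```
alias benchfind {
  var %t = $ticks
  var %result = $findline(file.txt, *string*, 2)
  ; $ticks is milliseconds since system start, so the difference is the elapsed time
  echo -a findline: %result in $calc($ticks - %t) ms
}
```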
Joined: Dec 2002
Posts: 2,884
Hoopy frood
Well, presumably if the alias is used enough, the number of lines that need to be read in any given file should average out to half the total number of lines, so over a large number of uses /f* would be roughly twice as fast. And even on an individual search where the returned line happens to be the last in the file, it should be at least a tiny bit faster, because with /filter you must also iterate over the window afterwards.
As you said though, benchmarks should be made of both methods regardless of theory.
shawnrgr
I started putting it together. Can you guys walk me in the right direction, or just let me know if I'm going about this the right way? Here is what I have so far:
;gline "get line" $gline(file,*string*,N) (if N = 0, return total)
gline {
  var %file = $1, %string = $2, %num = $3, %tot = 0
  if ($fopen(ff).fname) { .fclose ff }
  .fopen ff %file
  if (%num == 0) {
    while (!$feof) {
      .fseek -w ff %string
      if ($fread(ff)) { inc %tot }
    }
    .fclose ff
    if (%tot >= 1) { return %tot }
    else { return $null }
  }
}
Joined: Dec 2002
Posts: 2,884
Hoopy frood
Well, I haven't used the /f* commands that much, so maybe I'm doing something stupid here, but I found the /filter equivalent to be much faster than /f* (and a hell of a lot more complex too).

;gline "get line" $gline(file,*string*,N) (if N = 0, return total)
gline {
  var %file = $1, %string = $2, %num = $int($3), %r = 0
  if ($fopen(ff)) .fclose ff
  .fopen ff %file
  if ($ferr) return $null
  if (%num == 0) {
    while (!$feof) {
      .fseek -w ff %string
      if ($fread(ff)) inc %r
    }
  }
  else {
    var %line
    while (!$feof) {
      .fseek -w ff %string
      if ($fread(ff)) {
        inc %r
        %line = $ifmatch
        if (%num == %r) {
          var %r = %line
          break
        }
      }
    }
  }
  .fclose ff
  return %r
}
gfline {
  var %file = $1, %string = $2, %num = $int($3)
  window -h @gfline_find
  filter -fwc %file @gfline_find %string
  var %r = $line(@gfline_find, %num)
  window -c @gfline_find
  return %r
}

$gfline(), which uses /filter, works out over twice as fast in my benchmark (56ms vs. 122ms). It did cause some minor screen flicker, though, which could get very annoying if it's being used a lot.

Edit: Missed a pair of braces
Last edited by starbucks_mafia; 06/03/04 12:18 AM.
shawnrgr
Well, I did a bunch of benchmarks. The result? /filter has proven to be much faster than /fseek. Here was my echo:
- filter : found 3003 : 40ms - seek : found 3003 : 520ms -
Joined: Dec 2002
Posts: 2,884
Hoopy frood
Just goes to show that things should always be benchmarked instead of relying on logic to decide which method to use. The parsing of the while loop just can't match /filter's internal system even when it's only operating on a fraction of the file.
Joined: Jan 2003
Posts: 2,125
Hoopy frood
Agreed. I guess it was all about how much is "much". Another thing I had in mind but somehow forgot to include in my previous post is the handling of N = 0. For this functionality, /f* would have to scan the entire file and increment a variable in the while loop etc. This could even be slower than a filter -ff file.txt nul *matchtext* | return $filtered.

Quote: "...because with /filter you must also iterate over the window afterwards"

Actually you don't. You can do something like: filter -fw file.txt @hidden *matchtext* | return $line(@hidden,$3)
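Putting those two tricks together, a /filter-only version of the alias might look something like this (a sketch, not tested; "gfline2" is just a placeholder name):

```
; $gfline2(file,*string*,N) - N = 0 returns the total number of matches
alias gfline2 {
  if ($int($3) == 0) {
    ; filter from file to the nul device: nothing is written,
    ; but $filtered still holds the match count
    filter -ff $1 nul $2
    return $filtered
  }
  window -h @gfline2
  filter -fwc $1 @gfline2 $2
  var %r = $line(@gfline2, $3)
  window -c @gfline2
  return %r
}
```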
Joined: Jan 2003
Posts: 2,125
Hoopy frood
These apparently contradictory results teach an important lesson in benchmarking: the relative performance of two or more routines depends greatly on the type of input. The aliases you checked would most probably show different results when applied to files of different lengths and with different matchtext. By observing the results, you can often construct models good enough to explain them, but it's usually quite hard to make such good models beforehand, i.e. before the benchmark results are in. For example, $gline(versions-full.txt,*join*,0) is faster than $gfline(versions-full.txt,*join*,0) by about 10% here (versions-full.txt is the full versions.txt, which is 446 KB).
Last edited by qwerty; 06/03/04 01:09 AM.
shawnrgr
Yeah, it's obviously good to benchmark. Thanks for the help, guys. One more thing: since I obviously have a lot of info in this file (which is an INI, by the way), I thought I saw somewhere something about "flushing" an INI frequently because INIs are cached in memory (I think it's /flushini). Is this true? And if so, would it help performance if I'm writing to this INI an average of 30-50 times a minute?