OP
Pikka bird
Joined: Sep 2004
Posts: 10
Hello all scripters out there. I have the following problem: I have text files of around 5 to 10 GB. I use /filter with the following syntax to filter all lines containing track.php: "filter -cffgn %Email %EmailTempFile track\.php". This works like a charm and is really fast, but I also need the next line and the line after that. This is where my problem starts, because at the moment I use the following method: first I filter all matching lines to a temporary file, and then I go back to the original file with each line number to get the next 2 lines. This can take up to 15 minutes, depending on processor power. I'm looking for faster solutions. Who has an idea?

Hoopy frood
Joined: Jun 2007
Posts: 933
Maybe use...
alias -l emailtemp {
  ;this gets the matching line number
  var %n = $gettok($1,1,32)
  ;the n switch stops $read() from evaluating the contents of the line it reads
  write %EmailTempFile $read(%Email,n,$calc(%n +1))
  write %EmailTempFile $read(%Email,n,$calc(%n +2))
}
filter -fkgn %Email emailtemp track\.php
I hope the $read() won't slow it down too much. You will need to clear the output file before calling the filter command.

Hoopy frood
Joined: Jul 2006
Posts: 4,187
You don't need regex here, and I think wildcards are faster; use *track.php*:
alias -l emailtemp {
  var %n $gettok($1,1,32)
  ;the next lines are $read(%Email,$calc(%n +1)) and $read(%Email,$calc(%n +2))
}
//filter -fkn %Email emailtemp *track.php*
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
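
The suggestions above rely on /filter handing each matching line to an alias. A minimal sketch of that mechanism, assuming %Email holds the input file name as in the thread; the alias names showmatch and showmatches are made up for illustration:

```mirc
; Hypothetical helper alias, for illustration only.
alias -l showmatch {
  ; with -n, each matching line arrives as "<line number> <original text>"
  var %n = $gettok($1,1,32)
  echo -s Match on line %n $+ : $gettok($1,2-,32)
}
; Hypothetical wrapper alias that starts the search.
alias showmatches {
  ; -f = the input is a file, -k = pass each match to an alias, -n = prefix the line number
  filter -fkn %Email showmatch *track.php*
}
```

The line number in the first token is what the replies use to locate the lines that follow the match.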

OP
Pikka bird
Joined: Sep 2004
Posts: 10
Begin at 10:27:00, end at 11:07:19: 40 minutes, so the $read() version suggested above didn't speed up the process, sorry.

OP
Pikka bird
Joined: Sep 2004
Posts: 10
Begin at 11:22:22, end at 11:48:58: 26 minutes. So yes, wildcards are faster, but I'd still like to have it done within a few minutes at most.

Ameglian cow
Joined: Mar 2012
Posts: 38
try this;
/filter -fkn %Email trackfilt *track.php*
alias trackfilt {
var %ln = $gettok($1,1,32)
filter -ffr $+(%ln,-,$calc(%ln + 2)) %Email %EmailTempFile *
}
Lost in your digital reality. #mIRC / #Helpdesk on DALnet.

OP
Pikka bird
Joined: Sep 2004
Posts: 10
Thanks for all the help, guys. I took a bit of what was recommended, added a bit of my own, and this is the result:
alias do_now {
echo 4 -sn Begin at $asctime(HH:nn:ss)
fopen filestream %Email
filter -fkn %Email EmailTemp *track.php*
filter -fkn %Email EmailTemp *mailklik*
...
filter -fkn %Email EmailTemp *expreg*
fclose filestream
echo 4 -sn End at $asctime(HH:nn:ss)
}
alias -l EmailTemp {
; this gets the matching line number
var %n = $gettok($1,1,32)
fseek -l filestream %n
set %LinkInEmailBegin $fread(filestream)
fseek -l filestream $calc(%n +1)
set %LinkInEmailMiddle $fread(filestream)
fseek -l filestream $calc(%n +2)
set %LinkInEmailEnd $fread(filestream)
set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
; here we modify the total link and write it to a file
...
}
Processing time is now reduced to about 4 minutes. I don't think it can get any faster than that, but that is OK. The only thing left is that it displays this in the status window:
* fseek set 'filestream' to line 74917
I really don't need to see that. How do I get rid of it?

OP
Pikka bird
Joined: Sep 2004
Posts: 10
The /filter -ffr version suggested above seems to do it within 13 minutes, so it's faster, but not yet as fast as my latest code.

Hoopy frood
Joined: Jul 2006
Posts: 4,187
You can hide the output of a command by using the dot prefix: /.fseek instead of /fseek

Ameglian cow
Joined: Mar 2012
Posts: 38
Glad to see you got it working faster. =)
To silence commands, just prefix them with a '.', e.g. '.fopen', '.fseek', '.fclose'.

OP
Pikka bird
Joined: Sep 2004
Posts: 10
Thanks for the dot prefix, I didn't know about that yet.
I created a new file of 10 MB and it took 18 minutes with my new code.
I think I'll go back to the drawing board...

OP
Pikka bird
Joined: Sep 2004
Posts: 10
OK, my last piece of code should be ignored, as it was doing weird stuff. This is my new and probably ultimate solution for handling big text files in mIRC when the objective is to search for a keyword and get the next 2 lines:
/test7 {
echo 6 -sn Begin test7 at $asctime(HH:nn:ss)
var %nbr.line 1
;first word to search for
set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
while (%nbr.line <= $lines(%loc $+ wordlist.txt)) {
.fopen filestream D:\mIRC_email\kopie.txt
.fseek -w filestream %word.to.search
while (!$ferr && !$feof) {
set %LinkInEmailBegin $wildtok($fread(filestream),%word.to.search,1,62)
set %LinkInEmailMiddle $fread(filestream),$chr(39),$chr(34)
set %LinkInEmailEnd $fread(filestream),$chr(39),$chr(34)
set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
;here I do some modifying of the long link before writing it to a file
.fseek -w filestream %word.to.search
}
.fclose filestream
inc %nbr.line
;next word to search for
set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
}
echo 6 -sn End test7 at $asctime(HH:nn:ss)
}
The file D:\mIRC_email\kopie.txt was 10 MB and the result was:
Begin test7 at 18:44:55
End test7 at 18:46:12
So 1 minute and 17 seconds, and it created 2161 links. Now I'm a happy puppy.
Last edited by macrobody; 02/05/12 12:28 PM.
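
The seek-and-read loop in the final post can be boiled down to a minimal sketch. The alias name next2, the handle name src, and the two arguments are made up for illustration. It assumes a file path without spaces, has no error handling beyond the loop condition, and only echoes the joined lines instead of modifying them and writing them to a file.

```mirc
; /next2 <file> <word>  (hypothetical alias, not from the thread)
alias next2 {
  ; build the wildcard the same way the thread does: *word*
  var %match = * $+ $2 $+ *
  .fopen src $1
  ; move the file pointer to the first line that matches the wildcard
  .fseek -w src %match
  while (!$ferr && !$feof) {
    ; three reads in a row: the matching line and the two lines after it
    echo -s Found: $fread(src) $+ $fread(src) $+ $fread(src)
    ; continue the search from the current position
    .fseek -w src %match
  }
  .fclose src
}
```

Each $fread() leaves the pointer at the start of the following line, which is why no separate seek is needed to reach the two lines after the match.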