#237051 12/04/12 02:40 PM
macrobody
Hello all scripters out there. I have the following problem.
I have text files of around 5 to 10 GB. I use /filter with the following syntax: "filter -cffgn %Email %EmailTempFile track\.php" to filter all lines containing track.php. This works like a charm and is really fast, but I also need the next line and the line after that. This is where my problem starts, because at the moment I use the following method: first I filter all lines to a temporary file, and then I go back to the original file with each line number to get the next 2 lines. This can take up to 15 minutes depending on processor power. I'm looking for a faster solution. Does anyone have an idea?

#237053 12/04/12 02:57 PM
5618
Maybe use...
Code:
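alias -l emailtemp {
  ; with -n, /filter prefixes each line with its line number
  var %n = $gettok($1,1,32)
  ; the n switch makes $read() return the line as plain text instead of evaluating it
  write %EmailTempFile $read(%Email,n,$calc(%n + 1))
  write %EmailTempFile $read(%Email,n,$calc(%n + 2))
}
filter -fkgn %Email emailtemp track\.php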

I hope the $read() won't slow it down too much.
You will need to clear the output file before calling the filter command.
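
A minimal sketch of clearing the output file first, assuming the emailtemp alias above lives in the same script file (it is declared local with -l) and that %Email and %EmailTempFile hold paths without spaces. The alias name run_emailfilter is made up for this example.
Code:
alias run_emailfilter {
  ; run_emailfilter is a hypothetical name, not part of the original code
  ; -c clears the file before writing, so results from an earlier run are dropped
  write -c %EmailTempFile
  filter -fkgn %Email emailtemp track\.php
}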

#237055 12/04/12 03:22 PM
Wims
You don't need regex here, and I think wildcards are faster. Use *track.php*

Code:
alias -l emailtemp {
  var %n = $gettok($1,1,32)
  ; the next lines are $read(%Email,$calc(%n + 1)) and $read(%Email,$calc(%n + 2))
}
//filter -fkn %Email emailtemp *track.php*


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
#237109 14/04/12 04:18 AM
macrobody
Begin at 10:27:00
End at 11:07:19

40 minutes, so this didn't speed up the process, sorry

Originally Posted By: 5618
Maybe use... [the emailtemp alias with two $read() calls per match, see #237053]

#237110 14/04/12 04:52 AM
macrobody
Begin at 11:22:22
End at 11:48:58

26 minutes. So yes, wildcards are faster, but I would still like to have it done within a few minutes at most.


Originally Posted By: Wims
You don't need regex here and I think wildcard are faster, use *track.php*

[the emailtemp alias with the *track.php* wildcard filter, see #237055]

#237111 14/04/12 04:59 AM
Twitch
Try this:
Code:
/filter -fkn %Email trackfilt *track.php*

alias trackfilt {
  var %ln = $gettok($1,1,32)
  filter -ffr  $+(%ln,-,$calc(%ln + 2)) %Email %EmailTempFile *
}

#237113 14/04/12 08:40 AM
macrobody
Thanks for all the help, guys. I took a bit of what was recommended, added a bit of my own, and this is the result:

Code:
alias do_now {
  echo 4 -sn Begin at $asctime(HH:nn:ss)
  fopen filestream %Email
  filter -fkn %Email EmailTemp *track.php*
  filter -fkn %Email EmailTemp *mailklik*
  ...
  filter -fkn %Email EmailTemp *expreg*
  fclose filestream
  echo 4 -sn End at $asctime(HH:nn:ss)
}
alias -l EmailTemp {
  ; this gets the matching line number
  var %n = $gettok($1,1,32)
  fseek -l filestream %n
  set %LinkInEmailBegin $fread(filestream)
  fseek -l filestream $calc(%n +1)
  set %LinkInEmailMiddle $fread(filestream)
  fseek -l filestream $calc(%n +2)
  set %LinkInEmailEnd $fread(filestream)
  set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
  ; here we modify the total link and write it to a file
  ...
}

Processing time is now reduced to about 4 minutes and I don't think it can be any faster than that but that is ok.
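
Since $fread() moves the file pointer on to the next line after each read, one /fseek per match should be enough. A minimal, untimed sketch of the same alias under that assumption; the local variable names are made up for this example, and the link handling that follows is left out.
Code:
alias -l EmailTemp {
  ; jump to the matching line once
  var %n = $gettok($1,1,32)
  fseek -l filestream %n
  ; each $fread() returns the current line and advances to the next one
  var %begin = $fread(filestream)
  var %middle = $fread(filestream)
  var %end = $fread(filestream)
  set %LinkInEmail %begin $+ %middle $+ %end
}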

The only thing left is that it displays this in the status window:
-
* fseek set 'filestream' to line 74917
-

I really don't need to see that. How do I get rid of it?

#237114 14/04/12 09:06 AM
macrobody
This seems to do it within 13 minutes, so it's faster but not yet as fast as my latest code.

Originally Posted By: Twitch
try this; [the trackfilt alias using /filter -ffr, see #237111]

#237116 14/04/12 02:07 PM
Wims
You can hide the output of a command by using the dot prefix:
/.fseek instead of /fseek


#237117 14/04/12 02:10 PM
Twitch
Glad to see you got it working faster. =)

To silence commands, just prefix the command with a '.', e.g. '.fopen', '.fseek', '.fclose'
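
A short sketch of the silenced file-handle commands used together, assuming the handle name filestream and the %Email variable from the code earlier in the thread. The alias name open_stream is made up for this example. Checking $fopen() first guards against a handle left open by an aborted run, and $ferr reports whether the last file command failed.
Code:
alias open_stream {
  ; open_stream is a hypothetical name, not part of the original code
  ; close a handle left over from an earlier, aborted run
  if ($fopen(filestream)) .fclose filestream
  ; the dot prefix hides the status window message
  .fopen filestream %Email
  if ($ferr) {
    echo 4 -s Could not open %Email
    if ($fopen(filestream)) .fclose filestream
    return
  }
  .fseek -l filestream 1
  echo 4 -s First line: $fread(filestream)
  .fclose filestream
}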

#237182 16/04/12 02:27 PM
macrobody
Thanks for the dot prefix, I didn't know about that yet.

I created a new file of 10 MB and it took 18 minutes with my new code.

I think I'll go back to the drawing board...

#237412 02/05/12 12:25 PM
macrobody
OK, my last piece of code should be ignored, as it was doing weird stuff.
This is my new and probably ultimate solution for handling big text files in mIRC when the objective is to search for a keyword and get the next 2 lines.

Code:
/test7 {
  echo 6 -sn Begin test7 at $asctime(HH:nn:ss)
  var %nbr.line 1
  ;first word to search for
  set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
  while (%nbr.line <= $lines(%loc $+ wordlist.txt)) {
    .fopen filestream D:\mIRC_email\kopie.txt
    .fseek -w filestream %word.to.search
    while (!$ferr && !$feof) {
      set %LinkInEmailBegin $wildtok($fread(filestream),%word.to.search,1,62)
      set %LinkInEmailMiddle $fread(filestream),$chr(39),$chr(34)
      set %LinkInEmailEnd $fread(filestream),$chr(39),$chr(34)
      set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
      ;here I do some modifying of the long link before writing it to a file
      .fseek -w filestream %word.to.search
    }
    .fclose filestream
    inc %nbr.line
    ;next word to search for
    set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
  }
  echo 6 -sn End test7 at $asctime(HH:nn:ss)
}


The file D:\mIRC_email\kopie.txt was 10 MB and the result was:

Begin test7 at 18:44:55
End test7 at 18:46:12

So 1 minute and 17 seconds, and it created 2161 links!
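
The elapsed time is the difference between the two timestamps: 18:46:12 minus 18:44:55 is 77 seconds. As a sketch, $ticks can compute this directly instead of reading it off two echoes. The alias name timed_test7 is made up for this example, and it assumes the test7 alias above is loaded.
Code:
alias timed_test7 {
  ; timed_test7 is a hypothetical name, not part of the original code
  ; $ticks is the number of milliseconds since the system started
  var %start = $ticks
  test7
  ; elapsed whole seconds, also shown in readable form via $duration()
  var %secs = $int($calc(($ticks - %start) / 1000))
  echo 6 -sn test7 took %secs seconds, i.e. $duration(%secs)
}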

Now I'm a happy puppy :)

Last edited by macrobody; 02/05/12 12:28 PM.
