#237051 12/04/12 02:40 PM
Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Hello all scripters out there. I have the following problem.
I have text files of around 5 to 10 GB. I use /filter with the following syntax: "filter -cffgn %Email %EmailTempFile track\.php" to filter out every line that contains track.php. This works like a charm and is really fast, but I also need the two lines that follow each match. This is where my problem starts. At the moment I first filter all matching lines to a temporary file, and then I go back to the original file with each line number to fetch the next 2 lines. This can take up to 15 minutes, depending on processor power. I'm looking for a faster solution. Does anyone have an idea?

Joined: Jun 2007
Posts: 933
5
Hoopy frood
Offline
Maybe use...
Code:
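alias -l emailtemp {
  ; with -n, each line passed to this alias starts with its line number
  var %n = $gettok($1,1,32)
  ; the n switch makes $read return the line as plain text instead of evaluating it
  write %EmailTempFile $read(%Email,n,$calc(%n + 1))
  write %EmailTempFile $read(%Email,n,$calc(%n + 2))
}
filter -fkgn %Email emailtemp track\.php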

I hope the $read() won't slow it down too much.
You will need to clear the output file before calling the filter command.
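
A minimal sketch of that clearing step, assuming a hypothetical wrapper alias named grab_track that lives in the same script file as the local emailtemp alias. /write -c with no text should leave the output file empty before the alias starts appending lines to it:

```mirc
alias grab_track {
  ; grab_track is a made-up name, not from the thread
  ; -c clears the file; with no text given, the file is simply emptied
  write -c %EmailTempFile
  ; every matching line is then handed to the emailtemp alias
  filter -fkgn %Email emailtemp track\.php
}
```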

Joined: Jul 2006
Posts: 4,159
W
Hoopy frood
Offline
You don't need a regex here, and I think wildcards are faster; use *track.php*

Code:
alias -l emailtemp {
  var %n = $gettok($1,1,32)
  ; the next two lines are $read(%Email,n,$calc(%n + 1)) and $read(%Email,n,$calc(%n + 2))
}
filter -fkn %Email emailtemp *track.php*


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
5618 #237109 14/04/12 04:18 AM
Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Begin at 10:27:00
End at 11:07:19

40 minutes with 5618's $read() version, so this didn't speed up the process, sorry.

Wims #237110 14/04/12 04:52 AM
Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Begin at 11:22:22
End at 11:48:58

26 minutes with the wildcard version Wims suggested. So yes, wildcards are faster, but I'd still like to have it done within a few minutes at most.

Joined: Mar 2012
Posts: 38
Ameglian cow
Offline
Try this:
Code:
/filter -fkn %Email trackfilt *track.php*

alias trackfilt {
  var %ln = $gettok($1,1,32)
  filter -ffr $+(%ln,-,$calc(%ln + 2)) %Email %EmailTempFile *
}


Lost in your digital reality.
#mIRC / #Helpdesk on DALnet.
Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Thanks for all the help, guys. I took a bit of what was recommended, added a bit of my own, and this is the result:

Code:
alias do_now {
  echo 4 -sn Begin at $asctime(HH:nn:ss)
  fopen filestream %Email
  filter -fkn %Email EmailTemp *track.php*
  filter -fkn %Email EmailTemp *mailklik*
  ...
  filter -fkn %Email EmailTemp *expreg*
  fclose filestream
  echo 4 -sn End at $asctime(HH:nn:ss)
}
alias -l EmailTemp {
  ; this gets the matching line number
  var %n = $gettok($1,1,32)
  fseek -l filestream %n
  set %LinkInEmailBegin $fread(filestream)
  fseek -l filestream $calc(%n +1)
  set %LinkInEmailMiddle $fread(filestream)
  fseek -l filestream $calc(%n +2)
  set %LinkInEmailEnd $fread(filestream)
  set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
  ; here we modify the total link and write it to a file
  ...
}

Processing time is now reduced to about 4 minutes. I don't think it can get any faster than that, but that is OK.

The only thing left is that it displays this in the status window:
-
* fseek set 'filestream' to line 74917
-

I really don't need to see that. How do I get rid of it?

Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Twitch's version (the nested /filter -r call) does it in about 13 minutes, so it's faster but not yet as fast as my latest code.

Joined: Jul 2006
Posts: 4,159
W
Hoopy frood
Offline
You can hide the output of a command by using the dot prefix:
/.fseek instead of /fseek


Joined: Mar 2012
Posts: 38
Ameglian cow
Offline
Glad to see you got it working faster. =)

To silence a command, just prefix it with a '.', e.g. '.fopen', '.fseek', '.fclose'.
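
Applied to the EmailTemp alias from earlier in the thread, the dot prefix might look like the sketch below. It also assumes that $fread() leaves the file pointer at the start of the following line, so a single seek is followed by three reads; %begin, %middle and %end are illustrative local names, not from the original code.

```mirc
alias -l EmailTemp {
  ; with -n, the text passed by /filter starts with the matching line number
  var %n = $gettok($1,1,32)
  ; the leading dot keeps "* fseek set 'filestream' to line N" out of the status window
  .fseek -l filestream %n
  ; assumption: each $fread() returns one line and moves on to the next,
  ; so lines %n, %n + 1 and %n + 2 come back in order without seeking again
  var %begin = $fread(filestream)
  var %middle = $fread(filestream)
  var %end = $fread(filestream)
  set %LinkInEmail $+(%begin,%middle,%end)
  ; the "modify the link and write it to a file" step elided in the thread would follow here
}
```

Whether dropping the two extra seeks saves measurable time on a large file has not been tested here.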


Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
Thanks for the dot prefix; I didn't know about that yet.

I created a new 10 MB file and it took 18 minutes with my new code.

I think I'll go back to the drawing board...

Joined: Sep 2004
Posts: 10
M
Pikka bird
OP Offline
OK, my last piece of code should be ignored, as it was doing weird stuff.
This is my new and probably ultimate solution for handling big text files in mIRC when the objective is to search for a keyword and get the next 2 lines.

Code:
/test7 {
  echo 6 -sn Begin test7 at $asctime(HH:nn:ss)
  var %nbr.line 1
  ;first word to search for
  set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
  while (%nbr.line <= $lines(%loc $+ wordlist.txt)) {
    .fopen filestream D:\mIRC_email\kopie.txt
    .fseek -w filestream %word.to.search
    while (!$ferr && !$feof) {
      set %LinkInEmailBegin $wildtok($fread(filestream),%word.to.search,1,62)
      set %LinkInEmailMiddle $fread(filestream),$chr(39),$chr(34)
      set %LinkInEmailEnd $fread(filestream),$chr(39),$chr(34)
      set %LinkInEmail %LinkInEmailBegin $+ %LinkInEmailMiddle $+ %LinkInEmailEnd
      ;here I do some modifying of the long link before writing it to a file
      .fseek -w filestream %word.to.search
    }
    .fclose filestream
    inc %nbr.line
    ;next word to search for
    set %word.to.search * $+ $read(%loc $+ wordlist.txt,%nbr.line) $+ *
  }
  echo 6 -sn End test7 at $asctime(HH:nn:ss)
}


The file D:\mIRC_email\kopie.txt was 10 MB and the result was:

Begin test7 at 18:44:55
End test7 at 18:46:12

So 1 minute and 17 seconds, and it created 2161 links!

Now I'm a happy puppy :)
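
The begin and end echoes can also be turned into an elapsed time directly. A rough sketch, assuming a hypothetical wrapper alias named timed_test7; $ticks counts milliseconds, so the difference is divided by 1000:

```mirc
alias timed_test7 {
  ; timed_test7 is a made-up name; it only wraps the test7 alias shown above
  var %start = $ticks
  test7
  ; elapsed time in seconds, from the millisecond tick counter
  var %secs = $calc(($ticks - %start) / 1000)
  ; $duration() formats a whole number of seconds as minutes and seconds
  echo 6 -sn test7 took %secs seconds, i.e. $duration($int(%secs))
}
```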

Last edited by macrobody; 02/05/12 12:28 PM.
