#209535 17/02/09 12:19 AM
Joined: Feb 2007
Posts: 234
M
MTec007 Offline OP
Fjord artisan
what would the regex be to do the following?

replace everything from the start of the string up to and including 'SERVER:' with nothing and everything after and including ':6667' with nothing

n0=something: Random serverSERVER:irc.someone.org:6667GROUP:#somewhere

results would be irc.someone.org

MTec007 #209536 17/02/09 12:41 AM
Joined: Jul 2006
Posts: 4,172
W
Hoopy frood
Replacing what you don't want with $null is more complicated than just capturing what you want. Try:
Code:
//noop $regex(%a,/SERVER:(.+\..+\..+):\d+/) | echo -a > $regml(1)
I'm not sure if SERVER is something replaced by a server name or if it's plain text.
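
If SERVER: is plain text, a slightly tighter variant of the same idea can be wrapped in a custom identifier. This is only a sketch: getserver is a made-up name, and it assumes the host itself never contains a colon and is always followed by a numeric port.

```mirc
; Hypothetical identifier: returns the host found between "SERVER:" and ":<port>".
; Assumption: the host contains no colon and the port is all digits.
alias getserver {
  ; [^:]+ captures everything up to the next colon, \d+ requires a numeric port after it
  if ($regex($1,/SERVER:([^:]+):\d+/)) return $regml(1)
}
```

With the sample line from the first post, //echo -a $getserver(n0=something: Random serverSERVER:irc.someone.org:6667GROUP:#somewhere) should echo irc.someone.org. Unlike the (.+\..+\..+) pattern, this version does not require the host to contain two dots.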

Last edited by Wims; 17/02/09 12:53 AM.

#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Wims #209540 17/02/09 01:08 AM
Joined: Feb 2007
Posts: 234
M
MTec007 Offline OP
Fjord artisan
that worked great! what is the best way to remove all duplicate lines in a file?

MTec007 #209544 17/02/09 01:40 AM
Joined: Aug 2004
Posts: 7,252
R
Hoopy frood
With an ini file, you can't have the exact same line more than once in each section, and you can't have the same section name more than once.

With a text file, the easiest way is to use the /filter command

RusselB #209546 17/02/09 02:44 AM
Joined: Feb 2007
Posts: 234
M
MTec007 Offline OP
Fjord artisan
It was an ini file, but I'm converting it to a text file.

I'm not sure how to use /filter.

I tried /filter -x %file %file * but it didn't do anything that I could notice.

MTec007 #209552 17/02/09 01:48 PM
Joined: Jul 2006
Posts: 4,172
W
Hoopy frood
Quote:
With a text file, the easiest way is to use the /filter command
How? /filter isn't really powerful enough to match duplicate lines.

Why do you have duplicate lines? You should be able to avoid this before writing to your text file.


Wims #209569 17/02/09 11:07 PM
Joined: Feb 2007
Posts: 234
M
MTec007 Offline OP
Fjord artisan
I didn't write the file, I simply downloaded it, but I need to remove the duplicate lines.

MTec007 #209570 18/02/09 12:00 AM
Joined: Jul 2006
Posts: 4,172
W
Hoopy frood
To remove duplicate lines, you can use /filter, but you have to use another alias and another text file:
Code:
alias rm_dup_lines filter -k "yourfile.txt" call_back * | filter -c "tempfile.txt" "yourfile.txt" * | .remove "tempfile.txt"
alias call_back if (!$read("tempfile.txt",w,$1-*)) write "tempfile.txt" $1-
Another solution is to use a @window, /aline -n and /savebuf, but in fact RusselB was right, filter is certainly better.


Wims #209588 18/02/09 02:53 PM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Why is this method better than /aline -n? The main use of /filter here is just to loop through the lines in the file. /filter could still be used with /aline -n for the same purpose. The actual differences here are:
  • instead of using a hidden window as workspace, you are using a file (tempfile.txt)
  • instead of /aline -n, you are using $read() to check for lines already encountered
So the $read()-based method is essentially the same, except much slower because for each line in the original file, it uses $read() on a temp file to scan for the presence of the line (meaning opens the temp file, looks at each line until a match is found, closes the file) and then calls /write (meaning opening the file, writing a line at the end, closing the file).

/aline -n uses the same (fundamentally slow) approach of scanning all lines in the workspace (until a match is found), for each line in the source. However this is all done in a hidden window, so the process is faster.

A different approach that is much faster in theory (and in practice) is to use hash tables to store the lines encountered so far. Due to the way hash tables work, checking if a new line in the source has been encountered is very fast. Here are all 3 methods:
Code:
rmduplines1 {
  write -c temp.txt
  filter -k $1- rmduplines1-callback ?*
}
rmduplines1-callback {
  if (* !iswm $read(temp.txt,nw,$1)) write temp.txt $1
}

rmduplines2 {
  window -h @a
  filter -k $1- rmduplines2-callback ?*
  savebuf @a temp.txt
  close -@ @a
}
rmduplines2-callback {
  if (* iswm $gettok($1,1-,32)) aline -n @a $1
}

rmduplines3 {
  window -h @a
  filter -k $1- rmduplines3-callback ?*
  savebuf @a temp.txt
  close -@ @a
  hfree -w lines
}
rmduplines3-callback {
  if (* iswm $gettok($1,1-,32)) {
    if (!$hget(lines,$crc($1,0))) aline @a $1
    hadd -m lines $crc($1,0) 1
  }
}


I made some minor modifications to your original code, like adding the n switch in $read and removing the asterisk from $1-: you want to scan for lines equal to $1-, not starting with $1-. Another minor modification was replacing the if (!$read()) check with if (* !iswm $read()), which is equivalent to if ($read() != $null): this way it won't fail with "$false" or "0" etc.

As a side note, using $read with the w switch would break if $1- itself contained wildcards. A way to avoid this is to use the s switch instead and also check $readn.

All 3 rmduplinesN aliases accept a filename and create the file temp.txt with duplicate lines removed.

I timed the 3 methods using the full versions.txt (more than 11,000 lines) and got the following results from one run (subsequent runs produced similar numbers):

rmduplines1 time: 31949 ms
rmduplines2 time: 3588 ms
rmduplines3 time: 873 ms

The $read/write method is by far the slowest, with the hash table method being the fastest.
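
The timing code itself isn't shown above. One plausible way to take such a measurement, as a sketch only, is to wrap the call between two $ticks readings. The alias name timedup1 is hypothetical, and it assumes the rmduplines1 alias above is already loaded.

```mirc
; Hypothetical timing wrapper, not the code used for the numbers above.
; Usage: /timedup1 versions.txt
alias timedup1 {
  ; $ticks is a millisecond counter, so the difference is the elapsed time
  var %start = $ticks
  rmduplines1 $1-
  echo -a rmduplines1 time: $calc($ticks - %start) ms
}
```

The same wrapper can be copied for rmduplines2 and rmduplines3 by changing the alias name in both places. Absolute numbers will depend on the machine and the input file, so only the relative ordering is meaningful.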




/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
qwerty #209589 18/02/09 03:36 PM
Joined: Jul 2006
Posts: 4,172
W
Hoopy frood
Quote:
Why is this method better than /aline -n? The main use of /filter here is just to loop through the lines in the file. /filter could still be used with /aline -n for the same purpose.
True, but when I wrote the reply I wasn't thinking of /filter doing the loop (I know it does, but it slipped my mind at the time).

I like your demonstration, even though I was already aware of what you said and showed :)
Changing my code, even in minor ways, into something better is good too.

There is only one thing I'm wondering now:
Quote:
Another minor modification was replacing the if (!$read()) check with if (* !iswm $read()), which is equivalent to if ($read() != $null): this way it won't fail with "$false" or "0" etc.
Then why did you choose the "* !iswm" method instead of "!= $null"?

Also, since the purpose is to remove duplicate lines, why /savebuf to temp.txt instead of the actual input file (of course that could not be applied to the first example)?


Wims #209590 18/02/09 03:43 PM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Quote:
Then why did you choose the "* !iswm" method instead of "!= $null"?
This is just a little habit of mine (coming from the fact that it's a tiny bit faster than checking $null).

Quote:
Also, since the purpose is to remove duplicate lines, why /savebuf to temp.txt instead of the actual input file (of course that could not be applied to the first example)?
I did that so that the source file wouldn't be ruined in case I (or anybody else testing this) screwed up somewhere :)


qwerty #209591 18/02/09 04:26 PM
Joined: Jul 2006
Posts: 4,172
W
Hoopy frood
Perfectly clear now, thanks


MTec007 #209605 19/02/09 03:35 AM
Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Originally Posted By: MTec007
what would the regex be to do the following?

replace everything from the start of the string up to and including 'SERVER:' with nothing and everything after and including ':6667' with nothing

n0=something: Random serverSERVER:irc.someone.org:6667GROUP:#somewhere

results would be irc.someone.org


You may as well just use $gettok(n0=something: Random serverSERVER:irc.someone.org:6667GROUP:#somewhere,3,58)
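
For reference, 58 is the ASCII code for ":", so $gettok returns the third colon-separated token of the line. A minimal sketch of the same idea as a reusable identifier follows; servertok is a made-up name, and it assumes the text before SERVER: always contains exactly one colon, as in the sample line.

```mirc
; Hypothetical identifier: third ":"-delimited token of the given line.
; Assumption: exactly one colon appears before "SERVER:", otherwise the index shifts.
alias servertok {
  ; 58 = ASCII code of ":"
  return $gettok($1,3,58)
}
```

For the sample line the tokens are "n0=something", " Random serverSERVER", "irc.someone.org", "6667GROUP" and "#somewhere", so token 3 is the host. If the description part can contain extra colons, the regex approach is the safer choice.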

hixxy #209617 19/02/09 05:58 PM
Joined: Feb 2007
Posts: 234
M
MTec007 Offline OP
Fjord artisan
thanks to all, this is working fine now.

