MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Greetings,

I'm seeking help in writing an alias to split up a huge text file. It contains 100,000 entries, one entry per line (100,000 lines in length), and I need it split into files containing 5,000 entries (5,000 lines) each.

Ideally, the alias would also append a generated number to the output filename for each 5,000 entries stored, i.e. text1.txt, text2.txt, and so on.

Here is a snippet of one of my crude attempts to just pull off the first 5,000 entries, but it's not working:

alias TextSplitter {
  set %Textnumb 1
  set %Textnum 5000
  :next
  if (%Textnumb > %Textnum) { goto finish }
  set %info $read(C:\1Report\text.txt,%Textnumb)
  .write C:\1Report\text1.txt %info
  set %info $null
  inc %Textnumb
  goto next
  :finish
}

Again, this crude attempt of mine doesn't include any updating of the output filename; it was just my attempt to peel off the top 5,000 entries as an example here.

Thanks to each for your comments and suggestions,
MDA

tidy_trax · Hoopy frood · Joined: Nov 2003 · Posts: 2,327
Use /filter and make use of the n-n2 line ranges :)

/help /filter


New username: hixxy
MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Greetings Matt,

I'm not filtering lines; I'm trying to rewrite the entire whopper 100,000-line file into several manageable 5,000-line text files.

Thanks anyway,
MDA

tidy_trax · Hoopy frood · Joined: Nov 2003 · Posts: 2,327
I understand what you want to do and my suggestion stays the same. /filter will be (one of) the quickest way(s) to do that:

Code:
alias hugefiltertest {
  var %i = 1, %ticks = $ticks
  if (!$isdir(test)) { mkdir test }
  .fopen -no bigfile test\bigfile.txt
  while (%i <= 100000) { 
    .fwrite -n bigfile %i
    inc %i
  }
  .fclose bigfile
  filter -cffr 1-10000 test\bigfile.txt test\file1.txt
  filter -cffr 10001-20000 test\bigfile.txt test\file2.txt
  filter -cffr 20001-30000 test\bigfile.txt test\file3.txt
  filter -cffr 30001-40000 test\bigfile.txt test\file4.txt
  filter -cffr 40001-50000 test\bigfile.txt test\file5.txt
  filter -cffr 50001-60000 test\bigfile.txt test\file6.txt
  filter -cffr 60001-70000 test\bigfile.txt test\file7.txt
  filter -cffr 70001-80000 test\bigfile.txt test\file8.txt
  filter -cffr 80001-90000 test\bigfile.txt test\file9.txt
  filter -cffr 90001-100000 test\bigfile.txt test\file10.txt
  echo -a Finished in $calc(($ticks - %ticks) / 1000) seconds.
}

Last edited by tidy_trax; 03/07/05 08:55 PM.

New username: hixxy
MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Matt,

I'm not understanding what you posted here. It looks like you are pulling off the top 5,000 entries each time, meaning the same 5,000 would be copied to files 1, 2, 3.

MDA

tidy_trax · Hoopy frood · Joined: Nov 2003 · Posts: 2,327
I was, I made a mistake. I've edited the above post.


New username: hixxy
MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Matt,

That routine took a while to run and produced 20 files with nothing but the 5,000 line numbers in each of them; no data whatsoever from the original file was transferred over. I need the data transferred intact, with no extra information added such as line numbers etc.

alias hugefiltertest {
  var %i = 1, %ticks = $ticks
  if (!$isdir(test)) { mkdir test }
  .fopen -no bigfile test\bigfile.txt
  while (%i <= 100000) {
    .fwrite -n bigfile %i
    inc %i
  }
  .fclose bigfile
  filter -cffr 1-5000 test\bigfile.txt test\file1.txt
  filter -cffr 5001-10000 test\bigfile.txt test\file2.txt
  filter -cffr 10001-15000 test\bigfile.txt test\file3.txt
  filter -cffr 15001-20000 test\bigfile.txt test\file4.txt
  filter -cffr 20001-25000 test\bigfile.txt test\file5.txt
  filter -cffr 25001-30000 test\bigfile.txt test\file6.txt
  filter -cffr 30001-35000 test\bigfile.txt test\file7.txt
  filter -cffr 35001-40000 test\bigfile.txt test\file8.txt
  filter -cffr 40001-45000 test\bigfile.txt test\file9.txt
  filter -cffr 45001-50000 test\bigfile.txt test\file10.txt
  filter -cffr 50001-55000 test\bigfile.txt test\file11.txt
  filter -cffr 55001-60000 test\bigfile.txt test\file12.txt
  filter -cffr 60001-65000 test\bigfile.txt test\file13.txt
  filter -cffr 65001-70000 test\bigfile.txt test\file14.txt
  filter -cffr 70001-75000 test\bigfile.txt test\file15.txt
  filter -cffr 75001-80000 test\bigfile.txt test\file16.txt
  filter -cffr 80001-85000 test\bigfile.txt test\file17.txt
  filter -cffr 85001-90000 test\bigfile.txt test\file18.txt
  filter -cffr 90001-95000 test\bigfile.txt test\file19.txt
  filter -cffr 95001-100000 test\bigfile.txt test\file20.txt
  echo -a Finished in $calc(($ticks - %ticks) / 1000) seconds.
}

Regards,
MDA

MikeChat · Hoopy frood · Joined: Dec 2002 · Posts: 1,245
Did you change test\bigfile.txt to C:\1Report\text.txt?

Last edited by MikeChat; 03/07/05 09:38 PM.
MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Greetings MikeChat,

No, I created a test subdirectory in the mIRC folder, copied that huge text file over to the test subdirectory, then renamed it to bigfile.txt to fit the posted script alias.

MDA

Hoopy frood · Joined: Feb 2004 · Posts: 2,019
EDIT: Had a version, but I like DaveC's version better, get that one.


Gone.
DaveC · Hoopy frood · Joined: Sep 2003 · Posts: 4,230
Here mr lazy is ya alias!

Code:
;
;usage $splitup(source,destination folder/file) * Do not add .txt!
;
alias Splitup {
  if (!$isfile($1))                    { return -1 } | ; invalid source file
  if (($nofile($2)) && (!$isdir($v1))) { return -2 } | ; invalid destination folder
  if (!$mkfn($nopath($2)))             { return -3 } | ; invalid destination file
  var %i = 1 | filter -cffr 1-5000 $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  while ($filtered) {
    inc %i | filter -cffr $+($calc(1 + %i * 5000),-,$calc(5000 + %i * 5000)) $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  }
  remove $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt") | dec %i
  return %i
}


//echo -a i made $splitup(C:\1Report\text.txt,otherfilename) files

tidy_trax · Hoopy frood · Joined: Nov 2003 · Posts: 2,327
Mine was just an example, I didn't actually write the code for you :)


New username: hixxy
DaveC · Hoopy frood · Joined: Sep 2003 · Posts: 4,230
Quote:
EDIT: Had a version, but I like DaveC's version better, get that one.


lol, I almost fell off my chair on that one!

Actually I thought mine was a bit dodgy in how it ends up making the extra file, then deletes it at the end.
I was gonna say it was a feature, in case some old filenameNNN.txt files existed you could see where the new ones ended by the missing one, but I thought that would be stretching the fact that it's a fixup at the end just a little bit.

MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Thanks Matt, DaveC, but I'll pass on using the Filter command and just go back to the read/write format until I can work it through.

I don't know enough about the filter command to trust any code that is listed with professional comments like "here mr lazy". That reminds me very much of some jack who posted, to another user having RAM problems dumping his kernel32.exe stack, to delete his kernel32 file.

I just haven't used the filter command enough to spot whether code is intentionally bad or not. Obviously, after following and trying the first suggested alias, which generated 20 crap files filled with nothing but line numbers, I'll study up on the Filter command and in the meanwhile work it out with the $read and /write commands.

Thanks anyway,
MDA

DaveC · Hoopy frood · Joined: Sep 2003 · Posts: 4,230
lol, dude, calm down, I was being sarcastic; you did ask for the alias, remember.

And you really should use the filter command, OR at least switch to the /fopen $fread /fwrite /fclose commands.
I say this because when you read a line in a txt file using $read, it reads from the beginning of the file to the line number you requested. Reading a 100,000-line file that way means line 1 takes a read of 1 line, line 2 takes a read of 2 lines, line 3 takes a read of 3 lines, etc; by the time you have read to line 100,000 you have read a total of just over 5,000,000,000 lines, a large number indeed!
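That total is just the triangular-number sum 1 + 2 + ... + n = n(n + 1)/2. As a quick sanity check of the arithmetic (sketched in Python purely for illustration; this is not mIRC script):

```python
# Total lines scanned if $read re-reads from the top for each of n lines.
n = 100_000
total = n * (n + 1) // 2  # 1 + 2 + ... + n
print(total)  # 5000050000, i.e. just over five billion line reads
```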

The first code posted was an example of how to use the filter command. You received files with line numbers in them because that was the 100,000-line file it was using as the source; the alias actually made that file itself. I think you reported a bug in it that caused it to produce duplicate files, and that was corrected, but even then it was still just meant to show you how to use filter, not to actually run it on your file.

I'll explain quickly what the filter command does, as I and others used it.
/filter -cffr 1-5000 source destination matchtext
Filter is designed to search the source (file) for matching lines (if no matchtext is present then it matches everything, so it's a copy; however, read on).
The -c option tells it to clear the destination (file) before writing to it.
The -ff option is needed to tell /filter it's using files, since it can also use windows (-w) and other things as the source/destination locations.
The -r option is the big key item: it tells /filter to read a line range after the options, and these are the starting line and ending line to search from.

So we have
/filter -clear(destination) -file(source) -file(destination) -range 1-5000 source(file) destination(file)
As I said above, since there is no matchtext it just copies all those lines.
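The start and end lines for each 5,000-line chunk follow one pattern: chunk i covers lines (i - 1) * 5000 + 1 through i * 5000. A minimal sketch of that range arithmetic (in Python for illustration only; the function name chunk_range is not from this thread):

```python
def chunk_range(i, size=5000):
    """Return the 1-based (start, end) line range for chunk number i."""
    return (i - 1) * size + 1, i * size

# chunk 1 covers 1-5000, chunk 2 covers 5001-10000, chunk 20 covers 95001-100000
for i in (1, 2, 20):
    print(i, chunk_range(i))
```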

I'll document my code below to let you see what it's doing.
***** I HAVE ALSO CORRECTED A BUG IN THE CODE ***** The line numbers in the while loop filter were wrong! Oops!

Code:
;usage $splitup(source,destination folder/file) * Do not add .txt!
;
alias Splitup {
  if (!$isfile($1))                    { return -1 } | ; invalid source file
  ;^ if $1 is not a file, exit with a value -1 to indicate a bad source filename
  ;
  if (($nofile($2)) && (!$isdir($v1))) { return -2 } | ; invalid destination folder
  ;^ if $2 minus the filename exists (there's a DIR there) then check if the DIR is valid, and if it's not then exit with a value -2 to tell you this.
  ;
  if (!$mkfn($nopath($2)))             { return -3 } | ; invalid destination file
  ;^ if, after replacing illegal file characters in the destination filename, there turns out to be no filename, then exit with a value -3 to tell you this
  ;
  ;* If we reach here we can proceed to make the files. *
  ;
  var %i = 1
  ;^set destination file # fileNNN.txt NNN being 1
  ;
  filter -cffr 1-5000 $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  ;^ do the first filter, lines 1 to 5000
  ;^ I add " " around the source filename since /filter doesn't like the names if spaces are in them
  ;^ I also do the same to the destination file, but I also get the path $nofile(),
  ; & then remove illegal filename characters from the filename $mkfn($nopath())
  ; & finally add the file number %i, and the .txt
  ;
  ; A while loop repeats until the condition becomes false/zero; $filtered is a value saying how many lines went through the filter.
  ; On first encountering this below, unless the source was empty, $filtered will have some value (5000 on a 100,000-line file)
  while ($filtered) {
    ;^ enter here if some or all of the 5000 lines of the last /filter were copied
    ;
    inc %i
    ;^ add 1 to the destination file number counter
    ;
    filter -cffr $+($calc(%i * 5000 - 4999),-,$calc(%i * 5000)) $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
    ;^ do a filter just like the one above, but this time use (%i * 5000 -4999) for the start line and (%i * 5000) for the end line
    ;^ ie: if %i was 2 then its (2x5000-4999) to (2x5000) aka lines 5,001 to 10,000
    ;^ ie: if %i was 3 then its (3x5000-4999) to (3x5000) aka lines 10,001 to 15,000
    ;
  }
  remove $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  ;^ since the while loop exits only when the last filter had no lines copied (ie its line numbers are beyond the end of the file)
  ;^ I must delete this last file I created, as it's empty anyway
  ;
  dec %i
  ;^ lastly reduce the destination file number counter to be the last file actually written
  ;
  return %i
  ;^ exit with this value, so you know how many files were made
  ;
}

MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Greetings,

The code 'looks' fine; however, mIRC tells me it's missing a bracket. The re-editing is going slowly, trying to remove all the comments, and there are some piping symbols which I'm not quite sure why you added; they'll probably have to be removed too where they sit next to a comment.
ie:
| ; invalid destination folder

What is the correct command-line entry for your alias, once it's working, for a mIRC folder named Test and the original filename Fatext.txt in that folder to split up?

is it

/Splitup Test Fatext

??

MDA

MDA (OP) · Babel fish · Joined: Dec 2002 · Posts: 99
Greetings DaveC,

You wrote: "And you really should use the filter command OR at least switch to the /fopen $fread /fwrite /fclose commands.
I say this because when you read a line in a txt file using $read, it reads from the beginning of the file to the line number you requested, so reading a 100,000-line file means that line 1 takes a read of 1 line, line 2 takes a read of 2 lines, line 3 takes a read of 3 lines, etc; by the time you have read to line 100,000 you have read a total of just over 5,000,000,000 lines, a large number indeed!"

DaveC, the $read and /write commands also include the ability to remove a specific line of text after that variable has been read and written. Only one line would need to be read at any time: that line is written to another text file, a counter variable increased, and then the original top line in the fat text file removed. It's also likely FASTER than using the Filter command.

A Simple Working Example:

alias TextFileSplitter {
  set %textnumb 1
  set %textnum 5000
  :next
  if (%textnumb > %textnum) { goto finish }
  set %textee $read($mircdirtest\fatext.txt,1)
  if (%textee == $null) { echo 4 Data Textee is null | goto finish }
  .write $mircdirtest\text1.txt %textee
  .write -dl $+ 1 $mircdirtest\fatext.txt
  set %textee $null
  inc %textnumb
  goto next
  :finish
}

MDA

Last edited by MDA; 05/07/05 08:55 PM.
DaveC · Hoopy frood · Joined: Sep 2003 · Posts: 4,230
Quote:
| ; invalid destination folder

If you want to add a comment to the end of a line you need to use a | before it.
The | is a command separator (rather than an output pipe director).
ex: //echo hello | echo blah
hello
blah

So yes, you just remove the | as well. I have included an undocumented copy here.
** One small addition: I placed a . in front of the /remove command so it doesn't display the "file removed" message **

Code:
;usage $splitup(source,destination folder/file) * Do not add .txt!
;
alias Splitup {
  if (!$isfile($1)) { return -1 }
  if (($nofile($2)) && (!$isdir($v1))) { return -2 }
  if (!$mkfn($nopath($2))) { return -3 }
  var %i = 1
  filter -cffr 1-5000 $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  while ($filtered) {
    inc %i
    filter -cffr $+($calc(%i * 5000 - 4999),-,$calc(%i * 5000)) $+(",$1,") $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  }
  .remove $+(",$nofile($2),$mkfn($nopath($2)),%i,.txt")
  dec %i
  return %i
}


PS: In case you're having trouble cutting and pasting this into mIRC: copy it, paste it into WordPad, then copy it from there and paste into mIRC. This is a problem with this forum and the way Explorer (I think) copies text from it.

Quote:
What is the correct command line entry for your alias if it becomes functioning for a mIRC folder named Test and the original Filename Fatext.txt in that folder to split up?


Method 1:
/Splitup sourcefile destinationfile
sourcefile is the exact filename, ex: blahblah.txt or c:\wobble\blah.txt etc
destinationfile is the file to split into, minus the .txt, ex: blob or c:\stats\blob etc

Method 2:
var %filecount = $Splitup(sourcefile,destinationfile)
sourcefile is the same as in method one, but can now also be big file.txt or c:\my folder\blah file.txt
destinationfile is the same, but may also contain spaces, like sourcefile

You need to do it this way because mIRC doesn't handle spaces in filenames passed to the routine well.
Imagine
/splitup text file results
is it
text & file results
or
text file & results
or even
text & file & ignore the word results!

DaveC · Hoopy frood · Joined: Sep 2003 · Posts: 4,230
Quote:
DaveC, the $read and /write commands also include the ability to remove a specific line of text after that variable has been read and written. Only one line would need to be read at any time: that line is written to another text file, a counter variable increased, and then the original top line in the fat text file removed. It's also likely FASTER than using the Filter command.


It will be faster, but still hugely disk intensive.
When you remove the first line of the file, the file is completely rewritten minus the first line; I believe, however, it is rewritten in large blocks (64 KB or 1 MB etc) rather than a line at a time.
Doing so would be incredibly wasteful, as it's a process that is not needed.

The fastest method of all would be to use the /fopen $fread /fwrite /fclose commands; then the source file will be read a total of 1 time, and each result file will be written to a total of 1 time. However, I just ran a test on my alias and it took 2 seconds; 2 seconds for 100,006 lines isn't too shabby.

Kelder · Hoopy frood · Joined: Apr 2003 · Posts: 701
I really doubt it, but feel free to actually test them both (or all three) on such a file of 100000 lines.

The point remains, $read and /write do the following:
- open file
- read characters and count $crlf until specified line reached
- read line in and return it
- close file

This means those files are opened and closed 100000 times, and you still have to search 1+2+3+4+...+5000 lines for the /write in the smaller files...

Using /fopen, $fread, /fwrite and /fclose, you bring that number back to 1+20 times; $fread reads in sequence (it remembers the last position it read) and /fwrite just writes at the end, so no searching is needed. This makes it very likely to be faster, a lot faster even.
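Those open/close counts can be tallied directly; an illustrative Python check (not mIRC script), assuming a 100,000-line source split into 5,000-line files:

```python
n, chunk = 100_000, 5_000
read_write_opens = n          # $read opens and closes the source once per line
fopen_opens = 1 + n // chunk  # one source handle plus one handle per output file
print(read_write_opens, fopen_opens)  # 100000 opens versus 21, i.e. the "1+20 times" above
```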

Now do the same in native compiled code instead of a script language like mIRC script and you get the performance of /filter.

* Kelder goes for the /filter if possible

Since you'll probably not believe me, try these 2 scripts:
alias test1 {
  var %i = 1, %time = $ticks
  fopen -no blub delme.txt
  while (%i < 100000) {
    .fwrite -n blub look! this is line number %i !
    inc %i
  }
  fclose blub
  echo -s time taken: $calc($ticks - %time) ms
}

alias test2 {
  var %i = 1, %time = $ticks
  while (%i < 100000) {
    write delme.txt look! this is line number %i !
    inc %i
  }
  echo -s time taken: $calc($ticks - %time) ms
}

Test1 runs in 10500 ms, test2 in 84200, and this test is only half the requirements...


PS: Look up /while; it might not be faster, but it has more chance of being readable and correct.
While you're at it, /var is nice too!
