
#119609 07/05/05 06:42 PM
Joined: Sep 2004
Posts: 10
Pikka bird (OP)
OK, I'm having major issues with mIRC's sockets...
I am trying to open and download a webpage, then process the incoming HTML data.
My issue is with the way mIRC handles and delivers the incoming data.
When I do a /sockread %var, mIRC fills %var with a $crlf-terminated string of data from the buffer.
The problem occurs when the HTML data does not contain any $crlf-terminated lines: mIRC fills %var with up to 949 characters of data, which causes numerous "string too long" errors when I try to process the data in %var.
It also causes the incoming data to be split up into multiple blocks, and data gets lost: mIRC strips out spaces at the end of the data currently in the buffer and inserts hex numbers between blocks of data.
It also takes multiple read events to get all the data, as mIRC will not refill the buffer while the event script is running.
This is proving extremely frustrating and doing my head in.

I am working with the latest mIRC, v6.16.
Any suggestions greatly appreciated, thanks.

P.S. Would any of the above be worthy of a bug report?

#119610 07/05/05 08:30 PM
Joined: Apr 2003
Posts: 701
Hoopy frood
Either read only 200 bytes at a time and process that, or use a &binvar to read the entire buffer and then parse that. Remember that &binvars disappear when the event ends, so you'll need to either get the needed data out immediately or write it all to a file or hash table to use later.
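A minimal sketch of the "store everything, parse later" approach, written in Python rather than mIRC script because the idea is language-neutral. The list `reads` stands in for successive socket reads, and `collect` and `lines_of` are hypothetical names. The point is that the raw bytes are appended unchanged and only split into lines once the whole body is in, so a line cut between two reads comes back whole, trailing spaces included.

```python
def collect(reads):
    """Append every raw read to one bytearray and return the complete body."""
    body = bytearray()
    for piece in reads:  # each piece stands in for one socket read
        body.extend(piece)  # keep every byte, including trailing spaces
    return bytes(body)


def lines_of(body):
    """Split the finished body on CRLF only after all data has arrived."""
    return body.decode("latin-1").split("\r\n")


# Line 35 of the sample, cut between two reads right after its spaces
reads = [b"35. 2-13 213 23 Oct 98   ", b'<a href="#ep35">Spirits</a>\r\n36. 2-14']
body = collect(reads)
first_line = lines_of(body)[0]
```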

There are many scripts to download HTML using mIRC sockets, and probably just as many using other programs or libraries; are you sure you need to reinvent it? It is of course a good challenge, but if you just need it, try searching for download or parsing scripts on this forum or on other mIRC scripting websites. You could then adjust one of those scripts to your needs if it doesn't already do what you need.

If you're determined to script it yourself, make sure you have the HTTP RFC, a lot of other http protocol related documents (chunked encoding for example), the mIRC help file, google and lots and lots of spare time...

As for the bug reports:
-> The length limit for %vars has already been brought up a number of times, and in your case it won't help if the limit is raised to 2000, 4000 or 10000 bytes, since someone will probably end up downloading some 200MB installer with your script.
-> /sockread does what the help file says it does (it reads up to the first $crlf, or to the end of the buffer).
-> Yes, you will need multiple read events; letting mIRC hang until all those megabytes are in isn't my idea of an improvement. Triggering the event isn't difficult for mIRC. For scripters it's somewhat harder, since they need to read some status %vars back in.
-> You can do multiple /sockreads in one event until you reach the end of the buffer.
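The last two points (several reads per event, several events per page) can be sketched in Python as a buffer that hands back only complete CRLF-terminated lines and keeps whatever follows the last CRLF for the next read event. `LineAssembler` is a hypothetical class standing in for the state a script would keep in global %vars or a file between events.

```python
class LineAssembler:
    """Collect raw reads and return only complete CRLF-terminated lines.

    Whatever follows the last CRLF stays in `pending` and is prefixed to
    the next read, so a line split across two read events is rejoined
    byte for byte, spaces included.
    """

    def __init__(self):
        self.pending = b""

    def feed(self, data):
        """Add one read's worth of bytes; return the lines it completed."""
        self.pending += data
        # split() always yields at least one item; the last one is unfinished
        *complete, self.pending = self.pending.split(b"\r\n")
        return complete


asm = LineAssembler()
first = asm.feed(b"34. 2-12 212 9 Oct 98\r\n35. 2-13 213 23 ")
second = asm.feed(b"Oct 98   <a>Spirits</a>\r\n")
```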

#119611 07/05/05 08:43 PM
Joined: Sep 2004
Posts: 10
Pikka bird (OP)
To clarify the points I was trying to make above,
here is a typical example of the problems I'm facing:

33. 2-11 211 2 Oct 98 <a href="http://www.tvtome.com/StargateSG1/season2.html#ep33">The Tok'ra (1)</a>
34. 2-12 212 9 Oct 98 <a href="http://www.tvtome.com/StargateSG1/season2.html#ep34">The Tok'ra (2)</a>
35. 2-13 213 23 Oct 98
10F7
<a href="http://www.tvtome.com/StargateSG1/season2.html#ep35">Spirits</a>
36. 2-14 214 30 Oct 98 <a href="http://www.tvtome.com/StargateSG1/season2.html#ep36">Touchstone</a>
37. 2-15 216 22 Jan 99 <a href="http://www.tvtome.com/StargateSG1/season2.html#ep37">The Fifth Race</a>

You can see from the above example that line 35 has been split in two, the spaces between the date and the URL have been stripped out, and a hex code has been inserted on the next line.
It's not just a simple matter of appending the split line back together, especially when the split occurs in the middle of the date, the URL or the title, where the stripped spaces become critical (there's no way to know whether spaces were stripped, and if so, how many).
The problem is compounded when the $cr/$lf's are not spotted and a block of concatenated lines is received and split between /sockreads.
The -fn switches and the [numbytes] parameter appear to be unimplemented, or to do nothing, in all the tests I've made with both normal and binary variables.

Thanks.

P.S. There are 3 spaces between the date and the URL, but this editor obviously likes mIRC and has stripped 2 of them from each line :D

#119612 07/05/05 09:03 PM
Joined: Apr 2003
Posts: 701
Hoopy frood
Chunked transfer coding is the cause of those hex numbers: each one gives the size, in bytes, of the chunk of data that follows it.

As for the rest, try this as the body of your sockread event and see if out.txt contains all the spaces and everything. The hex numbers will still be in there, so have fun counting those bytes yourself. I'll give you this much: $base(5FA7,16,10) returns 24487, for converting those hex numbers.

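; read up to 200 bytes at a time into the binary variable &data
sockread 200 &data
; keep going while there was no error and the last read returned bytes
while (!$sockerr && $sockbr) {
  ; append the raw bytes, unchanged, to the end of out.txt
  bwrite out.txt -1 -1 &data
  sockread 200 &data
}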
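Once the whole body is saved (for example in out.txt), the chunk-size lines can be removed afterwards. A minimal Python sketch, assuming the HTTP headers have already been stripped off and following the HTTP/1.1 chunk layout: a hex size line (optionally with `;extensions`), that many bytes of data, a CRLF, and a final chunk of size 0. `decode_chunked` is a hypothetical name. Another common way around the problem is to send the request as HTTP/1.0, since servers do not use chunked transfer coding in replies to HTTP/1.0 requests.

```python
def decode_chunked(raw):
    """Strip HTTP/1.1 chunked transfer coding from a complete message body.

    Each chunk is: <size in hex>[;extensions] CRLF <size bytes of data> CRLF.
    A chunk of size 0 ends the body; trailer headers after it are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)             # end of the chunk-size line
        size_field = raw[pos:eol].split(b";")[0]  # drop optional chunk extensions
        size = int(size_field.strip(), 16)        # hex, e.g. 5FA7 -> 24487
        if size == 0:
            break
        start = eol + 2
        out += raw[start:start + size]
        pos = start + size + 2                    # skip the CRLF after the data
    return bytes(out)


body = decode_chunked(b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n")
```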

#119613 07/05/05 09:31 PM
Joined: Sep 2004
Posts: 10
Pikka bird (OP)
The example above was retrieved with:

sockread 100 &buffer
bwrite socktest.txt -1 -1 &buffer

Though this method does reduce the number of line splits, it doesn't stop them.
But using a &binary var and writing the data to a text file for later processing looks like the easier option.
As for the RFC link, that is way too advanced for my meagre abilities.
Thank you for your help :)

#119614 08/05/05 05:20 PM
Joined: Oct 2004
Posts: 8,330
Hoopy frood
Depending on the site and the source, it can be possible to use a variable as a flag that is set when you reach a certain line, and then take the data from the following line instead of from that first line.

such as:

This is my line of
text and I want to see THIS.

In your sockread event, you can have something like:

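; the previous line matched, so the wanted text is on this line
if (%splitline) {
  echo -a $gettok(%temptext,7,32)
  unset %splitline
}
; this line is the marker: flag the next line for processing
if (*is my line* iswm %temptext) {
  set %splitline 1
}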

This sort of thing can be worked into most scripts when having to deal with line splits. How you do it is dependent on the source you're working with.
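The same flag idea, sketched in Python with a hypothetical `grab_after` helper: seeing the marker line sets a flag, and the next line is the one that gets tokenised. `str.split()` with no argument skips runs of spaces, roughly as `$gettok(...,32)` does, and the substring test plays the role of the `*is my line* iswm` wildcard match.

```python
def grab_after(lines, marker, token_index):
    """Return word number `token_index` (1-based) of the line that follows
    the first line containing `marker`, or None if there is no such word."""
    split_line = False
    for line in lines:
        if split_line:
            words = line.split()  # whitespace runs count as one separator
            return words[token_index - 1] if len(words) >= token_index else None
        if marker in line:
            split_line = True  # the wanted text is on the next line
    return None


wanted = grab_after(["This is my line of", "text and I want to see THIS."], "is my line", 7)
```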

Another possibility (though it is generally not a good idea, it does work on occasion) is to use embedded sockreads.


Invision Support
#Invision on irc.irchighway.net
#119615 09/05/05 12:34 AM
Joined: Sep 2004
Posts: 10
Pikka bird (OP)
Hi Riamus2

Kelder's suggestion of using binary variables and writing the data to a text file got me thinking.
After some experimentation I discovered that when you do a sockread to a binary variable,
the &bvar is filled with the complete raw source, with no characters stripped.
The stripping was a result of echoing/printing the data.
After reading through the mIRC help file on binary variables, I found I could scan the &bvar containing the source data and extract just the parts I wanted into normal %vars using $bvar.
From there, parsing the extracted data is a simple matter.
To handle the gaps in the data between sockread events, all I had to do was save the last 200 characters of the source data:
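Scanning the raw buffer instead of reading it line by line can be sketched in Python with `bytes.find`, which is roughly what walking a &binvar with a position counter does. `extract_links` is a hypothetical helper that only handles the simple `<a href="...">title</a>` form seen in the sample earlier in the thread; a tag cut off by the end of the data is left alone for the next read.

```python
OPEN_TAG = b'<a href="'
CLOSE_TAG = b"</a>"


def extract_links(body, start=0):
    """Return (url, title) pairs for every complete <a href="...">title</a>."""
    results = []
    pos = start
    while True:
        a = body.find(OPEN_TAG, pos)
        if a == -1:
            break  # no more links in the data read so far
        url_start = a + len(OPEN_TAG)
        url_end = body.find(b'">', url_start)
        if url_end == -1:
            break  # tag is cut off at the end of the data
        title_end = body.find(CLOSE_TAG, url_end)
        if title_end == -1:
            break  # title is cut off at the end of the data
        url = body[url_start:url_end].decode("latin-1")
        title = body[url_end + 2:title_end].decode("latin-1")
        results.append((url, title))
        pos = title_end + len(CLOSE_TAG)
    return results


sample = (b'33. 2-11 211 2 Oct 98   <a href="http://www.tvtome.com/StargateSG1/season2.html#ep33">'
          b"The Tok'ra (1)</a>\r\n"
          b'35. 2-13 213 23 Oct 98   <a href="http://www.tvtome.com/StargateSG1/season2.html#ep35">Spir')
links = extract_links(sample)
```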

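; 200 bytes or fewer left to scan: save them for the next read event
if ($calc(%size - %pos) <= 200) {
  ; swap spaces (32) for middle dots (183) so they survive in a %var
  breplace &buffer 32 183
  set %buffer $bvar(&buffer,%pos,200).text
  return
}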

Then append the next sockread to the saved data:

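; data was carried over from the previous read event
if (%buffer != $null) {
  sockread &buffer2
  ; put the saved text back at the start of &buffer and restore its spaces
  bset -t &buffer 1 %buffer
  breplace &buffer 183 32
  set %pos $calc($len(%buffer) + 1)
  var %size = $calc(%pos - 1 + $sockbr)
  ; append the freshly read bytes after the saved text
  bcopy &buffer %pos &buffer2 1 -1
  unset %buffer
}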

And then I continue extracting the wanted data where I left off in the last sockread event.
This has resolved all the issues I was having using normal %vars.
Thank you for your help :)
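The carry-over idea, sketched in Python under the assumption that the parser can report how many bytes it fully consumed. `TailCarrier`, `take_titles` and `found` are hypothetical names. Unlike the fixed 200-character save above, this keeps exactly the unfinished part, starting at the first tag that is not yet complete.

```python
found = []  # titles collected so far (hypothetical output store)


def take_titles(buf):
    """Record each <a ...>title</a> in buf; return how many bytes were consumed."""
    pos = 0
    while True:
        start = buf.find(b"<a ", pos)
        if start == -1:
            # no tag start left: keep 2 bytes in case "<a " itself was cut
            return max(pos, len(buf) - 2)
        gt = buf.find(b">", start)
        end = buf.find(b"</a>", start)
        if gt == -1 or end == -1:
            return start  # incomplete tag: carry it over from its start
        found.append(buf[gt + 1:end])
        pos = end + 4


class TailCarrier:
    """Prepend the unprocessed end of each read to the next read."""

    def __init__(self, process):
        self.process = process
        self.carry = b""

    def on_read(self, data):
        buf = self.carry + data
        used = self.process(buf)
        self.carry = buf[used:]


carrier = TailCarrier(take_titles)
carrier.on_read(b'x <a href="#ep35">Spi')
carrier.on_read(b'rits</a> <a href="#ep36">Touchstone</a>\r\n')
```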

P.S. The breplace keeps the spaces from being stripped when the data is assigned to a normal %var :)
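The breplace trick is a reversible byte substitution. A Python sketch with hypothetical `protect`/`restore` helpers; it also checks one caveat: if byte 183 (a middle dot in Latin-1) already occurs in the page, restoring would turn it into a space, so the round trip is only lossless when the sentinel byte is absent.

```python
SPACE, SENTINEL = 0x20, 0xB7  # 32 and 183, as in the breplace calls above


def protect(data: bytes) -> bytes:
    """Swap spaces for the sentinel byte (like: breplace &buffer 32 183)."""
    if SENTINEL in data:
        raise ValueError("sentinel byte already present; round trip would be lossy")
    return data.replace(bytes([SPACE]), bytes([SENTINEL]))


def restore(data: bytes) -> bytes:
    """Undo protect() (like: breplace &buffer 183 32)."""
    return data.replace(bytes([SENTINEL]), bytes([SPACE]))


original = b"23 Oct 98   <a>"
shielded = protect(original)
```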

