#258232 20/06/16 04:46 PM
Joined: Jul 2006
Posts: 3,969
W
Wims Offline OP
Hoopy frood
When you read a line into a binvar using the -n switch of /sockread, mIRC seems to replace the $crlf with two zero bytes. That seems wrong: sending an empty line should result in an empty string being read, but instead $bvar(&binvar,0) returns 2.

Code:
alias testsr socklisten serv 8000 | sockopen client 127.0.0.1 8000 
on *:socklisten:serv:sockaccept client1 | sockclose serv
on *:sockopen:client:if (!$sockerr) sockwrite -n client
on *:sockread:client1:sockread -n &a | echo -s > $bvar(&a,0) -- $bvar(&a,1-) -- $bvar(&a,1-).text | sockclose client*
/testsr
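
For readers without mIRC at hand, the reported behaviour can be modelled outside mIRC. The Python sketch below is only a model under stated assumptions, not mIRC's implementation: the name `sockread_n_model` is made up for illustration, the receive queue is a plain bytearray, and a line is assumed to end at LF with an optional preceding CR.

```python
def sockread_n_model(rq: bytearray) -> bytes:
    """Model of the behaviour reported in this thread (hypothetical helper).

    Takes one line off the receive queue and returns the bytes a script
    would see in the binvar: the payload followed by zeros where the
    line terminator used to be.
    """
    lf = rq.find(b"\n")
    if lf == -1:
        return b""                      # no complete line queued yet
    line = bytes(rq[:lf + 1])
    del rq[:lf + 1]                     # consume the line from the queue
    # Assumption: CRLF becomes two zeros, a bare LF becomes one zero.
    term = 2 if line.endswith(b"\r\n") else 1
    return line[:-term] + b"\x00" * term


rq = bytearray(b"\r\nHi!\r\n")          # an empty line, then "Hi!"
empty = sockread_n_model(rq)
hi = sockread_n_model(rq)
print(len(empty), list(empty))          # 2 [0, 0]
print(len(hi), list(hi))                # 5 [72, 105, 33, 0, 0]
```

Under this model the empty line arrives with a length of 2 rather than 0, which is the discrepancy being reported.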


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Dec 2002
Posts: 245
T
Fjord artisan
I'm assuming that this is a C++ globally allocated character array. $bvar(&var,0) returns the array length, whereas $bvar(&var,X) returns whatever is at that position within the array. Character arrays and the like are null-terminated ("\0", ASCII code 0). $bvar(&var,1-).text is the string read from the contents of the array, which stops at the null terminator.

With that in mind, during mIRC's WinSock processing, it reads bytes until it finds an ASCII chr 13, chr 10, or both ($cr, $lf, or $crlf), since you've used the -n flag. Take the example below.

"abc<crlf>def<crlf>"

During its scan, it finds "abc<crlf>", which has a length of 5: a, b, c, $cr, $lf. This is put into a globally allocated character array and removed from the WinSock buffer, so what's left for the next read is "def<crlf>". The character array needs to be able to hold all 5 characters.

With the -n flag, you specify that you want to break at crlf, which is what it did. On return, it replaces the last two characters with 0, clearing ASCII codes 13 and 10.

The string (not the array) returned by $bvar(&var,1-).text IS correct in being "abc", as it reads up to the null terminator.

If you REALLY need no trailing termination (zero fill, similar to overwriting the contents of a binvar with /bcopy -z), then you need to create a new array and copy the contents you need into it.

Since you used &a, we'll make a new one and call it &b.

bcopy &b 1 &a 1 $calc($bfind(&a,1,0) - 1)

Now, with sockwrite -n client (an empty line):

A:
$bvar(&a,0) = 2
$bvar(&a,1) = 0
$bvar(&a,2) = 0

B:
$bvar(&b,0) = 0

Now, with sockwrite -n client Hi!
A:
$bvar(&a,0) = 5
$bvar(&a,1) = 72
$bvar(&a,2) = 105
$bvar(&a,3) = 33
$bvar(&a,4) = 0
$bvar(&a,5) = 0

B:
$bvar(&b,0) = 3
$bvar(&b,1) = 72
$bvar(&b,2) = 105
$bvar(&b,3) = 33

Note: If you're doing a sockread loop that checks $sockbr, don't forget to /bunset &b before you bcopy again. Alternatively you could use -z to zero fill, but then you'd be back in the same boat with trailing ASCII 0's :)
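
The copy-up-to-the-first-zero idea can also be written down as a small model. This Python sketch only mirrors the logic of the bcopy/$bfind line above; `copy_before_first_zero` is a hypothetical name, and plain bytes objects stand in for &a and &b.

```python
def copy_before_first_zero(a: bytes) -> bytes:
    """Return the bytes of `a` that precede its first zero byte.

    Hypothetical helper modelling: find the first 0, copy what is before it.
    If there is no zero byte at all, the data is returned unchanged.
    """
    pos = a.find(b"\x00")
    return a if pos == -1 else a[:pos]


print(list(copy_before_first_zero(b"\x00\x00")))       # []
print(list(copy_before_first_zero(b"Hi!\x00\x00")))    # [72, 105, 33]
# A zero byte that was really part of the payload also ends the copy early.
print(list(copy_before_first_zero(b"A\x00B\x00\x00")))  # [65]
```

The last call shows the limit of this workaround: it cannot tell a zero that replaced the line break from a zero that was actually sent.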

Last edited by Talon; 20/06/16 07:04 PM.
Joined: Apr 2004
Posts: 868
Sat Offline
Hoopy frood
I'm sorry, but the first part of your post is based on a flawed understanding of how receiving data on sockets works. Simply put, mIRC always retrieves data from WinSock into a local per-socket buffer (the "receive queue", referred to by e.g. $sock().rq) first, and the data is then copied to the binvar from there. As such, the size of the binvar need not be determined by the size of the substring that is being obtained from the socket.

Even then, that would not be a good excuse for the current behavior. The result of the current situation is that the script gets bytes that were not received on the socket. This is a bug, plain and simple. Not a major one (it looks like it's been there since at least 6.x and it seems that Wims is the first to have run into it), but a bug nonetheless.


Saturn, QuakeNet staff
Joined: Dec 2002
Posts: 245
T
Fjord artisan
WinSock API: recv(SOCKET thesock, char *buf, int len, int flags)

buf is a pointer to your "local per-socket buffer", which is filled up to len and builds your "receive queue". With -n, your receive queue is scanned for cr, lf, or crlf.

Everything from position 1 to the position of <cr, lf, or crlf> is copied into your binvar. (I'm treating the binvar as a globally allocated character array. That is an assumption, especially since the receive buffer is already a pointer to a character array; whether it's iterate-and-set, memcpy, etc. is just speculation, only Khaled knows.) It's much like echoing the array using $bfind(), such as

echo -a $bvar(&var,$+(1-,$bfind(&var,1,13)))

The result will have $chr(13) in it, much like what's happening here. You did in fact receive bytes, two of them: 13 and 10.

"The result of the current situation is that the script gets bytes that were not received on the socket."

Albeit these are now 0 instead of 13 and 10. That is explained by the replacement of those characters with "\0", so that $bvar(&var,x-).text, much like sprintf, wsprintf, cout, etc., will terminate the "string" at the first ASCII code 0 found.

Joined: Jan 2004
Posts: 1,352
L
Hoopy frood
How it happens is irrelevant; inserting these two bytes into the bvar is inappropriate.

Joined: Dec 2002
Posts: 245
T
Fjord artisan
Using -n assumes text. Why else would you break on a carriage return, line feed, or both if it wasn't text? If you're expecting raw data, you'd use -f.

This also has several benefits, which I have used when parsing HTML, JSON, or whatever from a raw socket, since the event fires even with -n if the receive queue is completely full. In that case you would, say, store this "chunk" in a hash table, re-read from the socket, retrieve the "chunk" from the hash table, append the "new chunk", and so on until you finally reach your newline break, which is detectable by the trailing 0's.

Changing this would break things where this behavior is expected.

Joined: Jan 2004
Posts: 1,352
L
Hoopy frood
I'd do whatever I want if it suits my needs, mIRC shouldn't be changing it.

Joined: Apr 2004
Posts: 868
Sat Offline
Hoopy frood
Quote:
buf is a pointer to your "local per-socket buffer", which is filled up to len, which builds your "receive queue" your receive queue is iterated for cr, lf, or crlf with use of -n.

Exactly. Therefore, there is no need for mIRC to allocate a binvar with space for the cr/lf in order to get the cr/lf out of the socket buffer. It can simply discard the cr/lf, since they're no longer pending on the socket anyway. If the opposite of that was not what you were trying to argue in your first post, then I'm at a loss as to what point you were trying to make there.

Quote:
from position 1 to position <cr,lf,or crlf> is copied onto your binvar [..] Albeit these are now 0, instead of 13 and 10

You are describing what is happening in the current mIRC version. I don't see anything in your post that explains why this is desirable behavior rather than a bug. The fact that one could potentially make a script that relies on such broken behavior is rather beside the point.


Saturn, QuakeNet staff
Joined: Apr 2010
Posts: 966
F
Hoopy frood
I'd say this is a bug as well, be it intentional or not. The reason is simple:

There's no way to determine if the data read was actually "\x0D \x0A" or "\x0 \x0 \x0D \x0A"


I am SReject
My Stuff
Joined: Dec 2002
Posts: 5,244
Hoopy frood
Thanks for your bug report. This has been the behaviour as far back as v5.5, the time that it was implemented (and presumably tested by the scripters that requested it). While it would be easy to change the behaviour, I try not to change behaviours that have been in place for so long that scripts may depend on them. In this case, if a script uses /sockread -n into a &binvar and then $bvar().text, it will not be an issue. However, if a script parses the &binvar directly, it may expect the terminating zeros and changing this now will break that script.

Joined: Apr 2004
Posts: 868
Sat Offline
Hoopy frood
There's no arguing against the legacy argument, but in that case (and for such cases in general) I'd like to request that you add a note about this peculiarity to the help file.


Saturn, QuakeNet staff
Joined: Jul 2006
Posts: 3,969
W
Wims Offline OP
Hoopy frood
Could I get the names of the beta testers from v5.5??? :D

I'm OK with not changing behaviors because scripts rely on them.
For example, recently you changed the behavior of "alias join { join $2 }", which should technically be written as "alias join { !join $2 }".
This was known to anyone helping on IRC; billions of scripts misuse this, and indeed a lot of them started to break. After you reverted the change, nobody complained, because the (current) 'wrong' behavior is not an annoyance to anyone and doesn't prevent functionality: mIRC is/was already guessing that it should call the built-in alias in this case.

But here, it is breaking functionality and, let's be honest, this behavior wasn't requested. This is not a feature request: there is no logic behind changing /sockread -n &binvar to add these \0, it's a bug, plain and simple.
Khaled, if you think this is a bug, please give this one a try: change the behavior to what it should be and let's see if it breaks any scripts (you could always revert the change in the next version!)


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Dec 2002
Posts: 245
T
Fjord artisan
I think you're still not understanding what's happening. The sockread is putting the ENTIRE CONTENT pulled from the receive queue into the binvar, INCLUDING the line break. $bvar(&var,0) and $sockbr are equal: what was removed from the receive queue is represented in the binary variable it was moved to.

For your convenience, since you used -n, the CRLF is nullified, i.e. replaced with the termination character. There is no magical addition of 0's; they're simply the result of a replacement. They didn't come from nowhere.

Take this example, like you were processing characters one at a time and removing them:
Code:
alias example {
  ;-- Set up some binvar with some characters
  bset -t &var 1 test
  ;-- Counter for as characters are removed, and placeholder of string remaining.
  var %tot = 0 , %str = $bvar(&var,1-).text
  echo -a Array Size: $bvar(&var,0) Array: $bvar(&var,1-) -=- String Size: $len(%str) String: $qt(%str)
  ;-- Remove 1st character until none remain
  while ($bvar(&var,1-).text) {
    bcopy -z &var 1 &var 2 -1
    var %tot = %tot + 1 , %str = $bvar(&var,1-).text
    echo -a %tot - Array Size: $bvar(&var,0) Array: $bvar(&var,1-) -=- String Size: $len(%str) String: $qt(%str)
  }
}


Output:
Code:
Array Size: 4 Array: 116 101 115 116 -=- String Size: 4 String: "test"
1 - Array Size: 4 Array: 101 115 116 0 -=- String Size: 3 String: "est"
2 - Array Size: 4 Array: 115 116 0 0 -=- String Size: 2 String: "st"
3 - Array Size: 4 Array: 116 0 0 0 -=- String Size: 1 String: "t"
4 - Array Size: 4 Array: 0 0 0 0 -=- String Size: 0 String: ""


While the string itself keeps changing, the array size does not.

Joined: Jul 2006
Posts: 3,969
W
Wims Offline OP
Hoopy frood
There is no discussion about this being a bug or not, Talon. You can put it any way you want, this is a bug. While not explicitly stating it's a bug, Khaled, by not saying it's not a bug, agreed with this.
What you seem to have missed is that there is also no discussion about why these \0 are here. You spent multiple hours on IRC explaining why, but it's not difficult to understand why: I know why, you know why, Khaled knows why, Sat knows as well.
Your example is not showing anything. Yes, 4 bytes is 4 bytes; these can be 0, 48 or 125, it doesn't matter.

These \0 weren't sent, so I should not get them, end of story. If mIRC wanted to replace the newline sequence with \0 in this case because it was very convenient, that's fine, but if it then puts these non-received bytes into my binvar, it's a huge problem.
If I get \x20\x00\x00, what is the line that was sent? Was it \x20\x0D\x0A? Was it \x20\x00\x0A?
If I were to decode UTF-8 from the bytes I get, it would make them invalid UTF-8 sequences.
But just this: the data I'm getting is corrupted.
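
The ambiguity can be shown with a tiny model. This is a Python sketch under assumptions, not mIRC code: `as_delivered` is a hypothetical name, and it assumes CRLF is replaced by two zeros and a bare LF by one zero, as described in this thread.

```python
def as_delivered(line: bytes) -> bytes:
    """Zero out the line terminator of one complete, LF-terminated line.

    Hypothetical model of what a script would find in the binvar.
    """
    term = 2 if line.endswith(b"\r\n") else 1
    return line[:-term] + b"\x00" * term


sent_a = b"\x20\x0d\x0a"    # space, CR, LF
sent_b = b"\x20\x00\x0a"    # space, NUL, bare LF
print(list(as_delivered(sent_a)))                   # [32, 0, 0]
print(as_delivered(sent_a) == as_delivered(sent_b))  # True
```

Two different lines on the wire end up as the same three bytes, so the script cannot recover which one was sent.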

I'm not the one not understanding, and I don't understand why you refuse to get it: the fact that the behavior can be explained does not make it correct behavior.
Please don't make another post about this. @Khaled, if you could tell him this is incorrect behavior, regardless of whether you are going to fix it, that would be nice.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Dec 2002
Posts: 245
T
Fjord artisan
You stated "while not explicitly stating it's a bug, Khaled, by not saying it's not a bug, agreed with this."

The only thing you can assume from his message is that this has been the implementation since 5.5, settled upon by himself and several beta testers. He did not say one way or the other whether it was a bug. The only thing you could assume is that he's weighing the odds and coming to his own conclusion, which also includes the effect on any scripts that may be accustomed to its default behavior going all the way back to 5.5.


read this article.

https://en.wikipedia.org/wiki/Newline

Your using -n improperly for anything besides text does not make this a bug. I'm sure you can find either character within a binary file; they don't mean "newline" in that case.

You wouldn't sockread into a %var, bset -t it, bwrite it to a file and expect it to be correct as it came from the socket. In the same sense, you wouldn't sockread into a &binvar, /write $bvar(&var,1-).text to a file and expect it to be correct as it came from the socket either.

Similar to the FTP protocol: if you're using ASCII mode for transfer, it's under the assumption of text, and it has no issues with text files. Upload an image file or something else that may contain chr 13 or chr 10 that does NOT mean newline, and the file is corrupted. Use BIN mode and the file transfers perfectly intact.

I don't know what you're doing, but obviously you're not dealing with text, or you're trying to mix the two. Either way, you seem to be attempting to treat it as if it were text.

If you are attempting to mix and match, and this is your own design, try looking at other protocols that already do this, such as HTTP. The header starts out as text; to break out of "text mode", two newlines are sent. Depending on the content type you received, the data that follows may need to be handled in a different manner, or may continue on as text, as with a content-type of text/plain. A good example of mixing and matching both is a content-type of multipart/form-data.

Under multipart/form-data, a special sequence, commonly known as the boundary, is given to mark where the transmitted content ends. You are again given sub-header details describing what is in this particular section, and the end of the sub-header is again marked with two newlines. Anything after that should be treated as binary data and processed as such until the "boundary" sequence is found, marking the end of that section and potentially the start of a new one.

Multipart header example
Code:
POST /path HTTP/1.0
Host: example.com
Content-type: multipart/form-data, boundary=AaB03x
Content-Length: xxxx

--AaB03x
content-disposition: form-data; name="field1"

<whatever was field1's data>
--AaB03x
content-disposition: form-data; name="field2"

<whatever was field2's data>
--AaB03x
content-disposition: form-data; name="file"; filename="files name"
Content-Type: some mime type
Content-Transfer-Encoding: binary

<RAW file data from a input type of file (selector)>
--AaB03x--
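
The "two newlines end the text part" rule can be sketched as follows. This is an illustrative Python sketch, not a full HTTP or multipart parser: `split_head_body` is a hypothetical helper, it assumes CRLF line endings, and it does not look at boundaries at all.

```python
def split_head_body(message: bytes) -> tuple[list[str], bytes]:
    """Split a message at the first blank line (CRLF CRLF).

    The head is treated as text and returned as a list of header lines;
    everything after the blank line is returned untouched as raw bytes.
    """
    head, _, body = message.partition(b"\r\n\r\n")
    headers = head.decode("latin-1").split("\r\n")
    return headers, body


msg = b"Content-Type: some/type\r\nContent-Length: 4\r\n\r\n\x00\r\n\xff"
headers, body = split_head_body(msg)
print(headers)      # ['Content-Type: some/type', 'Content-Length: 4']
print(list(body))   # [0, 13, 10, 255]
```

The CR and LF inside the body survive because only the head is ever processed line by line.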

Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
I'm not sure why there is such fervent argument about this not being a bug. The behavior is very clearly surprising, both because it defies typical behavior (there is no other place where mIRC will read data for you and then rewrite bytes without an explicit command), and because the help file explicitly contradicts the behavior, or, in its most generous interpretation, omits critical information:

Originally Posted By: mirc.hlp
The -n switch allows you to read a $crlf terminated line into a &binvar.


The help file should include a note: "The $crlf will be replaced by null bytes." If this were there, I would agree, this is an odd design, but not a bug. But with (a) no explicit description and (b) a very surprising behavior, it seems reasonable to call this a bug.

The distinction of binary and text is rather moot here. As you pointed out, protocols that mix binary and plaintext exist. In such a case, it's common to /sockread plaintext data into a &binvar to keep your code path consistent, even though you are processing plaintext. You may even still be processing across raw bytes, and not $bvar().text. Granted, you can easily work around this issue (\0\0 could be treated as a newline boundary), but nobody would expect mIRC to be tampering with the integrity of the data read across a socket.

Whether it can be fixed is another story, but the fact that you (or others) rely on this behavior does not mean it should be considered expected behavior. Attempting to convince others that a surprising behavior is normal seems odd to me. Surely you understand that any routine that reads data from a pipe and silently modifies that data violates the principle of least surprise?

The term "bug" or "not bug" is a gray area, it depends on an arbitrary line. At the end of the day, "bug" is used to communicate the idea that something needs to be changed. In this case, the thing being communicated is that this behavior is confusing and should be adjusted, either via code or by adding a note in the help file. Something legitimately needs to be changed. Priority and method of fix are up for discussion, but this is a legitimate report.


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Joined: Dec 2002
Posts: 5,244
Hoopy frood
This seems like a bug to me but it is hard to say for sure.

It often happens that an implementation makes sense at the time that it is implemented but does not make sense when it is looked at later. Occasionally during development, if an implementation takes a lot of time and work, with many intervening changes, a coding/testing fatigue sets in both for me and the testers. And while 99% of the implementation is fine, 1% might be slightly odd but we will accept it anyway - we might justify it in some way, or we might not bother to question it, or we might just miss it, or our test scripts might make it appear reasonable.

For example, in this case, I might have used a test script that used /sockread -n &binvar to append multiple incoming lines to the same &binvar, so perhaps including the terminating zeros made sense in that context. Or maybe, if the terminating zeros were not included, the discrepancy between $sockbr, $sock().rq, and the length of the &binvar was considered an issue.

As /sockread -n was very likely added due to a specific request by one or more scripters, and no one questioned the implementation then, this seems to imply that the behaviour was considered okay at the time.

Joined: Jul 2006
Posts: 3,969
W
Wims Offline OP
Hoopy frood
Appending multiple lines to the same binvar means that if you are using $bvar().text, you'll get the first line only (precisely because of the binary 0s added), and if you were just going to append multiple lines and then handle the bytes manually, well, the binary 0s are just extra. So these binary 0s didn't make sense at the time.

As far as the discrepancy between $sockbr, $sock().rq and the length of the binvar goes, it's totally correct behavior. They are all independent features.

For example, for HTTP, if you wanted to make sure you received the expected number of bytes, you would first store the Content-Length value, and then decrease that value by $sockbr each time you sockread, REGARDLESS of how you read the data.
You would never decrease that value by either $bvar(&var,0) or $len(%var), because they are not meant to return what was really read from the receive buffer.
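
A minimal sketch of that accounting, in Python rather than mIRC script and under stated assumptions: `read_line` is a hypothetical stand-in for a line read that strips the terminator, its second return value plays the role of $sockbr, and `remaining` stands in for a stored Content-Length value.

```python
def read_line(rq: bytearray) -> tuple[bytes, int]:
    """Take one line off the queue; return (payload, bytes consumed).

    Hypothetical helper: assumes a complete LF-terminated line is queued.
    The payload excludes the terminator, the consumed count includes it.
    """
    end = rq.index(b"\n") + 1
    raw = bytes(rq[:end])
    del rq[:end]
    term = 2 if raw.endswith(b"\r\n") else 1
    return raw[:-term], end


rq = bytearray(b"ab\r\ncd\n")
remaining = len(rq)             # 7 bytes expected in total
payload_total = 0
while remaining > 0:
    line, consumed = read_line(rq)
    remaining -= consumed       # count what left the queue, not the payload
    payload_total += len(line)
print(remaining, payload_total)  # 0 4
```

Counting payload lengths would leave 3 bytes unaccounted for here, which is why the consumed count is the one to subtract.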

In pretty much any socket script, $sockbr and $sock().rq will be different unless that script specifically reads all the bytes available with "/sockread $sock($sockname).rq &bv". "/sockread -n &bv" reads a line (not a line + extra bytes), so it should be expected that $sockbr and the binvar length differ by either 1 or 2, depending on whether the line ended with $lf or $crlf.
Considering binvars are used to overcome the line length limit, it makes sense for a script to handle the bytes manually in a binvar filled by /sockread -n, since trying to get all the text with $bvar(,1-).text would fail. Not everyone does that, though, which is why I think it should be OK to fix this.

I was not there at the time of the implementation, but to me it is pretty obvious the feature was only tested using $bvar().text, which does not show the issue. This was missed; it was not considered okay.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Dec 2002
Posts: 5,244
Hoopy frood
Thanks for your comments everyone.

