Hotlink's URL parsing algorithm #2945 22/12/02 01:20 AM
Joined: Dec 2002
Posts: 15
G
gpn Offline OP
Pikka bird
The algorithm used to parse out URLs when you double-click them is still not quite right.

If you click a URL that is properly enclosed in angle brackets (like all good URLs in a text-based context should be), it will take the trailing > sign, and sometimes additional characters as well, and include them in the URL. That URL will of course fail when it is passed to the browser, generating a 404 error at best.

I think it's fine that the rules are relaxed enough to find and process URLs that are not enclosed in angle brackets, but it should certainly handle those that are enclosed correctly, rather than penalizing them. When URL processing finds a candidate URL between angle brackets, it should use only the text between the brackets. A > character is never valid in a URL (that's why it was chosen as the standard delimiter).
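The rule proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not mIRC's actual hotlink code; the name `bracketed_urls` and the pattern are assumptions made up for the example.

```python
import re

# Hypothetical helper, not mIRC's implementation.
# Matches "<scheme://...>" and captures only the text between the brackets.
BRACKETED_URL = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*://[^<>\s]+)>")

def bracketed_urls(line):
    """Return every URL in `line` that is enclosed in angle brackets."""
    return BRACKETED_URL.findall(line)

print(bracketed_urls("Try <http://www.mirc.com>. It has the docs."))
```

Because the capture group ends at the closing bracket, the sentence's final period never reaches the browser.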


Re: Hotlink's URL parsing algorithm #2946 23/12/02 04:31 PM
Joined: Dec 2002
Posts: 2,809
C
codemastr Offline
Hoopy frood
The rules you are referring to apply to news and email. They were not designed to apply to URLs in all contexts. They exist for news and email because of hard line wrapping: many email clients insert a \r\n after a certain column length, which can split a URL over multiple lines. IRC doesn't have this problem. Neither mIRC nor any other client I've used has any problems with understanding a URL split over two lines, so having <> around it wouldn't be of any help. And if it really annoys you, mIRC allows you to customize the hotlinks to your liking.

/help on hotlink

Re: Hotlink's URL parsing algorithm #2947 27/12/02 05:46 AM
Joined: Dec 2002
Posts: 15
G
gpn Offline OP
Pikka bird
Well, you're entitled to your opinion, but the fact remains that this is a bug. You may like it broken, but that does not mean it is not broken.

The reason the > and < characters were chosen as URL delimiters is that they are prohibited within a URL, so parsing them into a URL field and passing them to a browser is, by definition, a bug.

Your reasoning that < and > are unnecessary except for line breaks is partially correct, but irrelevant. Even if they were never needed, adding them should never break something.

Another reason for the use of < and > is that some punctuation characters, such as the period, ARE allowed in URLs but are also present in common sentences (on IRC as well as in e-mail and news and everyday writing). Use of the correct delimiters will eliminate ambiguous situations where it is not possible to tell if a trailing period is the name of a file or path, or is the end of the sentence containing the URL. Well, they WOULD, if mIRC worked correctly.
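To illustrate the ambiguity with bare URLs, here is a minimal Python sketch of the usual guesswork. The helper name `trim_bare_url` and the example.com addresses are hypothetical, and nothing here claims to describe what mIRC does.

```python
# Hypothetical heuristic for bare (unbracketed) URLs: drop trailing
# characters that usually belong to the sentence rather than the URL.
SENTENCE_PUNCTUATION = ".,;:!?"

def trim_bare_url(candidate):
    """Guess where a bare URL ends by stripping trailing punctuation."""
    return candidate.rstrip(SENTENCE_PUNCTUATION)

# Usually the right guess: the period ends the sentence.
print(trim_bare_url("http://www.example.com/page.html."))

# Still only a guess: the final period might have been part of the path
# the author intended, and the text alone cannot settle it.
print(trim_bare_url("http://www.example.com/files/notes."))
```

With angle brackets around the URL, no such guess is needed, because the closing bracket marks the end explicitly.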

But whether you like the style or not is really of no importance. Since they are invalid characters, any correctly functioning URL parsing algorithm MUST stop scanning when it reaches one. It's not about what I need or want or what I can or cannot script (of course I can work around most bugs, but why should I have to?)

Oh, and the style is not limited to news and email. It is also part of the MLA style standard for all scholarly works, and by extension, all knowledgeable publications in the English language. Just a minor point. smirk

Re: Hotlink's URL parsing algorithm #2948 27/12/02 07:39 AM
Joined: Dec 2002
Posts: 1,321
H
Hammer Offline
Hoopy frood
I strongly object to your assertion that MLA is the only standard for scholarly publications when, in fact, that statement is not true. APA is more widely used in non-English/non-Journalism scholarly publications than MLA is; Kate Turabian still has her proponents out there too. Those styles are basically the same, but they do differ in very important ways.

You seem to be upset that mIRC is not anticipating every possible occurrence of a web page address because it does not strip out some characters, to wit, < and > (commonly referred to as angle brackets). You have not mentioned mIRC automatically stripping control codes yet, but I'm assuming that will be forthcoming shortly. While we're talking about modifying what mIRC should and should not tamper with, exactly how do you suggest dealing with those malformed URLs that are commonplace in almost all media now?

Everyone (intentional hasty generalization) knows that if you are told to go to amazon.com, you open your browser and type in http://www.amazon.com and then click the "Go" button or depress the Enter key and off you go, ordering nifty media over the Internet. Should mIRC recognize such things as amazon.com or ebay.com as URLs and allow you to double-click them? Common usage indicates that it should; strictest adherence to some RFC written ages ago, however, may preclude such an adherence for the purists. (That argument continues that those purists should go back to using Telnet as their IRC client.)

How about other forms of punctuation that are commonly concatenated with a valid URL? I frequently see someone say something like:

<Someone> That's on http://www.thissite.com/thispage...but I haven't been there in a while.

(Note how this .php page messes up the URL as well and that is its business!) Now obviously, a human can look at that URL and tell that it stops at the first period (full stop) and that you might have to add .html or .htm or something similar to make the page show up in your browser...these days, .php is a new popular extension, though you will also see extensions over 3 or 4 letters with regularity. So, obviously we cannot use the space character.

You advocate using < and > as delimiters and this will solve the problem. Yeah right! Good luck training the general public to ALWAYS, ALWAYS, ALWAYS enclose their URLs in angle brackets. Complaining about something like this is petty; it's exactly like the time I emailed Eudora because their email client cannot correctly handle a validly-formed mailto string (while even Outlook Express can!).

What was the problem? Eudora (and a few other email clients) cannot correctly handle %0D%0A in the body section of a mailto string.
Did it get changed? No.
Why not? Because in the long run, it doesn't matter. There are more important bugs to fix.
Is it an anomaly or does it "break" a protocol that some folks agreed on and wrote down? Perhaps. I didn't agree to it, though I did agree with it.
Is it high on the list of things to fix? I wouldn't think so.
Will it ever get fixed to your satisfaction? Perhaps. smirk
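For readers unfamiliar with the %0D%0A mentioned above: it is simply a percent-encoded carriage return and line feed. A minimal Python sketch using the standard `urllib.parse` module shows how such a mailto string comes about. The address and body text are made-up placeholders, and `mailto_link` is a hypothetical helper.

```python
from urllib.parse import quote, unquote

def mailto_link(address, body):
    """Build a mailto: URL whose body is percent-encoded."""
    # safe="" forces every reserved character in the body to be escaped.
    return "mailto:" + address + "?body=" + quote(body, safe="")

link = mailto_link("someone@example.com", "Line one\r\nLine two")
print(link)

# Decoding the body part restores the original line break.
print(repr(unquote(link.split("?body=", 1)[1])))
```

A client that handles the encoded body correctly would show two lines in the new message; the complaint above is that some clients did not.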

Quote:
But whether you like the style or not is really of no importance. Since they are invalid characters, any correctly functioning URL parsing algorithm MUST stop scanning when it reaches one.

Speaking as a programmer and systems analyst, I believe you are confusing functions here. The browser of your choice should be the process figuring out what is and what is not a valid URL for it to try to access. True, perhaps it would help if the process feeding the URL to the browser could get it close, but I should think that a well-written browser should be able to parse a URL; wouldn't you think so, too? That is, after all, a major part of what a browser does; conversely, mIRC is not a browser, though it can launch a browser when it sees what it thinks might be a URL. mIRC even provides a way (since 23/09/99 - mIRC v5.61) for the end user to customize, through script, exactly what hotlinks are and what are not. This thread even shows how to strip those naughty angle brackets out, as well as a few other characters that seem to give folks fits when they double-click. I think that's pretty darned good, really. It has made it easier for us to get to web pages that our friends show us.

Don't forget. Hotlinks originally were intended to be used for nicks and channels. Khaled just expanded that idea to include web pages and came up with a way to let mIRC launch a browser using that URL. mIRC is not a URL parser, it was never intended to be; it is an Internet Relay Chat program that has, over the years, blossomed into a multimedia extravaganza. Your mIRC can start up movies now, bring up web pages in a browser, play MP3s ... and STILL let you chat with your friends from around the world.

mIRC is not forced to submit to MLA standards, nor APA, nor Kate L. Turabian's (may she rest in peace) nor even any (outdated?) RFCs you may have read. I find it rather amazing that mIRC can access so many different IRC networks and work on them all, even though many of them break protocols all the time, or even write their own (re: IRCnet). mIRC is the brainchild and play toy of one man, Khaled Mardam-Bey. It just so happens that many of us like his toy and ask to play with it too. It seems to me that this particular request is asking mIRC's creator to alter his toy that he designed and built because you don't like one of the ways it works since it does not follow some ordinance written somewhere.

Mind you, I am objecting to your tone and your legalistic approach, not the fact that you are making a suggestion. Suggestions are quite welcome; I frequently make them myself. Some suggestions make it into mIRC and some don't. In the end, those suggestions that Khaled likes and wants to add to his toy, he does. The rest are filed for future consideration.


DALnet: #HelpDesk and #mIRC
Re: Hotlink's URL parsing algorithm #2949 01/01/03 10:56 PM
Joined: Dec 2002
Posts: 15
G
gpn Offline OP
Pikka bird
Bah, you're hopeless.

You don't like my tone? Sheesh, what do you call the self-righteous prose you've been writing?

Look, it was NOT a feature suggestion, Sparky, it was a bug report. I found a bug, I reported it, my conscience is clear, my job is done. I don't know where you figure into this at all. Did I ask for opinions? I just reported a bug. Period. Your free advice is worth every penny.

You can argue all you like about whether the RFCs are outdated or not. The point is that NO standard, be it the original HTML RFC, or anything to come out of ANY standards organization since, has EVER allowed the use of < and > (yes, I know what they're commonly called) within a URL.

Quite the contrary, EVERY standard explicitly PROHIBITS their use within a URL.

Deny it or moan about it all you want, but that's the fact. The characters cannot be in a URL. (That may or may not have anything to do with why they were chosen as delimiters. Let's pretend we don't even know there is such a thing as a delimiter.)

So it makes no sense that an algorithm that claims to be attempting to do its best to parse what it thinks might be URLs into a buffer would continue to copy illegal characters that cannot possibly work. It is flat out GUARANTEED to produce an invalid URL, first time, every time, 100% of the time.
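The "stop copying at the first illegal character" behavior argued for here can be sketched as a simple scan in Python. The forbidden set below is an assumption chosen for the illustration (angle brackets, the double quote, the space, and control characters, which RFC 2396 lists among its excluded characters); the function names are hypothetical and this is not mIRC's code.

```python
# Characters this sketch treats as never valid inside a URL.
# Assumption: a subset of the characters RFC 2396 excludes.
FORBIDDEN = set('<>" ')

def first_forbidden_index(candidate):
    """Return the index of the first forbidden character, or -1 if none."""
    for i, ch in enumerate(candidate):
        if ch in FORBIDDEN or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return i
    return -1

def truncate_at_forbidden(candidate):
    """Copy characters only up to the first forbidden one."""
    i = first_forbidden_index(candidate)
    return candidate if i == -1 else candidate[:i]

print(truncate_at_forbidden("http://www.amazon.com/>..."))
print(first_forbidden_index("crunk>flap.com"))
```

The same check can be used the other way round, to reject a candidate outright instead of truncating it.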

Maybe there are other tweaks that could be made to improve URL recognition, especially the bogus "informal" notations you referred to, and maybe there are not. That's neither here nor there. I would not presume to label the inability to recognize Amazon.com as a URL as some sort of bug.

But the failure to recognize that <http://www.amazon.com/> is a correct URL definitely IS a bug, since the URL is fine, recognizable, and meets all applicable standards and permissives, yet mIRC screws it up.

The browser's job is not to correct malformed URLs, especially when they weren't really malformed, except by a bug in mIRC. If you pass or type a URL such as crunk>flap.com into the browser, it will tell you (quite correctly) that there is no such address.

It would seem to me that it WOULD be mIRC's job to at least attempt to pass CORRECT URLs to the browser, insofar as it was able. If it were my program, I would take pride in seeing that it did, but that's just me. Clearly, blithely passing incorrect characters, characters that are KNOWN to be incorrect under ALL circumstances, cannot improve the outcome.

I would think (though you seemingly disagree) that the author would LIKE to know if his software were doing something incorrect. That's why there is a bug report forum. In fact, I am confident that that IS his attitude.

Since you are not he, and you show no desire to be part of the solution, the most helpful thing you could do is to get out of the way.

I know perfectly well how to work around this bug using hotlink events. (I have written many of them, and mine all work just fine, because I know how to write, implement, and test code for bugs.)

You seem to be consciously trying to ignore the fact that this bug effectively bothers ONLY people who have no desire or ability to write scripts, which is almost everybody. They just want to click on the nice links.

You seem to think you're "defending" Khaled by sticking up for what is, in this case, bad code. Maybe you are, but you're not doing him a favor. Any programmer worth his pocket protector would rather have bug-free code than empty compliments. If he doesn't want to fix this bug, then that is his choice. Of course we haven't heard him say anything of the sort, so my comments stand.

I'm trying to help him improve the program by reporting bugs I have encountered. I have found one or two in nearly every version of mIRC I've worked with (since way before color codes). In each case, I have reported them. In almost every case, Khaled has fixed them.

I haven't figured out what your function is, but it does not seem to be facilitating the above-mentioned process.

If you don't like my tone, I refer you to complaint.com

Re: Hotlink's URL parsing algorithm #2950 01/01/03 11:22 PM
Joined: Dec 2002
Posts: 2,962
S
starbucks_mafia Offline
Hoopy frood
Could I just ask one thing? What the hell are you talking about?

mIRC (6.03) doesn't send the angle brackets as part of the URL provided that they are at the beginning and/or end of the URL, which they would have to be if the person meant them to delimit the URL. Your example URL <http://www.amazon.com/> works perfectly when double clicked in mIRC.

If anything, mIRC is too loose with treating them as delimiters, since < and > can be used validly as GET arguments - but mIRC would still ignore the final > in something like http://www.mirc.com/blah?arnie=moo> despite no leading <.
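On the point about GET arguments: a value containing > can certainly be sent, but standard URL-building tools percent-encode it, so the raw character does not normally appear in the URL itself. A small Python sketch with the standard `urllib.parse` module, reusing the example address from the post above:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The query value ends in a literal ">" before encoding.
params = {"arnie": "moo>"}
url = "http://www.mirc.com/blah?" + urlencode(params)
print(url)

# The receiving side decodes it back to the original value.
print(parse_qs(urlsplit(url).query))
```

Under that assumption, a trailing raw > after a query string is more likely to be a delimiter than data, although a chat client cannot know how the sender produced the text.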


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Re: Hotlink's URL parsing algorithm #2951 01/01/03 11:47 PM
Joined: Dec 2002
Posts: 3,138
C
Collective Offline
Hoopy frood
He means
If someone clicks "<http://www.pokemon.com>" it will send them to http://www.pokemon.com
If someone clicks "<http://www.pokemon.com>..." it will send them to http://www.pokemon.com/>..
And if someone clicks ".<http://www.pokemon.com>" it won't send them anywhere.
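The three cases above make a handy test table. The sketch below shows one way a double-clicked word could be resolved so that all three give the same address; `url_for_clicked_word` and its pattern are hypothetical and only illustrate the behavior being requested, not how mIRC works.

```python
import re

# Hypothetical: locate the start of a URL anywhere inside the clicked
# word, then stop before any <, >, double quote, or whitespace.
URL_IN_WORD = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s<>\"]+")

def url_for_clicked_word(word):
    """Return the URL found inside `word`, or None if there is none."""
    match = URL_IN_WORD.search(word)
    return match.group(0) if match else None

for word in ("<http://www.pokemon.com>",
             "<http://www.pokemon.com>...",
             ".<http://www.pokemon.com>"):
    print(repr(word), "->", url_for_clicked_word(word))
```

Searching inside the word, rather than requiring the word to start with the URL, is what lets the third case work despite the leading period.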

Re: Hotlink's URL parsing algorithm #2952 01/01/03 11:53 PM
Joined: Dec 2002
Posts: 2,962
S
starbucks_mafia Offline
Hoopy frood
If someone is following 'procedure' so much that they bother to put <>'s around a URL I don't think they'd be stupid enough to have adjoining characters. If they are then they would quickly learn otherwise. Not a bug IMO.


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Re: Hotlink's URL parsing algorithm #2953 03/01/03 11:13 PM
Joined: Dec 2002
Posts: 15
G
gpn Offline OP
Pikka bird
No, that's exactly backwards. The main reason someone WOULD use angle brackets is to make sure the URL is separated from adjoining text and punctuation marks, such as sending someone to <http://www.mirc.com>. In that example, mIRC does not handle it properly, and tries to connect to "www.mirc.com>.", which is not a valid URL, even though it's written properly.