
Solo1 (OP)
Vogon poet
Joined: Mar 2007
Posts: 139
Hi people

What is the most efficient way to check whether a string contains a URL? I have a feeling it may be a regex. Can anyone help me?

Thanks in advance

Wims
Hoopy frood
Joined: Jul 2006
Posts: 3,960
Have you searched this forum?
Look here: http://forums.mirc.com/ubbthreads.php?ub...true#Post192127


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Solo1 (OP)
Hi
It does not look like it's going to help me, as the OP of that post wanted something else. I do not know how to implement it, as I know very little regex. Does anyone else perhaps have a suggestion?

thank you kindly

Solo1 (OP)
Hi people
I have tried searching the forum again and cannot find a satisfactory post. For example, this is not a reliable way of checking, and that check is meant to ban advertisers on a channel, although it would ban anyone typing www. All I am looking for is a reliable check that returns true if a string contains a URL.

Thanks

WhipLash
Ameglian cow
Joined: Jul 2008
Posts: 24
I'd say it depends on the type of URL you wish to process. You could get the regex to look for 'http://' or 'www.', but it is not necessary to use regex per se. Once you've worked out what the regex should search for, it shouldn't be too hard.

Why not just use:
Code:
if (http:// isin $1-) || (www. isin $1-) { Do something here }


The regex I came up with is:
Code:
$regex(test, $1-, http://.+|www\..+)


The regex will not match a bare 'www.' or 'http://'. Something has to follow the prefix, as in 'www.somethingmorehere' or 'http://this could be a web address'.
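
To make the difference between the two checks concrete, here is a minimal sketch. It is written in Python rather than mIRC script purely so it can be run anywhere; that choice, and the function names `substring_check` and `regex_check`, are assumptions for illustration and not part of the original post. The pattern itself is the one from the `$regex()` call above.

```python
import re

# Pattern copied from the $regex() call above: at least one character
# must follow "http://" or "www." for it to match.
URL_LIKE = re.compile(r"http://.+|www\..+")


def substring_check(text: str) -> bool:
    """Plain substring test, roughly what an isin comparison does."""
    return "http://" in text or "www." in text


def regex_check(text: str) -> bool:
    """True only if something follows the 'http://' or 'www.' prefix."""
    return URL_LIKE.search(text) is not None


if __name__ == "__main__":
    for sample in ("http://", "www.", "visit www.mirc.com now", "www. hello"):
        print(repr(sample), substring_check(sample), regex_check(sample))
```

Note that `.+` accepts any character, including a space, so a string such as "www. hello" still passes the regex check. Tightening that is what the longer patterns later in the thread try to do.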


=======================
Count WhipLash
Services Administrator
KnightNet
=======================
Solo1 (OP)
Hi
Thanks for your reply. The problem with your first solution is that if the string contains just http:// or "this is a www string", it will return true even though technically there is no URL in the string.
Your regex is a little better but not perfect. Is there a better, more reliable way?
For now I will use your regex.

Thanks

WhipLash
You see, I'm not sure what you plan on using the script for. Adding more to the regex is not easy considering the range of possibilities in web addresses today. I'll play with it some more, though, and get back to you. :)

Last edited by WhipLash; 16/07/08 12:09 PM.

Wims
Code:
alias isurl return $iif($regex($1-,/\b(\^@\S+|www\.\S+|http://\S+|irc\.\S+|irc://\S+|\w+(?:[\.-]\w+)?@\w+(?:[\.-]\w+)?\.[a-z]{2,4})\b/gi),$iif($prop,$regml($v1),$true))

$isurl(www.mirc.com) return $true
$isurl(www.mirc.com www.GaisGa.com).2 return www.GaisGa.com
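
For readers who want to try the same idea outside mIRC, here is a rough Python sketch of what the alias does: return true when the text contains at least one match, or return the Nth match when a number is given (the role played by the `.2` property above). The function name `is_url` and its `n` parameter are hypothetical, and Python's `re` module is standing in for mIRC's PCRE engine, so treat it as an approximation rather than an exact port.

```python
import re

# Wims' pattern, copied unchanged. re.IGNORECASE stands in for the /i flag;
# finditer() stands in for the /g flag.
PATTERN = re.compile(
    r"\b(\^@\S+|www\.\S+|http://\S+|irc\.\S+|irc://\S+"
    r"|\w+(?:[\.-]\w+)?@\w+(?:[\.-]\w+)?\.[a-z]{2,4})\b",
    re.IGNORECASE,
)


def is_url(text, n=None):
    """Return True/False when n is None, otherwise the n-th match (1-based)."""
    matches = [m.group(1) for m in PATTERN.finditer(text)]
    if not matches:
        return False
    if n is None:
        return True
    # Out-of-range n yields None instead of raising an error.
    return matches[n - 1] if 1 <= n <= len(matches) else None


if __name__ == "__main__":
    print(is_url("www.mirc.com"))
    print(is_url("www.mirc.com www.GaisGa.com", 2))
```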


Solo1 (OP)
Whoa :D that looks complicated. I will try it, thanks.

Typos
Ameglian cow
Joined: Jun 2008
Posts: 48
Hey. I saw this post yesterday, and since I recently decided to take on regex, I figured this would be perfect practice.
I will show what I have built and then compare it to the code Wims provided.
I use RegexBuddy to build and test the regexes, but I also throw one into mIRC occasionally to make sure the results match up.
Here is the code I have come up with so far. I will keep working on it to see if I can improve it.
Quote:
\b(?:(?:htt|ft)ps?://(?:www|ftp\.)?|www|ftp\.).*(?:\.[a-z]{2,4})(?::\d+)?(?:/\w+(?:/\w*/*)*|(?:\.[a-z]{2,4})|\?\S*)*\b

It matches the URL in every one of the following lines.
Quote:
www.sss.com/hello.html hello
hello and welcome to www.regexbuddy.com hello
http://www.regexbuddy.com/ hello
Yes, http://www.regexbuddy.com/index.html is a link!
https://www.regexbuddy.com/index.html?source=library
You can download RegexBuddy at http://www.regexbuddy.com/download.html.
hello http://host.com:21/directory/path/
hello ftp.host.com:21/directory/path/file.ext hello
hello ftp://ftp.host.com:21/directory/path/file.ext hello
hello ftp://host.com:21/directory/path/file.ext hello
hello ftp://user:password@host.ext:21/path/ hello
And lastly in ftp's is ftp://ftp.com hello
hello http://user:password@host.ext/path hello
hello http://user:password@host.ext:21/path/directory/hello.html?1=2aaa hello
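
The claim above can be checked mechanically. The sketch below runs the quoted regex over the same sample lines using Python's `re` module. Python is an assumption here (the post used RegexBuddy and mIRC), and the names `TYPOS_PATTERN`, `SAMPLE_LINES` and `unmatched` are made up for the example. It only checks that each line contains some match, not that the match is exactly the URL.

```python
import re

# The regex from the quote above, copied unchanged.
TYPOS_PATTERN = re.compile(
    r"\b(?:(?:htt|ft)ps?://(?:www|ftp\.)?|www|ftp\.).*(?:\.[a-z]{2,4})"
    r"(?::\d+)?(?:/\w+(?:/\w*/*)*|(?:\.[a-z]{2,4})|\?\S*)*\b"
)

SAMPLE_LINES = [
    "www.sss.com/hello.html hello",
    "hello and welcome to www.regexbuddy.com hello",
    "http://www.regexbuddy.com/ hello",
    "Yes, http://www.regexbuddy.com/index.html is a link!",
    "https://www.regexbuddy.com/index.html?source=library",
    "You can download RegexBuddy at http://www.regexbuddy.com/download.html.",
    "hello http://host.com:21/directory/path/",
    "hello ftp.host.com:21/directory/path/file.ext hello",
    "hello ftp://ftp.host.com:21/directory/path/file.ext hello",
    "hello ftp://host.com:21/directory/path/file.ext hello",
    "hello ftp://user:password@host.ext:21/path/ hello",
    "And lastly in ftp's is ftp://ftp.com hello",
    "hello http://user:password@host.ext/path hello",
    "hello http://user:password@host.ext:21/path/directory/hello.html?1=2aaa hello",
]

# Collect every line in which the pattern finds nothing at all.
unmatched = [line for line in SAMPLE_LINES if TYPOS_PATTERN.search(line) is None]

if __name__ == "__main__":
    print("lines without a match:", unmatched)
```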


Wims' code is
Quote:
\b(\^@\S+|www\.\S+|http://\S+|irc\.\S+|irc://\S+|\w+(?:[\.-]\w+)?@\w+(?:[\.-]\w+)?\.[a-z]{2,4})\b

And it matches
Quote:
www.sss.com/hello.html hello
hello and welcome to www.regexbuddy.com hello
http://www.regexbuddy.com/ hello
Yes, http://www.regexbuddy.com/index.html is a link!
https://www.regexbuddy.com/index.html?source=library
You can download RegexBuddy at http://www.regexbuddy.com/download.html.
hello http://host.com:21/directory/path/
hello http://user:password@host.ext/path hello
hello http://user:password@host.ext:21/path/directory/hello.html?1=2aaa hello

But misses
Quote:
hello ftp.host.com:21/directory/path/file.ext hello
hello ftp://ftp.host.com:21/directory/path/file.ext hello
hello ftp://host.com:21/directory/path/file.ext hello
And lastly in ftp's is ftp://ftp.com hello
hello ftp://user:password@host.ext:21/path/ hello
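
As a cross-check of the list above, this sketch runs Wims' pattern over the five ftp lines in Python (an assumption; results in mIRC or RegexBuddy could differ slightly). The names `WIMS_PATTERN`, `FTP_LINES`, `first_match` and `results` are hypothetical. In this run the first four lines produce no match at all, while the last one matches only the "password@host.ext" fragment through the e-mail-style alternative, so the ftp URL itself is still not recognised.

```python
import re

# Wims' pattern, copied unchanged; re.IGNORECASE stands in for the /i flag.
WIMS_PATTERN = re.compile(
    r"\b(\^@\S+|www\.\S+|http://\S+|irc\.\S+|irc://\S+"
    r"|\w+(?:[\.-]\w+)?@\w+(?:[\.-]\w+)?\.[a-z]{2,4})\b",
    re.IGNORECASE,
)

FTP_LINES = [
    "hello ftp.host.com:21/directory/path/file.ext hello",
    "hello ftp://ftp.host.com:21/directory/path/file.ext hello",
    "hello ftp://host.com:21/directory/path/file.ext hello",
    "And lastly in ftp's is ftp://ftp.com hello",
    "hello ftp://user:password@host.ext:21/path/ hello",
]


def first_match(line):
    """Return the first captured match in the line, or None if there is none."""
    m = WIMS_PATTERN.search(line)
    return m.group(1) if m else None


results = [first_match(line) for line in FTP_LINES]

if __name__ == "__main__":
    for line, hit in zip(FTP_LINES, results):
        print(repr(hit), "<-", line)
```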


I feel like I should say thanks because I really did learn a LOT making that regex. I'm sure someone more experienced in regex could find some problems with it, but it works, and I feel like I have a much better grasp of regex altogether.

If I update the code, I'll post the new version in a new reply. I plan on seeing whether I can make it better, find any problems in it, or just make it smaller. I also have a friend who's decent with regex, so I plan on running the line by him whenever I see him. First, though, I need food and to let my eyes regain some focus. lol.

Good luck.


I've gone to look for myself. If I should return before I get back, please keep me here.
Wims
I simply extracted the pattern from the link I gave above, and it simply doesn't match "ftp".

Last edited by Wims; 17/07/08 11:16 AM.

Typos
I have to admit, I was wondering why you decided to include irc and not ftp.

Sorry if it seemed like I was singling out your regex, but it was the only other good one in this thread, so naturally I compared mine to it. Besides, like I said, I've only recently taken on regex at a more serious level, so I cannot promise my regex is error free, and I also believe it could be smaller.

Last edited by Typos; 17/07/08 11:58 AM.

hixxy
Hoopy frood
Joined: Sep 2005
Posts: 2,881
(?:www|ftp\.)?|www|ftp\.)

In this section of your regex, the . should be matched outside of the brackets. At the moment it will match "www" or "ftp.", not "www." or "ftp." as it should.

It would also be a good idea to allow a single digit directly after "www" so that it works with things like "www2.google.com".

Updated regex:

Code:
\b(?:(?:htt|ft)ps?://(?:(?:www\d?|ftp)\.)?|(?:www\d?|ftp)\.).*(?:\.[a-z]{2,4})(?::\d+)?(?:/\w+(?:/\w*/*)*|(?:\.[a-z]{2,4})|\?\S*)*\b
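
The point about the dot is easiest to see on the prefix alone. The sketch below isolates just the "www / ftp" alternatives from the rest of the regex. This is a deliberate simplification for illustration, written in Python as an assumption, and the names `ORIGINAL_PREFIX` and `DOT_OUTSIDE_PREFIX` are hypothetical.

```python
import re

# Prefix as in the fragment quoted at the top of this post:
# the dot belongs only to the "ftp" branch.
ORIGINAL_PREFIX = re.compile(r"\b(?:www|ftp\.)")

# Prefix with the dot moved outside the group and an optional digit after "www".
DOT_OUTSIDE_PREFIX = re.compile(r"\b(?:www\d?|ftp)\.")

SAMPLES = ("www.google.com", "www2.google.com", "wwwhatever", "ftp.host.com")

if __name__ == "__main__":
    for sample in SAMPLES:
        print(
            sample,
            bool(ORIGINAL_PREFIX.match(sample)),
            bool(DOT_OUTSIDE_PREFIX.match(sample)),
        )
```

With the dot outside the group, a word that merely starts with "www" (such as "wwwhatever") no longer counts as a prefix, while "www2." does.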

Typos
Thank you very much for catching that, hixxy. I couldn't see the code in your post because it was in a code box so short that the text wasn't visible, so while fixing the regex I also decided to get rid of the second www|ftp part. I hope I got the digit idea right; it did pass testing, so I'm sure it's probably exactly what you did. I think I'll use Firefox next time I come here so that doesn't happen again.
Quote:
\b(?:(?:(?:htt|ft)ps?://)|(?:www\d?|ftp)\.).*(?:\.[a-z]{2,4})(?::\d+)?(?:/\w+(?:/\w*/*)*|(?:\.[a-z]{2,4})|\?\S*)*\b
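
A quick way to see what the revision changes is to run the first version and this one side by side. The sketch below does that in Python (an assumption; the thread's testing was done in RegexBuddy and mIRC). `FIRST_VERSION`, `REVISED` and the sample strings are chosen for illustration only.

```python
import re

# Shared tail of both regexes (everything after the prefix alternatives).
TAIL = (
    r".*(?:\.[a-z]{2,4})(?::\d+)?"
    r"(?:/\w+(?:/\w*/*)*|(?:\.[a-z]{2,4})|\?\S*)*\b"
)

# First version posted earlier in the thread.
FIRST_VERSION = re.compile(
    r"\b(?:(?:htt|ft)ps?://(?:www|ftp\.)?|www|ftp\.)" + TAIL
)

# Revised version from the quote above.
REVISED = re.compile(
    r"\b(?:(?:(?:htt|ft)ps?://)|(?:www\d?|ftp)\.)" + TAIL
)

SAMPLES = (
    "www2.google.com",
    "hello http://host.com:21/directory/path/",
    "hello ftp.host.com:21/directory/path/file.ext hello",
    "see www and file.txt",  # not a URL: "www" is just a word here
)

if __name__ == "__main__":
    for sample in SAMPLES:
        print(
            repr(sample),
            bool(FIRST_VERSION.search(sample)),
            bool(REVISED.search(sample)),
        )
```

In this run the first version also fires on "see www and file.txt", because it accepts "www" without a dot, while the revised one does not. Both still rely on `.*`, so a match can span more text than the URL itself.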

I just got on the PC for the first time today, so I will be looking at this more to see how else I can improve it, as I said I would in my earlier post.

I'm sure you noticed I use quote boxes for my code. They don't shrink to an unusable size on me like the code boxes do. Very, very strange.



hixxy
Yep, that's very similar to what I posted. :)

