Identifying URL's using regex #158540 05/09/06 08:48 PM
OrionsBelt (OP)
Fjord artisan
Joined: Apr 2006
Posts: 464
I've seen regex strings posted here many times now, and it seems to be a very powerful feature.

I have a small problem with catching/recognising URLs: they are not always recognised, for example ftp addresses or IRC server addresses.
I use the code below to store the URLs I click and to add them to an @EventsLog window.

Code:
on ^1:HOTLINK:*http*:*:{
  if (http* iswm $1) return
  halt
}

on ^1:HOTLINK:*www*:*:{
  if (www* iswm $1) return
  halt
}

on 1:HOTLINK:*:*:{
  url -an $1
  write url-catcher.txt $date $time $1 ( $+ $iif($chan == $null,Private,$chan) $+ )
  echo @EventsLog $timestamp $chr(3) $+ 05URL detected:
  echo @EventsLog $timestamp You just clicked: $chr(3) $+ 12 $+ $1- $+ $chr(3)
  echo @EventsLog -
}
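
To see why ftp and IRC server addresses slip through, here is a rough Python sketch of the two ^HOTLINK checks above. It treats `http* iswm $1` and `www* iswm $1` as case-insensitive "starts with" tests, and it checks `*http*` first because mIRC only triggers the first matching event of a given type in a script file, so a word containing "http" never reaches the `*www*` event. Returning False for words that match neither event is an assumption standing in for "this script does not treat it as a link"; `is_hotlink` is a hypothetical name.

```python
def is_hotlink(word: str) -> bool:
    """Hypothetical sketch of the two ^HOTLINK events above."""
    w = word.lower()  # iswm and event wildcards are case-insensitive
    if "http" in w:                   # on ^1:HOTLINK:*http*:*:
        return w.startswith("http")   # return if http* iswm $1, else halt
    if "www" in w:                    # on ^1:HOTLINK:*www*:*:
        return w.startswith("www")    # return if www* iswm $1, else halt
    # Assumption: a word matching neither event is not handled as a link here.
    return False
```

Under this reading, a bracketed URL such as (http://example.com) is rejected as well, because it contains "http" but doesn't start with it.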


Can anyone help me improve this code?
I would like it to be a bit smarter and recognise as many valid URLs as possible (all of them, if possible :grin:).

Thanks in advance.

Last edited by OrionsBelt; 05/09/06 08:55 PM.
Re: Identifying URL's using regex #158541 06/09/06 10:05 AM
Rand
Fjord artisan
Joined: Feb 2005
Posts: 342
This is something I made to help someone else. Basically, you Shift+double-click a link to add it to mIRC's built-in URL list.

Code:
on ^$*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{ }
on $*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{
  if ($mouse.key & 4) {
    url -i @ $regml(1)
    echo -ag Url Added: $regml(1)
    return 
  }
  run $1-
}


This will match any URL starting with http:// or ftp://, or of the form www.*.* (that is, www DOT something DOT something; there have to be two dots in there).
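
For testing the pattern outside mIRC, here is a sketch using Python's `re` module as a stand-in for mIRC's PCRE engine. Python has no POSIX classes, so `[^[:alnum:]]` is rewritten as `[^A-Za-z0-9]`; `\x3A` is ":" and `\x2F{2}` is "//". The helper `find_url` is hypothetical and simply returns whichever capture group matched.

```python
import re

# Rand's first hotlink pattern, translated for Python's re module.
URL_RE = re.compile(
    r"[^A-Za-z0-9]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))",
    re.IGNORECASE,
)

def find_url(word):
    """Return the URL part of a clicked word, or None (hypothetical helper)."""
    m = URL_RE.search(word)
    if m is None:
        return None
    # Group 1 holds scheme URLs (group 2 is just the scheme),
    # group 3 holds bare www.*.* addresses.
    return m.group(1) or m.group(3)
```

Note that https:// links don't match as written, because `http\x3A` requires a colon directly after "http"; changing `(http|ftp)` to `(https?|ftp)` would be one way to cover them.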

As far as recognizing other addresses goes, that's a lot tougher.

It's hard to tell what is and isn't a link, and it would take a lot more regex than it's worth. But, for example's sake, you could be cheap and just do this:

Code:
on ^$*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+)|(\S+\.(?:com|org|net|biz|us|ru)\S*))/i:*:{ }
on $*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+)|(\S+\.(?:com|org|net|biz|us|ru)\S*))/i:*:{
  if ($mouse.key & 4) {
    url -i @ $regml(1)
    echo -ag Url Added: $regml(1)
    return 
  }
  run $1-
}


This should recognize anything that has .com, .net, .org, .ru, .biz or .us in it, though there's no guarantee that such words are in fact links.
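
The same kind of Python sketch for the looser pattern shows both what the extra TLD branch catches and why it can misfire. As before, `[^[:alnum:]]` is rewritten as `[^A-Za-z0-9]` for Python's `re` module, and `find_loose_url` is a hypothetical helper.

```python
import re

# Rand's second pattern, translated for Python's re module.
LOOSE_URL_RE = re.compile(
    r"[^A-Za-z0-9]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+)"
    r"|(\S+\.(?:com|org|net|biz|us|ru)\S*))",
    re.IGNORECASE,
)

def find_loose_url(word):
    """Return the matched address, or None (hypothetical helper)."""
    m = LOOSE_URL_RE.search(word)
    if m is None:
        return None
    # Groups 1, 3 and 4 are the three kinds of address; group 2 is the scheme.
    return m.group(1) or m.group(3) or m.group(4)
```

An IRC server name like irc.example.net now matches, but so does something.community, since `\S*` happily swallows whatever follows ".com".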

But that should do what you need.

Just remember: both hotlink scripts above require you to hold Shift and double-click the link to add it to the URL list. If you just double-click it, it'll open the link in your browser.
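
The Shift test `$mouse.key & 4` is a bitwise AND, so it passes whenever the bit with value 4 is set, even if other bits are set at the same time. A minimal Python sketch of that test and of the add-or-open decision follows; the function names, and any bit values other than 4 used in the example, are hypothetical.

```python
SHIFT_BIT = 4  # the bit tested by ($mouse.key & 4) in Rand's script

def shift_held(mouse_key: int) -> bool:
    # A non-zero AND result means the Shift bit is set,
    # whatever other bits happen to be set alongside it.
    return (mouse_key & SHIFT_BIT) != 0

def handle_double_click(url: str, mouse_key: int, url_list: list) -> str:
    """Hypothetical stand-in for the body of the non-^ hotlink event."""
    if shift_held(mouse_key):
        url_list.append(url)  # mirrors: url -i @ $regml(1)
        return "added"
    return "run"              # mirrors: run $1-
```

Testing with `& 4` rather than comparing `$mouse.key == 4` is what keeps the check working when another bit is set together with Shift.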

Re: Identifying URL's using regex #158542 06/09/06 09:23 PM
OrionsBelt (OP)
Fjord artisan
Joined: Apr 2006
Posts: 464
Thank you for that, Rand.
I'm gonna try to integrate those regex strings into my current code. I hope there's some way to make that work.

Thx again :tongue:

Re: Identifying URL's using regex #158543 07/09/06 12:45 AM
genius_at_work
Hoopy frood
Joined: Oct 2005
Posts: 1,741
I use this regex in on ACTION, on NOTICE and on TEXT events to catch links (and server spam):

Code:
on *:TEXT:*:#:{
  var %text = $1-

  ;*** Ignore non-links ***
  var %islink = $false
  if ($regex(%text,/(www\.|http\:|\.com|\.net|\.org|irc\.|\/server)/i)) %islink = $true
  ;*** Exclamations like "awww..." or "owww..." also contain "www.", so don't count them ***
  if ($regex(%text,/[aeo]wwww*\./ig)) %islink = $false
  if (!%islink) return

  echo -a $nick posted a link in $chan
}


It's fairly accurate at finding links, and it doesn't trigger when someone says "awww...", "owww..." and the like.
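
The two-step test can be sketched in Python as follows. The first pattern flags anything that looks like a link, and the second clears the flag for exclamations whose "www." would otherwise count. This sketch applies both patterns to the whole message, and `is_link` is a hypothetical name.

```python
import re

# The two patterns from the script above, compiled case-insensitively (/i).
LINK_RE = re.compile(r"(www\.|http\:|\.com|\.net|\.org|irc\.|\/server)", re.IGNORECASE)
AWW_RE = re.compile(r"[aeo]wwww*\.", re.IGNORECASE)

def is_link(text: str) -> bool:
    """Hypothetical sketch of the link check in the on TEXT event."""
    islink = bool(LINK_RE.search(text))
    if AWW_RE.search(text):
        islink = False  # "awww...", "owww...", "ewww..." are not links
    return islink
```

One side effect: because any "awww." in the message clears the flag, a line such as "awww... see www.mirc.com" is not reported as a link.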

-genius_at_work