#158540 - 05/09/06 09:48 PM Identifying URL's using regex
OrionsBelt Offline
Fjord artisan

Registered: 25/04/06
Posts: 464
Loc: Amsterdam, NL
I've been seeing regex strings come up a lot lately,
and regex seems to be a very powerful feature.

I have a small problem with catching / recognising URLs.
They are not always recognised, for example ftp addresses or IRC server addresses.
I use the code below to store the URLs that I click and to add them to an EventsLog window.

Code:
on ^1:HOTLINK:*http*:*:{
  if (http* iswm $1) return
  halt
}

on ^1:HOTLINK:*www*:*:{
  if (www* iswm $1) return
  halt
}

on 1:HOTLINK:*:*:{
  url -an $1
  write url-catcher.txt $date $time $1 ( $+ $iif($chan == $null,Private,$chan) $+ )
  echo @EventsLog $timestamp $chr(3) $+ 5URL detected:
  echo @EventsLog $timestamp You just clicked: $chr(3) $+ 12 $+ $1- $+ $chr(3)
  echo @EventsLog -
}


Can anyone help me improve this code?
I would like it to be a bit smarter and recognise as many valid URLs as possible. All of them, if possible :grin:

Thanks in advance.


Edited by OrionsBelt (05/09/06 09:55 PM)

#158541 - 06/09/06 11:05 AM Re: Identifying URL's using regex
Rand Offline
Fjord artisan

Registered: 28/02/05
Posts: 342
This is something I made to help someone else. Basically, you shift-click the link to add it to your URL list (mIRC's built-in one).

Code:
on ^$*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{ }
on $*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{
  if ($mouse.key & 4) {
    url -i @ $regml(1)
    echo -ag Url Added: $regml(1)
    return 
  }
  run $1-
}


This will match any URL starting with http://, ftp://, or www.*.* (as in www DOT something DOT something; there have to be two dots in there).
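
If you want to check what the pattern catches without clicking around, a small test alias might help. This is only a sketch: the alias name urltest is made up for this example, and it just feeds one word to the same regex through $regex and echoes the first captured match from $regml.

Code:
alias urltest {
  ; Same pattern as in the hotlink events above
  var %re = /[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i
  ; urltest is a hypothetical name, used here to keep this match separate from other $regex calls
  if ($regex(urltest,$1,%re)) echo -a Match: $regml(urltest,1)
  else echo -a No match: $1
}

Then try something like /urltest ftp://ftp.example.com or /urltest www.example to see which words would become hotlinks.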

Recognizing other addresses is a lot tougher, as it's hard to tell what is and isn't a link. It would take a lot more regex than it's worth. But, for example's sake, you could be cheap and just do this:

Code:
on ^$*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+)|(\S+\.(?:com|org|net|biz|us|ru)\S*))/i:*:{ }
on $*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+)|(\S+\.(?:com|org|net|biz|us|ru)\S*))/i:*:{
  if ($mouse.key & 4) {
    url -i @ $regml(1)
    echo -ag Url Added: $regml(1)
    return 
  }
  run $1-
}


This should recognize anything that has a .com / .net / .org / .ru / .biz / .us in it, though it's never guaranteed that those are in fact links.

But that should do what you need.

Just remember, the example above requires you to hold shift and double-click the link to add it to the URL list. If you just double-click it, it'll open the link in your browser.
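
If you'd rather keep the logging from your own script, the two could probably be combined along these lines. It's only a sketch: it reuses the write and echo lines from your code and assumes the @EventsLog window is already open.

Code:
on ^$*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{ }
on $*:hotlink:/[^[:alnum:]]*(?:((http|ftp)\x3A\x2F{2}\S+)|(www\.\S+\.\S+))/i:*:{
  ; %link is just a local name for the captured address
  var %link = $regml(1)
  ; Log the click to the file and the window from the original script
  write url-catcher.txt $date $time %link ( $+ $iif($chan == $null,Private,$chan) $+ )
  echo @EventsLog $timestamp You just clicked: %link
  ; Open the link as in the example above
  run $1-
}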

#158542 - 06/09/06 10:23 PM Re: Identifying URL's using regex
OrionsBelt Offline
Fjord artisan

Registered: 25/04/06
Posts: 464
Loc: Amsterdam, NL
Thank you for that Rand.
I'm gonna try to integrate those regex strings in my current code. I hope there is some way to make that work.

Thx again :tongue:

#158543 - 07/09/06 01:45 AM Re: Identifying URL's using regex
genius_at_work Offline
Hoopy frood

Registered: 08/10/05
Posts: 1741
I use this regex in on ACTION/NOTICE/TEXT events to catch links (and server spam):

Code:
on *:TEXT:*:#:{
  var %text = $1-

  ;*** Ignore non-links ***
  var %islink = $false
  ; Flag text containing common link fragments
  if ($regex(%text,/(www\.|http\:|\.com|\.net|\.org|irc\.|\/server)/i)) %islink = $true
  ; Don't count "awww...", "owww..." etc. as links
  if ($regex(%text,/[aeo]wwww*\./ig)) %islink = $false
  if (!%islink) return

  echo -a $nick posted a link in $chan
}


It's fairly accurate in finding links, and it doesn't match if someone says "awww..." or "owww..." etc.
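
Since the check is the same for all three events, it could also live in a custom identifier. This is just a sketch, and $haslink is a made-up name, not something built into mIRC:

Code:
alias haslink {
  ; $1 = the text to check; returns $true if it looks like it contains a link
  if (!$regex($1,/(www\.|http\:|\.com|\.net|\.org|irc\.|\/server)/i)) return $false
  ; "awww..." and "owww..." are not links
  if ($regex($1,/[aeo]wwww*\./i)) return $false
  return $true
}

on *:TEXT:*:#:{ if ($haslink($1-)) echo -a $nick posted a link in $chan }
on *:ACTION:*:#:{ if ($haslink($1-)) echo -a $nick posted a link in $chan }
on *:NOTICE:*:#:{ if ($haslink($1-)) echo -a $nick posted a link in $chan }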

-genius_at_work
