Simple Web Grabber. #188505 25/10/07 01:49 AM
Moptop650 (OP)
Pikka bird
Joined: Oct 2007
Posts: 10
Hello, Could anyone show me a SIMPLE web grabber, that just gets text from a URL?

Re: Simple Web Grabber. [Re: Moptop650] #188506 25/10/07 01:58 AM
Riamus2
Hoopy frood
Joined: Oct 2004
Posts: 8,330
This is a very basic and general script to read data from a webpage.

Code:
alias Web {
  sockopen Web www.website.com 80
}

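on *:sockopen:Web: {
  ; every /sockwrite needs the socket name as its first argument
  sockwrite -n $sockname GET /path/page.htm HTTP/1.0
  sockwrite -n $sockname Host: www.website.com
  ; the trailing $crlf sends the blank line that ends the request headers
  sockwrite -n $sockname Accept: */* $+ $crlf $+ $crlf
}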

on *:sockread:Web: {
  if ($sockerr) {
    echo -a Error.
    halt
  }
  else {
    var %temptext
    sockread %temptext
    echo -a %temptext
  }
}


Invision Support
#Invision on irc.irchighway.net
Re: Simple Web Grabber. [Re: Riamus2] #188507 25/10/07 01:59 AM
Moptop650 (OP)
Thanks!

But is there a way that I don't need another copy of that per page I need to grab?

Re: Simple Web Grabber. [Re: Moptop650] #188510 25/10/07 02:04 AM
Riamus2
Huh? That doesn't make sense. Can you explain better what you are looking for? You wanted a basic script to grab a webpage and this is it. Obviously, you will probably want to parse the page for just the relevant data. If there's something specific, please be specific in what you ask and give an example or explanation to help us to understand what you want.

EDIT:
Okay, after re-reading that, I think I understand what you're looking for. If you aren't going to parse the page's data and just want to get the entire page from multiple sites, you can just put the host and path/page into variables.
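
A minimal sketch of that idea, untested and for illustration only. The alias name /webget, the webget.* socket names, and the use of /sockmark to carry the host and path are assumptions, not part of the script above.

Code:
; /webget <host> <path>   e.g. /webget www.website.com /path/page.htm
alias webget {
  if ($2 == $null) { echo -a *** Usage: /webget <host> <path> | return }
  ; give each request its own socket name so several can run at once
  var %name = $+(webget.,$ticks)
  sockopen %name $1 80
  ; remember the host and path on the socket itself
  sockmark %name $1 $2
}

on *:sockopen:webget.*:{
  if ($sockerr) { echo -a *** Can't connect. | return }
  var %host = $gettok($sock($sockname).mark,1,32)
  var %path = $gettok($sock($sockname).mark,2,32)
  sockwrite -n $sockname GET %path HTTP/1.0
  sockwrite -n $sockname Host: %host
  ; blank line ends the request headers
  sockwrite -n $sockname $crlf
}

on *:sockread:webget.*:{
  if ($sockerr) { echo -a *** Socket error. | return }
  var %line
  sockread %line
  if (%line != $null) { echo -a %line }
}

With something like this, one copy of the events serves every page: /webget www.website.com /path/page.htm and /webget www.website.com /other/page.htm both go through the same code.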

Last edited by Riamus2; 25/10/07 02:05 AM.

Re: Simple Web Grabber. [Re: Riamus2] #188514 25/10/07 03:37 AM
Bekar
Fjord artisan
Joined: Dec 2002
Posts: 503
If you want to, you can have a look at the wwwget.mrc routine.

You just /getdata <url>, and it dumps it locally..

(Yeah, it's old, but the concepts are sound.. I should really rewrite it)

Re: Simple Web Grabber. [Re: Riamus2] #188568 25/10/07 07:35 PM
Moptop650 (OP)
How would I use this? (I'm a newb to coding.)

Like if someone were to type !tellme <something> it would grab website.com/page.php?a=<something> and say it.

Last edited by Moptop650; 25/10/07 07:44 PM.
Re: Simple Web Grabber. [Re: Moptop650] #188579 25/10/07 11:51 PM
Riamus2
You see, that's where you have to be specific. You asked for a basic script to get an entire webpage. You did not ask for how to get specific information from a specific website and that would be necessary for us to help you other than to say to parse the HTML that you get using things like $regex or $gettok.
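
For illustration only, a minimal sketch of both approaches on a single line of HTML. The sample line and the alias name /parsetest are made up; a real page would need its own pattern.

Code:
; hypothetical line from a page: <b>Temperature:</b> 21 C
alias parsetest {
  var %line = <b>Temperature:</b> 21 C
  ; $regex stores the first capture group in $regml(1)
  if ($regex(%line,/Temperature:<\/b>\s*(\d+)/)) { echo -a Regex got: $regml(1) }
  ; $gettok splits on a character; 32 is the space, so token 2 is "21"
  echo -a Token got: $gettok(%line,2,32)
}

$gettok is enough when the data always sits at the same position in the line; $regex is the safer choice when the surrounding markup can vary.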


Re: Simple Web Grabber. [Re: Riamus2] #188585 26/10/07 01:47 AM
Moptop650 (OP)
I am using PHP on the website I am grabbing, to limit the received data to JUST what I want. I am loads better at PHP than this, lol.

So anyways, I got it working sorta. Currently it just outputs the http headers - my code is-

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on *:sockread:home:{ 
  echo Echoing Data... 
  sockread %temp 
  echo %temp 
} 


It spits back...

HTTP/1.1 200 OK
Date: (Date of the request)
Server: Apache/2.0.59 (Unix)

And that stuff.

How do I fix that?

Last edited by Moptop650; 26/10/07 01:48 AM.
Re: Simple Web Grabber. [Re: Moptop650] #188588 26/10/07 03:31 AM
Lpfix5
Hoopy frood
Joined: Aug 2005
Posts: 1,052
The following script echoes any line that matches the $regex pattern in the sockread event. Change the three placeholder words (thistext, orthistext, oreventhistext) to whatever you want to search for, adding more alternatives between the | separators for a wider search, or just replace the whole test with if (myword isin %x). The code also strips the HTML tags from the line (I know you said PHP, just go with it).

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread %x
  if ($sockbr == 0) return 
  if (%x == $null) { return } 
  ; no /g and no "== 1": a line that contains a keyword more than once should still match
  if ($regex(%x,/(thistext|orthistext|oreventhistext)/)) { 
    echo -a $nhtml(%x)
  }
}

alias -l nhtml { return $remove($regsubex($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,),$chr(9)) }


This next code returns the WHOLE line of text that matches your search, without removing the HTML code.

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread %x
  if ($sockbr == 0) return 
  if (%x == $null) { return } 
  ; no /g and no "== 1": a line that contains a keyword more than once should still match
  if ($regex(%x,/(thistext|orthistext|oreventhistext)/)) { 
    echo -a %x
  }
}


Code:
if $reality > $fiction { set %sanity Sane }
Else { echo -a *voices* }
Re: Simple Web Grabber. [Re: Moptop650] #188591 26/10/07 06:00 AM
Rand
Fjord artisan
Joined: Feb 2005
Posts: 342
Well, since I can't connect to this website, I can't really test anything. I've modified the code a bit as well.

Code:
alias home {
  if ($sock(home)) { sockclose home }
  echo -s *** Trying to connect to home.moptop.info
  sockopen home home.moptop.info 80
}

on *:sockopen:home:{ 
  if ($sockerr) { echo -s *** Can't connect. | return }
  var %% = sockwrite -n $sockname
  %% GET /index.php HTTP/1.0
  %% Host: home.moptop.info 
  %% 
} 
on *:sockread:home:{
  if ($sockerr) { echo -s *** Sock error. | return }
  var %s | sockread -fn %s
  while ($sockbr) {
    ; the leading control code (Ctrl+O) lets blank lines be echoed too
    echo -s $chr(15) $+ %s
    sockread -fn %s
  }
}

Re: Simple Web Grabber. [Re: Moptop650] #188595 26/10/07 10:11 AM
Riamus2
Your problem with receiving just the headers is almost certainly because you used HTTP/1.1 instead of HTTP/1.0 like I showed you. With 1.1 the server may send the body in chunks (chunked transfer encoding), which can make trying to do anything with it a challenge. 1.0 is nice and easy to work with.


Re: Simple Web Grabber. [Re: Riamus2] #188614 26/10/07 07:37 PM
Moptop650 (OP)
Originally Posted By: Rand
Well, since I can't connect to this website, I can't really test anything.


Yeah that URL is linked to my home computer.. Thanks for the code, I'll try that in a few.

Riamus2, I'll try that too. Edit: Eh, same thing =/

Edit: Rand, it works! It gets the content, but the headers are still there. How can I remove them?

Edit also: How can I get it to say its results to the channel it was called from?

Last edited by Moptop650; 26/10/07 08:16 PM.
Re: Simple Web Grabber. [Re: Moptop650] #188620 26/10/07 09:32 PM
Rand
Well.. I'll edit this in a bit if you don't figure out how to queue messages so that you don't flood yourself off. For now, I need to nap, so you'll have to deal with a partial edit. :)

Code:
alias home {
  ; /home <chan|nick>
  if (!$1) { echo -a *** Invalid parameters. /home <chan|nick> | return }
  if ($sock(home)) { sockclose home }
  unset %home.*

  echo -s *** Trying to connect to home.moptop.info

  sockopen home home.moptop.info 80
  sockmark home $1
}

; the event name must match the socket name used in /sockopen, which is "home"
on *:sockopen:home:{ 
  if ($sockerr) { echo -s *** Can't connect. | return }
  var %% = sockwrite -n $sockname
  %% GET /index.php HTTP/1.0
  %% Host: home.moptop.info 
  %% 
} 

on *:sockread:home:{
  if ($sockerr) { echo -s *** Sock error. | return }
  var %s | sockread -fn %s
  while ($sockbr) {
    if (%home.headers) {
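      ; the leading control code (Ctrl+O) keeps /msg from being sent with no text on blank lines
      msg $sock($sockname).mark $chr(15) $+ %s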
    }
    elseif (%s == $null) { set %home.headers 1 }
    sockread -fn %s
  }
}
on *:sockclose:home:{ unset %home.* }
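
On queueing the output so the channel isn't flooded: a rough, untested sketch of one way it could be done. The alias names /home.queue and /home.send, the hidden window @homequeue, and the 2-second delay are all assumptions, not part of the script above. Lines are stored in a hidden window and a repeating timer sends one of them per tick.

Code:
; /home.queue <target> <text>   (hypothetical helper)
alias home.queue {
  if ($2 == $null) { return }
  if (!$window(@homequeue)) { window -h @homequeue }
  aline @homequeue $1-
  ; start a repeating 2-second timer if one is not already running
  if (!$timer(homequeue)) { .timerhomequeue 0 2 home.send }
}

alias home.send {
  ; nothing left to send: stop the timer and close the queue window
  if ($line(@homequeue,0) == 0) {
    .timerhomequeue off
    window -c @homequeue
    return
  }
  var %l = $line(@homequeue,1)
  dline @homequeue 1
  ; first token is the target, the rest is the text
  msg $gettok(%l,1,32) $gettok(%l,2-,32)
}

To try it, the /msg line in the sockread event would become home.queue $sock($sockname).mark %s, and the channel trigger would call home $chan so the results go back to the channel the request came from. The timer only ever runs home.send with no arguments, so text from the web page is never passed through the timer itself.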

Re: Simple Web Grabber. [Re: Moptop650] #188643 27/10/07 02:08 AM
Lpfix5
Try this. It skips the headers even with HTTP/1.1, and it displays only the page's text, with the HTML tags stripped out. If you want the HTML kept, let me know.

Code:
alias home { 
  echo -s *** Trying to connect to home.moptop.info 
  sockopen home home.moptop.info 80 
} 

on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET / HTTP/1.1 
  sockwrite -n home Host: home.moptop.info
  sockwrite -n home $crlf 
} 

on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread -fn %x
  if ($right($gettok(%x,1,32),1) == :) || (http/1.1 == $gettok(%x,1,32)) { return }
  elseif ($nhtml(%x) == $null) { return }
  else {
    echo -a $nhtml(%x)
  }
}

alias -l nhtml { return $remove($regsubex($1-,/(&nbsp;|^[^<]*>|<[^>]*>|<[^>]*$)/g,),$chr(9)) }


Re: Simple Web Grabber. [Re: Lpfix5] #188647 27/10/07 02:14 AM
Moptop650 (OP)
That doesn't seem to want to return the data.

Quote:
*** Trying to connect to home.moptop.info
Trying to communicate...


Re: Simple Web Grabber. [Re: Moptop650] #188650 27/10/07 02:25 AM
Lpfix5
Yeah, that's because the page has no HTML and only one word.

Try adding a few words and HTML tags, like this:

<HTML>
<TITLE>MYTEST</TITLE>
<FONT COLOR="RED">word</FONT>
</HTML>

It should then pick up the data properly.

If you want a better test, replace home.moptop.info with www.mirc.com and try the event; it will return the text only.


Re: Simple Web Grabber. [Re: Lpfix5] #188651 27/10/07 02:29 AM
Moptop650 (OP)
Yay that works! ^_^

Thanks for the help everyone!

Re: Simple Web Grabber. [Re: Moptop650] #188653 27/10/07 02:40 AM
Lpfix5
:P mIRC seems stubborn about fetching data when the page is just one word or very little text, for some reason, maybe nearly every time there are no HTML tags in it. Don't forget that a real website would contain HTML tags even if it's generated by PHP. An alternative route to fetching plain words would be to have a .txt file online, like

home.moptop.info/mytest.txt :P


Re: Simple Web Grabber. [Re: Lpfix5] #188654 27/10/07 02:57 AM
Moptop650 (OP)
Yes, I know :) I plan to only use this to grab stuff from my site. (The scripts on my site are made for this; they get stuff from other sites.)

Re: Simple Web Grabber. [Re: Lpfix5] #188658 27/10/07 10:09 AM
Riamus2
Originally Posted By: Lpfix5
:P MiRc is stubborn to fetch data if it equals 1 word or little text for some reason maybe mostly all the time if no HTML tags are in


I haven't had any issues where a txt file contains only a version number (as an update check), with no HTML or other text. Use HTTP/1.0 as mentioned and there should be no problem with that. HTTP/1.1 shouldn't really be used in most cases, as HTTP/1.0 usually works better for grabbing socket data.

