Posted By: Moptop650 Simple Web Grabber. - 25/10/07 01:49 AM
Hello, could anyone show me a SIMPLE web grabber that just gets text from a URL?
Posted By: Riamus2 Re: Simple Web Grabber. - 25/10/07 01:58 AM
This is a very basic and general script to read data from a webpage.

Code:
alias Web {
  sockopen Web www.website.com 80
}

on *:sockopen:Web: {
  ; every sockwrite needs the socket name as its first argument
  sockwrite -n $sockname GET /path/page.htm HTTP/1.0
  sockwrite -n $sockname Host: www.website.com
  sockwrite -n $sockname Accept: */* $+ $crlf $+ $crlf
}

on *:sockread:Web: {
  if ($sockerr) {
    echo -a Error.
    halt
  }
  else {
    var %temptext
    sockread %temptext
    echo -a %temptext
  }
}
Posted By: Moptop650 Re: Simple Web Grabber. - 25/10/07 01:59 AM
Thanks!

But is there a way that I don't need another copy of that per page I need to grab?
Posted By: Riamus2 Re: Simple Web Grabber. - 25/10/07 02:04 AM
Huh? That doesn't make sense. Can you explain better what you are looking for? You wanted a basic script to grab a webpage and this is it. Obviously, you will probably want to parse the page for just the relevant data. If there's something specific, please be specific in what you ask and give an example or explanation to help us to understand what you want.

EDIT:
Okay, after re-reading that, I think I understand what you're looking for. If you aren't going to parse the page's data and just want to get the entire page from multiple sites, you can just put the host and path/page into variables.
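
As a minimal sketch of that idea: the host and path can be passed to an alias and stored on the socket with sockmark, so one set of events serves every page. The alias name webget and its arguments are made up for illustration, and this has not been tested against any particular site.

Code:
; Hypothetical usage: /webget www.website.com /path/page.htm
alias webget {
  if ($2 == $null) { echo -a *** Usage: /webget <host> <path> | return }
  if ($sock(webget)) { sockclose webget }
  sockopen webget $1 80
  ; remember host and path on the socket so the sockopen event can build the request
  sockmark webget $1 $2
}

on *:sockopen:webget: {
  if ($sockerr) { echo -a *** Could not connect. | return }
  var %host = $gettok($sock($sockname).mark,1,32)
  var %path = $gettok($sock($sockname).mark,2,32)
  sockwrite -n $sockname GET %path HTTP/1.0
  sockwrite -n $sockname Host: %host
  sockwrite -n $sockname $crlf
}

on *:sockread:webget: {
  if ($sockerr) { echo -a *** Socket error. | return }
  var %line
  ; -f forces the read even when the last line has no trailing CRLF
  sockread -f %line
  while ($sockbr) {
    ; /echo with no text raises an error, so skip empty lines
    if (%line != $null) { echo -a %line }
    sockread -f %line
  }
}

Because the socket name is fixed, this sketch handles one page at a time; a new /webget closes any request still in progress.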
Posted By: Bekar Re: Simple Web Grabber. - 25/10/07 03:37 AM
If you want to, you can have a look at the wwwget.mrc routine.

You just /getdata <url>, and it dumps it locally..

(Yeah, it's old, but the concepts are sound.. I should really rewrite it)
Posted By: Moptop650 Re: Simple Web Grabber. - 25/10/07 07:35 PM
How would I use this? (I'm a newb to coding.)

Like if someone were to type !tellme <something> it would grab website.com/page.php?a=<something> and say it.
Posted By: Riamus2 Re: Simple Web Grabber. - 25/10/07 11:51 PM
You see, that's where you have to be specific. You asked for a basic script to get an entire webpage. You did not ask for how to get specific information from a specific website and that would be necessary for us to help you other than to say to parse the HTML that you get using things like $regex or $gettok.
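
To illustrate what that kind of parsing can look like, here is a small sketch. The alias name parsedemo and the sample line are invented; a real page needs its own token positions and pattern.

Code:
alias parsedemo {
  ; hypothetical line, as it might arrive from a web page
  var %line = <b>Price:</b> 6000gp
  ; $gettok: split on spaces (ASCII 32) and take the second token
  echo -a $gettok(%line,2,32)
  ; $regex: skip the tag after "Price:" and capture the next word;
  ; $regml(1) holds the first captured group
  if ($regex(%line,/Price:<[^>]+>\s*(\S+)/)) { echo -a $regml(1) }
}

With this sample line, both echoes should show 6000gp. $gettok is enough when the layout of the line is fixed, while $regex copes better when the surrounding HTML varies.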
Posted By: Moptop650 Re: Simple Web Grabber. - 26/10/07 01:47 AM
I am using PHP on the website I am grabbing, to limit the received data to JUST what I want. I am loads better at PHP than this, lol.

So anyway, I got it sort of working. Currently it just outputs the HTTP headers. My code is:

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on *:sockread:home:{ 
  echo Echoing Data... 
  sockread %temp 
  echo %temp 
} 


It spits back...

HTTP/1.1 200 OK
Date: (Date of the request)
Server: Apache/2.0.59 (Unix)

And that stuff.

How do I fix that?
Posted By: Lpfix5 Re: Simple Web Grabber. - 26/10/07 03:31 AM
The following script echoes any line that matches the words in this part: $regex(%x,/(thistext|orthistext|oreventhistext)/g). Change those three words to whatever you want to search for, add more for a wider search, or just replace the whole if statement with if (myword isin %x). The code also strips the HTML tags (I know you said PHP), just go with it.

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread %x
  if ($sockbr == 0) return 
  if (%x == $null) { return } 
  ; $regex returns the number of matches, which can exceed 1 with /g, so test for non-zero
  if ($regex(%x,/(thistext|orthistext|oreventhistext)/g)) { 
    echo -a $nhtml(%x)
  }
}

alias -l nhtml { return $remove($regsubex($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,),$chr(9)) }


This next code returns the WHOLE line of text that matches your search, without removing the HTML code...

Code:
on *:TEXT:m~test:#:{ 
  echo Connecting... 
  /sockopen home home.moptop.info 80 
} 
on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET /index.php HTTP/1.1 
  sockwrite -n home Host: home.moptop.info 
  sockwrite -n home $crlf 
} 
on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread %x
  if ($sockbr == 0) return 
  if (%x == $null) { return } 
  ; $regex returns the number of matches, which can exceed 1 with /g, so test for non-zero
  if ($regex(%x,/(thistext|orthistext|oreventhistext)/g)) { 
    echo -a %x
  }
}
Posted By: Rand Re: Simple Web Grabber. - 26/10/07 06:00 AM
Well, since I can't connect to this website, I can't really test anything. I've modified the code a bit as well.

Code:
alias home {
  if ($sock(home)) { sockclose home }
  echo -s *** Trying to connect to home.moptop.info
  sockopen home home.moptop.info 80
}

on *:sockopen:home:{ 
  if ($sockerr) { echo -s *** Can't connect. | return }
  var %% = sockwrite -n $sockname
  %% GET /index.php HTTP/1.0
  %% Host: home.moptop.info 
  %% 
} 
on *:sockread:home:{
  if ($sockerr) { echo -s *** Sock error. | return }
  var %s | sockread -fn %s
  while ($sockbr) {
    ; /echo with no text raises an error, so skip empty lines
    if (%s != $null) { echo -s %s }
    sockread -fn %s
  }
}
Posted By: Riamus2 Re: Simple Web Grabber. - 26/10/07 10:11 AM
Your problem with receiving just the headers is almost certainly because you used HTTP/1.1 instead of HTTP/1.0 like I showed you. An HTTP/1.1 server can send the body in chunks (chunked transfer encoding), which can make trying to do anything with it a challenge. 1.0 is nice and easy to work with.
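
A sketch of the HTTP/1.0 request for the socket discussed here, assuming the same socket name and host as in the code above. A server must not use chunked encoding when answering an HTTP/1.0 request. The Connection: close header is an extra that is not in the earlier posts; it asks the server to close the socket once the response is sent, which also helps if the request line is changed back to HTTP/1.1 (although a 1.1 response may still be chunked).

Code:
on *:sockopen:home: {
  if ($sockerr) { echo -s *** Can't connect. | return }
  ; HTTP/1.0 request line: the reply arrives as headers, a blank line, then the plain body
  sockwrite -n $sockname GET /index.php HTTP/1.0
  sockwrite -n $sockname Host: home.moptop.info
  ; ask the server to close the connection when the response is complete
  sockwrite -n $sockname Connection: close
  ; a lone CRLF ends the request headers
  sockwrite -n $sockname $crlf
}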
Posted By: Moptop650 Re: Simple Web Grabber. - 26/10/07 07:37 PM
Originally Posted By: Rand
Well, since I can't connect to this website, I can't really test anything.


Yeah that URL is linked to my home computer.. Thanks for the code, I'll try that in a few.

Riamus2, I'll try that too. Edit: Eh, same thing =/

Edit: Rand, It works! It gets the content, but the headers are still there, how can I remove them?

Edit also: How can I get it to say its results to the channel it was called from?
Posted By: Rand Re: Simple Web Grabber. - 26/10/07 09:32 PM
Well.. I'll edit this in a bit if you don't figure out how to queue messages so that you don't get flooded off the channel. For now, I need to nap, so you'll have to deal with a partial edit. :)

Code:
alias home {
  ; /home <chan|nick>
  if (!$1) { echo -a *** Invalid parameters. /home <chan|nick> | return }
  if ($sock(home)) { sockclose home }
  unset %home.*

  echo -s *** Trying to connect to home.moptop.info

  sockopen home home.moptop.info 80
  sockmark home $1
}

; the event name must match the socket name used in /sockopen (home)
on *:sockopen:home:{ 
  if ($sockerr) { echo -s *** Can't connect. | return }
  var %% = sockwrite -n $sockname
  %% GET /index.php HTTP/1.0
  %% Host: home.moptop.info 
  %% 
} 

on *:sockread:home:{
  if ($sockerr) { echo -s *** Sock error. | return }
  var %s | sockread -fn %s
  while ($sockbr) {
    if (%home.headers) {
      ; /msg with no text raises an error, so skip empty lines
      if (%s != $null) { msg $sock($sockname).mark %s }
    }
    elseif (%s == $null) { set %home.headers 1 }
    sockread -fn %s
  }
}
on *:sockclose:home:{ unset %home.* }
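
The same approach can be tied to the !tellme <something> trigger asked about earlier in the thread. This is only a sketch: the socket name tellme and the variable %tellme.body are made up, website.com/page.php?a= is the placeholder URL from that question, and the search word is sent as typed, without any URL encoding.

Code:
on *:TEXT:!tellme *:#: {
  if ($sock(tellme)) { sockclose tellme }
  unset %tellme.body
  sockopen tellme website.com 80
  ; mark = channel to answer in, followed by the word to look up
  sockmark tellme $chan $2
}

on *:sockopen:tellme: {
  if ($sockerr) { return }
  var %word = $gettok($sock($sockname).mark,2,32)
  sockwrite -n $sockname GET /page.php?a= $+ %word HTTP/1.0
  sockwrite -n $sockname Host: website.com
  sockwrite -n $sockname $crlf
}

on *:sockread:tellme: {
  if ($sockerr) { return }
  var %line
  sockread -f %line
  while ($sockbr) {
    ; the headers end at the first empty line; everything after it is the page body
    if (%tellme.body) {
      if (%line != $null) { msg $gettok($sock($sockname).mark,1,32) %line }
    }
    elseif (%line == $null) { set %tellme.body 1 }
    sockread -f %line
  }
}

on *:sockclose:tellme: { unset %tellme.body }

Every body line goes straight to the channel, so this only suits pages that return a line or two; longer output would need a message queue to avoid flooding.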
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 02:08 AM
Try this. It removes the headers while still using HTTP/1.1, and it displays only the text from the web page, with nothing HTML-wise. If you want the HTML as well, let me know; otherwise this parses the data as plain text only, with no HTML tags.

Code:
alias home { 
  echo -s *** Trying to connect to home.moptop.info 
  sockopen home home.moptop.info 80 
} 

on *:sockopen:home:{ 
  echo Trying to communicate... 
  sockwrite -n home GET / HTTP/1.1 
  sockwrite -n home Host: home.moptop.info
  sockwrite -n home $crlf 
} 

on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread -fn %x
  if ($right($gettok(%x,1,32),1) == :) || (http/1.1 == $gettok(%x,1,32)) { return }
  elseif ($nhtml(%x) == $null) { return }
  else {
    echo -a $nhtml(%x)
  }
}

alias -l nhtml { return $remove($regsubex($1-,/(&nbsp;|^[^<]*>|<[^>]*>|<[^>]*$)/g,),$chr(9)) }
Posted By: Moptop650 Re: Simple Web Grabber. - 27/10/07 02:14 AM
That doesn't seem to want to return the data.

Quote:
*** Trying to connect to home.moptop.info
Trying to communicate...

Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 02:25 AM
Yeah, that's because you have 0 HTML and 1 word.

Try adding a few words and HTML tags, like this:

<HTML>
<TITLE>MYTEST</TITLE>
<FONT COLOR="RED">word</FONT>
</HTML>

It should then pick up the proper data.

If you want a better test, replace home.moptop.info with www.mirc.com and try the event; it will return the text only.
Posted By: Moptop650 Re: Simple Web Grabber. - 27/10/07 02:29 AM
Yay that works! ^_^

Thanks for the help everyone!
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 02:40 AM
:P mIRC can be stubborn about fetching data when the page is just one word or very little text, for some reason, and maybe most of the time when there are no HTML tags in it. Don't forget that a real website would contain HTML tags even if it's coded in PHP. An alternative route for fetching a single word might be to put a .txt file online, like

home.moptop.info/mytest.txt :P
Posted By: Moptop650 Re: Simple Web Grabber. - 27/10/07 02:57 AM
Yes, I know :) I plan to only use this to grab stuff from my site. (The scripts on my site are made for this; they get stuff from other sites.)
Posted By: Riamus2 Re: Simple Web Grabber. - 27/10/07 10:09 AM
Originally Posted By: Lpfix5
:P mIRC can be stubborn about fetching data when the page is just one word or very little text, for some reason, and maybe most of the time when there are no HTML tags in it


I haven't had any issues where a txt file only has a version # as an update check, without HTML or other text. Use HTTP/1.0 as mentioned and there should be no problem with that. HTTP/1.1 shouldn't really be used in most cases, as HTTP/1.0 usually works better for grabbing socket data.
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 05:00 PM
Depending on the site you are on, HTTP/1.0 sometimes does not pull data where 1.1 does. If I come across such a site, I'll PM it to you one day.
Posted By: Moptop650 Re: Simple Web Grabber. - 27/10/07 08:39 PM
OK, I'm back, one more question.

I am grabbing

Code:
<html>
Blah blah
more words
maybe some more
and more words
</html>


But it's only sending back the first line?

Edit: And it is saying

"(First word in the line) Unknown Command"

In the status window.
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 09:18 PM
Can you update me with the link to the data you're trying to get? Because since yesterday I get your Price: : 6,000gp - 8,000gp
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 09:22 PM
I'm sorry, replace the sockread event you're using with this:

Code:
on 1:sockread:home:{
  if ($sockerr > 0) return 
  var %x | sockread -fn %x
  if ($regex(%x,/(Content-|X-Powered|Server:|Date:|HTTP/1)/g) == 1) { return }
  elseif ($nhtml(%x) == $null) { return }
  else {
    echo -a %x
  }
}


Just replace that sockread part.
Posted By: Moptop650 Re: Simple Web Grabber. - 27/10/07 10:50 PM
That works, thanks! :)
Posted By: Lpfix5 Re: Simple Web Grabber. - 27/10/07 11:01 PM
yw
© mIRC Discussion Forums