#12990 25/02/03 10:17 AM
Joined: Feb 2003
Posts: 8
E
Ex3 Offline OP
Nutrimatic drinks dispenser
I made a script that uses sockets to get the entire source of a requested website.
But all I need is the top item on the page, which I know is surrounded by a specific combination of HTML tags that is unique to that page.
How can I get only the text that is between those tags?

#12991 25/02/03 11:45 AM
Joined: Dec 2002
Posts: 1,922
O
Hoopy frood
Offline
My friend [on] created this alias:
Code:

alias removehtml {
  var %a
  !.echo -q $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,,%a)
  return %a
}


Example: //echo -a $removehtml(<h1>Hello</h1>)
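
If only the text between one known pair of tags is wanted, a capture group may be simpler than stripping every tag. This is a minimal sketch, not a tested solution: the alias name between is made up, <h1>...</h1> is only a placeholder for the page's real tags, and it assumes both tags arrive on the same line read from the socket.

```mirc
; Hypothetical helper: returns the text between the first <h1> and </h1> in $1-
; (replace the <h1> pair with the tag combination that is unique to the page)
alias between {
  ; (.*?) is a lazy capture, so it stops at the first closing tag
  if ($regex(between,$1-,/<h1>(.*?)<\/h1>/i)) {
    return $regml(between,1)
  }
}
```

Example: //echo -a $between(<h1>Hello</h1>)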

#12992 25/02/03 01:20 PM
Joined: Jan 2003
Posts: 44
L
Ameglian cow
Offline
Ex3, that is cool.

Can you do me a big favor and paste the script you made that gets the source of a page? I could really use it.

Thanks in advance.

#12993 25/02/03 01:27 PM
Joined: Dec 2002
Posts: 1,922
O
Hoopy frood
Offline
You can learn how to make one here (click on the 'HTTP Protocol' by Pasmal).

#12994 25/02/03 05:22 PM
Joined: Dec 2002
Posts: 1,321
H
Hoopy frood
Offline
This post is one I wrote recently and shows you how to download the file in binary through HTTP.


DALnet: #HelpDesk and #mIRC
#12995 25/02/03 06:13 PM
Joined: Feb 2003
Posts: 8
E
Ex3 Offline OP
Nutrimatic drinks dispenser
Here is my code:
Code:
on *:SOCKOPEN:site*: { 
  echo -s *** $sockname ( $+ $sock($sockname).ip $+ ) opened 
  sockwrite -n $sockname GET / HTTP/1.1 
  sockwrite -n $sockname Host: www.site.net 
  sockwrite -n $sockname Connection: keep-alive 
  sockwrite $sockname $crlf 
} 
on *:SOCKCLOSE:site*: echo -s *** $sockname ( $+ $sock($sockname).ip $+ ) closed. 
on *:SOCKREAD:site*: { 
  sockread %socktemp
  if (%socktemp) {    
    write file.txt $removehtml(%socktemp)
  } 
}
alias removehtml {
  var %a
  echo -s $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,,%a)
  return %a
}


After running the code, I have the output (the text of the page, without any HTML tags) in file.txt, including the HTTP response headers.
I have a few problems here. The next time I run the code, it appends to the end of the file, but I need it to replace the previous output. How do I do that?
Another problem is that while the output is being written to file.txt, I see numbers echoed, and I don't really understand what they are (I assume it's the number of $regsub substitutions). I'd like to get rid of those numbers, or at least hide them, but I haven't managed to.
That's all for now, thanks in advance!

#12996 25/02/03 07:09 PM
Joined: Dec 2002
Posts: 1,922
O
Hoopy frood
Offline
1. Put a /write -c file.txt in the SOCKOPEN event to clear the file before making a new request.

2. Right, the numbers are the value $regsub returns (the number of substitutions it made). I don't know why you used echo -s $regsub(..) instead of the original !.echo -q $regsub(..). Change it back and the numbers will be hidden.
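
Put together, the two changes would look roughly like this. It is only a sketch based on the code posted above (same event, host and file name), with nothing else altered:

```mirc
on *:SOCKOPEN:site*: {
  ; clear file.txt first, so this request replaces the previous output
  write -c file.txt
  sockwrite -n $sockname GET / HTTP/1.1
  sockwrite -n $sockname Host: www.site.net
  sockwrite -n $sockname Connection: keep-alive
  sockwrite $sockname $crlf
}
alias removehtml {
  var %a
  ; -q keeps echo quiet, so the value returned by $regsub is not displayed
  !.echo -q $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,,%a)
  return %a
}
```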

#12997 25/02/03 08:27 PM
Joined: Feb 2003
Posts: 8
E
Ex3 Offline OP
Nutrimatic drinks dispenser
Thanks Online!

Now I have a few final questions:
1) The text is now in a file called file.txt, and I want to read from it the text that looks like this:
Code:
document.write('HH:nn dd/mm/yytext goes here&nbsp;HH:nn dd/mm/yyanother text goes here&nbsp;

I want the output to be something like:
Code:
(HH:nn) text goes here.

Most importantly, I need only the text between document.write(' and the first &nbsp;.
I can get rid of the document.write(' by using the $right function, but how can I tell the script where the first &nbsp; is?
2) If there is no answer to the first question, how can I break the line each time the &nbsp; entity appears and remove the entity itself? (Maybe inside the removehtml alias?)
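
One possible approach to question 1 is a regular expression with a lazy match, which stops at the first &nbsp; it meets. This is a sketch under assumptions, not a tested solution: the alias name firstitem is made up, and the pattern assumes the line really has the form HH:nn dd/mm/yy followed directly by the text, as in the sample above.

```mirc
; Hypothetical alias: pass it one line read from file.txt
alias firstitem {
  ; \x28 is "(" written as a hex escape, so the parentheses stay balanced for mIRC
  ; capture 1 = the time, capture 2 = everything up to the first &nbsp;
  var %re = /document\.write\x28'(\d\d:\d\d) \d\d\/\d\d\/\d\d(.*?)&nbsp;/
  if ($regex(firstitem,$1-,%re)) {
    return ( $+ $regml(firstitem,1) $+ ) $regml(firstitem,2)
  }
}
```

Example: //echo -a $firstitem($read(file.txt,nw,*document.write*))
Here the w switch makes $read search for the first line matching the wildcard, and n stops it from evaluating the line as code.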

