Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
hello all...

ok, i'm nearly out of hair from this.. i've been searching all the sites looking for examples of how to do this..

i'm trying to loop thru some specific html text. i know i've mentioned this before but here is the code i'm trying to parse..
<p><b>Popular Titles</b> (Displaying 3 Results)<ol><li> <a href="/title/tt0360717/" onclick="set_args('tt0360717',1,1)">King Kong</a> (2005)<br>&#160;aka <em>"Peter Jackson's King Kong"</em> - USA <em>(promotional title)</em></li>
<li> <a href="/title/tt0024216/" onclick="set_args('tt0024216',2,1)">King Kong</a> (1933)</li>
<li> <a href="/title/tt0074751/" onclick="set_args('tt0074751',3,1)">King Kong</a> (1976)<br>&#160;aka <em>"King Kong: The Legend Reborn"</em> - USA <em>(working title)</em></li>
</ol>
</p>

so like it says, would someone be able to give an example of how this is done?

i've been trying stuff similar to this (this is the last bit i've tried)
Code:

on *:sockread:movieinfo1:{ 
  if ($sockerr > 0) return
  else { 
    var %tmptxt 
    sockread -f %tmptxt 
    if (Location: isin %tmptxt) { 
      set %movieinfolink2 $remove(%tmptxt,Location: http://www.imdb.com) 
      sockopen movieinfo2 www.imdb.com 80 
    } 
    else {
      if ($sockerr > 0) return
      var %tmptxt
      :repeat
      sockread %tmptxt
      if ($sockbr == 0) return
      if (<p><b>Popular Titles</b> isin %tmptxt) {
        echo -a $htmlconv(%tmptxt)
        sockclose movieinfo1
      }
     else goto repeat
    }

  


but it only parses the first thing it comes up with...
and also could someone maybe make a suggestion on how to catch the Displaying 3 Results portion also?? if not, tis alright..

thanks in advance
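
On catching the "Displaying 3 Results" portion: one option is a regex capture on the heading line. This is only a sketch, assuming the heading keeps the exact wording shown in the HTML above. The alias name `resultcount` and the regex name `rc` are made up for illustration and do not appear elsewhere in this thread.

```mirc
; hypothetical helper: returns N from "(Displaying N Results)", or $null if the text is absent
alias resultcount {
  ; the capture group (\d+) holds the number; $regml(rc,1) reads it back
  if ($regex(rc,$1-,/\(Displaying (\d+) Results?\)/)) return $regml(rc,1)
}
```

Inside the sockread event it could be called as `var %count = $resultcount(%tmptxt)` on the line that contains Popular Titles.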

Joined: Dec 2002
Posts: 1,245
M
Hoopy frood
Offline
you are over thinking things
try this
Code:
on *:sockread:movieinfo1:{ 
  if ($sockerr) return
  var %tmptxt
  sockread -f %tmptxt
  while($sockbr) {
    if (Location: isin %tmptxt) { 
      set %movieinfolink2 $remove(%tmptxt,Location: http://www.imdb.com) 
      sockopen movieinfo2 www.imdb.com 80 
    } 
    elseif (<p><b>Popular Titles</b> isin %tmptxt) {
      echo -a $htmlconv(%tmptxt)
      sockclose movieinfo1
    }
    sockread %tmptxt
  }
}


closer to what the help file shows
Code:
on *:sockread:movieinfo1:{ 
  if ($sockerr) return
  var %tmptxt
  :next
  sockread -f %tmptxt
  if (Location: isin %tmptxt) { 
    set %movieinfolink2 $remove(%tmptxt,Location: http://www.imdb.com) 
    sockopen movieinfo2 www.imdb.com 80 
  } 
  elseif (<p><b>Popular Titles</b> isin %tmptxt) {
    echo -a $htmlconv(%tmptxt)
    sockclose movieinfo1
  }
  goto next
}

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Also, I would recommend using iswm instead of isin with sockets. It's much more likely to give you what you're looking for rather than multiple matches. isin will work, but iswm tends to be easier in the long run.

if (<p><b>Popular Titles</b>* iswm %temptext) { }

It's really up to you which way works best. Sometimes, isin is plenty for a site and other times, iswm saves you a lot of trouble. That's why I suggest using iswm all the time rather than having your scripts set up differently.


Invision Support
#Invision on irc.irchighway.net
Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
thanks riamus2 for your suggestion... i'll change that and see what happens...

i don't get it... i've tried both examples but they don't seem to work.

with example 1 i get bunches of errors saying $sockbr unknown command... with example 2 my mirc crashes... like the loop never ends or something....

i appreciate what youve shown.... but do ya have any further suggestions mike.... or anyone?
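
A possible explanation for both symptoms, offered as a guess rather than a confirmed diagnosis: example 1 writes `while($sockbr)` with no space before the parenthesis, and mIRC may then treat that text as an unknown command instead of a loop. Example 2 jumps back with `goto next` without ever checking `$sockbr`, so nothing stops the loop once the buffer is empty. The sketch below follows the pattern from the help file, with the space added and the loop ending when a read returns nothing. It assumes the `htmlconv` alias from this thread is loaded in the same script file.

```mirc
on *:sockread:movieinfo1:{
  if ($sockerr) return
  var %tmptxt
  sockread %tmptxt
  ; note the space between "while" and "("
  while ($sockbr) {
    if (<p><b>Popular Titles</b> isin %tmptxt) echo -a $htmlconv(%tmptxt)
    ; read the next line; $sockbr becomes 0 when nothing is left, which ends the loop
    sockread %tmptxt
  }
}
```

For the goto version, the equivalent fix would be a line reading `if ($sockbr == 0) return` right after the `sockread`.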

Joined: Dec 2002
Posts: 1,245
M
Hoopy frood
Offline
in your original post you are opening another socket in one part and using an alias in another, since you didnt show the coding for those parts here its hard to help with the problems you are having.

Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
yeah, you're right... sorry, just didn't think those things were important for this instance, since the second socket would've been triggered only for an exact match, and the alias only strips the html code... but anywho here are the missing pieces:

Code:
on *:sockopen:movieinfo2:{ 
  sockwrite -n $sockname GET %movieinfolink2 HTTP/1.1 
  sockwrite -n $sockname Host: www.imdb.com $+ $crlf $+ $crlf 
  set %zinfonumber 1
} 
on *:sockread:movieinfo2:{ 
  if ($sockerr) { 
    Halt 
  } 
  else { 
    var %tmptxt 
    sockread -f %tmptxt 
    if (<b class="blackcatheader"> isin %tmptxt) {
      if (Directed isin %tmptxt) { set %directcatch 1 }
    }
    elseif (%directcatch <= 1) {
      hadd $remove(%movietitle,$chr(32)) $+ .hsh Director $htmlconv(%tmptxt)
      unset %directcatch
    }
    if (<b class="ch"> isin %tmptxt) { 
      if (%zinfonumber <= 8) { 
        if (%zinfonumber = 1) { inc %zinfonumber } 
        elseif (%zinfonumber = 2) {
          hadd $remove(%movietitle,$chr(32)) $+ .hsh Tagline $remove($htmlconv(%tmptxt),Tagline:,(more),(view trailer)) 
          inc %zinfonumber 
        } 
        elseif (%zinfonumber = 3) {
          hadd $remove(%movietitle,$chr(32)) $+ .hsh Plot $remove($htmlconv(%tmptxt),Plot $+ $chr(32) $+ Outline:,Plot $+ $chr(32) $+ Summary:,(more),(view trailer)) 
          inc %zinfonumber 
        } 
        elseif (%zinfonumber = 4) { inc %zinfonumber } 
        elseif (%zinfonumber = 5) { inc %zinfonumber } 
        elseif (%zinfonumber = 6) { inc %zinfonumber } 
        elseif (%zinfonumber = 7) { inc %zinfonumber } 
        elseif (%zinfonumber = 8) { replywithstuff } 
      } 
      if (Genre: isin %tmptxt) { set %genrecatch 1 }
      if (Runtime: isin %tmptxt) { set %rtcatch 8 }
    }
    elseif (%genrecatch <= 1) { 
      hadd $remove(%movietitle,$chr(32)) $+ .hsh Genre $remove($htmlconv(%tmptxt),(more)) 
      unset %genrecatch 
    }
    elseif (%rtcatch <= 8) {
      hadd $remove(%movietitle,$chr(32)) $+ .hsh runtime $remove($htmlconv(%tmptxt),runtime:)  
      unset %rtcatch
    }
  }
}

alias -l htmlconv { 
  var %x, %i = $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,$null,%x), %x = $remove(%x,&,$chr(9)) 
  return %x 
} 

  


thats the rest.... hope it helps

thanks

Joined: Feb 2004
Posts: 2,019
Hoopy frood
Offline
The suggestion you gave of iswm is not the same kind of matching as the original script.

if (string1 isin string2) will try to find string1 anywhere in string2.
if (string1* iswm string2) will try to find string1 at the start of string2.

Which one you prefer depends on how you want to do the matching. If it's irrelevant where the string is matched in the matchtext, then there's no point to use iswm (*string* iswm text) over isin (string isin text).

If you meant he should use *string1* iswm over string1 isin then I do not agree, there are no benefits at all. One should use the matching that fits the case best, and it is impossible to say in advance which that will be, so advising to use iswm is arbitrary.


Gone.
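
To make the distinction in this post concrete, here is a small sketch that runs the operators against one of the lines from the original question. The alias name `matchdemo` is hypothetical, and the sample line is shortened from the HTML quoted at the top of the thread.

```mirc
; hypothetical demo alias: type /matchdemo to run it
alias matchdemo {
  var %line = <li> <a href="/title/tt0024216/">King Kong</a> (1933)</li>
  ; substring test: true wherever the text appears in the line
  if (King Kong isin %line) echo -a isin matched
  ; wildcards on both sides: same result as isin
  if (*King Kong* iswm %line) echo -a iswm with wildcards on both sides matched
  ; trailing wildcard only: true because the line starts with <li>
  if (<li>* iswm %line) echo -a iswm anchored on <li> matched
  ; trailing wildcard only, but the text is not at the start: false, nothing is echoed
  if (King Kong* iswm %line) echo -a iswm anchored on King Kong matched
}
```

Only the last test should stay silent, since King Kong sits in the middle of the line rather than at the start.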
Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
ok well,

in your opinion in this case which would you go with..? to tell ya the truth using (string1 isin string2) seems to work well for my needs. but hey i'm no pro.... but i'd like to at least know why both examples don't seem to work.. it's almost like i'm missing a break somewhere... i dunno?

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Quote:
The suggestion you gave of iswm is not the same kind of matching as the original script.

if (string1 isin string2) will try to find string1 anywhere in string2.
if (string1* iswm string2) will try to find string1 at the start of string2.

Which one you prefer depends on how you want to do the matching. If it's irrelevant where the string is matched in the matchtext, then there's no point to use iswm (*string* iswm text) over isin (string isin text).

If you meant he should use *string1* iswm over string1 isin then I do not agree, there are no benefits at all. One should use the matching that fits the case best, and it is impossible to say in advance which that will be, so advising to use iswm is arbitrary.


Look at the example I gave. That was not using *string*. If you just check for a word anywhere in the text, it is likely to trigger multiple times in a web page. That's why I like being a bit more specific. Of course, if you verify that the page will only trigger it one time (or the number of times you want it to trigger) while using isin, then that is fine. I just like to be thorough. ;)


Invision Support
#Invision on irc.irchighway.net
Joined: Feb 2004
Posts: 2,019
Hoopy frood
Offline
I know you didn't suggest *string* because that would have made your entire argument void, since *string* iswm string2 is exactly the same as string isin string2. My point is that you should not advise people to use iswm just for the sake of it. Especially not for the reason of "not having your scripts set up differently". That's laughable.

Parsing of a webpage will be different for every webpage, and should be evaluated differently each time. Advising to use iswm just for the sake of it makes no sense, so the isin would be perfectly fine. With your example (string*) it would mean string has to be the first part of the line. One should only advise this if you know for sure that string will always be the first part. It has nothing to do with the fact that using isin could trip more matches; that may very well be a requirement if the html changes sometimes, making string not always the first part of the line.

isin and iswm are two different operators; it makes no sense to prefer one over the other by default, or to advise others to prefer it by default. The right advice would be: use the right tool for the job, whichever one that may turn out to be.


Gone.
Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
okay this is what i have now...

Code:
 
on *:sockread:movieinfo1:{ 
  if ($sockerr > 0) return
  ;var %tmptxt
  :next
  sockread -f %tmptxt
  if ($sockbr == 0) return
  ;  if (Location: isin %tmptxt) { 
  ;   set %movieinfolink2 $remove(%tmptxt,Location: http://www.imdb.com) 
  ;    sockopen movieinfo2 www.imdb.com 80 
  ; } 
  if (<p><b>Popular Titles</b> isin %tmptxt) {
    echo -a $htmlconv(%tmptxt)
    sockclose movieinfo1
  }
  goto next
}

 

trying it without the first condition... but with this code, i get one match and then an error saying /sockread: socket unavailable.. so, now i'm at a loss.... anyone have any further suggestions?....

thanks in advance again.......

Joined: Dec 2002
Posts: 1,245
M
Hoopy frood
Offline
ok so in there you are closing the socket before the script is done
you could let the socket close on its own
Code:
on *:sockread:movieinfo1:{ 
  if ($sockerr > 0) return
  ;var %tmptxt
  :next
  sockread -f %tmptxt
  if ($sockbr == 0) return
  ;  if (Location: isin %tmptxt) { 
  ;   set %movieinfolink2 $remove(%tmptxt,Location: http://www.imdb.com) 
  ;    sockopen movieinfo2 www.imdb.com 80 
  ; } 
  if (<p><b>Popular Titles</b> isin %tmptxt) {
    echo -a $htmlconv(%tmptxt)
  }
  goto next
}

Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
it's still not working ....... have i got the basic principle correct at least? cause i'm thinking perhaps to get this thing to do what i want i'll have to get a bit more complex.. and not sure which route to take on how to do that.. lol

Joined: Dec 2002
Posts: 1,245
M
Hoopy frood
Offline
test this
Code:
alias imdb { sockclose imdb | sockopen imdb www.imdb.com 80 }

on *:sockopen:imdb:{
  echo -s sock $sockname opened
  sockwrite -n $sockname GET http://www.imdb.com/ HTTP/1.0
  sockwrite -n $sockname Host: www.imdb.com $+ $crlf $+ $crlf 
} 
on *:sockread:imdb:{ 
  if ($sockerr) { 
    return 
  }
  var %tmptxt
  sockread -f %tmptxt
  while ($sockbr) {
    sockread -f %tmptxt 
    var %cleaned = $htmlconv(%tmptxt)
    if (%cleaned) { echo -s %cleaned }
  }
}

alias -l htmlconv { 
  var %x, %i = $regsub($1-,/(^[^<]*>|<[^>]*>|<[^>]*$)/g,$null,%x), %x = $remove(%x,&,$chr(9)) 
  return %x 
} 


see if that works in a fresh remote file
you might even move that remote to the top to make sure you arent having a conflict

if you dont see anything in the status window, try /remote on
then try again

Joined: Aug 2004
Posts: 423
C
Fjord artisan
OP Offline
that does work... and well i have to say...

so i started going from there and i've come to find that it's this part here that i need to get..
Code:
if (<p><b>Popular Titles</b> isin %tmptxt) {
    echo -a $htmlconv(%tmptxt)
    sockclose movieinfo1
  }

that'll only read the first part.. ie: like 1st result
and if i do something like this..

Code:
if (<p><b>Popular Titles</b> isin %tmptxt) {
    sockread %tmptxt
    echo -a $htmlconv(%tmptxt)
  }
  

i get the second result only... so my problem is getting it to loop thru that instance so i can pull the whole list under the topic popular titles..

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Don't use sockclose on the first one. Close the socket only after reaching the end of the html file (</html>).

if (</html> isin %temptext) { sockclose socketname }

Right now, your first example closes it after reading only one item, which isn't what you want. Your second example reads the second one because you used a sockread command inside of it, so it skips a line.


Invision Support
#Invision on irc.irchighway.net
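
Putting the advice in this thread together, a loop that prints every entry under Popular Titles might look like the sketch below. It is not a tested solution. It assumes the server sends each `<li>` on its own line as in the HTML quoted in the first post, that `</ol>` ends the list, and that the `htmlconv` alias from this thread sits in the same script file. The variable `%inlist` is a made-up name used as an on/off flag.

```mirc
on *:sockread:movieinfo1:{
  if ($sockerr) return
  var %line
  sockread %line
  while ($sockbr) {
    ; the heading line also carries the first <li>, so switch the flag on before testing for it
    if (<p><b>Popular Titles</b> isin %line) set %inlist 1
    ; print every list item while the flag is on
    if ((%inlist) && (*<li>* iswm %line)) echo -a $htmlconv(%line)
    ; the closing </ol> marks the end of the list
    if (</ol> isin %line) unset %inlist
    ; close only at the end of the page, then leave so sockread is not called on a closed socket
    if (</html> isin %line) {
      sockclose $sockname
      return
    }
    sockread %line
  }
}
```

Because the first item shares a line with the heading, its echoed text will also include "Popular Titles (Displaying 3 Results)". Returning right after `sockclose` is what avoids the "socket unavailable" error reported earlier in the thread.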
