Posted By: Lpfix5 Hmph @ WebData get tricky - 08/11/07 07:59 PM
While trying to set up quick, easy-to-read access to the newspaper, I ran into an issue with the careers section of the site.

I'm trying to pull three basic things out of this web data, and I'm running into a wall: whatever regex I try, the data I'm after never gets echoed back to me.

Here's what I'm trying to get out of this link.

http://ospreycareers.com/results.asp?sea...bmit=New+Search

In the middle column, where the red text is:

A) Job Title
B) Date
C) Source

For example, the first one: MACHINE OPERATORS: Fabricating... 11/8/2007 Sault Ste Marie

Here's the base code I've got to start with. It echoes the data it receives, but I can't find the specific information I need in it. (I'm not sure whether connecting through a socket causes an issue with getting the info from that site, but hopefully not.)

Code:
alias news_career sockclose news_career | sockopen news_career ospreycareers.com 80

on 1:sockopen:news_career:{
  .sockwrite -n $sockname GET /results.asp?search_type=quick&kw=&city=Sault+Ste+Marie&submit=New+Search HTTP/1.1
  .sockwrite -n $sockname HOST: ospreycareers.com
  .sockwrite -n $sockname $crlf
}

on 1:sockread:news_career:{
  if ($sockerr > 0) return 
  var %x | sockread -fn %x
  echo -a %x 
}


Again, I've tried numerous ways to strip the data, yet it's still not fetching the proper area of the dump.

Any help would be greatly appreciated.
Posted By: Lpfix5 Re: Hmph @ WebData get tricky - 09/11/07 06:35 PM
www.ospreycareers.com/results.asp?search_type=quick&kw=&city=Sault+Ste+Marie

It seems the page would not pull up correctly with the submit=New+Search part, but I'm still left with the same issue.
Posted By: deegee Re: Hmph @ WebData get tricky - 10/11/07 02:37 AM
I'd say you need to send cookie data; neither of the links above yields any search results if no cookie exists for that site.
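For what it's worth, a request only has to send the name=value pairs back, joined with "; " in a single Cookie header. Attributes such as expires and path belong to the server's Set-Cookie line and are not echoed back. A rough, untested sketch, where FIRSTNAME=FIRSTVALUE and SESSIONNAME=SESSIONVALUE are placeholders for whatever the site actually hands out:
Code:
on *:sockopen:news_career:{
  if ($sockerr) return
  sockwrite -n $sockname GET /results.asp?search_type=quick&kw=&city=Sault+Ste+Marie HTTP/1.0
  sockwrite -n $sockname Host: www.ospreycareers.com
  ; placeholder pairs only, separated by "; " with no expires/path attributes
  sockwrite -n $sockname Cookie: FIRSTNAME=FIRSTVALUE; SESSIONNAME=SESSIONVALUE
  ; a blank line ends the request headers
  sockwrite -n $sockname $crlf
}
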
Posted By: Lpfix5 Re: Hmph @ WebData get tricky - 10/11/07 03:50 AM
OMG, I forgot, you're right, cookies!!! I guess I wasn't hungry enough to think about that.
Posted By: Lpfix5 Re: Hmph @ WebData get tricky - 11/11/07 03:43 AM
Code:
alias news_career sockclose news_career | sockopen news_career ospreycareers.com 80

on 1:sockopen:news_career:{
  .sockwrite -n $sockname GET /results.asp?search_type=quick&kw=&city=Sault+Ste+Marie HTTP/1.1
  .sockwrite -n $sockname HOST: ospreycareers.com
  .sockwrite -n $sockname COOKIE: np%5Furl=www%2Esaultstar%2Ecom; expires=Mon, 10-Nov-2008 05:00:00 GMT; path=/
  .sockwrite -n $sockname COOKIE: ASPSESSIONIDSASCATTB=EAMACMEBNHDADKKGAPJGNGLL; path=/
  .sockwrite -n $sockname $crlf
}

on 1:sockread:news_career:{
  if ($sockerr > 0) return 
  var %x | sockread -fn %x
  if (onmouseover isin %x) {
    echo -a %x
  }
}


I need only the UPPERCASE part, like THIS IS WHATEVER, ignoring Jidaida Aaojkodaodad. What I mean is that I want to return the tokens that are in upper case from that part of the onmouseover.
Posted By: deegee Re: Hmph @ WebData get tricky - 11/11/07 06:34 AM
Say what? Not all items have UPPERCASE text.
Quote:
• CAA South Central Ontario is looking for a ... « "CAA"
&#8226; FOOD SERVICES<BR><BR>HOUSEKEEPER Full Time ... « "FOOD SERVICES" and "HOUSEKEEPER"
&#8226; <BR><BR>Electrical Inspectors <BR><BR>Dryden ... « none.

Is what you're after what you see on the website without the mouseover box? (Job title, Date, Source)
Posted By: Lpfix5 Re: Hmph @ WebData get tricky - 11/11/07 04:58 PM
Oh, some are not uppercase, I didn't notice. I'm actually trying to get the Job Title, Date and Source.
Posted By: deegee Re: Hmph @ WebData get tricky - 11/11/07 05:37 PM
Try this. It also gets the cookie data first and stores it for 3600 secs; you can make it longer, but I'm not sure how long it will be valid for.
Code:
on *:sockopen:news_career:{
  if $sockerr { echo -ac info * Sockerr (news_career): $sock($sockname).wsmsg | return }

  ; Check if you have a variable with cookie data
  if $($+(%,$sockname,.cookie),2) {
    ; if so go ahead and request the page
    sockwrite -n $sockname GET /results.asp?search_type=quick&kw=&city=Sault+Ste+Marie HTTP/1.0
    sockwrite -n $sockname HOST: www.ospreycareers.com
    sockwrite $sockname Cookie: $v1 $+ $str($lf,2)
    return
  }
  ; else request headers only (for the cookie)
  sockwrite -n $sockname HEAD / HTTP/1.0
  sockwrite $sockname HOST: www.ospreycareers.com $+ $str($lf,2)
}

on *:sockread:news_career:{
  if $sockerr { echo -ac info * Sockerr (news_career): $sock($sockname).wsmsg | return }
  var %a | sockread %a

  ; Check if you have a cookie
  if !$($+(%,$sockname,.cookie),2) {
    ; if not, check for the cookie data
    if *Set-Cookie: ASPSESSIONID* iswm %a {
      ; set it to a variable
      set -u3600 $+(%,$sockname,.cookie) np%5Furl=www%2Esaultstar%2Ecom; $gettok(%a,2,32)
      ; and start over
      sockclose $sockname | news_career
    }
  }
  ; else read the page ;)

  ; IF you have a job title...
  if %JobTitle {
    ; add the date and location data
    if *<td class="rowSep"* iswm %a { set -e %JobTitle %JobTitle $regsubex(%a,/\t|<.*?>/g,) }
    ; if it's the end of the item, display the item & unset the variable
    elseif *</tr>* iswm %a { echo -a %JobTitle | unset %JobTitle }
  }

  ; else check incoming data for a job title & set it to a variable
  elseif *onMouseout="hideddrivetip()">* iswm %a { set -e %JobTitle $gettok($gettok(%a,1,60),2,62) }
}
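
To see what the two extraction steps do on their own, they can be tried from a small test alias. This is only a sketch: the alias name career_test and both sample lines are made up for illustration and merely shaped like the lines the script matches on, not copied from the actual page.
Code:
; hypothetical test alias, run with /career_test
alias career_test {
  ; made-up sample lines (assumed shape, not real page data)
  var %title = <a href="job.asp" onMouseout="hideddrivetip()">MACHINE OPERATORS</a>
  var %cell = <td class="rowSep">11/8/2007</td>
  ; take what precedes the second "<" (chr 60), then what follows the first ">" (chr 62)
  echo -a Title: $gettok($gettok(%title,1,60),2,62)
  ; remove tabs and anything that looks like an HTML tag
  echo -a Cell: $regsubex(%cell,/\t|<.*?>/g,)
}

If the real lines look like these samples, the first echo shows the text between the tags and the second shows the cell with its markup removed. If a job title line on the site is laid out differently, the token numbers in the $gettok call would need adjusting.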

Posted By: Lpfix5 Re: Hmph @ WebData get tricky - 12/11/07 03:30 AM
Nice thanks
© mIRC Discussion Forums