Joined: Oct 2005
Posts: 827
Hoopy frood
OP
How do I deal with stuff like this? Binvars? What the server sends back is a massive XML reply. Here's a shrunk-down example. It's actually a reply containing all the contacts on my MSN list. Each contact's info sits inside a <Contact> .. </Contact> section, so you can imagine how big this reply gets when my MSN contact list has well over 100 people.
HTTP/1.1 200 OK
Date: Fri, 11 Nov 2005 23:55:09 GMT
Server: Microsoft-IIS/6.0
P3P:CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
Cache-Control: private, max-age=0
Content-Type: text/xml; charset=utf-8
Content-Length: 2207

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <ServiceHeader xmlns="http://www.msn.com/webservices/AddressBook">
      <Version>11.01.0922.0000</Version>
    </ServiceHeader>
  </soap:Header>
  <soap:Body>
    <ABFindAllResponse xmlns="http://www.msn.com/webservices/AddressBook">
      <ABFindAllResult>
        <contacts>
          <Contact>
            <contactId> Removed </contactId>
            <contactInfo>
              <annotations>
                <Annotation>
                  <Name>MSN.IM.MBEA</Name>
                  <Value>0</Value>
                </Annotation>
                <Annotation>
                  <Name>MSN.IM.GTC</Name>
                  <Value>1</Value>
                </Annotation>
                <Annotation>
                  <Name>MSN.IM.BLP</Name>
                  <Value>0</Value>
                </Annotation>
              </annotations>
              <contactType>Me</contactType>
              <quickName>Q</quickName>
              <passportName> Removed </passportName>
              <IsPassportNameHidden>false</IsPassportNameHidden>
              <displayName>Inky | Hello, World from WLM</displayName>
              <puid>0</puid>
              <CID>0</CID>
              <IsNotMobileVisible>false</IsNotMobileVisible>
              <isMobileIMEnabled>false</isMobileIMEnabled>
              <isMessengerUser>false</isMessengerUser>
              <isFavorite>false</isFavorite>
              <isSmtp>false</isSmtp>
              <hasSpace>true</hasSpace>
              <spotWatchState>NoDevice</spotWatchState>
              <birthdate>0001-01-01T00:00:00.0000000-08:00</birthdate>
              <primaryEmailType>ContactEmailPersonal</primaryEmailType>
              <PrimaryLocation>ContactLocationPersonal</PrimaryLocation>
              <PrimaryPhone>ContactPhonePersonal</PrimaryPhone>
              <IsPrivate>false</IsPrivate>
              <Gender>Unspecified</Gender>
              <TimeZone>None</TimeZone>
            </contactInfo>
            <propertiesChanged />
            <fDeleted>false</fDeleted>
            <lastChange>2005-11-11T15:55:03.2600000-08:00</lastChange>
          </Contact>
        </contacts>
        <ab>
          <abId>00000000-0000-0000-0000-000000000000</abId>
          <abInfo>
            <ownerPuid>0</ownerPuid>
            <OwnerCID>0</OwnerCID>
            <ownerEmail> Removed </ownerEmail>
            <fDefault>true</fDefault>
            <joinedNamespace>false</joinedNamespace>
          </abInfo>
          <lastChange>2005-11-11T15:55:03.2600000-08:00</lastChange>
          <DynamicItemLastChanged>2005-11-09T09:16:56.2970000-08:00</DynamicItemLastChanged>
          <createDate>2003-07-14T15:46:20.6500000-07:00</createDate>
          <propertiesChanged />
        </ab>
      </ABFindAllResult>
    </ABFindAllResponse>
  </soap:Body>
</soap:Envelope>
Anyway, what I want to do is parse out every contact's email, which is contained within the <passportName> tags. Can anyone help me get started, please?
Last edited by pouncer; 07/09/09 10:24 PM.
Joined: Jul 2008
Posts: 236
Fjord artisan
Binary variables, or hash tables... or both. Hash tables are pretty useful. Have you tried looking for a library that parses XML streams?
Joined: Oct 2003
Posts: 3,918
Hoopy frood
This is trivial to deal with.. it's no different than any other HTML scraping. Given that the <passportName>...</passportName> is conveniently located on one line, all you have to do is read the socket line by line and check for it.
on *:SOCKREAD:mysock: {
  sockread %line
  if ($regex(%line, /<passportName>(.+?)<\/passportName>/)) {
    echo -a Found another contact: $regml(1)
  }
}
If you need more than just one of those tags, well, it gets slightly more complicated, but you can follow the same general rule of matching each line individually. If you're not comfortable with that, look for an mIRC XML library. There are a couple of options, but I'm not too familiar with them.
- argv[0] on EFnet #mIRC - "Life is a pointer to an integer without a cast"
Joined: Oct 2005
Posts: 827
Hoopy frood
OP
Odd. That only seemed to echo 7 emails when the account has well over 1000 contacts :|
Joined: Jul 2006
Posts: 4,192
Hoopy frood
Maybe it's because his regex only matches once: if more than one email is present in a line, it won't catch the rest. Try:
on *:SOCKREAD:mysock: {
  var %line, %a
  sockread %line
  if ($regex(%line, /<passportName>(.+?)<\/passportName>/g)) {
    %a = $regml(0)
    while (%a) { echo -a contact: $regml(%a) | dec %a }
  }
}
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Jul 2008
Posts: 236
Fjord artisan
XML is not always so convenient. Really, that regex should be /<passportName>([^<]+)<\/passportName>/... This would explain why he's only getting 7 matches.
There are more than a couple of options, providing you're willing to compile a DLL. Google "parsing XML streams in C/C++". It may not be "built in", but it's likely a faster, and more complete (so you won't have to write any other regular expressions) way to parse streams of XML.
edit: just noticed you're using non-greedy match, but still...
Last edited by s00p; 09/09/09 04:31 AM.
Joined: Oct 2003
Posts: 3,918
Hoopy frood
The regular expression /<passportName>(.+?)<\/passportName>/ is correct. The /g modifier is not needed (there is only one match per call).
Getting 7 of 1000 means there's a problem reading the data, or perhaps the syntax changes after the 7th and the regex breaks (it could break if the data starts splitting over multiple lines).
- argv[0] on EFnet #mIRC - "Life is a pointer to an integer without a cast"
Joined: Jan 2007
Posts: 1,156
Hoopy frood
I use binvars if a single line is too long.
For this just grab the info you need and store it in a hash table.
Joined: Sep 2005
Posts: 2,881
Hoopy frood
No offence but if you don't know how to parse a string you're not going to be able to write a messenger client.
You will be back here asking hundreds of questions before this project is up.
Joined: Oct 2005
Posts: 827
Hoopy frood
OP
Guys, the string is over 100,000 characters long (it's one whole XML document sent as a single line, which is too big for mIRC to read as one line). That's what the problem is.
Content-Length: 1340022
I need to somehow loop from char 0 to 1340022 and read all occurrences of <passportName>email</passportName>.
Could anyone show me how this could be done using binvars?
Last edited by pouncer; 09/09/09 09:35 PM.
Joined: Sep 2005
Posts: 2,881
Hoopy frood
You need something like this in the sockread event:
sockread &data
while ($sockbr) sockread -f &data
Then after that you need to loop with $bfind() to get all occurrences.
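For what it's worth, a $bfind() loop of the kind suggested here might look roughly like the sketch below. The alias name and the sample data loaded with /bset are made up for illustration; in a real script &data would already hold bytes read from the socket. It also assumes every opening tag has its closing tag inside the same binvar.

```mirc
; Hypothetical alias: echo every <passportName> value found in &data
alias listPassportNames {
  ; Sample data for illustration only (assumption); normally &data comes from /sockread
  bset -t &data 1 <Contact><passportName>a@example.com</passportName></Contact><Contact><passportName>b@example.com</passportName></Contact>
  var %open = <passportName>, %close = </passportName>
  var %start, %end
  ; Position of the first opening tag, or 0 if there is none
  var %hit = $bfind(&data,1,%open).text
  while (%hit) {
    ; The value starts right after the opening tag
    %start = $calc(%hit + $len(%open))
    %end = $bfind(&data,%start,%close).text
    ; No closing tag in this buffer: stop rather than guess
    if (!%end) break
    if (%end > %start) echo -a contact: $bvar(&data,%start,$calc(%end - %start)).text
    ; Continue searching after the closing tag
    %hit = $bfind(&data,$calc(%end + $len(%close)),%open).text
  }
}
```

Typing /listPassportNames runs the loop over the sample data. Searching by position like this avoids ever holding the whole document in a %variable, which is where the line-length limit bites.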
Joined: Oct 2005
Posts: 827
Hoopy frood
OP
on *:SOCKREAD:membership: {
  sockread &data
  while ($sockbr) sockread -f &data
  echo -a $bvar(&data, 1-).text
}
It gives me * /echo: insufficient parameters
Last edited by pouncer; 10/09/09 07:18 PM.
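A likely explanation for that error, offered as a guess rather than a diagnosis: each /sockread overwrites &data, and the loop only stops after a read that returned no bytes, so &data is probably empty by the time /echo runs. Binvars are also discarded when the event finishes, while a 1.3 MB reply arrives over many SOCKREAD events. One way around both, sketched below, is to append every chunk to a file as it arrives. The file name abfindall.tmp is an assumption, and the sketch assumes the server closes the connection once the reply is complete.

```mirc
on *:SOCKREAD:membership: {
  if ($sockerr) return
  ; Read the first waiting chunk into a binvar
  sockread &chunk
  while ($sockbr) {
    ; A start position of -1 appends to the file; a length of -1 writes the whole binvar
    bwrite abfindall.tmp -1 -1 &chunk
    ; Fetch the next chunk; $sockbr becomes 0 once nothing is left to read
    sockread &chunk
  }
}

on *:SOCKCLOSE:membership: {
  ; abfindall.tmp is an assumed file name, not something from the original script
  echo -a Saved $file(abfindall.tmp).size bytes to abfindall.tmp
}
```

The saved file will still start with the HTTP headers, and a stale copy from an earlier run should be deleted (for example with /remove) before the request is sent, otherwise the new reply is appended to the old one.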
Joined: Jan 2007
Posts: 1,156
Hoopy frood
This is how I do it.
on *:sockread:b_rlist.*:{
  if ($Sockerr > 0) return
  sockread -n &brL
  if ($sockbr = 0) return
  echo -s . $bvar(&brL,1-).text
}
Joined: Apr 2003
Posts: 342
Fjord artisan
Use cURL to download the XML document to a file... then use the file handling routines (/fopen, /fseek, $fgetc) to parse through the file.
Beware of MeStinkBAD! He knows more than he actually does!