Joined: Jul 2003
Posts: 33
Bilge Offline OP
Ameglian cow
I am trying to parse the source of a web page. I connect to the site and request the page I want using mIRC sockets, then save the source, including headers, to a file using /bwrite. So far so good: I have the page source in a file.

Now I need to parse that file using regex matches to extract the data I want. My problem is that I need to break the data down, because otherwise I get:
* /set: line too long
* /bset: line too long, etc.
In addition, the page source is machine-generated and contains very few line breaks. Even if it had more, I would still need to parse the document as a whole: if a break falls inside the data I am searching for, the regex won't match.

I cannot assign the entire page content to a variable because the page is 17KB. Splitting the content on each line feed character into many variables doesn't help either: not even one line fits into a variable, because the first line alone is about 5000 bytes. I also tried creating a dialog with one box for the data, one box for the regex, and a button to perform the search. The data did fit into a multi-line edit box, but mIRC crashed when I ran a regex match on that much data. It crashed on the 5000-byte line alone, not just on the whole 17KB.

Splitting this 17KB file into lots of little variables wouldn't be a problem in itself, but then I don't see how to parse it. So please, could someone tell me how to practically parse a large text file using regex? I would be most grateful.
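
For illustration only, here is one general way to run a regex over data that is too big to hold in one piece: read it in fixed-size chunks and carry an overlapping tail forward, so a match that straddles a chunk boundary is still seen whole. This is a Python sketch of the idea, not mIRC script. The function name `find_in_chunks`, the chunk and overlap sizes, and the sample data are all made up, and it assumes no single match is longer than `overlap` bytes.

```python
import io
import re
from typing import BinaryIO, Iterator


def find_in_chunks(stream: BinaryIO, pattern: bytes,
                   chunk_size: int = 4096, overlap: int = 512) -> Iterator[bytes]:
    """Yield every regex match in `stream`, reading it in bounded pieces.

    Assumption: no single match is longer than `overlap` bytes.
    """
    regex = re.compile(pattern)
    buf = b""
    while True:
        data = stream.read(chunk_size)
        eof = not data
        buf += data
        # A match starting in the last `overlap` bytes may be cut off by the
        # chunk boundary, so defer it to the next round (unless we hit EOF).
        limit = len(buf) if eof else max(len(buf) - overlap, 0)
        consumed = 0
        for m in regex.finditer(buf):
            if m.start() >= limit:
                break
            yield m.group(0)
            consumed = m.end()
        # Keep only the unprocessed tail so the buffer stays small.
        buf = buf[max(consumed, limit):]
        if eof:
            break


# Hypothetical sample: the first tag pair straddles the 16-byte chunk boundary.
sample = b"x" * 10 + b"<b>hello</b>" + b"y" * 20 + b"<b>world</b>"
demo_matches = list(find_in_chunks(io.BytesIO(sample), rb"<b>.*?</b>",
                                   chunk_size=16, overlap=16))
print(demo_matches)
```

The buffer never grows much beyond `chunk_size + overlap` bytes as long as the assumption about match length holds, which is the point: each regex call sees a small piece, yet no match is lost to a split.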

Joined: Dec 2002
Posts: 56
Babel fish
Maybe you could try using the regex option in /filter :)
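
Since /filter works line by line, the idea only helps if the lines are short enough. One possible approach, sketched here in Python rather than mIRC script, is to insert a line break after every `>` first and then keep only the lines that match the regex. The function name `filter_lines` and the sample HTML are hypothetical, and this assumes the data you want never spans a tag boundary.

```python
import re
from typing import List


def filter_lines(source: str, pattern: str) -> List[str]:
    """Split machine-generated markup into short lines, then keep regex matches."""
    # Break after every '>' so that no line is longer than one tag plus its text.
    lines = source.replace(">", ">\n").splitlines()
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical one-line page source with no line breaks of its own.
page = '<td class="price">17.50</td><td class="name">Widget</td>'
prices = filter_lines(page, r"^\d+\.\d+<")
print(prices)
```

If the data does span tags, the split point would need to change (for example, breaking only after a closing row or table tag) so that each record stays on one line.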


