Parsing large amounts of data with $regex #38898 30/07/03 04:37 AM
Bilge (OP), Ameglian cow, Joined: Jul 2003, Posts: 33

I am trying to parse the source of a web page. I connect to the site and request the page using mIRC sockets, then save the source, including headers, to a file using /bwrite. So far so good: I have the page source in a file. Now I need to parse that file with regex matches to extract the data I want.

My problem is that I need to break the data down, because I get errors such as:

* /set: line too long
* /bset: line too long

In addition, the source for the page is machine-generated and contains few line breaks. Even if it had more, I need to parse the entire document as a whole, because a break in the middle of the data I am searching for would stop the pattern from matching.

I cannot assign the entire page to a variable because the page is 17KB. Splitting the content at each line feed doesn't help either: not even a single line fits into a variable, because the first line alone is about 5000 bytes.

I even tried creating a dialog with one box for the data and one for the regex, plus a button to perform the search. The data did fit into a multi-line edit box, but mIRC crashed when I ran a regex match on that much data. It also crashed on the 5000-byte line alone, not just on the whole 17KB.

Splitting this 17KB file into lots of little variables wouldn't be a problem in itself, but then the problem would be how to parse it. So please, could someone tell me how one can practically parse a large text file using regex? I would be most grateful.

Re: Parsing large amounts of data with $regex #38899 31/07/03 10:03 AM
Infernal_Demon, Babel fish, Joined: Dec 2002, Posts: 56

Maybe you could try using the regex option in /filter.
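
For what it's worth, the /filter suggestion might look roughly like the sketch below. It relies on the -g switch, which treats the match text as a regular expression, and on -ff, which marks both input and output as files. The alias name, the two filenames, and the pattern are placeholders made up for illustration. Note that /filter works line by line, so this does not by itself solve the problem of a match spanning a line break, and whether it copes with the roughly 5000-byte line from the original post is an assumption that would need testing.

```
; Sketch only: findtitle, page.txt and matches.txt are hypothetical names,
; and <title> is just an example pattern.
alias findtitle {
  ; -ff = input and output are both files
  ; -g  = treat the match text as a regular expression
  filter -ffg page.txt matches.txt <title>
  ; $filtered holds the number of lines that matched
  echo -a Lines matched: $filtered
}
```

Typing /findtitle would then copy every line of page.txt that matches the pattern into matches.txt, without the page ever having to fit into a variable.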
