Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
I am using your sitemap generator and have run into a problem. I have a big site with over 1,000,000 pages. Whenever I try to crawl it with the KeyLimeTie generator, the software gives an error and stops crawling once about 800,000 pages are in the queue. When I then restart the software and choose "resume", it gives the error "end of stream encountered before parsing was completed". I have tried several times on several machines with several setups, but the result is the same. What should I do?
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
Please send us a PM with the URL so we can investigate.
Thanks, KeyLimeTie
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Still no answer?
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
We are still investigating. Your website has over 1,000,000 URLs and it takes time to spider.
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Any progress?
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
I'm sure the developer investigating has made progress, but I will see if any resolution is available on Monday.
Thanks, Brian
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
No updates from the developer yet. We expect to be done spidering your website within the next day or two (we have a machine dedicated to spidering it 24x7).
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
easena,
I just spoke with the developer and he said the software has only crawled roughly 150,000 URLs so far. He estimates it'll take at least 5-7 more days to complete. We'll keep it running and if there's a problem we'll investigate it. In any case, we'll get the entire site spidered and send you the files so you have them. One question: How much memory does your computer have? For a site with over 1,000,000 URLs, we think you'll need around 4 GB.
Thanks, Brian
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Thanks for the reply. I have 4 GB of memory.
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
Update: Currently at 450,000 URLs spidered...it'll be a few more days.
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Thanks! I am waiting.
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
Update: 1,331,509 URLs found, 851,769 spidered so far...a few days to go still...
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Thanks for the information.
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Done?
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Still a way to go?
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
Over 2.1 million URLs found. Over 1.4 million crawled. Probably 3-4 more days left, unless a good number more URLs are found. It would be done by now, but a Windows Update occurred and forced a reboot. When that happened, we received the same error you did: "end of stream encountered before parsing was completed". I'm betting something similar happened on your end, where you had some type of outage, be it a forced reboot, power loss, etc. We will continue running the application and send you the results.
Thanks, Brian KeyLimeTie
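
For readers hitting the same "end of stream" error: a message like this usually means a saved state file was cut off mid-write, which fits the forced reboot described above. The sketch below is not the KeyLimeTie generator's code. It is a minimal Python illustration, under the assumption that crawl state is saved to a single file, of writing a checkpoint to a temporary file and swapping it into place so that an interruption leaves the previous checkpoint intact. The names `save_checkpoint`, `load_checkpoint`, and the JSON format are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file in the same directory, then swap it in.

    Hypothetical helper: if the process dies mid-write, only the temp file
    is damaged and the previous checkpoint at `path` stays readable.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # replaces the old checkpoint in one step
    except BaseException:
        # Clean up the partial temp file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Read back the most recent complete checkpoint."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With this pattern, a "resume" after a crash reads the last checkpoint that finished writing rather than a half-written one.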
|
|
Rank: Newbie Groups: Member
Joined: 1/6/2009 Posts: 9 Points: 27 Location: Istanbul
|
Thanks...
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
easena,
Our server ran out of memory and cannot finish spidering the website. We have successfully spidered sites with around 2 million URLs, but your site is larger than that (and the URLs are very long, which takes up a lot of space in memory). The URLs have a lot of querystring parameters. Can any of the querystring parameters be ignored? For example, are there parameters that are used for sorting the page content? If so, parameters like these can be ignored since the content is the same.
Please let us know how you would like to proceed. If you would like your payment refunded, we understand. Sorry for the inconvenience.
Thanks, Brian KeyLimeTie
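
To illustrate the idea of ignoring sort-style parameters, here is a minimal Python sketch, not the generator's actual behavior. It assumes the ignorable parameters are named `sort` and `order` (hypothetical names; the real ones depend on the site) and maps every variant of a URL to one canonical form, so a crawler would keep a single entry instead of one per sort order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only change presentation, not content.
IGNORED_PARAMS = frozenset({"sort", "order"})


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignorable query parameters and sort the rest.

    URLs that differ only in ignored parameters, in parameter order,
    or in their fragment end up as the same string.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Usage: track canonical forms so each page is queued only once.
seen = set()
for link in (
    "http://example.com/list?cat=5&sort=price&order=asc",
    "http://example.com/list?order=desc&cat=5",
):
    seen.add(canonical_url(link))
```

Fewer distinct URLs in the queue also means less memory used, which is the constraint described in the post above.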
|
|
Rank: Administration Groups: Administration
Joined: 1/31/2007 Posts: 590 Points: 396 Location: Chicago, IL
|
easena - Please contact us directly via email if you would like to discuss more.
Thank you, KeyLimeTie
|