Problems with redirects
mamadouthiam
Posted: Thursday, April 05, 2007 9:56:07 AM
Rank: Newbie
Groups: Member

Joined: 4/5/2007
Posts: 1
Points: 3
Location: US
I am running a trial version of the software and had to stop it because the page count was climbing into the hundreds of thousands, whereas we only have ~10,000 pages. The pages being returned had some very strange URLs: you would see a normal URL (actually a redirect that we use to handle some of our dynamic pages) followed, it seemed, by the paths of the rest of the pages on the site.

For example, http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html is a redirect to a dynamic page, but among the pages the tool is bringing back are (notice the two ".html"):

http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html/about/about.html
http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html/about/background.html
http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html/about/leadership.html

etc., until it looks as if it is doing this for our whole site (and then again for the next redirect).

Also, we have a redirect for one string that goes to a different site, and the tool then tries to spider that entire site too.

Help?
KeyLimeTie
Posted: Monday, April 09, 2007 10:25:52 AM
Rank: Administration
Groups: Administration

Joined: 1/31/2007
Posts: 409
Points: 541
Location: Chicago, IL
I will run through your site later today.
FYI - If the spider finds a URL with a different domain name, it will not keep it or try to spider it.
Also, are any of your pages automatically redirecting to other pages? If so, how is the redirect coded? META tag? JavaScript?
The spider will navigate to the page, and since it is a real page that returns a 200 HTTP status, it will save it as a webpage.
It doesn't matter whether the page redirects or not; the program cannot determine that.
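To illustrate the distinction being asked about: a page that redirects through a META tag still answers with a 200 status, so the redirect is only visible in the HTML itself. The following is a minimal sketch, not how Sitemap Generator works internally, of how a crawler could in principle look for a META refresh target and compare host names. The names `MetaRefreshFinder`, `find_meta_refresh`, and `same_host` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class MetaRefreshFinder(HTMLParser):
    """Hypothetical helper: records the target of the first
    <meta http-equiv="refresh" content="N; url=..."> tag it sees."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # Attribute values can be None (e.g. <meta foo>), so normalise them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/some/page.html".
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html):
    """Return the META refresh target found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target


def same_host(url_a, url_b):
    """True when both URLs share a host name (the 'same domain' rule)."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname
```

A JavaScript redirect would not be caught by this approach, since detecting it would require executing the script.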
KeyLimeTie
Posted: Monday, April 09, 2007 10:50:46 PM
Rank: Administration
Groups: Administration

Joined: 1/31/2007
Posts: 409
Points: 541
Location: Chicago, IL
I ran the spider against your entire website and just as I suspected, your HTML is the culprit.

For example, go to:
http://www.csrees.usda.gov/newsroom/news/2006news/water_quality.html

Then view the source code and search for "/plantsappliedgenomicscapnri.html/".
You will find the following code:

<a href="http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html/"><strong>CSREES National Research Initiative: Applied Plant Genomics Coordinated Agricultural Project</strong></a>

That URL is incorrect (note the trailing slash after ".html") and needs to be fixed on your end.
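A short sketch of why the trailing slash matters, using Python's standard `urljoin`. It assumes the page served at that address contains relative links such as `about/about.html`; that link value is an illustration, not something taken from the site's actual markup, and `resolve` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumed relative link found on the page; the real markup may differ.
RELATIVE_LINK = "about/about.html"

GOOD_BASE = "http://www.csrees.usda.gov/fo/plantsappliedgenomicscapnri.html"
BAD_BASE = GOOD_BASE + "/"  # the href quoted above, with the trailing slash


def resolve(base, href):
    """Resolve href against base the way a browser or spider would."""
    return urljoin(base, href)


# Without the trailing slash, the last path segment is treated as a file
# and is dropped before the relative link is appended.
print(resolve(GOOD_BASE, RELATIVE_LINK))

# With the trailing slash, "plantsappliedgenomicscapnri.html/" is treated
# as a directory, so every relative link resolves underneath it.
print(resolve(BAD_BASE, RELATIVE_LINK))
```

Under that assumption, the second call produces a URL of the same shape as the ones reported in the first post, and every relative link on the page would repeat the pattern.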
KeyLimeTie
Posted: Thursday, April 19, 2007 10:57:04 PM
Rank: Administration
Groups: Administration

Joined: 1/31/2007
Posts: 409
Points: 541
Location: Chicago, IL
Locking topic as issue has been explained.