Use of Xenu Link Sleuth on very large sites?

Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site: on my PC
it will, for extended periods of time, consume 100% of the
CPU cycles (usually with no internet activity).

I've been in touch with the program's author, but this
problem doesn't happen on his system.

Because of the 100% CPU utilisation, the program is of
course much slower than it should be, and I just terminated
the program after trying it again.
In 4 hours, it had checked 96,146 of 112,138 links, but
I know there are many more to check.

--
Dave Patton
Canadian Coordinator, the Degree Confluence Project
http://www.confluence.org dpatton at confluence dot org
My website: http://members.shaw.ca/davepatton/
Vancouver/Whistler - host of the 2010 Winter Olympics
Jul 20 '05 #1
In article <Xn******************************@24.71.223.159>, one of infinite monkeys
at the keyboard of Dave Patton <dp*****@remove-for-nospam.confluence.org> wrote:
In 4 hours, it had checked 96,146 of 112,138 links, but
I know there are many more to check.


96000 links in 4 hours is nearly 7 links per second, and if it's
consuming CPU without internet activity that's all hits to your
own server! That seems to imply an extremely ill-behaved robot.
Of course if it's only your server that's getting hit then it's
your business, but your question suggests it is a problem.
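
For reference, the "nearly 7 links per second" figure can be checked from the numbers in the original post. This is only a quick arithmetic sketch; the variable names are illustrative.

```python
# Figures taken from the original post: 96,146 links checked in 4 hours.
links_checked = 96_146
elapsed_seconds = 4 * 60 * 60  # 4 hours expressed in seconds

# Average checking rate over the whole run.
rate = links_checked / elapsed_seconds
print(f"{rate:.2f} links per second")  # prints "6.68 links per second"
```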

You'd probably be better off using a well-behaved robot like Site Valet.
It'll take longer (by definition) even if you configure a short
revisit-site time, but the system load is negligible even when it's
running at many[1] hits per second (all to different servers so as
not to expose any one server to rapid-fire, of course).

[1] I can't tell you an upper limit to "many", it's bandwidth-limited.

--
Nick Kew
Jul 20 '05 #2
Nick Kew wrote:
That seems to imply an extremely ill-behaved robot.
Of course if it's only your server that's getting hit then it's
your business, but your question suggests it is a problem.

You'd probably be better off using a well-behaved robot like Site Valet.


You're jumping to an inappropriate conclusion; there is no indication
that Xenu is anything less than a well-behaved robot. The OP's
description indicates that the problem is local to the OP, as the
author's system does not exhibit this behaviour.

I've been using Xenu for years and it's not only well behaved, it's also
lightning fast and free (is Site Valet free?).

--
Spartanicus
Jul 20 '05 #3
in post <news:Xn******************************@24.71.223.159>
Dave Patton said:
Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site, in that
on my PC it will, for extended periods of time, consume
100% of the CPU cycles (usually with no internet activity).


i can't remember the version (it was years ago) but i used to check a
site locally made up of 15k+ pages with about 300k internal links (no
external). everything else had to be shut down and it took about 7 hours
to complete. it often appeared there was no activity and that the
computer had crashed. it was on a spare computer so it didn't really
matter.

--
brucie
19/December/2003 07:01:08 pm kilo
Jul 20 '05 #4
Dave Patton wrote:
Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site, in that
on my PC it will, for extended periods of time, consume
100% of the CPU cycles (usually with no internet activity).


I used to have a problem like that with an earlier version (1.2c IIRC; I
use 1.2d currently): it opened and parsed multimedia files (local drive
scan). I brought this to the author's attention and it was fixed in the
subsequently released 1.2d version.

--
Spartanicus
Jul 20 '05 #5
Spartanicus <me@privacy.net> writes:
Nick Kew wrote:
That seems to imply an extremely ill-behaved robot.
Of course if it's only your server that's getting hit then it's
your business, but your question suggests it is a problem.

You'd probably be better off using a well-behaved robot like Site Valet.


You're jumping to an inappropriate conclusion; there is no indication
that Xenu is anything less than a well-behaved robot. The OP's


I've seen a Xenu (a recent one, at that) hit a server with hundreds of
consecutive page requests, with no delay between each hit. It's not
particularly well-behaved, though at least it didn't parallelise.
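
A minimal sketch of what "a delay between each hit" could look like for a sequential (non-parallel) checker. This is not Xenu's code or one of its options; `crawl_politely`, `fetch`, and `delay_seconds` are hypothetical names chosen for illustration, and `fetch` stands in for whatever function actually requests a URL.

```python
import time


def crawl_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between consecutive requests.

    fetch is any callable taking a URL and returning a result (hypothetical).
    sleep is injectable so the pause can be replaced or recorded in tests.
    """
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With `delay_seconds=1.0`, a run of several hundred page requests is spread over several minutes instead of arriving back to back.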

--
Chris
Jul 20 '05 #6
ni**@fenris.webthing.com (Nick Kew) wrote in
news:c1***********@jarl.webthing.com:
In article <Xn******************************@24.71.223.159>, one of
infinite monkeys
at the keyboard of Dave Patton
<dp*****@remove-for-nospam.confluence.org> wrote:
In 4 hours, it had checked 96,146 of 112,138 links, but
I know there are many more to check.


96000 links in 4 hours is nearly 7 links per second, and if it's
consuming CPU without internet activity that's all hits to your
own server!


Maybe I didn't explain things properly. Xenu is running on my PC.
When there is no internet activity, it isn't using the internet to
check our website. Your comment would seem to indicate you thought
Xenu was running on the same platform as the webserver.

--
Dave Patton
Canadian Coordinator, the Degree Confluence Project
http://www.confluence.org dpatton at confluence dot org
My website: http://members.shaw.ca/davepatton/
Vancouver/Whistler - host of the 2010 Winter Olympics
Jul 20 '05 #7
On Fri, 19 Dec 2003 19:17:06 +1000, brucie <sh**@bruciesusenetshit.info>
wrote in <br************@ID-117621.news.uni-berlin.de>:
in post <news:Xn******************************@24.71.223.159>
Dave Patton said:
Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site, in that
on my PC it will, for extended periods of time, consume
100% of the CPU cycles (usually with no internet activity).


i can't remember the version (it was years ago) but i used to check a
site locally made up of 15k+ pages with about 300k internal links (no
external). everything else had to be shut down and it took about 7 hours
to complete. it often appeared there was no activity and that the
computer had crashed. it was on a spare computer so it didn't really
matter.


I made a speed improvement in 2001 (starting with 1.2a) by adding a hash
table, so that new links no longer needed to be looked up sequentially
in the URL table.
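
The difference between a sequential lookup and a hash lookup can be sketched as follows. This is an illustration of the general technique only, not Xenu's actual code; `add_link_sequential` and `UrlTable` are hypothetical names.

```python
def add_link_sequential(url_table, url):
    """Return True if url is new. Scans the whole list, so each call is O(n)."""
    for known in url_table:
        if known == url:
            return False
    url_table.append(url)
    return True


class UrlTable:
    """URL list plus a hash index, so membership checks are O(1) on average."""

    def __init__(self):
        self.urls = []    # URLs in the order they were discovered
        self.index = {}   # url -> position in self.urls

    def add(self, url):
        """Return True if url is new, False if it is already in the table."""
        if url in self.index:
            return False
        self.index[url] = len(self.urls)
        self.urls.append(url)
        return True
```

With a few hundred thousand links, the sequential version compares each new link against every known URL, while the indexed version does a single dictionary lookup per link.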

Tilman
Jul 20 '05 #8
Spartanicus <me@privacy.net> wrote in
news:5i********************************@news.spartanicus.utvinternet.ie:
Dave Patton wrote:
Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site, in that
on my PC it will, for extended periods of time, consume
100% of the CPU cycles (usually with no internet activity).


I used to have a problem like that with an earlier version (1.2c IIRC; I
use 1.2d currently): it opened and parsed multimedia files (local drive
scan). I brought this to the author's attention and it was fixed in the
subsequently released 1.2d version.


I've been in touch with the author, Tilman, who has been quite helpful.
He has made some suggestions, and also said that he can confirm
the behaviour I mention.
I'm going to try his suggestions, and I've also made some suggestions
for some enhancements to Xenu Link Sleuth, although whether they are
good ideas, or get implemented, is up to Tilman :-)

--
Dave Patton
Canadian Coordinator, the Degree Confluence Project
http://www.confluence.org dpatton at confluence dot org
My website: http://members.shaw.ca/davepatton/
Vancouver/Whistler - host of the 2010 Winter Olympics
Jul 20 '05 #9
On Fri, 19 Dec 2003 02:44:34 GMT, Dave Patton
<dp*****@remove-for-nospam.confluence.org> wrote in
<Xn******************************@24.71.223.159>:
Does anyone have any experience running Xenu Link Sleuth:
http://home.snafu.de/tilman/xenulink.html
version 1.2e on very large sites?

I'm having problems running it against our site, in that
on my PC it will, for extended periods of time, consume
100% of the CPU cycles (usually with no internet activity).

I've been in touch with the program's author, but this
problem doesn't happen on his system.

Because of the 100% CPU utilisation, the program is of
course much slower than it should be, and I just terminated
the program after trying it again.
In 4 hours, it had checked 96,146 of 112,138 links, but
I know there are many more to check.


(To others, since I already told Dave about this)

I was able to find out in the meantime why Xenu would go "100%" with no
internet activity for some time: your site has several pages with
several thousand links each. An example is this page:
http://www.confluence.org/showworld....w=true&scale=1
Xenu then needs 1-2 minutes to process all these links, i.e. look them
up, find out if they are new or not, and add them to the correct
location. In the meantime, background threads terminate normally but no
new threads are created.

The solution would be to exclude certain pages that have only
"automatic" links, i.e. links calculated by the server.
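
The exclusion idea can be sketched as a simple prefix filter applied before links are queued. This is an illustration only and does not describe how Xenu stores or applies its exclusions; the prefixes, `should_check`, and `filter_links` are hypothetical.

```python
# Hypothetical prefixes for pages whose links are generated by the server.
EXCLUDE_PREFIXES = (
    "http://www.example.org/generated-map",
    "http://www.example.org/forum/",
)


def should_check(url, exclude_prefixes=EXCLUDE_PREFIXES):
    """Return False for URLs that start with any excluded prefix."""
    return not url.startswith(tuple(exclude_prefixes))


def filter_links(links, exclude_prefixes=EXCLUDE_PREFIXES):
    """Keep only the links that should still be queued for checking."""
    return [url for url in links if should_check(url, exclude_prefixes)]
```

Matching on a prefix rather than an exact URL means that every query-string variant of a generated page is skipped with one entry.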

I may add something to the FAQ... another problem with big websites is
people who insist on making a site map. This takes forever, especially
if the website has a forum. While I haven't investigated this fully, it
seems that only the first report option works fast enough.

Tilman
Jul 20 '05 #10
