LinkChecker

Frequently asked questions

Q: LinkChecker produced an error, but my web page is ok with Mozilla/IE/Opera/... Is this a bug in LinkChecker?

A: Please check your web pages first. Are they really ok? Use the --check-html option, or check if you are using a proxy which produces the error.

Q: I still get an error, but the page is definitely ok.

A: Some servers deny access to automated tools (also called robots) like LinkChecker. This is not a bug in LinkChecker but rather a policy set by the webmaster running the website you are checking. Look at the site's /robots.txt file, which follows the robots.txt exclusion standard.
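
To see whether a robots.txt file is the reason for a denied URL, you can evaluate its rules yourself. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the user agent name "LinkChecker" are assumptions made for illustration. Use the real file of the site you are checking and the user agent your LinkChecker version actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace it with the real file of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "LinkChecker" as the user agent name is an assumption for this sketch.
print(is_allowed(ROBOTS_TXT, "LinkChecker", "http://www.example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "LinkChecker", "http://www.example.com/private/page.html"))
```

With these sample rules the first URL is allowed and the second is denied.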

Q: How can I tell LinkChecker which proxy to use?

A: LinkChecker works transparently with proxies. In a Unix or Windows environment, set the http_proxy, https_proxy, and ftp_proxy environment variables to a URL that identifies the proxy server before starting LinkChecker. For example:

$ http_proxy="http://www.someproxy.com:3128"
$ export http_proxy
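
If you are unsure whether the variable is visible to programs started from your shell, a quick check from Python can help. This sketch uses the standard library function urllib.request.getproxies_environment, which reads the same *_proxy variables. Whether LinkChecker calls this function internally is not stated here, so treat it only as a way to inspect the environment. The proxy URL is the example value from above.

```python
import os
from urllib.request import getproxies_environment


def proxy_for(scheme):
    """Return the proxy URL configured in the environment for a scheme, or None."""
    return getproxies_environment().get(scheme)


# Set here only for demonstration; normally the shell has already exported it.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"
print(proxy_for("http"))
```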

Q: The link “mailto:john@company.com?subject=Hello John” is reported as an error.

A: You have to quote special characters (e.g. spaces) in the subject field. The correct link is “mailto:...?subject=Hello%20John”. Unfortunately, browsers like IE and Netscape do not enforce this.
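
If you generate such links from a script, the quoting can be done automatically. Here is a small sketch using urllib.parse.quote from the Python standard library. The helper name mailto_link is made up for this example and is not part of LinkChecker.

```python
from urllib.parse import quote


def mailto_link(address, subject):
    """Build a mailto: URL with a percent-encoded subject field."""
    # safe="" makes quote() encode every reserved character, including "/".
    return "mailto:{}?subject={}".format(address, quote(subject, safe=""))


print(mailto_link("john@company.com", "Hello John"))
# prints: mailto:john@company.com?subject=Hello%20John
```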

Q: Does LinkChecker support JavaScript?

A: No, and it never will. If your page does not work without JavaScript, it is better checked with a browser testing tool like Selenium.

Q: Is LinkChecker's cookie feature insecure?

A: If a cookie file is specified, the information will be sent to the specified hosts. The following restrictions apply for LinkChecker cookies:

  • Cookies will only be sent to the originating server.
  • Cookies are only stored in memory. After LinkChecker finishes, they are lost.
  • The cookie feature is disabled by default.

Q: I see LinkChecker gets a /robots.txt file for every site it checks. What is that about?

A: LinkChecker follows the robots.txt exclusion standard. To avoid misuse of LinkChecker, you cannot turn this feature off. See the Web Robot pages and the Spidering report for more info.

Q: How do I print unreachable/dead documents of my website with LinkChecker?

A: No can do. This would require file system access to your web repository and access to your web server configuration.
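
If you do have file system access, the comparison can be done outside of LinkChecker. The sketch below assumes the simplest possible setup: every URL path maps directly to a file below one document root, with no aliases or rewrites in the server configuration. The function name and the set of linked paths are hypothetical. You would have to collect the linked paths yourself, for example from a LinkChecker log.

```python
from pathlib import Path


def find_unlinked_files(document_root, linked_paths):
    """Return files below document_root that no checked link points to.

    linked_paths holds paths relative to the document root, such as
    "index.html" or "docs/faq.html".
    """
    root = Path(document_root)
    on_disk = {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    }
    return sorted(on_disk - set(linked_paths))
```

Calling find_unlinked_files("/var/www/html", {"index.html"}) would list every file under that directory except index.html. The directory name is only an example.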

Q: How do I check HTML/XML/CSS syntax with LinkChecker?

A: Use the --check-html and --check-css options.

Q: I want to have my own logging class. How can I use it in LinkChecker?

A: A Python API lets you define new logging classes. Define your own logging class as a subclass of StandardLogger or any other logging class in the log module. Then call the logger_add method of configuration.Configuration to register your new logger. After this, append a new instance of it to the fileoutput list:

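import linkcheck
import MyLogger  # your own module that defines the MyLogger class
# Name under which the new logger class is registered
log_format = 'mylog'
log_args = {'fileoutput': log_format, 'filename': 'foo.txt'}
cfg = linkcheck.configuration.Configuration()
# Register the class, then add an instance to the file outputs
cfg.logger_add(log_format, MyLogger.MyLogger)
cfg['fileoutput'].append(cfg.logger_new(log_format, log_args))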