Drupal 404 Error Handling Fix to Avoid Duplicate Content Issues

Drupal's 404 error handling has a bug: when a requested node does not exist, Drupal serves the front page instead of returning a 404 error. This happens for any path of the form node/*, such as node/33 when node 33 does not exist. I discovered the bug on one of my sites after deleting a batch of nodes, which sent the site into Google's supplemental results. The bug affects version 5.1 and earlier versions, and the patch on this page is for version 5.1.

When Google tried to reindex one of the deleted pages, it was silently served the site's home page instead and saw duplicate content. This is very bad, because Google (and, I suspect, other search engines) may conclude that you are spamming them. I applied a patch I found on the Drupal website, and it seems to have fixed the cause, but I'm not sure what to do about Google. I'm hoping it will recheck the missing pages and drop them from its index, rather than dropping my entire site.

I have patched all my sites and will apply the patch to any other sites I bring online. The patch is at http://drupal.org/files/issues/node404.5.0.patch_2.txt, and the discussion thread, which also has a patch for 4.7, is at http://drupal.org/node/90780

Index: modules/node/node.module
===================================================================
RCS file: /cvs/drupal/drupal/modules/node/node.module,v
retrieving revision 1.776
diff -u -p -r1.776 node.module
--- modules/node/node.module 14 Jan 2007 02:12:29 -0000 1.776
+++ modules/node/node.module 26 Jan 2007 03:31:37 -0000
@@ -2365,8 +2365,13 @@ function node_revisions() {
 /**
  * Menu callback; Generate a listing of promoted nodes.
  */
-function node_page_default() {
-
+function node_page_default($validateargs = true) {
+  if ($validateargs) {
+    if (arg(1) != '') {
+      drupal_not_found();
+      return;
+    }
+  }
   $result = pager_query(db_rewrite_sql('SELECT n.nid, n.sticky, n.created FROM {node} n WHERE n.promote = 1 AND n.status = 1 ORDER BY n.sticky DESC, n.created DESC'), variable_get('default_nodes_main', 10));
 
   if (db_num_rows($result)) {
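
The check the patch adds can be illustrated outside Drupal. The sketch below is a standalone PHP script (PHP 7 or later), not Drupal code: path_arg() is a hypothetical stand-in for Drupal's arg(), and listing_should_404() is a hypothetical name for the test the patch puts at the top of node_page_default(). It assumes, as the bug description implies, that a request like node/33 only falls through to the front-page listing when node 33 could not be loaded.

```php
<?php
// Hypothetical stand-in for Drupal's arg(): return the $index-th
// component of a path such as "node/33", or '' if there is none.
function path_arg(string $path, int $index): string
{
    $parts = explode('/', trim($path, '/'));
    return $parts[$index] ?? '';
}

// Mirrors the patch's test in node_page_default(): the front-page
// listing is only valid at the bare "node" path. Assumption: a request
// like "node/33" only reaches this callback when node 33 could not be
// loaded, so any extra path component should produce a 404.
function listing_should_404(string $path): bool
{
    return path_arg($path, 1) !== '';
}

foreach (['node', 'node/33', 'node/33/edit'] as $path) {
    echo $path, ' => ', listing_should_404($path) ? '404 Not Found' : 'front page listing', PHP_EOL;
}
```

The $validateargs parameter in the patch defaults to true, so the check runs unless a caller explicitly passes false.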

 

You can check the headers your server returns with the server header checker at http://www.seoconsultants.com/tools/headers.asp. If you request a missing node before applying the patch, you will get a ‘200 OK’ status; after patching, you will get ‘404 Not Found’. If you have just installed Drupal, I would recommend applying this patch.
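
If you would rather check from the command line than with an online tool, a minimal PHP sketch (PHP 7.1 or later) is below. It uses PHP's built-in get_headers(); the function name final_status_code(), the file name check404.php, and the example.com URL are placeholders of my own. Because get_headers() follows redirects by default and returns one status line per response, the sketch reports the last status line, which is the status of the page actually served.

```php
<?php
// Return the status code from the last "HTTP/x.y NNN ..." line in a
// header list as returned by get_headers(), or null if there is none.
// When redirects are followed there is one status line per response,
// so the last one belongs to the page that was finally served.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', (string) $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (placeholder URL): php check404.php http://example.com/node/33
// Only performs a request when a URL is given on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $headers = @get_headers($argv[1]);
    if ($headers === false) {
        fwrite(STDERR, "Request failed\n");
        exit(1);
    }
    $code = final_status_code($headers);
    echo $argv[1], ' => ', $code ?? 'no status line', PHP_EOL;
}
```

Pointed at a deleted node, an unpatched site should report 200 and a patched one 404, matching what the online checker shows.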
