Caveats And Tips Concerning Search Engine Friendly URLs
December 11, 2006 – 6:07 am
Search Engine Friendly (SEF) URLs are all the rage, and it's no surprise why: they help your pages get spidered and ranked in the popular search engines. With so many websites running PHP and ASP, it's natural that people want to rewrite their dynamic URLs into something more elegant.
I applaud your efforts to make your website more Search Engine Friendly, but here are a few “gotchas” that might get you in trouble:
1) The / trailing slash problem
2) The Case Insensitive URL problem
3) The 404 and Custom 404 issues
We’ll take each one of these in order:
1) The Trailing Slash problem
These two URLs are not the same:
1) mypage/my-directory
2) mypage/my-directory/
Yahoo, in particular, struggles with this issue. Make sure every directory-style URL ends in a /, and 301 redirect any request without the slash to the consolidated URL with it. Here's how, in your .htaccess file:
RewriteEngine On
RewriteRule ^/*(.+/)?([^.]*[^/])$ http://%{HTTP_HOST}/$1$2/ [L,R=301]
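To see which requests the trailing-slash rule would redirect, here is a small Python sketch that applies the same regular expression to a request path. It assumes the path arrives the way mod_rewrite sees it in an .htaccess file (no host, leading slash optional), and the function name trailing_slash_redirect and the default host example.com (standing in for %{HTTP_HOST}) are hypothetical. Python's re syntax matches Apache's for this particular pattern, but treat it as an illustration rather than a substitute for testing the rule on your own server.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule: an optional directory part ending in "/",
# then a final segment that contains no "." and does not end in "/".
TRAILING_SLASH_RULE = re.compile(r"^/*(.+/)?([^.]*[^/])$")


def trailing_slash_redirect(path: str, host: str = "example.com") -> Optional[str]:
    """Return the 301 target for `path`, or None if the rule would not fire.

    `host` is a hypothetical stand-in for %{HTTP_HOST}.
    """
    match = TRAILING_SLASH_RULE.match(path)
    if match is None:
        return None
    # An unmatched optional group is empty in Apache's $1; Python gives None.
    directory = match.group(1) or ""
    return f"http://{host}/{directory}{match.group(2)}/"


for p in ["mypage/my-directory", "mypage/my-directory/", "images/logo.png"]:
    print(p, "->", trailing_slash_redirect(p))
```

Only the first path is redirected: the second already ends in a slash, and the third has a dot in its last segment, so the rule leaves files such as images alone.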
2) The Case Insensitive URL problem and solution
Again, these two URLs are not the same:
1) my-directory/Society/People/
2) my-directory/society/people/
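Luckily, mod_rewrite comes to the rescue yet again. You can redirect any URL containing uppercase letters to its all-lowercase equivalent with the following code.
In httpd.conf (RewriteMap is not allowed in .htaccess files; it must go in the server or virtual host configuration), add this line: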
RewriteMap lowercased int:tolower
Then, in your .htaccess file, add the following, replacing directory.sootle.com with your own hostname:
RewriteRule ^([^A-Z]*[A-Z].*)$ http://directory.sootle.com/${lowercased:$1} [R=301,L]
Voila! Any URL that contains an uppercase letter now matches the rule, passes through the lowercased RewriteMap (which has one job: convert every letter to lower case), and is 301 redirected to the all-lowercase version, solving the issue.
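The lowercase redirect can be sketched the same way. This Python version reuses the RewriteRule's pattern, which only matches paths with at least one uppercase ASCII letter, and uses str.lower() in place of the int:tolower map. The function name lowercase_redirect is hypothetical, and its host parameter defaults to the hostname hard-coded in the rule above.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule: fires only when the path contains
# at least one uppercase ASCII letter.
HAS_UPPERCASE = re.compile(r"^([^A-Z]*[A-Z].*)$")


def lowercase_redirect(path: str, host: str = "directory.sootle.com") -> Optional[str]:
    """Return the 301 target for `path`, or None if it is already lowercase."""
    match = HAS_UPPERCASE.match(path)
    if match is None:
        return None
    # int:tolower lowercases the captured path; str.lower() does the same for ASCII.
    return f"http://{host}/{match.group(1).lower()}"


print(lowercase_redirect("my-directory/Society/People/"))
print(lowercase_redirect("my-directory/society/people/"))
```

If both rules are active, a URL such as My-Page (uppercase, no trailing slash) takes two 301 hops, one per rule. That is harmless for visitors, but worth knowing when you inspect your redirects.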
3) The 404 and Custom 404 issue
If you're using your own custom CMS to handle document requests, the first thing you need to do is check whether the requested page exists and, if it doesn't, serve visitors your 404 page. If your 404 page is a custom one, make sure it actually responds with a “404” status code and not a “200”. If in doubt, check the response headers your server sends: request the URL of a page that should be 404'd and confirm the status code is correct. Read this for more information on 404 pages that return 200 status codes.
Always return a “404” when the page no longer exists.
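Here is a minimal sketch of what this looks like in a hypothetical CMS front controller built on Python's standard http.server. The PAGES dictionary stands in for your database, and the page paths, error text and helper names (start_server, status_of) are illustrative assumptions. When a page is missing, the handler still serves a friendly custom error page, but with a 404 status line instead of 200. status_of shows one way to check the response code.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content store standing in for your CMS database.
PAGES = {
    "/my-directory/society/people/": "<h1>People</h1>",
}

CUSTOM_404 = "<h1>Sorry, we couldn't find that page.</h1>"


class CMSHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = PAGES.get(self.path)
        if body is None:
            # Page doesn't exist: serve the friendly custom page, but with a
            # real 404 status line rather than 200 (a "soft 404").
            status, body = 404, CUSTOM_404
        else:
            status = 200
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging for this demo


def start_server() -> HTTPServer:
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CMSHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def status_of(url: str) -> int:
    """Return the HTTP status code the server sends back for `url`."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is on the error.
        return err.code
```

Start the server with start_server(), then call status_of on an existing page and on a made-up URL under http://127.0.0.1:{server.server_port}. You should see 200 and 404 respectively. Because urlopen follows redirects, a CMS that redirects missing pages to an error page served with 200 would also show up here as 200.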
These three caveats are big ones. Hopefully this post will save you some serious time when you set out to make your pages more Search Engine Friendly.