Dupe content for the search engines with the permalink

bibosk8 · December 2010

First, sorry for my English.

Currently, the "permalink" link on every post in Vanilla 2 creates duplicate content for the search engines.

For example:
This is the normal url
http://vanillaforums.org/discussion/13863/vanilla-2.0.16-released

And this is the url of the permalink:
http://vanillaforums.org/discussion/13863/vanilla-2.0.16-released/p1

That is duplicate content. The best fix would be to change the permalink URL to the first one, or to add rel=nofollow.

Thanks!

judgej · December 2010

Have a look at the source of the first page, and you will find this element in it:

<link rel="canonical" 
href="http://vanillaforums.org/discussion/13863/vanilla-2.0.16-released/p1" />

That tells the search engines that the first URL is just another view of the second URL you have listed. The search engines know they are the same page, and so do not treat them as duplicated content. Only the second URL will be indexed.
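If you want to check what canonical URL a page declares without reading the source by hand, something like the following rough sketch would do it. It only uses Python's standard library; the `CanonicalFinder` class and `find_canonical` function are names made up for this example and are not part of Vanilla.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Sample markup built from the element quoted above.
sample = '''<html><head>
<link rel="canonical"
href="http://vanillaforums.org/discussion/13863/vanilla-2.0.16-released/p1" />
</head><body></body></html>'''

print(find_canonical(sample))
```

Run against either of the two URLs, a check like this should report the same canonical address, which is how the search engines know to fold them together.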

bibosk8 · December 2010

OK, thanks @judgej.
But I don't think this method is good.

Where can I change this permalink to the original URL?

Thanks!

garymardell · December 2010

The tag is supported by all major search engines. Google does describe it as "a hint that we honor strongly". They call it a hint because they check that the content is the same or roughly the same (minor differences are allowed); otherwise the tag could be abused.

judgej · January 2011

@bibosk8 what are the issues you have with this? The same content appearing in many pages, and the canonical tag pointing to the source page, is pretty much the standard way of handling content these days, and it works very well in my experience.

I use it, for example, in jobs sites, where the same job will appear in many pages, under many categories and index pages, but ultimately all those pages point to one source page or "master page" where the job can be read and indexed. Google honours the canonical tag as expected, and only those main pages appear in Google's index. Putting the canonical links into the sitemap file (e.g. http://www.iema.net/jobs/sitemap.xml) also helps direct the search engines.

The use of rel="nofollow", I think, is not appropriate in this case. The nofollow attribute is really there to say, "hey search engines, whatever this links to has no association with this site".

chuckD · January 2011

@judgej

I think I may be confused. Please correct me if I'm wrong. Are you saying that links in Vanilla using the standard default theme will have P1 indexed by search engines? I would prefer that links without P# at the end be indexed.

judgej · January 2011

Yes, I believe that is what would happen. Whether the permalink (i.e. the canonical page URL) for page 1, with its page number suffix (.../p1), is the desired URL for indexing is, I guess, another issue. Should we index .../p1, .../p2, .../p3 etc. or just lose "p1" and retain p# where # > 1? I don't know.

One thing is for certain, when a discussion is more than one page long, then it will have multiple pages (all different), and it does make sense for the first page to have a slightly simpler URL for neatness.

Also, if there is no other link to the "p1" page, then the search engines may not actually be able to find it. Just because it is defined in the canonical tag, it does not mean the search engines will go there. I'd personally be inclined to drop the page number for page 1 every time, no matter where it is referenced.

chuckD · January 2011

I believe the best choice would be to leave out p# (even # > 1) since they are all duplicates of the original.

this original
http://www.example.com/this-is-the-one.html

and not
http://www.example.com/this-is-the-one.html/p1
http://www.example.com/this-is-the-one.html/p2/#another-referece-123

Hopefully someone can clarify for the default theme included with 2.0.16

judgej · January 2011

The p# is the page number, so they do not contain the same content. p1 may have comments 1 to 20, and p2 would have comments 21 to 40, and so on. To get a whole discussion into a search engine, all the pages would need to be indexed.

p1 and p2 may be identical on this forum, because the page length seems to be set to some enormous value. This discussion, for example, has 70 comments, and it is still all on one page.

I agree that "p1" is a little redundant, and so could always be left out (i.e. assumed) but the remaining pages would need to be available for indexing under their own unique URLs.
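As a rough sketch of that rule (this is not Vanilla's actual code, and the function names and the 20-per-page default are just assumptions for illustration), the page number can be worked out from the comment number, and the suffix left off for page 1 only:

```python
def page_for_comment(comment_number, per_page=20):
    """Return the 1-based page holding a 1-based comment number.

    per_page=20 is an assumed page length, matching the example above.
    """
    if comment_number < 1 or per_page < 1:
        raise ValueError("comment_number and per_page must be positive")
    return (comment_number - 1) // per_page + 1


def discussion_url(base_url, page=1):
    """Build a discussion URL, omitting the suffix for page 1."""
    if page < 1:
        raise ValueError("page must be positive")
    base_url = base_url.rstrip("/")
    return base_url if page == 1 else f"{base_url}/p{page}"


base = "http://vanillaforums.org/discussion/13863/vanilla-2.0.16-released"
print(discussion_url(base, page_for_comment(5)))   # page 1, no suffix
print(discussion_url(base, page_for_comment(21)))  # page 2, ends in /p2
```

With this, page 1 has exactly one URL, and every later page keeps its own unique URL for indexing.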

The targets (or fragments, i.e. #whatever) are probably irrelevant in this case, since targets are never sent to the server in a page request and search engines ignore them completely (at least, they do for now).
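To illustrate the point about fragments, Python's standard `urllib.parse.urldefrag` splits a URL into the part that is sent to the server and the fragment that stays in the browser. The URL used here is the comment link from this thread; the variable names are only for the example.

```python
from urllib.parse import urldefrag

url = "http://vanillaforums.org/discussion/comment/133428#Comment_133428"

# urldefrag returns the URL without its fragment, plus the fragment itself.
without_fragment, fragment = urldefrag(url)

print(without_fragment)  # the part that goes to the server in the request
print(fragment)          # the part the browser keeps for scrolling to the target
```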

Just out of interest, the canonical for this page is:

http://vanillaforums.org/discussion/14064/dupe-content-for-the-search-engines-with-the-permalink/p1

and that applies even if you are viewing a specific comment, for example, this comment:

http://vanillaforums.org/discussion/comment/133428#Comment_133428

judgej · January 2011

Also related to this is the discussion index here in this forum. The front page of the index is at:

http://vanillaforums.org/discussions

which is great. But click on page 1 of the pager at the bottom of the screen, and you get this:

http://vanillaforums.org/discussions/p1

which has a redundant page number on the end. This is a danger, in my experience, with piecing together URLs all over the place. Different bits of code and plugins apply slightly different rules when constructing URLs, and they tend to fall out of step. It means that no matter how many places you clean up the "p1" suffixes, something somewhere will be overlooked and will create them, so they are always there to be contended with by the search engines.

An approach I've used successfully on other CMSs is to pass all the URL data into a single core function, and have the URLs constructed there. Plugins and modules can then offer services to that central function or method to apply rules in constructing the URL path. Any parameters that don't fall into a rule then just get added as named GET parameters.
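A minimal sketch of that idea follows. It is not taken from Vanilla or any particular CMS; `register_rule`, `build_url`, and the two example rules are hypothetical names invented for illustration. Each rule consumes the parameters it understands and returns path segments, and whatever is left over becomes named GET parameters.

```python
from urllib.parse import urlencode

# Hypothetical registry of URL rules, applied in registration order.
_rules = []


def register_rule(rule):
    """Register a rule: a function that pops the params it handles
    and returns a list of path segments."""
    _rules.append(rule)
    return rule


def build_url(base, params):
    """The single place where URLs are constructed."""
    remaining = dict(params)  # copy, so rules can consume entries
    segments = []
    for rule in _rules:
        segments.extend(rule(remaining))
    path = "/".join([base.rstrip("/")] + segments)
    # Anything no rule claimed is appended as a named GET parameter.
    query = urlencode(sorted(remaining.items()))
    return f"{path}?{query}" if query else path


@register_rule
def discussion_rule(params):
    if "discussion_id" not in params:
        return []
    segments = ["discussion", str(params.pop("discussion_id"))]
    slug = params.pop("slug", None)
    if slug:
        segments.append(slug)
    return segments


@register_rule
def page_rule(params):
    # The "no p1 suffix" policy lives here and nowhere else.
    page = int(params.pop("page", 1))
    return [f"p{page}"] if page > 1 else []


print(build_url("http://example.com",
                {"discussion_id": 13863, "slug": "vanilla-2.0.16-released", "page": 1}))
print(build_url("http://example.com",
                {"discussion_id": 13863, "slug": "vanilla-2.0.16-released",
                 "page": 3, "sort": "new"}))
```

Because the page rule is defined once, a pager or plugin that calls `build_url` cannot reintroduce the redundant "p1" by accident.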

Idan · February 2012

I have a problem with canonical URLs.

I put the URL http://mydomain.com/categories/features in the browser and check the canonical tag, and it shows 'http://mydomain.com/categories', without the category name on the end. Is there any reason why? It happens for all categories. I am using the latest version of Vanilla.
