
Robots.txt disallow entries for Vanilla forums

Hello community,

I have searched here for disallow entries to put in my robots.txt for Vanilla. I have found nothing, but I'm sure you must have something, as this is essential nowadays to be properly indexed by search engines.

I noticed Google indexed one of my Vanilla pages that has this in the URL: action=search

I will disallow it:

Disallow: *action=search*

Does anyone know which other entries I must disallow?

I appreciate your input on this.
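
The rule above relies on wildcard matching, which is an extension that some crawlers support rather than something every robots.txt parser understands. As a rough sketch of how such a pattern might be interpreted, assuming `*` matches any run of characters and a trailing `$` anchors the end of the URL (the helper names and the example paths are hypothetical, not taken from any crawler's code):

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a Disallow path with wildcards into a regular expression.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is matched literally.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def is_blocked(rule_path: str, url_path: str) -> bool:
    # re.match only anchors at the start, which gives prefix matching.
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    # Hypothetical paths, used only to illustrate the matching.
    for path in ("/forum/?action=search&q=x", "/forum/discussions/"):
        print(path, is_blocked("*action=search*", path))
```

A crawler that does not implement the extension would treat the `*` characters literally, so the same rule would match nothing useful there.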

Comments

  • I would also exclude the login pages, settings, and the "start a new discussion" pages, since they don't offer any unique content. Note that wildcards in Disallow lines are not specified in the original robots.txt standard and may not be supported by all search engines, in which case this would be more appropriate:
    Disallow: /forum/search.php
    Disallow: /forum/people.php
    Disallow: /forum/post.php
    Disallow: /forum/settings.php
    Of course, if you are using Friendly URLs, the format should be different:
    Disallow: /forum/search/
    Disallow: /forum/people/
    Disallow: /forum/post/
    Disallow: /forum/settings/
    Disallow: /forum/discussions/
    I've added /discussions/ to the Friendly URLs example because example.com/forum/discussions/ and example.com/forum/ point to the same content, which may cause a duplicate content or canonical URL issue.

    I'm also not entirely convinced having forum search results appear in a search engine is all that bad.
  • Hello WallPhone,
    Thank you for your input. Yes, I'm using Friendly URLs, so I'll go with the second one. I have some questions:

    1) Does this work if my forum is on a subdomain?
    Disallow: /forum/search/
    Disallow: /forum/people/
    Disallow: /forum/post/
    Disallow: /forum/settings/
    Disallow: /forum/discussions/

    2) Does the code above cover the login pages and the "start a new discussion" pages?

    3) What about View=ParticipatedThreads and other PHP actions?
  • Robots.txt works on the domain level, so if you are using a subdomain, you need a dedicated robots.txt for that subdomain.

    i.e. if your Vanilla is installed at forum.example.com, your robots.txt should be at forum.example.com/robots.txt and the contents should be something like:
    User-agent: *
    Disallow: /search/
    Disallow: /people/
    Disallow: /post/
    Disallow: /settings/
    Disallow: /discussions/

    Those other actions shouldn't matter as robots won't see the links unless someone happened to post one in a comment. If they did, most of them only work for people signed in anyway. Search engine bots don't usually sign in with accounts.

    robotstxt.org is the main reference about the robots.txt conventions. There are some other checker utilities around that you can run on it also.
  • Thank you for the explanation, WallPhone. Yes, I get what you mean. So I only need to worry about the links that appear in the forum's guest view, since robots act like guests. I will look at which links a guest sees.
  • Hello, I have the Friendly URLs add-on installed and I have put the robots.txt inside the folder of my forum because it is a subdomain, but Google is still indexing things like:
    myforum.domain.com/search.php
    myforum.domain.com/account.php?u=1
    My robots.txt is like this:
    User-agent: *
    Disallow: /ajax/
    Disallow: /appg/
    Disallow: /cgi-bin/
    Disallow: /conf/
    Disallow: /extensions/
    Disallow: /js/
    Disallow: /languages/
    Disallow: /library/
    Disallow: /setup/
    Disallow: /themes/
    Disallow: /discussions/
    Disallow: /1/
    Disallow: /people/
    Disallow: /post/
    Disallow: /account/
    Disallow: /settings.php
    Disallow: /termsofservice.php
    Disallow: /gpl.txt
    Disallow: /readme.html
    Disallow: /index.php?topic=*
    Disallow: /categories.php*
    Disallow: /people.php*
    Disallow: /?CategoryID=*
    Disallow: /comments.php?*
    Is something wrong? I appreciate your help.
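
One way to sanity-check a rule set like the ones in this thread is to run it through a parser before waiting for a search engine to recrawl. The sketch below uses Python's standard `urllib.robotparser`, which does plain prefix matching and is not what Google or any other search engine runs, so treat the result as a hint only. `RULES` is a wildcard-free subset of the rules posted in the last comment, and the helper names `robots_url_for` and `allowed` are made up for this example. Under these assumptions, neither /search.php nor /account.php?u=1 starts with any of the listed prefixes (the rules name /account/, not /account.php), which may be why those pages are still being picked up.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the host serving page_url.

    Each host (including each subdomain) has its own robots.txt at the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Wildcard-free subset of the rules posted in the thread.
RULES = """\
User-agent: *
Disallow: /discussions/
Disallow: /people/
Disallow: /post/
Disallow: /account/
Disallow: /settings.php
"""


def allowed(path: str, rules: str = RULES) -> bool:
    """Report whether a generic crawler may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    print(robots_url_for("http://forum.example.com/discussions/"))
    for path in ("/search.php", "/account.php?u=1", "/account/", "/settings.php"):
        print(path, "allowed" if allowed(path) else "blocked")
```

If this check reports a URL as allowed, adding a Disallow line whose path is a prefix of that URL (for example /search.php) is the usual fix.
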
This discussion has been closed.