Robots.txt in the Vanilla root is needed!
phreak
Hi all,
A robots.txt is needed in the Vanilla root! Should be delivered with the download.
It's a must, I think. Any counterarguments?
What should the Vanilla robots.txt contain?
Comments
Mmh, the CMSs I use usually block out their system files. That means they define a list of folders or files that are then "usually" not crawlable or indexable by search engines.
This can come in handy if there is a known security issue that's not yet fixed and script kiddies are searching for the related file to test out their latest milw0rm script.
Um, but that's just what I've heard; I'm not really a specialist regarding the use of robots.txt (sorry if my opening statement suggested that). I don't know if this is still a real concern.
Since there are only a handful of such unwanted pages, I would just add a robots.txt myself to the root folder of my installation.
Having said that, Vanilla could do a lot of other things for great SEO, e.g. title generation, meta tag generation based on page content, tags and category, canonical URLs to indicate duplicate content, etc.
Thanks.
@lincoln & LinusIndigo: That's not quite what I meant. I stated that server-related files can get indexed by search engines (of different kinds) and could probably be found with a simple search. That means if a script kiddie tries a script against 50 pages (from such a list) and your page is in the search results, it could be a security vulnerability.
This can happen for various reasons, for example if you accidentally have a folder and its files set to CHMOD 777 because you forgot to change it back. The robots.txt would just keep files in that folder from getting indexed.
Hi all, recently I applied for Google AdSense, but it was rejected because my site was judged incomplete. I drilled down into the possible reasons and found that Google has crawled content-less pages like tags, profiles, and discussion lists, and concluded that my site has incomplete or thin data compared to its number of pages.
One solution is to add a robots.txt to keep crawlers off these pages, but I fear that if I block the discussion page, it will also block the rest of the forum content, since the crawler treats discussions as a directory structure.
Can anyone help me formulate a robots.txt so that I don't end up blocking important content pages?
For your reference, here is a link to the site in case it's necessary: http://myhealthfellow.com
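For a default Vanilla install, a starting point might look like the sketch below. The paths are assumptions based on Vanilla's usual URL structure and should be checked against your own forum before use; note that the utility pages and the individual thread pages live under different paths, so blocking the former need not block the latter:

```
# Hypothetical robots.txt for a Vanilla forum -- verify each path against your install
User-agent: *
Disallow: /entry/        # sign-in / register pages
Disallow: /profile/      # member profiles (thin content)
Disallow: /search        # search result pages
Disallow: /messages/     # private messages
# Individual discussion threads are deliberately NOT blocked
```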
Hi, I have searched around and made a custom robots.txt specifically for Vanilla forums.
Please find the robots.txt file attached.
Yeah, there should absolutely be NO DEFAULT robots.txt. Why? Because:
1) Anyone can create one themselves, and
2) Everyone has different needs for search engines.
In my own case, I didn't really care one way or the other until I discovered that Microsoft's Bing and Yahoo's crawler were tearing my site apart in terms of page hits. I blocked Yahoo pretty easily; Bing took forever, because Bing's crawlers sometimes seem to totally ignore robots.txt (or it takes a long time for a specific server to acknowledge it). They suck and I want them to die in a lake.
I want my stuff to show up in Google, so I don't block anything from Google's crawlers (and Google is so unobtrusive compared to Bing it's crazy).
So I fear that people will go and take a generic robots.txt and block everything, rather than blocking only the problem engines...
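Blocking only the problem crawlers, rather than everything, can be done with per-bot sections in robots.txt. As a sketch: Slurp and bingbot are the user-agent tokens Yahoo and Bing document for their crawlers, and Crawl-delay is a non-standard directive that Bing honors but Google ignores:

```
# Block or throttle specific crawlers while leaving all others untouched
User-agent: Slurp        # Yahoo's crawler
Disallow: /

User-agent: bingbot      # Bing's crawler
Crawl-delay: 10          # non-standard; Bing honors it, Google does not

User-agent: *
Disallow:                # everyone else may crawl everything
```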
This is a good discussion to have so that people know the issues with search engines. If you look at the robots.txt Vanilla uses with its sitemaps plugin, we do the following:
We also put noindex/nofollow rel attributes and meta tags throughout the software to try to keep crawlers off certain pages that won't benefit your site.
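The mechanisms mentioned here are standard HTML; a page-level robots meta tag and a link-level rel attribute look like this (the link target is just an illustrative example, not a real Vanilla URL):

```html
<!-- Page level: ask crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Link level: don't pass ranking signals through this particular link -->
<a href="/profile/example-user" rel="nofollow">example-user</a>
```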
The one page where we are still seeing problems is the recent discussions list. Big sites can see real slowdowns when crawlers hit pages numbering in the thousands. Recently we've added a config parameter to limit the page count on recent discussions to a reasonable number, like five pages. Crawlers can still find the discussions through the individual categories, and those pages are much faster.
Limiting the page lists is the direction we'll be going with the software. I just need to come up with a good UX for browsing past that fifth page.
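In Vanilla, settings like this live in conf/config.php. The key name below is a placeholder, since the post doesn't name the actual parameter, but the shape of a Vanilla config entry looks like this:

```php
<?php
// conf/config.php -- hypothetical key name; check the release notes for the real one
$Configuration['Vanilla']['Discussions']['MaxRecentPages'] = 5;
```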
An alternative to blocking access to directories that contain program-only content is to put the program files outside the directory from which HTML is served, although that might complicate installation a bit.