The simplest form of a Google Sitemap is just a plain-text list of URLs (which also works with Yahoo!). You can also tweak the syndication files to serve as one, or just follow the full Sitemap XML protocol.
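For reference, a minimal sketch of the plain-text variant: one absolute URL per line and nothing else. The function name buildTextSitemap and the example.com URLs are made up for illustration and are not part of Vanilla.

```php
<?php
// Hypothetical helper: turn a list of URLs into a plain-text sitemap body.
function buildTextSitemap($urls) {
    // Trim whitespace and drop empty entries so every line holds exactly one URL
    $clean = array_filter(array_map('trim', $urls), 'strlen');
    return implode("\n", $clean) . "\n";
}

// Usage with placeholder URLs
echo buildTextSitemap(array(
    'http://example.com/forum/comments.php?DiscussionID=1',
    'http://example.com/forum/comments.php?DiscussionID=2',
));
```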
If someone helps with the coding I'm willing to help with the SEO adjustments.
Oh all right... I will do it. Just don't expect anything soon.
Took a peek at the sitemaps.xml spec. It should be posted at the lowest directory the user has access to, which might be a bit problematic.
We could just post it at the level Vanilla is installed in, but that leaves the rest of the site's pages out of the map. I suppose you would need some other tool to map that section.
I'm not sure about that... but couldn't we use mod_rewrite on the URL to make it work?
Also, I don't think it has to be in the lowest directory, because you can have sitemaps in various folders, according to Google's webmaster blog.
Hmm... I will check the blog, but there appears to be conflicting information about the location of the file. I guess you can have it in a folder, but all links in the sitemap must reference files that are also in that folder.
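The folder rule described above can be sketched as a simple prefix check. This is only an illustration of the rule as stated in this thread; isWithinSitemapScope and the example.com URLs are hypothetical.

```php
<?php
// Hypothetical helper: true when $url lives under the directory that holds the sitemap.
function isWithinSitemapScope($sitemapUrl, $url) {
    // Directory of the sitemap, including the trailing slash,
    // e.g. http://example.com/forum/sitemap.xml -> http://example.com/forum/
    $base = substr($sitemapUrl, 0, strrpos($sitemapUrl, '/') + 1);
    // The URL is in scope only if it starts with that directory prefix
    return strncmp($url, $base, strlen($base)) === 0;
}
```

With the sitemap at http://example.com/forum/sitemap.xml, a discussion URL under /forum/ passes the check, while http://example.com/index.php does not, which is why the rest of the site would need its own map.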
Don't want to mess with mod_rewrite if I don't have to, because of potential problems with it and an existing file, friendly URLs and such.
I will test first with a dynamic file and see if google likes it.
Whoops, that last sentence was funnier in my head than on screen. Sorry about that.
Anyways... I figure I would need a SQL query to get a list of the public discussion IDs, number of comments in each discussion, last active time, and the discussion topic.
Combine that with a loop that puts that information inside the sitemaps XML schema, and it would be golden. Of course, add some lines for the categories tab, and maybe another query and loop for user accounts.
Another item is the priority field in the sitemaps. The WordPress extension puts a higher priority on the blog posts with the most comments. I believe that is probably best, but stickies should also rank higher, while closed discussions and sink discussions should have the lowest priority.
Would be great if you could start on any of that... either post the code publicly or we can whisper it back and forth. Do you have a Vanilla install with friendly URLs installed? If not, I have one that we can use for testing, but it's not live yet.
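One way the priority idea could look in code. This is a rough sketch only: the weights, the 50-comment cap, and the function name discussionPriority are all assumptions, not anything Vanilla or the WordPress extension defines.

```php
<?php
// Hypothetical scoring; every number here is an assumption chosen for illustration.
function discussionPriority($commentCount, $isSticky, $isClosed, $isSink) {
    if ($isClosed || $isSink) {
        return 0.1; // closed and sink discussions get the lowest priority
    }
    // Scale from 0.3 up to 0.8 with the comment count, capped at 50 comments
    $priority = 0.3 + 0.5 * min($commentCount, 50) / 50;
    if ($isSticky) {
        $priority += 0.2; // stickies rank above ordinary discussions
    }
    // Sitemap priorities must stay within 0.0 .. 1.0
    return round(min($priority, 1.0), 1);
}
```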
This is what I have so far:
<?php
/*
Extension Name: Sitemaps
Extension Url: http://lussumo.com/docs/
Description: Generates a sitemaps.xml file of all discussion and account URLs on the forum
Version: 0.0
Author: WallPhone
Author Url: http://wallphone.com/
*/
if ( ($Context->SelfUrl == 'extension.php') && ($PostBackAction == 'sitemap.xml') ) {
// Send the XML content type before any output
header('Content-Type: text/xml; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';
// Get the discussions list
// Loc should probably be built by the URL builder--does it work automatically for friendly URLs?
// Lastmod is set to the last comment time stamp
// Changefreq? It's optional; probably leave it out for old discussions, and work out an average update rate or something for current discussions.
// Comment count plus comments per page should be used to determine if the discussion spans more than one page.
// Example schema:
// <url>
// <loc></loc>
// <lastmod>2005-01-01</lastmod>
// <changefreq>hourly</changefreq>
// <priority>0.8</priority>
// </url>
// Get the user accounts
// Close the sitemap
echo '</urlset>';
}
?>
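The note in the stub about discussions spanning more than one page comes down to a ceiling division. A small sketch, assuming the comment count and the comments-per-page setting are already available as integers; discussionPageCount is a made-up name.

```php
<?php
// Hypothetical helper: number of pages a discussion occupies.
function discussionPageCount($commentCount, $commentsPerPage) {
    if ($commentsPerPage < 1) {
        return 1; // guard against a bad setting and division by zero
    }
    // Round up, and treat an empty discussion as a single page
    return max(1, (int) ceil($commentCount / $commentsPerPage));
}
```

A discussion with 51 comments at 50 per page gives 2, so it would need a second entry in the sitemap if every page is to be listed.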
This should be very similar to the RSS2 extension by Mark. The code to look at is this one:
function ReturnSitemapData($Properties) {
return '<url>
<loc>'.$Properties["Link"].'</loc>
<lastmod>'.$Properties["Updated"].'</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
';
}
function FixDateForSitemap($Date = '') {
// Sitemaps expect W3C datetime (e.g. 2005-01-01), not the RFC 2822 format that 'r' produces
$DateFormat = 'Y-m-d';
if ($Date == '') {
return date($DateFormat, time());
} else {
return date($DateFormat, UnixTimestamp($Date));
}
}
$Sitemap = "";
$Properties = array();
while ($DataSet = $DiscussionGrid->Context->Database->GetRow($DiscussionGrid->DiscussionData)) {
$Properties["Link"] = GetUrl($DiscussionGrid->Context->Configuration, "comments.php", "", "DiscussionID", ForceInt($DataSet["DiscussionID"], 0));
$Properties["Updated"] = FixDateForSitemap(@$DataSet["DateLastActive"]);
$Sitemap .= ReturnSitemapData($Properties);
}
// Set the content type to xml
header('Content-Type: text/xml; charset=UTF-8');
echo('<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.google.com/schemas/sitemap/0.84">');
// Dump the Sitemap
echo($Sitemap);
echo('</urlset>');
We'll worry about priority later on. Once this code works, we will do the rest. I haven't tried this code, so WallPhone, give it a shot.
For priority, the top 5 should be 1.0, the next 5 0.75, the next 5 0.5, the next 5 0.25, and the next 5 0.0.
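The tiers above map directly onto the position of a discussion in the list. A minimal sketch, assuming a zero-based rank where 0 is the most recently active discussion; tierPriority is a hypothetical name.

```php
<?php
// Hypothetical helper: priority from list position, in steps of 0.25 per block of five.
function tierPriority($rank) {
    $tier = (int) floor($rank / 5);        // 0 for ranks 0-4, 1 for ranks 5-9, ...
    return max(0.0, 1.0 - 0.25 * $tier);   // never drop below 0.0
}
```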
You only need to do the first page, so only 25-30 discussions will go in the sitemap. The rest will be taken care of by the change frequency, which makes sure that no discussion is missed because it fell off the front page between Google crawls. The date also has to be in the correct format.
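On the date format: the sitemap lastmod field uses W3C datetime. Here is a sketch of the full timestamp form, assuming the stored value is a MySQL-style 'YYYY-MM-DD HH:MM:SS' string kept in UTC (that is an assumption about the database, and w3cDateTime is a made-up name).

```php
<?php
// Hypothetical helper: MySQL-style datetime string (assumed UTC) to W3C datetime.
function w3cDateTime($mysqlDateTime) {
    // Parse explicitly as UTC so the result does not depend on the server timezone
    $timestamp = strtotime($mysqlDateTime . ' UTC');
    // \T and \Z are literal characters in the date format string
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}

echo w3cDateTime('2006-11-16 08:30:00'); // prints 2006-11-16T08:30:00Z
```

The date-only form (2006-11-16) is also valid W3C datetime, so it is enough if the time of day is not needed.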
Big news.
Google, Yahoo and Microsoft just agreed (last night) to use the Google Sitemaps protocol as a new standard. So, this extension just became pretty important for everyone.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License, which means that all web crawlers can now use the sitemaps protocol if they wish.
http://sitemaps.org
http://techcrunch.com/2006/11/15/google-yahoo-and-microsoft-agree-to-standard-sitemaps-protocol/
http://blog.searchenginewatch.com/blog/061116-000001
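For the extension, the visible difference with Sitemap 0.90 is the namespace on the urlset element, which moves to sitemaps.org. A sketch of the wrapper with the new namespace; wrapUrlset is a hypothetical name, and $entries is assumed to hold the already-built url elements.

```php
<?php
// Hypothetical helper: wrap prepared <url> entries in a Sitemap 0.90 urlset.
function wrapUrlset($entries) {
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
         . $entries
         . '</urlset>' . "\n";
}
```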
Comments
Does anybody know if we can return this file dynamically, (e.g. http://vanilla.com/extension.php?sitemap.xml) or if it must be a static file?
It would be good to see something similar as an addon for Vanilla.
Maybe I will get to it this weekend... or tomorrow night if I can't sleep.
Or maybe I should just ignore the community and spend the time I usually spend here on it!