Scaling, load balancing

Comments

  • I know/agree y2/Toiv *but* if the present method prevents caching of forum pages, would you prefer to:

    A. - Live without Whispers altogether
    B. - Live with Whispers on a Tab (private messages)
    C. - Live with *another* implementation of Whispers that looks the same but is implemented differently (e.g. via browser-level 'comment/whisper-interleaving')
    D. - Live with the current Whisper stuff and screw all those speed-freaks

    Of course I'm being facetious here... but I really believe the present Whisper implementation (comments
    marked as private) is sub-optimal... I *think* it limits page 'cache-ability' and is also at the root of several
    'positioning' errors of various extensions.

    Also note: initially I did *not* get the benefit of Whispers over Private messages. I think Whispers
    are one of the more difficult things to convey to Vanilla-'noobs' and experienced 'other forum'-users.

    In that context it may make sense to re-think the Whispers too.
  • C sounds good, when can we expect it? :)
  • Mark Vanilla Staff
    It may be a shock to Vanilla users ... but Databases are slow man. You wanna avoid them - esp on high traffic sites where load-balancing is required. In my case, I'd like to integrate Vanilla on a high-traffic site that already has an efficient client-side session mechanism. Retrieving session variables from the Vanilla database on every page (to enable a user access to other parts of the site) isn't acceptable.

    Too right! I've been telling people that about databases for a long time, but they just don't want to hear it. All of Vanilla's configuration settings are in php-editable files for precisely that reason: I don't want to have to pull those out of the database on every page load. I think you'll be hard-pressed to find any database driven forum that doesn't call the database for some data on every page - if it's not user related information, it will be general configuration settings.

    I made Vanilla pull the user's id and role information on every page load so that permissions are always up to date. If I ban you, you won't be able to see the forum the next time you load a page. Other forum packages have problems with things like this - you ban someone and they stay signed in until their session expires and they have to re-authenticate (either by re-signing in or by cookie).
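
The trade-off described above (one extra lookup per page load in exchange for bans that take effect immediately) can be sketched roughly as follows. This is a minimal illustration, not Vanilla's code: the `USERS` dictionary stands in for the user table, and the role names, `can_view_forum` and `ban` are all hypothetical.

```python
# Hypothetical in-memory stand-in for the user table; not Vanilla's schema.
USERS = {42: {"name": "alice", "role": "Member"}}

# Hypothetical mapping from role name to "may view the forum".
ROLE_CAN_VIEW = {"Member": True, "Banned": False}


def can_view_forum(user_id: int) -> bool:
    """Look the role up fresh on every request instead of trusting a cached session."""
    user = USERS.get(user_id)
    if user is None:
        return False
    return ROLE_CAN_VIEW.get(user["role"], False)


def ban(user_id: int) -> None:
    """Change the stored role; the very next permission check sees it."""
    USERS[user_id]["role"] = "Banned"
```

Because nothing about the role is kept in the session, calling `ban(42)` changes the result of the next `can_view_forum(42)` call without waiting for any session to expire.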
  • TomTester New
    edited February 2007
    @Stash:

    Speaking of non-database use... (and perhaps useful as a start for browser-based whisper interleaving) did anyone see this:

    http://simile.mit.edu/exhibit/
    Exhibit
    Exhibit is a lightweight structured data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and Javascript code.

    It's like Google Maps and Timeline, but for structured data normally published through database-backed web sites. Exhibit essentially removes the need for a database or a server side web application. Its Javascript-based engine makes it easy for everyone who has a little bit of knowledge of HTML and small data sets to share them with the world and let people easily interact with them.
    [...]

    How Does Exhibit Work and Why Use It?
    Exhibit consists of a bunch of Javascript files that you include in your web page. At load time, this Javascript code reads in one or more JSON data files that you link from within your web page and constructs a database implemented in Javascript right inside the browser of whoever visits your web page. It then dynamically re-constructs the web page as the visitor sorts and filters through the data. As the visitor interacts with the web page, only the web browser is responsible for providing the interaction; the web server is no longer needed.

    So, where's the database, again? The data is stored in JSON files, and the database is implemented in Javascript and running inside the web browser.

    The advantages of Exhibit are as follows:

    * No traditional database technology involved even though Exhibit-embedding web pages appear as if they are backed by databases. So you don't have to design any database, configure it, and maintain it. After all, if you only have a few dozens of things to publish rather than thousands, why would you spend so much effort in dealing with database technologies?
    * No server-side code required even though Exhibit-embedding web pages are heavily templated. So, there is no need to learn ASP, PHP, JSP, CGI, Velocity, etc. There is no need to worry which server-side scripting technology your hosting provider supports.
    * No need for web server if you only want to create exhibits and keep them on your own computer for your own use. They work straight from the file system.
    [...]
  • Certainly sounds interesting. So this basically puts the strain on the end user rather than the server? I wonder how this affects mobile devices?
  • Hmmmm, you've got a point there: of course this won't work on mobile devices...
    (at least not the CURRENT mobile devices).

    Case in point:
    My blackberry 8700g chokes and ends up with an 'uncaught exception' (i.e. nobody
    would even dare call this choking gracefully).

    Of course where there's problems, there's solutions... Theoretically you can switch
    page generation to the server for specific browser IDs etc. Perhaps people on PDAs
    can also live without Whispers ;-)
  • (PS) Of course I *never* even tried Vanilla on my BlackBerry... Color me impressed,
    it's darn fast *and* pretty. The only problem I see is a repeat of the USER ICON
    causing a whole line of 'stashes'
  • We are evaluating Vanilla for our environment. Since we have made it policy not to allow MySQL searches anymore (they don't scale and they kill the database), we would integrate Vanilla with Solr (Lucene). If we choose Vanilla for our new forums we would contribute the code. But don't hold your breath; first we need to integrate Vanilla with our user/login system. Are there tutorials for that, by the way, besides the one for integrating Vanilla with Wordpress? Of interest to us would be: a) how to use different login credentials and b) how to munge Vanilla with external data. Thanks
  • What kind of performance are others getting out of Vanilla? When you're talking large-medium-small scale, what sizes/levels of activity are you actually talking? It would be nice to get at least a ballpark estimate of the volume of traffic, concurrent users, other measures of performance that you're talking about. Maybe a bit about the processor/memory/OS you're running as well. I too am evaluating Vanilla for a larger installation. It would be nice to know if we're on the same scale with one another.
  • I've been using Vanilla for my forum since March - it was an active forum, the database reached over 300 MB in size.

    The forum has since been brought down - the CPU load generated by a forum of that size crashed a server.

    There is apparently an upscaling problem with the code - MySQL seems to handle the large DB OK, but the forum takes forever to load. The problem is particularly acute on the home page.

    Unless there is a remedy to this (either trimming down the posts in a safe way or scaling up the solution), I'm going to have to roll the dice, empty out some tables and start again.
  • I'm assuming you were using a shared server? How many users did you have? And how many posts a day?
  • Note also that there have been issues reported here with some extensions adding load.
  • Do you have access to your slow query log?
  • edited January 2008
    I suspected that posting to this thread would be inevitable, and the time has come, so here goes.

    My relatively new database is getting hammered by Vanilla. It's an HP DL380 G5, dual-Xeon 2.33GHz (quad core, 8 logical CPUs), 16G RAM, running FreeBSD 6.2-REL. This server also serves about 20 other sites, which (without Vanilla running) peak at about 30K queries/minute and never get the server load above 0.50.

    When I let my users onto the Vanilla forum, the load climbs to around 2.5-3.0 and requests to the front page and any discussion page take ~30s to load. Using mtop to monitor the queries, I can clearly see that the Vanilla queries are the ones holding up the gravy train, taking ~30s to sort/send the top 10 and locking the rest. The database is average sized, about 62M with 210,000 comments and 2,500 discussions. I've run EXPLAIN queries on these statements which are taking the longest, but nothing pops out at me as being glaringly wrong.

    There are also a large number of database processes in the Sleep state which accumulate at random intervals on my Vanilla database, which I thought I had prevented by turning MySQL persistence off in the php.ini file.

    Extensions installed:
    AjaxQuote
    Audioscrobbler
    CommentRemoval
    CommentsPermalinks
    DiscussionFilters
    EmericaCrossOver (custom extension which just force-forwards people to our main site login/logout/register pages)
    HtmlFormatter
    IPBlocker
    JQuery
    Legends
    PanelLists
    ParticipatedThreads
    PreviewPost
    PrivateMessages
    UserTasks
    WhosOnline
    YellowFade

    I'm tempted to go out and get a handle on using squid or another caching method (SimpleCache maybe?), but perhaps I can get some insight here first. Are any of the extensions I've got installed known to be dog-slow that I can disable first?
  • How many concurrent active users/pageviews do you have on Vanilla? This site runs on a far less powerful machine than yours (nice boxes, the G5s; we have a few at work for virtualisation. Lots of shiny blue lights too :)) and obviously is running fine. Since it has double the number of discussions, and presumably comments, that seems a bit strange, though yours could be a LOT more active and just newer, if you see what I mean. I know this box also hosts 5+ other sites but I'm not sure exactly how many. The only extensions I can see on there which might be worth looking at (i.e. disabling, to see if it makes a difference) are ParticipatedThreads (though I believe this runs on a separate page so shouldn't be too bad unless users are using it a lot) and maybe PrivateMessages (I haven't checked how this works, but generally speaking whispers put a lot of load on DB calls; I guess in theory PM should reduce that load but I'm not 100% sure). Which queries are taking up all the power? I'm guessing probably the one for the all discussions page?
  • Ya, the time-consuming queries appear to be the ones for standard Discussion pages (not the front page). Here's a generic example that seems pretty representative of the ones taking the longest. This one was nearing the 60s mark when I ran the process list:

    SELECT m.CommentID AS CommentID, m.DiscussionID AS DiscussionID, m.Body AS Body, m.FormatType AS FormatType, m.DateCreated AS DateCreated, m.DateEdited AS DateEdited, m.DateDeleted AS DateDeleted, m.Deleted AS Deleted, m.AuthUserID AS AuthUserID, m.EditUserID AS EditUserID, m.DeleteUserID AS DeleteUserID, m.WhisperUserID AS WhisperUserID, m.RemoteIp AS RemoteIp, a.Name AS AuthUsername, a.Icon AS AuthIcon, r.Name AS AuthRole, r.RoleID AS AuthRoleID, r.Description AS AuthRoleDesc, r.Icon AS AuthRoleIcon, r.PERMISSION_HTML_ALLOWED AS AuthCanPostHtml, e.Name AS EditUsername, d.Name AS DeleteUsername, t.WhisperUserID AS DiscussionWhisperUserID, w.Name AS WhisperUsername
    FROM LUM_Comment m
    INNER JOIN LUM_User a ON m.AuthUserID = a.UserID
    LEFT JOIN LUM_Role r ON a.RoleID = r.RoleID
    LEFT JOIN LUM_User e ON m.EditUserID = e.UserID
    LEFT JOIN LUM_User d ON m.DeleteUserID = d.UserID
    INNER JOIN LUM_Discussion t ON m.DiscussionID = t.DiscussionID
    LEFT JOIN LUM_User w ON m.WhisperUserID = w.UserID
    LEFT JOIN LUM_CategoryRoleBlock crb ON t.CategoryID = crb.CategoryID
    AND crb.RoleID =1
    WHERE (crb.Blocked = '0' OR crb.Blocked =0 OR crb.Blocked IS NULL)
    AND (m.Deleted = '0' OR m.Deleted =0 )
    AND (m.WhisperUserID = '0' OR m.WhisperUserID =0 OR m.WhisperUserID IS NULL)
    AND m.DiscussionID = '5100'
    ORDER BY m.DateCreated ASC
    LIMIT 4980 , 20

    Upon closer inspection, I noticed that they were all hitting the same discussion, the "conversation" thread on my forum with over 27,000 comments. Digging deeper still to see who's hitting my forum despite currently having a RewriteRule to elsewhere on the site, I'm getting multiple simultaneous hits from GoogleBot and the Yahoo crawler. Perhaps this has something to do with it.
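
A note on the `LIMIT 4980 , 20` clause in the query above: with offset paging the database has to produce and throw away the first 4,980 matching rows before it can return 20, so deep pages of a very long discussion cost far more than early ones. One common alternative is keyset (seek) paging, sketched below. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` rather than MySQL, a cut-down hypothetical `Comment` table, and orders by `CommentID` instead of `DateCreated` to keep the example short. It is not how Vanilla builds its pages.

```python
import sqlite3

# In-memory stand-in database; the table is a simplified, hypothetical
# version of a comment table, not Vanilla's LUM_Comment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comment ("
    "CommentID INTEGER PRIMARY KEY, DiscussionID INTEGER, Body TEXT)"
)
conn.executemany(
    "INSERT INTO Comment (CommentID, DiscussionID, Body) VALUES (?, ?, ?)",
    [(i, 5100, f"comment {i}") for i in range(1, 5101)],
)
conn.execute("CREATE INDEX idx_disc_comment ON Comment (DiscussionID, CommentID)")


def page_by_offset(discussion_id, offset, size=20):
    """Offset paging: the engine walks past `offset` rows before returning any."""
    return conn.execute(
        "SELECT CommentID FROM Comment WHERE DiscussionID = ? "
        "ORDER BY CommentID LIMIT ? OFFSET ?",
        (discussion_id, size, offset),
    ).fetchall()


def page_after(discussion_id, last_seen_id, size=20):
    """Keyset paging: seek straight to the rows after the last one shown."""
    return conn.execute(
        "SELECT CommentID FROM Comment WHERE DiscussionID = ? AND CommentID > ? "
        "ORDER BY CommentID LIMIT ?",
        (discussion_id, last_seen_id, size),
    ).fetchall()
```

Both functions return the same rows here; the difference is that `page_after` needs the last comment ID from the previous page, so it suits "next page" links better than jumping to an arbitrary page number.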
  • Yeah I suspect having a 27k comment discussion would hurt a bit.... Can you try closing that discussion for a while and see if it helps?
  • Running EXPLAIN on your query probably shows, on the second line, that MySQL must examine 27,000 rows and then, from the Extra column of the first line, run a filesort on the result. This is definitely a bad query.
    I'm curious to know how much adding an index on m.DateCreated (to speed sorting) would help. I tried it on a test install, but the EXPLAIN output does not reflect the new index, as I thought it should, and still mentions 'Using filesort'.

    Anyhow, robots on such a discussion are not welcome.
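
One possible reason a single-column index on DateCreated makes no difference: the query filters on DiscussionID and sorts on DateCreated, and an index on the sort column alone usually cannot serve both. A composite index on (DiscussionID, DateCreated) is the usual candidate. The sketch below is an illustration under loud assumptions: it uses Python's built-in `sqlite3` rather than MySQL, with a cut-down hypothetical `Comment` table and index name. SQLite reports `USE TEMP B-TREE FOR ORDER BY` where MySQL would report 'Using filesort', and the full Vanilla query with its joins may well behave differently.

```python
import sqlite3

# Simplified stand-in for the slow query: filter on one column, sort on another.
QUERY = (
    "SELECT CommentID FROM Comment WHERE DiscussionID = 5100 "
    "ORDER BY DateCreated LIMIT 20"
)


def query_plan(conn):
    """Return the plan detail strings SQLite reports for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comment (CommentID INTEGER PRIMARY KEY, "
    "DiscussionID INTEGER, DateCreated TEXT)"
)

# No usable index yet: expect a table scan plus a separate sort step.
plan_without_index = query_plan(conn)

# Hypothetical composite index: equality column first, sort column second.
conn.execute(
    "CREATE INDEX idx_discussion_date ON Comment (DiscussionID, DateCreated)"
)
plan_with_index = query_plan(conn)
```

Printing the two plans shows the sort step disappearing once the composite index exists. On MySQL the corresponding experiment would be something like `ALTER TABLE LUM_Comment ADD INDEX (DiscussionID, DateCreated)` followed by another EXPLAIN; that is untested here and should be tried on a copy of the data first.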
  • Is there a way to auto-limit threads, or shut them down when they get too large?
  • @Minisweeper
    I'm sure that's not helping. I actually just enabled PFC and it seems to have been quite well received in lieu of a dedicated "conversation" type thread.

    @Max_B
    I did run EXPLAINs on some of the queries and noted the filesorts, which obviously would be best to avoid. I'm not sure if there's a way in Vanilla to specify which index(es) to use, but that would be a nice addition to Mark's SQLBuilder class if there isn't.

    post-mortem:
    Thanks for the help everyone. After forbidding robots from accessing the forum, load has gone down by orders of magnitude. I'm serving ~15-40 simultaneous users with the load peaking around ~0.50 on the database end. As it should be.
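
The post-mortem does not say how the robots were forbidden. One common approach is a robots.txt rule, which well-behaved crawlers honour (it does nothing against crawlers that ignore it). The sketch below uses Python's standard `urllib.robotparser` to check what such a rule would block; the `/forum/` path, the example URLs and the `crawler_allowed` helper are placeholders, not details from this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the /forum/ path is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_allowed(user_agent: str, url: str) -> bool:
    """Report whether the rules above let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

For example, `crawler_allowed("Googlebot", "http://example.com/forum/comments.php?DiscussionID=5100")` is false under these rules, while pages outside `/forum/` stay crawlable.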