Scaling, load balancing
Evaluating forum packages has been fairly depressing - until I stumbled upon Vanilla. Finally, something that isn't yet another clone of every other ugly forum out there (besides bbPress - but Vanilla's better). So I downloaded it, installed it in minutes ... and I love it. But I've got questions. The big one for me is: will this baby scale? I'm sure I'm not the only one who hopes the site I create will become popular, so it surprises me there aren't more discussions about this topic.
I've got some more specific questions than other posts have raised:
-Searches. I don't see any kind of keyword indexing, nor use of MySQL's "fulltext" feature (built-in keyword indexes for full-text searching). Keyword indexing scares me - it's complicated and bulky - but it's fast. I believe phpBB has dedicated tables for it. So what gives? I find searching this forum pretty speedy. Does anyone have a good reason why "fulltext" should not be used? Are there issues with hyphenated strings, or with multilingual content? Are there any plans for some kind of keyword indexing in the future?
-Caching. There isn't any. Caching HTML and database queries are the natural choices, though even with my very limited knowledge of the app I see problems with both, given that Vanilla makes heavy use of recent data. But, for instance, couldn't the main "discussions" page be cached per user, with invalidation performed both per user (e.g. when a user reads a discussion and the highlight changes) and globally (e.g. when a new post is made)? Load balancing this would be the tricky part - see the next question. Anyway, are there plans to create a caching mechanism?
-Load balancing. This is the most important one. Replicating or clustering the database can be done - that's not the issue. Having multiple web servers is. As far as I can see, all data is stored in the database except images (which can be centralized via NFS anyway, provided file locking isn't a problem). Is that all true? If so, it just comes down to session handling:
I see Vanilla uses PHP's standard built-in session support ($_SESSION). By default this creates a PHPSESSID cookie holding a key, which maps to a file on the web server containing the serialized session variables. While this is fine for most sites, I have a problem with it: these "server-side sessions" (meaning the session data is stored on the server) cannot be shared across more than one server. Load balancing Vanilla would involve forcing each session back to the machine that created it. While load balancers (such as "Pen" and "Pound") are capable of doing this, it doesn't really lend itself to "load balancing": over time, you find one machine overloaded with users who never sign off while others sit around doing nothing.
I'm no expert, but I see Zend provides a package with cross-server session support. It's probably very expensive, and besides, it's a band-aid approach. Beyond that, I don't see a way of achieving this without custom session handling. I see a lot of custom code in Vanilla, and I think smart session handling would be a great option.
So an alternative to "server-side sessions" is "client-side sessions". Rather than having a cookie hold an ID (which maps to your session data), you put the session data itself in the cookie. That way, HTTP traffic can simply be round-robined to any web server, because the cookie provides all the info about the user. And what info is that? All I can see in Vanilla's session info is a UserID (and a blank Password?? - which I'm sure isn't needed). Of course, security is more of an issue with client-side sessions, but it can be solved in many ways. I have developed this technique for a large website, and it works really well (and is just as secure). The hard part with client-side sessions is logging someone out completely, to limit session hijacking. I solved this, and I can explain it if anyone is interested, but I'm pretty sure Vanilla could solve it more easily ... though hey, enough detail. I wrote too much an hour ago.
Anyway, I'd like to know if anyone has looked into doing this, and what might be involved. I think Vanilla could use an alternate client-side session manager for those wishing to scale. Perhaps it could be an option; I think the feature would be welcomed by many who want to grow. From my quick poking around, it looks like the People.Class.Session class could be replaced, with some modification needed to People.Class.Authenticator.
Is it that simple?
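The poster doesn't share his own implementation, so the following is only a minimal sketch of one common way to build a client-side session: the session data (here just a user ID plus an expiry) goes into the cookie, and an HMAC signature lets any server in the pool verify it without a shared session store. The names `SECRET_KEY`, `make_session_cookie` and `read_session_cookie` are hypothetical, and the payload is signed, not encrypted, so the user can read it.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared secret: every web server behind the load balancer must hold the same value.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_session_cookie(user_id, max_age=3600, now=None):
    """Put the session data (a user ID and an expiry) in the cookie value and sign it."""
    issued = int(time.time() if now is None else now)
    payload = json.dumps({"uid": user_id, "exp": issued + max_age}).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii")
            + "." + base64.urlsafe_b64encode(signature).decode("ascii"))


def read_session_cookie(value, now=None):
    """Return the user ID if the cookie is authentic and unexpired, otherwise None."""
    try:
        payload_b64, signature_b64 = value.split(".", 1)
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error subclasses ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # tampered with, or signed with a different key
    data = json.loads(payload)
    current = time.time() if now is None else now
    return data["uid"] if data["exp"] >= current else None
```

Any server can call `read_session_cookie` on an incoming request, so plain round-robin works. Note that this sketch does not solve the "log someone out completely" problem the post mentions: a stolen cookie stays valid until it expires unless some server-side revocation check is added.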
Comments
Though I guess Vanilla could cache things like the total post count for each topic so it doesn't have to query that as often. I know vBulletin does that (or at least has the option to).
Vanilla uses PHP's built-in session manager. By default PHP saves the session data in local files, but you can configure PHP to save it in a database.
Hi, I'm replying publicly because it may be useful for others.
I apologize for my imprecise statement; the relevant comment is on VanillaDev.
It's a shame about caching. After looking into the Vanilla code more, I see it would be really difficult to get right, especially given the extensible nature of the product. Plus, it's not really aimed at large forums. Fortunately, throwing more hardware at the problem suffices.
Dinoboff, I think you are talking about the session_set_save_handler() function, which registers user-defined session storage callbacks, right? Thanks for the pointer. You're right that, provided the callbacks are registered before session_start(), I can easily override where the session data is stored and redirect it to (for instance) a database. PHP itself still generates the session ID (with an extremely low probability that it will conflict with one from another server), but you can simply ignore it. You can use custom cookies and implement everything yourself.
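PHP's `session_set_save_handler()` takes open, close, read, write, destroy and gc callbacks. To show what a shared backend has to do, here is a Python sketch that mirrors those six operations against `sqlite3`; it is an analogy, not PHP code, and the class and table names are hypothetical. With a real shared database, every web server registering equivalent callbacks would see the same sessions.

```python
import sqlite3
import time


class DatabaseSessionStore:
    """Mirrors the six callbacks PHP's session_set_save_handler() expects."""

    def __init__(self, conn):
        self.conn = conn

    def open(self, save_path, session_name):
        # Hypothetical table: one row per session ID holding the serialized data.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS session_data ("
            " id TEXT PRIMARY KEY, data TEXT NOT NULL, updated REAL NOT NULL)"
        )
        return True

    def close(self):
        return True

    def read(self, session_id):
        row = self.conn.execute(
            "SELECT data FROM session_data WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else ""  # PHP expects an empty string for an unknown ID

    def write(self, session_id, data):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO session_data (id, data, updated) VALUES (?, ?, ?)",
                (session_id, data, time.time()),
            )
        return True

    def destroy(self, session_id):
        with self.conn:
            self.conn.execute("DELETE FROM session_data WHERE id = ?", (session_id,))
        return True

    def gc(self, max_lifetime):
        # Drop sessions not written to within max_lifetime seconds.
        with self.conn:
            self.conn.execute(
                "DELETE FROM session_data WHERE updated < ?", (time.time() - max_lifetime,)
            )
        return True
```

As the next reply points out, this moves every session read onto the database, which is exactly the cost some high-traffic sites want to avoid.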
It may be a shock to Vanilla users ... but databases are slow, man. You want to avoid them, especially on high-traffic sites where load balancing is required. In my case, I'd like to integrate Vanilla into a high-traffic site that already has an efficient client-side session mechanism. Retrieving session variables from the Vanilla database on every page (to give a user access to other parts of the site) isn't acceptable. Most discussions here are about molding a site to Vanilla, whereas I'd like to do the opposite.
So, for others wishing to do the same, check out session_set_save_handler(). I haven't tried it, but I think it's the cleanest way to alter Vanilla's session handling.
I see a quick-and-dirty way as well: use Vanilla's "PersistentSession" (remember me) logic and assign a different session name (the default is "PHPSESSID") on each server. A persistent session is stored in the database but retrieved only once per web server (each of which creates its own session, with a unique session cookie name). "Remember me" would have to be forced on, the logout code would need to clear every web server's session cookie, and session data couldn't change. But it's quick.
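The awkward part of that scheme is logout, since one response has to expire every server's session cookie. A minimal Python sketch using the standard `http.cookies` module; the per-server names are hypothetical (on the PHP side each server would pick its own name with `session_name()`):

```python
from http.cookies import SimpleCookie

# Hypothetical per-server session names, used instead of every server sharing "PHPSESSID".
SERVER_SESSION_NAMES = ["VanillaWeb1", "VanillaWeb2", "VanillaWeb3"]


def logout_headers(names=SERVER_SESSION_NAMES):
    """Build Set-Cookie headers that expire every server's session cookie in one response."""
    cookie = SimpleCookie()
    for name in names:
        cookie[name] = ""
        cookie[name]["path"] = "/"
        cookie[name]["max-age"] = 0
        cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    return [("Set-Cookie", morsel.OutputString()) for morsel in cookie.values()]
```

Whichever server handles the logout request sends all of these headers, so the browser drops the cookies for the servers it never talked to on this request too. The persistent "remember me" record in the database would still need to be deleted separately.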
I found apples and oranges in the forum world; let's call them "bad apples" and oranges. The bad apples all look alike, and I find them unacceptable from a user-interface perspective (and not easily corrected). They are written by engineers, so many of them are built to scale (phpBB, for instance). As for the oranges, I found only two: Vanilla and bbPress. I stopped looking when I found Vanilla, so there may be more. So suggestion 1 is: look for more here - http://en.wikipedia.org/wiki/Comparison_of_Internet_forum_software
Some form of caching is built into bbPress (I'm not sure how much), but from the little I know, the caching is broken right now. bbPress doesn't index keywords, but it does use MySQL's "fulltext" feature. It is very likely that bbPress will scale much better than Vanilla. The main problem with bbPress is that it is still being developed. It isn't as slick as Vanilla, but that may also change down the road. So suggestion 2: check bbPress out.
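For comparison with MySQL's "fulltext", the hand-rolled keyword indexing raised in the original post (dedicated tables mapping words to posts) boils down to an inverted index. A minimal in-memory Python sketch, with hypothetical names and a deliberately naive tokenizer; it also shows one answer to the hyphen question, since `\w+` splits "load-balancing" into two words:

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")  # Unicode-aware in Python 3; hyphens split words


class KeywordIndex:
    """Inverted index: each word maps to the set of post IDs containing it."""

    def __init__(self, min_length=3):
        self.min_length = min_length
        self.postings = defaultdict(set)

    def _words(self, text):
        return {w.lower() for w in WORD_RE.findall(text) if len(w) >= self.min_length}

    def add(self, post_id, text):
        for word in self._words(text):
            self.postings[word].add(post_id)

    def remove(self, post_id, text):
        for word in self._words(text):
            self.postings[word].discard(post_id)

    def search(self, query):
        """Return IDs of posts containing every query word (AND semantics)."""
        words = self._words(query)
        if not words:
            return set()
        results = None
        for word in words:
            ids = self.postings.get(word, set())
            results = ids if results is None else results & ids
        return set(results)
```

In a database the `postings` map would become a word table plus a word-to-post table, which is where the "complicated and bulky" part comes from: every edit or delete has to update it.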
Another option is to build caching into Vanilla. I'm really new here, so take what I say with a grain of salt (everything below is my opinion, not fact):
While I respect and consider what others say, I actually think building a caching mechanism into Vanilla is possible - just pretty darn hard. I'll stick to discussing database query caching, since that's what interests you. There's smart caching and dumb caching:
Dumb caching: As you point out, building a "dumb cache" is dirty but very effective. It involves caching the data for a page and blindly reusing it for a set period of time (you used 5 seconds). It was said that this can't be done because of "whispering"; I think whispering just changes how the cache is implemented. Take the main discussions page. When Bob is signed in, he sees the discussions the way most people see them, but popular Sally (who gets a lot of whispers) sees a different list. To cache effectively, you need to cache per user. Imagine the cache as a directory structure: the first level is the user ID, with that user's cached data held beneath it. When Bob accesses the main page, a cache file is generated and put in his cache directory; when he refreshes the page, his cache file is fed back to him. Same for Sally. You could argue that dumb caching per user doesn't help much, but all signed-out visitors count as one user, so they share the same cache file. Still, it's pretty weak. If you turn off whispering, maybe you don't need per-user caching (though I still think you do; see my note below about this).
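The per-user dumb cache described above can be sketched in a few lines of Python: one directory per user ID, one shared "guest" directory for signed-out visitors, and a fixed time-to-live (5 seconds, as in the post). The class and method names are hypothetical, and `page` is assumed to be a fixed identifier such as `"discussions"`, never user input.

```python
import os
import time
from pathlib import Path

GUEST = "guest"  # all signed-out visitors share one cache bucket


class PerUserPageCache:
    """Dumb cache: reuse a rendered page for ttl seconds, one directory per user."""

    def __init__(self, root, ttl=5.0):
        self.root = Path(root)
        self.ttl = ttl

    def _path(self, user_id, page):
        bucket = GUEST if user_id is None else str(int(user_id))
        return self.root / bucket / (page + ".html")

    def get_or_render(self, user_id, page, render, now=None):
        now = time.time() if now is None else now
        path = self._path(user_id, page)
        try:
            if now - path.stat().st_mtime < self.ttl:
                return path.read_text(encoding="utf-8")  # still fresh: blindly reuse it
        except FileNotFoundError:
            pass  # nothing cached yet for this user
        html = render()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html, encoding="utf-8")
        os.utime(path, (now, now))  # record the render time as the file's mtime
        return html
```

There is no invalidation at all: a new post shows up for each user at most `ttl` seconds late, which is the whole bargain of dumb caching.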
Smart caching: This is where you cache data and use it until the data changes. That is done by purging the cache file (i.e. invalidating the cache). The trick is detecting when the cache should be invalidated. If whispering is turned on and caching is done per user, invalidating everyone's cache files becomes a challenge: it's not acceptable to go and delete a thousand cache files (one per user). So instead, a global cache directory should be kept, and timestamps compared as part of each cache fetch. Per-user caching would be very effective in this case. But if whispering is turned off, per-user caching *may* not be required (though I still think it is), which would greatly simplify the implementation.
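The "compare timestamps instead of deleting a thousand files" idea can be sketched as follows: invalidation just records a stamp (globally for a new post, per user for a read-mark change), and a cached entry is used only if it was created after both stamps. This in-memory Python version is hypothetical and uses an increasing counter instead of wall-clock timestamps so two events in the same second can't tie.

```python
import itertools


class SmartCache:
    """Entries live until invalidated; invalidation bumps a stamp instead of deleting entries."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing version stamps
        self._global_stamp = 0            # bumped when shared data changes (new post)
        self._user_stamps = {}            # bumped when one user's view changes (read marks)
        self._entries = {}                # (user_id, key) -> (stamp, value)

    def invalidate_all(self):
        self._global_stamp = next(self._clock)

    def invalidate_user(self, user_id):
        self._user_stamps[user_id] = next(self._clock)

    def get_or_compute(self, user_id, key, compute):
        entry = self._entries.get((user_id, key))
        if entry is not None:
            stamp, value = entry
            # Valid only if built after the last global and the last per-user invalidation.
            if stamp > self._global_stamp and stamp > self._user_stamps.get(user_id, 0):
                return value
        value = compute()
        self._entries[(user_id, key)] = (next(self._clock), value)
        return value
```

Invalidating everyone is now a single assignment; stale entries are simply ignored and overwritten the next time each user asks for the page.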
Up till now I've been talking about caching data for 'pages', when in fact the code fetches data as database queries. For dumb and smart caching alike, the cache has to be keyed by the inputs to the database query. Fortunately, the code is already geared for this: database queries are constructed through calls to the SqlBuilder class, and the Select() and GetRow() calls of the Database class are the points at which to implement the caching mechanism. The cache filename would be built from all the inputs to the database query. This all sounds easy until you look at the vast number of queries that take place in Vanilla. Some of these inputs may be things like the current time, or others that constantly change, so it has to be somewhat smart. It's kinda scary. Cache invalidation would have to happen in the same place - the database calls that update the data. I think only Mark could write the algorithm to match select inputs with update inputs for cache invalidation, and it looks way too complicated. I doubt he would be interested in doing this; it looks really hard.
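To make "keyed by all the inputs to the query" concrete, here is a hypothetical Python sketch. The key hashes the SQL text together with its parameters. Instead of the precise select/update matching described above, it swaps in a much coarser rule, named plainly here: any write to a table drops every cached result that read from that table. The caller has to say which tables a query touches, which in Vanilla would presumably have to come from the SqlBuilder.

```python
import hashlib
import json


class QueryCache:
    """Caches SELECT results keyed by every input to the query.

    Invalidation is deliberately coarse (a simplification, not Vanilla's design):
    a write to a table drops every cached result that read from that table.
    """

    def __init__(self):
        self._results = {}   # cache key -> rows
        self._by_table = {}  # table name -> set of cache keys that read it

    @staticmethod
    def key(sql, params):
        # The key must cover all inputs, so the SQL text and parameters are hashed together.
        blob = json.dumps([sql, list(params)], default=str)
        return hashlib.sha1(blob.encode("utf-8")).hexdigest()

    def select(self, sql, params, tables, run_query):
        k = self.key(sql, params)
        if k not in self._results:
            self._results[k] = run_query(sql, params)
            for table in tables:
                self._by_table.setdefault(table, set()).add(k)
        return self._results[k]

    def on_write(self, table):
        for k in self._by_table.pop(table, set()):
            self._results.pop(k, None)
```

This also shows the "current time as an input" problem from the post: if a parameter changes on every call, the key changes too, and such a query never gets a cache hit.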
The other option is to cache data at specific places outside the Database class, but then invalidation has to be done at specific places too. Since it wouldn't be a generic cache, it would likely break and be difficult to maintain.
So, the note about whispering: the other thing I see that is unique per user is the highlighting that shows which entries have been read and which are new. This is done via the LUM_UserDiscussionWatch table: when a user views a discussion, a timestamp entry in this table is updated. So the main discussions page still requires a unique database fetch per user, even without whispering activated. I think that feature makes per-user caching mandatory too.
So my suggestion 3: don't even try to implement a cache in Vanilla. In writing this up, I've convinced myself that it's just too hard, man.
Suggestion 4: search the code for "FARM_DATABASE_HOST". I don't know much about it, but it looks like Vanilla has some built-in support for a database farm. Combine that with web-server load balancing, and your forum scales, though at a cost.
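I haven't checked what FARM_DATABASE_HOST actually does, so this is only a generic sketch of the usual database-farm setup it hints at: writes go to one primary, and reads are spread round-robin over replicas. The class name and host names are hypothetical, and the verb check is intentionally simple (for example, it would send `SELECT ... FOR UPDATE` to a replica).

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads over replicas (round-robin)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def host_for(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)
```

The usual catch is replication lag: a user who has just posted may not see the post if the next read hits a replica that hasn't caught up yet.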
I'm quite familiar with various cache levels/methods.
'REAL-TIME'
Experience has taught me that REAL-TIME data is rarely required in a forum setting (it's not instant messaging). Just as a CD player can give a good representation of an analog signal, 'dumb-cached' pages or page chunks can provide a 'representation' of a forum that works 'good enough' for most users.
STUPID is SMART
Since I'm pretty sure nobody can anticipate ALL exceptions, not even Mark, and complexity ALWAYS increases the number of bugs, I'm a big proponent of simple (80%) solutions, and thus 'dumb caching' pages or page elements is the way to go for me.
SCALE & SACRIFICE
I'd be more than happy to sacrifice some of the 'NICE BUT NOT REQUIRED' features for speed and performance...
I'm sure even the exact features that PREVENT caching could be replaced by *similar* ones that do not require a user-specific database lookup (e.g. local cookies), or could be SPLIT into generic and user-specific queries (where the former could be cached).
Finally, whispers could perhaps be re-coded, e.g. retrieved as a separate, non-cached query and then merged on the server, or even off-loaded to the user's machine via JavaScript, which would interleave 'regular' comments and whispers when building the page?
Just a thought...
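The "split into generic and user-specific queries" idea could look roughly like this: the discussion list everyone sees is cached once, and only two small per-user lookups (read marks and private, whisper-style activity) are fetched uncached and merged on the server. A hypothetical Python sketch; the row shapes and the `personalize` name are assumptions, not Vanilla's data model.

```python
def personalize(shared_rows, last_read, private_rows=()):
    """Overlay one user's data on a discussion list that is the same for everyone.

    shared_rows:  [(discussion_id, title, last_activity)], identical for all users, so cacheable
    last_read:    {discussion_id: last_activity_seen} for this user, fetched uncached
    private_rows: [(discussion_id, title, last_activity)] visible only to this user
                  (e.g. whisper activity), fetched uncached
    Returns [(discussion_id, title, is_new)], most recently active first.
    """
    merged = {d: (title, last) for d, title, last in shared_rows}
    for d, title, last in private_rows:
        # Private activity can add a discussion or bump an existing one.
        if d not in merged or last > merged[d][1]:
            merged[d] = (title, last)
    ordered = sorted(merged.items(), key=lambda item: item[1][1], reverse=True)
    return [(d, title, last > last_read.get(d, 0)) for d, (title, last) in ordered]
```

The expensive query is shared by every user, and what stays per user is a couple of cheap lookups plus an in-memory merge, which is the split described above.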
I should really stop proposing things and look at the code myself.
@Toivo: the 'strength' of whispers, however, is the fact that they appear in-line with the other comments. A separate TAB makes them into private messages (not bad, but also not the same).