Vanilla 1 is no longer supported or maintained. If you need a copy, you can get it here.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.
Options

slow queries on large db - 1.1.7

klkl
edited July 2009 in Vanilla 1.0 Help
Hello, The following is taken from my MySQL slow query log. MySQL is running with the default settings on its own machine. Several queries are slow, but so far this is the worst, and I decided this would be a good place to start. This query takes twenty-five seconds to execute. # Time: 090427 12:38:30 # User@Host: vanilla # Query_time: 25 Lock_time: 0 Rows_sent: 30 Rows_examined: 1101781 SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUserID, t.WhisperUserID AS WhisperUserID, t.Active AS Active, t.Closed AS Closed, t.Sticky AS Sticky, t.Sink AS Sink, t.Name AS Name, t.DateCreated AS DateCreated, t.LastUserID AS LastUserID, t.DateLastActive AS DateLastActive, t.CountComments AS CountComments, t.CategoryID AS CategoryID, u.Name AS AuthUsername, lu.Name AS LastUsername, c.Name AS Category, b.DiscussionID is not null AS Bookmarked, utw.LastViewed AS LastViewed, coalesce(utw.CountComments, 0) AS LastViewCountComments FROM LUM_Discussion t left join LUM_User u ON t.AuthUserID = u.UserID left join LUM_User lu ON t.LastUserID = lu.UserID left join LUM_Category c ON t.CategoryID = c.CategoryID left join LUM_CategoryRoleBlock crb ON t.CategoryID = crb.CategoryID and crb.RoleID = 4 left join LUM_UserBookmark b ON t.DiscussionID = b.DiscussionID and b.UserID = 93 left join LUM_UserDiscussionWatch utw ON t.DiscussionID = utw.DiscussionID and utw.UserID = 93 left join LUM_CategoryBlock cb ON t.CategoryID = cb.CategoryID and cb.UserID = 93 WHERE (crb.Blocked = '0' or crb.Blocked = 0 or crb.Blocked is null ) AND t.Active = '1' and (cb.Blocked = '0' or cb.Blocked = 0 or cb.Blocked is null ) and (t.WhisperUserID = '0' or t.WhisperUserID = 0 or t.WhisperUserID is null ) ORDER BY t.Sticky ASC, t.DateLastActive desc LIMIT 0, 30; SQL Beautifier + EXPLAIN: http://pastie.org/private/iczbi2zp4ai4zkvt8f4dg You'll notice that it hits every row in LUM_Discussion. My first suspicion was that the 'ORDER BY t.Sticky ASC, t.DateLastActive desc' part of the query was slowing things down. (By default there is an index on LUM_Discussion.DateLastActive.) There is no index on LUM_Discussion.Sticky. Adding an index to Sticky did not work. Nor did removing the DateLastActive index and adding a multi-column index on (Sticky, DateLastActive). So it appears the problem is not there. The EXPLAIN command seems to confirm this. How can we speed this up?

Comments

  • Options
    klkl
    edited April 2009
    .
  • Options
    klkl
    edited May 2009
    I've found a way to speed up this query. Without such a speedup it is impossible run Vanilla on a large forum (~1M discussions). Because the query can be called in [at least] two different ways, you will need to add two different indexes to LUM_Discussion. They are as follows: KEY `discussion_category_sticky_dateactive` (`CategoryID`,`Sticky`,`DateLastActive`), KEY `discussion_sticky_dateactive` (`Sticky`,`DateLastActive`) and you can view the complete table creation syntax here: http://pastie.org/private/j1lzmbz6ppn6ac3cz0jriq The three-column index is used for SELECTs involving a CategoryID in the where clauses. This corresponds to browsing an individual category. The two-column index is used for SELECTs which select discussions irrespective of CategoryID. This corresponds to browsing all threads regardless of CategoryID. These new indexes will support splitting up the main discussion query into two parts. The basic idea is first select the stickied threads, perform a small calculation, and then select unstickied threads to fill up the remaining slots. It may be helpful to see an examples before I post the SQL. When we load the first page of discussions we need, say, the following list in the following order: STICKY thread33 lastpost-2008-01-01 STICKY thread11 lastpost-2007-01-01 STICKY thread44 lastpost-2005-01-01 UNSTICKY thread55 lastpost-2009-01-01 UNSTICKY thread00 lastpost-2008-01-01 UNSTICKY thread22 lastpost-2007-01-01 now, it's entirely possible to have the sublist of stickied threads spill over to the second page, but in our example page one already exhausted the stickied threads so page two must look something like this: UNSTICKY thread95 lastpost-2006-01-01 UNSTICKY thread92 lastpost-2005-01-01 UNSTICKY thread89 lastpost-2004-01-01 UNSTICKY thread85 lastpost-2003-01-01 UNSTICKY thread84 lastpost-2002-01-01 UNSTICKY thread80 lastpost-2001-01-01 So our PAGESIZE is 6, and the number of stickied threads, STICKYSIZE, is 3. Page one (PAGEINDEX=0) contains three stickied threads and three unstickied threads. Page two (PAGEINDEX=1) contains all unstickied threads. If you can "SELECT * from LUM_Discussion ORDER BY Sticky ASC, DateLastActive DESC", then it is easy to retrieve a page (Sticky is a 'backwards' enum). You simply append the phrase "LIMIT a, b" to your query, where 'a' is PAGESIZE * PAGEINDEX, and 'b' is PAGESIZE. This is what Vanilla does right now. Unfortunately, this query is very slow. It performs a table scan. This may be because Vanilla is missing some index which is beyond my ability to identify. (The WHERE clause is large and contains several joins.) Since I could not determine which index needed to be added, my solution is to split the standard SELECT query into two different SELECT queries. The first query is roughly the same as the slow query but limited to sticky threads (WHERE Sticky ='1'), and the second query is roughly the same as the slow query but limited to threads which are not sticky (WHERE Sticky='0'). One additional change to both queries is that "Sticky ASC," is dropped from the ORDER BY clause. Because of the new indexes both of these queries run very quickly. For the first query you can append the standard "LIMIT a, b" clause, but for the second query you must account for the number of stickied threads you retrieved in the first. This is accomplished by simple subtraction. If STICKYSIZE equals the number of stickied threads you retrieved in the first query, then the LIMIT clause you append to the second query looks like: "LIMIT (a-STICKYSIZE), (b-STICKYSIZE)", and a table UNION of the two queries will return the same result as our initial, too slow query. It is probably easier to perform this UNION as an array_merge in PHP. We already have to stop to calculate the number of sticky threads returned from the first query. The first query (with arbitrary values as arguments to LIMIT): http://pastie.org/private/nu1nwt0i4dh80oukjkqoow The second query (with arbitrary values as arguments to LIMIT): http://pastie.org/private/i7q4xzoeoj5lqst6wkmdja So far all of the queries have been dependent on the two-column index `discussion_sticky_dateactive`. There is also the three-column index `discussion_category_sticky_dateactive`. This column is necessary for queries against the discussion table which filter by CategoryID. Any example of filtering by CatergoryID is here (for the unsticky side): http://pastie.org/private/kv3hippprfkl7jymhup4tq I am planning on creating an add-on that implements this, but this is something that should be patched in the core. Do any Vanilla developers have time to look at this (Dinoboff?)? There are several other slow queries in Vanilla that need to be fixed, so I would expect this to be the first of a suite of changes.
  • Options
    This is *WAY* beyond me, but if you're planning on creating an addon that can speed up Vanilla, it would greatly benefit the community, because it is something that has been mentioned as a problem before. Thank you!
  • Options
    edited May 2009
    Beyond me too. So what do I do to implement this kl? I don't need it right now, but if things go as planned I may end up with a somewhat huge forum in the future, and don't want it to die a slow death. Thanks. [edit] I've been testing bbPress (from the creators of WordPress). They run their own forum with it, and say they have in excess of 100,000 members and 500,000 posts, and it still runs as fast as ever. All the plugins I've tried work first time too, without any hacking around. I'll keep a close eye on Vanilla 2, but perhaps bbPress is the way to go for now.
  • Options
    Any news on this?
  • Options
    MarkMark Vanilla Staff
    Sorry guys. All I can say is that Vanilla 2 is much faster and will be available soon :/
Sign In or Register to comment.