Please upgrade here. These earlier versions are no longer being updated and have security issues.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.

Vanilla 2.1 - Why the self joins?

businessdadbusinessdad Stealth contributor MVP

While I was doing some tests with Vanilla 2.1, I noticed that some queries use a self join, but I could not figure out the reason for it. For example, the query for Discussions does the following:

SELECT SomeFields
FROM
  Discussions d
  JOIN
  Discussions d2 ON
    (d.DiscussionID = d2.DiscussionID)
  JOIN
  // Other tables

In all my tests, the self join never changed, it seems to be the basic part of the query. Is that a "trick" of some sort?

Comments

  • GaryFunkGaryFunk Senior Application Developer ✭✭

    Can you point to the file where you see one of these? I'd love to look at it and see if there is a reason.

  • businessdadbusinessdad Stealth contributor MVP

    @GaryFunk Sure, here you go:

    DiscussionModel::GetWhere()

      // Build up the base query. Self-join for optimization.
      $Sql->Select('d2.*')
         ->From('Discussion d')
         ->Join('Discussion d2', 'd.DiscussionID = d2.DiscussionID')
         ->OrderBy('d.DateLastComment', 'desc')
         ->Limit($Limit, $Offset);
    

    Another one, with a slightly clearer comment, in ConversationModel::Get2():

      // The self join is intentional in order to force the query to us an index-scan instead of a table-scan.
      $Data = $this->SQL
         ->Select('c.*')
         ->Select('uc2.DateLastViewed')
         ->Select('uc2.CountReadMessages')
         ->Select('uc2.LastMessageID', '', 'UserLastMessageID')
         ->From('UserConversation uc')
         ->Join('UserConversation uc2', 'uc.ConversationID = uc2.ConversationID and uc.UserID = uc2.UserID')
         ->Join('Conversation c', 'c.ConversationID = uc2.ConversationID')
         ->Where('uc.UserID', $UserID)
         ->Where('uc.Deleted', 0)
         ->OrderBy('uc.DateConversationUpdated', 'desc')
         ->Limit($Limit, $Offset)
         ->Get();
    

    Please note that I'm not stating that above queries are wrong (after all, they work). It's just that I write queries all the time, and I never heard a self join improving performances. I reckon it must be a "trick" specific to MySQL (I normally use SQL Server), and it would be interesting to read more about it. :)

  • I wonder USE INDEX or FORCE INDEX would help.

    grep is your friend.

  • GaryFunkGaryFunk Senior Application Developer ✭✭

    I fully understand the second example. I do queries like that a lot. But the first one, I don't get it. I see no reason to join when no fields are selected and the order can be done on the d2 table.

  • businessdadbusinessdad Stealth contributor MVP

    @GaryFunk said:
    I fully understand the second example. I do queries like that a lot.

    So, you can shed some light on it. Why a self join between uc and uc2, which are the same table? It should be exactly the same if the query used uc only.

  • GaryFunkGaryFunk Senior Application Developer ✭✭
    edited March 2013

    Just as it states: The self join is intentional in order to force the query to us an index-scan instead of a table-scan.

    On a small table you won't see any difference. BUT, with several thousands of records, it will make a big difference.

    I think that is the reason in the first example also.

  • businessdadbusinessdad Stealth contributor MVP

    @GaryFunk said:
    Just as it states: The self join is intentional in order to force the query to us an index-scan instead of a table-scan.

    I understood that. As I wrote earlier, I assume that's a MySQL trick to use when queries get a bad plan, as such additional join would decrease performances, rather than increase it. I will probably ask the MySQL community why the engine chooses a poor plan in these cases.

  • GaryFunkGaryFunk Senior Application Developer ✭✭

    I am looking into the same. That may have been necessary back with MySQL3, but I'm betting it's not necessary in MySQL 5.5.

  • they would have never tested for MySQL 3. The requirement is for 5. The engine is pretty good an optimisation, but, even they give a way to force the index.

    grep is your friend.

Sign In or Register to comment.