Please upgrade here. These earlier versions are no longer being updated and have security issues.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.

InnoDB, Foreign Keys and Not Null

R_JR_J Admin
edited January 2014 in Vanilla 2.0 - 2.8

I have been surfing around a bit for db optimization and came across three thesis that I wanted to discuss here. I'm no expert and I wouldn't bet on any of this points but I found them convincing and worth considering. So what follows below is no knowledge or truth but simply thoughts I'd like to share and discuss.

Maybe I start with the least effective one:

Saving memory space by disallowing Null values in columns

Thesis: If you don't allow Null values in columns, you will need less memory
Explanation: if you choose for example integer for your column and allow Null values, you will have to store either integer or Null and so you will have to consume more memory per column. Easy: integer is one piece of info, integer or Null are two different kind of data.
My opinion: This seems to be design decision that could give only a small benefit. Nevertheless it would be better than nothing. I don't know if Vanilla needs Null or if it couldn't be substituted by 0 or '' where it is used by now. "null" is found in about 250 files in my local Vanilla 2.1b2 copy (might also include plugins). So this can not be easily judged.

Beside from small memory benefits when Null is abandoned, I simply don't like Null because Null means dontknowanddontcare in most cases.
10 years ago I was working with databases quite more often than nowadays. I remember than Null values where in 90% the reason when something which seemed simple did not work.

So I would like to abandon Null and if it has a performance (or better resource) advantage too, then I would think it would be worth the effort to go through Vanillas sources.

Foreign Keys to speed up queries

Thesis: Foreign Keys speed up joins
Explanation: I don't remember well, but it was something like helping the query compiler to optimize queries? I almost forgot it, sorry...
My opinion: Whether you look at discussions or user info, Vanilla uses a lot of joins, so if there is something that's speeding up joins, it fit's well to Vanilla.
Downside of Foreign Keys seem to be that writing actions will take longer. So there seems to be a general decision again: is speeding up read actions worth slowing down right actions. I'd say yes, for a forum! You will have a huge amount more reads than writes to your db. So if you sacrifice writing speed but gain reading speed for that, go for it!

InnoDB is better than MyIsam

Thesis: InnoDB is the better choice for a forum.
Explanation: MyIsam (M.) is blocking the complete table whenever any query uses the table, InnoDB (I.) locks only the current row. I. allow to use foreign keys, which should be a benefit (see above) and M. does not, I. is actively developed and might become even better in the future while M. is old fashioned and kind of fix.
My opinion: I was shocked when I read that M. is blocking the complete table on every query. That is simply not acceptable to me.
The downside of I. is, that it does not support fulltext indexing of columns in "older" versions of MySQL. I think it has only been implemented in the most current version (5.6). Vanilla relies on that fulltext index for its search capability :/
But what if someone implements a basic search plugin that uses an internal index table instead of the built in search function? Parsing each post on save, store the words of the post in an index table and use that one for a custom search function should be possible. If such a plugin would exist, comment and discussion table could be used with the InnoDB engine.

Well, what do you think of this three tweaking possibilities? Are they worth any effort at all?
I wanted to test the InnoDB and Foreign Key effect for discussion and comment table but to my shame, Ihave to confess that I didn't even know how to do this. If anybody can point me in the right direction, I'd be happy to provide some numbers.

Comments

  • My opinion, based on my experience as a DBA:

    • Nullable columns should be used when needed, and only when needed. Removing all nullable columns "just because" is a bad idea and it won't bring much of an improvement.
    • The purpose of foreign keys is to guarantee data integrity. The performance increase that they bring is a side effect that occurs because, to use foreign keys, indexes have to be created on the involved fields.
    • InnoDB is a more robust engine than MyISAM and should be preferred whenever possible. Apart from locking, InnoDB supports transactions, which are a must when data integrity is important, and it performs faster if configured properly. It also supports neat features like row compression, which may come handy in some setups.
  • Null should be used where ever there is a chance of incomplete data and a default value is unacceptable. If it is implemented properly (and I see no reason it wouldn't be), NULL be a 1 bit boolean. Add in the fact that NULL is handled natively, I really don't see a reason to throw out some information about the data for a single bit/column. You could add a bit to whatever column type
    you want and implement your own masking for the type on the application end, but that seems like a lot of extra work rather than checking for NULL.

    I will be the first to admit that my db optimization skills are pretty low, so take this with a grain of salt. :D

    Search first

    Check out the Documentation! We are always looking for new content and pull requests.

    Click on insightful, awesome, and funny reactions to thank community volunteers for their valuable posts.

  • Thanks for the feedback by now. From what I get back, it is not completely idiotic to switch discussions and comments to innodb and I have been thinking about a custom search for quite a while. It would be easy to exclude your own stopwords and define the length of words to search.

    I need two tables and one temp table (although I think creating three tables and deleting the 'temp' table at the end of the process would be better):

    CREATE TABLE `0WordList` (
      `WordID` int(11) NOT NULL AUTO_INCREMENT,
      `Word` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      PRIMARY KEY (`WordID`),
      UNIQUE KEY `Word` (`Word`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    
    CREATE TABLE `0Occurance` (
      `OccuranceID` int(11) NOT NULL AUTO_INCREMENT,
      `DiscussionID` int(11) NOT NULL DEFAULT '0',
      `CommentID` int(11) NOT NULL,
      `InsertUserID` int(11) NOT NULL,
      `WordID` int(11) NOT NULL,
      `OccuranceCount` int(11) NOT NULL DEFAULT '1',
      PRIMARY KEY (`OccuranceID`),
      KEY `WordID` (`WordID`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
    

    I start the names of my tables with "0" to see them at the top of my db, but that will be changed later on. Just if you are curious...

    Here's my temp table

    CREATE TEMPORARY TABLE IF NOT EXISTS PostWords (WordID INT(11), Word VARCHAR(255)) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    

    I have taken a text and made an array of it after performing some search and replace steps:

    1. search \r\n, replace with " " (one space)
    2. search [^\w\s] (everything that is not a character or a space) and replace with " "

    That text will be my example text, stored in the temp table. The plugin would hook somewhere AfterSaveDiscussion/Comment and use the post body. Here is how I store it:

    -- insert all words into temp table (words should be filtered before that!)
    INSERT INTO PostWords (Word)
    VALUES ('interview'),('with'),('keeper'),('zoölogical'),('gardens'),('after'),('many'),('inquiries'),('almost'),('many'),('refusals'),('perpetually'),('using'),('words'),('pall'),('mall'),('gazette'),('sort'),('talisman'),('managed'),('find'),('keeper'),('section'),('zoölogical'),('gardens'),('which'),('wolf'),('department'),('included'),('thomas'),('bilder'),('lives'),('one'),('cottages'),('enclosure'),('behind'),('elephant'),('house'),('was'),('just'),('sitting'),('down'),('his'),('tea'),('when'),('found'),('him'),('thomas'),('his'),('wife'),('are'),('hospitable'),('folk'),('elderly'),('without'),('children'),('specimen'),('enjoyed'),('their'),('hospitality'),('be'),('average'),('kind'),('their'),('lives'),('must'),('be'),('pretty'),('comfortable'),('keeper'),('would'),('not'),('enter'),('on'),('what'),('called'),('business'),('until'),('supper'),('was'),('over'),('were'),('all'),('satisfied'),('then'),('when'),('table'),('was'),('cleared'),('had'),('lit'),('his'),('pipe'),('said'),('now'),('sir'),('you'),('can'),('go'),('on'),('arsk'),('me'),('what'),('you'),('want'),('you'),('ll'),('excoose'),('me'),('refoosin'),('talk'),('perfeshunal'),('subjects'),('afore'),('meals'),('gives'),('wolves'),('jackals'),('hyenas'),('all'),('our'),('section'),('their'),('tea'),('afore'),('begins'),('arsk'),('them'),('questions'),('how'),('do'),('you'),('mean'),('ask'),('them'),('questions'),('queried'),('wishful'),('get'),('him'),('into'),('talkative'),('humour'),('ittin'),('them'),('over'),('ead'),('with'),('pole'),('one'),('way'),('scratchin'),('their'),('hears'),('another'),('when'),('gents'),('flush'),('wants'),('bit'),('show'),('orf'),('their'),('gals'),('don'),('so'),('much'),('mind'),('fust'),('ittin'),('with'),('pole'),('afore'),('chucks'),('their'),('dinner'),('but'),('waits'),('till'),('they'),('ve'),('ad'),('their'),('sherry'),('kawffee'),('so'),('speak'),('afore'),('tries'),('on'),('with'),('ear'),('scratchin'),('mind'),('you'),('added'),('philosophically'),('there'),('s'),('deal'),('same'),('nature'),('us'),('them'),('theer'),('animiles'),('here'),('s'),('you'),('comin'),('arskin'),('me'),('questions'),('about'),('my'),('business'),('that'),('grumpy'),('like'),('that'),('only'),('for'),('your'),('bloomin'),('arf'),('quid'),('seen'),('you'),('blowed'),('fust'),('fore'),('answer'),('not'),('even'),('when'),('you'),('arsked'),('me'),('sarcastic'),('like'),('like'),('you'),('arsk'),('superintendent'),('you'),('might'),('arsk'),('me'),('questions'),('without'),('offence'),('did'),('tell'),('yer'),('go'),('ell'),('you'),('did'),('when'),('you'),('said'),('you'),('report'),('me'),('for'),('usin'),('obscene'),('language'),('that'),('was'),('ittin'),('me'),('over'),('ead'),('but'),('arf'),('quid'),('made'),('that'),('all'),('right'),('weren'),('goin'),('fight'),('so'),('waited'),('for'),('food'),('did'),('with'),('my'),('owl'),('wolves'),('lions'),('tigers'),('does'),('but'),('lor'),('love'),('yer'),('art'),('now'),('that'),('old'),('ooman'),('has'),('stuck'),('chunk'),('her'),('tea');
    

    Don't ask me from which book "zoölogical" comes from! :o

  • I clean up the temp table but I think this should be done beforehand with PHP

    -- delete words <= 3 letters
    DELETE FROM PostWords
    WHERE LENGTH(Word) <= 3 -- should be done before wordlist is passed to db
    OR Word IN ('this','that') -- should be done before wordlist is passed to db
    ;
    

    Afterwards, I add all "new" words to my wordlist

    -- update word list
    INSERT INTO 0WordList (Word)
    SELECT DISTINCT Word
    FROM PostWords
    WHERE LENGTH(Word) > 3 -- should be done before wordlist is passed to db
    AND Word NOT IN (SELECT Word FROM 0WordList) -- should be done before wordlist is passed to db
    AND Word NOT IN ('this', 'that') -- should be done before wordlist is passed to db
    GROUP BY Word
    ;
    

    Now I can update my temp table with the IDs of the words:

    -- add WordID to words
    UPDATE PostWords
    SET WordID = (SELECT WordID FROM 0WordList WHERE PostWords.Word = 0WordList.Word)
    ;   
    

    Now I can insert new lines for my discussion/comment in the occurance table:

    -- save results in occurance table
    INSERT INTO 0Occurance (DiscussionID, CommentID, InsertUserID, WordID, OccuranceCount)
    SELECT 17, 4, 1, WordID, COUNT(WordID)
    FROM PostWords
    GROUP BY WordID
    ;
    

    Searching for words will first look up the WordID and then do a SELECT * FROM 0Occurance WHERE WordID = X ORDER BY OccuranceCount DESC

    These are my thoughts so far. I do not know when I will have time to elaborate that so I just posted it here in order not to lose track of it ;)
    And if you have some thoughts and early feedback that you like to share with me, I would be happy!

    (My biggest concern by now is that I will not be able to use the sad blubbot as an icon for that plugin)

Sign In or Register to comment.