
Looking for experience with implementing cron functionality in Vanilla

Todd Chief Product Officer Vanilla Staff

Hi all. We are doing some research into implementing cron functionality in Vanilla core. I noticed that @businessdad made a plugin that does this and it looks very promising. However, it does require some sort of cron ability on your web host. We need to have a fallback.

I just want to see if anyone has any experience with other projects that implement some sort of cron ability in a nice way. The way I see it is we have two scenarios.

Webhost has native cron functionality and the forum administrator knows how to use it.

This is the best case for us. We can make an endpoint such as /cron that gets pinged by the real cron job. This will work well and will be performant. The downside of course is that many web hosts don't offer cron functionality and setting it up is too difficult for a lot of people.

Webhost doesn't have cron functionality or the forum administrator doesn't know how to use it.

Traditionally, I've seen other projects check whether to run a cron job on every postback, or launch an ajax request to do so. The check would have to look for due jobs and then run them. This process works without any additional setup; however, it has downsides too.

  1. Doing a check every postback adds a lot of traffic to the site. Usually hosts that don't allow cron also don't allow too much traffic either.

  2. There is the possibility of a race condition where two processes may grab the same cron job at the same time.

What should we do?

I think we need a good way of implementing cron without native cron support, or else we just can't do it. I'm hoping other people have seen this problem solved reasonably well in other projects. Any ideas?


Comments

  • vrijvlinder Papillon-Sauvage MVP

    I found some issues with cron jobs. I set up a post-by-mail plugin on a website that I host for someone. It is called Postie, on WP. Its cron checks for mail to post as a new blog entry. I installed it because the Jetpack post by mail was not working properly. The problem I found is that it littered my log with mail checks. I had to make it check less frequently, which really defeats the purpose.

    I do wonder if this sort of cron job would affect server performance if it involved more users.

  • I've used Drupal, which has its own implementation of cron. It's not required functionality, but it is handy for managing search indexes, aggregation, pings, and routine maintenance tasks.

    In Drupal 8, admins are able to set the frequency of when cron will be triggered by users visiting the website. This is good for websites with large traffic, but for low traffic websites where there is interest in using the cron system, one would add a command to the crontab which links to a secure private URL generated in Drupal.
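    The visit-triggered frequency check described above can be sketched roughly as follows. This is only an illustration, not Drupal's actual code; the function name `cron_is_due` and the three-hour interval are assumptions made for the example.

```python
from datetime import datetime, timedelta


def cron_is_due(last_run, interval, now):
    """Return True when at least `interval` has passed since `last_run`.

    `last_run` is None when cron has never run; in that case it is due.
    (Hypothetical helper, named for this example only.)
    """
    if last_run is None:
        return True
    return now - last_run >= interval


# Example: an admin-configured interval of three hours (assumed value).
last_run = datetime(2013, 8, 1, 12, 0)
every_three_hours = timedelta(hours=3)

print(cron_is_due(last_run, every_three_hours, datetime(2013, 8, 1, 13, 0)))  # False
print(cron_is_due(last_run, every_three_hours, datetime(2013, 8, 1, 15, 0)))  # True
```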


  • businessdad Stealth contributor MVP
    edited August 2013

    @Todd My Cron plugin does not necessarily require cron to be installed on the server. The Cron schedule is invoked via a secured URL, therefore you can trigger it manually, run a scheduled task on another server, on your client PC, or do anything you like to call that URL. This is, more or less, what Drupal does as well (at least, it did until version 7).
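    As a rough illustration of how a secured cron URL might be checked (this is not the plugin's actual code), the handler can compare a secret key from the request against a stored one. The names `new_cron_key` and `is_authorized` are hypothetical, and carrying the key as a URL parameter is an assumption of this sketch.

```python
import hmac
import secrets


def new_cron_key():
    """Generate a random key to embed in the cron URL (hypothetical helper)."""
    return secrets.token_urlsafe(32)


def is_authorized(supplied_key, stored_key):
    """Compare the key from the request with the stored one.

    hmac.compare_digest takes the same time whether or not the keys match,
    which avoids leaking information through response timing.
    """
    return hmac.compare_digest(supplied_key.encode(), stored_key.encode())


stored = new_cron_key()
print(is_authorized(stored, stored))   # True
print(is_authorized("guess", stored))  # False
```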

    Personally, I don't like to tie the Cron execution to users' visits, because cron tasks can be time consuming and one unlucky visitor would see an unresponsive site until they are completed. Also, visits occur at random times, so it's quite messy to figure out if the cron should be run "in arrears".

    In my opinion, a scheduled task is supposed to run on a regular basis. Everything else is just manual or random, and I don't know which of the two is worse.

  • businessdad Stealth contributor MVP
    edited August 2013

    @Todd, in addition to my previous post, I would like to share some suggestions, based on my experience dealing with heavy tasks at risk of race conditions.

    As you pointed out, triggering a postback at every load would have several undesirable side effects:

    • Increase the traffic.
    • Race conditions may occur.
    • If Cron tasks are heavy (and they would have all the right to be, that's why they are scheduled), the site would become unresponsive.
    • A misconfigured Cron task would run at every page load, potentially killing the site.

    Suggestions for pseudo-cron based on page load

    • A Cron process should go through all jobs, or none of them. Even if two Cron processes were to run at the same time, they should not be allowed to cherry pick the jobs to run. By ensuring atomicity, Cron task dependency could be implemented at a later time.
    • Use flock() to implement a poor man's semaphore, which would work as follows.

      • When Cron is fired on page load, attempt to acquire a lock on a file.
      • If flock() fails, then it means that another Cron is running and nothing should be done.
      • If flock() succeeds, run the Cron tasks.
      • At the end of the Cron process, release the lock. You can wrap this lock/unlock logic in a class and put the unlock call in the __destruct() method, so that it will be called automatically.
        Note: This method may not work flawlessly on IIS, due to its multi-thread implementation, but it doesn't require any particular library to be installed, unlike proper semaphores.
    • Ensure that a throttling mechanism is in place, and that it's respected, or you would risk running Cron tasks all the time. My plugin already implements throttling on a minute, hour and day basis, so we could start from there.
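    The lock-file steps above can be sketched as follows. The thread talks about PHP's flock(); this sketch uses Python's `fcntl.flock` on a Unix-like system instead, with a context manager playing the role of the __destruct() cleanup. The names `cron_lock` and `run_cron` and the lock-file path are assumptions made for the example.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cron_lock(path):
    """Yield True if the lock was acquired, False if another run holds it."""
    handle = open(path, "a")
    acquired = False
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except OSError:
            acquired = False
        yield acquired
    finally:
        # Runs even if a job raises, similar to unlocking in a destructor.
        if acquired:
            fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()


def run_cron(path, jobs):
    """Run every job, or none of them if another Cron process is active."""
    with cron_lock(path) as acquired:
        if not acquired:
            return False
        for job in jobs:
            job()
        return True


lock_path = os.path.join(tempfile.mkdtemp(), "cron.lock")  # assumed location
print(run_cron(lock_path, [lambda: None]))  # True
```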

    If you wish, I can review my Cron plugin and see if I can extend it to implement the missing part. I would dare to say that 90% is covered already, only the "on page load" trigger would have to be implemented.

    By the way, I knew that the Cron feature was going to be useful! :D

  • x00 MVP
    edited August 2013

    I think using an external service to trigger the cron makes sense, where they are unable to set it up themselves.

    As businessdad says you can use a secured api, but the cron scheduler could have its own api, to set up the crons, via the dashboard.

    The main problem is the reliability of the cron scheduler and of the site it is attempting to trigger the cron on.

    grep is your friend.

  • x00 MVP
    edited August 2013

    What do you actually want to do with this cron? Is it just a general facility or do you have a specific use in mind?

    In other words, what problem are you trying to solve? Because that can make a difference with the solution.

    I think doing expensive operations on a platform without the resources is a bad idea anyway, whether you use a cron or not. Therefore breaking the problem up, or rethinking the problem, is necessary.

    Other than that, all I can say is I don't think I could live without crontab and the like. Seriously, cheap hosts are offering cron schedulers. There are common formats which come with the well-known panels, so you could provide pointers, and suggest they nag their host if they get stuck.

    grep is your friend.

  • Todd Chief Product Officer Vanilla Staff

    We have ideas for a lot of cron type jobs. Some examples would be:

    • Throttle and group notifications.
    • Temporary bans.
    • Remove old data from the activity table.
    • Implement a discussion archiving mechanism (this one is only a slight possibility).

    On our hosted version we will be implementing a memory queue for a lot of stuff too, so an often running cron may be what we use on the open source version for certain things.

  • Todd Chief Product Officer Vanilla Staff

    @businessdad, I think I'd prefer some sort of lock made using the database rather than the filesystem.

  • businessdad Stealth contributor MVP

    @Todd said:
    businessdad, I think I'd prefer some sort of lock made using the database rather than the filesystem.

    Locking via database could be tricky to achieve.

    • If a transaction is used, there's the guarantee that it will be rolled back in case of a crash, but it would risk blocking every other process that tries to acquire the lock before running the cron. Also, the behaviour of transactions greatly depends on the transaction isolation setting, which is quite an advanced concept (sometimes, even for developers).
    • If a transaction is not used, then, in case of crash, the locking element would remain locked forever, until someone runs an SQL to reset it.

    I understand why a database sounds more robust, but it's not very easy to use it as a semaphore.

  • Todd Chief Product Officer Vanilla Staff

    There are other ways to do a lock in a db without significant transactions. Here is one suggestion.

    1. Update a column to a random number where it is equal to 0. This is an atomic operation.
    2. Select the row from the database.
    3. Check to see if the column is equal to my random number.
    4. Do the cron.
    5. Update the column to 0.
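    The five steps above can be sketched like this, using SQLite purely for illustration. The table name `cron_lock`, its columns, and the function names are assumptions made for the example; they are not part of Vanilla.

```python
import random
import sqlite3


def create_lock_table(conn):
    """Create a one-row lock table; token 0 means 'unlocked' (assumed schema)."""
    conn.execute("CREATE TABLE cron_lock (id INTEGER PRIMARY KEY, token INTEGER NOT NULL)")
    conn.execute("INSERT INTO cron_lock (id, token) VALUES (1, 0)")
    conn.commit()


def try_acquire(conn, token=None):
    """Steps 1-3: claim the row, read it back, compare with our number."""
    if token is None:
        token = random.randint(1, 2**31 - 1)
    # Step 1: only succeeds when the column is still 0.
    conn.execute("UPDATE cron_lock SET token = ? WHERE id = 1 AND token = 0", (token,))
    conn.commit()
    # Steps 2 and 3: select the row and check whose number it holds.
    (current,) = conn.execute("SELECT token FROM cron_lock WHERE id = 1").fetchone()
    return token if current == token else None


def release(conn, token):
    """Step 5: set the column back to 0, but only if we hold the lock."""
    conn.execute("UPDATE cron_lock SET token = 0 WHERE id = 1 AND token = ?", (token,))
    conn.commit()


conn = sqlite3.connect(":memory:")
create_lock_table(conn)
mine = try_acquire(conn)
if mine is not None:
    # Step 4: do the cron here.
    release(conn, mine)
```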
  • businessdad Stealth contributor MVP
    edited August 2013

    @Todd said:
    There are other ways to do a lock in a db without significant transactions. Here is one suggestion.

    1. Update a column to a random number where it is equal to 0. This is an atomic operation.
    2. Select the row from the database.
    3. Check to see if the column is equal to my random number.
    4. Do the cron.
    5. Update the column to 0.

    That's what I would have done. However, what if the cron crashes half way through? The column is not zero, thus no update will ever be made again until somebody resets it manually. I've been there, I implemented this logic several times before and kept finding pitfalls... Total PITA.

    Correction

    In my implementation, the steps were four, rather than five:

    1. Update a column to anything where it is equal to zero.
    2. Check the number of updated rows. If the number is zero, then the update failed, which should mean that Cron is already running.
    3. Run the cron tasks.
    4. Reset the column.

    Like the one above, this logic does not recover from a crash, but it saves a SELECT operation and it doesn't run the risk of two processes generating the same random number by coincidence and thus running in parallel.
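    A minimal sketch of the four-step variant, again using SQLite only for illustration. The table `cron_lock`, the column `running`, and the function names are assumptions. The `finally` clause resets the column when a job raises an exception, but, as noted above, it cannot help if the whole process is killed half way through.

```python
import sqlite3


def create_lock_table(conn):
    """Create a one-row table; running = 0 means no Cron is active (assumed schema)."""
    conn.execute("CREATE TABLE cron_lock (id INTEGER PRIMARY KEY, running INTEGER NOT NULL)")
    conn.execute("INSERT INTO cron_lock (id, running) VALUES (1, 0)")
    conn.commit()


def run_cron_once(conn, jobs):
    """Return True if the jobs ran, False if another Cron already holds the lock."""
    # Step 1: update the column only where it is equal to zero.
    cursor = conn.execute("UPDATE cron_lock SET running = 1 WHERE id = 1 AND running = 0")
    conn.commit()
    # Step 2: zero updated rows means the update failed, so Cron is running.
    if cursor.rowcount == 0:
        return False
    try:
        # Step 3: run the cron tasks.
        for job in jobs:
            job()
    finally:
        # Step 4: reset the column. Not reached if the process is killed.
        conn.execute("UPDATE cron_lock SET running = 0 WHERE id = 1")
        conn.commit()
    return True


conn = sqlite3.connect(":memory:")
create_lock_table(conn)
print(run_cron_once(conn, [lambda: None]))  # True
```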

  • @Todd said:
    We have ideas for a lot of cron type jobs. Some examples would be:

    • Throttle and group notifications.
    • Temporary bans.
    • Remove old data from the activity table.
    • Implement a discussion archiving mechanism (this one is only a slight possibility).

    On our hosted version we will be implementing a memory queue for a lot of stuff too, so an often running cron may be what we use on the open source version for certain things.

    1. The first one, I guess you want to consolidate notifications and send them in a single email, and you want to do this when the user is not online. I can see why you want to use a cron for this. However, an external service could be tasked with pinging a secured handler, and the emails could either be sent out using local resources, or you could use services like Mandrill to send them out and/or schedule them.

    2. I don't really see why you need a cron for this, because it is based on access anyway, and besides, a date and some logic is enough to determine if they are still banned, if someone else wants to know.

    3. I do this regularly; however, this is one of the simpler crons. In fact I wouldn't even consider a fixed cycle cron at all. Removing one Activity record is not expensive, in fact removing dozens of records is not necessarily expensive, and once you reach your ideal size, so long as you are removing records often enough, you don't really need to set a special cron.

    4. There is a bit more to archiving discussions, true, but maybe you should avoid physically archiving them. Archiving could also be based on display conditions. If what you want is more backups and dumps, then, if you feel this is your responsibility at all, you could provide a separate bit of code for this with somewhat longer execution times, but break up the task over several requests. I wouldn't link this to any cron.
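    To illustrate point 2, a temporary ban can be evaluated at access time from a stored expiry date, with no scheduled job needed to lift it. This is a sketch only; the function name `is_banned` and the idea of storing an expiry timestamp per user are assumptions made for the example.

```python
from datetime import datetime


def is_banned(ban_expires_at, now):
    """Return True while a temporary ban is still in effect.

    `ban_expires_at` is None for users who were never banned
    (hypothetical field, named for this example only).
    """
    return ban_expires_at is not None and now < ban_expires_at


expires = datetime(2013, 8, 10, 0, 0)
print(is_banned(expires, datetime(2013, 8, 9, 12, 0)))   # True
print(is_banned(expires, datetime(2013, 8, 10, 12, 0)))  # False
print(is_banned(None, datetime(2013, 8, 9, 12, 0)))      # False
```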

    grep is your friend.
