
Attachments download problem

jaz
edited August 2008 in Vanilla 1.0 Help
Hi there!

I'm having the same problem as motora reported in another post.

When downloading an Excel file (or any other file) that was attached with the Attachments 2.0 extension, the file is corrupted by the following extra bytes at the beginning of the file (hex dump):

EF BB BF EF BB BF

Through some Googling I found out that EF BB BF is the BOM (Byte Order Mark) of UTF-8. So for some reason, PHP 5 (5.1.4) is adding the BOM twice at the beginning of the downloaded file, which makes Excel unable to read it.

To narrow down the problem, I created the following experimental PHP-file in the vanilla directory:

<?php
include("appg/settings.php");
$Configuration['SELF_URL'] = 'test.php';
include('appg/init_vanilla.php');
header('Pragma: public');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Type: application/force-download');
header('Content-Type: application/octet-stream');
header('Content-Type: application/download');
header('Content-Disposition: attachment; filename=test.xls');
header('Content-Transfer-Encoding: binary');
header('Content-Length: 4');
echo('Test');
exit;
?>

This piece of code is based on the SaveAsDialogue function in the Framework.Functions.php file. If you save the file generated by this code and inspect it with a hex editor, you get, surprise, surprise, not just the word Test but also EF BB BF EF BB BF at the beginning. Commenting out the first three lines (the Vanilla init code) makes the extra bytes disappear. Hence there seems to be something in the Vanilla init code that makes the UTF-8 BOMs appear at the beginning of the script output. I suspect this is something that PHP 5 has brought along, and hence not many people have run into the problem yet.
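The hex-editor check described above can also be scripted. The following is a minimal sketch, assuming the downloaded bytes are already available as a string; `count_leading_boms` is a hypothetical helper written for illustration and is not part of Vanilla or the Attachments extension.

```php
<?php
// Hypothetical helper (not part of Vanilla): counts how many UTF-8 BOMs
// (bytes EF BB BF) sit back to back at the very start of a byte string.
function count_leading_boms($bytes)
{
    $bom = "\xEF\xBB\xBF";
    $count = 0;
    // Walk forward three bytes at a time while each chunk is a BOM.
    while (substr($bytes, $count * 3, 3) === $bom) {
        $count++;
    }
    return $count;
}

// Simulated download body: two BOMs followed by the payload "Test",
// matching the symptom reported in this post.
$body = "\xEF\xBB\xBF\xEF\xBB\xBF" . 'Test';

echo count_leading_boms($body), "\n";     // prints 2
echo bin2hex(substr($body, 0, 6)), "\n";  // prints efbbbfefbbbf
```

In practice the string would come from reading the saved download, for example with `file_get_contents()`, rather than from a literal.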

Could somebody more familiar with the Vanilla code base have a look at the code to diagnose this problem? I tried using the PHP multibyte-functions for controlling the output, with no success (mb_http_output, mb_convert_encoding etc).

BR,
Johan

P.S. Vanilla is an excellent piece of SW! Simple but elegant - good work!

Comments

  • Could it be because of appg/headers.php?
    * Description: Assigns headers to all pages
    */
    // PREVENT PAGE CACHING
    header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
    header ('Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT'); // always modified
    header ('Cache-Control: no-cache, must-revalidate'); // HTTP/1.1
    header ('Pragma: no-cache'); // HTTP/1.0
    // PROPERLY ENCODE THE CONTENT
    header ('content-type: text/html; charset='.$Configuration['CHARSET']);
  • That's a good idea, but I tried commenting out the content-type header and this had no effect. The extra bytes still appear at the beginning of the data sent out from the server. Any other ideas?
  • Has anybody else managed to get the Attachments extension to work on PHP 5? (attach a binary file, such as a word document, and try opening the attachment) For me, the file is saved ok into the server file system, but when requesting the download, the extra bytes are prepended to the file, which screws up the download and causes an error message in the target application (MS Word in this case).
  • Hi again,

    I was now able to solve this problem myself. The problem was that the language files (definitions.php) and Framework.Functions.php are UTF-8 encoded. This seems to be something that at least the PHP 5.1.4 parser reacts to by sending the UTF-8 BOMs at the beginning of these files directly to the HTTP output. Hence there are some "garbage" bytes at the beginning of the attachment files, which screws up binary files.

    To solve this I extracted the one thing from Framework.Functions.php, which seems to require the UTF-8 encoding, namely the CleanupString-function, which has some special character defined within it. I put this function in a separate UTF-8 encoded file Framework.Function.CleanupString.php as follows:

    <?php
    // Lussumo community alias jaz 28.11.2006
    // This is in a separate file because it is UTF-8 encoded and hence messes
    // up downloads at least on PHP 5.1.4. To fix this, this file won't be included
    // in Framework.Functions.php if the current request is a request to download
    // an attachment.
    //
    function CleanupString($InString) { .... }
    ?>
    Then, within Framework.Functions.php, I added this code in place of the CleanupString function:

    if(!isset($_REQUEST['Download'])) {
        include($Configuration['LIBRARY_PATH'].'Framework/Framework.Function.CleanupString.php');
    }
    Finally I converted Framework.Functions.php to ASCII and saved it. The other file which caused problems, the language file, I dealt with in a similar fashion. In init_vanilla.php I replaced the language file include directive with:

    if(!isset($_REQUEST['Download'])) {
        include($Configuration['LANGUAGES_PATH'].$Configuration['LANGUAGE'].'/definitions.php');
    }
    This way I managed to get binary attachments to work on PHP5. I know it's not a very elegant way, but it works. Maybe the real development team can come up with a better way to do it, but whatever the means, I think that this UTF-8 problem should be somehow fixed in coming releases.

    BR,
    Johan
  • I found a better way to get around the attachment problem. This also tackles another problem I found with the extra UTF-8 BOMs, which causes Internet Explorer to not show the 'Apply for membership'-link on the login screen. The trick is to use output buffering when including UTF-8 encoded php-files. First, you start output buffering, then you include the UTF-8 php-file and finally you discard the output buffer, hence getting rid of the extra bytes in the beginning of the output.

    So, for the Attachments 2.0 plugin, do the following instead of the if statements I presented above.

    init_vanilla.php:
    // DEFINE THE LANGUAGE DICTIONARY
    ob_start();
    include($Configuration['LANGUAGES_PATH'].$Configuration['LANGUAGE'].'/definitions.php');
    ob_end_clean();
    Framework.Functions.php:
    ob_start();
    include($Configuration['LIBRARY_PATH'].'Framework/Framework.Function.CleanupString.php');
    ob_end_clean();
    ... where Framework.Function.CleanupString.php is the isolated CleanupString-function, which requires UTF-8 encoding in the PHP-file.
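The buffer-and-discard trick above can be tried in isolation. This is a minimal sketch under stated assumptions: the temporary file stands in for a BOM-prefixed definitions.php, its contents and the `$Loaded` variable are made up for the demo, and the outermost buffer exists only so the script can inspect what would have reached the client.

```php
<?php
// Build a throwaway PHP file that starts with a UTF-8 BOM, standing in for
// a BOM-prefixed language file (file name and contents are hypothetical).
$file = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($file, "\xEF\xBB\xBF" . '<?php $Loaded = true;');

// Outer buffer: only here so this demo can capture the final output.
ob_start();

// The technique from the post: buffer the include, then discard the buffer.
ob_start();
include $file;
ob_end_clean();

echo 'Test';
$sent = ob_get_clean();

unlink($file);

var_dump($Loaded);          // bool(true): the included code still ran
echo bin2hex($sent), "\n";  // prints 54657374, i.e. "Test" with no BOM in front
```

Removing the inner `ob_start()` / `ob_end_clean()` pair from the sketch lets the three BOM bytes through in front of `Test`, which matches the corruption described at the top of the thread.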
  • Is this a failing of Vanilla rather than the Attachments 2.0 extension then?
  • jaz
    edited December 2006
    Actually yes. I think it is a problem of Vanilla not properly supporting PHP5 for UTF-8 encoded files, which as a side-effect, makes binary downloads impossible.

    One more correction to the SaveAsDialogue-function within Vanilla Framework.Functions.php is the following (also useful when using the Attachments extension):

    header('Content-Disposition: attachment; filename="'.$filename.'"');
    By adding the double quotes around the filename, filenames containing spaces can be downloaded properly. Without this fix, only the first part of the filename is interpreted by the browser at download-time. So, instead of "My Excel Sheet.xls" the browser asks where to store a file named "My".
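The quoting fix above can be wrapped in a small function. This is a minimal sketch, not Vanilla's SaveAsDialogue: `content_disposition_header` is a hypothetical helper, and the extra escaping of quotes, backslashes, and line breaks goes beyond what the post describes and is an assumption about what a careful version would want.

```php
<?php
// Hypothetical helper (not Vanilla's SaveAsDialogue): builds the header line
// with the file name wrapped in double quotes, as suggested in the post.
function content_disposition_header($filename)
{
    // Assumption beyond the post: drop CR/LF so the name cannot start a new
    // header line, and backslash-escape quotes and backslashes so they
    // cannot end the quoted value early.
    $safe = str_replace(array("\r", "\n"), '', $filename);
    $safe = addcslashes($safe, "\"\\");
    return 'Content-Disposition: attachment; filename="' . $safe . '"';
}

// prints: Content-Disposition: attachment; filename="My Excel Sheet.xls"
echo content_disposition_header('My Excel Sheet.xls'), "\n";
```

The returned string would be passed to `header()` before any output is sent.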
  • <cheeky>Any chance of this going into Vanilla 1.0.4 and getting released pronto? :D</cheeky>
  • Mark (Vanilla Staff)
    Output buffering is already in place in Vanilla (it starts at the top of appg/settings.php). So I just added the ob_end_clean() function after the framework function file and the language dictionary are included, and then restarted buffering. It *should* resolve the issue.
  • With all the latest fixes, what is the best way to get them all into my site?
  • Mark (Vanilla Staff)
    Probably this week sometime - I have found a couple of bugs with some of the new code and I will keep testing it here on this site until it is resolved.
  • Why isn't this fix for init_vanilla.php in V 1.1.1?
  • I've got some kind of problem with this fix: I simply get a white page and nothing else. When I comment out lines 57 and 58 of appg/init_vanilla.php:
    // ob_end_clean();
    // ob_start();
    everything works again as expected. I can't figure out how this change conflicts with something else.

    Kind regards,
    Markus
  • I also encountered this problem while running Vanilla and WordPress together. It turns out that having gzip enabled was the cause. Since WordPress runs before Vanilla does, its ob_start('ob_gzhandler') call is cleaned out by that ob_end_clean() call. I'm not sure how the handler works, but my guess is that it takes over the HTTP output and sends its buffer contents (nothing) to the client. Or it could be a nested ob_start problem too; I don't know.

    I don't use the Attachments plugin yet, but I may someday, so I don't just want to remove this, although that seems to fix it. There has to be a better way to fix the UTF-8 problem on downloads. Couldn't the Attachments plugin use a separate PHP file to handle the downloads, either outside of Vanilla or using some custom include setup? What about calling ob_end_clean() after including settings.php in a custom plugin file (download.php, for instance)?

    I really need gzipping on my forums: I have a lot of dial-up users and a 50k HTML file. I've already made a plugin for WordPress that scrapes the <head></head> section for all the CSS/JavaScript files and compiles them into one cached file (kind of like Vanilla Packer) that's served gzipped, so the only thing left that isn't is the HTML.
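If the diagnosis above is right, the conflict comes from discarding a buffer that another component opened. One way around it is to open a nested buffer of your own around the include and discard only that one. The following is a minimal sketch under stated assumptions: `marking_handler` is a hypothetical stand-in for ob_gzhandler (it wraps the output in brackets so its effect is visible), and the echoed BOM simulates what a BOM-prefixed include would emit.

```php
<?php
// Stand-in for ob_gzhandler (hypothetical, for illustration): a visible
// transformation makes it easy to see whether the outer handler still runs.
function marking_handler($buffer)
{
    return '[' . $buffer . ']';
}

// Collector buffer: only here so the demo can inspect the final output.
ob_start();

// Another application (WordPress in the comment above) installs its handler first.
ob_start('marking_handler');
$level = ob_get_level();

// Around the include, open a buffer of our own and discard only that one.
ob_start();
echo "\xEF\xBB\xBF";   // simulates the BOM emitted by an included file
ob_end_clean();        // removes just the buffer opened two lines above

// The handler's buffer is still the active one.
$intact = (ob_get_level() === $level);

echo 'page';
ob_end_flush();        // runs marking_handler and passes its result outward
$result = ob_get_clean();

var_dump($intact);     // bool(true)
echo $result, "\n";    // prints [page]
```

By contrast, calling `ob_end_clean()` followed by a plain `ob_start()` at that point would remove the handler's buffer and replace it with one that has no handler, which would fit the blank-page and lost-gzip symptoms reported here.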
  • edited November 2007
    Is this still a known problem? I seem to be experiencing the same symptoms with attachments being corrupted when they are downloaded from the forum. I looked at implementing the same fix mentioned above (comments #6 and #8), but as expected they seem to have already been incorporated into the later releases of Vanilla anyway.

    I'm running:
    Vanilla 1.1.4
    Attachments 2.1

    on a Windows 2000 server running:
    Apache 2.0.54
    PHP 5.0.4

    When a user downloads an attached .xls or .doc they can't be opened. I've checked the uploaded files on the server and they are intact. It's only after download via Vanilla that they seem to be corrupted.

    I'm really not all that used to either Apache or PHP unfortunately (I've been a slave to IIS and .asp for the last few years) - could it be some Apache configuration that's causing my problem?

    Any help at all would be greatly appreciated. I do have attachments working on two other Vanilla forums on a Linux box with PHP4, this particular install sadly has to be on the windows box.



    After some more digging it seems that my files are being prefixed with the following bytes (hex) 20 0A. This seems to relate to the discussion here: http://bugs.php.net/bug.php?id=41491, but I'm still not closer to a solution. Just thought the further info might help someone to help me?

    I wonder if there's an extra line feed at the end of some file that's causing this. I did upload these files from a Linux box to the Windows server come to think of it...
  • edited November 2007
    Problem solved. It was indeed some UNIX linefeeds upsetting the windows box/php.
  • I can confirm: I have this same problem but on a normal linux server running php 4.4.6

    I seem to be experiencing the same symptoms with attachments being corrupted when they are downloaded from the forum. I looked at implementing the same fix mentioned above (comments #6 and #8), but as expected they seem to have already been incorporated into the later releases of Vanilla anyway.

    I'm running:
    Vanilla 1.1.4
    Attachments 2.1

    on a Linux server running:
    Apache/1.3.37 (Unix) PHP/4.4.6 with Suhosin-Patch

    When a user downloads an attached .xls or .doc they can't be opened. I've checked the uploaded files on the server and they are intact. It's only after download via Vanilla that they seem to be corrupted.

    ------

    Strangely PDF files are not a problem.

    The mime-type patch here does not seem to solve it.

    What's the best way to go about solving this?
  • Was this issue ever solved? I am running Vanilla 1.1.4 and Attachments 2.1, and I am still experiencing this problem. Word and Excel files upload to the server fine, but they are corrupted when downloaded. I tried the fixes explained above, but had no luck. Any help, please?
This discussion has been closed.