"Undefined foreign content" when fetching wordpress post
I set up Vanilla 2.0.18.4 (w/ Vanilla plugin) and WordPress (w/ Vanilla Forums plugin). All updated to the latest version.
I connected the Vanilla install through the WP plugin and chose the forum category where the WP posts end up.
When someone comments, it does create a new post in the chosen category, but the content is just a broken link to the post and the title is "Undefined foreign content".
What should I do?
Best Answer
x00
If Vanilla cannot access the site to scrape the title, it will not be able to find it. The facility uses FetchPageInfo, which uses ProxyRequest, which requires cURL. FetchPageInfo also requires DOMDocument. It first searches for the title element and the meta description; if it doesn't find a description, it looks for the first p element with content longer than 90 characters, and chops it at 400 characters. I'm not really sure why they implemented it this way. Personally I would have used the excellent JSON API that is available as a plugin for WordPress.
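If you want to rule out missing dependencies first, a minimal sketch like this (a standalone script, not part of Vanilla) confirms that both cURL and DOMDocument are available to PHP:

```php
<?php
// Minimal dependency check for the scraping chain described above:
// ProxyRequest needs the cURL extension, FetchPageInfo needs DOMDocument.
if (!function_exists('curl_init')) {
    echo "cURL extension is missing or disabled.\n";
} else {
    echo "cURL is available.\n";
}

if (!class_exists('DOMDocument')) {
    echo "DOMDocument (php-xml) is missing or disabled.\n";
} else {
    echo "DOMDocument is available.\n";
}
```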
grep is your friend.
Answers
Vanilla 2.0.18.4 (w/ Vanilla plugin) - I meant w/ the <embed> Vanilla plugin.
Well then, what's the problem?
I don't know. I'm pointing you in the right direction, but I'm not going to investigate further.
A simple gotcha is a private site. Scraping needs to be done on public content; if the content isn't public, this solution is not suitable.
I've made the blog inaccessible to non-administrators with a plugin; could that be the problem?
Likely. Basically, if it can't access the page publicly via cURL, it can't scrape the title.
This solution only works with public content.
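To check whether the post is actually reachable by an anonymous client, a hedged sketch along these lines (the URL is a placeholder) mimics what the scraper does and prints the HTTP status:

```php
<?php
// Fetch the post the same way an anonymous visitor (or Vanilla's scraper)
// would: plain cURL, no cookies, no login session.
$Url = 'http://yourblog.example/2012/05/some-post/';  // placeholder URL

$Ch = curl_init($Url);
curl_setopt($Ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($Ch, CURLOPT_FOLLOWLOCATION, TRUE);
$Body = curl_exec($Ch);
$Code = curl_getinfo($Ch, CURLINFO_HTTP_CODE);
curl_close($Ch);

// Anything other than 200 here (or an empty body) means the scraper
// cannot see the page either.
echo "HTTP status: $Code\n";
echo ($Body === FALSE) ? "Request failed: no response body\n"
                       : "Fetched " . strlen($Body) . " bytes\n";
```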
Nope. I tried disabling it and commenting on a post; I still get the "Undefined foreign content" post from the "System" user with a broken link to the WP post.
Well, I would work through the dependencies.
What do you mean?
I said I could only take this so far; that is my lot. I mentioned the dependencies of this system above. If you don't know how to check them, you need to find someone who can do that for you. It is not something that can just be sorted out in a discussion.
I'm surprised to be the only one having this issue. I performed various Google searches, and apparently no one has had the same problem.
You get "undefined foreign content" when Vanilla fails to retrieve the page in question. So, either your page is unavailable to unauthenticated users (ie. in draft mode), or curl is not set up or working properly.
This is the cURL configuration of the server I'm using.
Using this advice, it now works.
But why does it fetch "BLOG_NAME » BLOG_POST" as the title, and the same thing plus a thumbnail of the blog's logo and a meta link as the content? Shouldn't it fetch the post's name and content?
That was their design. I'm not a fan of it, but hey, it fetches whatever the title element is. So what you could do is make sure the title of the post is exactly that.
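On the WordPress side, one hedged way to do that (a sketch for the theme's functions.php, using the classic wp_title filter; the function name here is made up) is to drop the blog name and separator on single-post pages:

```php
<?php
// In the theme's functions.php: make the <title> element exactly the post
// title on single-post pages, so the scraper picks up the post name instead
// of "BLOG_NAME » BLOG_POST". Hypothetical function name.
add_filter('wp_title', 'plain_single_post_title', 10, 2);

function plain_single_post_title($title, $sep) {
    if (is_single()) {
        // single_post_title() with $display = FALSE returns just the post title.
        return single_post_title('', FALSE);
    }
    return $title;
}
```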
You can also override the FetchPageInfo function. Create conf/bootstrap.before.php and put:

```php
<?php if (!defined('APPLICATION')) exit();

if (!function_exists('FetchPageInfo')) {
   /**
    * Examines the page at $Url for title, description & images. Be sure to
    * check the resultant array for any Exceptions that occurred while
    * retrieving the page.
    * @param string $Url The url to examine.
    * @param integer $Timeout How long to allow for this request. Default Garden.SocketTimeout or 1, 0 to never timeout. Default is 0.
    * @return array an array containing Url, Title, Description, Images (array) and Exception (if there were problems retrieving the page).
    */
   function FetchPageInfo($Url, $Timeout = 0) {
      $PageInfo = array(
         'Url' => $Url,
         'Title' => '',
         'Description' => '',
         'Images' => array(),
         'Exception' => FALSE
      );
      try {
         $PageHtml = ProxyRequest($Url, $Timeout, TRUE);
         $Dom = new DOMDocument();
         @$Dom->loadHTML($PageHtml);

         // Page Title
         $TitleNodes = $Dom->getElementsByTagName('title');
         $PageInfo['Title'] = $TitleNodes->length > 0 ? $TitleNodes->item(0)->nodeValue : '';

         /*
          * Do some string manipulation here, e.g. strip the blog name prefix:
          * $PageInfo['Title'] = substr($PageInfo['Title'], stripos($PageInfo['Title'], '» ') + strlen('» '));
          */

         // Page Description
         $MetaNodes = $Dom->getElementsByTagName('meta');
         foreach ($MetaNodes as $MetaNode) {
            if (strtolower($MetaNode->getAttribute('name')) == 'description')
               $PageInfo['Description'] = $MetaNode->getAttribute('content');
         }

         // Keep looking for page description?
         if ($PageInfo['Description'] == '') {
            $PNodes = $Dom->getElementsByTagName('p');
            foreach ($PNodes as $PNode) {
               $PVal = $PNode->nodeValue;
               if (strlen($PVal) > 90) {
                  $PageInfo['Description'] = $PVal;
                  break;
               }
            }
         }

         if (strlen($PageInfo['Description']) > 400)
            $PageInfo['Description'] = SliceString($PageInfo['Description'], 400);

         // Page Images (retrieve first 3 if bigger than 100w x 300h)
         $Images = array();
         $ImageNodes = $Dom->getElementsByTagName('img');
         foreach ($ImageNodes as $ImageNode) {
            $Images[] = AbsoluteSource($ImageNode->getAttribute('src'), $Url);
         }

         // Sort by size, biggest one first
         $ImageSort = array();
         // Only look at first 10 images (speed!)
         $i = 0;
         foreach ($Images as $Image) {
            $i++;
            if ($i > 10)
               break;

            list($Width, $Height, $Type, $Attributes) = getimagesize($Image);
            $Diag = (int)floor(sqrt(($Width * $Width) + ($Height * $Height)));
            if (!array_key_exists($Diag, $ImageSort))
               $ImageSort[$Diag] = $Image;
         }
         krsort($ImageSort);
         $PageInfo['Images'] = array_values($ImageSort);
      } catch (Exception $ex) {
         $PageInfo['Exception'] = $ex;
      }
      return $PageInfo;
   }
}
?>
```

Do string manipulation as appropriate.
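For instance, a hedged example of that manipulation, a safer variant of the commented line in the code above, cuts everything up to and including the "» " separator out of the scraped title:

```php
// Example for the marked spot above: strip a leading "BLOG_NAME » "
// prefix from the scraped title, but only if the separator is present.
$Pos = strpos($PageInfo['Title'], '» ');
if ($Pos !== FALSE) {
    $PageInfo['Title'] = trim(substr($PageInfo['Title'], $Pos + strlen('» ')));
}
```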