HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.
Searching
stevevisokio
New
Vanilla 2 search uses MySql full text searching which doesn't behave intuitively. In fact, our users find the our forums' search completely unusable. The information about how to correct this as best possible is somewhat buried, as are the customisation options for searching.
Using Google Custom Search is a poor solution, because you have to wait for the next page crawl.
A mainstream search function should try to mirror the same query syntax used for Google, since that is what most users will expect. You don't need the full Google syntax. You just need the following rules:
The problems with the existing behaviour are:
Using Google Custom Search is a poor solution, because you have to wait for the next page crawl.
A mainstream search function should try to mirror the same query syntax used for Google, since that is what most users will expect. You don't need the full Google syntax. You just need the following rules:
- term => all discussions with "term" anywhere (discussion title, discussion body, comment body)
- term1 term2 => all discussions with "term1" AND "term2" anywhere
- "some phrase" => all discussions with "some phrase" anywhere
- "some phrase" term2 => all discussions with "some phrase" AND "term2" anywhere.
The problems with the existing behaviour are:
- You enter 2 words, and get something along the lines of word1 OR word2, completely counter what you expect
- Words of 3 characters or less result in no hits at all
- There's no explanation of existing behaviour
- You often get back either nothing, or completely irrelevant results. Often you *know* a page exists but just can't find it.
- The SQL query appears to be wrongly structured; with boolean searching, and the query "+term1 +term2", you only get results where *just* the discussion, or *just* the a single comment, contains both terms, but what I'm after is where the [discussion and all comments] contains both terms anywhere - for example, the discussion contains term1, and the 3rd comment contains term2.
- Provide a list of the configuration options with explanation. I've discovered this option:
$Configuration['Garden']['Search']['Mode'] = 'matchboolean'; // matchboolean, match, boolean, like
and worked out that:- match = default behaviour; natural language searching which attempts to sort by relevance and largely fails, in part due to the min-4-character word indexing, and OR operator
- boolean = use mysql boolean searching, where you enter "+term1 +term2 -term3" to mean "term1 AND term2 AND NOT term 3"
- matchboolean = hybrid; if there are any pluses or minuses, it uses boolean searching - this is probably the best default option (in the absence of below improvements to vanilla search), but you'll need to explain to users that the Google search
eggs sausages "scrambled eggs"
needs to be written+eggs +sausages +"scrambled eggs"
- like = use SQL LIKE, low performance but OK for small forums
- Change the SQL query so it is effectively searching the page you see when you view the discussion - i.e. the discussion title, body, and all comment bodies. Allow each term to "hit" anywhere in the discussion and comments.
- Provide documentation explaining how to remove the mysql 4 character min word length and stop words. In short:
Edit /etc/mysql/my.conf
Under [mysqld], add something like this:ft_min_word_len = 1 ft_max_word_len = 30 ft_stopword_file = ''
You might need to restart (sudo /etc/init.d/mysql restart)
Log into mysql (mysql -u username -p databasename) and execute:
REPAIR TABLE GDN_Discussion QUICK;
REPAIR TABLE GDN_Comment QUICK; - Lastly have a preprocessor in the PHP which executes the SQL. Before constructing the full text search query (using boolean mode), it should parse the search query using Google syntax and convert into Mysql boolean syntax.
So, when the user enterseggs sausages "scrambled eggs" -mushrooms
vanilla supplies+eggs +sausages +"scrambled eggs" -mushrooms
to mysql "in boolean mode".
3
Comments
Iterate through all characters in the search query.
Track whether we are "in quotes" or not.
When not "in quotes", and a word start is encountered (the first non-space character, or any non-space character following a space):
If it is a plus, keep
If it is a minus, keep
Otherwise, insert a plus.
I think that's right. I'm going to try hacking this up on our site's PHP.
There was an error rendering this rich post.
Searching produces an overhead that needs to be factored. It is also a good idea to promote tagging, you can enhance the tag search with multiple tags.
It is best using a dedicated search engine like Lucene or Sphinx if you are really serious about having search functionality.
grep is your friend.
There was an error rendering this rich post.
Vanilla Forums COO [GitHub, Twitter, About.me]
The fact of the matter is MySQL is not built for search. The other fact is that no matter how good of a search engine we use for a site, nothing will beat Google. And real-time searching is a a goal that is really not achievable for full-text search. In a way, putting an internal search on a forum is just for the graphic design.
Our default search sucks because MySQL full-text search sucks and that's what we have to support. However, search is not our goal with Vanilla. Building a great forum is.
We may change the default search, but if we do the following will be used.
1. A simple like search on search terms.
2. More search options like author and date and tags.
Another question: in other forums software I've found a "restore searching database" (or similar) function to improving searching results. In Vanilla there's something similar? I've tried to searching something (in a 250Mb database) but the results are max on 2 pages. I expected more.
p.s.: more I use Vanilla (just installed with some problems of collation) more I like it...
I'm not trying to completely solve the problem of search and come up with a perfect solution. I'm only trying to do an 80:20 - to make a few simple tweaks that measurably improve the user experience for the most common search requirements, while keeping the basic paradigm (MySQL full text searching).
(Unfortunately, we can't rely on Google CSE, since half of our site is restricted-access, only available to logged-in users in permissioned groups. For public sites, a "fig-leaf" search isn't an issue if you can swap it for CSE, but for ours, it's a really big deal, since the entire point of our forums is to be a searchable knowledge repository for users of our product).
To sum up, I'm trying to allow users to search, for example, for
eggs bacon -"scrambled eggs"
as they would in Google and have that be interpreted as "Any discussion or comment containing"eggs" AND "bacon" AND NOT EXACT PHRASE "scrambled eggs"
, rather than the default behaviour of natural language searching which returns results that don't containing all required phrases and misses out words 3 letters or shorter.The limitations of the changes I'm outlining are:
- Unlike Google, search terms must be exact. You can't search for "colour" and get hits for "color" or "colouring"
- I don't know how this will work for unicode. It certainly won't work for multi-byte, but there are a great number of non-multibyte locales using Vanilla, I should imagine. I would expect unicode to work OK; all I'm doing is inserting a + character before every search term.
- They don't address the issue where all your search terms must match against a single discussion or comment. So, if you are searching for [term1 term2], and your discussion contains "term1" in one comment and "term2" in another comment, you still won't get any results.
- Performance - I don't know how the full-text indexing tweaks will perform on very large sites; on our modest 1000+ discussion site, they have no noticeable impact.
For those who wish to try this out, here are the instructions:conf/config.php
, add the following to the end of the 'Garden' section:$Configuration['Garden']['Search']['Mode'] = 'google';
applications/dashboard/models/class.searchmodel.php
Line numbers relate to Vanilla 2.0.17.3.
After "Reset" function definition, around line 71: In "Search" function definition, around line 82, between the "ForceSearchMode" and "matchboolean" 'if' blocks:
1. Create a plugin.
2. Copy your version of class.searchmodel.php into /plugins/yourplugin/class.searchmodel.php.
The framework will use your search model instead of the core when your plugin is enabled.
Yes, please share @stevevisokio
Is this true? I have been trying this with no luck....@todd
This is true. Make sure you clear your /cache/*.ini if you are creating files in development.
Nice call. It is working as intended. Thanks Todd.