This commit only affects Chinese and Japanese, where search terms are processed by cppjieba prior to searching.
The term 白名单 becomes 名单 白名单 after it is processed by cppjieba. However,
白名单 is not tokenized as such by cppjieba when it
appears in a string of text. The workaround we took here is to match on
白名单 when terms are processed by cppjieba.
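
To illustrate the difference between requiring every segmented term and accepting any one of them, here is a minimal Ruby sketch. It does not call cppjieba and is not the code in this commit: the `matches?` helper is hypothetical, the query terms are hard-coded from the description above, and the document tokens are made up for illustration.

```ruby
require "set"

# Hypothetical helper, not part of Discourse or cppjieba.
# mode :all requires every query term to be present (AND-style matching).
# mode :any accepts a document containing at least one term (OR-style matching).
def matches?(query_terms, document_tokens, mode:)
  tokens = document_tokens.to_set
  case mode
  when :all then query_terms.all? { |term| tokens.include?(term) }
  when :any then query_terms.any? { |term| tokens.include?(term) }
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end

# Segmented form of the search term, as described above.
query_terms = ["名单", "白名单"]

# Assumed tokens for a document in which 白名单 was not kept as a single token.
# These values are illustrative only, not real cppjieba output.
document_tokens = ["白", "名单"]

strict_match = matches?(query_terms, document_tokens, mode: :all)
partial_match = matches?(query_terms, document_tokens, mode: :any)
```

Under these assumptions the strict check fails because 白名单 never appears as a token in the document, while the partial check succeeds on 名单 alone.
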
The change here will result in partial matches of terms, making search
slightly less accurate. For example,
when processed by cppjieba and the change here means that we will match
指南 instead of matching on both terms. This is a
conscious trade-off: we think having a poor
result is better than having no result for Chinese and
Japanese. To properly support search for Chinese and Japanese languages,
we may look into integrating the PGroonga extension into Discourse in