non-ascii symbols cause errors

lukoie New
edited June 2010 in Vanilla 2.0 - 2.8
If one of my users wants to change their name to something non-ASCII, they get an error message saying "username can only contain letters, numbers, underscores and must be between 3 and 20 characters long"
So they can't use Cyrillic in their name.

But when you create a user, it can be created with a Cyrillic name. So if that user wants to change their signature, they have to edit their profile, which can't be done, because Vanilla says the user name is invalid!
Also, if you create a user with a Cyrillic name, you can't see their activity page, because instead of opening the URL http://sitename.com/profile/activity/2/nick it tries to open http://sitename.com/profile/activity/2
So with a Cyrillic name the activity page in the user profile fails to open, but the "discussions" page in the user profile is OK.

Comments

  • SS ✭✭
    Redefine ValidateUsername() function
  • huh?
  • edited March 2011
    Hi there, I came across your discussion while looking for a solution to this language issue. After a lot of research, I'm posting the code that worked for me, in case you (or anyone else :P) are still looking for it.

    1) Edit this file: {Your Vanilla Folder}/library/core/functions.validation.php
    2) Look for this line:
    '/^([\d\w_]{3,20})?$/si'
    and replace it with this:
    '/^([\p{L}\p{N}\p{Pd}\p{Pc}\p{Lm}\p{M}]{3,20}+)?$/u'
    3) Save the file and you're done!

    What we have just done is change the regex filter from ASCII-only to Unicode,
    which allows letters from any language. The trailing u modifier tells PHP to treat the pattern and the username as UTF-8; without it the \p{..} classes are applied byte by byte, so multi-byte characters such as Cyrillic letters do not match reliably.
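
To try the two patterns outside Vanilla, here is a minimal sketch. The function name isValidUsername() and the sample names are made up for illustration and are not part of Vanilla. The Unicode pattern is used with the u modifier, which PHP needs in order to read the name as UTF-8, and the sketch assumes a PCRE build with Unicode property support.

```php
<?php
// Hypothetical helper for illustration only; this is not Vanilla's validator.
// Assumes PHP's PCRE was built with UTF-8 and Unicode property support.

const ASCII_PATTERN   = '/^([\d\w_]{3,20})?$/si';
const UNICODE_PATTERN = '/^([\p{L}\p{N}\p{Pd}\p{Pc}\p{Lm}\p{M}]{3,20}+)?$/u';

function isValidUsername(string $name, string $pattern): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidUsername('lukoie', ASCII_PATTERN));   // bool(true)
var_dump(isValidUsername('Лукойе', ASCII_PATTERN));   // rejected, as in the original report
var_dump(isValidUsername('Лукойе', UNICODE_PATTERN)); // bool(true)
var_dump(isValidUsername('ab', UNICODE_PATTERN));     // bool(false): shorter than 3 characters
```

With the u modifier the {3,20} limit counts characters rather than bytes, so a 6-letter Cyrillic name counts as 6 and not 12.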

    You can also customize your filter to allow or disallow specific characters by adding or removing Unicode components (the "\p{x}" components within the regex filter). Here is a small list of what each of those components means:

    \p{L}: any kind of letter from any language.
    \p{Lm}: a special character that is used like a letter.
    \p{M}: a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).
    \p{N}: any kind of numeric character in any script.
    \p{Z}: any kind of whitespace or invisible separator.
    \p{Pd} or \p{Dash_Punctuation}: any kind of hyphen or dash.
    \p{Pc}: a punctuation character such as an underscore that connects words.
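
The classes listed above can be checked one at a time. This is only a sketch: matchesProperty() is a hypothetical helper, the sample characters are my own picks, and it assumes the u modifier is available in your PCRE build.

```php
<?php
// Illustrative only: test single characters against one Unicode property class.
// Requires the u modifier and a PCRE build with Unicode property support.

function matchesProperty(string $property, string $char): bool
{
    return preg_match('/^' . $property . '$/u', $char) === 1;
}

$samples = [
    '\p{L}'  => 'ж',        // Cyrillic letter
    '\p{N}'  => '٣',        // Arabic-Indic digit three
    '\p{Pd}' => '-',        // hyphen-minus
    '\p{Pc}' => '_',        // underscore
    '\p{Z}'  => ' ',        // space
    '\p{M}'  => "\u{0301}", // combining acute accent
];

foreach ($samples as $property => $char) {
    printf("%-6s %s\n", $property, matchesProperty($property, $char) ? 'match' : 'no match');
}
```

Dropping a class from the username pattern simply removes that group of characters, for example leaving out \p{Pd} disallows hyphens and dashes.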

    for a complete list of Unicode components, refer to this url:
    http://www.regular-expressions.info/unicode.html

    hope you find it easy to understand :)
    good luck!!

  • judgej
    edited March 2011
    How long have these regex matching patterns been around? They look pretty good, but are new to me. I'm just wondering if there is a minimum PHP version required to guarantee support for this? And if not, which regular expression library versions are needed?

    Also, I assume this will only work on specific encodings, i.e. UTF-8 in this case, and not any other Unicode encodings. I am wondering whether these preg rules need to be handled in some central place in Vanilla, so that different site-wide encodings can automatically be adjusted for.
  • @judgej For the Unicode PCRE you need a particular version of that library which not everyone has. We've started testing for Unicode support and falling back to "good enough" versions when it's not available. See Gdn_Format::Url() for an example.
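
The "test and fall back" idea can be sketched roughly like this. It is not the code from Gdn_Format::Url(); the function names unicodeRegexSupported() and usernamePattern() are hypothetical, and the fallback is simply the original ASCII-only pattern.

```php
<?php
// A sketch of runtime detection, not Vanilla's actual implementation.
// unicodeRegexSupported() and usernamePattern() are hypothetical names.

function unicodeRegexSupported(): bool
{
    // If PCRE lacks UTF-8 or Unicode property support, the pattern fails to
    // compile: preg_match() emits a warning (suppressed here) and returns false.
    return @preg_match('/^\p{L}$/u', 'ж') === 1;
}

function usernamePattern(): string
{
    return unicodeRegexSupported()
        ? '/^([\p{L}\p{N}\p{Pd}\p{Pc}\p{Lm}\p{M}]{3,20}+)?$/u'
        : '/^([\d\w_]{3,20})?$/si'; // ASCII-only "good enough" fallback
}

$pattern = usernamePattern();
var_dump(preg_match($pattern, 'nick_01') === 1); // bool(true) with either pattern
```

Note that the u modifier only understands UTF-8, so a site running another encoding would need to convert names before validating them.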