PHP regular expression to match English/Latin characters only

Today at work I came across a task which turned out to be much easier and simpler than I originally thought it would be. We have a site with some user registration forms. The site is translated into a number of languages, but due to regulatory procedures we have to force users to enter their registration details in English only, using Latin characters, numbers, and punctuation.

I refreshed my knowledge of Unicode and PCRE, and then came up with the following method, which seems to do the job just fine.

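/**
 * Check that the given string only uses Latin letters, digits, whitespace,
 * and basic punctuation (period, comma, hyphen, underscore)
 * @param string $string String to validate
 * @return boolean True if Latin only, false otherwise
 */
public function validateLatin($string) {
    $result = false;

    // Without /u (and in the default C locale), \w matches only ASCII
    // letters, digits and underscore, so any non-ASCII byte fails the match
    if (preg_match("/^[\w\d\s.,-]*$/", $string)) {
        $result = true;
    }

    return $result;
}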

In other words, just a standard regular expression with no Unicode trickery. The ‘/u’ modifier would break it: in PHP, /u also makes \w, \d and \s Unicode-aware, so letters from any script (accented Latin, Cyrillic, Greek, and so on) would pass the check. Good to know.
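To try the check outside a class and see what /u changes, here is a minimal standalone sketch. It assumes PHP 7 or later (for the type declarations and the \u{...} escapes) and PHP's usual PCRE build, where /u also enables Unicode properties. The $unicode flag is a hypothetical addition for comparison only, \d is dropped because \w already covers digits without /u, and the sample strings are made up.

```php
<?php
// Standalone sketch of the Latin-only check, runnable outside a class.
// Assumptions: PHP 7+ and a PCRE build where /u also enables Unicode
// properties, so \w, \d and \s stop being ASCII-only.
// The $unicode flag is hypothetical, added only to compare both modes.
function validateLatin(string $string, bool $unicode = false): bool
{
    // \d is dropped: without /u, \w already covers [A-Za-z0-9_].
    $pattern = '/^[\w\s.,-]*$/' . ($unicode ? 'u' : '');
    return preg_match($pattern, $string) === 1;
}

$samples = [
    'John Smith, 42 Baker St.',           // ASCII only
    "J\u{00F6}rg",                        // "Jörg" encoded as UTF-8
    "\u{0418}\u{0432}\u{0430}\u{043D}",   // "Иван" (Cyrillic) encoded as UTF-8
];

foreach ($samples as $s) {
    printf(
        "%s: without /u %s, with /u %s\n",
        $s,
        validateLatin($s) ? 'accepted' : 'rejected',
        validateLatin($s, true) ? 'accepted' : 'rejected'
    );
}
```

On such a build, both non-English names should be rejected without /u and accepted with it, while characters outside the class (such as "@") are rejected either way.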


  1. I do not get why the /u modifier should break the whole thing.

    When I encode my PHP file correctly in UTF-8, these two function calls work as expected:

    single-quoted: validateLatin('as\xc3\xb6'); => false

    double-quoted: validateLatin("as\xc3\xb6");

    This basically means that if the form which sends the login credentials correctly submits UTF-8, you can also use German umlauts, etc.

    If it does not, you will end up with a string that has backslashes in it, so you should be fine with the /u modifier.

    Did I miss something?
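The two calls in the comment behave differently because PHP expands \xHH escapes only inside double-quoted strings. A minimal sketch, assuming PHP 7 or later and a PCRE build where /u makes \w Unicode-aware; the variable names are illustrative:

```php
<?php
// PHP expands \xHH escapes only inside double-quoted strings.
$single = 'as\xc3\xb6';   // 10 ASCII chars: a, s, \, x, c, 3, \, x, b, 6
$double = "as\xc3\xb6";   // 4 bytes: a, s, 0xC3, 0xB6 ("asö" in UTF-8)

var_dump(strlen($single));
var_dump(strlen($double));

// The backslash is outside the character class, so the single-quoted
// string fails even with /u; under /u (with Unicode properties) "ö"
// counts as \w, so the double-quoted string passes.
var_dump(preg_match('/^[\w\s.,-]*$/u', $single));
var_dump(preg_match('/^[\w\s.,-]*$/u', $double));
```

So the single-quoted call tests plain ASCII text containing backslashes, while only the double-quoted call actually tests a UTF-8 "ö".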
