PHP regular expression to match English/Latin characters only

Today at work I came across a task that turned out to be much simpler than I originally expected.  We have a site with some user registration forms.  The site is translated into a number of languages, but due to regulatory procedures, we have to force users to enter their registration details in English only, using Latin characters, numbers, and punctuation.

I refreshed my knowledge of Unicode and PCRE, and then came up with the following method, which seems to do the job just fine.

/**
 * Check that given string only uses Latin characters, digits, and punctuation
 *
 * @param string $string String to validate
 * @return boolean True if Latin only, false otherwise
 */
public function validateLatin($string) {
    $result = false;

    if (preg_match("/^[\w\d\s.,-]*$/", $string)) {
        $result = true;
    }

    return $result;
}

In other words, just a standard regular expression with no Unicode trickery.  Adding the ‘/u’ modifier would break this check: in Unicode mode, \w also matches letters from non-Latin scripts, so almost any text would pass.  Good to know.
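To illustrate the effect of the modifier, here is a minimal sketch.  The helper name isAsciiLatin is hypothetical and not part of the method above; it spells out the ASCII ranges explicitly instead of relying on \w, so the result does not depend on how \w is interpreted.  The sample string is "asö" written as raw UTF-8 bytes.

```php
<?php
/**
 * Hypothetical helper (an assumption for illustration, not the original method):
 * accept only ASCII letters, digits, underscore, whitespace, dot, comma, hyphen.
 */
function isAsciiLatin(string $string): bool
{
    // Explicit ranges replace \w, so no Unicode letter can slip through.
    return preg_match('/^[A-Za-z0-9_\s.,-]*$/', $string) === 1;
}

$umlaut = "as\xc3\xb6"; // "asö" encoded as UTF-8

// With /u, \w is Unicode-aware, so the umlaut is accepted by the pattern.
var_dump(preg_match('/^[\w\s.,-]*$/u', $umlaut) === 1); // bool(true)

// The explicit ASCII class rejects the umlaut and accepts plain English input.
var_dump(isAsciiLatin($umlaut));            // bool(false)
var_dump(isAsciiLatin("John Smith, Jr."));  // bool(true)
```

Whether to spell out the ranges or keep \w is mostly a matter of taste; the explicit class just makes the "Latin only" intent visible in the pattern itself.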

15 thoughts on “PHP regular expression to match English/Latin characters only”


    1. One could do that too, of course. Nothing wrong with that. It’s only a matter of coding style. I personally prefer to have an if block. In case I need to debug or extend it, I can just add extra statements inside the block.


  1. I do not get why the /u modifier should break the whole thing.

    When I encode my PHP file correctly in UTF-8, these two function calls work as expected:

    single-quoted: validateLatin('as\xc3\xb6'); => false

    double-quoted: validateLatin("as\xc3\xb6");

    Which basically means that if the form sending the login credentials correctly submits UTF-8, you can also use German umlauts etc.

    If it does not, you will end up with a string that has backslashes in it, so you should be fine with the /u modifier.

    Did I miss something?


    1. I had a unit test which was not encoded properly, I guess. The /u modifier tells preg_match to treat the string as Unicode. \w doesn’t match a Unicode character. That’s why it’s failing for me.


  2. I am a noob to regex. I have been trying to make your func work for multiple paragraphs. Do you have any suggestion to expand this regex to allow for more than just 1 string? Thank you.
