PHP regular expression to match English/Latin characters only

Today at work I came across a task which turned out to be much simpler than I originally thought it would be. We have a site with some user registration forms. The site is translated into a number of languages, but due to regulatory procedures, we have to force users to enter their registration details in English only, using Latin characters, numbers, and punctuation.

I refreshed my knowledge of Unicode and PCRE, and then came up with the following method, which seems to do the job just fine.

/**
 * Check that given string only uses Latin characters, digits, and punctuation
 *
 * @param string $string String to validate
 * @return boolean True if Latin only, false otherwise
 */
public function validateLatin($string) {
    $result = false;

    if (preg_match("/^[\w\d\s.,-]*$/", $string)) {
        $result = true;
    }

    return $result;
}

In other words, just a standard regular expression with no Unicode trickery. The ‘/u’ modifier would cause this to totally malfunction and match everything. Good to know.
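For anyone who would rather not depend on what `\w` means under a given locale or modifier, here is a minimal sketch that spells out the ASCII ranges explicitly. The function name `isLatinOnly` is hypothetical and not part of the code above; the underscore is kept in the class because `\w` accepts it too.

```php
<?php
/**
 * Sketch only: same idea as validateLatin(), but with the ASCII ranges
 * written out. The name isLatinOnly is an assumption for illustration.
 */
function isLatinOnly(string $string): bool
{
    // A-Z, a-z, 0-9, underscore, whitespace, dot, comma, hyphen.
    // The hyphen comes last in the class, so it is a literal character.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[A-Za-z0-9_\s.,-]*$/', $string) === 1;
}

var_dump(isLatinOnly('John Smith, Jr.'));  // bool(true)
var_dump(isLatinOnly('Иван Петров'));      // bool(false)
```

Because the class lists only ASCII characters, any byte of a multi-byte UTF-8 character fails the match.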

15 thoughts on “PHP regular expression to match English/Latin characters only”

Shein Alexey says:

August 19, 2011 at 7:27 am

Why not just return preg_match("/^[\w\d\s.,-]*$/", $string); ?

1. Leonid Mamchenkov says:
  
  August 19, 2011 at 9:15 am
  
  One could do that too of course. Nothing wrong with that. It’s only a matter of the coding style. I personally prefer to have an if block. In case that I need to debug or extend it, I can just add extra statements inside the block.
  
drwitt says:

August 20, 2011 at 9:20 am

Nice one!
– Do not dots and hyphens/’minus signs’ have to be escaped inside these square brackets?

1. Leonid Mamchenkov says:
  
  August 20, 2011 at 1:10 pm
  
  Nope, they don’t. :)
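A small sketch to illustrate, with hypothetical helper names: inside square brackets a dot is just a dot, and a hyphen is literal when it comes last. It only forms a range when it sits between two characters.

```php
<?php
// Hypothetical helper: true if $s consists only of dots, commas, and hyphens.
function onlyPunct(string $s): bool
{
    // '.' is literal inside [...]; '-' is literal because it is the last item.
    return preg_match('/^[.,-]+$/', $s) === 1;
}

// Hypothetical helper: here the hyphen sits between 'a' and 'c', so it is a range.
function inRangeAToC(string $s): bool
{
    return preg_match('/^[a-c]+$/', $s) === 1;
}

var_dump(onlyPunct('.-,'));   // bool(true)
var_dump(onlyPunct('x'));     // bool(false): the dot is not a wildcard here
var_dump(inRangeAToC('b'));   // bool(true)
var_dump(inRangeAToC('-'));   // bool(false): the hyphen itself is not in the range
```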
  
  1. drwitt says:
    
    August 20, 2011 at 1:20 pm
    
    ok, …always learning… Thx :-)
    
Arne Tarara says:

August 23, 2011 at 5:45 am

I do not get why the /u modifier should break the whole thing.

When I encode my PHP file correctly in UTF-8, these two function calls work as expected:

single-quoted: validateLatin('as\xc3\xb6'); => false

double-quoted: validateLatin("as\xc3\xb6");

Which basically means that if the form which sends the login credentials correctly submits UTF-8, you can also use German umlauts etc.

If it does not, you will end up with a string that has backslashes in it, so you should be fine with the /u modifier.

Did I miss something?

1. Leonid Mamchenkov says:
  
  August 23, 2011 at 8:32 am
  
  I had a unit test which was not encoded properly, I guess. The /u modifier tells preg_match to treat the string as Unicode. \w doesn’t match a Unicode character. That’s why it’s failing for me.
  
Rafael Duarte says:

December 16, 2012 at 4:42 pm

Love you man….

Pragam says:

May 21, 2013 at 10:58 am

Very good, nice one………..

ganaysa says:

August 10, 2014 at 3:36 pm

what about this:
preg_match("/[^\x00-\x7F]/",$name)

1. Leonid Mamchenkov says:
  
  August 11, 2014 at 12:09 am
  
  This will allow for too much. You could narrow it to the range from space (\x20) to tilde (\x7e), but you’ll still let a whole bunch of brackets and slashes in. YMMV I guess.
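For reference, a sketch of that narrowed range written as a whole-string check. The helper name `isPrintableAscii` is hypothetical, and the `D` modifier is my addition so that `$` does not accept a trailing newline.

```php
<?php
// Hypothetical helper: true if $s contains only printable ASCII,
// i.e. bytes from space (\x20) through tilde (\x7E).
function isPrintableAscii(string $s): bool
{
    // D (PCRE_DOLLAR_ENDONLY) makes $ match only at the very end of the string.
    return preg_match('/^[\x20-\x7E]*$/D', $s) === 1;
}

var_dump(isPrintableAscii('a/b [c] {d}'));   // bool(true): brackets and slashes pass
var_dump(isPrintableAscii("caf\xc3\xa9"));   // bool(false): UTF-8 bytes are outside the range
var_dump(isPrintableAscii("line\n"));        // bool(false): newline is below \x20
```

Note that the pattern in the comment above works the other way round: it matches when the string contains at least one non-ASCII byte.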
  
nyancode says:

December 1, 2014 at 9:09 am

It’s good for filtering out Chinese. I created a copy in the liveregex tester. You can take a look here:

https://www.liveregex.com/A8Zbg

Gabriel Reguly says:

December 11, 2014 at 7:56 pm

Thanks for sharing this, very useful.

Pingback: preg_match-Snippet für Latin Characters - entwickler.de
lee edwards says:

October 27, 2016 at 9:09 pm

I am a noob to regex. I have been trying to make your function work for multiple paragraphs. Do you have any suggestions for expanding this regex to allow more than just one string? Thank you.
