Today at work I came across a task that turned out to be much simpler than I originally thought it would be. We have a site with some user registration forms. The site is translated into a number of languages, but due to regulatory procedures, we have to force users to enter their registration details in English only, using Latin characters, numbers, and punctuation.
I’ve refreshed my knowledge of Unicode and PCRE. And then I came up with the following method, which seems to do the job just fine.
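```php
/**
 * Check that the given string only uses Latin characters, digits,
 * whitespace, and basic punctuation (. , -)
 *
 * @param string $string String to validate
 * @return boolean True if Latin only, false otherwise
 */
public function validateLatin($string)
{
    $result = false;
    // Deliberately no /u modifier: the string is matched byte by byte, so the
    // bytes of a multibyte UTF-8 character never match \w.
    // Note that \w already covers digits (and underscore), so \d is redundant.
    if (preg_match("/^[\w\d\s.,-]*$/", $string)) {
        $result = true;
    }
    return $result;
}
```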
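To see what the method actually accepts, here is a minimal sketch that pulls the same check out of its class into a free function and runs a few typical registration values through it. The standalone `validateLatin()` function and the sample inputs are mine, for illustration only; the pattern is the same except that the redundant `\d` is dropped.

```php
<?php
// Standalone sketch of the check above (hypothetical free function, not the
// original class method). Same character class, minus the redundant \d.
function validateLatin(string $string): bool
{
    // No /u modifier: matching is done on raw bytes, so any byte >= 0x80
    // (i.e. any part of a multibyte UTF-8 character) fails the class.
    return preg_match('/^[\w\s.,-]*$/', $string) === 1;
}

// Illustrative sample inputs.
$samples = ['John Smith', 'Flat 4, 12-14 High St.', 'Café', 'Иван', "O'Brien", 'a@b.com'];

foreach ($samples as $sample) {
    echo $sample, ' => ', validateLatin($sample) ? 'accepted' : 'rejected', PHP_EOL;
}
```

Only the first two samples are accepted. In other words, "Latin" here really means ASCII: an accented Latin letter such as the é in Café is rejected, and so is any punctuation beyond `.`, `,` and `-`, which rules out apostrophes and e-mail addresses.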
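In other words, it’s just a standard regular expression with no Unicode trickery. Adding the ‘/u’ modifier would make it malfunction badly: in PHP, ‘/u’ also turns on Unicode character properties, so `\w` suddenly matches letters from any script (Cyrillic, Greek, Chinese, and so on) and the check lets almost everything through. Good to know.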
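The difference the ‘/u’ modifier makes is easy to demonstrate. This sketch runs the same pattern over a few made-up names with and without ‘/u’. It assumes a standard PHP build with the bundled PCRE2, where ‘/u’ enables both UTF-8 mode and Unicode character properties (UCP).

```php
<?php
// Compare the same pattern with and without the /u modifier.
// Without /u: PCRE sees raw bytes and \w matches only ASCII word characters,
// so every multibyte UTF-8 letter fails.
// With /u: PHP also enables UCP, so \w matches letters from any script.
$pattern = '/^[\w\s.,-]*$/';

// Illustrative sample inputs.
$inputs = ['John Smith', 'Иван Петров', '张伟', 'José'];

foreach ($inputs as $input) {
    $bytes   = preg_match($pattern, $input);        // 1 = match, 0 = no match
    $unicode = preg_match($pattern . 'u', $input);  // same pattern plus /u
    echo $input, ': without /u = ', $bytes, ', with /u = ', $unicode, PHP_EOL;
}
```

Every input matches with ‘/u’, while only the plain ASCII name matches without it. Characters outside the class, such as `@` or `!`, are still rejected either way, so ‘/u’ doesn’t literally match everything; it just stops the check from caring which script the letters come from.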