PHP regular expression to match English/Latin characters only

Today at work I came across a task which turned out to be much easier and simpler than I originally thought it would be.  We have a site with some user registration forms.  The site is translated into a number of languages, but due to regulatory procedures, we have to force users to enter their registration details in English only, using Latin characters, numbers, and punctuation.

I refreshed my knowledge of Unicode and PCRE, and then came up with the following method, which seems to do the job just fine.

/**
 * Check that given string only uses Latin characters, digits, and punctuation
 *
 * @param string $string String to validate
 * @return boolean True if Latin only, false otherwise
 */
public function validateLatin($string) {
    $result = false;

    if (preg_match("/^[\w\d\s.,-]*$/", $string)) {
        $result = true;
    }

    return $result;
}

In other words, just a standard regular expression with no Unicode trickery.  Adding the ‘/u’ modifier would break this check: in PHP it makes \w, \d, and \s Unicode-aware, so letters and digits from any script would match, not just Latin ones.  Good to know.
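One way to make the check independent of modifiers and locale settings is to spell out the ASCII ranges instead of relying on \w, \d, and \s.  Below is a minimal sketch of that idea as a standalone function.  The name isAsciiLatin is my own and purely illustrative, not part of the method above.  It keeps the underscore that \w allows, but accepts only space, tab, and line breaks as whitespace, which is slightly narrower than \s.

```php
<?php
/**
 * Hypothetical helper (name is an assumption, not from the original code):
 * true if $string contains only ASCII letters, digits, underscore,
 * basic whitespace, and the punctuation . , -
 */
function isAsciiLatin(string $string): bool
{
    // Explicit ASCII ranges instead of \w, \d, \s, so the result stays the
    // same even if the /u modifier is added later. \A and \z anchor the
    // match to the very start and very end of the string.
    return preg_match('/\A[A-Za-z0-9_ \t\r\n.,-]*\z/', $string) === 1;
}

var_dump(isAsciiLatin('John Smith, Jr.'));  // bool(true)
var_dump(isAsciiLatin('Иван Петров'));      // bool(false)
var_dump(isAsciiLatin('José'));             // bool(false)
```

Note that, like the original method, this accepts an empty string and rejects accented Latin letters such as "é", so it is really an ASCII check rather than a check for the Latin script as a whole.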

Native Client built into Google Chrome 14

TechCrunch points out that Native Client is coming built into Google Chrome 14:

As Google notes on their Chrome Blog blog today, the latest beta version of Chrome (version 14) has Native Client built-in. Their implementation allows for C and C++ code to be executed inside of the browser while maintaining the security that a web technology like JavaScript offers.

This is a big deal.  This is a bridge between system software and web applications.

On teaching programming languages

Via this tweet I came across this insightful comment over at Slashdot.  Quoting in its entirety:

A bit off topic, but you triggered something I’ve been thinking about for a couple of years. That “spark” is fluency.

I switched jobs from being a computer programmer to being an ESL teacher in Japan. Japan is somewhat famous for churning out students who know a lot *about* English, but can’t order a drink at Mac Donald’s. We used to have a name for those kinds of people with regard to programming languages: language lawyers. They can answer any question you put to them *about* a programming language, but couldn’t program to save their life. These people often make it past job interviews easily, but then turn out to be huge disappointments when they actually get down to work. I’ve read a lot about this problem, but the more I look at it, the more I realise that these disabled programmers are just like my students. They have a vocabulary of 5000 words, know every grammar rule in the book but just can’t speak.

My current theory is that programming is quite literally writing. The vast majority of programming is not conceptually difficult (contrary to what a lot of people would have you believe). We only make it difficult because we suck at writing. The vast majority of programmers aren’t fluent, and don’t even have a desire to be fluent. They don’t read other people’s code. They don’t recognise or use idioms. They don’t think *in the programming language*. Most code sucks because we have the fluency equivalent of 3 year olds trying to write a novel. And so our programs are needlessly complex.

Those programmers with a “spark” are programmers who have an innate talent for the language. Or they are people who have read and read and read code. Or both. We teach programming wrong. We teach it the way Japanese teachers have been teaching English. We teach about programming and expect that students will spontaneously learn to write from this collection of facts.

In language acquisition there is a hypothesis called the “Input Hypothesis”. It states that *all* language acquisition comes from “comprehensible input”. That is, if you hear or read language that you can understand based on what you already know and from context, you will acquire it. Explanation does not help you acquire language. I believe the same is true of programming. We should be immersing students in good code. We should be burying them in idiom after idiom after idiom, allowing them to acquire the ability to program without explanation.

I’ve been thinking about this for a long time as well.  And I do agree.  I also think that programming is a very practical matter. As the comment says, one could know everything about programming in general and some programming language in particular, and yet be totally useless when it comes to writing code.

I think that most colleges and universities, whether you are getting an online IT degree or a degree from a traditional school, are lacking on the practical side when teaching programming.  The most I’ve seen done were short group assignments.  I think programming projects should be much larger and longer than that.  I don’t see anything wrong with having a couple of programming assignments spanning a couple of years.  A bachelor’s degree takes longer than that, and all that time could be used to teach students not only how to write code, but also how to maintain it, how to document it, how to work in groups, and how to use all those tools that programmers in the real world are using: IDEs, debuggers, compilers, version control, project build tools, continuous integration systems, and so on and so forth.  None of those will do any good (and possibly quite the opposite) on a tiny little short assignment.

Why reporting bugs is so important

Here is a quote from the Google Chrome 12 stable release blog post:

We’d also like to call particular attention to Sergey Glazunov’s $3133.7 reward. Although the linked bug is not of critical severity, it was accompanied by a beautiful chain of lesser severity bugs which demonstrated critical impact.

My focus here is not on the money that Sergey earned with his bug report, even though that is definitely an important and motivating factor.  My focus is on the chain of events.  While this chain of events happens pretty much every time a bug is fixed, few people know about it.  Maybe nobody, in fact, except for the developers themselves.

The thing is that when a bug is discovered and fixed, pretty much every developer searches the code for problems similar to those brought up by the bug report.  Be those issues typing mistakes, documentation inconsistencies, memory leaks, security issues, performance bottlenecks, or anything else – the code will be checked to make sure that the same problem doesn’t come up twice. From this perspective, I think that bug reports are so important not because of the specific bugs that they report, but because of those other bugs which aren’t yet fixed and probably aren’t yet reported.

Conclusion: every time you come across a bug in an application, don’t just work around it.  Take a few minutes of your time to report the problem properly to the developers.  Chances are, they will also fix some problems that you haven’t come across yet, but would have had a pretty good chance of running into otherwise.