language

Why the World Only Has Two Words For Tea

Slashdot has an interesting story on why most languages use one of only two words for tea:

With a few minor exceptions, there are really only two ways to say “tea” in the world. One is like the English term — te in Spanish and tee in Afrikaans are two examples. The other is some variation of cha, like chay in Hindi. Both versions come from China. How they spread around the world offers a clear picture of how globalization worked before “globalization” was a term anybody used. The words that sound like “cha” spread across land, along the Silk Road. The “tea”-like phrasings spread over water, by Dutch traders bringing the novel leaves back to Europe.

The term cha is “Sinitic,” meaning it is common to many varieties of Chinese. It began in China and made its way through central Asia, eventually becoming “chay” in Persian. That is no doubt due to the trade routes of the Silk Road, along which, according to a recent discovery, tea was traded over 2,000 years ago. This form spread beyond Persia, becoming chay in Urdu, shay in Arabic, and chay in Russian, among others. It even made its way to sub-Saharan Africa, where it became chai in Swahili. The Japanese and Korean terms for tea are also based on the Chinese cha, though those languages likely adopted the word even before its westward spread into Persian.

But that doesn’t account for “tea.” The te form used in coastal-Chinese languages spread to Europe via the Dutch, who became the primary traders of tea between Europe and Asia in the 17th century, as explained in the World Atlas of Language Structures. The main Dutch ports in east Asia were in Fujian and Taiwan, both places where people used the te pronunciation. The Dutch East India Company’s expansive tea importation into Europe gave us the French thé, the German Tee, and the English tea.

This reminds me of this old post about how most languages, apart from English, use “ananas” as a word for pineapple.

Japanese vs. English : Sentence Structure

I am not learning Japanese (just yet), but I still find the diagram above aesthetically pleasing. It’s from this article, which compares the structure of Japanese sentences with that of English ones.

Language Detection Library for PHP

patrickschur/language-detection is a language detection library for PHP that detects the language of a given text string. In a bit more detail:

This library can detect the language of a given text string. It can parse given training text in many different idioms into a sequence of N-grams and builds a database file in JSON format to be used in the detection phase. Then it can take a given text and detect its language using the database previously generated in the training phase. The library comes with text samples used for training and detecting text in 106 languages.
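To make the train-then-detect idea a bit more concrete, here is a minimal sketch of N-gram based detection in plain PHP. It is not the library's actual code or API: the function names (ngramProfile, profileDistance, detectLanguage), the three tiny training samples, the 300-entry profile size, and the rank-distance scoring are all assumptions made for illustration. It also assumes the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Build a ranked N-gram profile for a text: the $limit most frequent
 * character N-grams (lengths 1..$maxN), mapped to their rank (0 = most frequent).
 */
function ngramProfile(string $text, int $maxN = 3, int $limit = 300): array
{
    $counts = [];
    // Lowercase, then split into words on anything that is not a letter.
    $words = preg_split('/[^\p{L}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $padded = '_' . $word . '_'; // underscores mark word boundaries
        $length = mb_strlen($padded);
        for ($n = 1; $n <= $maxN; $n++) {
            for ($i = 0; $i + $n <= $length; $i++) {
                $gram = mb_substr($padded, $i, $n);
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most frequent N-grams first
    return array_flip(array_keys(array_slice($counts, 0, $limit, true)));
}

/** Sum of rank differences; an N-gram unknown to the language costs $limit. */
function profileDistance(array $textProfile, array $langProfile, int $limit = 300): int
{
    $distance = 0;
    foreach ($textProfile as $gram => $rank) {
        $distance += isset($langProfile[$gram])
            ? abs($rank - $langProfile[$gram])
            : $limit;
    }
    return $distance;
}

/** Score a text against every profile in the JSON database, best match first. */
function detectLanguage(string $text, string $databaseJson): array
{
    $profiles = json_decode($databaseJson, true);
    $textProfile = ngramProfile($text);
    $scores = [];
    foreach ($profiles as $lang => $profile) {
        $scores[$lang] = profileDistance($textProfile, $profile);
    }
    asort($scores); // smallest distance first
    return $scores;
}

// Training phase: hypothetical, tiny samples (real training text would be far larger).
$samples = [
    'en' => 'the quick brown fox jumps over the lazy dog and this is the house where the people live',
    'de' => 'der schnelle braune fuchs springt über den faulen hund und das ist das haus in dem die leute wohnen',
    'ru' => 'быстрая коричневая лиса прыгает через ленивую собаку и это дом в котором живут люди',
];
$database = json_encode(array_map('ngramProfile', $samples), JSON_UNESCAPED_UNICODE);

// Detection phase: compare the input against the stored profiles.
$scores = detectLanguage('where is the dog', $database);
echo array_key_first($scores), PHP_EOL; // prints "en"
```

With samples this small the scores are only a toy, which hints at why the quality of the training text matters so much for results on real input.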

I tried it briefly with a few languages that I can muster a phrase or two in, and it works better with some than with others. Greek was good, Russian not so much.

Hopefully, the sample data used for training will improve over time, but it’s definitely a good start.

Via this blog post.

Morphos – morphological solution in PHP for English and Russian

If you have ever had to deal with morphology in English, you probably found a library or two to help you out. But if you had to do that for Russian, then I’m sure you are missing a few hairs, and the ones that you still have are grayer than they used to be. I’ve got some good news for you, though: now there is Morphos (GitHub repository).

Morphos is a morphological solution written completely in the PHP language. Supports Russian and English. Provides classes to decline First/Middle/Last names/nouns and generate cardinal numerals.

Just look at this beauty!

var_dump($dec->getForms($user_name, $dec->detectGender($user_name)));
/* Will produce something like
  array(6) {
    ["nominativus"]=>
    string(8) "Иван"
    ["genetivus"]=>
    string(10) "Ивана"
    ["dativus"]=>
    string(10) "Ивану"
    ["accusative"]=>
    string(10) "Ивана"
    ["ablativus"]=>
    string(12) "Иваном"
    ["praepositionalis"]=>
    string(15) "об Иване"
  }
*/

Just this alone can make user interfaces and emails so much better. But there is more to it than that.

The History of English in 10 Minutes

https://www.youtube.com/watch?v=njJBw2KlIEo