{"id":24732,"date":"2015-09-12T15:23:26","date_gmt":"2015-09-12T13:23:26","guid":{"rendered":"https:\/\/mamchenkov.net\/wordpress\/?p=24732"},"modified":"2015-09-12T15:23:26","modified_gmt":"2015-09-12T13:23:26","slug":"when-monospace-fonts-arent-the-unicode-character-width-nightmare","status":"publish","type":"post","link":"https:\/\/mamchenkov.net\/wordpress\/2015\/09\/12\/when-monospace-fonts-arent-the-unicode-character-width-nightmare\/","title":{"rendered":"When monospace fonts aren&#8217;t: The Unicode character width nightmare"},"content":{"rendered":"<!-- google_ad_section_start -->\n<p>I don&#8217;t deal with Unicode and other character encodings on a daily basis, but when I do, I need every piece of information that has been written on the subject. \u00a0Hence the link to <a href=\"http:\/\/denisbider.blogspot.com.cy\/2015\/09\/when-monospace-fonts-arent-unicode.html\">this interesting issue<\/a>:<\/p>\n<blockquote><p>As long as you stick to precomposed Unicode characters, and Western scripts, things are relatively straightforward. Whether it&#8217;s A or \u00c5, S or \u0160 \u2013 so long as there are no combining marks, you can count a single Unicode code point as one character width. So the following works:<\/p>\n<pre>\taeioucsz\r\n\t\u00e1\u00e9\u00ed\u00f3\u00fa\u010d\u0161\u017e\r\n<\/pre>\n<p>Nice and neat, right?<\/p>\n<p>Unfortunately, problems appear with Asian characters. When displayed in monospace, many Asian characters occupy two character widths.<\/p><\/blockquote>\n<!-- google_ad_section_end -->\n","protected":false},"excerpt":{"rendered":"<!-- google_ad_section_start -->\n<p>I don&#8217;t deal with Unicode and other character encodings on a daily basis, but when I do, I need every piece of information that has been written on the subject. \u00a0Hence the link to this interesting issue: As long as you stick to precomposed Unicode characters, and Western scripts, things are relatively straightforward.
Whether it&#8217;s &hellip; <a href=\"https:\/\/mamchenkov.net\/wordpress\/2015\/09\/12\/when-monospace-fonts-arent-the-unicode-character-width-nightmare\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">When monospace fonts aren&#8217;t: The Unicode character width nightmare<\/span><\/a><\/p>\n<!-- google_ad_section_end -->\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"When monospace fonts aren't: The Unicode character width nightmare #WebDev #Unicode","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_links_to":"","_links_to_target":""},"categories":[1,18,62,1334],"tags":[23,1330],"keyring_services":[],"class_list":["post-24732","post","type-post","status-publish","format-standard","hentry","category-general","category-programming","category-technology","category-web-work","tag-unicode","tag-web-development"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":8475,"url":"https:\/\/mamchenkov.net\/wordpress\/2005\/01\/13\/unicode-saves-the-day-again\/","url_meta":{"origin":24732,"position":0},"title":"Unicode saves the day again","author":"Leonid Mamchenkov","date":"January 13, 2005","format":false,"excerpt":"I have finally fixed a bug with encoding of the blog. The content was always served as UTF-8, but the encoding was set to iso-8859-1. Editing the file nucleus\/language\/english.php helped. 
Your browsers should not be confused any more and Russian characters should work fine in the BlogRoll. Yabadabadu!","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":28321,"url":"https:\/\/mamchenkov.net\/wordpress\/2018\/01\/09\/zero-width-characters\/","url_meta":{"origin":24732,"position":1},"title":"Zero-Width Characters","author":"Leonid Mamchenkov","date":"January 9, 2018","format":false,"excerpt":"This article shows a couple of interesting zero-width character techniques for the invisible fingerprinting of text. In early 2016 I realized that it was possible to use zero-width characters, like\u00a0zero-width non-joiner\u00a0or other zero-width characters like the\u00a0zero-width space\u00a0to fingerprint text. Even with just a single type of zero-width character the presence\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":91,"url":"https:\/\/mamchenkov.net\/wordpress\/2002\/11\/11\/unicode-in-linux\/","url_meta":{"origin":24732,"position":2},"title":"Unicode in Linux","author":"Leonid Mamchenkov","date":"November 11, 2002","format":false,"excerpt":"Installed RT bugraq\/ticketing system for our developers. Seems to work as well as it does for our support team. Another step to the victory with Unicode support in both X and console. Today I've managed to fix the pseudographics (think mc) in console. It looks much better now.
Even displays\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22400,"url":"https:\/\/mamchenkov.net\/wordpress\/2014\/08\/19\/ftfy-fixes-text-for-you\/","url_meta":{"origin":24732,"position":3},"title":"ftfy &#8211; fixes text for you","author":"Leonid Mamchenkov","date":"August 19, 2014","format":"link","excerpt":"ftfy - fixes text for you ftfy makes Unicode text less broken and more consistent. It works in Python 2.7, Python 3.2, or later. The most interesting kind of brokenness that this resolves is when someone has encoded Unicode with one standard and decoded it with a different one. This\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":26269,"url":"https:\/\/mamchenkov.net\/wordpress\/2016\/07\/21\/the-regex-that-killed-stackoverflow\/","url_meta":{"origin":24732,"position":4},"title":"The RegEx that killed StackOverflow","author":"Leonid Mamchenkov","date":"July 21, 2016","format":false,"excerpt":"Here's an outage postmortem from the recent StackOverflow downtime. \u00a0It just shows you how easy it is to break things, even if they were built by some of the smartest people around. \u00a0Programming is tough and there is no way around it.
Technical Details The regular expression was: ^[\\s\\u200c]+|[\\s\\u200c]+$ Which is\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":15407,"url":"https:\/\/mamchenkov.net\/wordpress\/2011\/08\/18\/php-regular-expression-to-match-englishlatin-characters-only\/","url_meta":{"origin":24732,"position":5},"title":"PHP regular expression to match English\/Latin characters only","author":"Leonid Mamchenkov","date":"August 18, 2011","format":false,"excerpt":"Today at work I came across a task which turned out to be much easier and simpler than I originally thought it would. \u00a0We have a site with some user registration forms. \u00a0The site is translated into a number of languages, but due to the regulatory procedures, we have\u2026","rel":"","context":"In &quot;All&quot;","block_context":{"text":"All","link":"https:\/\/mamchenkov.net\/wordpress\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts\/24732","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/comments?post=24732"}],"version-history":[{"count":0,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/posts\/24732\/revisions"}],"wp:attachment":[{"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/media?parent=24732"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/categories?post=24732"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/tags?post=24732"},{"taxonomy":"keyring_services","embeddable":true,"href":"https:\/\/mamchenkov.net\/wordpress\/wp-json\/wp\/v2\/keyring_services?post=24732"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}