Sumerian Text

LinkedIn recently announced Arabic support. Rendering Arabic (or more precisely, mixed Arabic and Latin/Greek text) is a fascinating technical challenge, and I wish I'd been on the team that got to have the fun of doing it.

Anyway, just for fun I thought I would post a UTF-8 torture test that I created some years ago when writing the search engine parsing routines for Cyrus IMAP. It's in Sumerian: a proverb that was originally written down in Cuneiform over four thousand years ago in the land that is now southern Iraq. Cuneiform signs have been encoded in Unicode since version 5.0, but their code points are large (U+12000 and up, outside the Basic Multilingual Plane), so UTF-8 needs four bytes for each one, and a lot of UTF-8 implementations have trouble parsing them.
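The four-byte case is where parsers tend to break. As a rough illustration (a hand-rolled sketch in Python, not the Cyrus IMAP code, which is C; the function name `utf8_decode` is made up for this example), a decoder has to handle the `11110xxx` lead byte, check three continuation bytes, and reject overlong forms, surrogates, and anything above U+10FFFF:

```python
def utf8_decode(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points, raising ValueError on malformed input.

    A hand-rolled sketch for illustration only; production code should use a vetted codec.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: ASCII, one byte
            length, cp, min_cp = 1, lead, 0
        elif 0xC0 <= lead < 0xE0:    # 110xxxxx: two-byte sequence
            length, cp, min_cp = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:    # 1110xxxx: three-byte sequence (rest of the BMP)
            length, cp, min_cp = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:    # 11110xxx: four-byte sequence (U+10000 and up, e.g. Cuneiform)
            length, cp, min_cp = 4, lead & 0x07, 0x10000
        else:                        # stray continuation byte or invalid lead byte
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:   # every continuation byte must look like 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{b:02X} in sequence at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:              # same value encoded in more bytes than necessary
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")
        code_points.append(cp)
        i += length
    return code_points


data = "\U00012309".encode("utf-8")               # 𒌉 (U+12309)
print(data.hex(" "))                              # f0 92 8c 89
print([hex(cp) for cp in utf8_decode(data)])      # ['0x12309']
```

Code that assumes at most three bytes per character will reject or mangle every sign in the proverb.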

It's unlikely you'll have a font to render this text. I use the Akkadian font from the Ubuntu ttf-ancient-fonts package. It works fine on a Mac too.

Here's roughly what it should look like.

And here's what happens when LinkedIn renders it.

��������
dumu si nu-sa2
����������������
ama-a-ni na-an-u3-(dib?)-tud
��������������
diĝir-ra-ni na-an-dim2-dim2-e

A disorderly son – his mother should not have given birth to him; his god should not have created him.

Spend the money to get Sumerian on your iPhone. That way, next time you're driving around in the Iraqi desert, you can read the road signs.

Interesting... the iOS app gets this right. Or at least it would if I could figure out a $0 way to install a font on the device.

It seems like our code emits HTML entities corresponding to the UTF-16 surrogate pairs, so instead of a single entity for the correct 𒌉 we emit two, one per surrogate, which show up as ��. Those are not valid character entities in HTML and no browser should be rendering them. I guess we're using a UTF-16 internal representation, because Java.
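The bug described in the comment above is easy to reproduce. The sketch below is a guess at its shape, not LinkedIn's code, written in Python with made-up function names (`escape_per_code_unit`, `escape_per_code_point`): escaping text one UTF-16 code unit at a time, as a loop over Java `char`s does, splits 𒌉 (U+12309) into its two surrogate halves and emits a reference for each. The HTML parsing rules treat a numeric reference to a surrogate as a parse error and substitute U+FFFD, which is why each sign shows up as a pair of replacement characters.

```python
import html


def escape_per_code_unit(text: str) -> str:
    """Buggy: numeric-escape each UTF-16 code unit, like a loop over Java chars would."""
    out = []
    units = html.escape(text).encode("utf-16-be")
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "big")
        # A character above U+FFFF arrives here as two surrogate halves,
        # and each half becomes its own (invalid) character reference.
        out.append(chr(unit) if unit < 0x80 else f"&#{unit};")
    return "".join(out)


def escape_per_code_point(text: str) -> str:
    """Correct: numeric-escape each code point, so a supplementary character gets one reference."""
    # Python strings iterate by code point, so no surrogate halves appear here.
    return "".join(ch if ord(ch) < 0x80 else f"&#{ord(ch)};"
                   for ch in html.escape(text))


sign = "\U00012309"  # 𒌉
buggy = escape_per_code_unit(sign)
fixed = escape_per_code_point(sign)
print(buggy)                                   # &#55304;&#57097;
print(fixed)                                   # &#74505;
# Like a browser, html.unescape replaces a reference to a surrogate with U+FFFD.
print(html.unescape(buggy) == "\ufffd\ufffd")  # True
print(html.unescape(fixed) == sign)            # True
```

In Java itself, the equivalent fix would be to walk the string by code point (for example with `String.codePoints()`) rather than by `char`.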

If it serves 99% of use cases, it is fine. This is a valuable lesson I learnt when I moved to services: fixing use cases that no one, or only very few people, care about is not worth your time.

Yeah, you suck big time! Recent example: I wanted to spam my friend with a link in an email from LinkedIn, and the link doesn't work.

More articles by Greg Banks