Sumerian Text
LinkedIn recently announced Arabic support. Rendering Arabic (or more precisely, mixed Arabic and Latin/Greek text) is a fascinating technical challenge, and I wish I had been on the team that got to have the fun of building it.
Anyway, just for fun I thought I would post a UTF-8 torture test that I created some years ago when writing the search engine parsing routines for Cyrus IMAP. It's in Sumerian, and it's a proverb that was originally written down in cuneiform over four thousand years ago in the land that is now southern Iraq. Cuneiform glyphs are defined in recent versions of the Unicode standard, but their code points lie above U+FFFF, outside the Basic Multilingual Plane, so each one takes four bytes in UTF-8, and a lot of UTF-8 implementations have trouble parsing them.
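To make the "large code point" problem concrete, here is a minimal sketch in Python of encoding a single code point to UTF-8 by hand. The function name utf8_encode and the sample code point U+12309 (a value inside the Cuneiform block, picked for illustration) are my own assumptions, not anything taken from Cyrus IMAP. The last branch is the four-byte case that trips up parsers written with only the Basic Multilingual Plane in mind.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    # Surrogates and values past U+10FFFF are not encodable scalar values.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


SIGN = 0x12309  # assumed sample: a code point in the Cuneiform block

encoded = utf8_encode(SIGN)
print(" ".join(f"{b:02x}" for b in encoded))  # f0 92 8c 89

# Cross-check against Python's built-in encoder.
assert encoded == chr(SIGN).encode("utf-8")
```

A parser that stops at the three-byte branch handles everything up to U+FFFF and then chokes on a lead byte of 0xF0, which is exactly the sort of thing this torture test is meant to flush out.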
It's unlikely you'll have a font to render this text. I use the Akkadian font from the Ubuntu ttf-ancient-fonts package. It works fine on a Mac too.
Here's roughly what it should look like.
And here's what happens when LinkedIn renders it.
��������
dumu si nu-sa2
����������������
ama-a-ni na-an-u3-(dib?)-tud
��������������
diĝir-ra-ni na-an-dim2-dim2-e
A disorderly son – his mother should not have given birth to him. His god should not have created him.
Spend the money to get Sumerian on your iPhone. That way, next time you're driving around in the Iraqi desert, you can read the road signs.
Interesting...the iOS app gets this right. Or at least it would if I could figure out a $0 way to install a font on the device.
It seems like our code emits HTML entities corresponding to the UTF-16 surrogate pairs, so two entities per glyph (which show up as ��) instead of the single correct one for 𒌉. Those are not valid character entities in HTML and no browser should be rendering them. I guess we're using a UTF-16 internal representation, because Java.
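Here is a rough sketch in Python of what that guess amounts to. It is not LinkedIn's code, and every name in it (surrogate_pair, broken_entities, correct_entities) is made up for illustration, as is the sample code point U+12309 from the Cuneiform block. It splits a supplementary-plane code point into its UTF-16 surrogate pair, then contrasts escaping each 16-bit code unit separately with escaping the whole code point.

```python
from typing import Tuple


def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"no surrogate pair needed for {cp:#x}")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def broken_entities(text: str) -> str:
    """Mimic the suspected bug: one numeric reference per UTF-16 code unit."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp < 0x10000:
            out.append(f"&#{cp};")
        else:
            high, low = surrogate_pair(cp)
            out.append(f"&#{high};&#{low};")
    return "".join(out)


def correct_entities(text: str) -> str:
    """Escape by code point, so one glyph yields one numeric reference."""
    return "".join(ch if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in text)


sign = chr(0x12309)  # assumed sample code point in the Cuneiform block

print(broken_entities(sign))   # &#55304;&#57097;
print(correct_entities(sign))  # &#74505;
```

The HTML parsing rules treat a numeric reference to a surrogate as an error and substitute U+FFFD, which would explain the pairs of replacement characters in the rendered text above.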
If it serves 99% of use cases, it is fine. This is a valuable lesson I learnt when I moved to services. Fixing use cases that no one, or only a very few people, care about is not worth your time.
Yeah, you suck big time! Recent example - wanted to spam my friend with a link in email from LinkedIn: the link doesn't work.