Java String Length Method Misconception

Did you know? String.length() (the length method of the String class) can be misleading in Java. 👀 For example, it reports a length of 2 for a single emoji like 😅.

Try this:

String name = "hi😅";
System.out.println(name.length());

It prints 4.

At first I was like... how?

"h" = 1
"i" = 1
"😅" = 1

So shouldn't it be 3?

Here's the reason. In Java, String.length() does NOT count characters the way humans see them. It counts UTF-16 code units, because Java's String API is defined in terms of UTF-16 and each char is one 16-bit code unit.

Basic rule:
Characters up to U+FFFF (a, b, c, h, i) → 1 code unit
Characters above U+FFFF (like most emoji, including 😅) → 2 code units

The second group are called supplementary characters. Each one is represented by a surrogate pair.

Let's break down "hi😅":

h → 1
i → 1
😅 → 2

Total = 4. That's why length() prints 4.

Under the hood, 😅 is the Unicode code point U+1F605. UTF-16 can't store it in one 16-bit value, so it splits it into:
High surrogate: \uD83D
Low surrogate: \uDE05

So Java actually sees "hi😅" as:

h i \uD83D \uDE05

Total = 4.

If you want the number of Unicode code points instead, use:

name.codePointCount(0, name.length());

That returns 3. One caveat: code points still aren't always what a human would count. Emoji built from several code points (flags, skin-tone variants, ZWJ sequences like family emoji) count as more than 1 here too.

I recently ran into this again while revising my basics. I had completely forgotten this detail. Sometimes going back to fundamentals teaches more than learning something new.

How many of you knew this? 👀

#java #String #codenodes #coding #developing #android
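If you want to check the numbers from the post yourself, here is a minimal sketch. The class name CodePointDemo and the helpers codeUnits and codePoints are made up for illustration; the only real API used is String.length(), String.codePointCount(), String.codePointAt(), String.codePoints() and the Character surrogate checks. The emoji is written with its surrogate escapes so the result does not depend on the source file encoding.

```java
class CodePointDemo {

    // Hypothetical helper: number of UTF-16 code units, i.e. what String.length() reports.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: number of Unicode code points; a surrogate pair counts once.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Same string as "hi😅", with the emoji spelled as its surrogate pair.
        String name = "hi\uD83D\uDE05";

        System.out.println(codeUnits(name));   // 4
        System.out.println(codePoints(name));  // 3

        // The last two chars are the high and low surrogate of U+1F605.
        System.out.println(Character.isHighSurrogate(name.charAt(2))); // true
        System.out.println(Character.isLowSurrogate(name.charAt(3)));  // true
        System.out.println(Integer.toHexString(name.codePointAt(2)));  // 1f605

        // Walk the string one code point at a time instead of one char at a time.
        name.codePoints().forEach(cp ->
                System.out.println("U+" + Integer.toHexString(cp).toUpperCase()));
    }
}
```

Iterating with codePoints() rather than charAt() is the safer default whenever the text may contain emoji, since a plain char loop would see the two surrogates as separate values.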
