Can’t seem to remove the formatting from a string of text?

Chris Coyier

I had a fella email me a line of text almost just like this:

𝐂𝐚𝐥𝐥𝐞 𝐁𝐥𝐚𝐧𝐜𝐨𝐬, 𝐂𝐨𝐬𝐭𝐚 𝐑𝐢𝐜𝐚

He said he could not remove that formatting no matter what he did. It looks kinda bold, doesn’t it? And set into a serif font. You’d think you could select it in the text editor you’re in and remove that formatting. He said he tried copy/pasting it into places where no text formatting is even allowed, like in VS Code or the URL bar of a browser. Voodoo, he said.

Here’s the thing: that text isn’t formatted.

That first “C” you see above isn’t a regular uppercase C, our typical friend U+0043 : LATIN CAPITAL LETTER C. It’s “𝐂”, that is, U+1D402 : MATHEMATICAL BOLD CAPITAL C. It’s literally a different character in Unicode. There are… a lot of Unicode characters:

As of Unicode version 16.0, there are 155,063 characters with code points, covering 168 modern and historical scripts, as well as multiple symbol sets.

List of Unicode characters — Wikipedia
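If you want to check which character you are actually looking at, here is a minimal sketch in Python using the standard `unicodedata` module. The `describe` helper is a name made up for this illustration, not something from the original email or any tool mentioned here.

```python
import unicodedata


def describe(char: str) -> str:
    """Return a 'U+XXXX : NAME' label for a single character."""
    return f"U+{ord(char):04X} : {unicodedata.name(char)}"


# The plain letter, then the look-alike from the emailed text.
for char in ("C", "𝐂"):
    print(describe(char))
# U+0043 : LATIN CAPITAL LETTER C
# U+1D402 : MATHEMATICAL BOLD CAPITAL C
```

Two different code points, two different names, even though they both read as a "C".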

The same place name could be written like 𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆 instead, or 𝗖𝗮𝗹𝗹𝗲 𝗕𝗹𝗮𝗻𝗰𝗼𝘀, 𝗖𝗼𝘀𝘁𝗮 𝗥𝗶𝗰𝗮.
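So how would you actually get back to plain letters? One option, sketched here in Python, is Unicode compatibility normalization (NFKC), which folds these styled mathematical letters back to their ordinary counterparts. The `to_plain` helper is a hypothetical name for this illustration. Note that NFKC is an assumption about what "remove the formatting" should mean: it also rewrites other compatibility characters (ligatures, full-width forms, superscripts), so it may change more than you intend.

```python
import unicodedata


def to_plain(text: str) -> str:
    """Fold compatibility characters (styled math letters, etc.) to their base letters."""
    return unicodedata.normalize("NFKC", text)


# The three styled spellings used in this post.
variants = [
    "𝐂𝐚𝐥𝐥𝐞 𝐁𝐥𝐚𝐧𝐜𝐨𝐬, 𝐂𝐨𝐬𝐭𝐚 𝐑𝐢𝐜𝐚",
    "𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆",
    "𝗖𝗮𝗹𝗹𝗲 𝗕𝗹𝗮𝗻𝗰𝗼𝘀, 𝗖𝗼𝘀𝘁𝗮 𝗥𝗶𝗰𝗮",
]
for styled in variants:
    # Each one should come out as the ordinary "Calle Blancos, Costa Rica".
    print(to_plain(styled))
```

Browsers have a comparable built-in in `String.prototype.normalize("NFKC")`, if you would rather do this in JavaScript.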

Should you do this to get super sweet effects in places you otherwise couldn’t? Probably not. The accessibility is rough. Listen to the audio output in this blog post. If you’re going to do it on the web where you have HTML control, do something like:

<!-- Don't do this! Leaving for posterity. -->
<span aria-label="Calle Blancos, Costa Rica">
  <span aria-hidden="true">𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆</span>
</span>

UPDATE: See Ben’s comment on why not to do the above. Instead, make a visually hidden version that a screen reader would still see, and an ARIA hidden one that will be seen visually. (Noting potential concerns about copy/paste that started this whole article.)

<span class="visually-hidden">Calle Blancos, Costa Rica</span>
<span aria-hidden="true">𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆</span>

3 responses to “Can’t seem to remove the formatting from a string of text?”

  1. Ben Myers says:

    Howdy! Great callout on not using alternate Unicode characters in place of the true characters for these letters. Unfortunately, placing an aria-label on a roleless span (or any generic element) is not a valid use of aria-label, and so you won’t get the results this article would expect in most screenreader/browser combinations. VoiceOver for macOS will do this substitution, which is what leads to developers’ expectations in this case, but this is nonstandard behavior that shouldn’t be relied upon.

    In this case, the safest thing to do would probably be to combine a .visually-hidden/.sr-only span with the safe characters, with an aria-hidden span of the alternate Unicode characters.

  2. James Moberg says:

    I focus more on backend (versus frontend) using ColdFusion. CF runs on top of Java and I use the java.text.Normalizer class and JUnidecode library to normalize Unicode strings and reduce them to ASCII 7. (I started doing this because comment form spammers started using Unicode to bypass spam filters.)
    https://github.com/gcardone/junidecode

    Related to this, I added this function to a REST API and wrote a Windows AutoHotKey shortcut to take my clipboard, pass the contents to the API and return ASCII7 content free of any Unicode formatting.
