I had a fella email me a line of text almost just like this:
𝐂𝐚𝐥𝐥𝐞 𝐁𝐥𝐚𝐧𝐜𝐨𝐬, 𝐂𝐨𝐬𝐭𝐚 𝐑𝐢𝐜𝐚
He said he could not remove that formatting no matter what he did. It looks kinda bold, doesn’t it? And set into a serif font. You’d think you could select it in the text editor you’re in and remove that formatting. He said he tried copy/pasting it into places where no text formatting is even allowed, like in VS Code or the URL bar of a browser. Voodoo, he said.
Here’s the thing: that text isn’t formatted.
That first “C” you see above isn’t a regular uppercase character C, our typical friend U+0043 : LATIN CAPITAL LETTER C, it’s “𝐂”, that is, U+1D402 : MATHEMATICAL BOLD CAPITAL C. It’s literally a different character in Unicode. There are… a lot of Unicode characters:
As of Unicode version 16.0, there are 155,063 characters with code points, covering 168 modern and historical scripts, as well as multiple symbol sets.
List of Unicode characters - Wikipedia
It could be written like 𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆 instead, or 𝗖𝗮𝗹𝗹𝗲 𝗕𝗹𝗮𝗻𝗰𝗼𝘀, 𝗖𝗼𝘀𝘁𝗮 𝗥𝗶𝗰𝗮.
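If you want to see this for yourself, here is a minimal Python sketch that prints the code point and official Unicode name of each character in a string. The `describe` helper is a made-up name for illustration; the only real API it leans on is the standard library's `unicodedata.name`.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX : NAME' line per character in text (hypothetical helper)."""
    return [
        f"U+{ord(ch):04X} : {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


plain_c = "C"
bold_c = "\U0001D402"  # the bold-looking "C" from the emailed text

print(describe(plain_c)[0])  # U+0043 : LATIN CAPITAL LETTER C
print(describe(bold_c)[0])   # U+1D402 : MATHEMATICAL BOLD CAPITAL C
print(plain_c == bold_c)     # False: two different characters, no formatting involved
```

Pasting any suspicious string into `describe` should make it obvious whether you are looking at styled text or at look-alike characters.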
Should you do this to get super sweet effects in places you otherwise couldn’t? Probably not. The accessibility is rough. Listen to the audio output in this blog post. If you’re going to do it on the web where you have HTML control, do something like:
<!-- Don't do this! Leaving for posterity. -->
<span aria-label="Calle Blancos, Costa Rica">
<span aria-hidden="true">𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆</span>
</span>
UPDATE: See Ben’s comment on why not to do the above. Instead, make a visually hidden version that a screen reader would still see, and an ARIA hidden one that will be seen visually. (Noting potential concerns about copy/paste that started this whole article.)
<span class="visually-hidden">Calle Blancos, Costa Rica</span>
<span aria-hidden="true">𝕮𝖆𝖑𝖑𝖊 𝕭𝖑𝖆𝖓𝖈𝖔𝖘, 𝕮𝖔𝖘𝖙𝖆 𝕽𝖎𝖈𝖆</span>
Howdy! Great callout on not using alternate Unicode characters in place of the true characters for these letters. Unfortunately, placing an aria-label on a roleless span (or any generic element) is not a valid use of aria-label, and so you won’t get the results this article would expect in most screenreader/browser combinations. VoiceOver for macOS will do this substitution, which is what leads to developers’ expectations in this case, but this is nonstandard behavior that shouldn’t be relied upon.
In this case, the safest thing to do would probably be to combine a .visually-hidden/.sr-only span with the safe characters, with an aria-hidden span of the alternate Unicode characters.
Oh shucks, all I looked at was VoiceOver, which did indeed do the “right” thing.
https://share.cleanshot.com/WdZlpRng
I’ll update the post with the recommended technique.
I focus more on backend (versus frontend) using ColdFusion. CF runs on top of Java and I use the java.text.Normalizer class and JUnidecode library to normalize Unicode strings and reduce them to 7-bit ASCII. (I started doing this because comment form spammers started using Unicode to bypass spam filters.)
https://github.com/gcardone/junidecode
Related to this, I added this function to a REST API and wrote a Windows AutoHotKey shortcut to take my clipboard, pass the contents to the API and return 7-bit ASCII content free of any Unicode formatting.
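For readers outside the Java world, here is a rough Python analogue of the normalization idea described above. It is a sketch, not the commenter's ColdFusion/Java code: `to_plain_ascii` is a hypothetical name, and it only uses the standard library's `unicodedata.normalize`. It relies on the fact that the mathematical alphanumeric characters carry compatibility decompositions back to ordinary letters, so NFKD folds them to plain ASCII. Unlike a transliteration library such as JUnidecode, this version simply drops anything that has no ASCII decomposition.

```python
import unicodedata


def to_plain_ascii(text: str) -> str:
    """Fold look-alike Unicode letters to ASCII (hypothetical helper, lossy by design)."""
    # NFKD applies compatibility decompositions, e.g. U+1D402 -> "C",
    # and splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Discard whatever is still outside ASCII (combining marks, emoji, etc.).
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "Calle" written with Mathematical Bold code points
styled = "\U0001D402\U0001D41A\U0001D425\U0001D425\U0001D41E"

print(to_plain_ascii(styled))  # Calle
print(to_plain_ascii("Café"))  # Cafe
```

Because characters without a decomposition are silently removed, this is better suited to spam filtering or search matching than to cleaning text you intend to keep.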