Unicode bloopers! Keep Code Off Visible Web Page: HTML Chart

Unicode bloopers! HTML should not be visible on a web page!
By Mike Banks Valentine

Have you noticed how many web sites show Unicode character codes in the visible page text in place of punctuation marks? I’ve seen it on major newspaper sites and even on monstrous portals like Yahoo. I don’t know whether this comes from ignorance on the part of the site maintainer, bad content management software, cutting and pasting directly from word processing software, or all of the above – but it can be incredibly annoying.

Reading becomes nearly impossible when you find the Unicode HTML string for an apostrophe (&#145;) or a quotation mark (&#148;) inserted within a word or bracketing it. If you haven't seen it, here&#146;s an &#147;example&#148; for everyone's mutual annoyance!

Some sites are using Unicode character codes for periods and question marks. While it is possible to use these codes to represent the whole alphabet, numbers, punctuation and most symbols – it just isn’t necessary to do so. Beyond that, those who are still using older browser versions will see that code on the page instead of the characters it was meant to display. I’ve posted a chart of characters and their HTML equivalents below.

If you’d like to know more about Unicode, visit the authority site.

I would encourage those who maintain their own web site, or those of clients, to check that these characters are used properly in the code of their web pages. If your web page software is displaying this code visibly, your software is outdated or incompatible with current standards.
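One way to check a page for this problem can be sketched in Python. The article does not name a single cause, so this sketch assumes one common one: a character reference whose leading ampersand was escaped a second time (for example `&amp;#8217;`), which a browser then shows as literal code. The function names `find_visible_references` and `repair` are hypothetical.

```python
import re

# A reference such as "&#8217;" whose "&" was escaped again becomes
# "&amp;#8217;" in the HTML source, and the reader sees "&#8217;" on the page.
DOUBLE_ESCAPED = re.compile(r"&amp;(#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_visible_references(html_source):
    """List the character references a browser would display as literal text."""
    return ["&" + m.group(1) + ";" for m in DOUBLE_ESCAPED.finditer(html_source)]


def repair(html_source):
    """Undo one level of escaping, but only on double-escaped references."""
    return DOUBLE_ESCAPED.sub(lambda m: "&" + m.group(1) + ";", html_source)


if __name__ == "__main__":
    source = "<p>here&amp;#8217;s an &amp;#8220;example&amp;#8221;</p>"
    print(find_visible_references(source))
    print(repair(source))
```

An ordinary escaped ampersand, as in `Tom &amp; Jerry`, is left alone because nothing that looks like a reference follows it.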

Sometimes software is the enemy when updating pages. If you cut and paste text from a Microsoft Word document into a non-Microsoft web page editor, the editor will sometimes carry over proprietary code from the word processor in an attempt to represent it as text, or simply paste unreadable gibberish into the document. It’s better to work in the Notepad plain text editor, rather than Word, if you will be pasting that text into a web page in any software other than another Microsoft product.

Microsoft has built all of its software to recognize the proprietary code from each of its other products: Outlook Express, Internet Explorer, Word and Excel each understand the others within the Microsoft family. But as soon as you cut and paste text from a Microsoft product into a web page or any non-Microsoft software that doesn’t recognize the proprietary code, strange characters suddenly appear in the text.

It’s not just a Microsoft problem. It also occurs when cutting and pasting from other proprietary word processing products into web pages or email products that do NOT share the same code base.
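A minimal Python sketch of two ways to tame pasted word processor punctuation follows. It assumes the troublesome characters are the usual curly quotes, dashes and ellipsis; the mapping and the function names `to_plain_text` and `to_html_source` are illustrative, not taken from any particular editor.

```python
import html

# Typographic characters commonly produced by word processors,
# mapped to plain-text stand-ins (an illustrative, incomplete list).
SMART_TO_PLAIN = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # ellipsis
}


def to_plain_text(text):
    """Replace typographic punctuation with keyboard characters."""
    return text.translate(str.maketrans(SMART_TO_PLAIN))


def to_html_source(text):
    """Keep the characters, but write them as numeric references in the HTML."""
    escaped = html.escape(text, quote=False)  # handles &, < and >
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    pasted = "It\u2019s \u201cfine\u201d \u2013 really\u2026"
    print(to_plain_text(pasted))
    print(to_html_source(pasted))
```

The first function gives text that is safe to paste anywhere. The second keeps the typography but puts the codes where they belong, in the HTML source rather than in the visible text.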

Unicode can be useful in web pages when the @ symbol is written as its Unicode equivalent, &#64;, to stop spambot harvesting software from gathering email addresses from web pages. This trick is not always effective, but it does stop some harvesting software from recognizing addresses written this way as valid email addresses.
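The trick can be sketched in a few lines of Python. The function names are hypothetical and `user@example.com` is a placeholder address. As noted above, this only deters harvesters that do not decode character references.

```python
import html


def obfuscate_at_only(address):
    """Replace just the @ sign with its numeric character reference."""
    return address.replace("@", "&#64;")


def obfuscate_email(address):
    """Write every character of the address as a numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    address = "user@example.com"  # placeholder address
    print(obfuscate_at_only(address))
    encoded = obfuscate_email(address)
    print(encoded)
    # A browser decodes the references, so visitors still see the address.
    print(html.unescape(encoded))
```

Either result goes into the HTML source. The browser turns it back into a readable address on the page.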

Further discussion on using unicode to stop spam harvesting software is here.

I spend quite a bit of time in plain text editors like Notepad writing documents that will be transferred to a web page. Formatting simply won’t transfer from a Word document to a web page. Some word processing software offers the command “Save As HTML”, but not all of it works flawlessly, and some products produce web pages that don’t display properly in all browsers.

Even worse is the amount of code clutter created when a Word document is saved as a web page using Word’s “Save As HTML” command. I won’t burden you with showing that clutter here. Dreamweaver web design software actually has a command labeled “Clean up Word HTML” to rid the code of the extraneous and unnecessary clutter created by Word during that “Save As HTML” command.

We’ve made available a very simple Text to HTML Converter for Web Publishing that allows simple conversions of plain text documents to HTML without adding Unicode character codes to visible text.
This great tool automates these normally tedious tasks for web publishing:

Adds mailto: to email addresses
Inserts paragraph tags
Allows ordered lists or bullets
Adds clickable hyperlinks to URLs
Sets headings to bold text
Allows links to open in new window
All of this without inserting Unicode character codes into that text as some content management software does.
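To give a feel for what such a converter does, here is a rough Python sketch of three of the tasks listed above: paragraph tags, mailto: links and clickable URLs, with an option to open links in a new window. It is not the converter mentioned above; the function name `text_to_html`, its `new_window` parameter and the simplified URL and email patterns are all assumptions made for illustration.

```python
import html
import re

# Simplified patterns: a URL must not end in sentence punctuation,
# and an email address needs at least one dot in its domain.
TOKEN_RE = re.compile(
    r"(?P<url>https?://[^\s<>\"]*[^\s<>\".,;:!?)])"
    r"|(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def text_to_html(plain_text, new_window=False):
    """Wrap blank-line-separated blocks in <p> tags and link URLs and emails."""
    target = ' target="_blank"' if new_window else ""

    def linkify(match):
        token = match.group(0)
        if match.group("email"):
            return f'<a href="mailto:{token}">{token}</a>'
        return f'<a href="{token}"{target}>{token}</a>'

    paragraphs = []
    for block in re.split(r"\n\s*\n", plain_text.strip()):
        # Escape &, < and > first so the only tags are the ones added here.
        escaped = html.escape(" ".join(block.split()), quote=False)
        paragraphs.append("<p>" + TOKEN_RE.sub(linkify, escaped) + "</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Write to bob@example.com.\n\nSee http://example.com/ now"
    print(text_to_html(sample, new_window=True))
```

Only the characters HTML itself reserves are escaped. Everything else in the visible text is passed through unchanged.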

Web content creators need to know HTML, and sometimes actually write it. I hope the Unicode chart below will help toward that end.

Just be sure, by becoming familiar with your software, that you are inserting that code into the HTML and NOT into the visible page text.

Character Unicode HTML 4.0 Character Name
&#32; space
! &#33; exclamation mark
" &#34; &quot; quotation mark
# &#35; number sign
$ &#36; dollar sign
% &#37; percent sign
& &#38; &amp; ampersand
' &#39; apostrophe
( &#40; left parenthesis
) &#41; right parenthesis
* &#42; asterisk
+ &#43; plus sign
, &#44; comma
- &#45; hyphen-minus
. &#46; full stop
/ &#47; solidus
0 &#48; number zero
1 &#49; number one
2 &#50; number two
3 &#51; number three
4 &#52; number four
5 &#53; number five
6 &#54; number six
7 &#55; number seven
8 &#56; number eight
9 &#57; number nine
: &#58; colon
; &#59; semicolon
< &#60; &lt; less-than sign
= &#61; equals sign
> &#62; &gt; greater-than sign
? &#63; question mark
@ &#64; commercial at
A &#65; Capital letter A
B &#66; Capital letter B
C &#67; Capital letter C
D &#68; Capital letter D
E &#69; Capital letter E
F &#70; Capital letter F
G &#71; Capital letter G
H &#72; Capital letter H
I &#73; Capital letter I
J &#74; Capital letter J
K &#75; Capital letter K
L &#76; Capital letter L
M &#77; Capital letter M
N &#78; Capital letter N
O &#79; Capital letter O
P &#80; Capital letter P
Q &#81; Capital letter Q
R &#82; Capital letter R
S &#83; Capital letter S
T &#84; Capital letter T
U &#85; Capital letter U
V &#86; Capital letter V
W &#87; Capital letter W
X &#88; Capital letter X
Y &#89; Capital letter Y
Z &#90; Capital letter Z
[ &#91; left square bracket
\ &#92; reverse solidus
] &#93; right square bracket
^ &#94; circumflex accent
_ &#95; low line
` &#96; grave accent
a &#97; Lowercase letter a
b &#98; Lowercase letter b
c &#99; Lowercase letter c
d &#100; Lowercase letter d
e &#101; Lowercase letter e
f &#102; Lowercase letter f
g &#103; Lowercase letter g
h &#104; Lowercase letter h
i &#105; Lowercase letter i
j &#106; Lowercase letter j
k &#107; Lowercase letter k
l &#108; Lowercase letter l
m &#109; Lowercase letter m
n &#110; Lowercase letter n
o &#111; Lowercase letter o
p &#112; Lowercase letter p
q &#113; Lowercase letter q
r &#114; Lowercase letter r
s &#115; Lowercase letter s
t &#116; Lowercase letter t
u &#117; Lowercase letter u
v &#118; Lowercase letter v
w &#119; Lowercase letter w
x &#120; Lowercase letter x
y &#121; Lowercase letter y
z &#122; Lowercase letter z
{ &#123; left curly bracket
| &#124; vertical line
} &#125; right curly bracket
~ &#126; tilde
127 (not used)
€ &#128; &euro; euro sign
129 (not used)
‚ &#130; &sbquo; single low-9 quotation mark
ƒ &#131; &fnof; Lowercase letter f with hook
„ &#132; &bdquo; double low-9 quotation mark
… &#133; &hellip; horizontal ellipsis
† &#134; &dagger; dagger
‡ &#135; &Dagger; double dagger
ˆ &#136; &circ; modifier letter circumflex accent
‰ &#137; &permil; per mille sign
Š &#138; &Scaron; Capital letter S with caron
‹ &#139; &lsaquo; single left-pointing angle quotation mark
Œ &#140; &OElig; Capital ligature OE
141 (not used)
Ž &#142; Capital letter Z with caron
143 (not used)
144 (not used)
‘ &#145; &lsquo; left single quotation mark
’ &#146; &rsquo; right single quotation mark
“ &#147; &ldquo; left double quotation mark
” &#148; &rdquo; right double quotation mark
• &#149; &bull; bullet
– &#150; &ndash; en dash
— &#151; &mdash; em dash
˜ &#152; &tilde; small tilde
™ &#153; &trade; trade mark sign
š &#154; &scaron; Lowercase letter s with caron
› &#155; &rsaquo; single right-pointing angle quotation mark
œ &#156; &oelig; Lowercase ligature oe
157 (not used)
ž &#158; Lowercase letter z with caron
Ÿ &#159; &Yuml; Capital letter Y with diaeresis
&#160; &nbsp; no-break space
¡ &#161; &iexcl; inverted exclamation mark
¢ &#162; &cent; cent sign
£ &#163; &pound; pound sign
¤ &#164; &curren; currency sign
¥ &#165; &yen; yen sign
¦ &#166; &brvbar; broken bar
§ &#167; &sect; section sign
¨ &#168; &uml; diaeresis
© &#169; &copy; copyright sign
ª &#170; &ordf; feminine ordinal indicator
« &#171; &laquo; left-pointing double angle quotation mark
¬ &#172; &not; not sign
&#173; &shy; soft hyphen
® &#174; &reg; registered sign
¯ &#175; &macr; macron
° &#176; &deg; degree sign
± &#177; &plusmn; plus-minus sign
² &#178; &sup2; superscript two
³ &#179; &sup3; superscript three
´ &#180; &acute; acute accent
µ &#181; &micro; micro sign
¶ &#182; &para; pilcrow sign
· &#183; &middot; middle dot
¸ &#184; &cedil; cedilla
¹ &#185; &sup1; superscript one
º &#186; &ordm; masculine ordinal indicator
» &#187; &raquo; right-pointing double angle quotation mark
¼ &#188; &frac14; vulgar fraction one quarter
½ &#189; &frac12; vulgar fraction one half
¾ &#190; &frac34; vulgar fraction three quarters
¿ &#191; &iquest; inverted question mark
À &#192; &Agrave; Capital letter A with grave
Á &#193; &Aacute; Capital letter A with acute
Â &#194; &Acirc; Capital letter A with circumflex
Ã &#195; &Atilde; Capital letter A with tilde
Ä &#196; &Auml; Capital letter A with diaeresis
Å &#197; &Aring; Capital letter A with ring above
Æ &#198; &AElig; Capital letter AE
Ç &#199; &Ccedil; Capital letter C with cedilla
È &#200; &Egrave; Capital letter E with grave
É &#201; &Eacute; Capital letter E with acute
Ê &#202; &Ecirc; Capital letter E with circumflex
Ë &#203; &Euml; Capital letter E with diaeresis
Ì &#204; &Igrave; Capital letter I with grave
Í &#205; &Iacute; Capital letter I with acute
Î &#206; &Icirc; Capital letter I with circumflex
Ï &#207; &Iuml; Capital letter I with diaeresis
Ð &#208; &ETH; Capital letter Eth
Ñ &#209; &Ntilde; Capital letter N with tilde
Ò &#210; &Ograve; Capital letter O with grave
Ó &#211; &Oacute; Capital letter O with acute
Ô &#212; &Ocirc; Capital letter O with circumflex
Õ &#213; &Otilde; Capital letter O with tilde
Ö &#214; &Ouml; Capital letter O with diaeresis
× &#215; &times; multiplication sign
Ø &#216; &Oslash; Capital letter O with stroke
Ù &#217; &Ugrave; Capital letter U with grave
Ú &#218; &Uacute; Capital letter U with acute
Û &#219; &Ucirc; Capital letter U with circumflex
Ü &#220; &Uuml; Capital letter U with diaeresis
Ý &#221; &Yacute; Capital letter Y with acute
Þ &#222; &THORN; Capital letter Thorn
ß &#223; &szlig; Lowercase letter sharp s
à &#224; &agrave; Lowercase letter a with grave
á &#225; &aacute; Lowercase letter a with acute
â &#226; &acirc; Lowercase letter a with circumflex
ã &#227; &atilde; Lowercase letter a with tilde
ä &#228; &auml; Lowercase letter a with diaeresis
å &#229; &aring; Lowercase letter a with ring above
æ &#230; &aelig; Lowercase letter ae
ç &#231; &ccedil; Lowercase letter c with cedilla
è &#232; &egrave; Lowercase letter e with grave
é &#233; &eacute; Lowercase letter e with acute
ê &#234; &ecirc; Lowercase letter e with circumflex
ë &#235; &euml; Lowercase letter e with diaeresis
ì &#236; &igrave; Lowercase letter i with grave
í &#237; &iacute; Lowercase letter i with acute
î &#238; &icirc; Lowercase letter i with circumflex
ï &#239; &iuml; Lowercase letter i with diaeresis
ð &#240; &eth; Lowercase letter eth
ñ &#241; &ntilde; Lowercase letter n with tilde
ò &#242; &ograve; Lowercase letter o with grave
ó &#243; &oacute; Lowercase letter o with acute
ô &#244; &ocirc; Lowercase letter o with circumflex
õ &#245; &otilde; Lowercase letter o with tilde
ö &#246; &ouml; Lowercase letter o with diaeresis
÷ &#247; &divide; division sign
ø &#248; &oslash; Lowercase letter o with stroke
ù &#249; &ugrave; Lowercase letter u with grave
ú &#250; &uacute; Lowercase letter u with acute
û &#251; &ucirc; Lowercase letter u with circumflex
ü &#252; &uuml; Lowercase letter u with diaeresis
ý &#253; &yacute; Lowercase letter y with acute
þ &#254; &thorn; Lowercase letter thorn
ÿ &#255; &yuml; Lowercase letter y with diaeresis
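One caveat about the chart: the numbers 128 to 159 are positions in the Windows-1252 character set, not Unicode code points. Browsers that follow current HTML parsing rules still show the intended characters for them, but the Unicode-based reference for the euro sign, for example, is &#8364;. The Python sketch below rebuilds a chart row from a code point and converts a Windows-1252 position to its Unicode-based reference. The function names `chart_row` and `cp1252_reference` are hypothetical.

```python
import html
from html.entities import codepoint2name  # HTML 4 entity names by code point


def chart_row(code):
    """Return (character, numeric reference, named entity or '') for a code point."""
    name = codepoint2name.get(code)
    return (chr(code), f"&#{code};", f"&{name};" if name else "")


def cp1252_reference(code):
    """Convert a Windows-1252 position (128-159) to a Unicode numeric reference."""
    try:
        ch = bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        return None  # position is unused in Windows-1252
    return f"&#{ord(ch)};"


if __name__ == "__main__":
    print(chart_row(233))         # a Latin-1 row from the chart
    print(cp1252_reference(146))  # right single quotation mark
    print(cp1252_reference(129))  # an unused position
    # Python's HTML-aware decoder maps the legacy number the way browsers do.
    print(html.unescape("&#146;") == "\u2019")
```

For the rows from 160 to 255, and for plain ASCII, the chart number and the Unicode code point are the same, so no conversion is needed there.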
