
HTML Charset


What is Character Encoding?

A character encoding defines how bytes are converted into characters. To validate or display an HTML document properly, a program must use the same character encoding that the document was saved in.
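As a minimal sketch of why the choice of encoding matters, the Python snippet below decodes the same bytes twice: once with the encoding that produced them and once with a different one. The sample word and the variable names are illustrative assumptions, not part of the original page.

```python
# Encode a short word that contains one non-ASCII character.
data = "café".encode("utf-8")  # b'caf\xc3\xa9' (the "é" takes two bytes)

# Decoding with the matching encoding recovers the original text.
correct = data.decode("utf-8")

# Decoding with ISO-8859-1 treats every byte as its own character,
# so the two bytes of "é" turn into two unrelated characters.
garbled = data.decode("iso-8859-1")

print(correct)  # café
print(garbled)  # cafÃ©
```

The bytes never change between the two calls; only the interpretation does, and that interpretation is what a browser has to get right.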

  • ASCII was one of the earliest character encoding standards (also called a character set). ASCII defines 128 characters: the digits 0-9, the English letters A-Z and a-z, special characters like ! $ + - ( ) @ < > and a set of non-printing control characters.
  • ISO-8859-1 was the default character set for HTML 4. This character set supports 256 different character codes.
  • ANSI (Windows-1252) was the original Windows character set. ANSI is identical to ISO-8859-1, except that it uses the 32 positions from 128 to 159 for extra printable characters (such as the euro sign and curly quotes), where ISO-8859-1 has control codes.
  • Because ANSI and ISO-8859-1 were so limited, HTML 4 also supported UTF-8.
  • UTF-8 is an encoding of Unicode, which covers almost all of the characters and symbols in the world.
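The differences between these character sets can be sketched in Python by trying to encode one character in each of them. The helper `try_encode` is a hypothetical name introduced for this illustration, and `cp1252` is the name of Python's codec for Windows-1252.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

euro = "\u20ac"  # the euro sign

results = {
    name: try_encode(euro, name)
    for name in ("ascii", "iso-8859-1", "cp1252", "utf-8")
}

for name, encoded in results.items():
    print(name, encoded)
# ascii None
# iso-8859-1 None
# cp1252 b'\x80'
# utf-8 b'\xe2\x82\xac'
```

ASCII and ISO-8859-1 have no euro sign at all, Windows-1252 stores it in one of its extra positions (byte 128), and UTF-8 stores it as three bytes.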

The HTML charset Attribute

To display an HTML page correctly, a web browser must know the character set used in the page. This is specified with the charset attribute of the <meta> tag:

<meta charset="UTF-8">
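As a rough illustration of how a program could read this declaration, the sketch below uses Python's standard `html.parser` module to pick out the charset attribute. The class name `CharsetFinder` and the sample page are assumptions made for this example, and real browsers follow a more involved detection procedure than this.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the value of the first <meta charset="..."> tag it sees."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset":
                    self.charset = value


page = (
    "<!DOCTYPE html><html><head>"
    '<meta charset="UTF-8"><title>Demo</title>'
    "</head><body></body></html>"
)

finder = CharsetFinder()
finder.feed(page)
print(finder.charset)  # UTF-8
```

If the page contains no such tag, `finder.charset` stays `None`, and the caller has to fall back to some other default.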

Some Common Symbols

The characters below have the same number in ASCII, ANSI, ISO-8859-1, and UTF-8:

Number  Character  Description
32      (space)    Space
33      !          Exclamation Mark
34      "          Quotation Mark
35      #          Hash Sign
36      $          Dollar Sign
37      %          Percent Sign
38      &          Ampersand
39      '          Apostrophe
40      (          Opening Parenthesis
41      )          Closing Parenthesis
42      *          Asterisk
43      +          Plus Sign
44      ,          Comma
45      -          Hyphen/Minus Sign
46      .          Full Stop
47      /          Slash/Divide Sign
48      0          Number Zero
49      1          Number One
50      2          Number Two
51      3          Number Three
52      4          Number Four
53      5          Number Five
54      6          Number Six
55      7          Number Seven
56      8          Number Eight
57      9          Number Nine
58      :          Colon
59      ;          Semicolon
60      <          Less-than Sign
61      =          Equals Sign
62      >          Greater-than Sign
63      ?          Question Mark
64      @          At Sign
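
The numbers in the table above are shared by all four character sets, which can be checked with a short Python sketch. The function name `same_in_all_encodings` is a hypothetical helper introduced for this example, and `cp1252` is the name of Python's codec for ANSI (Windows-1252).

```python
ENCODINGS = ("ascii", "cp1252", "iso-8859-1", "utf-8")


def same_in_all_encodings(number):
    """True if the single byte `number` decodes to chr(number) in every encoding."""
    raw = bytes([number])
    try:
        return all(raw.decode(encoding) == chr(number) for encoding in ENCODINGS)
    except UnicodeDecodeError:
        # At least one encoding cannot decode this byte on its own.
        return False


# Check a few rows of the table: number, character, and whether all encodings agree.
for number in (32, 33, 48, 64):
    print(number, repr(chr(number)), same_in_all_encodings(number))
```

Every number from 32 to 64 passes this check. A number such as 200 does not, because a single byte with that value is not valid ASCII and is not a complete UTF-8 sequence.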