A variable-width encoding scheme for representing any
Unicode character as a sequence of one to four 8-bit code units (bytes).
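The encoding scheme is less compact than UTF-16 for some scripts (most CJK characters, for example, take three bytes instead of two), but it includes a variety of features to improve compatibility with older schemes.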
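ASCII, for example, corresponds to the first 128 Unicode code points, and UTF-8 encodes each of them as a single byte with the same value, making ASCII and UTF-8 byte-for-byte identical for ASCII-encoded text. UTF-8 also has features to promote error trapping and resynchronization. Resynchronization is needed because any single
Unicode code point above the first 128 used for ASCII spans multiple bytes (two to four) in UTF-8 encoded text. The lead byte and the continuation bytes of such a sequence use distinct bit patterns, so a decoder that starts in the middle of a character can skip ahead to the start of the next one.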
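The properties above can be sketched in a few lines of Python. This is only an illustration, not a full decoder: the helper names is_continuation and resync are hypothetical, and the sample string is an arbitrary choice containing one character of each encoded length. It relies on the fact that continuation bytes always have the bit pattern 10xxxxxx.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Return the first index at or after pos that is not a continuation byte.

    Hypothetical helper: it only skips continuation bytes and does not
    validate that the sequence found there is well-formed.
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# ASCII text encodes to the same bytes under ASCII and UTF-8.
ascii_matches = "hello".encode("ascii") == "hello".encode("utf-8")

# One character each of encoded length 1, 2, 3 and 4 bytes:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
text = "A\u00e9\u20ac\U0001F600"
encoded = text.encode("utf-8")
lengths = [len(ch.encode("utf-8")) for ch in text]

# Index 2 is the second byte of U+00E9, so decoding cannot start there.
# Skipping continuation bytes lands on the lead byte of U+20AC at index 3.
start = resync(encoded, 2)
tail = encoded[start:].decode("utf-8")
```

Here lengths comes out as [1, 2, 3, 4], and tail holds the last two characters of the sample string, because the scan started inside the second character and resumed at the third. A production decoder would also reject overlong forms, surrogates, and truncated sequences, which this sketch leaves out.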
UTF-8 is used on the web and is described in RFC 3629.