Directory

Encyclopedia

NodeWorks
                              ENCYCLOPEDIA

Link Checker

Home
Encyclopedia : U : UT : UTF :

UTF-32

 

UTF-32


UTF-32 is a method of encoding Unicode characters, using a fixed amount of 32 bits for each character. It can be regarded as the simplest possible way, as all other Unicode Transformation Formats have variable-length encodings for various characters. However, a notable drawback of UTF-32 is that it requires up to two to four times the storage space of traditional encodings. UTF-32 is generally not as efficient on memory usage and memory bandwidth when compared to UTF-16 or UTF-8. This is why it is rarely used for external storage, but only internally when character handling is required to be as simple as possible.

UCS-4

ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the universal character set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

UCS-4 is sufficient to represent all of the Unicode code space, which has 1114112 (= 220+216) code points and therefore requires only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0 to 10FFFF code space.

UTF-32 and UCS-4


UTF-32 was originally a subset of the UCS-4 standard, but the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes and has removed former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF.

Accordingly UCS-4 and UTF-32 can be now taken to be identical save that the UTF-32 standard has additional Unicode semantics that must be observed.

External Links

  • The Unicode Standard 4.1, chapter 3 - formally defines UTF-32 in §3.10, D43-D45
  • Unicode Standard Annex #19 - formally defined UTF-32 for Unicode 3.x (March 2001; last updated March 2002)
  • Registration of new charsets: UTF-32, UTF-32BE, UTF-32LE - announcement of UTF-32 being added to the IANA charset registry (April 2002)



  • NodeWorks boosts web surfing!
    Page Returned in 0.110 seconds - HTML Compressed 67.1%

    This article is from Wikipedia. All text is available
    under the terms of the GNU Free Documentation License.
     GNU Free Documentation License
    © 2008 Chamas Enterprises Inc.