For the complete experience, please enable JavaScript in your browser. Thank you!

  • Creative Cloud
  • Photoshop
  • Illustrator
  • InDesign
  • Premiere Pro
  • After Effects
  • Lightroom
  • See all
  • See plans for: businesses photographers students
  • Document Cloud
  • Acrobat DC
  • eSign
  • Stock
  • Elements
  • Marketing Cloud
  • Analytics
  • Audience Manager
  • Campaign
  • Experience Manager
  • Media Optimizer
  • Target
  • See all
  • Acrobat Reader DC
  • Adobe Flash Player
  • Adobe AIR
  • Adobe Shockwave Player
  • All products
  • Creative Cloud
  • Individuals
  • Photographers
  • Students and Teachers
  • Business
  • Schools and Universities
  • Marketing Cloud
  • Document Cloud
  • Stock
  • Elements
  • All products
  • Get Support
    Find answers quickly. Contact us if you need to.
    Start now >
  • Learn the apps
    Get started or learn new ways to work.
    Learn now >
  • Ask the community
    Post questions and get answers from experts.
    Start now >
    • About Us
    • Careers At Adobe
    • Investor Relations
    • Privacy  |  Security
    • Corporate Responsibility
    • Customer Showcase
    • Events
    • Contact Us
News
    • 3/22/2016
      Adobe Summit 2016: Are You An Experience Business?
    • 3/22/2016
      Adobe Announces Cross-Device Co-op to Enable People-Based Marketing
    • 3/22/2016
      Adobe and comScore Advance Digital TV and Ad Measurement
    • 3/22/2016
      Adobe Marketing Cloud Redefines TV Experience
Developing Applications Help / 

About character encodings

Adobe Community Help


Applies to

  • ColdFusion

Contact support

 
By clicking Submit, you accept the Adobe Terms of Use.
 

A character encoding maps each character in a character set to a numeric value that a computer can represent. These numbers can be represented by a single byte or multiple bytes. For example, the ASCII encoding uses 7 bits to represent the Latin alphabet, punctuation, and control characters. 
You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese text. These encodings can vary slightly, but they include a common set of approximately 10,000 characters used in Japanese.
The following terms apply to character encodings:

  • SBCS Single-byte character set; a character set encoded in one byte per character, such as ASCII or ISO 8859-1.
  • DBCS Double-byte character set; a method of encoding a character set in no more than 2 bytes, such as Shift-JIS. Many character encoding schemes that are referred to as double-byte, including Shift-JIS, allow mixing of single-byte and double-byte encoded characters. Others, such as UCS-2, use 2 bytes for all characters.
  • MBCS Multiple-byte character set; a character set encoded with a variable number of bytes per character, such as UTF-8.The following table lists some common character encodings; however, there are many additional character encodings that browsers and web servers support:

Encoding

Type

Description

ASCII

SBCS

7-bit encoding used by English and Indonesian Bahasa languages

Latin-1(ISO 8859-1)

SBCS

8-bit encoding used for many Western European languages

Shift_JIS

DBCS

16-bit Japanese encoding Note: Use an underscore character (_), not a hyphen (-) in the name in CFML attributes.

EUC-KR

DBCS

16-bit Korean encoding

UCS-2

DBCS

Two-byte Unicode encoding

UTF-8

MBCS

Multibyte Unicode encoding. ASCII is 7-bit; non-ASCII characters used in European and many Middle Eastern languages are two-byte; and most Asian characters are three-byte

The World Wide Web Consortium maintains a list of all character encodings supported by the Internet. You can find this information at www.w3.org/International/O-charset.html.Computers must often convert between character encodings. In particular, the character encodings most commonly used on the Internet are not used by Java or Windows. Character sets used on the Internet are typically single-byte or multiple-byte (including DBCS character sets that allow single-byte characters). These character sets are most efficient for transmitting data, because each character takes up the minimum necessary number of bytes. Currently, Latin characters are most frequently used on the web, and most character encodings used on the web represent those characters in a single byte. Computers, however, process data most efficiently if each character occupies the same number of bytes. Therefore, Windows and Java both use double-byte encoding for internal processing.

The Java Unicode character encoding

ColdFusion uses the Java Unicode Standard for representing character data internally. This standard corresponds to UCS-2 encoding of the Unicode character set. The Unicode character set can represent many languages, including all major European and Asian character sets. Therefore, ColdFusion can receive, store, process, and present text from all languages supported by Unicode. 
The Java Virtual Machine (JVM) that is used to processes ColdFusion pages converts between the character encoding used on a ColdFusion page or other source of information to UCS-2. The page or data encodings that ColdFusion supports depend on the specific JVM, but include most encodings used on the web. Similarly, the JVM converts between its internal UCS-2 representation and the character encoding used to send the response to the client.
By default, ColdFusion uses UTF-8 to represent text data sent to a browser. UTF-8 represents the Unicode character set using a variable-length encoding. ASCII characters are sent using a single byte. Most European and Middle Eastern characters are sent as 2 bytes, and Japanese, Korean, and Chinese characters are sent as 3 bytes. One advantage of UTF-8 is that it sends ASCII character set data in a form that is recognized by systems designed to process only single-byte ASCII characters, while it is flexible enough to handle multiple-byte character representations. 
While the default format of text data returned by ColdFusion is UTF-8, you can have ColdFusion return a page to any character set supported by Java. For example, you can return text using the Japanese language Shift-JIS character set. Similarly, ColdFusion can handle data that is in many different character sets. For more information, see Determining the page encoding of server output in Processing a request in ColdFusion.

Character encoding conversion issues

Because different character encodings support different character sets, you can encounter errors if your application gets text in one encoding and presents it in another encoding. For example, the Windows Latin-1 character encoding, Windows-1252, includes characters with hexadecimal representations in the range 80-9F, while ISO 8859-1 does not include characters in that range. As a result, under the following circumstances, characters in the range 80-9F, such as the euro symbol (), are not displayed properly:

  • A file encoded in Windows-1252 includes characters in the range 80-9F.
  • ColdFusion reads the file, specifying the Windows-1252 encoding in the cffile tag.
  • ColdFusion displays the file contents, specifying ISO-8859 in the cfcontent tag.
    Similar issues can arise if you convert between other character encodings; for example, if you read files encoded in the Japanese Windows default encoding and display them using Shift-JIS. To prevent these problems, ensure that the display encoding is the same as the input encoding.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.

Legal Notices   |   Online Privacy Policy

Choose your region United States (Change)   Products   Downloads   Learn & Support   Company
Choose your region Close

Americas

Europe, Middle East and Africa

Asia Pacific

  • Brasil
  • Canada - English
  • Canada - Français
  • Latinoamérica
  • México
  • United States
  • Africa - English
  • Österreich - Deutsch
  • Belgium - English
  • Belgique - Français
  • België - Nederlands
  • България
  • Hrvatska
  • Cyprus - English
  • Česká republika
  • Danmark
  • Eesti
  • Suomi
  • France
  • Deutschland
  • Greece - English
  • Magyarország
  • Ireland
  • Israel - English
  • ישראל - עברית
  • Italia
  • Latvija
  • Lietuva
  • Luxembourg - Deutsch
  • Luxembourg - English
  • Luxembourg - Français
  • Malta - English
  • الشرق الأوسط وشمال أفريقيا - اللغة العربية
  • Middle East and North Africa - English
  • Moyen-Orient et Afrique du Nord - Français
  • Nederland
  • Norge
  • Polska
  • Portugal
  • România
  • Россия
  • Srbija
  • Slovensko
  • Slovenija
  • España
  • Sverige
  • Schweiz - Deutsch
  • Suisse - Français
  • Svizzera - Italiano
  • Türkiye
  • Україна
  • United Kingdom
  • Australia
  • 中国
  • 中國香港特別行政區
  • Hong Kong S.A.R. of China
  • India - English
  • 日本
  • 한국
  • New Zealand
  • Southeast Asia (Includes Indonesia, Malaysia, Philippines, Singapore, Thailand, and Vietnam) - English
  • 台灣

Commonwealth of Independent States

  • Includes Armenia, Azerbaijan, Belarus, Georgia, Moldova, Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Ukraine, Uzbekistan

Copyright © 2016 Adobe Systems Incorporated. All rights reserved.

Terms of Use | Privacy | Cookies

AdChoices