Class: Encoding::Converter
Constant Summary
- INVALID_MASK
If only these mapped to the internal enums...
- INVALID_REPLACE
- UNDEF_MASK
- UNDEF_REPLACE
- UNDEF_HEX_CHARREF
- PARTIAL_INPUT
- AFTER_OUTPUT
- UNIVERSAL_NEWLINE_DECORATOR
- CRLF_NEWLINE_DECORATOR
- CR_NEWLINE_DECORATOR
- XML_TEXT_DECORATOR
- XML_ATTR_CONTENT_DECORATOR
- XML_ATTR_QUOTE_DECORATOR
Class Method Summary
+ alloc
:nodoc:
+ asciicompat_encoding
Returns the corresponding ASCII compatible encoding.
+ search_convpath
Returns a conversion path.
Instance Method Summary
- convert
Convert source_string and return destination_string.
- convpath
Returns the conversion path of ec.
- destination_encoding
Returns the destination encoding as an Encoding object.
- finish
Finishes the converter.
- initialize (constructor)
Creates a new Encoding::Converter; see Constructor Details for the possible options.
- insert_output
- inspect
Returns a printable version of ec.
- last_error
- primitive_convert
- primitive_errinfo
- putback
- replacement
Returns the replacement string.
- replacement=
Sets the replacement string.
- source_encoding
Returns the source encoding as an Encoding object.
Methods inherited from NSObject
#!, #!=, #!~, #<=>, #==, #===, #=~, #Rational, #__callee__, #__method__, #__send__, #__type__, `, allocWithZone:, #autoContentAccessingProxy, autoload, autoload?, autorelease_pool, #awakeAfterUsingCoder:, binding, block_given?, caller, cancelPreviousPerformRequestsWithTarget:, cancelPreviousPerformRequestsWithTarget:selector:object:, catch, class, classFallbacksForKeyedArchiver, #classForCoder, #classForKeyedArchiver, classForKeyedUnarchiver, #clone, conformsToProtocol:, #copy, copyWithZone:, #dealloc, #define_singleton_method, description, display, #doesNotRecognizeSelector:, #dup, #enum_for, #eql?, #equal?, #extend, fail, #finalize, format, #forwardInvocation:, #forwardingTargetForSelector:, framework, #freeze, #frozen?, getpass, gets, global_variables, #init, initialize, #initialize_clone, #initialize_copy, #initialize_dup, instanceMethodForSelector:, instanceMethodSignatureForSelector:, #instance_eval, #instance_exec, #instance_of?, #instance_variable_defined?, #instance_variable_get, #instance_variable_set, #instance_variables, instancesRespondToSelector:, isSubclassOfClass:, #is_a?, iterator?, #kind_of?, lambda, load, load_bridge_support_file, load_plist, local_variables, loop, #method, #methodForSelector:, #methodSignatureForSelector:, #methods, #mutableCopy, mutableCopyWithZone:, new, #nil?, open, p, #performSelector:onThread:withObject:waitUntilDone:, #performSelector:onThread:withObject:waitUntilDone:modes:, #performSelector:withObject:afterDelay:, #performSelector:withObject:afterDelay:inModes:, #performSelectorInBackground:withObject:, #performSelectorOnMainThread:withObject:waitUntilDone:, #performSelectorOnMainThread:withObject:waitUntilDone:modes:, print, printf, #private_methods, proc, #protected_methods, #public_method, #public_methods, #public_send, putc, puts, raise, rand, readline, readlines, #replacementObjectForCoder:, #replacementObjectForKeyedArchiver:, require, resolveClassMethod:, resolveInstanceMethod:, #respond_to?, #respond_to_missing?, 
select, #send, setVersion:, #singleton_methods, sprintf, srand, superclass, #taint, #tainted?, #tap, test, throw, #to_plist, #to_s, trace_var, trap, #trust, #untaint, untrace_var, #untrust, #untrusted?, version
Constructor Details
- (Object) Encoding::Converter.new(source_encoding, destination_encoding)
- (Object) Encoding::Converter.new(source_encoding, destination_encoding, opt)
- (Object) Encoding::Converter.new(convpath)
possible options elements:
hash form:
:invalid => nil # raise error on invalid byte sequence (default)
:invalid => :replace # replace invalid byte sequence
:undef => nil # raise error on undefined conversion (default)
:undef => :replace # replace undefined conversion
:replace => string # replacement string ("?" or "\uFFFD" if not specified)
:newline => :universal # decorator for converting CRLF and CR to LF
:newline => :crlf # decorator for converting LF to CRLF
:newline => :cr # decorator for converting LF to CR
:universal_newline => true # decorator for converting CRLF and CR to LF
:crlf_newline => true # decorator for converting LF to CRLF
:cr_newline => true # decorator for converting LF to CR
:xml => :text # escape as XML CharData
:xml => :attr # escape as XML AttValue
integer form:
Encoding::Converter::INVALID_REPLACE
Encoding::Converter::UNDEF_REPLACE
Encoding::Converter::UNDEF_HEX_CHARREF
Encoding::Converter::UNIVERSAL_NEWLINE_DECORATOR
Encoding::Converter::CRLF_NEWLINE_DECORATOR
Encoding::Converter::CR_NEWLINE_DECORATOR
Encoding::Converter::XML_TEXT_DECORATOR
Encoding::Converter::XML_ATTR_CONTENT_DECORATOR
Encoding::Converter::XML_ATTR_QUOTE_DECORATOR
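The hash form and the integer form describe the same settings. The following sketch, which assumes CRuby's Encoding::Converter behavior, builds one converter each way and checks that they produce the same output. The input string and variable names are illustrative only.

```ruby
# Hash form: option keys and values.
by_hash = Encoding::Converter.new("UTF-8", "US-ASCII",
                                  undef: :replace, universal_newline: true)

# Integer form: the same behavior as a bitwise OR of constants.
flags = Encoding::Converter::UNDEF_REPLACE |
        Encoding::Converter::UNIVERSAL_NEWLINE_DECORATOR
by_flags = Encoding::Converter.new("UTF-8", "US-ASCII", flags)

input = "a\r\nb\u3042\rc"

# #convert treats its argument as partial input, so #finish flushes the rest.
out_hash  = by_hash.convert(input) + by_hash.finish
out_flags = by_flags.convert(input) + by_flags.finish

# CRLF and CR become LF; U+3042 is undefined in US-ASCII and is replaced
# with the default replacement string "?".
p out_hash   #=> "a\nb?\nc"
```

Note that only flags with a constant counterpart can be expressed in integer form; a custom replacement string needs the hash form (:replace => string) or replacement=.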
Encoding::Converter.new creates an instance of Encoding::Converter.
source_encoding and destination_encoding should each be a string or an Encoding object.
opt should be nil, a hash or an integer.
convpath should be an array. convpath may contain
- two-element arrays which contain encodings or encoding names, or
- strings representing decorator names.
Encoding::Converter.new optionally takes an option, which should be a hash or an integer. The option hash can contain :invalid => nil, etc. The option integer should be a bitwise OR of constants such as Encoding::Converter::INVALID_REPLACE, etc.
- :invalid => nil
  Raise an error on an invalid byte sequence. This is the default behavior.
- :invalid => :replace
  Replace an invalid byte sequence with the replacement string.
- :undef => nil
  Raise an error if a character in source_encoding is not defined in destination_encoding. This is the default behavior.
- :undef => :replace
  Replace a character that is undefined in destination_encoding with the replacement string.
- :replace => string
  Specify the replacement string. If not specified, "\uFFFD" is used for Unicode encodings and "?" for others.
- :universal_newline => true
  Convert CRLF and CR to LF.
- :crlf_newline => true
  Convert LF to CRLF.
- :cr_newline => true
  Convert LF to CR.
- :xml => :text
  Escape as XML CharData. This form can be used as HTML 4.0 #PCDATA.
  - '&' -> '&amp;'
  - '<' -> '&lt;'
  - '>' -> '&gt;'
  - undefined characters in destination_encoding -> hexadecimal CharRef such as &#xHH;
- :xml => :attr
  Escape as XML AttValue. The converted result is quoted as "...". This form can be used as an HTML 4.0 attribute value.
  - '&' -> '&amp;'
  - '<' -> '&lt;'
  - '>' -> '&gt;'
  - '"' -> '&quot;'
  - undefined characters in destination_encoding -> hexadecimal CharRef such as &#xHH;
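The two :xml modes can be compared side by side. The sketch below assumes CRuby semantics and uses US-ASCII as the destination so that U+3042 is undefined and becomes a hexadecimal CharRef. The sample strings are made up for illustration.

```ruby
# :xml => :text escapes &, < and >; characters that the destination
# encoding cannot represent become hexadecimal character references.
text = Encoding::Converter.new("UTF-8", "US-ASCII", xml: :text)
escaped_text = text.convert("a<b & \u3042") + text.finish

# :xml => :attr additionally escapes " and wraps the whole result in
# double quotes; the closing quote is emitted when the converter finishes.
attr = Encoding::Converter.new("UTF-8", "US-ASCII", xml: :attr)
escaped_attr = attr.convert('say "hi" & <bye>') + attr.finish

puts escaped_text   # a&lt;b &amp; &#x3042;
puts escaped_attr   # "say &quot;hi&quot; &amp; &lt;bye&gt;"
```

Because the attribute form only closes its quote at the end of the input, calling finish matters here more than for the other options.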
Examples:
# UTF-16BE to UTF-8
ec = Encoding::Converter.new("UTF-16BE", "UTF-8")
# Usually, decorators such as newline conversion are inserted last.
ec = Encoding::Converter.new("UTF-16BE", "UTF-8", :universal_newline => true)
p ec.convpath #=> [[#<Encoding:UTF-16BE>, #<Encoding:UTF-8>],
# "universal_newline"]
# But, if the last encoding is ASCII incompatible,
# decorators are inserted before the last conversion.
ec = Encoding::Converter.new("UTF-8", "UTF-16BE", :crlf_newline => true)
p ec.convpath #=> ["crlf_newline",
# [#<Encoding:UTF-8>, #<Encoding:UTF-16BE>]]
# Conversion path can be specified directly.
ec = Encoding::Converter.new(["universal_newline", ["EUC-JP", "UTF-8"], ["UTF-8", "UTF-16BE"]])
p ec.convpath #=> ["universal_newline",
# [#<Encoding:EUC-JP>, #<Encoding:UTF-8>],
# [#<Encoding:UTF-8>, #<Encoding:UTF-16BE>]]
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class NSObject.
Class Method Details
+ (Object) alloc
:nodoc:
+ (Encoding?) Encoding::Converter.asciicompat_encoding(string)
+ (Encoding?) Encoding::Converter.asciicompat_encoding(encoding)
Returns the corresponding ASCII compatible encoding.
Returns nil if the argument is an ASCII compatible encoding.
"Corresponding ASCII compatible encoding" means an ASCII compatible encoding that can represent exactly the same characters as the given ASCII incompatible encoding, so no undefined conversion error occurs when converting between the two encodings.
Encoding::Converter.asciicompat_encoding("ISO-2022-JP") #=> #<Encoding:stateless-ISO-2022-JP>
Encoding::Converter.asciicompat_encoding("UTF-16BE") #=> #<Encoding:UTF-8>
Encoding::Converter.asciicompat_encoding("UTF-8") #=> nil
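One way to use this method is to move text into an ASCII compatible encoding before applying ASCII-oriented processing, then convert it back. The helper to_ascii_compatible below is hypothetical; it is a minimal sketch assuming CRuby's String#encode and the nil-for-already-compatible behavior described above.

```ruby
# Hypothetical helper: return str in an ASCII compatible encoding.
# asciicompat_encoding returns nil when str.encoding is already ASCII
# compatible, in which case str is returned unchanged.
def to_ascii_compatible(str)
  target = Encoding::Converter.asciicompat_encoding(str.encoding)
  target ? str.encode(target) : str
end

wide = "h\u00e9llo".encode("UTF-16BE")
narrow = to_ascii_compatible(wide)
p narrow.encoding            #=> #<Encoding:UTF-8>

# Both encodings cover the same characters, so the round trip is lossless.
p narrow.encode("UTF-16BE") == wide   #=> true
```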
+ (Array) Encoding::Converter.search_convpath(source_encoding, destination_encoding)
+ (Array) Encoding::Converter.search_convpath(source_encoding, destination_encoding, opt)
Returns a conversion path.
p Encoding::Converter.search_convpath("ISO-8859-1", "EUC-JP")
#=> [[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>],
# [#<Encoding:UTF-8>, #<Encoding:EUC-JP>]]
p Encoding::Converter.search_convpath("ISO-8859-1", "EUC-JP", universal_newline: true)
or
p Encoding::Converter.search_convpath("ISO-8859-1", "EUC-JP", newline: :universal)
#=> [[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>],
# [#<Encoding:UTF-8>, #<Encoding:EUC-JP>],
# "universal_newline"]
p Encoding::Converter.search_convpath("ISO-8859-1", "UTF-32BE", universal_newline: true)
or
p Encoding::Converter.search_convpath("ISO-8859-1", "UTF-32BE", newline: :universal)
#=> [[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>],
# "universal_newline",
# [#<Encoding:UTF-8>, #<Encoding:UTF-32BE>]]
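Because the result mixes encoding pairs with decorator names, a caller interested only in the route usually filters it. The helper intermediate_encodings below is hypothetical; it is a sketch assuming the path shapes shown in the examples above.

```ruby
# Hypothetical helper: list the encodings a conversion passes through,
# excluding the source and the final destination. Decorators (strings)
# are dropped by grep(Array), which keeps only the encoding pairs.
def intermediate_encodings(src, dst, **opts)
  pairs = Encoding::Converter.search_convpath(src, dst, **opts).grep(Array)
  pairs.map(&:last)[0...-1]
end

p intermediate_encodings("ISO-8859-1", "EUC-JP")
#=> [#<Encoding:UTF-8>]
p intermediate_encodings("ISO-8859-1", "UTF-32BE", universal_newline: true)
#=> [#<Encoding:UTF-8>]
```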
Instance Method Details
- (Object) convert(source_string)
Convert source_string and return destination_string.
source_string is treated as a part of the source, i.e. :partial_input => true is specified internally. Call finish last to flush the converter.
ec = Encoding::Converter.new("utf-8", "euc-jp")
puts ec.convert("\u3042").dump #=> "\xA4\xA2"
puts ec.finish.dump #=> ""
ec = Encoding::Converter.new("euc-jp", "utf-8")
puts ec.convert("\xA4").dump #=> ""
puts ec.convert("\xA2").dump #=> "\xE3\x81\x82"
puts ec.finish.dump #=> ""
ec = Encoding::Converter.new("utf-8", "iso-2022-jp")
puts ec.convert("\xE3").dump #=> "".force_encoding("ISO-2022-JP")
puts ec.convert("\x81").dump #=> "".force_encoding("ISO-2022-JP")
puts ec.convert("\x82").dump #=> "\e$B$\"".force_encoding("ISO-2022-JP")
puts ec.finish.dump #=> "\e(B".force_encoding("ISO-2022-JP")
If a conversion error occurs, Encoding::UndefinedConversionError or Encoding::InvalidByteSequenceError is raised. Encoding::Converter#convert provides no way to recover from or restart after these exceptions. To handle these conversion errors, use Encoding::Converter#primitive_convert.
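A minimal sketch of such recovery, assuming CRuby's semantics for primitive_convert, primitive_errinfo, insert_output and last_error (none of which are described on this page). The helper convert_marking_undefined is hypothetical: it writes each undefined character as "<U+XXXX>" and then resumes. It assumes the failing step's source encoding is a Unicode encoding such as UTF-8, so the error bytes decode to a code point.

```ruby
# Hypothetical helper: convert str to +to+, marking undefined characters.
def convert_marking_undefined(str, to)
  ec  = Encoding::Converter.new(str.encoding, to)
  src = str.dup      # primitive_convert consumes the source buffer
  dst = String.new   # converted output is appended here
  loop do
    result = ec.primitive_convert(src, dst)
    case result
    when :finished
      return dst
    when :undefined_conversion
      # primitive_errinfo => [result, enc1, enc2, error_bytes, readagain_bytes]
      _res, enc1, _enc2, error_bytes, _readagain = ec.primitive_errinfo
      codepoint = error_bytes.dup.force_encoding(enc1).ord
      ec.insert_output(format("<U+%04X>", codepoint))
    else
      # Invalid or incomplete input: raise the converter's own exception.
      raise(ec.last_error || "unexpected result: #{result.inspect}")
    end
  end
end

puts convert_marking_undefined("A\u3042B", "ISO-8859-1")   # A<U+3042>B
```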
- (Array) convpath
Returns the conversion path of ec.
The result is an array of conversions.
ec = Encoding::Converter.new("ISO-8859-1", "EUC-JP", crlf_newline: true)
p ec.convpath
#=> [[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>],
# [#<Encoding:UTF-8>, #<Encoding:EUC-JP>],
# "crlf_newline"]
Each element of the array is a pair of encodings or a string. A pair means an encoding conversion. A string means a decorator.
In the above example, [#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>] means a converter from ISO-8859-1 to UTF-8, and "crlf_newline" means a newline converter from LF to CRLF.
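Since each element is either a pair or a string, code that walks the path can branch on the element's class. The helper describe_convpath below is hypothetical and only formats the path for display; it is a sketch based on the example output above.

```ruby
# Hypothetical helper: render each step of ec's conversion path.
def describe_convpath(ec)
  ec.convpath.map do |step|
    if step.is_a?(Array)
      from, to = step           # an encoding conversion
      "#{from} -> #{to}"
    else
      "decorator: #{step}"      # a decorator name such as "crlf_newline"
    end
  end
end

ec = Encoding::Converter.new("ISO-8859-1", "EUC-JP", crlf_newline: true)
puts describe_convpath(ec)
# ISO-8859-1 -> UTF-8
# UTF-8 -> EUC-JP
# decorator: crlf_newline
```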
- (Encoding) destination_encoding
Returns the destination encoding as an Encoding object.
- (String) finish
Finishes the converter. It returns the last part of the converted string.
ec = Encoding::Converter.new("utf-8", "iso-2022-jp")
p ec.convert("\u3042") #=> "\e$B$\""
p ec.finish #=> "\e(B"
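convert and finish together support streaming: feed pieces as they arrive, then call finish once. The helper convert_stream below is hypothetical; it is a sketch assuming CRuby, where convert accepts any byte string and keeps an incomplete trailing character until the next call.

```ruby
# Hypothetical helper: feed data to a converter chunk_size bytes at a
# time (as a stream reader might), then flush it with #finish.
def convert_stream(data, from, to, chunk_size)
  ec  = Encoding::Converter.new(from, to)
  raw = data.b                                   # raw bytes of the input
  out = String.new(encoding: ec.destination_encoding)
  0.step(raw.bytesize - 1, chunk_size) do |offset|
    # A chunk may end mid-character; #convert buffers the incomplete
    # bytes internally until the next call supplies the rest.
    out << ec.convert(raw.byteslice(offset, chunk_size))
  end
  out << ec.finish                               # emit any buffered tail
end

euc = "\xA4\xA2\xA4\xA4".force_encoding("EUC-JP")
p convert_stream(euc, "EUC-JP", "UTF-8", 1)   #=> "\u3042\u3044"
```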
- (Object) insert_output
- (String) inspect
Returns a printable version of ec.
ec = Encoding::Converter.new("iso-8859-1", "utf-8")
puts ec.inspect #=> #<Encoding::Converter: ISO-8859-1 to UTF-8>
- (Object) last_error
- (Object) primitive_convert
- (Object) primitive_errinfo
- (Object) putback
- (String) replacement
Returns the replacement string.
ec = Encoding::Converter.new("euc-jp", "us-ascii")
p ec.replacement #=> "?"
ec = Encoding::Converter.new("euc-jp", "utf-8")
p ec.replacement #=> "\uFFFD"
- (Object) replacement=(string)
Sets the replacement string.
ec = Encoding::Converter.new("utf-8", "us-ascii", :undef => :replace)
ec.replacement = "<undef>"
p ec.convert("a \u3042 b") #=> "a <undef> b"
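The replacement can also be supplied up front with the :replace option from the constructor. This sketch, assuming CRuby, compares the two approaches on the same input.

```ruby
# Option form: set the replacement when the converter is created.
a = Encoding::Converter.new("utf-8", "us-ascii", undef: :replace, replace: "<undef>")

# Setter form: create the converter, then assign the replacement.
b = Encoding::Converter.new("utf-8", "us-ascii", undef: :replace)
b.replacement = "<undef>"

out_a = a.convert("a \u3042 b")
out_b = b.convert("a \u3042 b")
p out_a   #=> "a <undef> b"
```

The setter is useful when the replacement is decided after the converter already exists; otherwise the option form keeps the configuration in one place.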
- (Encoding) source_encoding
Returns the source encoding as an Encoding object.
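Encoding names passed to Encoding::Converter.new are resolved to Encoding objects, which source_encoding and destination_encoding then return. A short sketch, assuming CRuby:

```ruby
ec = Encoding::Converter.new("euc-jp", "utf-8")
src = ec.source_encoding        # lower-case name resolved to an Encoding
dst = ec.destination_encoding
p src   #=> #<Encoding:EUC-JP>
p dst   #=> #<Encoding:UTF-8>
```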