enum Unicode.UTF8
Inheritance |
UnicodeCodec, _UnicodeEncoding
View Protocol Hierarchy →
|
---|---|
Associated Types | |
Nested Types | Unicode.UTF8.ForwardParser, Unicode.UTF8.ReverseParser |
Import | import Swift |
Cases
Initializers
Static Variables
A unicode scalar value to be used when repairing encoding/decoding errors, as represented in this encoding.
If the Unicode replacement character U+FFFD is representable in this
encoding, encodedReplacementCharacter
encodes that scalar value.
Declaration
static var encodedReplacementCharacter: Unicode.UTF8.EncodedScalar { get }
Static Methods
Converts from encoded to encoding-independent representation
Declaration
static func decode(_ source: Unicode.UTF8.EncodedScalar) -> Unicode.Scalar
Converts from encoding-independent to encoded representation, returning
nil
if the scalar can't be represented in this encoding.
Declaration
static func encode(_ source: Unicode.Scalar) -> Unicode.UTF8.EncodedScalar?
Encodes a Unicode scalar as a series of code units by calling the given closure on each code unit.
For example, the musical fermata symbol ("𝄐") is a single Unicode scalar
value (\u{1D110}
) but requires four code units for its UTF-8
representation. The following code encodes a fermata in UTF-8:
var bytes: [UTF8.CodeUnit] = []
UTF8.encode("𝄐", into: { bytes.append($0) })
print(bytes)
// Prints "[240, 157, 132, 144]"
Parameters: input: The Unicode scalar value to encode. processCodeUnit: A closure that processes one code unit argument at a time.
Declaration
static func encode(_ input: Unicode.Scalar, into processCodeUnit: (Unicode.UTF8.CodeUnit) -> Void)
Returns a Boolean value indicating whether the specified code unit is a UTF-8 continuation byte.
Continuation bytes take the form 0b10xxxxxx
. For example, a lowercase
"e" with an acute accent above it ("é"
) uses 2 bytes for its UTF-8
representation: 0b11000011
(195) and 0b10101001
(169). The second
byte is a continuation byte.
let eAcute = "é"
for codeUnit in eAcute.utf8 {
print(codeUnit, UTF8.isContinuation(codeUnit))
}
// Prints "195 false"
// Prints "169 true"
byte
: A UTF-8 code unit.
Returns: true
if byte
is a continuation byte; otherwise, false
.
Declaration
static func isContinuation(_ byte: Unicode.UTF8.CodeUnit) -> Bool
Converts a scalar from another encoding's representation, returning
nil
if the scalar can't be represented in this encoding.
A default implementation of this method will be provided automatically for any conforming type that does not implement one.
Declaration
static func transcode<FromEncoding>(_ content: FromEncoding.EncodedScalar, from _: FromEncoding.Type) -> Unicode.UTF8.EncodedScalar? where FromEncoding : _UnicodeEncoding
Instance Methods
Starts or continues decoding a UTF-8 sequence.
To decode a code unit sequence completely, call this method repeatedly
until it returns UnicodeDecodingResult.emptyInput
. Checking that the
iterator was exhausted is not sufficient, because the decoder can store
buffered data from the input iterator.
Because of buffering, it is impossible to find the corresponding position
in the iterator for a given returned Unicode.Scalar
or an error.
The following example decodes the UTF-8 encoded bytes of a string into an
array of Unicode.Scalar
instances. This is a demonstration only---if
you need the Unicode scalar representation of a string, use its
unicodeScalars
view.
let str = "✨Unicode✨"
print(Array(str.utf8))
// Prints "[226, 156, 168, 85, 110, 105, 99, 111, 100, 101, 226, 156, 168]"
var bytesIterator = str.utf8.makeIterator()
var scalars: [Unicode.Scalar] = []
var utf8Decoder = UTF8()
Decode: while true {
switch utf8Decoder.decode(&bytesIterator) {
case .scalarValue(let v): scalars.append(v)
case .emptyInput: break Decode
case .error:
print("Decoding error")
break Decode
}
}
print(scalars)
// Prints "["\u{2728}", "U", "n", "i", "c", "o", "d", "e", "\u{2728}"]"
input
: An iterator of code units to be decoded. input
must be
the same iterator instance in repeated calls to this method. Do not
advance the iterator or any copies of the iterator outside this
method.
Returns: A UnicodeDecodingResult
instance, representing the next
Unicode scalar, an indication of an error, or an indication that the
UTF sequence has been fully decoded.
Declaration
mutating func decode<I>(_ input: inout I) -> UnicodeDecodingResult where I : IteratorProtocol, I.Element == Unicode.UTF8.CodeUnit
The basic unit of encoding