2 UCD Properties
The Unicode standard defines a large number of character properties which describe the type and behavior of characters and character compositions. The data supporting these properties is collected into the Unicode Character Database (UCD) and made available as either a set of text files or XML files.
The functions below rely on a set of Racket source files, expressed either as hashes or as range dicts, that are loaded lazily to fetch specific property values. Loading this data incurs a runtime penalty on the first call that needs it; once the data is loaded, later calls avoid the penalty.
These source files are generated by tooling from the latest UCD files as described in the separate section Data Generator.
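The lazy-loading behavior described above can be illustrated with Racket promises. This is a minimal sketch, not the package's loader: load-name-table, name-table, and lookup-name are hypothetical, the table holds only two entries, and range dicts are not shown.

```racket
#lang racket/base
(require racket/promise) ; delay and force

;; Counts how many times the (hypothetical) table is built, so the
;; caching behavior is observable.
(define load-count 0)

;; Hypothetical stand-in for loading one generated property table.
(define (load-name-table)
  (set! load-count (add1 load-count))
  (make-immutable-hasheqv '((167 . "SECTION SIGN")
                            (73 . "LATIN CAPITAL LETTER I"))))

;; The promise defers the load until the first lookup and caches the result.
(define name-table (delay (load-name-table)))

(define (lookup-name cp [failure-result #f])
  (hash-ref (force name-table) cp failure-result))
```

The first call to lookup-name builds the table and increments load-count; later calls reuse the cached table.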
2.1 Module codepoint/properties
(require codepoint/properties)    package: codepoint
The functions below are either directly mapped to a character property or derived from a character property.
> (define cp (char->codepoint #\§))
> (ucd-latin-1? cp) #t
> (ucd-name cp) "SECTION SIGN"
> (ucd-name-aliases cp) ucd-name-aliases: no property data found for codepoint
codepoint: 167
property: 'name-aliases
> (ucd-general-category cp) 'Po
> (cdr (assoc (ucd-general-category cp) *general-categories*)) "Other punctuation"
> (ucd-age cp) "1.1"
> (ucd-block-name cp) "Latin-1 Supplement"
> (ucd-scripts cp) '(Common)
> (ucd-script-extensions cp (lambda () "None found!")) "None found!"
> (ucd-line-break cp) 'AI
> (cdr (assoc (ucd-line-break cp) *line-breaks*)) "Ambiguous (Alphabetic or Ideographic)"
For any function below that performs a property lookup and has a parameter named failure-result, if no value is found for codepoint, then failure-result determines the result:
If failure-result is a procedure, it is called (through a tail call) with no arguments to produce the result.
Otherwise, failure-result is returned as the result.
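This is the same convention Racket's own hash-ref uses for its failure-result argument. A minimal sketch of a lookup helper that follows the rule, where property-ref and its table argument are hypothetical and the default error only mimics the ucd-name-aliases error shown earlier:

```racket
#lang racket/base

;; Hypothetical helper applying the documented failure-result rule to a
;; hash table keyed by code point.
(define (property-ref table cp property
                      [failure-result
                       (lambda ()
                         (raise-arguments-error
                          'property-ref "no property data found for codepoint"
                          "codepoint" cp
                          "property" property))])
  (cond
    [(hash-has-key? table cp) (hash-ref table cp)]
    ;; A procedure is called, in tail position, with no arguments...
    [(procedure? failure-result) (failure-result)]
    ;; ...any other value is returned as the result.
    [else failure-result]))
```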
> (format "Generated from UCD data, version ~a" *corresponding-unicode-version*) "Generated from UCD data, version 14.0.0"
procedure
(ucd-ascii? c) → boolean?
c : codepoint?
procedure
(ucd-latin-1? c) → boolean?
c : codepoint?
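ucd-ascii? and ucd-latin-1? are presumably derived predicates. A minimal sketch, assuming they are plain range checks over U+0000..U+007F and U+0000..U+00FF; the helper names are hypothetical and the package may define them differently:

```racket
#lang racket/base

;; Hypothetical range checks; assumes ASCII is U+0000..U+007F and
;; Latin-1 is U+0000..U+00FF.
(define (ascii-codepoint? c)
  (and (exact-nonnegative-integer? c) (<= c #x7F)))

(define (latin-1-codepoint? c)
  (and (exact-nonnegative-integer? c) (<= c #xFF)))
```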
procedure
c : codepoint?
procedure
(ucd-name->symbol c) → symbol?
c : codepoint?
> (ucd-name->symbol 73) 'latin-capital-letter-i
> (ucd-name->symbol 0) 'control/0
> (ucd-name->symbol 13312) 'cjk-ideograph-extension-a/first
> (ucd-name->symbol 63755) 'cjk-compatibility-ideograph/f90b
procedure
(ucd-name-aliases c failure-result) → (listof string?)
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
procedure
(ucd-general-category c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (define macron (codepoint->char 772))
> (for ([char (list #\nul #\space #\a #\A #\ༀ #\1 #\½ #\, #\] #\¥ macron)])
    (displayln
     (format "~a => ~a"
             char
             (cdr (assoc (ucd-general-category (char->codepoint char))
                         *general-categories*)))))
=> Control
=> Space separator
a => Lowercase letter
A => Uppercase letter
ༀ => Other letter
1 => Decimal digit number
½ => Other number
, => Other punctuation
] => Close punctuation
¥ => Currency symbol
̄ => Non-spacing mark
See *general-categories* for a mapping from this symbol to a description.
procedure
(ucd-letter-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-letter-category? (char->codepoint #\nul)) #f
> (ucd-letter-category? (char->codepoint #\space)) #f
> (ucd-letter-category? (char->codepoint #\a)) #t
> (ucd-letter-category? (char->codepoint #\A)) #t
> (ucd-letter-category? (char->codepoint #\ༀ)) #t
> (ucd-letter-category? (char->codepoint #\1)) #f
> (ucd-letter-category? (char->codepoint #\½)) #f
> (ucd-letter-category? (char->codepoint #\,)) #f
> (ucd-letter-category? (char->codepoint #\])) #f
> (ucd-letter-category? (char->codepoint #\¥)) #f
> (ucd-letter-category? 772) #f
procedure
(ucd-cased-letter-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-cased-letter-category? (char->codepoint #\nul)) #f
> (ucd-cased-letter-category? (char->codepoint #\space)) #f
> (ucd-cased-letter-category? (char->codepoint #\a)) #t
> (ucd-cased-letter-category? (char->codepoint #\A)) #t
> (ucd-cased-letter-category? (char->codepoint #\ༀ)) #f
> (ucd-cased-letter-category? (char->codepoint #\1)) #f
> (ucd-cased-letter-category? (char->codepoint #\½)) #f
> (ucd-cased-letter-category? (char->codepoint #\,)) #f
> (ucd-cased-letter-category? (char->codepoint #\])) #f
> (ucd-cased-letter-category? (char->codepoint #\¥)) #f
> (ucd-cased-letter-category? 772) #f
procedure
(ucd-mark-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-mark-category? (char->codepoint #\nul)) #f
> (ucd-mark-category? (char->codepoint #\space)) #f
> (ucd-mark-category? (char->codepoint #\a)) #f
> (ucd-mark-category? (char->codepoint #\A)) #f
> (ucd-mark-category? (char->codepoint #\ༀ)) #f
> (ucd-mark-category? (char->codepoint #\1)) #f
> (ucd-mark-category? (char->codepoint #\½)) #f
> (ucd-mark-category? (char->codepoint #\,)) #f
> (ucd-mark-category? (char->codepoint #\])) #f
> (ucd-mark-category? (char->codepoint #\¥)) #f
> (ucd-mark-category? 772) #t
procedure
(ucd-number-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-number-category? (char->codepoint #\nul)) #f
> (ucd-number-category? (char->codepoint #\space)) #f
> (ucd-number-category? (char->codepoint #\a)) #f
> (ucd-number-category? (char->codepoint #\A)) #f
> (ucd-number-category? (char->codepoint #\ༀ)) #f
> (ucd-number-category? (char->codepoint #\1)) #t
> (ucd-number-category? (char->codepoint #\½)) #t
> (ucd-number-category? (char->codepoint #\,)) #f
> (ucd-number-category? (char->codepoint #\])) #f
> (ucd-number-category? (char->codepoint #\¥)) #f
> (ucd-number-category? 772) #f
procedure
(ucd-punctuation-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-punctuation-category? (char->codepoint #\nul)) #f
> (ucd-punctuation-category? (char->codepoint #\space)) #f
> (ucd-punctuation-category? (char->codepoint #\a)) #f
> (ucd-punctuation-category? (char->codepoint #\A)) #f
> (ucd-punctuation-category? (char->codepoint #\ༀ)) #f
> (ucd-punctuation-category? (char->codepoint #\1)) #f
> (ucd-punctuation-category? (char->codepoint #\½)) #f
> (ucd-punctuation-category? (char->codepoint #\,)) #t
> (ucd-punctuation-category? (char->codepoint #\])) #t
> (ucd-punctuation-category? (char->codepoint #\¥)) #f
> (ucd-punctuation-category? 772) #f
procedure
(ucd-symbol-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-symbol-category? (char->codepoint #\nul)) #f
> (ucd-symbol-category? (char->codepoint #\space)) #f
> (ucd-symbol-category? (char->codepoint #\a)) #f
> (ucd-symbol-category? (char->codepoint #\A)) #f
> (ucd-symbol-category? (char->codepoint #\ༀ)) #f
> (ucd-symbol-category? (char->codepoint #\1)) #f
> (ucd-symbol-category? (char->codepoint #\½)) #f
> (ucd-symbol-category? (char->codepoint #\,)) #f
> (ucd-symbol-category? (char->codepoint #\])) #f
> (ucd-symbol-category? (char->codepoint #\¥)) #t
> (ucd-symbol-category? 772) #f
procedure
(ucd-separator-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-separator-category? (char->codepoint #\nul)) #f
> (ucd-separator-category? (char->codepoint #\space)) #t
> (ucd-separator-category? (char->codepoint #\a)) #f
> (ucd-separator-category? (char->codepoint #\A)) #f
> (ucd-separator-category? (char->codepoint #\ༀ)) #f
> (ucd-separator-category? (char->codepoint #\1)) #f
> (ucd-separator-category? (char->codepoint #\½)) #f
> (ucd-separator-category? (char->codepoint #\,)) #f
> (ucd-separator-category? (char->codepoint #\])) #f
> (ucd-separator-category? (char->codepoint #\¥)) #f
> (ucd-separator-category? 772) #f
procedure
(ucd-other-category? c failure-result) → boolean?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-other-category? (char->codepoint #\nul)) #t
> (ucd-other-category? (char->codepoint #\space)) #f
> (ucd-other-category? (char->codepoint #\a)) #f
> (ucd-other-category? (char->codepoint #\A)) #f
> (ucd-other-category? (char->codepoint #\ༀ)) #f
> (ucd-other-category? (char->codepoint #\1)) #f
> (ucd-other-category? (char->codepoint #\½)) #f
> (ucd-other-category? (char->codepoint #\,)) #f
> (ucd-other-category? (char->codepoint #\])) #f
> (ucd-other-category? (char->codepoint #\¥)) #f
> (ucd-other-category? 772) #f
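Each predicate above appears to test one major class of the General_Category property (the first letter of values such as Lu or Po), with ucd-cased-letter-category? covering only Lu, Ll, and Lt. The sketch below assumes that derivation and substitutes Racket's built-in char-general-category, which returns lowercase symbols such as 'lu, for ucd-general-category so that it runs without the package. Because Racket characters cannot represent surrogates, integer->char rejects U+D800..U+DFFF. major-class and cased-letter? are hypothetical names.

```racket
#lang racket/base

;; Map a code point to its General_Category major class, using the
;; first letter of Racket's lowercase category symbol.
(define (major-class cp)
  (case (string-ref (symbol->string (char-general-category (integer->char cp))) 0)
    [(#\l) 'letter]
    [(#\m) 'mark]
    [(#\n) 'number]
    [(#\p) 'punctuation]
    [(#\s) 'symbol]
    [(#\z) 'separator]
    [else 'other]))   ; #\c: Cc, Cf, Cs, Co, Cn

;; The cased-letter group (LC) covers only Lu, Ll, and Lt.
(define (cased-letter? cp)
  (and (memq (char-general-category (integer->char cp)) '(lu ll lt)) #t))
```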
procedure
(ucd-codepoint-type c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-codepoint-type (char->codepoint #\nul)) 'control
> (ucd-codepoint-type (char->codepoint #\space)) 'graphic
> (ucd-codepoint-type (char->codepoint #\a)) 'graphic
> (ucd-codepoint-type (char->codepoint #\A)) 'graphic
> (ucd-codepoint-type (char->codepoint #\ༀ)) 'graphic
> (ucd-codepoint-type (char->codepoint #\1)) 'graphic
> (ucd-codepoint-type (char->codepoint #\½)) 'graphic
> (ucd-codepoint-type (char->codepoint #\,)) 'graphic
> (ucd-codepoint-type (char->codepoint #\])) 'graphic
> (ucd-codepoint-type (char->codepoint #\¥)) 'graphic
> (ucd-codepoint-type 772) 'graphic
See *codepoint-types* for a mapping from this symbol to a description.
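The Unicode Standard (Table 2-3, "Types of Code Points") derives seven basic code point types from General_Category, separating noncharacters from other unassigned (Cn) code points by position. A sketch of that classification follows; only 'graphic and 'control are confirmed by the examples above, so the remaining symbols and the codepoint-type helper itself are assumptions rather than the package's values.

```racket
#lang racket/base

;; Noncharacters: U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane.
(define (noncharacter? cp)
  (or (<= #xFDD0 cp #xFDEF)
      (= (bitwise-and cp #xFFFE) #xFFFE)))

;; Classify from a two-letter General_Category symbol such as 'Lu or 'Cn.
;; Symbol names other than 'graphic and 'control are hypothetical.
(define (codepoint-type cp gc)
  (case gc
    [(Cc) 'control]
    [(Cf Zl Zp) 'format]
    [(Cs) 'surrogate]
    [(Co) 'private-use]
    [(Cn) (if (noncharacter? cp) 'noncharacter 'reserved)]
    [else 'graphic]))   ; L*, M*, N*, P*, S*, Zs
```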
procedure
(ucd-canonical-combining-class c failure-result) → exact-nonnegative-integer?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (assoc (ucd-canonical-combining-class (char->codepoint #\space)) *combining-classes*)
'(0 . "Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing")
> (assoc (ucd-canonical-combining-class (char->codepoint #\a)) *combining-classes*)
'(0 . "Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing")
> (assoc (ucd-canonical-combining-class 772) *combining-classes*) '(230 . "Distinct marks directly above")
> (assoc (ucd-canonical-combining-class 3954) *combining-classes*) #f
See *combining-classes* for a mapping from this value to a description. Not every class value has an entry; for example, the lookup for code point 3954 above returns #f.
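Combining classes matter mainly to normalization, where the Canonical Ordering Algorithm (Unicode Standard, Section 3.11) stably sorts each run of non-starters (class greater than 0) by class. A sketch that takes the class lookup as a procedure argument, so that ucd-canonical-combining-class could be passed in; the three-entry sample table is a hypothetical stand-in holding real class values:

```racket
#lang racket/base

;; Hypothetical stand-in for ucd-canonical-combining-class (real values):
;; U+0301 and U+0304 are class 230, U+0323 is class 220, everything else 0.
(define sample-ccc (hasheqv #x0301 230 #x0304 230 #x0323 220))
(define (sample-class cp) (hash-ref sample-ccc cp 0))

;; Stably sort each run of non-starters by combining class; starters
;; (class 0) stay in place and end the current run.
(define (canonical-order cps ccc)
  ;; Emit the pending run (held newest-first) onto acc (also newest-first).
  (define (flush run acc)
    (append (reverse (sort (reverse run) < #:key ccc)) acc))
  (let loop ([cps cps] [run '()] [acc '()])
    (cond
      [(null? cps) (reverse (flush run acc))]
      [(zero? (ccc (car cps)))
       (loop (cdr cps) '() (cons (car cps) (flush run acc)))]
      [else (loop (cdr cps) (cons (car cps) run) acc)])))
```

Racket's sort is stable, so marks with equal classes (here U+0304 and U+0301) keep their original relative order, as the algorithm requires.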
procedure
(ucd-bidi-class c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (assoc (ucd-bidi-class (char->codepoint #\nul)) *bidi-classes*) '(BN . "Boundary Neutral")
> (assoc (ucd-bidi-class (char->codepoint #\space)) *bidi-classes*) '(WS . "White Space")
> (assoc (ucd-bidi-class (char->codepoint #\A)) *bidi-classes*) '(L . "Left-to-right")
> (assoc (ucd-bidi-class (char->codepoint #\א)) *bidi-classes*) '(R . "Right-to-left")
> (assoc (ucd-bidi-class (char->codepoint #\ؠ)) *bidi-classes*) '(AL . "Arabic Letter")
> (assoc (ucd-bidi-class (char->codepoint #\1)) *bidi-classes*) '(EN . "European Number")
> (assoc (ucd-bidi-class (char->codepoint #\!)) *bidi-classes*) '(ON . "Other Neutral")
See *bidi-classes* for a mapping from this symbol to a description.
procedure
(ucd-bidi-mirrored? c) → boolean?
c : codepoint?
> (ucd-bidi-mirrored? (char->codepoint #\A)) #f
> (ucd-bidi-mirrored? (char->codepoint #\[)) #t
> (ucd-bidi-mirrored? (char->codepoint #\∈)) #t
> (ucd-bidi-mirrored? (char->codepoint #\༼)) #t
> (ucd-bidi-mirrored? (char->codepoint #\!)) #f
procedure
(ucd-has-mirror-glyph? c) → boolean?
c : codepoint?
> (ucd-has-mirror-glyph? (char->codepoint #\[)) #t
> (ucd-has-mirror-glyph? (char->codepoint #\∈)) #t
> (ucd-has-mirror-glyph? (char->codepoint #\༼)) #t
> (ucd-has-mirror-glyph? (char->codepoint #\!)) #f
procedure
(ucd-mirror-glyph c failure-result) → codepoint?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (codepoint->char (ucd-mirror-glyph (char->codepoint #\[))) #\]
> (codepoint->char (ucd-mirror-glyph (char->codepoint #\∈))) #\∋
> (codepoint->char (ucd-mirror-glyph (char->codepoint #\༼))) #\༽
> (ucd-mirror-glyph (char->codepoint #\!)) #f
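A common use of the mirroring properties is replacing mirrored characters in right-to-left text. The sketch below relies on the failure-result convention, passing the code point itself so that characters without a mirror come back unchanged. sample-mirror-glyph is a hypothetical stand-in for ucd-mirror-glyph, and the sketch ignores bidi embedding levels, which a real renderer would consult before mirroring.

```racket
#lang racket/base

;; Hypothetical stand-in for ucd-mirror-glyph, holding a few real
;; Bidi_Mirroring_Glyph pairs.
(define sample-mirrors
  (hasheqv #x5B #x5D #x5D #x5B            ; [ ]
           #x2208 #x220B #x220B #x2208    ; ∈ ∋
           #x0F3C #x0F3D #x0F3D #x0F3C))  ; ༼ ༽

(define (sample-mirror-glyph cp failure-result)
  (hash-ref sample-mirrors cp failure-result))

;; Replace each character with its mirror glyph; the code point itself is
;; the failure-result, so unmirrored characters pass through unchanged.
(define (mirror-string s [mirror-glyph sample-mirror-glyph])
  (list->string
   (for/list ([ch (in-string s)])
     (let ([cp (char->integer ch)])
       (integer->char (mirror-glyph cp cp))))))
```

With the package loaded, (mirror-string s ucd-mirror-glyph) should behave the same way, assuming ucd-mirror-glyph accepts a non-procedure failure-result as described above.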
procedure
(ucd-bracket? c) → boolean?
c : codepoint?
> (ucd-bracket? (char->codepoint #\[)) #t
> (ucd-bracket? (char->codepoint #\))) #t
> (ucd-bracket? (char->codepoint #\⌈)) #t
> (ucd-bracket? (char->codepoint #\༺)) #t
> (ucd-bracket? (char->codepoint #\⟅)) #t
> (ucd-bracket? (char->codepoint #\«)) #f
> (ucd-bracket? (char->codepoint #\!)) #f
procedure
(ucd-bracket-type c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-bracket-type (char->codepoint #\[)) 'open
> (ucd-bracket-type (char->codepoint #\))) 'close
> (ucd-bracket-type (char->codepoint #\⌈)) 'open
> (ucd-bracket-type (char->codepoint #\༺)) 'open
> (ucd-bracket-type (char->codepoint #\⟅)) 'open
> (ucd-bracket-type (char->codepoint #\«)) 'none
> (ucd-bracket-type (char->codepoint #\!)) 'none
procedure
(ucd-matching-bracket c failure-result) → codepoint?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (codepoint->char (ucd-matching-bracket (char->codepoint #\[))) #\]
> (codepoint->char (ucd-matching-bracket (char->codepoint #\)))) #\(
> (codepoint->char (ucd-matching-bracket (char->codepoint #\⌈))) #\⌉
> (codepoint->char (ucd-matching-bracket (char->codepoint #\༺))) #\༻
> (codepoint->char (ucd-matching-bracket (char->codepoint #\⟅))) #\⟆
> (ucd-matching-bracket (char->codepoint #\«)) #f
> (ucd-matching-bracket (char->codepoint #\!)) #f
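The bracket functions are enough to check that brackets are properly paired. A stack-based sketch, where bracket-type and matching-bracket are hypothetical stand-ins for ucd-bracket-type and ucd-matching-bracket backed by a four-entry table of real bracket pairs:

```racket
#lang racket/base

;; Hypothetical table: code point -> (cons bracket-type matching-code-point).
(define sample-brackets
  (hasheqv #x28 '(open . #x29) #x29 '(close . #x28)    ; ( )
           #x5B '(open . #x5D) #x5D '(close . #x5B)))  ; [ ]

(define (bracket-type cp) (car (hash-ref sample-brackets cp '(none . #f))))
(define (matching-bracket cp) (cdr (hash-ref sample-brackets cp '(none . #f))))

;; Each closing bracket must match the most recent unclosed opening bracket.
(define (balanced? s)
  (let loop ([cps (map char->integer (string->list s))] [stack '()])
    (if (null? cps)
        (null? stack)
        (let ([cp (car cps)])
          (case (bracket-type cp)
            [(open) (loop (cdr cps) (cons cp stack))]
            [(close) (and (pair? stack)
                          (eqv? (car stack) (matching-bracket cp))
                          (loop (cdr cps) (cdr stack)))]
            [else (loop (cdr cps) stack)])))))
```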
procedure
(ucd-decomposition-type c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
procedure
(ucd-decomposition-mapping c failure-result) → (listof codepoint?)
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
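Neither decomposition function has an example above. A sketch of full (recursive) decomposition follows: the table is a hypothetical stand-in holding real UCD mappings, and the type symbols 'canonical and 'fraction are assumptions about how the package names decomposition types. Hangul syllables are decomposed algorithmically rather than from data, so a table-driven sketch like this does not handle them.

```racket
#lang racket/base

;; Hypothetical table: code point -> (cons type mapping).
(define sample-decompositions
  (hasheqv #x212B '(canonical #x00C5)         ; ANGSTROM SIGN -> Å
           #x00C5 '(canonical #x0041 #x030A)  ; Å -> A + COMBINING RING ABOVE
           #x00BD '(fraction #x0031 #x2044))) ; ½ -> 1 + FRACTION SLASH

;; Apply decomposition mappings recursively until none applies. Only
;; canonical mappings are used unless #:compatibility? is true.
(define (full-decomposition cp #:compatibility? [compat? #f])
  (define entry (hash-ref sample-decompositions cp #f))
  (if (and entry (or compat? (eq? (car entry) 'canonical)))
      (apply append
             (for/list ([m (in-list (cdr entry))])
               (full-decomposition m #:compatibility? compat?)))
      (list cp)))
```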
procedure
(ucd-numeric-type c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-numeric-type (char->codepoint #\3)) 'decimal
> (ucd-numeric-type (char->codepoint #\¼)) 'numeric
> (ucd-numeric-type (char->codepoint #\⒍)) 'digit
> (ucd-numeric-type (char->codepoint #\㊾)) 'numeric
> (ucd-numeric-type (char->codepoint #\₂)) 'digit
> (ucd-numeric-type (char->codepoint #\ⅳ)) 'numeric
> (ucd-numeric-type (char->codepoint #\六)) 'numeric
> (ucd-numeric-type (char->codepoint #\༣)) 'decimal
> (ucd-numeric-type (char->codepoint #\𐄎)) 'numeric
See *numeric-types* for a mapping from this symbol to a description.
procedure
(ucd-numeric-value c failure-result) → rational?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-numeric-value (char->codepoint #\3)) 3
> (ucd-numeric-value (char->codepoint #\¼)) 1/4
> (ucd-numeric-value (char->codepoint #\⒍)) 6
> (ucd-numeric-value (char->codepoint #\㊾)) 49
> (ucd-numeric-value (char->codepoint #\₂)) 2
> (ucd-numeric-value (char->codepoint #\ⅳ)) 4
> (ucd-numeric-value (char->codepoint #\六)) 6
> (ucd-numeric-value (char->codepoint #\༣)) 3
> (ucd-numeric-value (char->codepoint #\𐄎)) 8
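Only characters whose numeric type is 'decimal are meant for positional base-10 notation, which is why ⒍ and ₂ above report 'digit instead. A sketch that parses a run of decimal digits from any script, with the type and value lookups passed in as procedures; the stand-ins cover only ASCII and Tibetan digits (U+0F20..U+0F29) and are hypothetical.

```racket
#lang racket/base

;; Hypothetical stand-ins for ucd-numeric-type and ucd-numeric-value.
(define (sample-numeric-type cp)
  (if (or (<= #x30 cp #x39) (<= #x0F20 cp #x0F29)) 'decimal 'none))

(define (sample-numeric-value cp)
  (if (<= #x30 cp #x39) (- cp #x30) (- cp #x0F20)))

;; Accumulate digits left to right; reject anything that is not 'decimal.
(define (decimal-string->number s
                                [numeric-type sample-numeric-type]
                                [numeric-value sample-numeric-value])
  (for/fold ([acc 0]) ([ch (in-string s)])
    (let ([cp (char->integer ch)])
      (unless (eq? (numeric-type cp) 'decimal)
        (raise-arguments-error 'decimal-string->number
                               "not a decimal digit" "char" ch))
      (+ (* acc 10) (numeric-value cp)))))
```

With the package, one option is to pass (lambda (cp) (ucd-numeric-type cp 'none)) and ucd-numeric-value, using failure-result so that non-numeric characters yield 'none instead of raising.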
procedure
(ucd-simple-uppercase-mapping c failure-result) → codepoint?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\a))) #\A
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\α))) #\Α
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\ა))) #\Ა
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\ꭰ))) #\Ꭰ
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\ж))) #\Ж
> (codepoint->char (ucd-simple-uppercase-mapping (char->codepoint #\dž))) #\DŽ
procedure
(ucd-simple-lowercase-mapping c failure-result) → codepoint?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\A))) #\a
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\Α))) #\α
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\Ა))) #\ა
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\Ꭰ))) #\ꭰ
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\Ж))) #\ж
> (codepoint->char (ucd-simple-lowercase-mapping (char->codepoint #\DŽ))) #\dž
procedure
(ucd-simple-titlecase-mapping c failure-result) → codepoint?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (codepoint->char (ucd-simple-titlecase-mapping (char->codepoint #\a))) #\A
> (codepoint->char (ucd-simple-titlecase-mapping (char->codepoint #\α))) #\Α
> (codepoint->char (ucd-simple-titlecase-mapping (char->codepoint #\ა))) #\ა
> (codepoint->char (ucd-simple-titlecase-mapping (char->codepoint #\dž))) #\Dž
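The simple mappings above are one-to-one. Racket's built-in char-upcase, char-downcase, and char-titlecase also use Unicode's simple mappings, so they can serve as a cross-check; string-upcase, by contrast, applies the full mappings and can change the length of a string. simple-case-mappings is a hypothetical helper.

```racket
#lang racket/base

;; Return the simple (upper lower title) mappings of a code point, as
;; code points, using Racket's built-in one-to-one character conversions.
(define (simple-case-mappings cp)
  (let ([ch (integer->char cp)])
    (list (char->integer (char-upcase ch))
          (char->integer (char-downcase ch))
          (char->integer (char-titlecase ch)))))
```

For example, (simple-case-mappings #x01C6) gives the code points of DŽ, dž, and Dž, while (string-upcase "Straße") yields "STRASSE" even though (char-upcase #\ß) leaves ß unchanged.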
procedure
(ucd-age c failure-result) → string?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (define groucho-emoji #\🥸)
> (ucd-age (char->codepoint #\nul)) "1.1"
> (ucd-age (char->codepoint #\space)) "1.1"
> (ucd-age (char->codepoint #\a)) "1.1"
> (ucd-age (char->codepoint #\A)) "1.1"
> (ucd-age (char->codepoint #\ༀ)) "2.0"
> (ucd-age (char->codepoint #\1)) "1.1"
> (ucd-age (char->codepoint #\½)) "1.1"
> (ucd-age (char->codepoint #\,)) "1.1"
> (ucd-age (char->codepoint #\])) "1.1"
> (ucd-age (char->codepoint #\¥)) "1.1"
> (ucd-age (char->codepoint #\€)) "2.1"
> (ucd-age (char->codepoint groucho-emoji)) "13.0"
> (ucd-age 772) "1.1"
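Age values are strings such as "1.1" and "13.0", so comparing them with string<? gives the wrong answer ("13.0" sorts before "2.1"). A minimal sketch of a numeric comparison, assuming ages always have the major.minor form shown above; age->numbers and age<=? are hypothetical helpers.

```racket
#lang racket/base

;; Split an age string such as "13.0" into numbers: '(13 0).
(define (age->numbers s)
  (map string->number (regexp-split #rx"[.]" s)))

;; #t when age a is the same as or earlier than age b, compared component-wise.
(define (age<=? a b)
  (let loop ([xs (age->numbers a)] [ys (age->numbers b)])
    (cond [(null? xs) #t]
          [(null? ys) #f]
          [(< (car xs) (car ys)) #t]
          [(> (car xs) (car ys)) #f]
          [else (loop (cdr xs) (cdr ys))])))
```

A check such as (age<=? (ucd-age cp) "6.0") would then ask whether a character was already assigned in Unicode 6.0.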
procedure
(ucd-block-name c failure-result) → string?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-block-name (char->codepoint #\nul)) "Basic Latin"
> (ucd-block-name (char->codepoint #\space)) "Basic Latin"
> (ucd-block-name (char->codepoint #\a)) "Basic Latin"
> (ucd-block-name (char->codepoint #\A)) "Basic Latin"
> (ucd-block-name (char->codepoint #\ༀ)) "Tibetan"
> (ucd-block-name (char->codepoint #\1)) "Basic Latin"
> (ucd-block-name (char->codepoint #\½)) "Latin-1 Supplement"
> (ucd-block-name (char->codepoint #\,)) "Basic Latin"
> (ucd-block-name (char->codepoint #\])) "Basic Latin"
> (ucd-block-name (char->codepoint #\¥)) "Latin-1 Supplement"
> (ucd-block-name 772) "Combining Diacritical Marks"
procedure
(ucd-scripts c failure-result) → (listof symbol?)
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (ucd-scripts (char->codepoint #\nul)) '(Common)
> (ucd-scripts (char->codepoint #\space)) '(Common)
> (ucd-scripts (char->codepoint #\a)) '(Latin)
> (ucd-scripts (char->codepoint #\A)) '(Latin)
> (ucd-scripts (char->codepoint #\ༀ)) '(Tibetan)
> (ucd-scripts (char->codepoint #\1)) '(Common)
> (ucd-scripts (char->codepoint #\½)) '(Common)
> (ucd-scripts (char->codepoint #\,)) '(Common)
> (ucd-scripts (char->codepoint #\])) '(Common)
> (ucd-scripts (char->codepoint #\¥)) '(Common)
> (ucd-scripts 772) '(Inherited)
procedure
(ucd-script-extensions c failure-result) → (listof symbol?)
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (define (display-scripts cpc)
    (display (format "~a => Script: ~a" cpc (ucd-scripts (char->codepoint cpc))))
    (let ([extensions (ucd-script-extensions (char->codepoint cpc) #f)])
      (if (false? extensions)
          (newline)
          (displayln (format ", extensions: ~a" extensions)))))
> (display-scripts #\𐋡) 𐋡 => Script: (Common), extensions: (Arab Copt)
> (display-scripts #\჻) ჻ => Script: (Common), extensions: (Geor Latn)
> (display-scripts #\꜀) ꜀ => Script: (Common), extensions: (Hani Latn)
> (display-scripts #\a) a => Script: (Latin)
procedure
(ucd-line-break c failure-result) → symbol?
c : codepoint? failure-result : (lambda () (raise-arguments-error ...))
> (assoc (ucd-line-break (char->codepoint #\space)) *line-breaks*) '(SP . "Space")
> (assoc (ucd-line-break (char->codepoint #\-)) *line-breaks*) '(HY . "Hyphen")
> (assoc (ucd-line-break (char->codepoint #\,)) *line-breaks*) '(IS . "Infix Numeric Separator")
> (assoc (ucd-line-break (char->codepoint #\a)) *line-breaks*) '(AL . "Alphabetic")
> (assoc (ucd-line-break (char->codepoint #\Z)) *line-breaks*) '(AL . "Alphabetic")
See *line-breaks* for a mapping from this symbol to a description.
2.2 Module codepoint/enums
(require codepoint/enums)    package: codepoint
value
*bidi-classes* : (listof (cons/c symbol? string?))
value
value
value
value
value
*line-breaks* : (listof (cons/c symbol? string?))
value
value