Reference LGR for script: Georgian (Geor) | lgr-second-level-georgian-script-24aug20-en |
---|
This document is mechanically formatted from the above XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
Date | 2020-08-24 |
---|---|
LGR Version | 3 |
Language | und-Geor |
Unicode Version | 6.3.0 |
This document specifies a reference set of Label Generation Rules for Georgian Mkhedruli (modern) for the second level. The starting point for the development of this LGR can be found in the related Root Zone LGR [RZ-LGR-3-Geor]. For details and additional background on the script, see "Proposal for a Georgian Script Root Zone LGR [Proposal-Georgian]". The format of this file follows [RFC 7940].
This is a DRAFT document released for public comments and not final. Please see the announcement on the ICANN website for public comments on the Second Level Reference LGRs for details on how to submit comments.
The repertoire contains 33 code points for letters used by languages that are actively written in the Mkhedruli alphabet. The repertoire is a subset of [Unicode 6.3]. For details, see Section 5, "Repertoire" in [Proposal-Georgian]. (The proposal cited has been adopted for the Georgian script portion of the Root Zone LGR.)
For the second level, the repertoire has been augmented with the ASCII digits, U+0030 (0) to U+0039 (9), and U+002D (-) HYPHEN-MINUS for a total of 44 repertoire elements.
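The count of 44 can be checked with a minimal sketch that builds the repertoire from the ranges stated above. The names `REPERTOIRE` and `in_repertoire` are illustrative assumptions and do not appear in the LGR itself.

```python
# Illustrative sketch only: names below are hypothetical, not defined by the LGR.
GEORGIAN_LETTERS = {chr(cp) for cp in range(0x10D0, 0x10F0 + 1)}  # 33 Mkhedruli letters
ASCII_DIGITS = {chr(cp) for cp in range(0x0030, 0x0039 + 1)}      # U+0030..U+0039
HYPHEN_MINUS = {"\u002D"}                                         # U+002D

REPERTOIRE = GEORGIAN_LETTERS | ASCII_DIGITS | HYPHEN_MINUS


def in_repertoire(label: str) -> bool:
    """Return True if every code point of the label is in the repertoire."""
    return all(ch in REPERTOIRE for ch in label)
```

A repertoire check of this kind is only the first step; the context rule and actions described later in this document still apply to a label that passes it.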
Each code point or range is tagged with the script or scripts that the code point is used with, and with one or more references documenting sufficient justification for inclusion in the repertoire; see "References" below.
According to Section 6, "Variants" in [Proposal-Georgian], this LGR defines no variants.
This LGR does not define character classes.
Actions include the default actions for LGRs as well as the action needed to invalidate labels with misplaced combining marks. They are marked with ⍟. For a description, see [RFC 7940].
According to Section 7, "Whole Label Evaluation (WLE) Rules" in [Proposal-Georgian], this LGR does not define rules specific to Georgian.
This reference LGR for Georgian Mkhedruli for the second level has been developed by Michel Suignard and Asmus Freytag, based on the Root Zone LGR for Georgian Mkhedruli and information contained or referenced therein; see [RZ-LGR-3-Geor]. Suitable extensions for the second level have been applied according to the [Guidelines]. The original proposal for a Root Zone LGR for Georgian Mkhedruli, on which this LGR is based, was developed by the Georgian Generation Panel. For more information on methodology and contributors to the underlying Root Zone LGR, see Sections 4 and 8 in [Proposal-Georgian], as well as [RZ-LGR-Overview].
The following general references are cited in this document:
For references consulted particularly in designing the repertoire for the Georgian script for the second level, please see the details in the Table of References below. Reference [0] refers to the Unicode Standard version in which the corresponding code points were initially encoded. Reference [100] corresponds to a source given in [Proposal-Georgian] for justifying the inclusion of the corresponding code points. Entries in the table may have multiple source reference values.
Number of elements in Repertoire | 44 |
---|---|
Number of code points for each script | Georgian: 33; Common: 11 |
Longest code point sequence | 1 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name column are extracted from the Unicode character database. Where a comment in the original LGR is equal to the character name, it has been suppressed.
See also the legend provided below the table.
Code Point | Glyph | Script | Name | Ref | Tags | Required Context | Comment |
---|---|---|---|---|---|---|---|
U+002D | - | Common | HYPHEN-MINUS | [0] | Hyphen | not: hyphen-minus-disallowed | ⍟ |
U+0030 | 0 | Common | DIGIT ZERO | [0] | Common-digit | | ⍟ |
U+0031 | 1 | Common | DIGIT ONE | [0] | Common-digit | | ⍟ |
U+0032 | 2 | Common | DIGIT TWO | [0] | Common-digit | | ⍟ |
U+0033 | 3 | Common | DIGIT THREE | [0] | Common-digit | | ⍟ |
U+0034 | 4 | Common | DIGIT FOUR | [0] | Common-digit | | ⍟ |
U+0035 | 5 | Common | DIGIT FIVE | [0] | Common-digit | | ⍟ |
U+0036 | 6 | Common | DIGIT SIX | [0] | Common-digit | | ⍟ |
U+0037 | 7 | Common | DIGIT SEVEN | [0] | Common-digit | | ⍟ |
U+0038 | 8 | Common | DIGIT EIGHT | [0] | Common-digit | | ⍟ |
U+0039 | 9 | Common | DIGIT NINE | [0] | Common-digit | | ⍟ |
U+10D0 | ა | Georgian | GEORGIAN LETTER AN | [0], [100] | Georgian | ||
U+10D1 | ბ | Georgian | GEORGIAN LETTER BAN | [0], [100] | Georgian | ||
U+10D2 | გ | Georgian | GEORGIAN LETTER GAN | [0], [100] | Georgian | ||
U+10D3 | დ | Georgian | GEORGIAN LETTER DON | [0], [100] | Georgian | ||
U+10D4 | ე | Georgian | GEORGIAN LETTER EN | [0], [100] | Georgian | ||
U+10D5 | ვ | Georgian | GEORGIAN LETTER VIN | [0], [100] | Georgian | ||
U+10D6 | ზ | Georgian | GEORGIAN LETTER ZEN | [0], [100] | Georgian | ||
U+10D7 | თ | Georgian | GEORGIAN LETTER TAN | [0], [100] | Georgian | ||
U+10D8 | ი | Georgian | GEORGIAN LETTER IN | [0], [100] | Georgian | ||
U+10D9 | კ | Georgian | GEORGIAN LETTER KAN | [0], [100] | Georgian | ||
U+10DA | ლ | Georgian | GEORGIAN LETTER LAS | [0], [100] | Georgian | ||
U+10DB | მ | Georgian | GEORGIAN LETTER MAN | [0], [100] | Georgian | ||
U+10DC | ნ | Georgian | GEORGIAN LETTER NAR | [0], [100] | Georgian | ||
U+10DD | ო | Georgian | GEORGIAN LETTER ON | [0], [100] | Georgian | ||
U+10DE | პ | Georgian | GEORGIAN LETTER PAR | [0], [100] | Georgian | ||
U+10DF | ჟ | Georgian | GEORGIAN LETTER ZHAR | [0], [100] | Georgian | ||
U+10E0 | რ | Georgian | GEORGIAN LETTER RAE | [0], [100] | Georgian | ||
U+10E1 | ს | Georgian | GEORGIAN LETTER SAN | [0], [100] | Georgian | ||
U+10E2 | ტ | Georgian | GEORGIAN LETTER TAR | [0], [100] | Georgian | ||
U+10E3 | უ | Georgian | GEORGIAN LETTER UN | [0], [100] | Georgian | ||
U+10E4 | ფ | Georgian | GEORGIAN LETTER PHAR | [0], [100] | Georgian | ||
U+10E5 | ქ | Georgian | GEORGIAN LETTER KHAR | [0], [100] | Georgian | ||
U+10E6 | ღ | Georgian | GEORGIAN LETTER GHAN | [0], [100] | Georgian | ||
U+10E7 | ყ | Georgian | GEORGIAN LETTER QAR | [0], [100] | Georgian | ||
U+10E8 | შ | Georgian | GEORGIAN LETTER SHIN | [0], [100] | Georgian | ||
U+10E9 | ჩ | Georgian | GEORGIAN LETTER CHIN | [0], [100] | Georgian | ||
U+10EA | ც | Georgian | GEORGIAN LETTER CAN | [0], [100] | Georgian | ||
U+10EB | ძ | Georgian | GEORGIAN LETTER JIL | [0], [100] | Georgian | ||
U+10EC | წ | Georgian | GEORGIAN LETTER CIL | [0], [100] | Georgian | ||
U+10ED | ჭ | Georgian | GEORGIAN LETTER CHAR | [0], [100] | Georgian | ||
U+10EE | ხ | Georgian | GEORGIAN LETTER XAN | [0], [100] | Georgian | ||
U+10EF | ჯ | Georgian | GEORGIAN LETTER JHAN | [0], [100] | Georgian | ||
U+10F0 | ჰ | Georgian | GEORGIAN LETTER HAE | [0], [100] | Georgian |
This LGR does not specify any variants.
The following table lists all named and implicit classes with their definition and a list of their members intersected with the current repertoire (for larger classes, this list is elided).
Name | Definition | Count | Members or Ranges | Ref | Comment |
---|---|---|---|---|---|
hyphen | Tag=Hyphen | 1 | {002D} | | The Hyphen-minus character ⍟ |
implicit | Tag=Common-digit | 10 | {0030-0039} | | Any character tagged as Common-digit |
implicit | Tag=sc:Geor | 33 | {10D0-10F0} | | Any character tagged as Georgian |
implicit | Tag=sc:Zyyy | 11 | {002D 0030-0039} | | Any character tagged as Common |
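As a rough illustration of how tag-based classes can be derived from the repertoire, the sketch below groups code points by tag and reproduces the counts in the table. The tag strings are taken from the Definition column; the `TAGS` mapping and the `tag_classes` helper are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Hypothetical in-memory form of the repertoire: code point -> set of tags.
TAGS: dict[int, set[str]] = {0x002D: {"Hyphen", "sc:Zyyy"}}
for cp in range(0x0030, 0x0039 + 1):
    TAGS[cp] = {"Common-digit", "sc:Zyyy"}
for cp in range(0x10D0, 0x10F0 + 1):
    TAGS[cp] = {"sc:Geor"}


def tag_classes(tags: dict[int, set[str]]) -> dict[str, set[int]]:
    """Group code points by tag, one class per distinct tag value."""
    classes: dict[str, set[int]] = defaultdict(set)
    for cp, cp_tags in tags.items():
        for tag in cp_tags:
            classes[tag].add(cp)
    return dict(classes)


CLASSES = tag_classes(TAGS)
```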
The following table lists all named rules defined in the LGR and indicates whether they are used as trigger in an action or as context (when or not-when) for a code point or variant.
Name | Regular Expression | Used as Trigger | Anchor | Used as Context | Ref | Comment |
---|---|---|---|---|---|---|
leading-combining-mark | (start)[∅=[[∅=\p{gc=Mn}] ∪ [∅=\p{gc=Mc}]]] | ✔ | | | [150] | RFC 5891 restrictions on placement of combining marks ⍟ |
hyphen-minus-disallowed | (((start))← ⚓)\|(⚓ →((end)))\|(((start)..[:hyphen:])← ⚓) | | ✔ | C | [150] | RFC 5891 restrictions on placement of U+002D (-) ⍟ |
ascii-only-label | (start)[\u002D\u0030-\u0039]+(end) | ✔ | | | [150] | RFC 5891 restriction requiring at least one non-ASCII code point ⍟ |
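The three rules can be read as simple predicates. The following is a minimal sketch of one possible reading, not the normative definition: the function names are assumptions, and the anchored rule is modelled as a check on the index of a hyphen (first position, last position, or fourth position when the third code point is also a hyphen).

```python
import re
import unicodedata

HYPHEN = "\u002D"


def leading_combining_mark(label: str) -> bool:
    """True if the label starts with a combining mark (gc=Mn or gc=Mc)."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


def hyphen_minus_disallowed(label: str, i: int) -> bool:
    """True if a hyphen at index i (0-based) sits in a forbidden position."""
    at_start = i == 0
    at_end = i == len(label) - 1
    # Fourth code point, directly preceded by a hyphen in third position.
    after_third_hyphen = i == 3 and label[2] == HYPHEN
    return at_start or at_end or after_third_hyphen


def has_misplaced_hyphen(label: str) -> bool:
    """True if any hyphen in the label fails its required context."""
    return any(
        ch == HYPHEN and hyphen_minus_disallowed(label, i)
        for i, ch in enumerate(label)
    )


def ascii_only_label(label: str) -> bool:
    """True if the label consists solely of U+002D and U+0030..U+0039."""
    return re.fullmatch(r"[-0-9]+", label) is not None
```

Because U+002D carries the context `not: hyphen-minus-disallowed`, a hyphen is acceptable only where `hyphen_minus_disallowed` returns False.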
The following table lists the actions that are used to assign dispositions to labels and variant labels based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
---|---|---|---|---|---|---|
1 | if label matches | leading-combining-mark | → | invalid | [150] | labels with leading combining marks are invalid ⍟ |
2 | if label matches | ascii-only-label | → | invalid | [150] | ascii-only labels invalid (not IDNs) ⍟ |
3 | if at least one variant is in | {out-of-repertoire-var} | → | invalid | | any variant label with a code point out of repertoire is invalid ⍟ |
4 | if at least one variant is in | {blocked} | → | blocked | | any variant label containing blocked variants is blocked ⍟ |
5 | if each variant is in | {allocatable} | → | allocatable | | variant labels with all variants allocatable are allocatable ⍟ |
6 | if any label (catch-all) | | → | valid | | catch all (default action) ⍟ |
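The first-match precedence of the actions can be sketched as an ordered list of conditions. This is an illustration under stated assumptions rather than a conforming RFC 7940 processor: the label is assumed to have already passed the repertoire and context checks, the variant types of a label are passed in as a plain list (always empty for this LGR, which defines no variants), and an empty list is assumed not to trigger action 5. All function names are hypothetical.

```python
import re
import unicodedata


def leading_combining_mark(label: str) -> bool:
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


def ascii_only_label(label: str) -> bool:
    return re.fullmatch(r"[-0-9]+", label) is not None


# Ordered as in the table: the first condition that holds decides the disposition.
ACTIONS = [
    (lambda label, vt: leading_combining_mark(label), "invalid"),
    (lambda label, vt: ascii_only_label(label), "invalid"),
    (lambda label, vt: "out-of-repertoire-var" in vt, "invalid"),
    (lambda label, vt: "blocked" in vt, "blocked"),
    # Assumption: a label without variants does not trigger this action.
    (lambda label, vt: bool(vt) and all(t == "allocatable" for t in vt), "allocatable"),
    (lambda label, vt: True, "valid"),  # catch-all
]


def disposition(label: str, variant_types: tuple[str, ...] = ()) -> str:
    """Return the disposition assigned by the first matching action."""
    for condition, result in ACTIONS:
        if condition(label, variant_types):
            return result
    raise AssertionError("unreachable: the catch-all action always matches")
```

With no variants defined, only actions 1, 2, and 6 can decide the outcome for labels evaluated under this LGR.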
The following lists the references cited for specific code points, variants, classes, rules, or actions in this LGR. For general references, refer to the "References" section in the Description.
[0] | The Unicode Standard 1.1. Any code point originally encoded in Unicode 1.1 |
[100] | Omniglot Georgian Mkhedruli, https://www.omniglot.com/writing/georgian.htm#mkhedruli |
[150] | RFC 5891, Internationalized Domain Names in Applications (IDNA): Protocol, http://tools.ietf.org/html/rfc5891 |