Public Comment
IDN TLDs - LGR Procedure Implementation - Maximal Starting Repertoire Version 2 is Now Open for Public Comment
Open Date
15 December 2014 23:59 UTC
Close Date
16 March 2015 23:59 UTC
Staff Report Due
6 April 2015 23:59 UTC
Brief Overview
ICANN is releasing for public comment version 2 of the Maximal Starting Repertoire (MSR-2). This upwardly compatible version of the MSR adds six scripts to the repertoire. The MSR is the first deliverable under the "Procedure to Develop and Maintain Label Generation Rules (LGR) for the Root Zone With Respect to IDN Labels" [PDF, 772 KB] (the Procedure) and the starting point for the community-based Generation Panels that develop LGR proposals. The contents of MSR-2 and the detailed rationale behind its development are described in "Maximal Starting Repertoire - MSR-2-Overview and Rationale" [PDF, 1.12 MB]. Community members are invited to provide feedback on the contents of MSR-2.
Section I: Description and Explanation
As a deliverable under the "Procedure to Develop and Maintain Label Generation Rules for the Root Zone With Respect to IDN Labels" [PDF, 772 KB] (the Procedure), ICANN is providing for public comment version 2 of the Maximal Starting Repertoire (MSR-2). The contents of MSR-2 and the detailed rationale behind its development are described in "Maximal Starting Repertoire - MSR-2-Overview and Rationale" [PDF, 1.12 MB].
Summary of Contents
MSR-2 is a subset of the IDNA 2008 PVALID code points for Unicode 6.3 (the latest version of the Unicode Standard for which IANA provides IDNA 2008 tables). As stated in the Procedure, the code points included must not be restricted from use in identifiers (as defined in Table 1 of UTS #39) and must not be used for writing an excluded script. The MSR was further refined by applying the requirements of the Procedure [PDF, 772 KB] to eliminate code points not eligible for the root zone.
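The selection criteria above amount to a chain of filters. The sketch below illustrates that chain under stated assumptions; it is not the Integration Panel's tooling. The input tables (`idna_property`, `restricted_in_identifiers`, `script_of`, `excluded_scripts`, `ineligible_for_root`) are hypothetical stand-ins for the IANA IDNA 2008 tables, the UTS #39 identifier restrictions, and the Procedure's eligibility rules, and the demo data are illustrative only.

```python
from typing import Dict, Iterable, Set


def derive_msr(
    code_points: Iterable[int],
    idna_property: Dict[int, str],        # hypothetical: cp -> IDNA 2008 property
    restricted_in_identifiers: Set[int],  # hypothetical: UTS #39 Table 1 restrictions
    script_of: Dict[int, str],            # hypothetical: cp -> script name
    excluded_scripts: Set[str],           # hypothetical: scripts excluded from the MSR
    ineligible_for_root: Set[int],        # hypothetical: further removals per the Procedure
) -> Set[int]:
    """Return the code points that pass every filter (a sketch only)."""
    msr: Set[int] = set()
    for cp in code_points:
        # Keep only code points whose IDNA 2008 property is PVALID.
        if idna_property.get(cp) != "PVALID":
            continue
        # Drop code points restricted from use in identifiers.
        if cp in restricted_in_identifiers:
            continue
        # Drop code points used for writing an excluded script.
        if script_of.get(cp) in excluded_scripts:
            continue
        # Drop anything else ruled ineligible for the root zone.
        if cp in ineligible_for_root:
            continue
        msr.add(cp)
    return msr


if __name__ == "__main__":
    # Illustrative toy data only; these are not real MSR or IANA values.
    msr = derive_msr(
        code_points=[0x0061, 0x0041, 0x16A0, 0x0062],
        idna_property={0x0061: "PVALID", 0x0041: "DISALLOWED",
                       0x16A0: "PVALID", 0x0062: "PVALID"},
        restricted_in_identifiers=set(),
        script_of={0x0061: "Latin", 0x0041: "Latin",
                   0x16A0: "Runic", 0x0062: "Latin"},
        excluded_scripts={"Runic"},
        ineligible_for_root=set(),
    )
    print(sorted(f"U+{cp:04X}" for cp in msr))  # ['U+0061', 'U+0062']
```

Ordering the filters this way keeps each criterion independent, so a reviewer can see which rule removed a given code point.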
MSR-2 covers the following 28 scripts, of which six (marked with *) had previously been deferred: Arabic, Armenian*, Bengali, Cyrillic, Devanagari, Ethiopic*, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Khmer*, Lao, Latin, Malayalam, Myanmar*, Oriya, Sinhala, Tamil, Telugu, Thaana*, Thai and Tibetan*. MSR-2 contains 33,492 code points short-listed from the 97,973 PVALID/CONTEXT code points of Unicode version 6.3.
The mere presence of a code point in the MSR does not indicate that the Integration Panel considers it acceptable for inclusion in the LGR. Where the Integration Panel was not able to resolve the status of a code point, it has tended to retain it in the MSR, so that Generation Panels can review it more thoroughly and, where appropriate, justify including it in the LGR.
In contrast, the absence of a code point indicates either that the Integration Panel has determined that the code point is not appropriate for the DNS root or, in certain situations, that the panel has decided to defer it to a future version of the MSR.
Instructions for Reviewers
Reviewers are encouraged to carefully review the MSR-2 documents listed below. Communities that disagree with the choices the Integration Panel has made in MSR-2 are advised to raise any issues during the public comment period and to provide a rationale for adding or removing specific code points. MSR-2 is an upwardly compatible replacement of MSR-1, not merely an addition, so reviewers are encouraged to comment on its entire content. Communities for scripts that are about to begin work on an LGR for the first time are especially encouraged to review the MSR. Note that MSR-2 adds no new content for the scripts already included in MSR-1. After the public comment period, MSR-2 will be frozen for the purpose of developing the version of the Root Zone LGR based on it (LGR-1).
Future Development
Work developed for integration into the first version of the LGR (LGR-1) will be based on MSR-2. If it becomes necessary to stage the release of the LGR, for example because not all Generation Panels are able to submit proposals at the same time, subsequent versions of the LGR may be released.
MSR-2 defers some code points that are already encoded in Unicode 7.0, because authoritative IDNA 2008 tables are not yet available for Unicode 7.0. Unicode 8.0, due in 2015, is expected to add further code points that are potentially eligible for the root zone. In addition, the Integration Panel monitors any scripts not included in the MSR for indications that a change in status is warranted. At annual intervals, a new version of the MSR will be developed, provided that additional repertoire exists whose inclusion in the MSR is warranted. Until such a later version of the MSR is developed, MSR-2 will be the foundation for any LGR versions developed after its release.
All future versions of the MSR and all versions of the LGR must retain full backwards compatibility: evaluating any label registered under an earlier LGR against an updated LGR, or against an LGR resulting from a later version of the MSR, must produce the same result. Repertoire that has not been used for label registration is not required to be retained in future versions.
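One way to picture this requirement is as a check run over the set of registered labels. The sketch below assumes an LGR can be modeled as a function from a label to its evaluation result; `make_repertoire_lgr` and `compatibility_violations` are hypothetical helpers, and the toy evaluator only checks repertoire membership, whereas a real evaluation would also cover variants and their dispositions.

```python
from typing import Callable, Hashable, Iterable, List, Set

# Hypothetical model: an LGR maps a label to its evaluation result.
Evaluator = Callable[[str], Hashable]


def make_repertoire_lgr(repertoire: Set[str]) -> Evaluator:
    """Toy LGR: a label is 'valid' iff every character is in the repertoire."""
    def evaluate(label: str) -> Hashable:
        ok = bool(label) and all(ch in repertoire for ch in label)
        return "valid" if ok else "invalid"
    return evaluate


def compatibility_violations(
    registered: Iterable[str], old_lgr: Evaluator, new_lgr: Evaluator
) -> List[str]:
    """Return registered labels whose evaluation result would change."""
    return [label for label in registered if old_lgr(label) != new_lgr(label)]


if __name__ == "__main__":
    old = make_repertoire_lgr({"a", "b", "c"})
    registered = ["ab"]
    # Dropping 'c', which no registered label uses, is permitted.
    print(compatibility_violations(registered, old, make_repertoire_lgr({"a", "b"})))  # []
    # Dropping 'b' changes the result for "ab", so compatibility breaks.
    print(compatibility_violations(registered, old, make_repertoire_lgr({"a", "c"})))  # ['ab']
```

The check deliberately looks only at registered labels, which mirrors the rule that unused repertoire need not be retained.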
It is important to note that, while registrations predating the initial release of an LGR for the respective script are expected to remain in place even if they conflict with it, there is no requirement for an initial LGR to be compatible with them or to treat them as precedents.
Section II: Background
To support IDN variants in the root zone, the ICANN community, at the direction of the Board, undertook several projects to study and make recommendations on their viability, sustainability and delegation. One of these projects is the implementation of the Procedure [PDF, 772 KB] allowing for the development of Label Generation Rules (LGR) for the Root Zone. The LGR for the Root Zone is a mechanism for creating and maintaining rules with respect to IDN labels for the root. This mechanism will be used to determine which Unicode code points are permitted for use in U-Labels in the root zone, what variants (if any) are allocatable and what variants (if any) are automatically blocked.
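To make the three outcomes concrete (permitted code points, allocatable variants, blocked variants), the sketch below evaluates a single label. It rests on a simplifying assumption that is not taken from the Procedure: each code point lists its variants with a disposition, and a variant label is blocked if any substitution in it is blocked, otherwise allocatable. The function `evaluate_label`, the `VariantTable` layout, and the toy characters are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical variant table: code point -> list of (variant, disposition).
VariantTable = Dict[str, List[Tuple[str, str]]]


def evaluate_label(
    label: str, repertoire: Set[str], variants: VariantTable
) -> Optional[Dict[str, str]]:
    """Return None if the label is not permitted; otherwise map each
    variant label to 'allocatable' or 'blocked' (simplified rule)."""
    if not label or any(ch not in repertoire for ch in label):
        return None
    # For each position: keep the original code point or swap in a variant.
    choices = [[(ch, None)] + variants.get(ch, []) for ch in label]
    result: Dict[str, str] = {}
    for combo in product(*choices):
        variant_label = "".join(cp for cp, _ in combo)
        if variant_label == label:
            continue  # the original label is not its own variant
        dispositions = {disp for _, disp in combo if disp is not None}
        result[variant_label] = "blocked" if "blocked" in dispositions else "allocatable"
    return result


if __name__ == "__main__":
    table = {"a": [("x", "allocatable")], "b": [("y", "blocked")]}
    print(evaluate_label("ab", {"a", "b", "x", "y"}, table))
    # {'ay': 'blocked', 'xb': 'allocatable', 'xy': 'blocked'}
```

Letting "blocked" dominate is a conservative choice for this sketch; the actual rules for the root zone are defined by the LGR itself.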
The MSR is the first deliverable from the Integration Panel under the Procedure [PDF, 772 KB] and will serve as a starting collection of code points from which Generation Panels may select when constructing the repertoires for their respective LGR proposals. In accordance with the Procedure [PDF, 772 KB], "Generation panels must not include in their proposed repertoires any assigned code point that is not included in the maximal set of code points for the root zone defined by the integration panel."
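The quoted rule reduces to a subset test: a proposed repertoire complies only if it contains nothing outside the MSR. A minimal sketch, assuming every proposed code point is assigned (the rule concerns assigned code points) and using a hypothetical helper name and toy sets:

```python
from typing import Iterable, List


def codepoints_outside_msr(proposed: Iterable[int], msr: Iterable[int]) -> List[str]:
    """List proposed code points (as U+XXXX) that the MSR does not contain."""
    return [f"U+{cp:04X}" for cp in sorted(set(proposed) - set(msr))]


if __name__ == "__main__":
    msr = {0x0061, 0x0062, 0x0063}       # toy stand-in for MSR-2
    proposed = {0x0061, 0x0062, 0x16A0}  # toy Generation Panel repertoire
    print(codepoints_outside_msr(proposed, msr))  # ['U+16A0']
```

An empty result means the proposal satisfies this particular constraint; it says nothing about the other requirements a proposal must meet.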
Section III: Relevant Resources
Maximal Starting Repertoire - MSR-2-Overview and Rationale.pdf [PDF, 1.12 MB]
MSR-2-Annotated-Han-Tables-20141114.pdf [PDF, 43.3 MB]
MSR-2-Annotated-Hangul-Tables-20141114.pdf [PDF, 4.18 MB]
MSR-2-Annotated-non-CJK-Tables-20141114.pdf [PDF, 2.41 MB]
MSR-2-Repertoire+WLE-Rules-20141204.xml [XML, 746 KB]
Comments Closed
Report of Public Comments