1. Introduction
This document sets out the policy for IDN (Internationalized Domain Name) registrations under .ලංකා and .இலங்கை at the LK Domain Registry. The policy and procedure are designed to ensure that IDNs are assigned to registrants reliably and reasonably.
2. Definitions
Character: A character can be a vowel, a consonant, or a composite (a consonant with a vowel modifier) in Sinhala or Tamil script; a digit; a Latin letter; or a hyphen (-).
Domain name: A domain name is a unique address that can be used to identify a resource on the Internet. It may consist of one or more domain labels.
Domain label: A domain label is a string delimited by periods (“.”).
Ex: In the domain name nic.lk, “nic” is a domain label.
3.2. IDN Policy
3.2.1. Registrants can request IDN domain names under .ලංකා and .இலங்கை from the LK Domain Registry.
3.2.2. The requested domain name should consist only of characters listed in the relevant IDN language table. The language tables for .ලංකා and .இலங்கை are given in Appendix A.
E.g.: If you are registering a .ලංකා domain name, use only characters included in the .ලංකා language table.
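Rule 3.2.2 amounts to a per-character membership test. The sketch below transcribes the .ලංකා table from Appendix A into Python code point sets. Two choices in it are assumptions rather than policy: the label is lower-cased before checking (the table lists only U+0061..U+007A), and U+200D (zwj) is accepted because rule 3.2.6 registers names containing it even though the table does not list it. The function name is hypothetical.

```python
# Code points transcribed from the Appendix A table for .ලංකා (a sketch).
SINHALA_TABLE = (
    {0x0D82, 0x0D83}                    # anusvaraya, visargaya
    | set(range(0x0D85, 0x0D8F))        # ayanna .. iruuyanna
    | set(range(0x0D91, 0x0D97))        # eyanna .. auyanna
    | set(range(0x0D9A, 0x0DA6))        # alpapraana kayanna .. taaluja sanyooga naaksikyaya
    | set(range(0x0DA7, 0x0DAC))        # alpapraana ttayanna .. muurdhaja nayanna
    | set(range(0x0DAD, 0x0DB2))        # alpapraana tayanna .. dantaja nayanna
    | set(range(0x0DB3, 0x0DBC))        # sanyaka dayanna .. rayanna
    | {0x0DBD}                          # dantaja layanna
    | set(range(0x0DC1, 0x0DC7))        # taaluja sayanna .. fayanna
    | {0x0DCA}                          # al-lakuna
    | {0x0DCF, 0x0DD0, 0x0DD1, 0x0DD2, 0x0DD3, 0x0DD4, 0x0DD6, 0x0DD8}
    | set(range(0x0DD9, 0x0DDF))        # kombuva .. kombuva haa gayanukitta
    | {0x0DF2}                          # diga gaetta-pilla
)
LATIN_TABLE = {ord(c) for c in "-0123456789abcdefghijklmnopqrstuvwxyz"}
ZWJ = 0x200D  # assumption: allowed because rule 3.2.6 bundles zwj names

PERMITTED = SINHALA_TABLE | LATIN_TABLE | {ZWJ}

def uses_permitted_characters(label):
    """True if every code point of a non-empty label is in the table."""
    # Assumption: lower-case first, since DNS labels compare case-insensitively.
    return bool(label) and all(ord(ch) in PERMITTED for ch in label.lower())

print(uses_permitted_characters("\u0D9C\u0DCF\u0DBD\u0DCA\u0DBD"))  # ගාල්ල -> True
print(uses_permitted_characters("abc_1"))                            # False
```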
3.2.3. The domain string must contain at least two letters.
3.2.4. The string should consist of valid Unicode code points and should comply with the linguistic rules of the respective language. However, it does not need to comply with spelling rules.
E.g.:
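ෙඅ is not a valid string, because a string cannot begin with the vowel sign ෙ (kombuva); a vowel sign must follow a consonant.
නය and ණය are both valid strings, and they are treated as two different strings.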
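Part of rule 3.2.4 can be checked mechanically: ෙ (U+0DD9) is a dependent vowel sign, which Unicode classifies as a combining mark, so a string that starts with it cannot be well formed. IDNA2008 (RFC 5891) likewise rejects labels that begin with a combining mark. A minimal sketch using Python's unicodedata module; the helper name is hypothetical and it covers only this one condition, not full linguistic validation:

```python
import unicodedata

def starts_with_vowel_sign(label):
    """True if the first code point is a combining mark (hypothetical helper)."""
    # Dependent vowel signs such as kombuva (U+0DD9) have a Unicode general
    # category that begins with "M" (Mn, Mc or Me).
    return bool(label) and unicodedata.category(label[0]).startswith("M")

print(starts_with_vowel_sign("\u0DD9\u0D85"))  # ෙඅ -> True
print(starts_with_vowel_sign("\u0DB1\u0DBA"))  # නය -> False
print("\u0DB1\u0DBA" == "\u0DAB\u0DBA")        # නය vs ණය -> False
```

The last line shows why නය and ණය count as different strings: න (U+0DB1) and ණ (U+0DAB) are distinct code points.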
3.2.5. The string should not contain the pattern “xn--”.
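The pattern in rule 3.2.5 is the ACE prefix that marks an A-label, the ASCII form in which an IDN label is stored in the DNS, so a label containing it could be confused with an encoded label. A minimal sketch of the check, together with a raw Punycode conversion using Python's built-in `punycode` codec. Both function names are hypothetical, and the conversion skips the IDNA mapping and validation steps a production registry would apply.

```python
ACE_PREFIX = "xn--"

def contains_ace_prefix(label):
    """True if the label contains "xn--" in any letter case."""
    return ACE_PREFIX in label.lower()

def to_a_label(u_label):
    """Prefix + raw Punycode (RFC 3492); no IDNA mapping or checks applied.

    Only meaningful for labels that contain non-ASCII characters.
    """
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

print(contains_ace_prefix("XN--abc"))   # True
print(contains_ace_prefix("ab-cd"))     # False
a_label = to_a_label("\u0DBD\u0D82\u0D9A\u0DCF")  # ලංකා
print(a_label[4:].encode("ascii").decode("punycode") == "\u0DBD\u0D82\u0D9A\u0DCF")  # True
```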
3.2.6. If you request a domain name containing a zwj (U+200D, zero width joiner), we register two domain names as a bundle: the domain name containing the zwj and the same domain name without the zwj.
- Registering both domain names does not affect the domain registration charges.
E.g.:
If you request a .ලංකා domain containing ් + ර or ් + ය, we also register the corresponding domain name containing the rakaransaya or yansaya form (් + zwj + ර or ් + zwj + ය).
E.g.: If you request සත්ය.ලංකා (ස + ත + ් + ය, without zwj), we also register සත්‍ය.ලංකා (ස + ත + ් + zwj + ය). The bundle is:
සත්ය.ලංකා
සත්‍ය.ලංකා
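Rule 3.2.6 and the example above describe a two-name bundle. A minimal sketch of computing it, assuming the bundle is derived per label and that the only joined forms are ් + zwj + ය and ් + zwj + ර; the function name is hypothetical:

```python
ZWJ = "\u200D"               # zero width joiner
AL_LAKUNA = "\u0DCA"         # ්
YA, RA = "\u0DBA", "\u0DBB"  # ය, ර

def zwj_bundle(label):
    """Return the set of labels registered together with `label` (sketch)."""
    if ZWJ in label:
        # The name with the zwj plus the same name without it.
        return {label, label.replace(ZWJ, "")}
    joined = label
    for follower in (YA, RA):
        # Add the yansaya / rakaransaya form: ් + zwj + ය or ර.
        joined = joined.replace(AL_LAKUNA + follower, AL_LAKUNA + ZWJ + follower)
    return {label, joined}  # a single name when there is nothing to join

satya = "\u0DC3\u0DAD\u0DCA\u0DBA"  # සත්ය without zwj
print(len(zwj_bundle(satya)))        # 2
```

Calling it on either member of the bundle returns the same two names, which matches the requirement that the pair is registered together.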
4. IDN Label Rules
The rules in Appendix B guide you in creating a valid domain name under .ලංකා and .இலங்கை.
5. Appendix
5.1. Appendix-A
5.1.1. Permitted String Table for .ලංකා domains
The following table specifies the IDN (Internationalized Domain Names) Language Table used by the LK Domain Registry for the registration of Sinhala language domain labels in the .lk and .ලංකා domains. It is based on the recommendation of the ICTA IDN working group.
Further restrictions on allowable character sequences exist; they are not documented in this table.
Code point | Name | Character |
Latin | ||
U+002D | HYPHEN-MINUS | - |
U+0030..U+0039 | DIGIT ZERO – DIGIT NINE | 0-9 |
U+0061..U+007A | LATIN SMALL LETTER A – LATIN SMALL LETTER Z | A-Z, a-z |
Sinhala | ||
U+0D82 | Sinhala sign anusvaraya | (ං) |
U+0D83 | Sinhala sign visargaya | (ඃ) |
U+0D85 | Sinhala letter ayanna | (අ) |
U+0D86 | Sinhala letter aayanna | (ආ) |
U+0D87 | Sinhala letter aeyanna | (ඇ) |
U+0D88 | Sinhala letter aeeyanna | (ඈ) |
U+0D89 | Sinhala letter iyanna | (ඉ) |
U+0D8A | Sinhala letter iiyanna | (ඊ) |
U+0D8B | Sinhala letter uyanna | (උ) |
U+0D8C | Sinhala letter uuyanna | (ඌ) |
U+0D8D | Sinhala letter iruyanna | (ඍ) |
U+0D8E | Sinhala letter iruuyanna | (ඎ) |
U+0D91 | Sinhala letter eyanna | (එ) |
U+0D92 | Sinhala letter eeyanna | (ඒ) |
U+0D93 | Sinhala letter aiyanna | (ඓ) |
U+0D94 | Sinhala letter oyanna | (ඔ) |
U+0D95 | Sinhala letter ooyanna | (ඕ) |
U+0D96 | Sinhala letter auyanna | (ඖ) |
U+0D9A | Sinhala letter alpapraana kayanna | (ක) |
U+0D9B | Sinhala letter mahaapraana kayanna | (ඛ) |
U+0D9C | Sinhala letter alpapraana gayanna | (ග) |
U+0D9D | Sinhala letter mahaapraana gayanna | (ඝ) |
U+0D9E | Sinhala letter kantaja naasikyaya | (ඞ) |
U+0D9F | Sinhala letter sanyaka gayanna | (ඟ) |
U+0DA0 | Sinhala letter alpapraana cayanna | (ච) |
U+0DA1 | Sinhala letter mahaapraana cayanna | (ඡ) |
U+0DA2 | Sinhala letter alpapraana jayanna | (ජ) |
U+0DA3 | Sinhala letter mahaapraana jayanna | (ඣ) |
U+0DA4 | Sinhala letter taaluja naasikyaya | (ඤ) |
U+0DA5 | Sinhala letter taaluja sanyooga naaksikyaya | (ඥ) |
U+0DA7 | Sinhala letter alpapraana ttayanna | (ට) |
U+0DA8 | Sinhala letter mahaapraana ttayanna | (ඨ) |
U+0DA9 | Sinhala letter alpapraana ddayanna | (ඩ) |
U+0DAA | Sinhala letter mahaapraana ddayanna | (ඪ) |
U+0DAB | Sinhala letter muurdhaja nayanna | (ණ) |
U+0DAD | Sinhala letter alpapraana tayanna | (ත) |
U+0DAE | Sinhala letter mahaapraana tayanna | (ථ) |
U+0DAF | Sinhala letter alpapraana dayanna | (ද) |
U+0DB0 | Sinhala letter mahaapraana dayanna | (ධ) |
U+0DB1 | Sinhala letter dantaja nayanna | (න) |
U+0DB3 | Sinhala letter sanyaka dayanna | (ඳ) |
U+0DB4 | Sinhala letter alpapraana payanna | (ප) |
U+0DB5 | Sinhala letter mahaapraana payanna | (ඵ) |
U+0DB6 | Sinhala letter alpapraana bayanna | (බ) |
U+0DB7 | Sinhala letter mahaapraana bayanna | (භ) |
U+0DB8 | Sinhala letter mayanna | (ම) |
U+0DB9 | Sinhala letter amba bayanna | (ඹ) |
U+0DBA | Sinhala letter yayanna | (ය) |
U+0DBB | Sinhala letter rayanna | (ර) |
U+0DBD | Sinhala letter dantaja layanna | (ල) |
U+0DC1 | Sinhala letter taaluja sayanna | (ශ) |
U+0DC2 | Sinhala letter muurdhaja sayanna | (ෂ) |
U+0DC3 | Sinhala letter dantaja sayanna | (ස) |
U+0DC4 | Sinhala letter hayanna | (හ) |
U+0DC5 | Sinhala letter muurdhaja layanna | (ළ) |
U+0DC6 | Sinhala letter fayanna | (ෆ) |
U+0DCA | Sinhala sign al-lakuna | (්) |
U+0DCF | Sinhala vowel sign aela-pilla | (ා) |
U+0DD0 | Sinhala vowel sign ketti aeda-pilla | (ැ) |
U+0DD1 | Sinhala vowel sign diga aeda-pilla | (ෑ) |
U+0DD2 | Sinhala vowel sign ketti is-pilla | (ි) |
U+0DD3 | Sinhala vowel sign diga is-pilla | (ී) |
U+0DD4 | Sinhala vowel sign ketti paa-pilla | (ු) |
U+0DD6 | Sinhala vowel sign diga paa-pilla | (ූ) |
U+0DD8 | Sinhala vowel sign gaetta-pilla | (ෘ) |
U+0DD9 | Sinhala vowel sign kombuva | (ෙ) |
U+0DDA | Sinhala vowel sign diga kombuva | (ේ) |
U+0DDB | Sinhala vowel sign kombu deka | (ෛ) |
U+0DDC | Sinhala vowel sign kombuva haa aela-pilla | (ො) |
U+0DDD | Sinhala vowel sign kombuva haa diga aela-pilla | (ෝ) |
U+0DDE | Sinhala vowel sign kombuva haa gayanukitta | (ෞ) |
U+0DF2 | Sinhala vowel sign diga gaetta-pilla | (ෲ) |
5.1.2. Permitted String Table for .இலங்கை domains
The following table specifies the IDN (Internationalized Domain Names) Language Table used by the LK Domain Registry for the registration of Tamil language labels in the .lk and .இலங்கை domains. It is based on the recommendation of the ICTA IDN working group.
Code point | Name | Character |
Latin | ||
U+002D | HYPHEN-MINUS | - |
U+0030..U+0039 | DIGIT ZERO – DIGIT NINE | 0-9 |
U+0061..U+007A | LATIN SMALL LETTER A – LATIN SMALL LETTER Z | A-Z, a-z |
Tamil | ||
U+0B83 | TAMIL SIGN VISARGA = aytham | ஃ |
U+0B85 | TAMIL LETTER A | அ |
U+0B86 | TAMIL LETTER AA | ஆ |
U+0B87 | TAMIL LETTER I | இ |
U+0B88 | TAMIL LETTER II | ஈ |
U+0B89 | TAMIL LETTER U | உ |
U+0B8A | TAMIL LETTER UU | ஊ |
U+0B8E | TAMIL LETTER E | எ |
U+0B8F | TAMIL LETTER EE | ஏ |
U+0B90 | TAMIL LETTER AI | ஐ |
U+0B92 | TAMIL LETTER O | ஒ |
U+0B93 | TAMIL LETTER OO | ஓ |
U+0B94 | TAMIL LETTER AU | ஔ |
U+0B95 | TAMIL LETTER KA | க |
U+0B99 | TAMIL LETTER NGA | ங |
U+0B9A | TAMIL LETTER CA | ச |
U+0B9C | TAMIL LETTER JA | ஜ |
U+0B9E | TAMIL LETTER NYA | ஞ |
U+0B9F | TAMIL LETTER TTA | ட |
U+0BA3 | TAMIL LETTER NNA | ண |
U+0BA4 | TAMIL LETTER TA | த |
U+0BA8 | TAMIL LETTER NA | ந |
U+0BA9 | TAMIL LETTER NNNA | ன |
U+0BAA | TAMIL LETTER PA | ப |
U+0BAE | TAMIL LETTER MA | ம |
U+0BAF | TAMIL LETTER YA | ய |
U+0BB0 | TAMIL LETTER RA | ர |
U+0BB1 | TAMIL LETTER RRA | ற |
U+0BB2 | TAMIL LETTER LA | ல |
U+0BB3 | TAMIL LETTER LLA | ள |
U+0BB4 | TAMIL LETTER LLLA | ழ |
U+0BB5 | TAMIL LETTER VA | வ |
U+0BB6 | TAMIL LETTER SHA | ஶ |
U+0BB7 | TAMIL LETTER SSA | ஷ |
U+0BB8 | TAMIL LETTER SA | ஸ |
U+0BB9 | TAMIL LETTER HA | ஹ |
U+0BBE | TAMIL VOWEL SIGN AA | ா |
U+0BBF | TAMIL VOWEL SIGN I | ி |
U+0BC0 | TAMIL VOWEL SIGN II | ீ |
U+0BC1 | TAMIL VOWEL SIGN U | ு |
U+0BC2 | TAMIL VOWEL SIGN UU | ூ |
U+0BC6 | TAMIL VOWEL SIGN E | ெ |
U+0BC7 | TAMIL VOWEL SIGN EE | ே |
U+0BC8 | TAMIL VOWEL SIGN AI | ை |
U+0BCA | TAMIL VOWEL SIGN O | ொ |
U+0BCB | TAMIL VOWEL SIGN OO | ோ |
U+0BCC | TAMIL VOWEL SIGN AU | ௌ |
U+0BCD | TAMIL SIGN VIRAMA | ் |
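The names in both tables are Unicode character names (the Sinhala table writes them in lower case), so a transcription can be cross-checked against Python's unicodedata module. A minimal sketch over a few sample rows; the row list and function name are illustrative and not part of the policy:

```python
import unicodedata

# Illustrative rows from the Appendix A tables: (code point, listed name).
SAMPLE_ROWS = [
    (0x0D82, "Sinhala sign anusvaraya"),
    (0x0D9C, "Sinhala letter alpapraana gayanna"),
    (0x0DBB, "Sinhala letter rayanna"),
    (0x0DCA, "Sinhala sign al-lakuna"),
    (0x0DF2, "Sinhala vowel sign diga gaetta-pilla"),
    (0x0B83, "TAMIL SIGN VISARGA"),
    (0x0BB6, "TAMIL LETTER SHA"),
    (0x0BCD, "TAMIL SIGN VIRAMA"),
]

def name_mismatches(rows):
    """Return (code point, listed, official) for rows whose name differs."""
    mismatches = []
    for cp, listed in rows:
        official = unicodedata.name(chr(cp), "<unassigned>")
        if listed.upper() != official:  # official names are upper case
            mismatches.append((cp, listed, official))
    return mismatches

print(name_mismatches(SAMPLE_ROWS))  # []
```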
5.2. Appendix-B
5.2.1. IDN Label Rules for .ලංකා domains
- IDN rules for Indic scripts are based on strings rather than individual Unicode characters, as Indic letters (akshara) are represented by strings of Unicode characters.
- We define the sets (consonants, vowels, modifiers, semi consonants, zwj, etc.) into which the letters are grouped.
SinhalaVowel = [
Sinhala_Letter_A = U+0D85 # (අ)
Sinhala_Letter_AA = U+0D86 # (ආ)
Sinhala_Letter_AE = U+0D87 # (ඇ)
Sinhala_Letter_AEE = U+0D88 # (ඈ)
Sinhala_Letter_I = U+0D89 # (ඉ)
Sinhala_Letter_II = U+0D8A # (ඊ)
Sinhala_Letter_U = U+0D8B # (උ)
Sinhala_Letter_UU = U+0D8C # (ඌ)
Sinhala_Letter_vR = U+0D8D # (ඍ)
Sinhala_Letter_vRR = U+0D8E # (ඎ)
Sinhala_Letter_E = U+0D91 # (එ)
Sinhala_Letter_EE = U+0D92 # (ඒ)
Sinhala_Letter_AI = U+0D93 # (ඓ)
Sinhala_Letter_O = U+0D94 # (ඔ)
Sinhala_Letter_OO = U+0D95 # (ඕ)
Sinhala_Letter_AU = U+0D96 # (ඖ)
]
SinhalaConsonant = [
Sinhala_Letter_KHA = U+0D9A # (ක)
Sinhala_Letter_GA = U+0D9B # (ඛ)
Sinhala_Letter_GHA = U+0D9C # (ග)
Sinhala_Letter_NGA = U+0D9D # (ඝ)
Sinhala_Letter_NGGA = U+0D9E # (ඞ)
Sinhala_Letter_CA = U+0D9F # (ඟ)
Sinhala_Letter_CHA = U+0DA0 # (ච)
Sinhala_Letter_JA = U+0DA1 # (ඡ)
Sinhala_Letter_JHA = U+0DA2 # (ජ)
Sinhala_Letter_NYA = U+0DA3 # (ඣ)
Sinhala_Letter_JNYA = U+0DA4 # (ඤ)
Sinhala_Letter_NYJA = U+0DA5 # (ඥ)
Sinhala_Letter_NYJA = U+0DA6 # (ඦ)
Sinhala_Letter_TTA = U+0DA7 # (ට)
Sinhala_Letter_TTHA = U+0DA8 # (ඨ)
Sinhala_Letter_DDA = U+0DA9 # (ඩ)
Sinhala_Letter_DDHA = U+0DAA # (ඪ)
Sinhala_Letter_NNA = U+0DAB # (ණ)
Sinhala_Letter_NNDDA = U+0DAC # (ඬ)
Sinhala_Letter_TA = U+0DAD # (ත)
Sinhala_Letter_THA = U+0DAE # (ථ)
Sinhala_Letter_DA = U+0DAF # (ද)
Sinhala_Letter_DHA = U+0DB0 # (ධ)
Sinhala_Letter_NA = U+0DB1 # (න)
Sinhala_Letter_NDA = U+0DB3 # (ඳ)
Sinhala_Letter_PA = U+0DB4 # (ප)
Sinhala_Letter_PHA = U+0DB5 # (ඵ)
Sinhala_Letter_BA = U+0DB6 # (බ)
Sinhala_Letter_BHA = U+0DB7 # (භ)
Sinhala_Letter_MA = U+0DB8 # (ම)
Sinhala_Letter_MBA = U+0DB9 # (ඹ)
Sinhala_Letter_YA = U+0DBA # (ය)
Sinhala_Letter_RA = U+0DBB # (ර)
Sinhala_Letter_LA = U+0DBD # (ල)
Sinhala_Letter_VA = U+0DC0 # (ව)
Sinhala_Letter_SHA = U+0DC1 # (ශ)
Sinhala_Letter_SSA = U+0DC2 # (ෂ)
Sinhala_Letter_SA = U+0DC3 # (ස)
Sinhala_Letter_HA = U+0DC4 # (හ)
Sinhala_Letter_LLA = U+0DC5 # (ළ)
Sinhala_Letter_FA = U+0DC6 # (ෆ)
]
SinhalaModifiers = [
Sinhala_Vowel_Sign_AA = U+0DCF # (ා)
Sinhala_Vowel_Sign_AE = U+0DD0 # (ැ)
Sinhala_Vowel_Sign_AEE = U+0DD1 # (ෑ)
Sinhala_Vowel_Sign_I = U+0DD2 # (ි)
Sinhala_Vowel_Sign_II = U+0DD3 # (ී)
Sinhala_Vowel_Sign_U = U+0DD4 # (ු)
Sinhala_Vowel_Sign_UU = U+0DD6 # (ූ)
Sinhala_Vowel_Sign_VR = U+0DD8 # (ෘ)
Sinhala_Vowel_Sign_VRR = U+0DF2 # (ෲ)
Sinhala_Vowel_Sign_E = U+0DD9 # (ෙ)
Sinhala_Vowel_Sign_EE = U+0DDA # (ේ)
Sinhala_Vowel_Sign_AI = U+0DDB # (ෛ)
Sinhala_Vowel_Sign_VI = U+0DDF # (ෟ)
Sinhala_Vowel_Sign_O = U+0DDC # (ො)
Sinhala_Vowel_Sign_OO = U+0DDD # (ෝ)
Sinhala_Vowel_Sign_AU = U+0DDE # (ෞ)
Sinhala_Sign_ALLAKUNA = U+0DCA # (්)
]
SinhalaSemiConsonants = [
Sinhala_Sign_Anusvaraya = U+0D82 # (ං)
Sinhala_Sign_Visargaya = U+0D83 # (ඃ)
]
ZWJ = [
ZWJ = U+200D # (zwj)
]
English_Letters = [A-Z, a-z]
Digits = [0-9]
- Rules
# Rules have the following format:
# <sequence>:<result>
# Key:
# <sequence> is the sequence of characters starting from the current position in the label where each element is either a named character or a member of a character set defined above.
# <result> is either “fail” or “next”
# Logically, a label is processed by iterating through its character positions
# In each iteration, each rule is checked with the substring starting from the current character position.
# If the current substring matches then the result is applied as follows:
# fail: stop, the label is invalid
# next: move to the next character position
# If the processing reaches the end of the string, then the label is valid.
#
# Variants:
# A variant is defined by a rule of the form
# <sequence1> | <sequence2> : variant
# If the current substring matches either <sequence1> or <sequence2>, then note that
# the label contains a variant, and then move to the next character position.
# The rules are defined as follows.
1. The first character can be a vowel, a consonant, a digit, or an English letter.
Ex: Sinhala_Letter_A (0D85) … Sinhala_Letter_AU (0D96)
Sinhala_Letter_KHA (0D9A) … Sinhala_Letter_FA (0DC6)
English_Letter (A-Z or a-z)
Digits (0-9)
2. A vowel can be followed by another vowel, a consonant, a semi consonant, an English letter, or a digit.
Ex:
Sinhala_Letter_A Sinhala_Letter_AA (අආ)
Sinhala_Letter_I Sinhala_Letter_RA (ඉර)
Sinhala_Letter_A Sinhala_Sign_Anusvaraya (අං)
Sinhala_Letter_A Sinhala_Sign_Visargaya (අඃ)
Sinhala_Letter_A English_Letter_C (අc)
Sinhala_Letter_A 1 (අ1)
3. A consonant can be followed by another consonant, a modifier, a vowel, al-lakuna, a semi consonant, a digit, or an English letter.
Ex:
Sinhala_Letter_GHA Sinhala_Letter_MA (ගම)
Sinhala_Letter_GHA Sinhala_Vowel_Sign_AA Sinhala_Letter_LA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA (ගාල්ල)
Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA Sinhala_Vowel_Sign_I Sinhala_Letter_FA Sinhala_Letter_DDA Sinhala_Sign_ALLAKUNA (ක්ලිෆඩ්)
Sinhala_Letter_NA Sinhala_Sign_Anusvaraya Sinhala_Letter_GHA Sinhala_Vowel_Sign_II (නංගී)
Sinhala_Letter_GHA 1 (ග1)
Sinhala_Letter_GHA English_Letter_m (ගm)
4. A digit or an English letter can be followed by a vowel, a consonant, a digit, or an English letter.
Ex:
English_Letter_A Sinhala_Letter_I Sinhala_Letter_RA (Aඉර)
English_Letter_B Sinhala_Letter_GHA Sinhala_Letter_MA (Bගම)
English_Letter_A English_Letter_B (AB)
5. A semi consonant can be followed by a vowel, a consonant, a digit, or an English letter.
Ex:
Sinhala_Letter_A Sinhala_Sign_Visargaya Sinhala_Letter_RA (අඃර)
Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya Sinhala_Letter_vR (කංඍ)
Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya 1 (කං1)
Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya English_Letter_a (කංa)
6. A modifier can be followed by a semi consonant, a vowel, a consonant, a digit, or an English letter.
Ex:
Sinhala_Letter_KHA Sinhala_Vowel_Sign_II Sinhala_Sign_Anusvaraya (කීං)
Sinhala_Letter_NA Sinhala_Vowel_Sign_AA Sinhala_Letter_U Sinhala_Letter_LA (නාඋල)
Sinhala_Letter_NA Sinhala_Vowel_Sign_AA Sinhala_Letter_GHA Sinhala_Letter_SA (නාගස)
Sinhala_Letter_NA Sinhala_Vowel_Sign_AA 1 (නා1)
7. Sinhala_Sign_ALLAKUNA can be followed by a vowel, a consonant, a zwj, a digit, or an English letter.
Ex:
Sinhala_Letter_GHA Sinhala_Letter_LA Sinhala_Sign_ALLAKUNA Sinhala_Letter_A Sinhala_Letter_MA Sinhala_Vowel_Sign_U Sinhala_Letter_NNA (ගල්අමුණ)
Sinhala_Letter_A Sinhala_Letter_TA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA (අත්ල)
Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA zwj(200D) Sinhala_Letter_RA (ක + ් + zwj + ර = ක්‍ර)
Sinhala_Letter_BA Sinhala_Letter_SA Sinhala_Sign_ALLAKUNA 1 (බස්1)
8. A zwj can be followed by Sinhala_Letter_YA or Sinhala_Letter_RA.
Ex:
ක්‍ර = Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA zwj(200D) Sinhala_Letter_RA (ක + ් + zwj + ර)
ක්‍ය = Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA zwj(200D) Sinhala_Letter_YA (ක + ් + zwj + ය)
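Rules 1 to 8 can be read as a table of which character class may come next, which is how the examples under each rule are written (for instance ග1 under rule 3 and අc under rule 2). A minimal sketch under that reading, using the Appendix B code points. It assumes any class other than zwj may end a label, and it rejects hyphen-minus because rules 1 to 8 do not describe it. The class letters and function names are hypothetical.

```python
ZWJ = 0x200D
AL_LAKUNA = 0x0DCA
YA, RA = 0x0DBA, 0x0DBB

VOWELS = set(range(0x0D85, 0x0D97)) - {0x0D8F, 0x0D90}
CONSONANTS = set(range(0x0D9A, 0x0DC7)) - {0x0DB2, 0x0DBC, 0x0DBE, 0x0DBF}
MODIFIERS = {0x0DCF, 0x0DD0, 0x0DD1, 0x0DD2, 0x0DD3, 0x0DD4, 0x0DD6, 0x0DD8,
             0x0DD9, 0x0DDA, 0x0DDB, 0x0DDC, 0x0DDD, 0x0DDE, 0x0DDF, 0x0DF2}
SEMI_CONSONANTS = {0x0D82, 0x0D83}

def char_class(ch):
    """Map a character to V, C, M, A (al-lakuna), S, Z (zwj) or X (digit/Latin)."""
    cp = ord(ch)
    if cp in VOWELS:
        return "V"
    if cp in CONSONANTS:
        return "C"
    if cp in MODIFIERS:
        return "M"
    if cp == AL_LAKUNA:
        return "A"
    if cp in SEMI_CONSONANTS:
        return "S"
    if cp == ZWJ:
        return "Z"
    if ch.isascii() and ch.isalnum():
        return "X"
    return None  # includes hyphen-minus, which rules 1-8 do not cover

FIRST = {"V", "C", "X"}                    # rule 1
FOLLOWERS = {
    "V": {"V", "C", "S", "X"},             # rule 2
    "C": {"C", "M", "V", "A", "S", "X"},   # rule 3
    "X": {"V", "C", "X"},                  # rule 4
    "S": {"V", "C", "X"},                  # rule 5
    "M": {"S", "V", "C", "X"},             # rule 6
    "A": {"V", "C", "Z", "X"},             # rule 7
}

def is_valid_sinhala_label(label):
    classes = [char_class(ch) for ch in label]
    if not classes or None in classes or classes[0] not in FIRST:
        return False
    for i in range(1, len(label)):
        prev = classes[i - 1]
        if prev == "Z":
            # rule 8 (read as exhaustive): zwj is followed by YA or RA
            if ord(label[i]) not in (YA, RA):
                return False
        elif classes[i] not in FOLLOWERS[prev]:
            return False
    # Assumption: a zwj with nothing after it is invalid.
    return classes[-1] != "Z"

print(is_valid_sinhala_label("\u0D9C\u0DCF\u0DBD\u0DCA\u0DBD"))  # ගාල්ල -> True
print(is_valid_sinhala_label("\u0DD9\u0D85"))                    # ෙඅ -> False
```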
5.2.2. IDN Label Rules for .இலங்கை domains
- IDN rules for Indic scripts are based on strings rather than individual Unicode characters, as Indic letters (akshara) are represented by strings of Unicode characters.
- We define the sets (consonants, vowels, vowel signs, etc.) into which the letters are grouped.
TamilVowel = [
Tamil_Letter_A
Tamil_Letter_AA
Tamil_Letter_I
Tamil_Letter_II
Tamil_Letter_U
Tamil_Letter_UU
Tamil_Letter_E
Tamil_Letter_EE
Tamil_Letter_AI
Tamil_Letter_O
Tamil_Letter_OO
Tamil_Letter_AU
]
TamilConsonant = [
Tamil_Letter_KA
Tamil_Letter_NGA
Tamil_Letter_CA
Tamil_Letter_JA
Tamil_Letter_NYA
Tamil_Letter_TTA
Tamil_Letter_NNA
Tamil_Letter_TA
Tamil_Letter_NA
Tamil_Letter_NNNA
Tamil_Letter_PA
Tamil_Letter_MA
Tamil_Letter_YA
Tamil_Letter_RA
Tamil_Letter_RRA
Tamil_Letter_LA
Tamil_Letter_LLA
Tamil_Letter_LLLA
Tamil_Letter_VA
Tamil_Letter_SHA
Tamil_Letter_SSA
Tamil_Letter_SA
Tamil_Letter_HA
]
TamilVowelSign = [
Tamil_Vowel_Sign_AA
Tamil_Vowel_Sign_I
Tamil_Vowel_Sign_II
Tamil_Vowel_Sign_U
Tamil_Vowel_Sign_UU
Tamil_Vowel_Sign_E
Tamil_Vowel_Sign_EE
Tamil_Vowel_Sign_AI
Tamil_Vowel_Sign_O
Tamil_Vowel_Sign_OO
Tamil_Vowel_Sign_AU
]
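Tamil_Sign_Aytham = U+0B83 # TAMIL SIGN VISARGA (aytham)
Tamil_Sign_Pulli = U+0BCD # TAMIL SIGN VIRAMA (pulli)
ZWNJ = U+200C # zero width non-joiner
Hyphen-Minus = U+002D
ASCIIDigit = [0-9]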
- Rules
# Rules have the following format:
# <sequence> : <result>
# Key:
# <sequence> is the sequence of characters starting from the current position in the label
# where each element is either a named character or a member of a character set defined above.
# <result> is either “fail” or “next”
# Logically, a label is processed by iterating through its character positions
# In each iteration, each rule is checked with the substring starting from the current character position.
# If the current substring matches then the result is applied as follows:
# fail: stop, the label is invalid
# next: move to the next character position after the end of the matched string
# If the processing reaches the end of the string, then the label is valid.
# Variants:
# A variant is defined by a rule of the form
# <sequence1> | <sequence2> : variant
# If the current substring matches either <sequence1> or <sequence2>, then note that
# the label contains a variant, and then move to the next character position.
# we now define each of the special cases, and finally the general rules.
# allow KA + pulli + SSA either as a single conjunct glyph (க்ஷ) or, with a ZWNJ after the pulli, as separate glyphs (க் + ZWNJ + ஷ)
# these are variants of each other
# NOTE GD-20100416: we could also allow just one form, and make the other invalid
# This is the only place where ZWNJ is valid
Tamil_Letter_KA Tamil_Sign_Pulli Tamil_Letter_SSA | Tamil_Letter_KA Tamil_Sign_Pulli ZWNJ Tamil_Letter_SSA : variant
# the ZWNJ is not valid anywhere else except in the sequence2 above
ZWNJ : fail
# disallow old form of Shri (ஸ+்+ர+ீ)
Tamil_Letter_SA Tamil_Sign_Pulli Tamil_Letter_RA Tamil_Vowel_Sign_II : fail
# Note: the valid representation of Shri is
# Tamil_Letter_SHA Tamil_Sign_Pulli Tamil_Letter_RA Tamil_Vowel_Sign_II (ஶ+்+ர+ீ)
# we don’t need a special rule for this
# disallow a LLA after a consonant with a Kombu (e.g. கெ ள) unless it is modified by a vowel sign or Pulli
# to avoid confusion with TamilConsonant+Vowel Sign AU
# It is presumed that this sequence will never occur in a valid word
# the kombu should be preceded by a consonant
TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA TamilVowelSign : next
TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA Tamil_Sign_Pulli : next
TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA : fail
# disallow a LLA after Letter O (ஒ ள) unless it is modified by a vowel sign or Pulli
# to avoid confusion with Letter AU (ஔ)
# again, we assume that this sequence will never occur in a valid word
Tamil_Letter_O Tamil_Letter_LLA TamilVowelSign : next
Tamil_Letter_O Tamil_Letter_LLA Tamil_Sign_Pulli : next
Tamil_Letter_O Tamil_Letter_LLA : fail
# General Rules
# a vowel sign or a pulli (virama) can only follow a consonant and is not valid elsewhere
TamilConsonant TamilVowelSign : next
TamilConsonant Tamil_Sign_Pulli : next
TamilVowelSign : fail
Tamil_Sign_Pulli : fail
# allow consonants, vowels, Aytham, European numerals anywhere (unless disallowed by previous rules)
TamilConsonant : next
TamilVowel : next
Tamil_Sign_Aytham : next
ASCIIDigit : next
Hyphen-Minus : next
# IDN rules, which are not implemented in this table, restrict the placement of hyphen-minus
# anything else is invalid
: fail
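The Tamil rules above form an ordered list. A minimal sketch of the processing loop they describe, assuming the first matching rule wins, that next and variant advance past the end of the matched string, and that an empty label is invalid. Helper names are hypothetical, and the sets use the code points from Appendix A.

```python
def cps(*points):
    return frozenset(points)

KA, SA, RA, LLA, LETTER_O = cps(0x0B95), cps(0x0BB8), cps(0x0BB0), cps(0x0BB3), cps(0x0B92)
SSA, SIGN_E, SIGN_II = cps(0x0BB7), cps(0x0BC6), cps(0x0BC0)
PULLI, AYTHAM, ZWNJ, HYPHEN = cps(0x0BCD), cps(0x0B83), cps(0x200C), cps(0x002D)
DIGIT = frozenset(range(0x30, 0x3A))
VOWEL = cps(0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
            0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)
CONSONANT = cps(0x0B95, 0x0B99, 0x0B9A, 0x0B9C, 0x0B9E, 0x0B9F, 0x0BA3, 0x0BA4,
                0x0BA8, 0x0BA9, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0, 0x0BB1, 0x0BB2,
                0x0BB3, 0x0BB4, 0x0BB5, 0x0BB6, 0x0BB7, 0x0BB8, 0x0BB9)
VOWEL_SIGN = cps(0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2, 0x0BC6,
                 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC)

def rule(result, *alternatives):
    return alternatives, result

RULES = [  # same order as the rule list above
    rule("variant", (KA, PULLI, SSA), (KA, PULLI, ZWNJ, SSA)),
    rule("fail", (ZWNJ,)),
    rule("fail", (SA, PULLI, RA, SIGN_II)),
    rule("next", (CONSONANT, SIGN_E, LLA, VOWEL_SIGN)),
    rule("next", (CONSONANT, SIGN_E, LLA, PULLI)),
    rule("fail", (CONSONANT, SIGN_E, LLA)),
    rule("next", (LETTER_O, LLA, VOWEL_SIGN)),
    rule("next", (LETTER_O, LLA, PULLI)),
    rule("fail", (LETTER_O, LLA)),
    rule("next", (CONSONANT, VOWEL_SIGN)),
    rule("next", (CONSONANT, PULLI)),
    rule("fail", (VOWEL_SIGN,)),
    rule("fail", (PULLI,)),
    rule("next", (CONSONANT,)),
    rule("next", (VOWEL,)),
    rule("next", (AYTHAM,)),
    rule("next", (DIGIT,)),
    rule("next", (HYPHEN,)),
    rule("fail", ()),  # empty sequence matches anything: "anything else is invalid"
]

def match_length(seq, label, pos):
    """Return len(seq) if seq matches label at pos, otherwise None."""
    if pos + len(seq) > len(label):
        return None
    if all(ord(label[pos + i]) in allowed for i, allowed in enumerate(seq)):
        return len(seq)
    return None

def check_tamil_label(label):
    """Return (is_valid, has_variant); the first matching rule wins."""
    if not label:
        return False, False
    pos, has_variant = 0, False
    while pos < len(label):
        for alternatives, result in RULES:
            lengths = [n for n in (match_length(s, label, pos) for s in alternatives)
                       if n is not None]
            if not lengths:
                continue
            if result == "fail":
                return False, has_variant
            has_variant = has_variant or result == "variant"
            pos += lengths[0]  # move past the end of the matched string
            break
    return True, has_variant

print(check_tamil_label("\u0B87\u0BB2\u0B99\u0BCD\u0B95\u0BC8"))  # இலங்கை -> (True, False)
```

Because the sketch applies the rules literally, a vowel sign written directly after the KA + pulli + SSA sequence (as in க்ஷா) is rejected: the variant rule consumes SSA, and the vowel sign then reaches the `TamilVowelSign : fail` rule.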