Tamil All Character Encoding (TACE16) is a 16-bit Unicode-based character encoding scheme for the Tamil language.[1][2] It is not used on the web, where Tamil text is instead encoded in standard Unicode, typically as UTF-8.
Keyboard drivers and fonts
The keyboard driver for this encoding scheme is freely available on the website of the Tamil Virtual University (now the Tamil Virtual Academy).[3][4] It uses the Tamil99 and Tamil Typewriter keyboard layouts, both approved by the Tamil Nadu Government, and maps input keystrokes to the corresponding characters in the TACE16 scheme.[2] Unicode Tamil fonts for this encoding scheme are available on the same website.[4][3] These fonts also cover the present Unicode encoding of both ASCII and Tamil characters, which keeps them backward compatible with the present Unicode encoding for Tamil.
Character set
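All characters of this encoding scheme are located in the Private Use Area (U+E000 to U+F8FF) of the Basic Multilingual Plane of Unicode's Universal Character Set. In the chart below, a cell's code point is the column's three hex digits followed by the row's hex digit; for example, column E21, row 3 is U+E213 (கி).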
Consonants → / Vowels ↓ | E10 | E18 | E1A | E1F | E20 | E21 | E22 | E23 | E24 | E25 | E26 | E27 | E28 | E29 | E2A | E2B | E2C | E2D | E2E | E2F | E30 | E31 | E32 | E33 | E34 | E35 | E36 | E37 | E38 | E39 | E3A | E3B | E3C | E3D | E3E | E3F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ௳ | ௦ | அரைக்கால் | ் |  | க் | ங் | ச் | ஞ் | ட் | ண் | த் | ந் | ப் | ம் | ய் | ர் | ல் | வ் | ழ் | ள் | ற் | ன் |
1 | ௴ | ௧ | கால் |  | அ | க | ங | ச | ஞ | ட | ண | த | ந | ப | ம | ய | ர | ல | வ | ழ | ள | ற | ன |
2 | ௵ | ௨ | அரை | ா | ஆ | கா | ஙா | சா | ஞா | டா | ணா | தா | நா | பா | மா | யா | ரா | லா | வா | ழா | ளா | றா | னா |
3 | ௶ | ௩ | முக்கால் | ி | இ | கி | ஙி | சி | ஞி | டி | ணி | தி | நி | பி | மி | யி | ரி | லி | வி | ழி | ளி | றி | னி |
4 | ௷ | ௪ | அரைவீசம் | ீ | ஈ | கீ | ஙீ | சீ | ஞீ | டீ | ணீ | தீ | நீ | பீ | மீ | யீ | ரீ | லீ | வீ | ழீ | ளீ | றீ | னீ |
5 | ௸ | ௫ | வீசம் | ு | உ | கு | ஙு | சு | ஞு | டு | ணு | து | நு | பு | மு | யு | ரு | லு | வு | ழு | ளு | று | னு |
6 | ௹ | ௬ | மூவீசம் | ூ | ஊ | கூ | ஙூ | சூ | ஞூ | டூ | ணூ | தூ | நூ | பூ | மூ | யூ | ரூ | லூ | வூ | ழூ | ளூ | றூ | னூ |
7 | ௺ | ௭ | அரைமா | ெ | எ | கெ | ஙெ | செ | ஞெ | டெ | ணெ | தெ | நெ | பெ | மெ | யெ | ரெ | லெ | வெ | ழெ | ளெ | றெ | னெ |
8 | பௌர்ணமி | ௮ | ஒருமா | ே | ஏ | கே | ஙே | சே | ஞே | டே | ணே | தே | நே | பே | மே | யே | ரே | லே | வே | ழே | ளே | றே | னே |
9 | அமாவாசை | ௯ | இரண்டுமா | ை | ஐ | கை | ஙை | சை | ஞை | டை | ணை | தை | நை | பை | மை | யை | ரை | லை | வை | ழை | ளை | றை | னை |
A | கார்த்திகை | ௰ | மும்மா | ொ | ஒ | கொ | ஙொ | சொ | ஞொ | டொ | ணொ | தொ | நொ | பொ | மொ | யொ | ரொ | லொ | வொ | ழொ | ளொ | றொ | னொ |
B | ராஜ | ௱ | நாலுமா | ோ | ஓ | கோ | ஙோ | சோ | ஞோ | டோ | ணோ | தோ | நோ | போ | மோ | யோ | ரோ | லோ | வோ | ழோ | ளோ | றோ | னோ |
C | ௐ | ௲ | முந்திரி | ௌ | ஔ | கௌ | ஙௌ | சௌ | ஞௌ | டௌ | ணௌ | தௌ | நௌ | பௌ | மௌ | யௌ | ரௌ | லௌ | வௌ | ழௌ | ளௌ | றௌ | னௌ |
D |  |  | அரைக்காணி |  | ஃ |
E |  |  | காணி |
F |  |  | முக்காணி |
Note (cell categories in the chart):
- Newly added; not present in Unicode 6.3.
- Allocated for research (NLP).
- For future use.
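The chart's layout can be captured in a few lines of code. The Python sketch below is a hypothetical helper (the name `tace16` and the lists are illustrative, not part of any published TACE16 tooling). It assumes, from the chart, that column E20 holds the independent vowels, columns E21 to E32 hold the 18 consonants in the order shown, row 0 is the pulli form, rows 1 to C are the vowels அ to ஔ, and ஃ sits at E20D.

```python
# Hypothetical sketch: the TACE16 layout as read from the chart above.
VOWELS = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]   # rows 1-C
CONSONANTS = ["க", "ங", "ச", "ஞ", "ட", "ண", "த", "ந", "ப",
              "ம", "ய", "ர", "ல", "வ", "ழ", "ள", "ற", "ன"]           # columns E21-E32
AYTHAM = 0xE20D                       # ஃ, row D of column E20
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # Private Use Area of the BMP


def tace16(consonant=None, vowel=None):
    """Return the TACE16 code point for a consonant and/or a vowel.

    consonant=None selects column E20 (independent vowels);
    vowel=None selects row 0 (pure consonant, i.e. with pulli).
    """
    column = 0xE20 if consonant is None else 0xE21 + CONSONANTS.index(consonant)
    row = 0 if vowel is None else 1 + VOWELS.index(vowel)
    return (column << 4) | row  # column prefix followed by the row digit


# 12 vowels + 18 pure consonants + 216 vowel-consonants + the aytham
letters = [tace16(c, v) for c in [None, *CONSONANTS] for v in [None, *VOWELS] if c or v]
letters.append(AYTHAM)

print(hex(tace16("க", "இ")))  # 0xe213 (கி)
print(len(letters))            # 247
print(all(PUA_FIRST <= cp <= PUA_LAST for cp in letters))  # True
```

Every one of the 247 Tamil letters gets exactly one code point, and all of them fall inside the Private Use Area.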
Comparison
The advantages claimed for TACE16 over the present Unicode standard for the Tamil language relate to the following properties of Unicode Tamil:[1]
- Unicode Tamil has code points for only 31 of the 247 Tamil characters: 12 vowels, 18 agara-uyirmey (consonants carrying the inherent vowel a) and the aytham. Five Grantha agara-uyirmey are also encoded but are not counted among the 31. The remaining characters, such as the pure consonant k (க்) and the vowel-consonants kA, ki and kI, have no code points of their own.
- It therefore uses multiple code points to represent a single character; for example, கி is encoded as க (U+0B95) followed by the vowel sign ி (U+0BBF).
- It requires invisible formatting characters such as the zero-width joiner (ZWJ) or zero-width non-joiner (ZWNJ).
- A sequence of code points may correspond to a single glyph: ச + ◌ெ + ◌ா = சொ. Unicode regards சொ as a grapheme, which the report disputes.
- Unicode Tamil encodes the vowel signs as combining characters. A rendering engine that finds a vowel sign separated from its base character (for example, by a space) displays the sign on its own, usually attached to a dotted circle (◌), so that in effect Unicode introduces the dotted circle as a Tamil character.
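The code-point counts described in this list can be checked with Python's standard `unicodedata` module. The sketch uses escapes so the exact code points are visible; the TACE16 value U+E23A for சொ is read from the chart (column E23, row A).

```python
import unicodedata

# கி in Unicode Tamil: base consonant க (U+0B95) + vowel sign ி (U+0BBF)
ki = "\u0b95\u0bbf"
print([f"U+{ord(ch):04X}" for ch in ki])  # ['U+0B95', 'U+0BBF']

# ச + ◌ெ + ◌ா typed as three code points; NFC composes ◌ெ + ◌ா into ◌ொ (U+0BCA)
ko_typed = "\u0b9a\u0bc6\u0bbe"
ko_nfc = unicodedata.normalize("NFC", ko_typed)
print(len(ko_typed), len(ko_nfc))  # 3 2
print(ko_nfc == "\u0b9a\u0bca")    # True

# The same letter in TACE16 is one private-use code point
ko_tace16 = chr(0xE23A)
print(len(ko_tace16))  # 1
```

Both Unicode spellings render as the same letter, which is why Unicode text usually has to be normalized before comparison or searching.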
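There was a proposal to re-encode Tamil.[5] The Unicode Consortium rejected it, stating that re-encoding would be "damaging". TACE16 is laid out according to the rule of Tamil grammar that a consonant combined with a vowel forms a vowel-consonant (UyirMei), so that common text-processing operations reduce to simple arithmetic or bitwise operations on code points: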
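- To combine a consonant and a vowel into a vowel-consonant (UyirMei) character, for example க் + இ = கி:
```c
/* Method 1: simple arithmetic */
0xE210 /* க் */ + 0xE203 /* இ */ - 0xE200 /* constant */    /* = 0xE213 (கி) */

/* Method 2: bitwise */
0xE210 /* க் */ | (0xE203 /* இ */ & 0x000F /* constant */)   /* = 0xE213 (கி) */
```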
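The two combining methods can be checked with a short Python sketch that treats TACE16 code points as plain integers. The function names are illustrative, and the consonant range E210 to E320 and vowel range E201 to E20C are read from the chart rather than taken from a published API.

```python
def combine_arith(consonant, vowel):
    """Method 1: add the two code points and subtract the constant 0xE200."""
    return consonant + vowel - 0xE200


def combine_bitwise(consonant, vowel):
    """Method 2: copy the vowel's row digit into the consonant's (zero) row digit."""
    return consonant | (vowel & 0x000F)


KA_PURE, VOWEL_I = 0xE210, 0xE203  # க், இ
print(hex(combine_arith(KA_PURE, VOWEL_I)), hex(combine_bitwise(KA_PURE, VOWEL_I)))  # 0xe213 0xe213

# The two methods agree for all 18 pure consonants and 12 vowels.
consonants = range(0xE210, 0xE330, 0x10)  # க் (E210) ... ன் (E320)
vowels = range(0xE201, 0xE20D)            # அ (E201) ... ஔ (E20C)
print(all(combine_arith(c, v) == combine_bitwise(c, v) for c in consonants for v in vowels))  # True
```

They agree because every pure consonant has 0 as its last hex digit and every vowel minus 0xE200 is a single digit from 1 to 12, so adding that digit and OR-ing it in give the same result.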
- To divide a vowel-consonant (UyirMei) character into its corresponding vowel and consonant.
```c
/* To get the vowel */
0xE213 /* கி */ & 0xF20F /* constant */    /* = 0xE203 (இ) */

/* To get the consonant */
0xE213 /* கி */ & 0xFFF0 /* constant */    /* = 0xE210 (க்) */
```
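A matching Python sketch for splitting, again with illustrative names. It also recombines every vowel-consonant (columns E21 to E32, rows 1 to C of the chart) to confirm that splitting and combining are inverse operations.

```python
def split_uyirmei(c):
    """Split a TACE16 vowel-consonant code point into (consonant, vowel)."""
    consonant = c & 0xFFF0  # clear the row digit: row 0 (pulli form) of the same column
    vowel = c & 0xF20F      # turn column E2x/E3x into E20, keeping the row digit
    return consonant, vowel


def roundtrip_ok(c):
    """Recombine with the bitwise method and check that c comes back."""
    consonant, vowel = split_uyirmei(c)
    return (consonant | (vowel & 0x000F)) == c


print([hex(p) for p in split_uyirmei(0xE213)])  # ['0xe210', '0xe203'], i.e. க் and இ

uyirmei = [col + row for col in range(0xE210, 0xE330, 0x10) for row in range(1, 13)]
print(len(uyirmei), all(roundtrip_ok(c) for c in uyirmei))  # 216 True
```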
- To find whether a character is a vowel, a consonant, a vowel-consonant (UyirMei) or a number.
```c
/* Operators:
 *   |   bitwise OR         &   bitwise AND
 *   ^   bitwise XOR        !   logical NOT
 *   ||  conditional OR     &&  conditional AND
 *
 * c = the TACE16 code point of a Tamil character
 */

/* Is c a vowel? */
/* Method 1 */
(c >= 0xE201) && (c <= 0xE20C)                            /* true => vowel */
/* Method 2: only if code positions E200, E20E and E20F are not used for any other purpose */
((c & 0xE20F) == c) && (c != 0xE20D)                      /* true => vowel */
!((c & 0xE20F) ^ c) && (c != 0xE20D)                      /* true => vowel (same test) */

/* Is c a consonant or a vowel-consonant (UyirMei)?
 * x = c & 0x000F; for a vowel or vowel-consonant, x is the vowel's number, starting from 1 */
(c >= 0xE210) && (c <= 0xE38C) && (x == 0)                /* true => consonant */
(c >= 0xE210) && (c <= 0xE38C) && (x >= 1) && (x <= 12)   /* true => vowel-consonant (UyirMei) */

/* Is c a Tamil number? */
/* Method 1 */
(c >= 0xE180) && (c <= 0xE18C)                            /* true => Tamil number */
/* Method 2: a mask test such as (c & 0xE18F) == c would also accept column E10
 * (E100-E10F), so compare the upper 12 bits with E18 instead.
 * If code positions E18D-E18F are not used for any other purpose: */
(c & 0xFFF0) == 0xE180                                    /* true => Tamil number */
/* If E18D-E18F are used for another purpose, use Method 1 or: */
((c & 0xFFF0) == 0xE180) && ((c & 0x000F) <= 12)          /* true => Tamil number */
```
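The classification tests can be combined into a single Python function. The function and its labels are illustrative. It uses only the range tests together with the row digit x, because a bare mask test such as (c & 0xE18F) == c would also accept the symbols in column E10 (for example ௳ at E100). Characters the tests do not cover, such as ஃ (E20D), fall through to "other".

```python
def classify(c):
    """Classify a TACE16 code point using range tests and the row digit."""
    x = c & 0x000F  # row digit: 0 for a pure consonant, 1-12 for the vowel's number
    if 0xE201 <= c <= 0xE20C:
        return "vowel"
    if 0xE210 <= c <= 0xE38C and x == 0:
        return "consonant"
    if 0xE210 <= c <= 0xE38C and 1 <= x <= 12:
        return "vowel-consonant"
    if 0xE180 <= c <= 0xE18C:
        return "number"
    return "other"


# இ, க், கி, ௯, ௳, ஃ
for cp in (0xE203, 0xE210, 0xE213, 0xE189, 0xE100, 0xE20D):
    print(hex(cp), classify(cp))
```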
- To convert numbers to Tamil numbers and vice versa.
```c
/* Direct digit-to-digit conversion is enough to convert a number to the new
 * format of Tamil number and vice versa. */

/* Number to Tamil number: n = a single digit (0-9) */
n + 0xE180     /* Method 1 => Tamil number */
n | 0xE180     /* Method 2 => Tamil number */

/* Tamil number to number: c = a single Tamil digit character (௦-௯) */
c & 0x000F     /* => number */
```
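Digit-by-digit conversion of whole numbers, sketched in Python under the assumption, taken from the chart, that ௦ to ௯ occupy E180 to E189. The signs ௰, ௱ and ௲ (E18A to E18C) are not decimal digits, so the decoder rejects them. The function names are illustrative.

```python
def to_tamil_digits(n):
    """Convert a non-negative integer to TACE16 Tamil digits, one digit at a time."""
    return "".join(chr(0xE180 | int(d)) for d in str(n))


def from_tamil_digits(s):
    """Convert a string of TACE16 Tamil digits (E180-E189) back to an int."""
    digits = []
    for ch in s:
        if not 0xE180 <= ord(ch) <= 0xE189:
            raise ValueError(f"not a TACE16 Tamil digit: U+{ord(ch):04X}")
        digits.append(str(ord(ch) & 0x000F))  # the row digit is the digit's value
    return int("".join(digits))


encoded = to_tamil_digits(2024)
print([hex(ord(ch)) for ch in encoded])  # ['0xe182', '0xe180', '0xe182', '0xe184']
print(from_tamil_digits(encoded))        # 2024
```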
Alternative claims
Open-Tamil
The Open-Tamil project[6] provides many common operations on Unicode Tamil text, such as extracting letters from a UTF-8 encoded string, sorting and searching. Although the project claims Level-1 compliance for Tamil text processing without using TACE16, it is built on the extra programming logic that the present Unicode standard for Tamil requires. For example, the following script splits a sentence into letters:
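```python
#!/usr/bin/env python3
# Requires the Open-Tamil package, which provides the tamil.utf8 module.
import tamil.utf8 as utf8

letters = utf8.get_letters("கூவிளம் என்பது என்ன சீர்")

# Write each letter followed by a space to the file 'singl',
# and also print each letter on its own line.
with open('singl', 'w', encoding='utf-8') as ff:
    for letter in letters:
        ff.write(letter)
        print(letter)
        ff.write(' ')
```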
The extracted letters are: கூ வி ள ம் எ ன் ப து எ ன் ன சீ ர்
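For comparison, the core of this letter extraction can be approximated with the standard library alone. The function below is a simplified stand-in, not Open-Tamil's implementation. It attaches every combining mark (Unicode general category M*, which covers the Tamil vowel signs and the pulli) to the preceding character, and it does not handle conjuncts such as க்ஷ.

```python
import unicodedata


def split_letters(text):
    """Split Unicode Tamil text into letters by attaching combining marks to their base."""
    letters = []
    for ch in text:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch  # vowel sign or pulli joins the preceding letter
        else:
            letters.append(ch)
    return letters


letters = split_letters("கூவிளம் என்பது என்ன சீர்")
print(" ".join(letter for letter in letters if not letter.isspace()))
# கூ வி ள ம் எ ன் ப து எ ன் ன சீ ர்
```

This grouping step is the kind of extra logic that Unicode Tamil needs and that a one-code-point-per-letter encoding such as TACE16 avoids.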
See also
- TSCII (Tamil Script Code for Information Interchange)
- AnyTaFont2UTF8 – an open-source project covering Tamil encodings and font mappings.
References
- [1] Report on the final recommendations of the task force on TACE16.
- [2] Tamil Nadu Government's tender document for development of Tamil fonts and Tamil keyboard driver for 16-bit encodings (Unicode and TACE16).
- [3] Tamil Nadu Government's Order (G.O.), Keyboard Drivers and Fonts.
- [4] "தமிழ் எழுத்துருக்கள் | தமிழ் இணையக் கல்விக்கழகம் Tamil Virtual Academy" [Tamil fonts | Tamil Virtual Academy].
- [5] https://www.unicode.org/L2/L2012/12033-tamil-presentation.pdf
- [6] Open-Tamil project, https://pypi.org/project/Open-Tamil/