Telugu As A Computational Language-Unicode – Telugu

Unicode – Telugu

-Dr Geeta Madhavi Kala

In the previous chapter we talked about the preliminary stage of Telugu typing! Before learning about keyboards, let’s look at Unicode.

Though Telugu script could be typed on a computer as a Typewriter, if we think of what is Unicode and why is it required, for the files in one place to be read elsewhere and to convert a typed file from Telugu script into PDF, required readability support.

In 1991, Adobe introduced the PDF to the world of converting files from paper to digital. In PDF, every little detail on the paper can be turned as an image. However, in the early days, typed Telugu script in a place, was neither possible to print from anywhere else nor open the file. All we could see was gibberish. The reason for this was that the standardized code for each character in those days was not a code that works everywhere, so as to make it possible to pass data from one computer to another.

Here is an example of how non-Unicode characters look like:

“At the very beginning Telugu websites used page maker DTP software to save a page as an image or PDF in order to publish on a web page. Anu fonts and SriLipi fonts, which were popular in the DTP field, could serve the purposes. The first problem with this approach was the greatly increased file sizes resulted greatly delayed download time for the pages. And changes to what was published as an image were not possible to add. Moreover in the DTP software, the complex keyboard, made it harder to type.

The public sector company, C-Doc, has developed a package called i-Leap for fonts that work in all Indian languages. But unfortunately the expensive i-Leap software never became popular.

Later on Telugu newspapers developed dynamic fonts by themselves. While this may have solved some of the problems, since not everyone had been able to use the dynamic fonts, and the tough keyboard resulted the digital technology development limited the Internet Magazines.

Unicode technology is popular now. The great advantage of this is that you can type it in any language, send it to other computers, and create web pages. The great movement in this field has begun with Microsoft Windows 2000 fully supporting Unicode.” (Konatam Dileep, Telangana Magazine)

In short, Unicode is the Standardized code that works everywhere. Unicode is called as “Sarva Sanketa Paddhati” or “Ekarupa Sanketa Paddhati” in Telugu.

“Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before the Unicode standard was developed, there were many different systems, called character encodings, for assigning these numbers. These earlier character encodings were limited and did not cover characters for all the world’s languages. Even for a single language like English, no single encoding covered all the letters, punctuation, and technical symbols in common use. Pictographic languages were a challenge to support with these earlier encoding standards.

Early character encodings also conflicted with one another. That is, two encodings could use the same number for two different characters, or use different numbers for the same character. Any given computer might have to support many different encodings. However, when data is passed between computers and different encodings it increased the risk of data corruption or errors. Character encodings existed for a handful of “large” languages. But many languages lacked character support altogether.” (unicode.org)

Unicode is accepted worldwide as the standard for language communication. Government of India collaborated with Unicode Consortium to make required changes for ISII-88 for Indian languages. Central Information technology has become a lifetime member in Unicode Consortium. (Vikaspedia)

“The Unicode standard was developed to address the issue of language encoding system. The standard was created on an encoding foundation large enough to support the writing systems used by all the world’s languages. Over the years the Unicode standard encoding has been steadily expanded and now includes many languages. Beyond simply providing a standardized system of character codes, the Unicode Consortium has expanded the scope of its efforts to include standard “locale” data and code libraries that assist programmers to develop.

The variety of languages found on the Web today is thanks to the character support provided by Unicode, which enables computers to support virtually every language in use in the world today, and for users and programmers to develop content in their own native language. Unicode standard provides a unique code for every character, in every language, in every program, on every platform. Any one can become a member in Unicode Consortium with a membership.” (unicode.org)

“Unicode’s standard for converting letters, symbols and numbers into characters is universally accepted. The Unicode standard is used to process text. Unicode standards provide the ability to encode the letters, symbols and digits of all written languages in the world. Unicode standards are very useful for linguists, researchers, scientists and technicians, not just those who use multilingual texts.

Unicode uses 16 sections of code creation process. Unicode provides a node point for symbols and digits for over 65,000 characters. Unicode Standard and ISO-10646 together offer an additional system called UTF-16. It provides character encoding for nearly a million characters, symbols, and digits.” (Vikaspedia)

“Character encoding” refers to specifications that can be converted to bytes from characters.

In a sense, Unicode is one of the greatest milestones in the computer history of languages.

“Very few first generation Telugu sites used Telugu script. There was a lack of standard way to display Telugu script on web pages, and the sites that display Telugu script in the image form were soon shut down with the difficulty of navigation. Unicode has been developed since 1991, but the Unicode standard was not available for Telugu until 2003 due to the lack of a supported operating system. But in 2003, the ability to show the Telugu script on a computer screen and the core system to type and write was incorporated by Microsoft XP operating system. Unicode has opened new gateways for writing emails in Telugu, building websites using Telugu script, Telugu forums, and searching Telugu dictionaries on the Internet. In addition to Unicode technology, Web 2.0 interactive features are becoming more accessible, leading to new projects such as Telugu blogs, new generation Telugu magazines and Telugu Wikipedia.” (Suresh Kolichala, eemata)

“The use of Telugu language in computers began in 1993 due to the inevitable technological advances in the printing industry. Desktop publishing has grown rapidly with the advent of DTP applications, the creation of Telugu fonts for use in them, and the advent of sophisticated equipment such as laser printers for printing pages. When the Internet debuted, there were efforts to introduce regional languages on the Internet. However, inserting fonts into web pages is a tedious affair, and even the reader needs technical knowledge. The introduction of Unicode fonts (Eka Sanketa Khatulu (or) Viswa rupa Khatulu) has greatly altered the use of the Internet language. All the world languages with written system has been able to preserve their existence in the Internet provided by Unicode. Now we have developed better unicode fonts in Telugu language. Beautiful characters are designed in a variety of sizes, many of which are freely available. The following Unicode block is given to Telugu language characters: 0C00-0C7F (3072-3199). ”(unicode.org)

Unicode specifies the code associated with the script, the Font is to make the script appear. Let’s learn about fonts in the next chapter.

*****

Please follow and like us:

Telugu As A Computational Language-Unicode – Telugu

Unicode – Telugu

-Dr Geeta Madhavi Kala

Leave a Reply Cancel reply

Menu