Friday, September 9, 2011

Mandarin Chinese: the language, transcription systems and character sets

Many dialects are spoken in China. Mandarin is a group of related Chinese dialects spoken across most of northern, central and western China. However, "Mandarin" as the world knows it refers to Standard Mandarin (or Modern Standard Chinese), which is based on the Mandarin dialect spoken in Beijing. Standard Mandarin is the official language of China, where it is known as Putonghua. It is also one of the six official languages of the United Nations and is used in many international organizations.

Phonological descriptions model the Mandarin syllable as an optional initial consonant followed by a vowel, which is in turn optionally followed by an alveolar or velar nasal ending. The remaining component of the syllable is the tone, which specifies the pitch pattern of the syllable. Technically, a syllable is described in terms of its initial, final and tone. Mandarin Chinese is a tonal language because tones, just like consonants and vowels, are used to distinguish words from one another.
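
The initial, final and tone structure can be illustrated with a small Python sketch. It assumes syllables are written in Hanyu Pinyin romanization with a trailing tone digit (for example "zhong1"), and the function name split_syllable is made up for this illustration. Treating a missing digit as the neutral tone 5, and leaving "y" and "w" inside the final, are simplifications of the sketch, not rules taken from the text.

```python
# Pinyin initials. The two-letter ones come first so that "zh", "ch" and "sh"
# are matched before the single letters "z", "c" and "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syllable):
    """Split a tone-numbered Pinyin syllable into (initial, final, tone)."""
    tone = 5  # assumption of this sketch: no digit means neutral tone
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # No initial consonant: the whole syllable is the final.
    return "", syllable, tone


print(split_syllable("zhong1"))  # initial "zh", final "ong", tone 1
print(split_syllable("an4"))     # empty initial, final "an", tone 4
```

The final "ong" ends in the velar nasal and "an" in the alveolar nasal, which are the two endings mentioned above.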

Chinese linguists have proposed various systems for transcribing Mandarin, but the most popular is Hanyu Pinyin. The Chinese government adopted Hanyu Pinyin as the official transcription system for the Chinese language in 1958. It is also used to enter Chinese characters on computer systems.
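
To give a rough idea of how Pinyin-based character entry can work, here is a toy Python sketch. The table CANDIDATES and the function lookup are hypothetical and hold only five entries for the syllable "ma"; a real input method uses a far larger dictionary and ranks its candidates, which this sketch does not attempt.

```python
# Toy candidate table: tone-numbered Pinyin -> simplified characters.
# Tone 5 stands for the neutral tone in this sketch.
CANDIDATES = {
    "ma1": ["妈"],  # mother
    "ma2": ["麻"],  # hemp
    "ma3": ["马"],  # horse
    "ma4": ["骂"],  # to scold
    "ma5": ["吗"],  # question particle
}


def lookup(pinyin):
    """Return candidate characters for a typed syllable.

    With a tone digit only that tone is returned; without one, the
    candidates of every tone are listed in tone order.
    """
    if pinyin[-1:].isdigit():
        return list(CANDIDATES.get(pinyin, []))
    result = []
    for key in sorted(CANDIDATES):
        if key[:-1] == pinyin:
            result.extend(CANDIDATES[key])
    return result


print(lookup("ma3"))  # only the third-tone candidate
print(lookup("ma"))   # candidates for all tones
```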

Currently, two sets of Chinese characters are in use: traditional Chinese characters and simplified Chinese characters. Traditional Chinese characters have been used since the fifth century, and this character set is still used in some overseas Chinese communities. Simplified Chinese characters originate in the formal simplification carried out during the 1950s and 1960s. Today the simplified character set is the writing system of China and is accepted by the United Nations. Computer systems use different codes for these two sets. The Guobiao code, abbreviated GB, is the national standard character encoding in China. The name refers to the GB 2312-80 set, in effect since 1981, and to the GB 18030-2000 set, issued in 2000. The GB 2312-80 code contains 6,763 Chinese characters.
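
The difference between the two GB sets can be checked with the codecs that ship with Python, which are named "gb2312" and "gb18030". The sketch below is only an illustration: the helper can_encode is a made-up name, and the two sample strings are the word for "Chinese language" in simplified and traditional characters.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉语"   # simplified characters
traditional = "漢語"  # the same word in traditional characters

# GB 2312 stores each Chinese character in two bytes.
print(len(simplified.encode("gb2312")))

# GB 2312 covers the simplified set only; GB 18030 covers both.
for codec in ("gb2312", "gb18030"):
    print(codec, can_encode(simplified, codec), can_encode(traditional, codec))
```

Text in traditional characters therefore has to be stored in a wider code such as GB 18030, or converted to simplified characters first.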

Mandarin Chinese is often described as monosyllabic, on the grounds that most of its words are one syllable long. This is true of classical Chinese, but no longer of modern Chinese: a large number of polysyllabic words are used in today's Chinese newspapers. A single syllable pronounced with different tones corresponds to different characters, and a polysyllabic word is written with two or more characters. Because Chinese text is written without spaces between words, additional effort is required to segment a sentence into words. Because of these characteristics, designing a Chinese language corpus requires additional care. Most recently developed Chinese language processing systems target Standard Mandarin; few of them handle other dialects such as Cantonese, Min Nan, Hakka or Wu.
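
One simple way to approach the segmentation problem is forward maximum matching: at each position, take the longest string that appears in a word list. The Python sketch below shows the idea with a tiny made-up lexicon. The names segment, LEXICON and max_len are assumptions of this illustration, and the method is a common baseline, not the approach of any particular system mentioned here.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching over an unspaced string."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon, for illustration only.
LEXICON = {"中文", "报纸", "使用", "多音节", "词"}

# "Chinese newspapers use polysyllabic words", written without spaces.
print(segment("中文报纸使用多音节词", LEXICON))
```

With this lexicon the sentence splits into five words. A greedy match like this can still choose the wrong boundary when two dictionary words overlap, which is one reason corpus design for Chinese needs the extra care described above.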
