What are the two symbols used in the Google Translate icon?
I've got the "A"... said the ignorant American.
The other one is a Chinese character, 文.
In Chinese, 文 means "text" or "article", and it is pronounced roughly like the English word "when".
I'm not really sure if this is the right site for asking this, but I've read so many papers about the handwriting recognition problem applied to different language scripts or languages.
However, I still don't have a clear understanding of how diacritics (á, à, ä or ñ, ć, č) are recognized. Are they treated as entirely new characters?
Or are the diacritics separated from the base letters so that we can do character+diacritic combinations?
Thanks in advance!
I have two problems. First, translating the English text "CORN STARCH PROCESSING LINE" into Russian with the neural machine translation model gives "ЛИНИЯ ОБРАБОТКИ КУХНЯ", which differs considerably from the original meaning. Second, translating the English text "SULLAIR COALESCING FILTER 02250153-324" into Arabic with the neural model returns "SULLAIR COALESCING FILTER 02250153-324" unchanged, not Arabic. How can I solve this problem?
Regarding the Russian translation, the Cloud Translation API is returning the most accurate result it currently can. These results are constantly being improved and updated.
For the Arabic translation, there seems to be an issue with the - symbol in the number: if you remove it or use any other symbol, the words are translated to Arabic as expected.
I have created an issue in the issue tracker for that; you can follow this link to get updates on the fix. Keep in mind that there is no ETA for the fix, so as a workaround for now, just replace the - symbol with the _ symbol and the words will be translated to Arabic.
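A minimal sketch of that workaround in Python. The `translate` argument is a hypothetical callable standing in for whatever client call you already use; no Cloud Translation function is invoked here. Restricting the swap to hyphens between digits is my own assumption, so that ordinary hyphenated words stay untouched.

```python
import re
from typing import Callable

# Hyphen or underscore sitting directly between two digits, e.g. "02250153-324".
_HYPHEN_BETWEEN_DIGITS = re.compile(r"(?<=\d)-(?=\d)")
_UNDERSCORE_BETWEEN_DIGITS = re.compile(r"(?<=\d)_(?=\d)")


def protect_hyphens(text: str) -> str:
    """Swap '-' for '_' inside numbers before sending the text off."""
    return _HYPHEN_BETWEEN_DIGITS.sub("_", text)


def restore_hyphens(text: str) -> str:
    """Swap '_' back to '-' inside numbers after translation.

    Caveat: an underscore that was already between digits in the
    original text would also be turned into a hyphen.
    """
    return _UNDERSCORE_BETWEEN_DIGITS.sub("-", text)


def translate_with_workaround(text: str, translate: Callable[[str], str]) -> str:
    """Wrap a (hypothetical) translate callable with the hyphen workaround."""
    return restore_hyphens(translate(protect_hyphens(text)))
```

For example, `protect_hyphens("SULLAIR COALESCING FILTER 02250153-324")` yields the same string with `02250153_324`, which is what the workaround suggests sending to the API.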
I've been playing around with Google's OCR recently using the default tutorial and was trying to parse numbers. I've seen previous issues dealing with numbers on license plates, but was wondering if there was a solution when special characters affect the results of OCR. Most notably, including the '#' character with a number, such as #1, #2, etc., as shown below, results in the output ##Z#T# and occasionally even gives me Chinese characters, even after I set the language settings to English.
Numbers with pound sign
For a similar comparison, the image below is easily read by the OCR:
Numbers without pound sign
Is there a setting I'm missing that can improve the results, or is this just a limitation of the model?
I'm supposed to make some modifications to a PHP web site which uses a font with Arabic-style numerals.
I've been asked to convert the numerals from the Arabic style to the English style using the same font. Is that achievable?
Arabic (red) and English (green) numbering:
In principle, it is possible to create a font that has alternate glyphs for Arabic digits, selectable with OpenType font features and looking like common (European) digits. However, I do not know any such font, and such an approach would be odd on several accounts. The Arabic digits have been encoded as separate characters, and treating the difference between them and common digits as merely a glyph difference would deviate from normal reasonable practices.
Thus, the change, if desired, should be made at the character level. The details depend on the context, but the principle is simple: common digits are U+0030...U+0039 and Arabic digits are U+0660...U+0669, both in numeric order, so at the character code level it is simply a matter of adding or subtracting a constant.
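The constant-offset idea can be sketched as follows. This is an illustration in Python rather than the PHP of the site in question, and the function names are my own; the same arithmetic carries over to any language that can work with code points. It only handles the Arabic-Indic digits U+0660 to U+0669 named above, not the Extended Arabic-Indic digits (U+06F0 to U+06F9) used for Persian and Urdu.

```python
ASCII_ZERO = 0x0030          # '0'
ARABIC_INDIC_ZERO = 0x0660   # ARABIC-INDIC DIGIT ZERO
OFFSET = ARABIC_INDIC_ZERO - ASCII_ZERO


def arabic_to_european_digits(text: str) -> str:
    """Map U+0660..U+0669 to U+0030..U+0039; leave everything else alone."""
    return "".join(
        chr(ord(ch) - OFFSET) if 0x0660 <= ord(ch) <= 0x0669 else ch
        for ch in text
    )


def european_to_arabic_digits(text: str) -> str:
    """Map U+0030..U+0039 to U+0660..U+0669; leave everything else alone."""
    return "".join(
        chr(ord(ch) + OFFSET) if 0x0030 <= ord(ch) <= 0x0039 else ch
        for ch in text
    )
```

Because both digit ranges are contiguous and in numeric order, no lookup table is needed; a single addition or subtraction per character is enough.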
I need to convert Chinese characters into pinyin and need an official document for that conversion.
There are some libraries around, as mentioned in previous posts such as "Convert chinese characters to hanyu pinyin".
However, I need an "official standard" more than an "available library". Where could I find such a document? Is there any standard, document, or book released by the Chinese government specifying how Chinese characters should be pronounced or transcribed in pinyin?
Appreciate your kind help.
Taiwan's Ministry of Education has a site listing all the variants of Chinese characters: http://dict.variants.moe.edu.tw/eng.htm
It also specifies the pronunciation of the characters. However, the pronunciation is given in Zhuyin (popular in Taiwan) and not Hanyu Pinyin (popular in Mainland China).
You could use the list on Wikipedia to map Zhuyin to Hanyu Pinyin http://zh.wikipedia.org/wiki/%E4%B8%AD%E6%96%87%E6%8B%BC%E9%9F%B3%E5%B0%8D%E7%85%A7%E8%A1%A8
For example, the character 井 (http://dict.variants.moe.edu.tw/yitia/fra/fra00052.htm) has the Zhuyin ㄐㄧㄥˇ. You look up the syllable without its tone mark, ㄐㄧㄥ = jing, then combine it with the tone (ˇ is the third tone) and you get jǐng.
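The lookup-then-add-tone procedure can be sketched like this. The one-entry table is only an illustrative excerpt (a real one would come from a full Zhuyin/Pinyin correspondence chart), all names are hypothetical, and the tone-placement rule used is the common rule of thumb: mark a or e if present, the o in "ou", otherwise the last vowel. The ASCII "|" sometimes typed in place of ㄧ is normalized first.

```python
import unicodedata

# Tiny illustrative excerpt; a real table covers every syllable.
ZHUYIN_TO_PINYIN = {"ㄐㄧㄥ": "jing"}

# Zhuyin tone marks; a syllable with no mark is first tone.
ZHUYIN_TONE = {"ˊ": 2, "ˇ": 3, "ˋ": 4}

# Combining diacritics for pinyin tones 1-4 (macron, acute, caron, grave).
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def add_tone_mark(syllable: str, tone: int) -> str:
    """Place the tone diacritic on the appropriate vowel of a pinyin syllable."""
    if tone not in COMBINING:
        return syllable  # neutral tone: no mark
    if "a" in syllable:
        idx = syllable.index("a")
    elif "e" in syllable:
        idx = syllable.index("e")
    elif "ou" in syllable:
        idx = syllable.index("o")
    else:
        vowels = [i for i, ch in enumerate(syllable) if ch in "aeiouü"]
        if not vowels:
            return syllable
        idx = vowels[-1]
    marked = syllable[: idx + 1] + COMBINING[tone] + syllable[idx + 1 :]
    # NFC composes e.g. "i" + combining caron into the single character "ǐ".
    return unicodedata.normalize("NFC", marked)


def zhuyin_to_pinyin(zhuyin: str) -> str:
    """Convert one Zhuyin syllable (with optional trailing tone mark) to pinyin."""
    zhuyin = zhuyin.replace("|", "ㄧ")
    tone = 1
    if zhuyin and zhuyin[-1] in ZHUYIN_TONE:
        tone = ZHUYIN_TONE[zhuyin[-1]]
        zhuyin = zhuyin[:-1]
    return add_tone_mark(ZHUYIN_TO_PINYIN[zhuyin], tone)
```

With this excerpt, `zhuyin_to_pinyin("ㄐㄧㄥˇ")` reproduces the jǐng of the example; any syllable missing from the table raises `KeyError`.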
I don't know of any official standard in Mainland China or in any other Chinese-speaking country.
There is no unique way to convert a Chinese character to pinyin, since a character does not necessarily have a unique pronunciation. Pinyin is a system for transcribing Chinese characters into Latin script, from which one can derive how to pronounce the character, and the pronunciation depends on the context in which the character is used.
Some examples:
The verb 数 meaning "to count" has pinyin shǔ, while the noun 数 meaning "number" has pinyin shù.
长 meaning "long" is written as cháng; meaning "chief", however, it is written as zhǎng.
The pinyin for 好 with meaning "good" is hǎo while the 好 in 爱好 has pinyin hào.
行 meaning "to walk" has pinyin xíng, while the measure word for a row of something has pinyin háng.
Chinese is full of such examples. Sometimes only the tones differ (see the 好 example) and sometimes the pronunciation is completely different (the 行 example).
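One common way to cope with such context-dependent readings is to look up whole words before falling back to a per-character default. The sketch below assumes that approach; the two small dictionaries are illustrative excerpts, not a real lexicon, and all names are hypothetical.

```python
# Default reading used when a character appears outside any known word.
DEFAULT_READING = {"好": "hǎo", "行": "xíng", "长": "cháng", "爱": "ài", "银": "yín"}

# Word-level overrides for characters that change reading in context.
WORD_READING = {
    "爱好": ["ài", "hào"],
    "银行": ["yín", "háng"],
}

_MAX_WORD_LEN = max(len(word) for word in WORD_READING)


def to_pinyin(text: str) -> list[str]:
    """Greedy longest-match: prefer word readings, else per-character defaults."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate word first, down to two characters.
        for n in range(min(_MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i : i + n]
            if word in WORD_READING:
                result.extend(WORD_READING[word])
                i += n
                break
        else:
            ch = text[i]
            # Unknown characters are passed through unchanged.
            result.append(DEFAULT_READING.get(ch, ch))
            i += 1
    return result
```

Here `to_pinyin("好")` gives the default hǎo, while `to_pinyin("爱好")` picks up hào from the word entry. Cases such as 数, where the reading depends on part of speech rather than on the surrounding word, need more than a dictionary lookup.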
Besides characters having multiple pronunciations (depending on the context), tones also change when characters are used together with other characters. For example, the pinyin for 不 is normally bù, but it becomes bú when the character following 不 has a fourth tone.
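The 不 rule can be sketched as a small post-processing step. This example assumes syllables are written with trailing tone numbers (for instance "bu4") rather than diacritics, purely to keep the tone easy to inspect; the function name is hypothetical and only this one sandhi rule is covered.

```python
def apply_bu_sandhi(chars: str, syllables: list[str]) -> list[str]:
    """Change 不 from fourth tone to second tone before a fourth-tone syllable.

    `chars` is the Chinese text and `syllables` the matching list of
    numbered-pinyin syllables, one per character.
    """
    if len(chars) != len(syllables):
        raise ValueError("need exactly one syllable per character")
    result = list(syllables)
    for i in range(len(chars) - 1):
        followed_by_fourth_tone = syllables[i + 1].endswith("4")
        if chars[i] == "不" and syllables[i] == "bu4" and followed_by_fourth_tone:
            result[i] = "bu2"
    return result
```

So 不是 comes out as `["bu2", "shi4"]`, while 不好 keeps `["bu4", "hao3"]` because 好 is third tone.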
Answering my own question just to add my 2 cents, in case others bump into this topic.
In mainland China, there is a dictionary, 新华字典 (http://en.wikipedia.org/wiki/Xinhua_Zidian), that is quite authoritative. Although it is not endorsed by the Chinese government, more than 400 million copies have been published, and it is widely used as a reference book by primary and middle school students and teachers.
Unfortunately, there is no online website for this dictionary, though some scanned versions are available.
For mainland China, pinyin orthography follows the 《汉语拼音正词法基本规则》 (Chinese Pinyin Orthography Basic Rules) published in 1996. This is the national standard, which has to be used in all official publications (although you will see incorrect pinyin everywhere in China). You can find the full text (including an English translation) here: http://www.pinyin.info/rules/pinyinrules_simp.html
For the correct transcription of characters, I agree that Xinhua Zidian is a quasi-authority. In fact, you can find some online versions (like http://xh.5156edu.com/), but I don't know if they are reliable.