I have text in five languages: English, German, Spanish, Italian, and French. I have to translate text from the other languages into English. The language auto-detect function detects the language, but it counts as one call to the translator. Since I'm using the free tier, this isn't feasible. Is there any workaround?
If you do not specify 'from' in your Translate request, Detect will be run automatically before the translation starts.
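For illustration, here is a minimal sketch of such a request in Python against the Translator Text API v3 REST endpoint; the key, region, and sample sentence are placeholders, not values from this question:

import requests

key = "YOUR_KEY"        # placeholder subscription key
region = "YOUR_REGION"  # placeholder resource region

url = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "to": "en"}   # no 'from': detection runs as part of the call
headers = {
    "Ocp-Apim-Subscription-Key": key,
    "Ocp-Apim-Subscription-Region": region,
    "Content-Type": "application/json",
}
body = [{"Text": "Guten Morgen, wie geht es dir?"}]

response = requests.post(url, params=params, headers=headers, json=body)
result = response.json()
# The response reports the detected source language alongside the translation.
print(result[0]["detectedLanguage"]["language"], result[0]["translations"][0]["text"])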
I am using the Microsoft Translation API to translate text from a source language to English, and it is working fine. But I also want to get the translation accuracy. Any help will be appreciated.
I have gone through the article below:
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/translator-info-overview
The Translation API should return a translation accuracy or score.
Microsoft Translator evaluates quality with the BLEU (bilingual evaluation understudy) metric and its own benchmarks (both automatic and human evaluations).
Depending on multiple variables, such as length and type of text translated, language pairs (source and target), industry lingo, or the domain in which Translator is used, results will vary greatly for any vendor offering a machine translation solution.
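BLEU is not something the API returns per request; it is computed offline against reference translations. If you have your own reference translations, you can estimate quality yourself. A minimal sketch using Python and NLTK, with a made-up reference and candidate sentence:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical human reference translation and machine-translated candidate.
reference = ["the", "meeting", "was", "postponed", "until", "next", "week"]
candidate = ["the", "meeting", "was", "delayed", "until", "next", "week"]

# BLEU measures n-gram overlap between the candidate and one or more references.
smoothie = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smoothie)
print(f"BLEU: {score:.3f}")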
I have recently started using Microsoft Cognitive Services for translations, and the Translator does support the "pt" language code. I ran a few tests with 'reception, train, bus, cream, brown' and got the Brazilian Portuguese translations. I would have expected to get the European Portuguese ones with this language code, and to need pt-br for the Brazilian version.
As Brazilian Portuguese is what I need most, this is not an issue, but I would like to make sure the behavior won't change further down the line, and that my test was indeed good enough.
Is there any way to confirm that the "pt" language code, used as a destination language, does and always will result in Brazilian Portuguese translations?
Thank you!
The following is a response from a Cognitive Services engineer:
Our translation engines are not locale-specific, so we have one for EN (US, AUS, UK, etc.), one for FR, one for ES, and one for PT.
That being said, most of our training data for PT is Brazilian Portuguese, so most translations will likely be PT-BR rather than PT-PT.
Finally, for speech translation (the API and the products it supports: Microsoft Translator live, Skype Translator), our ASR was trained only on BR speech data.
In German there is a formal ("Sie") and an informal ("Du") form of address. I would like to translate some software into informal German, notably WooCommerce using Transifex, but new languages can only be added using their locale code.
There only seems to be de_DE as a locale. What is the best way to differentiate between the two forms? Shouldn't there be a separate locale code just for the informal form, too?
Generally, Gettext uses @ to distinguish language variants, so in this case it could be de_DE@informal. This way the locale will be handled correctly by Gettext, with a fallback to de_DE (in contrast to the suggested de_DE-x-informal, which will not fall back this way, as Gettext would see DE-x-informal as the country code).
You can find more on locales naming at https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html
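To see how the @modifier behaves in practice, here is a minimal sketch using Python's built-in gettext module; the text domain, locale directory, and catalogue files are hypothetical:

import gettext

# Look for locale/de_DE@informal/LC_MESSAGES/myapp.mo first,
# then fall back to locale/de_DE/LC_MESSAGES/myapp.mo.
translation = gettext.translation(
    "myapp",                  # hypothetical text domain
    localedir="locale",       # hypothetical locale directory
    languages=["de_DE@informal", "de_DE"],
    fallback=True,            # use the untranslated strings if neither catalogue exists
)
_ = translation.gettext
print(_("Please enter your password"))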
Since you asked about WooCommerce, explaining the current state of how WordPress handles it is probably most relevant:
The best approach is to use locale variants as Michal wrote, but WordPress has its own unique twist on it and doesn't use the variant syntax. Instead, it adds a third component to the filename, but not -x-informal either: de_DE.mo is the informal variant in WordPress (and also the fallback, because it lacks any further specification), and de_DE_formal.mo contains the formal variant.
Language tags as defined in BCP 47 (currently RFC 5646) do not specify any codes for informal vs. formal variants of German. It would probably be unrealistic to try to have such codes registered, so you would be limited to Private Use subtags, e.g. de-x-formal vs. de-x-informal. Whether your software can handle these in any way is a different issue.
On the practical side, the choice of "Sie" vs. "Du" (or "du") is hardly a language variant issue. Standard German uses both pronouns to address a person in the singular, depending on the style of presentation and on the relationship with the addressed person. At most, we could say that the choice of "Sie" vs. "Du" in the context of addressing a user generically in instructions or a user interface is a language variant issue. But in practice, just make up your mind.
Is it possible to write a speech-recognition engine from scratch? I would be using some existing math libraries, though. The point is that there is literally no research in this field in my language (Georgian). Is that possible? How long would it take? I know that this also depends on skill, but still?
Also answered at: Speech to text conversion for non-english language
You do not need to write an engine from scratch; there are many engines already available, and you can just pick one, like CMUSphinx:
http://cmusphinx.sourceforge.net
If you are just interested in supporting Georgian, it is just a matter of training a Georgian model; you do not need to implement the engine itself. Speech recognition engines do not depend on the language.
The point is that there is literally no research in this field in my language (Georgian).
Luckily there is a lot in English.
How long would it take?
It takes about a month of work for a single person to create a database for a new language for CMUSphinx. For more details, see the documentation:
http://cmusphinx.sourceforge.net/wiki/tutorial (Tutorial)
http://cmusphinx.sourceforge.net/wiki/tutorialam (Acoustic model training tutorial chapter)
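Once the Georgian acoustic model, language model, and pronunciation dictionary are trained as described in those tutorials, the decoding side is short. A minimal sketch using the classic pocketsphinx Python Decoder API, with placeholder paths for the files the training produces:

from pocketsphinx import Decoder

# Placeholder paths to the Georgian model files produced by the training tutorial.
config = Decoder.default_config()
config.set_string("-hmm", "model/ka/acoustic")         # acoustic model directory
config.set_string("-lm", "model/ka/georgian.lm")       # language model
config.set_string("-dict", "model/ka/georgian.dict")   # pronunciation dictionary

decoder = Decoder(config)
decoder.start_utt()
with open("utterance.raw", "rb") as f:                 # 16 kHz, 16-bit mono raw audio
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

if decoder.hyp() is not None:
    print(decoder.hyp().hypstr)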
We have developed a medium-sized ASP.NET / SQL Server application that uses resource files to provide English and Spanish user interface variants. Unicode data types are used throughout the databases. Now we are being asked to add Mandarin to the mix. I have no experience with localising into Asian languages, so I really cannot imagine what kind of job this would be.
My questions are:
How complex would this job be, as compared to localising into another Western language such as French or German?
What additional issues, other than (obviously) translating strings in resource files, should I deal with for Mandarin localisation? Anything related to the different alphabet, perhaps?
Reports of previous experiences or pointers to best practices are most welcome. Thanks.
On the technical side of things, I don't believe it's significantly more difficult. Adding support for non-Western languages will expose encoding issues if you are not using Unicode throughout, but it's pretty much the norm to use UTF-8 encoding and Unicode SQL types (nvarchar instead of varchar) anyway.
I would say that the added complexity and uncertainty is more about the non-technical aspects. Most of us English speakers are able to make some sense of European languages when we see 1:1 translations and can notice a lot of problems. But Mandarin is utterly meaningless to most of us, so it's more important to get a native speaker to review or at least spot-check the app prior to release.
One thing to be mindful of is the issue of input methods: Chinese speakers use IMEs (input method editors) to input text, so avoid writing custom client-side input code such as capturing and processing keystrokes.
Another issue is the actual culture identifier to choose. There are several variants for Chinese: zh-Hans (Simplified, which replaces zh-CHS and is used in mainland China) and zh-Hant (Traditional, which replaces zh-CHT and is used in Taiwan). See the note on this MSDN page for more info. However, these are neutral culture identifiers (they are not country-specific): they can be used for localization, but not for things such as number and date formatting. Ideally, therefore, you should use a specific culture identifier such as zh-CN for China and zh-TW for Taiwan. Choosing a specific culture for a web application can be tricky, so this choice is usually based on your market expectations. More info on the different .NET culture identifiers is in this other post.
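The neutral-versus-specific distinction is not unique to .NET. As an illustration only (this is Python with the Babel library, not the CultureInfo API discussed above), number and currency formatting draws on the country-specific locale data:

from babel.numbers import format_currency, format_decimal

# Grouping, symbols, and patterns come from the country-specific locale,
# which is why a neutral culture is not enough for number formatting.
print(format_decimal(1234567.89, locale="zh_CN"))
print(format_currency(1234.5, "CNY", locale="zh_CN"))   # mainland China
print(format_currency(1234.5, "TWD", locale="zh_TW"))   # Taiwan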
Hope this helps!
As far as translating text in the user interface goes, the localization effort for Chinese is probably comparable to that of Western languages. Like English and Spanish, Chinese is read left to right, so you won't need to mirror the page layout as you would if you had to support Arabic or Hebrew. Here are a couple more points to consider:
Font size: Chinese characters are more intricate than Latin characters, so you may need to use a larger font size. English and Spanish are readable at 8pt; for Chinese, you'll want a minimum of 10pt.
Font style: In English, bold and italics are often used for emphasis. In Chinese, emphasis is usually achieved with a different typeface, font size, or color. Use bold with caution, and avoid italics.
However, if you're targeting an Asian market, more significant changes may be required. Here are a few examples:
Personal names: A typical Chinese name is 孫中山: the first character (孫) is the family name, and the second and third characters (中山) constitute the given name. This of course is the opposite of the common Western convention of "given name" + space + "family name". If you're storing and displaying names, you may want to use a single "Name" field instead of separate "First Name" and "Last Name" fields.
Colors: In the U.S., it's common to use green for "good" and red for "bad". However, in China and Taiwan, red is "good". For instance, compare the stock prices on Yahoo! versus Yahoo! Taiwan.
Lack of an alphabet: Chinese characters are not based on an alphabet. Thus, for example, it wouldn't make sense to offer the ability to filter a list by the first letter of each entry, as in a directory of names.
Regarding sort order, several different methods are used for Chinese: binary (i.e. Unicode code point order), radical plus stroke count, Pinyin (romanization), and Bopomofo (a syllabary).
On SQL Server, you can define the sort order using the COLLATE clause in the column definition or at statement level. It is also possible to set a default collation at the database level.
Run this statement to see all supported Chinese collations:
select * from fn_helpcollations()
where name like 'chinese%'
In .NET, there is also a list of Chinese cultures that can be used for sorting; see MSDN.
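Outside SQL Server and .NET, the same Chinese sort orders are exposed as ICU collation keywords. A minimal sketch with the PyICU bindings, using a made-up word list:

import icu

words = ["张伟", "王芳", "李娜", "刘洋"]

# ICU exposes the alternative Chinese orders as collation keywords on the locale.
pinyin = icu.Collator.createInstance(icu.Locale("zh@collation=pinyin"))
stroke = icu.Collator.createInstance(icu.Locale("zh@collation=stroke"))

print(sorted(words, key=pinyin.getSortKey))   # Pinyin (romanization) order
print(sorted(words, key=stroke.getSortKey))   # stroke-count order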