wrong language detection with google translate (multiple languages) - google-translate

I am working on translating a paragraph that contains more than one language.
I have noticed that with the Google Translate API, if we have, say:
hello bye hola
it detects the language as English, and if it is:
hello hola adios
then it detects Spanish.
So basically, whichever language has the highest word count in the sentence or paragraph is the one that gets detected. The funny thing is that the Google Translate website actually has this feature.
Is there any way to fix this so that it detects only the foreign language and not English?

No, you can't do that with the Google Translate API, because its public API simply exposes no mechanism for it.
If you use an alternate language detection library, you can define a threshold under which to remove the content of the less-represented language. This would allow you to remove the English content if it makes up less than, say, 30% of the text in your overall sample.
For example, see the RemoveMinorityScriptsTextFilterTest class in the optimaize/language-detector project.
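As a rough illustration of the threshold idea (this is not the optimaize/language-detector API, which is a Java library), here is a minimal Python sketch. It assumes you already have some way to label each word with a language; TOY_LEXICON and drop_minority_languages are hypothetical names made up for this example, and the tiny lexicon only stands in for a real detector.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real per-word language detector.
TOY_LEXICON = {
    "hello": "en", "bye": "en",
    "hola": "es", "adios": "es", "gracias": "es",
}

def drop_minority_languages(text, threshold=0.3, lexicon=TOY_LEXICON):
    """Remove words whose language makes up less than `threshold` of the labeled words."""
    words = text.split()
    labels = [lexicon.get(word.lower()) for word in words]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return text  # nothing could be labeled, so leave the text alone
    keep = {lang for lang, n in counts.items() if n / total >= threshold}
    # Words with no label are kept, since we cannot tell which language they belong to.
    return " ".join(w for w, label in zip(words, labels) if label is None or label in keep)

filtered = drop_minority_languages("hola adios gracias hello")
```

Here English is 1 of 4 labeled words (25%), which is under the 30% threshold, so only the Spanish words remain and the filtered text can then be sent for detection or translation.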

Related

Google Translate: UI translates but API fails

I am struggling with the following:
I have a lot of messily written comments. The comments come from India, and there is a mix of languages (but not within a single comment). In addition, some samples are transliterated, and I was wondering whether I can detect the language in this kind of mess.
With the Google Translate UI, I found that it copes with this.
As may be seen, the Google Translate UI detects the language (probably correctly), suggests how the text should be written in the detected language, and finally translates it.
In contrast, the Google Translate API does not return such a translation, although it still detects the language. Here is the response for the same input string:
translations {
translated_text: "Theek kiya"
detected_language_code: "hi"
}
So I am wondering whether the UI does additional processing before it runs the translation, such as spelling correction.
I do not see what I missed with the API to make it actually translate the text.
Has anyone faced the same problem who can help me?

Translate API - different result from the web service

When using the Translation API, I get a different (and worse) translation than when I use translate.google.com.
I am working on a project for a client, and the client was dissatisfied with the translation and noticed the difference.
Do these two services use different engines? I read that the API now uses the NMT model, and that translate.google.com already uses the same engine.
Both are set to translate from Norwegian to English.
Is there any more information that can clear this up?
Thanks!
Differences between the results of translate.google.com and Translation API calls are considered expected behavior; they can arise from maintenance tasks and from the logic used by internal processes. However, the engines used by each service seem to be private information.
Based on this, it is normal to see some variance when using the API. As a workaround, you can use the model parameter to specify which of the available models to use; see the official "Specifying a model" documentation for details on this option.
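To make the model parameter concrete, here is a minimal sketch that only builds the JSON body of a v2 translate request and sends nothing. The field name "model" and the value "nmt" are my assumptions from memory of the v2 API, and "no" as the Norwegian language code is also an assumption; check all of them against the "Specifying a model" documentation. build_translate_body is a hypothetical helper name.

```python
import json

def build_translate_body(texts, target, source=None, model=None):
    """Build the JSON body for a translate request; no network call is made."""
    body = {"q": list(texts), "target": target, "format": "text"}
    if source is not None:
        body["source"] = source
    if model is not None:
        # Assumed parameter: selects which translation model the API should use.
        body["model"] = model
    return json.dumps(body, ensure_ascii=False)

# Norwegian to English, explicitly asking for the (assumed) "nmt" model.
payload = build_translate_body(["God morgen"], target="en", source="no", model="nmt")
```

Pinning the model at least removes one variable when comparing API output with the website, although it does not guarantee identical results.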
It's almost 3 years later and the problem still remains!
I was trying to translate a dataset with the Google Translate API, but it failed to translate some texts into the target language (in my case, Persian/Farsi). So I decided to check them for a pattern and perhaps translate them using the web version of Google Translate.
As I was doing so, I found that the web version could translate some of those untranslated texts, but not all. Looking for a reason for this behaviour, I found that most of them were names, not sentences. Names can easily be written in the target language's characters as the translation, so why doesn't the API transliterate those names while the web version does? This screenshot perhaps explains it:
[screenshot: verified translation]
As can be seen, some translations have a badge indicating that the translation has been verified, while others don't.
So to recap, my guess is that the API may be set to use only verified translations, whereas the web version also allows unverified translations, since you can edit or report them.
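For anyone who wants to collect the texts the API left untranslated so they can be retried elsewhere, here is a minimal sketch. It is a heuristic of my own, not something the API provides: an output is flagged if it is identical to the input or contains no characters from the basic Arabic Unicode block (U+0600 to U+06FF), which is what Persian is written in. find_untranslated and contains_arabic_script are hypothetical helper names.

```python
def contains_arabic_script(text):
    """True if any character falls in the basic Arabic Unicode block."""
    return any("\u0600" <= ch <= "\u06ff" for ch in text)

def find_untranslated(pairs):
    """Return source texts whose 'translation' looks untouched.

    `pairs` is an iterable of (source_text, translated_text) tuples.
    """
    flagged = []
    for source, translated in pairs:
        unchanged = source.strip().casefold() == translated.strip().casefold()
        if unchanged or not contains_arabic_script(translated):
            flagged.append(source)
    return flagged

results = [("John Smith", "John Smith"), ("Hello", "سلام")]
retry_candidates = find_untranslated(results)
```

Note that inputs which legitimately stay in Latin characters or digits (such as product codes or numbers) will also be flagged, so the list is a starting point for manual review rather than a definitive answer.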

Alexa skill responses in different languages

Is it possible to respond in different languages within a skill, and if so, how? For example, I'm developing a skill for the German skill store which reads various texts from the internet. Those can be in any language, and I can determine the language when I'm about to emit the response.
From what I can see, the SSML subset Alexa implements does not let you specify the language in which the response is given. But Alexa's own Kindle skill is able to read me eBooks in either German or English (perhaps Amazon's own skills are special).
As said in other answers, the right way is to use the <lang> tag in SSML. However, since the English voice does not speak German, it sounds quite weird. The right solution is to change the voice using the <voice> tag.
Here is an example in German:
<speak>
<voice name="Hans"><lang xml:lang="de-DE">Ich bin ein Berliner</lang></voice>.
I am a Berliner.
</speak>
It is described in this doc https://developer.amazon.com/fr/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html#examplefrench-content-in-an-english-skill
It looks like this is not possible at the moment: https://forums.developer.amazon.com/questions/55086/specify-output-language-per-intent.html
You can use the <lang> tag in SSML for this.
Here is an example in German.
<speak>
<lang xml:lang="de-DE">Mein Luftkissenfahrzeug ist voller Aale</lang>.
Hello in the default language.
</speak>
Here is a list of supported Amazon Polly languages for Alexa.
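If the language is only known at response time, as in the question, the SSML can be assembled in code. Here is a minimal Python sketch that produces the same shape as the examples above; lang_span and speak are hypothetical helper names, and how the string is handed to the Alexa response depends on the SDK you use, which is not shown here.

```python
from xml.sax.saxutils import escape, quoteattr

def lang_span(text, locale, voice=None):
    """Wrap text in <lang>, and optionally in <voice>, escaping XML special characters."""
    span = f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"
    if voice is not None:
        span = f"<voice name={quoteattr(voice)}>{span}</voice>"
    return span

def speak(*parts):
    """Join already-escaped SSML fragments into one <speak> document."""
    return "<speak>" + " ".join(parts) + "</speak>"

german = lang_span("Ich bin ein Berliner", "de-DE", voice="Hans")
doc = speak(german + ".", escape("I am a Berliner."))
```

Escaping matters because texts fetched from the internet may contain characters such as & or <, which would otherwise make the SSML invalid.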

Google Translate API Pricing and Language auto-detect Efficiency

I have the following three questions.
I want to use Google's API to translate text. I know that Google charges separately for translation and detection. Google Translate also supports two ways to translate:
i) By specifying both source and target, as in
https://www.googleapis.com/language/translate/v2?key=INSERT-YOUR-KEY&source=en&target=de&q=Hello%20world&q=My%20name%20is%20Jeff
ii) By specifying just the target, where the source is auto-detected,
like this https://www.googleapis.com/language/translate/v2?key=INSERT-YOUR-KEY&target=de&q=Hello%20world
My question is, if I call the API as in the second example, will I be charged for both detection and translation or just translation?
Is it more efficient to specify both source and target than to specify just the target, or are there any downsides to using the second way above?
How many words should be sent to Google Translate API to detect a language reliably?
Thanks
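For reference, the two request forms above can be built programmatically. This is a minimal sketch that only constructs the URLs and sends nothing; build_translate_url is a hypothetical helper name, and the endpoint and parameter names are taken from the URLs in the question.

```python
from urllib.parse import quote, urlencode

ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def build_translate_url(api_key, texts, target, source=None):
    """Build a translate URL; omit `source` to let the API auto-detect it."""
    params = [("key", api_key)]
    if source is not None:
        params.append(("source", source))
    params.append(("target", target))
    # One q parameter per text, as in the first example URL.
    params.extend(("q", text) for text in texts)
    # quote_via=quote encodes spaces as %20 rather than +.
    return ENDPOINT + "?" + urlencode(params, quote_via=quote)

with_source = build_translate_url(
    "INSERT-YOUR-KEY", ["Hello world", "My name is Jeff"], target="de", source="en"
)
auto_detect = build_translate_url("INSERT-YOUR-KEY", ["Hello world"], target="de")
```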
I use the second approach most of the time (without telling Google the source language), and they only charge for the translation, not for the detection.
However, be aware that if your source text is already in the target language, Google will attempt to translate it anyway. This sometimes leads to confusing results, or at least to an unnecessary translation, since you already had the text in the desired language.
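One way to avoid that unnecessary call is to check the language first and skip translation when it already matches the target. Here is a minimal sketch; translate_if_needed is a hypothetical helper, and detect and translate are caller-supplied functions (assumptions, not real API calls) so that any detector or client can be plugged in.

```python
def translate_if_needed(text, target, detect, translate):
    """Return text unchanged when it is already in the target language.

    detect(text) -> language code, translate(text, target) -> translated text.
    Both callables are supplied by the caller.
    """
    if detect(text) == target:
        return text
    return translate(text, target)

# Stand-in functions for demonstration only.
def fake_detect(text):
    return "de" if text == "Hallo Welt" else "en"

def fake_translate(text, target):
    return f"[{target}] {text}"

already_german = translate_if_needed("Hallo Welt", "de", fake_detect, fake_translate)
needs_translation = translate_if_needed("Hello world", "de", fake_detect, fake_translate)
```

If detection is billed separately, as the question states, a local language detection library may be the cheaper choice for the detect step.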

Are there any preset I18n word lists / resource files?

I'm creating a web application that uses I18n. As I don't want to translate very common basic strings like "forgot password?" on my own, I'm asking whether there are already any resource files or word lists containing these strings. One option is to download an existing framework and somehow extract these strings, but this might be a hassle.
In particular, I'm looking for translations related to user authentication, from English to Italian, French and German. The file format doesn't matter.
Professional translators use translation memory tools (TMX, Translation Memory eXchange, is the generic exchange format, I think) that do what you are talking about: they build up standard phrase lists in other languages, so when translating they can bring these phrases in to speed up the job and reduce the repetitive tedium. So these lists exist.
There is a free plugin for MS Word that does this and may come with lists (sorry, I cannot remember the name, although Rosetta rings a bell).
There is a FOSS TMX tool called Okapi at Sourceforge. It may come with the dictionaries, but if not, it is a starting point for your investigation.
You could also approach Proz, a site for translators, which might be able to point you in the right direction.
Take care with MT such as the Google API, as it can give some weird results, but you could use it to build your list and then double-check it. Remember that when you check a language, you need to do it with a native speaker who can pick up on the nuances and colloquialisms.
You can use the Google Translate API together with your own custom resource bundle.
