I am using the Translation API for language conversion, but I am not able to get the translation accuracy - azure-cognitive-services

I am using the Microsoft Translation API to translate from a source language to English, and it is working fine. But I also want to get the translation accuracy; any help will be appreciated.
I have gone through the article below:
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/translator-info-overview
I would expect the Translation API to return a translation accuracy or score.

Microsoft Translator evaluates the quality with the BLEU (bilingual evaluation understudy) standards and our own benchmarks (both automatic and human evaluations).
Depending on multiple variables, such as length and type of text translated, language pairs (source and target), industry lingo, or the domain in which Translator is used, results will vary greatly for any vendor offering a machine translation solution.
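As far as I can tell from the v3 REST reference, the Translate response itself does not carry a per-translation quality score; the only score-like field is the language-detection confidence returned when you let the service detect the source language. A minimal sketch of what that looks like (the key, region, and sample text are placeholders):

    import uuid
    import requests

    # Placeholders - substitute your own key and region.
    key = "YOUR_TRANSLATOR_KEY"
    region = "westeurope"

    endpoint = "https://api.cognitive.microsofttranslator.com/translate"
    params = {"api-version": "3.0", "to": "en"}  # source language auto-detected
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4()),
    }
    body = [{"Text": "Wie ist das Wetter heute?"}]

    response = requests.post(endpoint, params=params, headers=headers, json=body)
    result = response.json()

    # Typical response shape:
    # [{"detectedLanguage": {"language": "de", "score": 1.0},
    #   "translations": [{"text": "How is the weather today?", "to": "en"}]}]
    # The "score" is detection confidence, not translation quality.
    print(result[0]["detectedLanguage"]["score"])
    print(result[0]["translations"][0]["text"])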

Related

Is Microsoft Cognitive Translator service "pt" language certified to be Brazilian Portuguese?

I have recently started using Microsoft Cognitive Services for translations, and the Translator does support the "pt" language. I ran a few tests with 'reception, train, bus, cream, brown' and got the Brazilian Portuguese translations. I would have expected this language code to give the European Portuguese translations, and to need pt-br for the Brazilian version.
As Brazilian Portuguese is what I need the most, this is not an issue, but I would like to make sure the behavior won't change further down the line, or that my test was indeed good enough.
Is there any way to confirm that the "pt" language code, used as a destination language, does and always will result in Brazilian Portuguese translations?
Thank you!
The following is a response from a Cognitive Service Engineer:
Our translation engines are not locale specific, so we have one for EN (US, AUS, UK, etc.), one for FR, one for ES, and one for PT.
That being said, most of our training data for PT is Brazilian Portuguese, so most likely all translations will be PT-BR rather than PT-PT.
Finally, for speech translation (the API and the products it supports: Microsoft Translator live, Skype Translator), our ASR was trained only on BR speech data.
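If you want to confirm what the service currently exposes for Portuguese rather than rely on this behavior staying the same, one option is to query the Translator languages endpoint (which needs no subscription key) and look at which pt variants it advertises. A small sketch, assuming the standard v3 endpoint:

    import requests

    # The languages endpoint requires no subscription key.
    url = "https://api.cognitive.microsofttranslator.com/languages"
    resp = requests.get(url, params={"api-version": "3.0", "scope": "translation"})
    langs = resp.json()["translation"]

    # List every Portuguese variant the service advertises.
    for code, info in langs.items():
        if code.startswith("pt"):
            print(code, "-", info["name"], "/", info["nativeName"])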

Watson Conversation: multi-language website approach

I'm trying to figure out the best approach to interfacing with Watson Conversation from a multi-language website (English/French). The input to Watson Conversation will be fed from Watson STT, so the input should be in the appropriate language. Should I set up the intents and entities in both languages? That might potentially cause issues with words that are the same (or very similar) in both languages but have different meanings. My guess is that I'll need two separate Conversation workspaces but that seems like a lot of overhead (double work when anything changes). I've thought about using the Watson Language Translator in between STT and Conversation but I would think the risk with that approach could be a reduction in accuracy. Has anyone been able to do this?
You will need to set up separate workspaces for each language, since you have to set the language of each workspace.
After that, run STT first, then a language detection service to determine which workspace the text should be directed to, as in the sketch below.
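A rough sketch of that routing step; the langdetect package is only a stand-in for whichever detection service you pick, and the workspace IDs are placeholders:

    from langdetect import detect  # pip install langdetect; any detection service works

    # Hypothetical workspace IDs - one Conversation workspace per language.
    WORKSPACES = {
        "en": "english-workspace-id",
        "fr": "french-workspace-id",
    }

    def workspace_for(text, default="en"):
        """Pick the Conversation workspace matching the detected language of the STT output."""
        try:
            lang = detect(text)          # e.g. "en" or "fr"
        except Exception:
            lang = default               # short or ambiguous utterances fall back to the default
        return WORKSPACES.get(lang, WORKSPACES[default])

    # transcript = output of Watson STT
    transcript = "Quel temps fait-il aujourd'hui ?"
    workspace_id = workspace_for(transcript)
    # ...then send `transcript` to Conversation using `workspace_id`.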

Bilingual (English and Portuguese) documentation in an R package

I am writing a package to facilitate importing Brazilian socio-economic microdata sets (Census, PNAD, etc).
I foresee two distinct groups of users of the package:
Users in Brazil, who may feel more at ease with the documentation in Portuguese. They can probably understand English to some extent, but a foreign language would probably make the package feel less "ergonomic".
The broader international user community, for whom English documentation may be a necessary condition.
Is it possible to write a package in a way that the documentation is "bilingual" (English and Portuguese), and that the language shown to the user will depend on their country/language settings?
Also,
Is that doable within the roxygen2 documentation framework?
I realise there is a tradeoff between making the package more user-friendly by making it bilingual and the increased complexity and difficulty of maintenance. General comments on this tradeoff from previous experience are also welcome.
EDIT: following the comment's suggestion I cross-posted to the r-package-devel mailing list, HERE; then follow the answers at the bottom. Duncan Murdoch posted an interesting answer covering some of what Brandon's answer (below) covers, but also including two additional suggestions that I think are useful:
have the package in one language, but the vignettes in different languages. I will follow this advice.
have two versions of the package, let's say 1.1 and 1.2, one in each language
According to Ropensci, there is no standard mechanism for translating package documentation into non-English languages. They describe the typical process of internationalization/localization as follows:
To create non-English documentation requires manual creation of supplemental .Rd files or package vignettes.
Packages supplying non-English documentation should include a Language field in the DESCRIPTION file.
And some more info on the Language field:
A ‘Language’ field can be used to indicate if the package documentation is not in English: this should be a comma-separated list of standard (not private use or grandfathered) IETF language tags as currently defined by RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646, see also https://en.wikipedia.org/wiki/IETF_language_tag), i.e., use language subtags which in essence are 2-letter ISO 639-1 (https://en.wikipedia.org/wiki/ISO_639-1) or 3-letter ISO 639-3 (https://en.wikipedia.org/wiki/ISO_639-3) language codes.
Care is needed if your package contains non-ASCII text, and in particular if it is intended to be used in more than one locale. It is possible to mark the encoding used in the DESCRIPTION file and in .Rd files.
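For illustration, if Portuguese documentation is added alongside the default English docs, the relevant DESCRIPTION fields might look like this (a sketch only; which tags you list depends on what the supplemental .Rd files or vignettes actually cover):

    Language: pt-BR
    Encoding: UTF-8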
Regarding encoding...
First, consider carefully if you really need non-ASCII text. Many users of R will only be able to view correctly text in their native language group (e.g. Western European, Eastern European, Simplified Chinese) and ASCII. Other characters may not be rendered at all, rendered incorrectly, or cause your R code to give an error. For .Rd documentation, marking the encoding and including ASCII transliterations is likely to do a reasonable job. The set of characters which is commonly supported is wider than it used to be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, …) are still often problematic and those with double-width characters (Chinese, Japanese, Korean) often need specialist fonts to render correctly.
On a related note, R does, however, provide support for "errors and warnings" in different languages - "There are mechanisms to translate the R- and C-level error and warning messages. These are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default)."
Besides bilingual documentation, please allow me the following comment: Given your two "target" groups, it may be assumed that some of your users will be running a non-English OS (typically, Windows in Portuguese). When importing time series data (or any date entries, for that matter), due to different date formatting (English vs. non-English), you may get different "results" (i.e. misinterpreted date entries) when importing on English vs. non-English machines. I have some experience with those issues (I often work with Czech-language-based OSs) and, other than ad-hoc coding, I haven't found a simple solution.
(If you find this off-topic, please feel free to delete)

Is there a locale code for informal German (de_DE-informal?)

In German there is a formal ("Sie") and an informal ("Du") form. I would like to translate some software into informal German, notably WooCommerce using Transifex, but new languages can only be added using their locale code.
There only seems to be de_DE as a locale. What is the best way to differentiate between the two forms? Shouldn't there be another locale code just for the informal form, too?
Generally Gettext uses @ to distinguish language variants, so in this case it could be de_DE@informal. This way the locale will be correctly handled by Gettext, with fallback to de_DE (in contrast with the suggested de_DE-x-informal, which will not fall back this way, as Gettext will see DE-x-informal as a country code).
You can find more on locales naming at https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html
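To see that fallback from application code, here is a small Python gettext sketch (the text domain "myapp" and the locale directory are placeholders): asking for de_DE@informal first and de_DE second means the informal catalog is used when present and the formal one otherwise.

    import gettext

    # "myapp" and "locale/" are placeholders for your text domain and catalog directory.
    # Catalogs would live at locale/de_DE@informal/LC_MESSAGES/myapp.mo (informal)
    # and locale/de_DE/LC_MESSAGES/myapp.mo (formal/default).
    t = gettext.translation(
        "myapp",
        localedir="locale",
        languages=["de_DE@informal", "de_DE"],  # informal first, formal as fallback
        fallback=True,  # fall back to untranslated strings if neither catalog exists
    )
    _ = t.gettext
    print(_("Add to cart"))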
Since you asked about WooCommerce, explaining the current state of how WordPress handles it is probably most relevant:
The best approach is to use locale variants as Michal wrote, but WordPress has its own unique twist on it and doesn't use the variant syntax. Instead it adds a third component to the filename, but not -x-informal either: de_DE.mo is the informal (and also fallback, because it lacks any further specification) variant in WordPress, and de_DE_formal.mo contains the formal variant.
Language tags as defined in BCP 47 (currently RFC 5646) do not specify any codes for informal vs. formal variants of German. It would probably be unrealistic to try to have such codes registered, so you would be limited to Private Use subtags, e.g. de-x-formal vs. de-x-informal. Whether your software can handle these in any way is a different issue.
On the practical side, the choice of “Sie” vs. “Du” (or “du”) is hardly a language variant issue. Standard German uses both pronouns to address a person in the singular, depending on the style of presentation and on the relationship with the addressed person. At the extreme, we could say that the choice of “Sie” vs. “Du” in the context of addressing a user generically in instructions or a user interface is a language variant issue. But on the practical side, just make up your mind.

Is it possible to write a speech-recognition engine from scratch?

Using some existing math libraries, though. The point is that there is literally no research in this field for my language (Georgian). Is that possible? How long would it take? I know that this also depends on skill, but still?
Also answered at
Speech to text conversion for non-english language
Is it possible to write a speech-recognition engine from scratch?
You do not need to write an engine from scratch; there are many engines already available, and you can just pick one like CMUSphinx:
http://cmusphinx.sourceforge.net
If you are just interested in supporting Georgian, it is just a matter of training a Georgian model; you do not need to implement the engine itself. Speech recognition engines do not depend on the language.
The point is, that there is literally no research in this field in my language (Georgian).
Luckily there is a lot in English.
How long would it take?
It takes about a month of work for one person to create a database for a new language for CMUSphinx. For more details see the documentation:
http://cmusphinx.sourceforge.net/wiki/tutorial (Tutorial)
http://cmusphinx.sourceforge.net/wiki/tutorialam (Acoustic model training tutorial chapter)
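Once the Georgian acoustic model, pronunciation dictionary, and language model exist, using them is only a few lines of glue. A rough sketch with the PocketSphinx Python bindings (all paths are placeholders for your trained Georgian model, and the exact constructor arguments vary a little between PocketSphinx versions):

    from pocketsphinx import Decoder

    # Placeholder paths to a trained Georgian model.
    decoder = Decoder(
        hmm="georgian/acoustic-model",   # acoustic model directory from SphinxTrain
        lm="georgian/georgian.lm.bin",   # statistical language model
        dict="georgian/georgian.dict",   # pronunciation dictionary
    )

    # Decode a mono 16 kHz, 16-bit PCM file.
    decoder.start_utt()
    with open("utterance.raw", "rb") as f:
        while True:
            buf = f.read(4096)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    hyp = decoder.hyp()
    print(hyp.hypstr if hyp else "(no hypothesis)")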
