Is it possible that two identical inputs give two different outputs using the Google Cloud Translation API (Advanced)?

We are implementing a solution for a customer that uses the Google Translate API (Advanced edition), and we have hit an issue: translating identical input produces different outputs.
For example, the Dutch input string "Goudse 48+ kaas belegen 1/16 Noord Hollandse weidemelk" is translated to French.
The first output gives: "Gouda 48+ fromage affiné 1/16 Lait de prairie de Hollande du Nord"
The second output gives: "Gouda 48+ fromage affiné 1/16 Lait des prés de Hollande du Nord"
Both translations were produced shortly after each other. In total, within one file of ± 250 products and ± 25 columns, 267 differences appear.
Does anyone know how this is possible? Or what we can do about it?
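One practical mitigation, regardless of why the service varies, is to memoize translations within a run so that identical inputs are guaranteed identical outputs inside one file. The sketch below is an assumption-laden illustration: `call_api` is a placeholder for the real Cloud Translation client call, not the actual google-cloud-translate API.

```python
# Sketch of one mitigation: cache translations per (text, source, target)
# so identical cells in one batch always reuse the first result.
# `call_api` is a hypothetical stand-in for the real client call.
def make_cached_translator(call_api):
    cache = {}

    def translate(text, source_lang, target_lang):
        key = (text, source_lang, target_lang)
        if key not in cache:
            cache[key] = call_api(text, source_lang, target_lang)
        return cache[key]

    return translate
```

Deduplicating the ±250 × ±25 cells before sending them has the side benefit of reducing the number of billed API calls.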

Cracking an XOR crypt with a known key length

I'm trying to crack a ciphertext with a known key length. I deduced that the operation applied was an XOR over the hex-encoded data. Here is the crypt:
330a1448010816101c1e470b0248104711050903040a0844511317130d030817024812150d014817150d1c48050f0751181415071f0618060e510618000a051b190606144822080e1006040a42051d1302101e1b040a423d4651330a14480608101548010816101c1e470f1011511507170d0347161e48050f0751181d060c0548181311140417470b1f48100306181c18080c511c1e4716190d510206180a1d0242051d1302105f4838094205001447231f0c14144e511f1902101448050f07511b010201180d02470b0248180906180f14090d041b5d4716190d030242101a1447111e0514470d050014154212041e14071d115115071d09050206510b040b16181e1013071548010816101c1e4711010d120e07024651370d0509050807024806021014481809160307151201140c510817051b180307511c1902423006150211511a14000b1e0651010d041a5104071f1c04150b141b5106051e4451060c154819061414481302011e051447031f48180916140f03060e5118101516510717470f040b19470d1748050f07511f1e150e154f0247041e071547110418010b1b5f48381342181b51130a14480608101d0c561442170704151619451d0610160d02134217071e0342121a1e174e510e1e0b0e1e1f1809055105100e18144451100a14090547031f0c51150b120d5f
I have tried to use this tool to decrypt it. The tool output multiple possible keys and, for each key, an attempt to decipher the crypt. I know the result should be plain-text English. The closest I got was this:
T-e po1ato ,s a 6tarc-y, t0bero0s cr*p fr*m th per nnia) nig-tsha!e So)anumetube7osumeL. T-e wo7d po1ato (ay r fer 1o th pla+t it6elf ,n ad!itio+ to 1he e!ibleetube7. Inethe ndesi whe7e th spe&ies ,s in!igen*us, 1hereeare 6ome *thereclos ly r late! cul1ivat d po1ato 6peci s.P*tato s we7e in1rodu&ed o0tsid theeAnde6 reg,on f*ur c ntur,es a"o,a+d ha3e be&ome $n in1egra) par1 of (uch *f th wor)d's #ood 6uppl<. Iteis t-e wo7ld'sefour1h-la7gestefoodecropi fol)owin" mai?e, w-eat $nd r,ce.
After some digging and manually tweaking the text, I got this:
The potato is a starchy, tuberous crop from the perennial nightshade Solanum tuberosum L. The word "potato" may refer to the plant, itself, in addition to the edible tuber. In the Andes, where the species is indigenous, there are some other closely related cultivated potato species. Potatoes were introduced outside the Andes region four centuries ago, and have become an integral part of much of the world's food supply. It is the world's fourth-largest food crop, following maize, wheat and rice.
It unfortunately did not work. I am now trying to find a clue as to what I should be doing to find the answer.
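Since you have effectively reconstructed the plaintext by hand, a known-plaintext attack is the direct next step: XORing the ciphertext bytes against your guessed plaintext exposes the repeating key, and any stretch where the result stops repeating pinpoints where the guess is wrong. A minimal sketch, using a toy hypothetical key rather than the challenge's real one:

```python
def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Toy round trip with a hypothetical key (not the challenge's real key):
plain = b"The potato is a starchy, tuberous crop"
key = b"secret"
cipher = xor_repeating(plain, key)

# Known-plaintext recovery: XORing ciphertext with the guessed
# plaintext makes the repeating key fall out directly.
leak = xor_repeating(cipher, plain)
assert leak[:len(key)] == key
```

Applied to the real puzzle, replace `cipher` with `bytes.fromhex(...)` of the crypt above and `plain` with your reconstructed paragraph, then look for the repeating pattern in `leak`.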

How to make QTextToSpeech say "%n error(s)" correctly in different languages?

I'm using the Qt Translator's handling of the diversity of plural cases in different languages. In particular, in Russian there are singular, dual and plural cases (that's what Qt Linguist calls them), and I can use
tr("%n error(s)", "", errorCount)
to get the correct translation of e.g.
1 error → 1 ошибка
3 errors → 3 ошибки
6 errors → 6 ошибок
That's fine when I want to present the text as text — e.g. in a message box. But if I try making QTextToSpeech pronounce this (I tried it by pasting the text into Qt's hello_speak example), I get bad results, because e.g. "1 error" is pronounced as "один ошибка" instead of "одна ошибка" (incorrect agreement by gender: ошибка is feminine, одна is feminine, but один is masculine).
I suppose it's not currently possible for QTextToSpeech engines to find out that the number should somehow agree with several words following or preceding it, so a solution should involve transforming the textual form into something a speech engine can unambiguously pronounce correctly (i.e. feed "one error" instead of "1 error" to QTextToSpeech). But I'd like to fix it in some generic way, so that I don't have to manually create a number-to-text converter for each target language.
How can I fix this (with Qt facilities, if possible)?
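A generic fix has two parts per language: the plural-form selection (which Qt Linguist already encodes) and a number-to-words step that carries the grammatical gender. The sketch below hard-codes both for Russian as an illustration only; the tables are hand-rolled assumptions, and a real solution would draw on CLDR plural rules plus a per-language number-to-words library rather than Qt facilities alone.

```python
# Feminine Russian number words, since "ошибка" is feminine (assumption:
# only small counts are covered in this illustrative table).
FEM_NUMBER_WORDS = {1: "одна", 2: "две", 3: "три", 4: "четыре",
                    5: "пять", 6: "шесть"}

def ru_plural_form(n: int) -> int:
    """0 = singular, 1 = dual, 2 = plural — Qt Linguist's three Russian cases."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2

def spoken_errors(n: int) -> str:
    """Spell the count out with correct gender agreement for TTS input."""
    noun = ("ошибка", "ошибки", "ошибок")[ru_plural_form(n)]
    return f"{FEM_NUMBER_WORDS[n]} {noun}"
```

Feeding `spoken_errors(1)` ("одна ошибка") to the speech engine sidesteps the engine's own digit expansion, which is what produced the wrong "один".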

Extracting full article text via the newsanchor package [in R]

I am using the newsanchor package in R to try to extract entire article content via NewsAPI. So far I have done the following:
require(newsanchor)
results <- get_everything(query = "Trump +Trade", language = "en")
test <- results$results_df
This gives me a dataframe full of info on (at most) 100 articles. These, however, do not contain the entire actual article text. Rather, they contain something like the following:
[1] "Tensions between China and the U.S. ratcheted up several notches over the weekend as Washington sent a warship into the disputed waters of the South China Sea. Meanwhile, Google dealt Huaweis smartphone business a crippling blow and an escalating trade war co… [+5173 chars]"
Is there a way to extract the remaining 5173 chars? I have tried to read the documentation but am not really sure.
I don't think that is possible, at least with the free plan. If you go through the documentation at https://newsapi.org/docs/endpoints/everything, in the Response object section it says:
content - string
The unformatted content of the article, where available. This is truncated to 260 chars for Developer plan users.
So the content is restricted to only 260 characters. However, test$url has the link to the source article, which you can use to scrape the entire content; but since the articles are aggregated from various sources, I don't think there is one automated way to do this.
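If a rough, source-agnostic scrape is acceptable, one fallback is to fetch each `test$url` and keep every paragraph of the page. In the R workflow rvest would be the natural tool; the idea is sketched here in stdlib-only Python, and the naive grab-all-`<p>` heuristic will inevitably pick up some boilerplate on real pages.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data

def article_text(html: str) -> str:
    """Join all non-empty paragraphs of an HTML page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())
```

For each row of the dataframe you would download the page (e.g. with `urllib.request`) and run it through `article_text`; per-site cleanup rules are still needed for production use.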

Can MeCab be configured / enhanced to give me the reading of English words too?

If I begin with a wholly Japanese sentence and run it through MeCab, I get something like this:
$ echo "吾輩は猫である" | mecab
吾輩 名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
猫 名詞,一般,*,*,*,*,猫,ネコ,ネコ
で 助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ
ある 助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル
EOS
If I smash together everything I get from the last column, I get "ワガハイワネコデアル", which I can then feed into a speech synthesis program and get output. Said program, however, doesn't handle English words.
If I throw English into MeCab, it manages to tokenise it (probably naively at the spaces), but gives no reading:
$ echo "I am a cat" | mecab
I 名詞,固有名詞,組織,*,*,*,*
am 名詞,一般,*,*,*,*,*
a 名詞,一般,*,*,*,*,*
cat 名詞,固有名詞,組織,*,*,*,*
EOS
I want to get readings for these as well, even if they're not perfect, so that I can get something along the lines of "アイアムアキャット".
I have already scoured the web for solutions, and while I do find a bunch of websites with transliteration that appears to be adequate, I can't find any way to do it in my own code. In a couple of cases I emailed the site authors and have had no response after waiting a few weeks. (Just how far behind on their inboxes are these people?)
There are a number of directions I can go but I hit dead ends on all of them so far, so this is my compound question:
MeCab takes custom dictionaries. Is there a custom dictionary which fills in the English knowledge somewhat?
Is there some other library or tool that can take English and spit out Katakana?
Is there some library or tool that can take IPA (International Phonetic Alphabet) and spit out Katakana? (I know how to get from English to IPA.)
As an aside, I find that the software "VOICEROID" can speak English text (poorly, but adequately for my purposes). This software uses MeCab too (or at least its DLL and dictionary files are included in the install). It also uses another library, CaboCha, which as far as I can tell by running it does the exact same thing as MeCab. It could be using custom dictionaries for either of these two libraries to do the job, or the code to do it could be in the proprietary AITalk library they are using. More research is needed, and I haven't figured out how to run either tool against their dictionaries to test this directly either.
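Whatever supplies the English readings, the plumbing side is straightforward to post-process: in MeCab's IPAdic output the reading is the 8th comma-separated feature field when present, so a wrapper can use it where MeCab provides one and fall back to your own transliteration otherwise. A sketch, where `EN_TO_KANA` is a tiny illustrative table standing in for a real English-to-Katakana transliterator (e.g. one driven by your English-to-IPA step):

```python
# Fallback table — purely illustrative; a real setup would plug in a
# full English-to-Katakana transliterator here.
EN_TO_KANA = {"I": "アイ", "am": "アム", "a": "ア", "cat": "キャット"}

def reading(line: str) -> str:
    """Return the katakana reading for one MeCab (IPAdic) output line."""
    if line == "EOS" or not line.strip():
        return ""
    surface, features = line.split(maxsplit=1)
    fields = features.split(",")
    if len(fields) >= 8 and fields[7] != "*":
        return fields[7]                     # MeCab supplied a reading
    return EN_TO_KANA.get(surface, surface)  # English / unknown fallback
```

Joining `reading()` over the lines of the "I am a cat" output above would then yield the desired "アイアムアキャット" for the speech synthesiser.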

How to control the echo width using Sweave

I have a problem with the width of echoed output within Sweave: I have a list with a large amount of text, and the echoed response from R runs off the page in the PDF. I have tried using
<<>>=
options(width=40)
@
but this has not changed anything.
An example: set up the list (not shown in the LaTeX output).
<<echo=FALSE>>=
my_list <- list(example="Site location was fixed using a Silvia Navigator handheld GPS in October 2003. Point of reference used was the station Bench Mark. If the bench mark location was remote from the site then the point of reference used was changed to the 0-1 metre gauge. Bench Mark location was then recorded as a separate entry in the Site History section [but not used as the site location].\r\nFor a Station location map and all digital photograph's of the station, river reach, and site details see H:\\hyd\\dat\\doc. For non digital photo's taken prior to October 2003 please see the relevant station file at Tumut office.")
@
And show the entry of the list.
<<>>=
my_list
@
Is there any way that I can get this to work without having to break up the list with cat statements?
You can use capture.output() to capture the printed representation of the list and then use writeLines() and strwrap() to display this output, nicely wrapped. As capture.output() returns a vector of strings containing the printed representation of the object, we can cat each of them to the screen/page but wrapped using strwrap(). The benefit of this approach is that the result looks like it was printed by R. Here's the solution:
writeLines(strwrap(capture.output(my_list)))
which produces:
$example
[1] "Site location was fixed using a Silvia Navigator
handheld GPS in October 2003. Point of reference used
was the station Bench Mark. If the bench mark location
was remote from the site then the point of reference used
was changed to the 0-1 metre gauge. Bench Mark location
was then recorded as a separate entry in the Site History
section [but not used as the site location].\r\nFor a
Station location map and all digital photograph's of the
station, river reach, and site details see
H:\\hyd\\dat\\doc. For non digital photo's taken prior
to October 2003 please see the relevant station file at
Tumut office."
From a 2010 posting to rhelp by Mark Schwartz:
cat(paste(strwrap(x, width = 70), collapse = "\\\\\n"), "\n")
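The capture-then-wrap idiom is language-agnostic: render the object to its printed representation first, then wrap that string to the page width. For illustration only, the same idea in Python's textwrap (the R version above is what Sweave would actually run):

```python
import textwrap

# Mirror strwrap(capture.output(...)): take the printed representation
# of an object and re-wrap it to a fixed column width.
long_repr = repr({"example": "Site location was fixed using a Silvia "
                             "Navigator handheld GPS in October 2003."})
wrapped = textwrap.fill(long_repr, width=40)
```

Every line of `wrapped` now fits within 40 columns, just as `options(width=40)` was meant to achieve for the echoed chunk.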
