What is the meaning of Google Translate query params? - google-translate

What is the meaning of all Google Translate query params?
client:t
sl:auto
tl:sk
hl:sk //language of the interface (default:en, you can try xx-bork or xx-hacker)
dt:bd
dt:ex
dt:ld
dt:md
dt:qc
dt:rw
dt:rm
dt:ss
dt:t
dt:at
dt:sw
ie:UTF-8 // encoding of the input (default: utf-8)
oe:UTF-8 // encoding of the output, the results (default: utf-8)
otf:1
srcrom:1
ssel:3
tsel:0
q:translate // query, what you type in the search box
I already discovered some of them.

I'm developing an online translator app, and this is what I found out empirically:
sl - source language code (auto for autodetection)
tl - target language (the language to translate into)
q - source text / word
ie - input encoding (a guess)
oe - output encoding (a guess)
dt - may be included more than once and specifies what to return in the reply.
Here are some values for dt. If the value is set, the following data will be returned:
t - translation of source text
at - alternate translations
rm - transcription / transliteration of source and translated texts
bd - dictionary, in case source text is one word (you get translations with articles, reverse translations, etc.)
md - definitions of source text, if it's one word
ss - synonyms of source text, if it's one word
ex - examples
rw - See also list.
dj - JSON response with named fields (dj=1)
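To make the repeated dt parameter concrete, here is a sketch that only builds the query string with the Python standard library. The function name build_translate_query is hypothetical, the default values are taken from the parameter list in the question, and the base URL is deliberately left out: this is an undocumented endpoint, everything above was found empirically, and it may change without notice.

```python
from urllib.parse import urlencode, parse_qs


def build_translate_query(text, source="auto", target="sk",
                          parts=("t", "at", "bd", "ex", "md", "rw", "ss")):
    """Build a query string from the parameters described above (hypothetical helper).

    `dt` is emitted once per requested part, because it may appear more than once.
    """
    params = [("client", "t"), ("sl", source), ("tl", target),
              ("ie", "UTF-8"), ("oe", "UTF-8")]
    params += [("dt", part) for part in parts]  # one dt=... pair per data block wanted
    params.append(("q", text))                  # the source text goes last
    return urlencode(params)                    # a list of pairs keeps duplicate keys


query = build_translate_query("translate", parts=("t", "at"))
print(query)
# client=t&sl=auto&tl=sk&ie=UTF-8&oe=UTF-8&dt=t&dt=at&q=translate
print(parse_qs(query)["dt"])
# ['t', 'at']
```

Passing a list of pairs rather than a dict to urlencode is what allows the same key (dt) to appear several times.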

Here are a few more:
client=t probably identifies the standalone Google Translate web app (as opposed to a mobile app, or the widget that pops up if you google search "translate")
sl is source language
tl is the target language (the language you want to translate into)
srcrom seems to be present when the source text has no spelling suggestions

Related

python3 imaplib search function encoding

Can someone point me to how to search properly using imaplib in Python? The email server is Microsoft Exchange, which seems to have problems, but I would like a solution from the Python/imaplib side.
https://github.com/barbushin/php-imap/issues/128
I so far use:
import imaplib
M = imaplib.IMAP4_SSL(host_name, port_name)
M.login(u, p)
M.select()
s_str = 'hello'
M.search(s_str)
And I get the following error:
>>> M.search(s_str)
('NO', [b'[BADCHARSET (US-ASCII)] The specified charset is not supported.'])
search takes two or more parameters: a charset and one or more search criteria. You can pass None as the charset to not specify one. In your call, 'hello' is sent as the charset, and hello is not a valid charset.
You also need to specify what you are searching: IMAP has a complex search language detailed in RFC 3501 §6.4.4, and imaplib does not provide a high-level interface for it.
So, with both of those in mind, you need to do something like:
M.search(None, 'BODY', '"hello"')
or
M.search(None, 'FROM', '"hello"')
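A slightly fuller sketch of the same call, wrapped in a helper. The names quote_imap_string and search_body are hypothetical; the quoting follows the RFC 3501 quoted-string rules (escape backslash and double quote), and it assumes an ASCII search term, since non-ASCII terms would need a charset such as UTF-8 that the server also has to support.

```python
import imaplib


def quote_imap_string(value: str) -> str:
    """Wrap a search term as an IMAP quoted string, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_body(conn: imaplib.IMAP4, text: str) -> list[bytes]:
    """Return the sequence numbers of messages whose body contains `text`.

    Passes None as the charset (first argument), then the criteria.
    """
    typ, data = conn.search(None, "BODY", quote_imap_string(text))
    if typ != "OK":
        raise imaplib.IMAP4.error(f"SEARCH failed: {data!r}")
    # data holds one space-separated bytes string, e.g. [b'1 4 7']
    return data[0].split()
```

With the connection from the question (M, after login() and select()), search_body(M, 'hello') returns the matching message numbers, which can then be passed to M.fetch.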

Can MeCab be configured / enhanced to give me the reading of English words too?

If I begin with a wholly Japanese sentence and run it through MeCab, I get something like this:
$ echo "吾輩は猫である" | mecab
吾輩 名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
猫 名詞,一般,*,*,*,*,猫,ネコ,ネコ
で 助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ
ある 助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル
EOS
If I smash together everything I get from the last column, I get "ワガハイワネコデアル", which I can then feed into a speech synthesis program and get output. Said program, however, doesn't handle English words.
When I throw English into MeCab, it manages to tokenise it (probably naively at the spaces) but gives no reading:
$ echo "I am a cat" | mecab
I 名詞,固有名詞,組織,*,*,*,*
am 名詞,一般,*,*,*,*,*
a 名詞,一般,*,*,*,*,*
cat 名詞,固有名詞,組織,*,*,*,*
EOS
I want to get readings for these as well, even if they're not perfect, so that I can get something along the lines of "アイアムアキャット".
I have already scoured the web for solutions, and although I do find a bunch of websites whose transliteration appears to be adequate, I can't find any way to do it in my own code. In a couple of cases I emailed the site authors, but after waiting a few weeks I have had no response yet. (Just how far behind on their inboxes are these people?)
There are a number of directions I can go but I hit dead ends on all of them so far, so this is my compound question:
MeCab takes custom dictionaries. Is there a custom dictionary which fills in the English knowledge somewhat?
Is there some other library or tool that can take English and spit out Katakana?
Is there some library or tool that can take IPA (International Phonetic Alphabet) and spit out Katakana? (I know how to get from English to IPA.)
As an aside, I find that the software "VOICEROID" can speak English text (poorly, but adequately for my purposes). This software uses MeCab too (or at least its DLL and dictionary files are included in the install). It also uses another library, CaboCha, which, as far as I can tell from running it, does the same thing as MeCab. It could be using custom dictionaries for either of these two libraries to do the job, or the code to do it could be in the proprietary AITalk library they use. More research is needed; I haven't yet figured out how to run either tool against their dictionaries to test this directly.

Wordpress/Apache - 404 error with unicode characters in image filenames

We've recently moved a website to a new server, and are running into an odd issue where some uploaded images with unicode characters in the filename are giving us a 404 error.
Via ssh/FTP, we can see that the files are definitely there.
For example:
http://sjofasting.no/project/adnoy
none of the images are working:
Code:
<img class='image-display' title='' src='http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg' width='685' height='484'/>
SSH:
-rw-r--r-- 1 xxxxxxxx xxxxxxxx 836813 Aug 3 16:12 ådnøy_1_2.jpg
What is also strange is that if you navigate to the directory you can even click on the image and it works:
http://sjofasting.no/wp/wp-content/uploads/2012/03/
click on 'ådnøy_1_2.jpg' and it works.
Somehow wordpress is generating
http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg
and copying from the direct folder browse is generating
http://sjofasting.no/wp/wp-content/uploads/2012/03/a%CC%8Adn%C3%B8y_1_2.jpg
What is going on??
edit:
If I copy the image url from the wordpress source I get:
http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellg%C3%A5rd-12.jpg
When copied from the apache browser I get:
http://sjofasting.no/wp/wp-content/uploads/2011/11/Bore-Strand-Hotellga%cc%8ard-12.jpg
What could account for this discrepancy between %C3%A5 and %cc%8a?
Unicode normalisation.
0xC3 0xA5 is the UTF-8 encoding for U+00E5 a-with-ring.
0xCC 0x8A is the UTF-8 encoding for U+030A combining ring.
U+00E5 is the composed (Normal Form C) way of writing an a-ring; an a letter followed by U+030A is the decomposed (Normal Form D) way of writing it. å vs å: they should look the same, though they may differ slightly depending on font rendering.
Now normally it doesn't really matter which one you've got because sensible filesystems leave them untouched. If you save a file called [char U+00E5].txt (å.txt), it stays called that under Windows and Linux.
Macs, on the other hand, are insane. The filesystem prefers Normal Form D, to the extent that any composed characters you pass into it get converted into decomposed ones. If you put a file in called [char U+00E5].txt and immediately list the directory, you'll find you've actually got a file called a[char U+030A].txt. You can still access the file as [char U+00E5].txt on a Mac because it'll convert that input into Normal Form D too before looking it up, but you cannot recover the same filename in character sequence terms as you put in: it's a lossy conversion.
So if you save your files on a Mac and then transfer to a filesystem where [char U+00E5].txt and a[char U+030A].txt refer to different files, you will get broken links.
Update the pages to point to the Normal Form D versions of the URLs, or re-upload the files from a filesystem that doesn't egregiously mangle Unicode characters.
Think Different, Cause Bizarre Interoperability Problems.
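Python's unicodedata module reproduces both URL forms from the question. The second half is a sketch of a different fix from the two suggested above: renaming the uploaded files to NFC on the server so they match the %C3%A5-style URLs WordPress generated. It assumes WordPress's stored references are NFC (as the page source suggests); the function names are hypothetical, and it only touches one directory, so back up before running it.

```python
import os
import unicodedata
from urllib.parse import quote

composed = "\u00e5"                                   # å as one code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)   # "a" + U+030A combining ring

print(quote(composed))    # %C3%A5   (the form in the WordPress page)
print(quote(decomposed))  # a%CC%8A  (the form in the Mac-created filename)


def nfc_renames(names):
    """Map each filename that is not already NFC to its NFC form."""
    return {name: unicodedata.normalize("NFC", name)
            for name in names
            if unicodedata.normalize("NFC", name) != name}


def rename_uploads_to_nfc(directory):
    """Rename files in `directory` to NFC names, skipping any that would collide."""
    for old, new in nfc_renames(os.listdir(directory)).items():
        if os.path.exists(os.path.join(directory, new)):
            continue  # an NFC twin already exists; leave both alone
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
```

Only combining sequences change: ø (U+00F8) has no decomposition, so in ådnøy_1_2.jpg only the a-ring is affected.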

How do you transfer a binary file via Connect:Direct NDM?

I'm trying to submit a binary file, in this case, an Excel file from my local server (Solaris server with Mainframe rehosting software) using Connect:Direct NDM to a destination server (Mainframe).
Here are the environment values I set:
SODDETFL "DetailedReport.xls"
SODDETNDM "FIN.REPORT(+1)"
TDCOPTS ":DATATYPE=BINARY:XLATE=NO:STRIP.BLANKS=NO"
Here is the NDM configuration I use:
ASSGNDD ddname='SYSIN' type='INSTREAM' << !
SIGNON
SUBMIT PROC=COPYFILE -
JOBNAME=JOB00001 -
PNODE=SERVER001 -
SNODE=NDMIDS -
SNODEID=(xxxxxx,xxxxxx) -
HOLD=NO -
NOTIFY=CCACTD -
NODE=, -
DSN1=${SODDETFL} -
DSN2=${SODDETNDM} -
DCBINFO='dcb=(dsorg=ps, recfm=vb, lrecl=1504)' -
DISP1=NEW, -
DISP2=CATLG,DELETE -
UNIT=BATCH -
SYSOPTS=${TDCOPTS} -
AEFAJOB=PSIAPNB5
SEL PROC WHERE (QUEUE=A) TABLE
SIGNOFF
I'm able to send text files via NDM all day long, no problems there. However, it seems that binary is a bit more difficult. When I try with the above configuration, I get the following error:
Completion Code => 8
Message Id => XCPS009I
Short Text => Read buffer too small. Possibly src reclen > dest reclen.
Ckpt=>Y Lkfl=>N Rstr=>N Xlat=>Y Scmp=>N Ecmp=>Y Ecpr=>0.00 CRC=>N Zlvl=>1 win=>13 Zmem=>4
Can anyone shed some light as to how I can go about submitting a binary file via NDM?
Off the cuff...
Try changing RECFM=VB to RECFM=U and specify a BLKSIZE= instead of a LRECL=
This is really not all that different from how executable load modules are stored on the mainframe, except that you don't want the file to be a PDS. I'm not at my office right now, but I think I have some examples of NDM jobs that transmit load modules that I can look up if this suggestion doesn't work; I think it will, though.
Give this suggestion a shot and if it still doesn't fly let me know.

Math symbols in vim

Does anyone know how to have vim convert the html entities of math symbols into the math characters?
For example:
&ne; becomes ≠
&there4; becomes ∴
here is a table with the symbol html entities
http://barzilai.org/math_sym.htm
Updated: Solved, bignose came through with the solution.
The solution uses Vim's :digraphs functionality with a Unicode character encoding; see ':help digraphs' for documentation.
I'm still looking for a monospace Unicode font so that everything renders completely, but with extra spaces it works great.
To see math characters, the encoding has to be UTF-8 and the font has to be able to display those characters.
I added the following to my Vim configuration files.
I created a custom file, mathdoc.vim, in syntax/:
" set the encoding to be utf-8, requires gVim or a terminal capable of
" unicode see ':help Unicode' for details
set encoding=utf-8
" requires a font that has characters for the higher Unicode symbols
set guifont=MS\ Gothic
I added this to my filetype detection to apply it to my own custom extension, .txtmt:
au BufNewFile,BufRead *.txtmt setf mathdoc
but you could alternatively set it with the file open:
:set ft=mathdoc
Digraphs work great, as bignose describes below. Here is how they work
in insert mode:
press control+k followed by:
∴ is .:
≠ is !=
∑ is +Z
≡ is =3
⇐ is <=
⇒ is =>
⇔ is ==
∀ is FA
∃ is TE
∋ is -)
see :digraphs for the complete list
* Note: if you only see half a screen's worth, your character encoding is not Unicode; the Unicode characters cover several screens. Type :set encoding=utf-8 to switch to UTF-8.
The table in the above link gives the decimal code for each character, which you can use to find its keyboard shortcut; for example, 8756 is ∴.
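The numbers in that table are decimal Unicode code points, which is also the number :digraphs prints next to each character, so the two listings can be matched up. A small sketch using only the Python standard library prints the character, hex code point and HTML entity name for a few of the symbols above (html.entities.codepoint2name only covers the HTML 4 entity names, hence the "?" fallback):

```python
import html.entities

# Decimal code points as listed in the symbol table (8756 is "therefore")
for code in (8756, 8800, 8721, 8801, 8704, 8707):
    char = chr(code)                                        # code point -> character
    name = html.entities.codepoint2name.get(code, "?")      # HTML 4 entity name, if any
    print(f"{code}  {char}  U+{code:04X}  &{name};")
```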
You want what Vim calls “digraphs”: read :help digraphs to see how they're used, and :digraphs to list the defined ones in your Vim.
Summary: in insert mode, press Ctrl+K (which causes Vim to display a highlighted ?, waiting for further input), then the defined two characters of the digraph. Vim then replaces what you typed with the defined resulting character. E.g. Ctrl+K, !, = produces ‘≠’.
I'm not sure that libraries exist to do this in pure Vimscript; however, Vim does allow you to embed Python, and Python has BeautifulSoup, which can convert HTML entities to Unicode:
I don't have python support enabled on my vim, so I had to settle for writing an external script, soup.py, which converts html entities to UTF-8:
# soup.py
from BeautifulSoup import BeautifulStoneSoup
import sys
input = sys.stdin.read()
output = str(BeautifulStoneSoup(input, convertEntities=BeautifulStoneSoup.HTML_ENTITIES))
sys.stdout.write(output)
(FYI, I don't know python, so even though that works, it's probably pretty ugly)
You can use it in Vim by selecting the lines with entities you want to convert in visual mode and passing them to the script thusly:
:'<,'>!python soup.py
For example, if my cursor was on a line reading
&there4; i &ne; 10
And I hit
!!python soup.py<Enter>
It would convert it to
∴ i ≠ 10
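soup.py relies on BeautifulSoup 3, which only runs on Python 2. On Python 3, a minimal alternative can use the standard library's html.unescape, which handles named references such as &ne; and numeric ones such as &#8756;. This is a sketch, not the original answerer's script, and the file name unescape_entities.py is hypothetical; it is used from Vim the same way, e.g. :'<,'>!python3 unescape_entities.py

```python
# unescape_entities.py: a Python 3, standard-library-only take on soup.py
import html
import sys


def convert(text: str) -> str:
    """Replace HTML character references (&ne;, &#8756;, ...) with the characters they name."""
    return html.unescape(text)


if __name__ == "__main__" and not sys.stdin.isatty():
    # Acts as a filter: Vim pipes the selected lines in on stdin
    sys.stdout.write(convert(sys.stdin.read()))
```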

Resources