How to convert text symbols like ‰^“]€”õONŠ back to Japanese so I can Google Translate it to English? (Script?)

I am working with Mitsubishi PLC files that were originally commented in Japanese but were then opened on English-only computers, which converted the Japanese characters into incomprehensible Latin keyboard symbol combinations such as ‰^“]€”õONŠm”F(‘€ì”Õ1).
Being able to understand these comments would greatly enhance my ability to analyze and modify these files, as I am required to do for my work. If I could convert them back to Japanese characters (I do have the Japanese language pack installed on my Windows laptop), I could then translate them with Google Translate, which I know is not perfect, but is a lot better than ##$$##&^.
Does anyone have any ideas how this could be done? I figure that Windows must have interpreted the original characters somehow, and there may be a way to interpret them back to the original symbols.
I am thinking of trying to do some kind of character translation using a script in Python, PowerShell, or VBA (maybe I can create a map in Excel...).
Any ideas?
I can export these comments to CSV files, so they are easy to get at and manipulate if I can figure out how....
This is an ongoing problem for me so I am willing to put some time into a solution.
I tried re-opening the oldest version of the files on my computer with the Japanese language pack installed, but no luck.

You can run your text through a text-to-hex converter to get the raw byte values back, and then run those bytes through a hex-to-text converter set to the original Japanese encoding, so the conversion happens without your system settings getting in the way.
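To make that concrete: what most likely happened is that the Japanese comments were stored as Shift-JIS bytes, and the English-only machines displayed those bytes as if they were Windows-1252 (Latin) characters. Below is a minimal Python sketch of the reverse trip, assuming that is what happened; the sample string is taken from the question, and the encoding names (cp1252 for the display side, cp932 for Shift-JIS) are an assumption you may need to adjust.
# Assumption: the original comments were Shift-JIS (cp932) bytes that were
# later shown/saved as Windows-1252 (cp1252) text on the English-only machines.
garbled = "‰^“]€”õONŠm”F"   # sample garbled comment from the question
raw_bytes = garbled.encode("cp1252", errors="replace")   # back to the raw bytes
restored = raw_bytes.decode("cp932", errors="replace")   # reinterpret as Shift-JIS
print(restored)   # should print (most of) the original Japanese text
One caveat: a handful of byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in Windows-1252, so if the files were re-saved after the garbling those bytes may already be lost and a few characters will not round-trip cleanly. The same idea can be expressed in PowerShell or VBA; only the encoding and decoding calls change.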

Related

Using write.csv() function in R but it doesn't actually save it to my C:

I'm a newbie and have searched Stack Overflow, the internet, everywhere I can think of, but I cannot figure out why, when I use write.csv() in R, it doesn't actually save it as a .csv file on my computer. All I want is to get a .csv file of my work from RStudio to Tableau, and I've spent a week trying to figure it out. Many of the answers I have read use too much coding "lingo" and I cannot translate it because I'm just a beginner. Would be so so thankful for any help.
Here is the code I'm using:
""write.csv(daily_steps2,"C:\daily_steps2.csv", row.names = TRUE)""
I put the double quotes around the code because it seems like that's what I'm supposed to do here? IDK, but I don't have those when I run the function. There is no error when I run this, it just doesn't show up as a .csv on my computer. It runs but actually does nothing. Thank you so much for any help.
In my opinion, the simplest way would be to save the file to the same folder that RStudio is running in and use the RStudio GUI. It should be write.csv(daily_steps, "./daily_steps.csv") (no quotes around the function), and then on the tab in the bottom right of RStudio you can select Files, and it should be there. Then you can use the graphical user interface to move it to your desktop in a way analogous to what you would do in MS Word.
A quick fix is to use double backslashes or a forward slash in Windows paths. (Also, since row.names=TRUE is the default, there is no need to specify it.)
write.csv(daily_steps2, "C:\\daily_steps2.csv")
write.csv(daily_steps2, "C:/daily_steps2.csv")
However, consider the OS-agnostic file.path(), which avoids issues with folder separators in file paths: forward slash (used on Unix-like systems such as macOS and Linux) versus backslash (used on Windows).
write.csv(daily_steps2, file.path("C:", "daily_steps2.csv"))
Another benefit of this functional form of path building is that you can pass dynamic file or folder names without paste().

Getting Japanese characters to display in R Shiny

Our users have RStudio installed on their local machines and are using Shiny to filter data and exporting dataframes to an .xlsx file.
This works really well for most characters, but not when it comes to the Japanese and Mandarin ones. For those, they get to see ??????? instead of the actual text.
Data is residing in a SQL DB and we're using RODBC to connect to DB.
RODBC doesn't seem to like reading these Japanese and Mandarin characters. Is there a way to get around this?
Any help is much appreciated!
Thanks
I had a similar problem with the French language the other day. Maybe these options can help you:
In RStudio, try going to Tools > Global Options > Code > Saving and then choose the right encoding for Japanese and Mandarin. The UTF-8 encoding might work for you.
The blog post Escaping from character encoding hell in R on Windows explains how to set the encoding to import external documents. It should work with data imported with RODBC as well. The author uses Japanese characters in his examples.
In the odbcDriverConnect() function of the RODBC package, the argument DBMSencoding="UTF-8" might work for you.

R exporting text issue

I have a problem that might be a bit unique, but I think that if it is answered it could answer other questions about encoding too.
In order to expand my R skills I tried to write a function with which I could manage the .vcf files from Android phones. Everything went fine until I tried to load the file back onto the phone. An error appeared saying that the first line starts with something other than a normal VCF version 3 header. But when I check the file on the PC it appears to be fine, without the characters my phone complained about. So I asked about it, and one person here said that it is the Byte Order Mark and that I should use a hex editor to see it. And it was indeed there, even though it couldn't be seen in the text editors of Windows and Linux.
Thus, I tried to solve the problem by using the fileEncoding argument in R. The code that I use to write the file is:
write.table(cons2,file=paste(filename,".vcf",sep=""),row.names=F,col.names=F,quote=FALSE,fileEncoding="")
I tried ASCII as the argument, UTF-8, etc., but no luck. ASCII seems to delete some of the characters, and UTF-8 makes these characters visible in the text file.
I would appreciate if someone could provide a solution to this.
PS: I know that if I modify the file in a hex editor it solves the problem, but I want the solution in the R code.

Converting .pdf files to Excel (.xls)

A friend of mine doing an internship asked me two hours ago if I could help him avoid manually converting 462 PDF files to .xls with a free online tool.
I thought of a shell script using unoconv, but I couldn't find out how to use it properly, and I am not sure whether unoconv can solve this problem since it mainly converts files to PDF, not the other way around.
Conversion from PDF to any other structured format is not always possible and not generally recommended.
Having said that, this does look like a one-off job and there's a fair few of them (462).
It's worth pursuing if you can reliably extract text from most of them and it's reasonably structured. It's a matter of trying to get regular text output across a sample of the PDFs that you can reliably parse into a table structure.
There are plenty of tools around that target either direct or OCR-based text extraction; just google around.
One I like is pstotext from the ghostscript suite; the -bboxes option lets me get the coordinates of each word and leaves it up to me to re-assemble the structure. Despite its name it does work on input PDFs. The downside is that it can be a bit flaky and works on some PDFs but not others.
If you get this far, you'd then most likely need to write a shell script or program to convert that output to a CSV. You can either open the CSV directly in a spreadsheet or look for tools to convert it into XLS.
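For that last step, here is a rough Python sketch of the kind of glue you might write, assuming the extraction tool gives you one line of plain text per table row with columns separated by runs of whitespace (the file names and that layout are assumptions; real PDFs often need more careful parsing):
import csv
import re
# Assumed input: a plain-text dump (e.g. from pstotext) where each line is one
# table row and columns are separated by two or more spaces.
with open("dump.txt", encoding="utf-8") as src, \
     open("out.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    for line in src:
        row = re.split(r"\s{2,}", line.strip())   # split on runs of whitespace
        if row and row != [""]:
            writer.writerow(row)
A spreadsheet can then open out.csv directly and save it as .xls.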
PS: If he hasn't already, get the intern to ask if there's any possible way of getting at the original data that was used to create the PDFs. It will save a lot of time and effort and lead to a far more accurate result.
Update: An alternative to pstotext is the renderpdf.pl command, which is included in the Perl CAM::PDF module. It is more robust, but it just reports text (x, y) positions, not bounding boxes.
Other responses on a linked question suggest Tabula, too.
https://github.com/tabulapdf/tabula
I tried it and it works very well.
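For reference, Tabula's Python wrapper tabula-py can batch a job like this. A sketch, assuming all 462 PDFs sit in one folder (the folder name and the pages="all" choice are assumptions, and tabula-py needs Java installed since Tabula itself is Java):
import glob
import tabula   # pip install tabula-py
# Convert every PDF in the folder to a CSV with the same base name.
for pdf in glob.glob("pdfs/*.pdf"):
    tabula.convert_into(pdf, pdf[:-4] + ".csv", output_format="csv", pages="all")
The resulting CSVs open directly in Excel and can be saved as .xls from there.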

Unix vs. Windows rendering of characters

I have a text file that displays differently when I open it in FreeBSD vs. Windows.
On FreeBSD:
An·lisis e InvestigaciÛn
On Windows:
Análisis e Investigación
The Windows representation is obviously right. Any ideas on how to get that result in BSD?
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The problem is that it's not ASCII, but UTF-8. You have to use another editor which detects the encoding correctly, or convert the file to something your editor on FreeBSD understands.
This is not pure ASCII. It's UTF-8. Try a FreeBSD editor with UTF-8 support or change your locale.
From the way the characters are being displayed, I would say that file is UTF-8 encoded Unicode. Windows is recognising this and displaying the 'á' and 'ó' characters correctly, while FreeBSD is assuming it's ISO-8859-1, which results in these characters being displayed as two separate characters (due to the UTF-8 encoding using two bytes).
You'll have to tell FreeBSD that it is a UTF-8 file, somehow.
How is the file encoded? I would try re-encoding the file as UTF-16.
So after doing a bit more digging: if I 1) open the CSV file in Excel on a Mac and export it as a CSV file, and 2) then open it in TextMate, copy the text, and save it again, it works.
The result of running file file.csv is:
UTF-8 Unicode English text, with very long lines
The original is:
Non-ISO extended-ASCII English text, with very long lines
This workaround isn't really suitable, as this process is supposed to be automated. Thanks for the help so far.
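If the goal is automation, the re-encoding itself can be scripted instead of going through Excel and TextMate. A small Python sketch, assuming the original file is in a single-byte Windows/Mac encoding such as cp1252 (the file name and source encoding are assumptions you would need to confirm):
# Read the file in its assumed original encoding and rewrite it as UTF-8.
src_encoding = "cp1252"   # assumption: adjust if the file is mac-roman, latin-1, ...
with open("file.csv", encoding=src_encoding) as f:
    text = f.read()
with open("file-utf8.csv", "w", encoding="utf-8") as f:
    f.write(text)
After that, running file on the new copy should report UTF-8, and tools on FreeBSD under a UTF-8 locale should display it correctly.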
It doesn't matter which operating system you're using when you open the file. What matters is the application you use to open it. On Windows you're probably using Notepad, which automatically identifies the encoding as UTF-8.
The app you're using on FreeBSD obviously isn't doing that. Maybe it just can't read UTF-8 and you need to use a different app. Or maybe you just have to tell it which encoding to use. Automatic detection of character encodings is far from universal (and much farther from perfect).
