I am trying to create an XML file with the encoding "UCS-2 LE BOM" in Qt. How do I do so?
I have found the solution. When using QTextStream, call setGenerateByteOrderMark(true) and set the codec to "UTF-16LE"; the output file will then be detected as having the encoding "UCS-2 LE BOM".
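For example, a minimal Qt 5 sketch (QTextStream::setCodec was removed in Qt 6; the file name and XML content are made up):

#include <QFile>
#include <QTextStream>

int main()
{
    QFile file("output.xml");  // hypothetical file name
    if (!file.open(QIODevice::WriteOnly))
        return 1;

    QTextStream out(&file);
    // "UCS-2 LE" is what editors such as Notepad++ call UTF-16LE
    out.setCodec("UTF-16LE");
    // Write the BOM so the file is detected as "UCS-2 LE BOM"
    out.setGenerateByteOrderMark(true);
    out << "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<root/>\n";
    return 0;
}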
When I look at the data in R, characters like "é" are displayed correctly.
I export it using write.csv. When I open the resulting CSV file in Excel, "é" is displayed as "√©". Is the problem with write.csv or with Excel? What can I do to fix it?
Thanks
Try the write_excel_csv() function from the readr package:
readr::write_excel_csv(your_dataframe, "file_path")
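write_excel_csv() writes a UTF-8 byte order mark at the start of the file, which is the hint Excel needs to recognise the encoding.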
It's a problem with Excel. Try importing the data instead of opening the file directly.
Go to: 'Data' --> 'From Text/CSV' and then select '65001:Unicode (UTF-8)'. That will match the encoding from R.
Try experimenting with the parameter fileEncoding of write.csv:
write.csv(..., fileEncoding="UTF-16LE")
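For example, a minimal sketch (the data frame and file name are made up):

# Hypothetical data frame containing accented characters
df <- data.frame(name = c("café", "résumé"))
# UTF-16LE is one of the encodings Windows Excel opens correctly
write.csv(df, "out.csv", fileEncoding = "UTF-16LE", row.names = FALSE)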
From write.csv documentation:
fileEncoding character string: if non-empty declares the encoding to
be used on a file (not a connection) so the character data can be
re-encoded as they are written. See file.
CSV files do not record an encoding, and this causes problems if they
are not ASCII for many other applications. Windows Excel 2007/10 will
open files (e.g., by the file association mechanism) correctly if they
are ASCII or UTF-16 (use fileEncoding = "UTF-16LE") or perhaps in the
current Windows codepage (e.g., "CP1252"), but the ‘Text Import
Wizard’ (from the ‘Data’ tab) allows far more choice of encodings.
Excel:mac 2004/8 can import only ‘Macintosh’ (which seems to mean Mac
Roman), ‘Windows’ (perhaps Latin-1) and ‘PC-8’ files. OpenOffice 3.x
asks for the character set when opening the file.
I have tried my best to read a CSV file in R but failed. I have provided a sample of the file in the following Google Drive link.
Data
I found that it is a tab-delimited file by opening it in a text editor. Excel reads the file without issues, but when I try to read it in R using the readr package or base R functions, it fails. Not sure why. I have tried different encodings like UTF-8, UTF-16, and UTF-16LE. Could you please help me write the correct script to read this file? Currently, I am converting the file in Excel to a comma-delimited format in order to read it in R, but I am sure there must be something I am doing wrong. Any help would be appreciated.
Thanks
Amal
PS: What I don't understand is how Excel reads the file without any parameters being provided. Can we build the same logic in R to read any file?
This is a Windows-related encoding problem.
When I open your file in Notepad++ it tells me it is encoded as UCS-2 LE BOM. There is a trick to reading files with unusual encodings into R. In your case this seems to do the trick:
read.delim(file("temp.csv", encoding = "UCS-2LE"))
(adapted from R: can't read unicode text files even when specifying the encoding).
BTW "CSV" stands for "comma separated values". This file has tab-separated values, so you should give it either a .tsv or .txt suffix, not .csv, to avoid confusion.
As for your second question, could we build the same logic in R to guess the encoding and delimiter, and read many types of file without specifying them explicitly? Yes, this would certainly be possible. Whether it is desirable I'm not sure.
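Some of that logic already exists; for example, readr can guess a file's encoding (it wraps stringi::stri_enc_detect). Using your file as the example:

# Returns a table of candidate encodings with confidence scores
readr::guess_encoding("temp.csv")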
I am trying to handle UTF-8 encoding in my R package. My R version is 3.4.4 on Windows.
My package is composed of functions that print to the console and draw graphs, and these need UTF-8 encoding (French).
I tried adding the lines below to my R script (at the beginning of the script containing my function, and inside the function itself), but strings such as "Répartition de la différence" still print garbled:
Sys.setlocale("LC_CTYPE","french")
options(encoding = "UTF-8")
In another script, after loading my package, I also added these few lines, but I have the same encoding problem...
Any ideas?
You can add a line specifying Encoding: UTF-8 in your DESCRIPTION file.
See https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Character-encoding-issues
If the DESCRIPTION file is not entirely in ASCII it should contain an
‘Encoding’ field specifying an encoding. This is used as the encoding
of the DESCRIPTION file itself and of the R and NAMESPACE files, and
as the default encoding of .Rd files. The examples are assumed to be
in this encoding when running R CMD check, and it is used for the
encoding of the CITATION file. Only encoding names latin1, latin2 and
UTF-8 are known to be portable. (Do not specify an encoding unless one
is actually needed: doing so makes the package less portable. If a
package has a specified encoding, you should run R CMD build etc in a
locale using that encoding.)
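For instance, a minimal DESCRIPTION with the field set might look like this (the package metadata is made up):

Package: monpaquet
Title: Example Package With French Text
Version: 0.1.0
Encoding: UTF-8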
Please let me know if it solves your issue.
I have a .csv file which should be in UTF-8 encoding. I exported it from SQL Server Management Studio. However, when importing it into R, the import fails on the lines containing ÿ. I use read.csv2 and specify the file encoding "UTF-8-BOM".
Notepad++ displays the ÿ correctly and says the file is UTF-8 encoded. Is this a bug in R's encoding handling, or is ÿ in fact not part of the UTF-8 encoding scheme?
I have uploaded a small tab delimited .txt file that fails here:
https://www.dropbox.com/s/i2d5yj8sv299bsu/TestData.txt
Thanks
That ÿ is probably part of the BOM at the beginning of the file: the byte 0xFF displays as "ÿ" in Latin-1, and 0xFF 0xFE is the UTF-16LE byte order mark. If the editor or parser doesn't recognize BOMs, it treats those bytes as garbage. See https://www.ultraedit.com/support/tutorials-power-tips/ultraedit/unicode.html for more details.
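One way to check is to look at the file's raw leading bytes ("TestData.txt" being the file from the question):

# The first bytes reveal any BOM: UTF-8 is ef bb bf, UTF-16LE is ff fe
readBin("TestData.txt", what = "raw", n = 4)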
I am facing a problem with creating a .xls file in Hebrew/Arabic in PHP.
When I create a file, all of the fields in English are created correctly, but the fields in Hebrew are created in an unreadable format.
Can anyone help me?
You can use the PHPExcel library for this purpose. It works great. You can download the library from here:
http://code.google.com/p/php-excel/
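A minimal sketch, assuming the PHPExcel 1.x API (PHPExcel stores cell strings as UTF-8, so Hebrew and Arabic text survive the round trip; the file name and cell values are made up):

<?php
require_once 'PHPExcel.php';

$excel = new PHPExcel();
$sheet = $excel->getActiveSheet();

// PHPExcel expects UTF-8 strings, so Hebrew/Arabic text is preserved
$sheet->setCellValue('A1', 'שלום');
$sheet->setCellValue('B1', 'Hello');

// Optional: display the sheet right-to-left for Hebrew/Arabic
$sheet->setRightToLeft(true);

// The Excel5 writer produces a BIFF .xls file
$writer = new PHPExcel_Writer_Excel5($excel);
$writer->save('report.xls');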