file encoding changes when appending - r

I want to write a file using write.table and use UTF-8 as encoding. This works as long as I don't append to this file. When I do, the encoding changes to ANSI. Why is that and how can I prevent this?
Here is a small example:
options("encoding" = "UTF-8")
write.table("Hello World in UTF-8", file = "C:/TEMP/test.txt", col.names = FALSE, row.names = FALSE, sep = "", quote = FALSE)
write.table("Now it changes to ANSI", file = "C:/TEMP/test.txt", col.names = FALSE, row.names = FALSE, sep = "", quote = FALSE, append = TRUE)
I also tried to use fileEncoding = "UTF-8" directly in write.table, but the result is the same.

Personally, I prefer not to rely on the global option. Passing the fileEncoding argument to write.table protects your code from any changes to the global option. Hence the line should be:
write.table("Now it changes to ANSI", file = "C:/TEMP/test.txt", col.names = FALSE, row.names = FALSE, sep = "", quote = FALSE, append = TRUE, fileEncoding = "UTF-8")
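One thing worth checking before blaming append = TRUE: both example strings are pure ASCII, and ASCII text is byte-for-byte identical in UTF-8 and in a Windows ANSI code page, so an editor can only guess which label to show. The sketch below is a minimal check under assumptions: it writes to a temporary file instead of C:/TEMP/test.txt, uses made-up strings with non-ASCII characters, and inspects the bytes on disk rather than trusting an editor.

```r
# Hypothetical temporary file standing in for C:/TEMP/test.txt
path <- tempfile(fileext = ".txt")

# Illustrative strings that contain non-ASCII characters
first  <- "Hello World in UTF-8: \u00e4\u00f6\u00fc"
second <- "Appended line: \u00df"

# Pass fileEncoding on every call, including the appending one
write.table(first, file = path, col.names = FALSE, row.names = FALSE,
            quote = FALSE, fileEncoding = "UTF-8")
write.table(second, file = path, col.names = FALSE, row.names = FALSE,
            quote = FALSE, append = TRUE, fileEncoding = "UTF-8")

# Read the raw bytes back and check that they form valid UTF-8
bytes <- readBin(path, what = "raw", n = file.size(path))
valid <- validUTF8(rawToChar(bytes))

# Both lines should be present after the append
lines <- readLines(path, encoding = "UTF-8")
```

If valid is TRUE and both lines are present, the file on disk is UTF-8 and the "ANSI" label is most likely the editor's guess.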

Related

Why R reads CSV file differently

I am using
myCounts<-read.csv("myCounts.csv", header = TRUE, row.names = 1, sep = ",")
and
Book4 <- read_delim("Book4.csv", delim = ";",
escape_double = FALSE, trim_ws = TRUE)
to read two csv files. But read.csv and read_delim are parsing them differently.
Could you please explain how to read in the Book4 data with the same structure as the myCounts data?
I tried the following, and it works:
df<-read.delim("~/Documents/sample.csv" ,sep = ";",row.names = 1)
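The likely source of the difference is that read.csv with row.names = 1 turns the first column into row names, while readr's read_delim returns a tibble, which has no row names. A minimal base R sketch of the working call, using made-up inline data (the column names gene, s1 and s2 are assumptions for illustration):

```r
# Made-up semicolon-separated data standing in for Book4.csv
txt <- "gene;s1;s2\nA;1;2\nB;3;4\n"

# sep = ";" matches the delimiter; row.names = 1 uses the first column as row names
df <- read.delim(text = txt, sep = ";", row.names = 1)

row_ids   <- rownames(df)
col_names <- names(df)
```

After this call the first column is no longer a data column, which matches the structure that read.csv(..., row.names = 1) produces.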

How do I copy text files containing code?

I am trying to read .tex files containing LaTeX code, and paste their content into different .tex files depending on the results of calculations in R.
I need to avoid changing any character of the tex files by processing them with R. I am looking for a way to stop R from interpreting the content of the files and make R just "copy" the files character for character.
Example R file:
cont <- paste(readLines("path/to/file/a.tex"), collapse = "\n")
write.table(cont , file = "Mother.tex", append = FALSE, quote = FALSE, sep = "",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
cont2 <- paste(readLines("path/to/file/b.tex"), collapse = "\n")
write.table(cont2 , file = "Mother.tex", append = TRUE, quote = FALSE, sep = "",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
cont3 <- paste(readLines("path/to/file/c.tex"), collapse = "\n")
write.table(cont3 , file = "Mother.tex", append = TRUE, quote = FALSE, sep = "",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
cont4 <- paste(readLines("path/to/file/d.tex"), collapse = "\n")
write.table(cont4 , file = "Mother.tex", append = TRUE, quote = FALSE, sep = "",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
Example Latex File a:
\documentclass{beamer}
\usepackage{listings}
\lstset{basicstyle=\ttfamily, keywordstyle=\bfseries}
\begin{document}
Example Latex file b:
\begin{frame}
Example Latex file c:
content based on values in r
\end{frame}
Example Latex file d:
\end{document}
I now have two problems:
wrong escape information for readLines
non-UTF-8 keyword in files b, c, d
LaTeX cannot compile successfully because there is non-UTF-8 content in the Mother file after R has processed it.
If I copy and paste the content of each file manually, LaTeX compiles successfully. Because LaTeX reports invalid UTF-8 while the TeX Live IDE shows no wrong characters, I suspect that R adds something to the files that the IDE does not display.
I do not understand why something "invisible" is added to my Mother .tex file that is not shown in TeX Live.
Assuming you want to store the content of the .tex file in a string:
cont <- paste(readLines("path/to/file/file.tex"), collapse = "\n")
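If the goal is only to concatenate the files without R interpreting their content, one option is to skip readLines and write.table and let file.append copy the bytes. This is a sketch under assumptions: the file names and contents below are made up and live in a temporary directory.

```r
# Hypothetical working directory and part files
work_dir <- tempfile("tex_demo")
dir.create(work_dir)
parts <- file.path(work_dir, c("a.tex", "b.tex", "c.tex", "d.tex"))
contents <- c("\\documentclass{beamer}\n\\begin{document}\n",
              "\\begin{frame}\n",
              "content with a non-ASCII character: \u00e9\n\\end{frame}\n",
              "\\end{document}\n")

# Write each part as raw UTF-8 bytes so no re-encoding happens
for (i in seq_along(parts)) {
  writeBin(charToRaw(enc2utf8(contents[i])), parts[i])
}

# Start from an empty target, then append every part in order
mother <- file.path(work_dir, "Mother.tex")
file.create(mother)
ok <- vapply(parts, function(p) file.append(mother, p), logical(1))

# Compare the result with the concatenated source bytes
read_bytes   <- function(p) readBin(p, what = "raw", n = file.size(p))
mother_bytes <- read_bytes(mother)
part_bytes   <- unlist(lapply(parts, read_bytes))
```

Because nothing is decoded or re-encoded, the target file holds exactly the bytes of the parts, in order.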

Opening csv-file in R

I want to read a csv-data record into R. I downloaded the script and the data set from SoSci Survey and got the following error message:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec
= dec, : scan() expected 'a logical', got '3'
in the script:
zh = read.table(
file=zh_file, encoding="UTF-8",
header = FALSE, sep = "\t", quote = "\"",
dec = ".", row.names = "CASE",
col.names = c(
"CASE","SERIAL","REF","QUESTNNR","MODE","LANGUAGE","STARTED","ZH02","ZH03",
"ZH19","ZH19_03","ZH04","ZH05","ZH08_01","ZH08_02","ZH08_03","ZH08_04",
"ZH08_05","ZH08_06","ZH09_01","ZH09_02","ZH11_01","ZH11_02","ZH11_03","ZH11_04",
"ZH13_01","ZH13_02","ZH13_03","ZH13_04","ZH13_05","ZH14","ZH14_01","ZH14_02",
"ZH14_03","ZH14_04","ZH14_05","ZH14_06","ZH14_07","ZH14_09","ZH14_08",
"ZH14_08a","ZH15","ZH15_01","ZH15_02","ZH15_03","ZH15_04","ZH15_05","ZH15_06",
"ZH15_07","ZH15_08","ZH15_09","ZH15_09a","ZH16","ZH16_01","ZH16_02","ZH16_03",
"ZH16_04","ZH16_05","ZH16_06","ZH16_07","ZH16_08","ZH16_09","TIME001","TIME002",
"TIME003","TIME004","TIME005","TIME006","TIME007","TIME008","TIME009","TIME010",
"TIME011","TIME012","TIME013","TIME014","TIME015","TIME016","TIME017",
"TIME_SUM","MAILSENT","LASTDATA","FINISHED","Q_VIEWER","LASTPAGE","MAXPAGE",
"MISSING","MISSREL","TIME_RSI","DEG_TIME"
),
as.is = TRUE,
colClasses = c(
CASE="numeric", SERIAL="character", REF="character", QUESTNNR="character",
MODE="character", LANGUAGE="character", STARTED="POSIXct", ZH02="numeric",
ZH03="numeric", ZH19="numeric", ZH19_03="character", ZH04="numeric",
ZH05="numeric", ZH08_01="numeric", ZH08_02="numeric", ZH08_03="numeric",
ZH08_04="numeric", ZH08_05="numeric", ZH08_06="numeric", ZH09_01="numeric",
ZH09_02="numeric", ZH11_01="numeric", ZH11_02="numeric", ZH11_03="numeric",
ZH11_04="numeric", ZH13_01="numeric", ZH13_02="numeric", ZH13_03="numeric",
ZH13_04="numeric", ZH13_05="numeric", ZH14="numeric", ZH14_01="logical",
ZH14_02="logical", ZH14_03="logical", ZH14_04="logical", ZH14_05="logical",
ZH14_06="logical", ZH14_07="logical", ZH14_09="logical", ZH14_08="logical",
ZH14_08a="character", ZH15="numeric", ZH15_01="logical", ZH15_02="logical",
ZH15_03="logical", ZH15_04="logical", ZH15_05="logical", ZH15_06="logical",
ZH15_07="logical", ZH15_08="logical", ZH15_09="logical",
ZH15_09a="character", ZH16="numeric", ZH16_01="logical", ZH16_02="logical",
ZH16_03="logical", ZH16_04="logical", ZH16_05="logical", ZH16_06="logical",
ZH16_07="logical", ZH16_08="logical", ZH16_09="logical", TIME001="integer",
TIME002="integer", TIME003="integer", TIME004="integer", TIME005="integer",
TIME006="integer", TIME007="integer", TIME008="integer", TIME009="integer",
TIME010="integer", TIME011="integer", TIME012="integer", TIME013="integer",
TIME014="integer", TIME015="integer", TIME016="integer", TIME017="integer",
TIME_SUM="integer", MAILSENT="POSIXct", LASTDATA="POSIXct",
FINISHED="logical", Q_VIEWER="logical", LASTPAGE="numeric",
MAXPAGE="numeric", MISSING="numeric", MISSREL="numeric", TIME_RSI="numeric",
DEG_TIME="numeric"
),
skip = 1,
check.names = TRUE, fill = TRUE,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "",
na.strings = ""
)
What should I do?
Looking for help!
Have you tried using read.csv("filename.csv",header=T,sep=",") instead of read.table?
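The error message itself says that a column declared as "logical" in colClasses contains the value 3. A minimal sketch that reproduces this with made-up inline data and then locates the offending values (the columns CASE and FLAG are assumptions, not the survey's real columns):

```r
# Made-up tab-separated data: FLAG is declared logical but contains a 3
txt <- "CASE\tFLAG\n1\tTRUE\n2\t3\n"

# Capture the error message instead of stopping
msg <- tryCatch(
  {
    read.table(text = txt, header = TRUE, sep = "\t",
               colClasses = c(CASE = "numeric", FLAG = "logical"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Read everything as character first, then look for values that are not logical
raw_data <- read.table(text = txt, header = TRUE, sep = "\t",
                       colClasses = "character")
bad_values <- raw_data$FLAG[!(raw_data$FLAG %in% c("TRUE", "FALSE", "T", "F", NA))]
```

Once the offending columns are known, declaring them as "integer" or "character" in colClasses is one way to get the file to load.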

How to read non-english characters with read.delim in R?

I have a text file containing several languages. How can I read it in R with the read.delim function?
Encoding("file.tsv")
#[1] "unknown"
source_data = read.delim(file, header= F, fileEncoding= "windows-1252",
sep = "\t", quote = "")
source_D[360]
#[1] "ð¿ð¾ð¸ñðº ð½ð° ññ‚ð¾ð¼ ñð°ð¹ñ‚ðµ"
But in Notepad, source_D[360] is shown as 'поиск на этом сайте'.
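tidyverse approach:
use the locale option in read_delim.
(readr functions use _ instead of . in their names and are usually faster than their base R counterparts)
more details here: https://r4ds.had.co.nz/data-import.html#parsing-a-vector
library(readr)
# read_delim takes delim and col_names rather than sep and header.
# The garbled output above looks like UTF-8 bytes decoded as windows-1252, so declare UTF-8.
source_data = read_delim(file, delim = "\t", col_names = FALSE,
locale = locale(encoding = "UTF-8"),
quote = "")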
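source_data = read.delim(file, header = F, sep = "\t", quote = "", stringsAsFactors = FALSE)
# Encoding<- only accepts character vectors, so mark each character column rather than the data frame
chr_cols = vapply(source_data, is.character, logical(1))
source_data[chr_cols] = lapply(source_data[chr_cols], `Encoding<-`, "UTF-8")
I have tried this: if you run R on Windows, the code above works for me.
If you run R on Unix, you could use the following code:
source_data = read.delim(file, header = F, fileEncoding="UTF-8", sep = "\t", quote = "", stringsAsFactors = FALSE)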
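To test an encoding guess without the original file, a small round trip can help. The sketch below is illustrative only: it writes a Cyrillic word as UTF-8 bytes to a temporary file (a made-up stand-in for the real .tsv) and reads it back with encoding = "UTF-8", which marks the strings as UTF-8 instead of converting them.

```r
# Hypothetical temporary file standing in for the real .tsv
path <- tempfile(fileext = ".tsv")

# The Cyrillic word from the question's expected output, written with escapes
phrase <- "\u043f\u043e\u0438\u0441\u043a"

# Write the exact UTF-8 bytes, bypassing any locale conversion
writeBin(charToRaw(enc2utf8(paste0(phrase, "\n"))), path)

# The first character is stored as the two bytes d0 bf in UTF-8
first_bytes <- charToRaw(enc2utf8(phrase))[1:2]

# encoding = "UTF-8" marks the strings that are read as UTF-8
read_back <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                        encoding = "UTF-8", stringsAsFactors = FALSE)
```

If read_back$V1 matches the original phrase, the file is UTF-8 and the garbled text came from declaring the wrong encoding on read.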

Error in reading a CSV file with read.table()

I am encountering an issue while loading a CSV data set in R. The data set can be taken from
https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53
I imported the data using read.csv as below and the dataset was imported correctly.
EmpSal <- read.csv('E:/Data/EmpSalaries.csv')
I tried reading the data using read.table and there were a lot of anomalies when looking at the dataset.
EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)
The above code started reading the data from the 7th row, and although the dataset actually contains ~14K rows, only 5K rows were imported. When I looked at the dataset, in a few cases 15-20 rows had been combined into a single row, with the entire row data appearing in a single column.
I can work on the dataset using read.csv but I am curious to know the reason why it didn't work with read.table.
read.csv is defined as:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
You need to add quote="\"" (by default read.table treats both single and double quotes as quoting characters, whereas read.csv treats only double quotes that way)
EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE
As you mentioned, read.csv() imports your data successfully without a quote argument.
The default value of the quote argument is "\"" for read.csv and "\"'" for read.table.
Compare the two signatures:
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
Your data contains many single quotation marks (apostrophes), which read.table treats as quoting characters by default. That is why read.table isn't working for you.
Try the following code and it will work for you.
r<-read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv',sep=",",quote="\"",header=T,fill=T)
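The effect of the quote default can be seen on a few made-up rows that contain apostrophes. This is a sketch with invented data, not the Baltimore file:

```r
# Made-up CSV text with apostrophes inside two fields
csv <- "name,title\nO'Brien,Clerk\nSmith,Driver's Aide\nJones,Officer\n"

# Default quote = "\"'" treats each apostrophe as a quote character,
# so text between two apostrophes can swallow line breaks and merge rows
with_default <- read.table(text = csv, sep = ",", header = TRUE, fill = TRUE)

# Restricting quoting to double quotes keeps every row intact
with_dq <- read.table(text = csv, sep = ",", header = TRUE, fill = TRUE,
                      quote = "\"")

# read.csv already uses quote = "\"" by default
via_csv <- read.csv(text = csv)
```

Comparing nrow(with_default) with nrow(with_dq) shows whether rows were merged under the default setting.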
