For writing a single xtab, I have used:
write.csv(xtabData1, "analysis.csv")
For appending one more xtab to the same csv file, I tried:
write.csv(xtabData2, "analysis.csv", append=T)
But this throws a warning "attempt to set 'append' ignored" and overwrites the csv file.
One solution would be to join the tab data first using rbind, e.g.
write.csv(rbind(xtabData1, xtabData2), file="analysis.csv")
The append option is disabled in write.csv(). write.csv() is just a wrapper function for write.table(). Here;s more from the help file.
write.csv and write.csv2 provide convenience wrappers for writing CSV
files. They set sep and dec (see below), qmethod = "double", and
col.names to NA if row.names = TRUE (the default) and to TRUE
otherwise. write.csv uses "." for the decimal point and a comma for
the separator. write.csv2 uses a comma for the decimal point and a
semicolon for the separator, the Excel convention for CSV files in
some Western European locales. These wrappers are deliberately
inflexible: they are designed to ensure that the correct conventions
are used to write a valid file. Attempts to change append, col.names,
sep, dec or qmethod are ignored, with a warning.
Use write.table() instead (with sep="," and whatever other settings you'd like).
Related
I have just received a data file, whose extension is "*.psv". After doing a bit of research, I don't know how to open it R.
We could use read.table to read *.psv file.
read.table("myfile.psv", sep = "|", header = FALSE, stringsAsFactors = FALSE)
There might be many different representations of psv file, but when it comes to data mining, I think it might be more about "pipe separated" file. The data in the file is separated by "|"
I am trying to read a specific file that I have copied from an SFTP location. The file is pipe delimited. I can read the file in Excel. But R read is as null values and column names are being duplicated. I don't understand if this is an encoding issue? I am trying to create a bash script to automate this process. Any help? Below is the link for the data.
Here's file!
I have tried changing the Encoding. But without knowing which encoding I am struggling. I have tried using read_delim, ead_table, read.table, read_csv and read.csv. But no help.
this is the code I have used to read the file.
read_delim("./Engagement_Level.txt", delim = "|")
I would like to read it as a data frame.
The issue is that the file encoding is UTF-16LE, which read_delim cannot read at present.
You could use the base read.delim and file() to specify the encoding:
read.delim(file("Engagement_Level.txt", encoding = "UTF-16LE"), sep = "|")
That will convert all the quoted numbers to numeric. If you'd rather they were type character, to deal with later:
read.delim(file("Engagement_Level.txt", encoding = "UTF-16LE"), sep = "|",
colClasses = "character")
I really recommend you to use Excel to build a CSV file using Data>Text in columns, this is not appropriate in this context but it's incredibly infallible and quickly.
Then use read.csv(file,sep=",").
How can I use |(pipe) as a delimiter while writing csv files in R?
When I try writing a data set into a file with write.csv with sep = "|", it ignores the separator and writes the file simply as a comma separated file.
Also write.csv2 also doesn't seem to cover the other variety of characters which could be used as a separator.
Is there a way to use other characters such as ^, $, ~, ¬ or |, as a delimiter while writing a csv file in R.
Thanks.
You have to understand that .csv means "comma-separated value" https://en.wikipedia.org/wiki/Comma-separated_values.
If you want to export with a separator using that characters you need another function.
For example, using write.table, and you'll be able to load this file with R, Excel,....
write.table(data, "data.txt", sep = "|")
data_load <- read.table("data.txt", sep = "|")
Feel free to use any character as separator.
Or you could force this plain text to be .csv
write.table(data, "data.csv", sep = "|")
data_load <- read.csv("data.csv", sep = "|")
This answer is just a variation of the one I gave for this question. They are similar, but I don't think the question itself is an exact duplicate, but they are both part of a bigger question (not yet asked).
In the help for write.table, it states:
write.csv and write.csv2 provide convenience wrappers for writing CSV files.
...
These wrappers are deliberately inflexible: they are designed to ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored,
with a warning.
To set sep or another of these parameters you need to use write.table instead of write.csv.
I am processing the US Weather service Storm Data, which has one large CSV data file for each year from 1950 onwards. The 1999 year file contains several rows with very large freeform text fields which contain embedded NUL characters, in an otherwise vanilla ascii database. (The offending file is at ftp://ftp.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/StormEvents_details-ftp_v1.0_d1999_c20140915.csv.gz).
R cannot handle corrupted string data without errors,and this includes R data.frame, data.table, stringr, and stringi package functions (tried).
I can clean the files of NULs with sed, but I would prefer not to use external programs, as this is for an R markdown type report with embedded code.
Suggestions?
Maybe this could be of help:
in.file <- file(description = "StormEvents_details-ftp_v1.0_d1999_c20140915.csv",
open = "r")
writeLines(iconv(readLines(in.file), to = "ASCII"),
con = "StormEvents_ascii.csv")
I was able to read the csv file without errors with this call do read.table:
options(stringAsFactors = FALSE)
StormEvents <- read.table("StormEvents_ascii.csv", header = TRUE,
sep = ",", fill = TRUE, quote = '"')
Obviously you'd need to change the class of several columns, since all are considered character as it is.
Just for posterity - you can use binary reads (readBin()) and replace the NULs with anything else - see
Removing "NUL" characters (within R)
An update for May 2020: The tidyverse and data.table both still choke on null characters within files however the base::read.*() family and readLines() will gracefully skip them with the skipNul=TRUE option. You can read a file in skipping over null characters and then write it back out again.
I used SAS to save a tab-delimited text file with utf8 encoding on a windows machine. Then I tried to open this in R:
read.table(myfile, header =TRUE, sep = "\t")
To my surprise, the data was totally messed up, but only in a sneaky way. Number values changed randomly, but the overall layout looked normal, so it took me a while to notice the problem, which I'm assuming now is the BOM.
This is not a new issue of course; they address it briefly here, and recommend using
read.table(myfile, fileEncoding = "UTF-8", header =TRUE, sep = "\t")
However, this made no improvement! My only solution was to suppress the header, with or without the fileEncoding argument:
read.table(myfile, fileEncoding = "UTF-8", header =FALSE, sep = "\t")
read.table(myfile, header =FALSE, sep = "\t")
In either case, I have to do some funny business to replace the column names with the first row, but only after I remove some version of the BOM that appears at the beginning of the first column name (<U+FEFF> if I use fileEncoding and
 if I don't use fileEncoding).
Isn't there a simple way to just remove the BOM and use read.table without any special arguments?
Update for #Joe:
The SAS that I used:
FILENAME myfile 'C:\Documents ... file.txt' encoding="utf-8";
proc export data=lib.sastable
outfile=myfile
dbms=tab replace;
putnames=yes;
run;
Update on further weirdness: Using fileEncoding="UTF-8-BOM" as #Joe suggested in his solution below seems to remove the BOM. However, it did not fix my original motivating problem, which is corruption in the data; the header row is fine, but weirdly the last few digits of the first column of numbers gets messed up. I'll give Joe credit for his answer -- maybe my problem is not actually a BOM issue?
Hack solution: Use fileEncoding="UTF-8-BOM" AND also include the argument colClasses = "character". No idea why this works to fix the data corruption issue -- could be the topic of a future question.
As per your link, it looks like it works for me with:
read.table('c:\\temp\\testfile.txt',fileEncoding='UTF-8-BOM',header=TRUE,sep='\t')
note the -BOM in the file encoding.
This is in 2.1 Variations on read.table in the r documentation. Under 12 Encoding, see "Under UNIX you might need...", which apparently applies even on Windows now (for me, at least).
or you can use the sas system option options=NOBOMFILE the write a uft-8 file without the BOM.