Printing several pieces of output to the same CSV in R?

I am using the TraMineR package. I am printing output to a CSV file, like this:
write.csv(seqient(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE)
write.csv(seqici(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
write.csv(seqST(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
The dput(sequences.seq) object can be found here.
However, this does not append the output as intended; instead it produces this warning:
In write.csv(seqST(sequences.seq), file = "diversity_measures.csv", : attempt to set 'append' ignored
Additionally, the file only contains the output of the last command, so it seems that the file is overwritten each time.
Is it possible to get all the columns in a single CSV file, with a column name for each (i.e. entropy, complexity, turbulence)?

You can use append=TRUE in write.table calls and use the same file name, but you'll need to specify all the other arguments as needed. append=TRUE is not available for the wrapper function write.csv, as noted in the documentation:
These wrappers are deliberately inflexible: they are designed to
ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored,
with a warning.
Or you could write out
write.csv(data.frame(entropy=seqient(sequences.seq),
complexity=seqici(sequences.seq),
turbulence=seqST(sequences.seq)),
'output.csv')
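A minimal sketch of the write.table route mentioned above, using small made-up data frames (first, second) and a temporary file in place of the TraMineR output. Note that append = TRUE adds rows at the end of the file, so the pieces are stacked vertically; for side-by-side columns, the data.frame approach is the better fit.

```r
# Hypothetical stand-in data; real values would come from seqient(), seqici(), seqST()
first  <- data.frame(id = 1:2, value = c(0.1, 0.2))
second <- data.frame(id = 3:4, value = c(0.3, 0.4))

out <- tempfile(fileext = ".csv")

# The first call creates the file and writes the header row
write.table(first, file = out, sep = ",", quote = FALSE, na = "",
            row.names = FALSE, col.names = TRUE)

# Later calls append rows; the header is suppressed so it is not repeated
write.table(second, file = out, sep = ",", quote = FALSE, na = "",
            row.names = FALSE, col.names = FALSE, append = TRUE)

# Read the file back to check that both pieces are present
result <- read.csv(out)
```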

Related

Remove extra row in printing to file

I'm attempting to print to file the output of a str_split operation as follows:
s <- t(unlist(str_split("foo_bar_0.5", "_"), use.names = FALSE))
write.csv(s, "test.csv", quote = FALSE, row.names = FALSE, col.names = FALSE)
With the row.names = FALSE argument, I was able to remove the row names. However, this code still writes an extra line with the column names to the file as follows:
V1,V2,V3
foo,bar,0.5
With the following warning:
Warning message:
In write.csv(s, "test.csv", quote = FALSE, :
attempt to set 'col.names' ignored
I want only the second line. Any ideas what I am doing wrong?
Use write.table instead of write.csv:
write.table(s, "test.csv", sep = ",", quote = FALSE, row.names = FALSE, col.names = FALSE)
Two write.table parameters matter here: sep sets the delimiter (a comma in this case), and col.names, which write.table does accept, suppresses the header line when set to FALSE.
Also, in the documentation (see ?write.csv), the entry for the ellipsis (...) says the following:
... arguments to write.table: append, col.names, sep, dec and qmethod
cannot be altered.
The documentation also gives a more detailed explanation, which covers the warning you are getting:
write.csv and write.csv2 provide convenience wrappers for writing CSV
files. They set sep and dec (see below), qmethod = "double", and
col.names to NA if row.names = TRUE (the default) and to TRUE
otherwise.
write.csv uses "." for the decimal point and a comma for the
separator.
write.csv2 uses a comma for the decimal point and a semicolon for the
separator, the Excel convention for CSV files in some Western European
locales.
These wrappers are deliberately inflexible: they are designed to
ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored,
with a warning.
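To see the difference side by side, here is a small sketch that writes the same one-row matrix with both functions and reads the raw lines back. It uses base strsplit() in place of stringr's str_split() and a temporary file; both are substitutions made for the illustration.

```r
# One-row matrix, built with base strsplit() instead of stringr::str_split()
s <- t(unlist(strsplit("foo_bar_0.5", "_", fixed = TRUE)))
out <- tempfile(fileext = ".csv")

# write.csv ignores col.names (with a warning) and still writes a header line
suppressWarnings(
  write.csv(s, out, quote = FALSE, row.names = FALSE, col.names = FALSE)
)
csv_lines <- readLines(out)

# write.table honours col.names = FALSE, so only the data line is written
write.table(s, out, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
table_lines <- readLines(out)
```

Here csv_lines should hold two lines (the header plus the data), while table_lines holds only the data line.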

How to use a sequential combination of YAML parameters in Rmarkdown

How can I combine several parameters, assigned via YAML in R Markdown, into one value? I am trying to write a CSV file, but I can only use one parameter for the file name.
For instance, I have:
title: "My Title"
params:
  month: "March"
  person: "CEO"
  extention: ".csv"
I want to add all of the assigned parameters as a single continuous word:
write.table(d, file=params$month params$person params$extention, quote = F, fileEncoding = "UTF-8", sep = ";", row.names=FALSE)
However, that does not work; it only reads one parameter when written like that.
What is the right way to do this?
The file argument has to be a single string, so you can use paste0 to join the parameters into one. (paste0 works like paste but puts no separator between the pieces by default.)
write.table(d, file = paste0(params$month,params$person,params$extention),
quote = F, fileEncoding = "UTF-8", sep = ";", row.names = FALSE)
This writes to "MarchCEO.csv".
In your original call the parameters were separated only by spaces, and R does not join values that are merely written next to each other, so they were never combined into one file name.
If you want the line to be more readable, you can define the file name first:
myfilename <- paste0(params$month,params$person,params$extention)
write.table(d, file = myfilename, quote = F, fileEncoding = "UTF-8",
sep = ";", row.names = FALSE)
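A quick way to check the resulting name without knitting is to mimic params with a plain list. The list is an assumption for illustration; in a knitted document, params is created from the YAML header.

```r
# Stand-in for the params object that R Markdown builds from the YAML header
params <- list(month = "March", person = "CEO", extention = ".csv")

# paste0 joins the pieces with no separator
name_paste0 <- paste0(params$month, params$person, params$extention)

# paste inserts a space between the pieces by default
name_paste <- paste(params$month, params$person, params$extention)

# A custom separator can go between the name parts, with the extension added last
name_sep <- paste0(paste(params$month, params$person, sep = "_"),
                   params$extention)

print(name_paste0)  # "MarchCEO.csv"
print(name_paste)   # "March CEO .csv"
print(name_sep)     # "March_CEO.csv"
```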

Problems reading in table with unclear line-end symbol

I am currently trying to read in a .txt file.
I have researched here and found Error in reading in data set in R - however, it did not solve my problem.
The data are political contributions listed by the Federal Election Commission of the U.S. at ftp://ftp.fec.gov/FEC/2014/webk14.zip
Upon inspection of the .txt, I realized that the data is oddly structured. In particular, the end of any line is not separated at all from the first cell of the next line (not by a "|", not by a space).
Strangely enough, import via Excel and Access seems to work just fine. However, R import does not work.
To avoid the Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements error, I use the following command:
webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character", sep = "|", file = "webk14.txt", stringsAsFactors = FALSE, dec = ".", col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq", "ttl_receipts", "trans_from_aff", "indv_contrib", "other_pol_cmte_contrib", "cand_contrib", "cand_loans", "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds", "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop", "coh_cop", "debts_owed_by", "nonfed_trans_received", "contrib_to_other_cmte", "ind_exp", "pty_coord_exp", "nonfed_share_exp","cvg_end_dt"))
This does not result in an error. However, the results a) have a different line count than with the Excel import and b) fail to separate the columns correctly (which is probably the reason for a).
I would prefer to import directly into R rather than take a detour via Excel. Any ideas what I am doing wrong?
It might be related to symbols inside the field values. By default read.table treats # as a comment character and drops the rest of the line, so turn off that interpretation with comment.char = "", which gives you:
webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character", comment.char="",sep = "|",file = "webk14.txt", stringsAsFactors = FALSE, dec = ".", col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn", "cmte_filing_freq", "ttl_receipts", "trans_from_aff", "indv_contrib", "other_pol_cmte_contrib", "cand_contrib", "cand_loans", "ttl_loans_received", "ttl_disb", "tranf_to_aff", "indv_refunds", "other_pol_cmte_refunds", "cand_loan_repay", "loan_repay", "coh_bop", "coh_cop", "debts_owed_by", "nonfed_trans_received", "contrib_to_other_cmte", "ind_exp", "pty_coord_exp", "nonfed_share_exp","cvg_end_dt"))
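The effect of comment.char can be checked on a tiny pipe-delimited file. The two records below are invented for illustration and are not taken from the FEC data; the sketch only shows how a # inside a field truncates a row under the default setting.

```r
# Made-up sample: the first record contains a '#' inside a field
tmp <- tempfile(fileext = ".txt")
writeLines(c("C001|ACME PAC #1|Q|100",
             "C002|Other Committee|M|200"), tmp)

# Default comment.char = "#": everything after '#' on the first line is dropped
with_default <- read.table(tmp, sep = "|", header = FALSE, fill = TRUE,
                           colClasses = "character")

# comment.char = "" switches comment handling off, so the full line is kept
no_comments <- read.table(tmp, sep = "|", header = FALSE, fill = TRUE,
                          colClasses = "character", comment.char = "")

print(with_default)
print(no_comments)
```

If the real file still splits incorrectly after this change, comparing count.fields() output for different settings may help locate the offending lines.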

Scan and readLines

In R, scan and readLines both read files, but they return their output in different forms. To get a vector for further steps, I use scan to read files. However, one of my text files always produces an error, as shown below:
filt <- "E:/lexicon/wenku_baidu_com/stopwords_cn.txt"
specialfilter <- scan(file = filt, what=character(), nmax = -1, sep = "\n", blank.lines.skip = TRUE, skipNul = TRUE, fileEncoding = "UTF-8")
Read 1 item
Warning message:
In scan(file = filt, what = character(), nmax = -1, sep = "\n", :
invalid input found on input connection 'E:/lexicon/wenku_baidu_com/stopwords_cn.txt'
I have checked the environment several times: there is no directory error and no encoding error (the file encoding is UTF-8). The only salient feature of this file is that it has thousands of lines. With readLines, there are no errors at all:
specialfilter<-readLines(filt, encoding = "UTF-8", skipNul = FALSE)
My questions are:
1. Does scan have a limit on the number of lines it can read? If the answer is “yes”, how many lines can it read from one file?
2. If readLines is the only option in this case, how can I convert the result (specialfilter) into a vector?
PS: the file is uploaded to a network storage and is only 12 KB: https://yunpan.cn/OcMTMXyFXNQzYu (access code: 3c9d)
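On the second question, readLines already returns a character vector with one element per line, so no conversion should be needed. A small sketch with a temporary file of ASCII stand-in words (not the actual stop-word file):

```r
# Temporary file with placeholder words and one blank line
tmp <- tempfile(fileext = ".txt")
writeLines(c("a", "the", "", "of"), tmp)

# readLines returns a character vector, one element per line
words <- readLines(tmp, encoding = "UTF-8")

# Drop empty lines, similar in effect to blank.lines.skip = TRUE in scan
words <- words[nzchar(words)]

print(is.character(words))
print(length(words))
```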

Keep rows separate with write.table R

I'm trying to produce some files that have slightly unusual field separators.
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA",1000),rep("BBBB",1000),rep("CCCC",1000)),
INT01 = runif(3000,0,1), INT02 = runif(3000,0,1), INT03 = runif(3000,0,1))
write.table(dset,"C:/testing_write_table.csv",
sep = "|",row.names = FALSE, col.names = FALSE, na = "", quote = FALSE, eol = "")
However, I'm finding that the rows are not being kept separate in the output file, e.g.
AAAA|0.238683722680435|0.782154920976609|0.0570344978477806AAAA|0.9250325632......
Would you know how to ensure the text file retains distinct rows?
Cheers
You are passing the wrong value for the eol argument. The end-of-line argument needs to be a line break, not an empty string.
This worked for me:
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA",1000),rep("BBBB",1000),rep("CCCC",1000)),
INT01 = runif(3000,0,1), INT02 = runif(3000,0,1), INT03 = runif(3000,0,1))
write.table(dset,"C:/testing_write_table.csv", #save as .txt if you want to open it with notepad as well as excel
sep = "|",row.names = FALSE, col.names = FALSE, na = "", quote = FALSE, eol = "\n")
Using the newline character '\n' as the eol argument creates separate lines for me.
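The effect of eol can be checked by reading the raw lines back. This sketch uses a two-row base data.frame and temporary files as stand-ins for the data.table and path in the question.

```r
# Small stand-in for the data set in the question
dset <- data.frame(MPAN = c("AAAA", "BBBB"), INT01 = c(0.25, 0.5))

# eol = "\n": each row ends with a line break
out_nl <- tempfile(fileext = ".csv")
write.table(dset, out_nl, sep = "|", row.names = FALSE, col.names = FALSE,
            na = "", quote = FALSE, eol = "\n")
lines_nl <- readLines(out_nl)

# eol = "": rows are written back to back on a single line
out_empty <- tempfile(fileext = ".csv")
write.table(dset, out_empty, sep = "|", row.names = FALSE, col.names = FALSE,
            na = "", quote = FALSE, eol = "")
# readLines warns about the missing final line break, so silence that here
lines_empty <- suppressWarnings(readLines(out_empty))

print(lines_nl)
print(lines_empty)
```

According to the write.table documentation, eol = "\r\n" produces Windows line endings when the file is written on a Unix-alike system, which may help if the file has to open cleanly in older Windows tools.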
It turns out this was a UNIX vs. Windows line-ending issue. So it was something of a red herring, but perhaps worth recording in case anyone else runs into this at first perplexing problem.
Windows Notepad sometimes struggles to render files generated on UNIX properly. A quick test to see whether this is the issue is to open the file in Windows WordPad instead; you may find that it renders properly there.
