I'm trying to produce some files that have slightly unusual field separators.
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA", 1000), rep("BBBB", 1000), rep("CCCC", 1000)),
                   INT01 = runif(3000, 0, 1), INT02 = runif(3000, 0, 1),
                   INT03 = runif(3000, 0, 1))
write.table(dset, "C:/testing_write_table.csv",
            sep = "|", row.names = FALSE, col.names = FALSE,
            na = "", quote = FALSE, eol = "")
I'm finding, however, that the rows are not being kept separate in the output file, e.g.
AAAA|0.238683722680435|0.782154920976609|0.0570344978477806AAAA|0.9250325632......
Would you know how to ensure the text file retains distinct rows?
Cheers
You are using the wrong eol argument. The end-of-line argument needs to be a line break.
This worked for me:
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA", 1000), rep("BBBB", 1000), rep("CCCC", 1000)),
                   INT01 = runif(3000, 0, 1), INT02 = runif(3000, 0, 1),
                   INT03 = runif(3000, 0, 1))
write.table(dset, "C:/testing_write_table.csv", # save as .txt if you want to open it with Notepad as well as Excel
            sep = "|", row.names = FALSE, col.names = FALSE,
            na = "", quote = FALSE, eol = "\n")
Using the line-break character '\n' as the end-of-line argument creates separate lines for me.
Turns out this was a UNIX/Windows encoding issue, so something of a red herring, but perhaps worth recording in case anyone else runs into this initially perplexing issue.
It turns out that Windows Notepad sometimes struggles to render files generated on UNIX properly. A quick test to see whether this is the issue is to open the file in Windows WordPad instead; you may find that it renders properly there.
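For anyone hitting this from the writing side: the line endings can be controlled explicitly with eol = "\r\n", and checked from R afterwards. A minimal sketch (the path and data are throwaway placeholders):

```r
# Write with Windows-style CRLF line endings so Notepad renders rows correctly
path <- tempfile(fileext = ".csv")
df <- data.frame(a = c("AAAA", "BBBB"), b = c(0.1, 0.2))
write.table(df, path, sep = "|", row.names = FALSE, col.names = FALSE,
            quote = FALSE, eol = "\r\n")

# Check which line endings the file actually contains
raw_bytes <- readBin(path, "raw", file.size(path))
has_crlf <- any(raw_bytes == charToRaw("\r"))  # TRUE -> CRLF, FALSE -> LF-only
```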
Related
I'm wondering if there's a way to monitor the contents of a file from within R, similar to the behavior of tail -f (details here) in the Linux terminal.
Specifically, I want a function that you could pass a file path and it would
print the last n lines of the file to the console
hold the console
continue printing any new lines, as they are added
There are outstanding questions like "what if previously printed lines in the file get modified?" and honestly I'm not sure how tail -f handles that, but I'm interested in streaming a log file to the console, so it's kind of beside the point for my current usage.
I was looking around in the ?readLines and ?file docs and I feel like I'm getting close, but I can't quite figure it out. Plus, I can't imagine I'm the first one to want to do this, so maybe there's an established best practice (or even an existing function). Any help is greatly appreciated.
Thanks!
I made progress on this using the processx package. I created an R script which I named fswatch.R:
library(processx)
monitor <- function(fpath = "test.csv", wait_monitor = 1000 * 60 * 2) {
  system(paste0("touch ", fpath))

  print_last <- function(fpath) {
    con <- file(fpath, "r", blocking = FALSE)
    lines <- readLines(con)
    print(lines[length(lines)])
    close(con)
  }

  if (file.exists(fpath)) {
    print_last(fpath)
  }

  p <- process$new("fswatch", fpath, stdin = "|", stdout = "|", stderr = "|")

  while (p$is_alive() && file.exists(fpath)) {
    p$poll_io(wait_monitor)
    p$read_output()
    print_last(fpath)
    # call poll_io twice, otherwise the loop spins endlessly
    p$poll_io(wait_monitor)
    p$read_output()
  }
  p$kill()
}

monitor()
monitor()
Then I ran the script as a "job" in RStudio.
Every time I wrote to test.csv the job printed the last line. I stopped monitoring by deleting the log file:
log_path <- "test.csv"
write.table("1", log_path, sep = ",", col.names = FALSE,
            append = TRUE, row.names = FALSE)
write.table("2", log_path, sep = ",", col.names = FALSE,
            append = TRUE, row.names = FALSE)
unlink(log_path)
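A base-R alternative is also possible if fswatch isn't available. The sketch below (the function name tail_f and its arguments are made up) relies on the fact that readLines() on an open non-blocking connection returns only the lines added since the previous call:

```r
# Hypothetical base-R polling loop: keep one non-blocking connection open
# and print whatever new lines have appeared on each pass.
tail_f <- function(fpath, interval = 1, max_polls = Inf) {
  con <- file(fpath, open = "r", blocking = FALSE)
  on.exit(close(con))
  polls <- 0
  while (file.exists(fpath) && polls < max_polls) {
    new_lines <- readLines(con)  # only lines added since the last call
    if (length(new_lines) > 0) writeLines(new_lines)
    Sys.sleep(interval)
    polls <- polls + 1
  }
}
```

Like the processx version, it stops when the monitored file is deleted.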
I have this code:
write.table(df, file = f, append = FALSE, quote = TRUE, sep = ";",
            eol = "\n", na = "", dec = ".", row.names = FALSE,
            col.names = TRUE, qmethod = c("escape", "double"))
where df is my data frame and f is a .csv file name.
The problem is that the resulting csv file has an empty line at the end.
When I try to read the file:
dd <- read.table(f, fileEncoding = "UTF-8", sep = ";", header = TRUE, quote = "\"")
I get the following error:
incomplete final line found by readTableHeader
Is there something I am missing?
Thank you in advance.
UPDATE: I solved the problem by removing the fileEncoding = "UTF-8" argument from the read.table call:
dd <- read.table(f, sep = ";", header = TRUE, quote = "\"")
but I can't explain the reason for this, since the default encoding for write.table seems to be UTF-8 anyway (I checked this with a text inspection tool).
Any idea of why this is happening?
Thank you,
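For what it's worth, the "incomplete final line" warning just means the file's last line has no terminating newline. A minimal sketch reproducing and fixing that situation (with a throwaway temp file):

```r
# Reproduce: a final line with no trailing "\n" triggers the warning
tmp <- tempfile(fileext = ".csv")
cat("a;b\n1;2", file = tmp)                      # note: no final newline
dd <- read.table(tmp, sep = ";", header = TRUE)  # warns: incomplete final line

# Fix: append the missing newline, then the warning goes away
cat("\n", file = tmp, append = TRUE)
dd <- read.table(tmp, sep = ";", header = TRUE)
```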
I am currently trying to read in a .txt file.
I have researched here and found Error in reading in data set in R - however, it did not solve my problem.
The data are political contributions listed by the Federal Election Commission of the U.S. at ftp://ftp.fec.gov/FEC/2014/webk14.zip
Upon inspection of the .txt file, I realized that the data is weirdly structured. In particular, the end of any line is not separated at all from the first cell of the next line (not by a "|", not by a space).
Strangely enough, import via Excel and Access seems to work just fine. However, R import does not work.
To avoid the Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 90 did not have 27 elements error, I use the following command:
webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character",
                     sep = "|", file = "webk14.txt", stringsAsFactors = FALSE,
                     dec = ".",
                     col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn",
                                   "cmte_filing_freq", "ttl_receipts", "trans_from_aff",
                                   "indv_contrib", "other_pol_cmte_contrib",
                                   "cand_contrib", "cand_loans", "ttl_loans_received",
                                   "ttl_disb", "tranf_to_aff", "indv_refunds",
                                   "other_pol_cmte_refunds", "cand_loan_repay",
                                   "loan_repay", "coh_bop", "coh_cop", "debts_owed_by",
                                   "nonfed_trans_received", "contrib_to_other_cmte",
                                   "ind_exp", "pty_coord_exp", "nonfed_share_exp",
                                   "cvg_end_dt"))
This does not result in an error, however, the results a) have a different line count than with Excel import and b) fail to correctly separate columns (which is probably the reason for a))
I would like not to do a detour via Excel and directly import into R. Any ideas what I am doing wrong?
It might be related to comment symbols (#) inside the data values, so turn off interpretation of these using comment.char = "", which gives you:
webk14 <- read.table(header = FALSE, fill = TRUE, colClasses = "character",
                     comment.char = "", sep = "|", file = "webk14.txt",
                     stringsAsFactors = FALSE, dec = ".",
                     col.names = c("cmte_id", "cmte_nm", "cmte_tp", "cmte_dsgn",
                                   "cmte_filing_freq", "ttl_receipts", "trans_from_aff",
                                   "indv_contrib", "other_pol_cmte_contrib",
                                   "cand_contrib", "cand_loans", "ttl_loans_received",
                                   "ttl_disb", "tranf_to_aff", "indv_refunds",
                                   "other_pol_cmte_refunds", "cand_loan_repay",
                                   "loan_repay", "coh_bop", "coh_cop", "debts_owed_by",
                                   "nonfed_trans_received", "contrib_to_other_cmte",
                                   "ind_exp", "pty_coord_exp", "nonfed_share_exp",
                                   "cvg_end_dt"))
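The reason this helps: read.table defaults to comment.char = "#", so a # inside a data field silently truncates the rest of that line. A small reproduction with made-up data:

```r
tmp <- tempfile()
writeLines(c("CMTE FOR #1|100", "OTHER CMTE|200"), tmp)

# Default comment.char = "#" cuts the first line at the '#', losing a field
bad  <- read.table(tmp, sep = "|", fill = TRUE, stringsAsFactors = FALSE)
# Disabling comment interpretation keeps both fields intact
good <- read.table(tmp, sep = "|", comment.char = "", stringsAsFactors = FALSE)
```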
I have a file similar to
ColA ColB ColC
A 1 0.1
B 2 0.2
But with many more columns.
I want to read the table and set the correct type of data for each column.
I am doing the following:
data <- read.table("file.dat", header = FALSE, na.string = "",
                   dec = ".", skip = 1,
                   colClasses = c("character", "integer", "numeric"))
But I get the following error:
Error in scan(...): scan() expected 'an integer', got 'ColB'
What am I doing wrong? Why is it also trying to parse the first line according to colClasses, despite skip = 1?
Thanks for your help.
Some notes: This file has been generated in a Linux environment and is being worked on in a Windows environment. I am thinking of a problem with newline characters, but I have no idea what to do.
Also, if I read the table without colClasses, the table is read correctly (skipping the first line), but all columns are of factor type. I can probably change the classes later, but I would still like to understand what is happening.
Instead of skipping the first line, you can set header = TRUE and it should work fine.
data <- read.table("file.dat", header = TRUE, na.string = "",
                   dec = ".", sep = ",",
                   colClasses = c("character", "integer", "numeric"))
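A quick self-contained check (mirroring the example file above, assuming comma separators) that with header = TRUE the colClasses are applied only to the data rows:

```r
tmp <- tempfile()
writeLines(c("ColA,ColB,ColC", "A,1,0.1", "B,2,0.2"), tmp)

# header = TRUE consumes the first line as column names, so colClasses
# never touches "ColB" and the integer column parses cleanly
data <- read.table(tmp, header = TRUE, sep = ",", na.strings = "", dec = ".",
                   colClasses = c("character", "integer", "numeric"))
```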
I am using the TraMineR package. I am printing output to a CSV file, like this:
write.csv(seqient(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE)
write.csv(seqici(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
write.csv(seqST(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
The dput(sequences.seq) object can be found here.
However, this does not append the output properly but creates this error message:
In write.csv(seqST(sequences.seq), file = "diversity_measures.csv", : attempt to set 'append' ignored
Additionally, it only gives me the output for the last command, so it seems like it overwrites the file each time.
Is it possible to get all the columns in a single CSV file, with a column name for each (i.e. entropy, complexity, turbulence)?
You can use append=TRUE in write.table calls and use the same file name, but you'll need to specify all the other arguments as needed. append=TRUE is not available for the wrapper function write.csv, as noted in the documentation:
These wrappers are deliberately inflexible: they are designed to
ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored,
with a warning.
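A sketch of the write.table route (df1 and df2 are stand-ins for the seqient()/seqici() results); note that append = TRUE stacks the tables vertically rather than adding columns side by side, and write.table itself warns when appending column names:

```r
f <- tempfile(fileext = ".csv")
df1 <- data.frame(entropy = c(0.52, 0.71))      # stand-in for seqient() output
df2 <- data.frame(complexity = c(0.33, 0.40))   # stand-in for seqici() output

write.table(df1, f, sep = ",", col.names = NA)  # row names plus a header row
suppressWarnings(                               # warns: appending column names
  write.table(df2, f, sep = ",", col.names = NA, append = TRUE)
)
```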
Or you could write all three measures out as columns of one data frame:
write.csv(data.frame(entropy = seqient(sequences.seq),
                     complexity = seqici(sequences.seq),
                     turbulence = seqST(sequences.seq)),
          'output.csv')