I am trying to automate some data exporting, and I would like to add a header to each file such as "please cite Bob and Jane 2008" ... or even a few lines of specific instructions depending on the context.
I have looked at the write.csv and write.table documentation, but do not see any such feature.
What is the easiest way to achieve this?
Here are two possible approaches - the solution under EDIT using connections is more flexible and efficient.
Using write.table(..., append = TRUE) and cat
Use append = TRUE within a call to write.table, after first writing the header with cat.
Wrapped in its own function:
write.table_with_header <- function(x, file, header, ...) {
  # write the header line first, then append the table below it
  cat(header, '\n', file = file)
  write.table(x, file, append = TRUE, ...)
}
Note that append is ignored in a write.csv call, so you simply need to call
write.table_with_header(x, file, header, sep = ',')
and that will result in a csv file.
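For instance, a quick sketch using the built-in mtcars data (the file name out.csv and the citation text are just examples; the function is repeated so the snippet runs standalone):

```r
write.table_with_header <- function(x, file, header, ...) {
  # header line first, then the table appended below it
  cat(header, '\n', file = file)
  write.table(x, file, append = TRUE, ...)
}

# R warns that it is appending column names to the file; harmless here
write.table_with_header(head(mtcars), 'out.csv', 'please cite Bob and Jane 2008', sep = ',')
readLines('out.csv')[1]  # the citation header is the first line
```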
EDIT
using connections
(Thanks to @flodel, whose suggestion this is)
my.write <- function(x, file, header, f = write.csv, ...) {
  # create and open the file connection
  datafile <- file(file, open = 'wt')
  # close the connection on exit
  on.exit(close(datafile))
  # if a header is defined, write it to the file (@CarlWitthoft's suggestion)
  if (!missing(header)) writeLines(header, con = datafile)
  # write the file using the supplied function and any additional arguments
  f(x, datafile, ...)
}
Note that this version lets you use write.csv, write.table, or any similar function. Because it uses a file connection, it
(as @flodel points out in the comments)
opens and closes the file only once, and appends automatically. Therefore it is more efficient!
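A usage sketch (file names and the citation text are just examples; the function is repeated so the snippet runs standalone):

```r
my.write <- function(x, file, header, f = write.csv, ...) {
  datafile <- file(file, open = 'wt')
  on.exit(close(datafile))
  if (!missing(header)) writeLines(header, con = datafile)
  f(x, datafile, ...)
}

# header goes on the first line, then the csv body follows
my.write(head(mtcars), 'mtcars.csv', header = 'please cite Bob and Jane 2008')
# the same wrapper works with write.table and a tab separator
my.write(head(mtcars), 'mtcars.txt', header = 'please cite Bob and Jane 2008',
         f = write.table, sep = '\t')
```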
Related
I have a simple syntax question: Is there a way to specify the path in which to write a csv file within the .csv function itself?
I always do the following:
setwd("C:/Users/user/Desktop")
write.csv(dt, "my_file.csv", row.names = F)
However, I would like to skip the setwd() line and include it directly in the write.csv() function. I can't find a path setting in the write.csv documentation file. Is it possible to do this exclusively in write.csv without using write.table() or having to download any packages?
I am writing around 300 .csv files in a script that runs automatically every day. The loop runs slower when using write.table() than when using write.csv(). The whole reason I want to include the path in the write.csv() call is to see whether I can reduce the execution time any further.
I typically set my "out" path in the beginning and then just use paste() to create the full filename to save to.
path_out = 'C:\\Users\\user\\Desktop\\'
fileName = paste(path_out, 'my_file.csv',sep = '')
write.csv(dt,fileName)
or all within write.csv()
path_out = 'C:\\Users\\user\\Desktop\\'
write.csv(dt,paste(path_out,'my_file.csv',sep = ''))
There is a specialized function for this: file.path:
path <- "C:/Users/user/Desktop"
write.csv(dt, file.path(path, "my_file.csv"), row.names=FALSE)
Quoting from ?file.path, its purpose is:
Construct the path to a file from components in a platform-independent way.
Some of the things it does automatically (and paste doesn't):
Using a platform-specific path separator
Adding the path separator between the components
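A minimal illustration of the difference (the path is just an example):

```r
path <- "C:/Users/user/Desktop"
# file.path inserts the separator between components for you
file.path(path, "my_file.csv")
# [1] "C:/Users/user/Desktop/my_file.csv"

# with paste you have to supply the separator yourself
paste(path, "my_file.csv", sep = "/")
# [1] "C:/Users/user/Desktop/my_file.csv"
```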
Another way might be to build a wrapper function around write.csv and pass write.csv's arguments through your wrapper. (Note that paste0 takes no sep argument; it always concatenates with no separator, so the path should end in a slash.)
write_csv_path <- function(dt, filename, path, ...) {
  write.csv(dt, paste0(path, filename), ...)
}
Example
write_csv_path(dt = mtcars, filename = "file.csv", path = ".\\")
In my case this works fine:
create a folder -> mmult.datas
copy its directory -> C:/Users/seyma/TP/tp.R/tp.R5 - Copy
give your .csv a name -> df.Bench.csv
do not forget to write your data.frame -> df
write.csv(df, file = "C:/Users/seyma/TP/tp.R/tp.R5 - Copy/mmult.datas/df.Bench.csv")
Currently I use a script to edit columns of a dataset.
I click run on Rmarkdown, and my first line of code is
Data <- read.csv(file.choose(), sep = "," ,header = T , skip = 2)
This skips the first 2 lines and gives the third line a header for the file that I select after clicking run. When the script finishes, the last line of code is
write.csv(Data, "FileName.csv", row.names=FALSE)
This removes the numerical row names shown on the left and saves FileName.csv in my working directory.
My question is: if I read.csv a certain file that I pick, for example "FileName.csv", is there a way to reuse that same name in
write.csv
so that it writes FileName.csv to my working directory without my typing it manually? Also, is there a way to add back the first 2 lines that I skipped when doing write.csv?
You can capture the filename from file.choose and save the skipped lines to write out later.
## Capture file name
FileName = file.choose()
# Capture skipped header lines
IN=file(FileName, open="r")
Header=readLines(IN, 2)
Input <- read.csv(IN) # No need to skip lines now
close(IN)
## Whatever processing
## Output
OUT = file(FileName, open="w")
writeLines(Header, OUT)
write.csv(Input, OUT, row.names=FALSE)
close(OUT)
You can save the filepath/filename into a variable and use that variable in both read.csv and write.csv:
myfile <- file.choose()
data <- read.csv(file=myfile, ...)
... lots of code...
write.csv(data, file=myfile)
I'd comment if I could, as I am not sure this is a full answer, but I guess you could capture the chosen file once and reuse it:
myfile <- file.choose()
Data <- read.csv(myfile, skip = 2)
FileName <- basename(myfile)
write.csv(Data, FileName, row.names = FALSE)
I am trying to read a csv in a zip file by using the command fread("unzip -cq file.zip") which works perfectly when the file is in my working directory.
But when I try the command specifying the path of the file without changing the directory, say fread("unzip -cq C:/Users/My user/file.zip"), I get the following error: unzip: cannot find either C:/Users/My or C:/Users/My.zip.
The reason why this happens is that there are spaces in my path but what would be the workaround?
The only option that I have thought is to just change to the directory where each file is located and read it from there but this is not ideal.
I use shQuote for this, like...
library(data.table)

fread_zip = function(fp, silent = FALSE) {
  # quote the path so spaces survive the shell call
  qfp = shQuote(fp)
  patt = "unzip -cq %s"
  thecall = sprintf(patt, qfp)
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(data.table)
library(magrittr)

fread_zips = function(fp,
                      unzip_dir = file.path(dirname(fp), sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))),
                      silent = FALSE,
                      do_cleanup = TRUE) {
  # only tested on windows
  # fp should be the path to mycsvs.zip
  # unzip_dir should be used only for CSVs from inside the zip
  dir.create(unzip_dir, showWarnings = FALSE)
  # unzip
  unzip(fp, overwrite = TRUE, exdir = unzip_dir)
  # list files, read separately
  # not looking recursively, since csvs should be only one level deep
  fns = list.files(unzip_dir)
  if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
  res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
  if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
  res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
Try assigning the location to a variable and using paste0 to build the call, like below:
myVar<-"C:/Users/Myuser/"
fread(paste0("unzip -cq ",myVar,"file.zip"))
I would like to print several items, one after the other, to the same txt file (outfile.txt).
For instance, first I would like to print a dataframe u to outfile.txt, then the message 'hello', and finally a summary of a model.
How can I do it? Is sink("outfile.txt") appropriate for this case?
It is generally a very bad idea to mix data and free-form text in the same file. I advise against it in the strongest terms: it makes the data file next to unusable for other programs.
That said, most functions to save data have an append argument. You can set this to TRUE to append to an existing file rather than overwriting its contents. No need for sink.
Where you do need sink (or equivalent) is when you want to write contents formatted in the same way as it’s written on the console. This, for instance, is the case for summary.
Here’s an example similar to your requirements:
filename = 'test.txt'
write.table(head(cars), filename, quote = FALSE, col.names = NA)
cat('\nHello\n\n', file = filename, append = TRUE)
capture.output(print(summary(cars)), file = filename, append = TRUE)
Rather than sink, this uses capture.output, which is a convenience wrapper around sink.
I have a dataframe s. I would like to write its content into an outputfile.txt
When I use the following commands:
> sink ("outputfile.txt")
> s
> sink()
I get the following message:
[ reached getOption("max.print") -- omitted 5162 rows ]
How can I write all the content of this dataframe directly into a txt file?
Don’t use sink to write table data to files, use the appropriate functions instead. In base R, that’s write.table and its sibling functions. Unfortunately the function has some rather questionable defaults, but the following for instance should work:
write.table(data, filename, sep = '\t', quote = FALSE, col.names = NA)
sink is generally only useful to capture output from functions that don’t return their output but rather echo it directly to the console (such as warnings and messages).