Specify path in write.csv function

I have a simple syntax question: Is there a way to specify the path in which to write a csv file within the write.csv function itself?
I always do the following:
setwd("C:/Users/user/Desktop")
write.csv(dt, "my_file.csv", row.names = F)
However, I would like to skip the setwd() line and specify the path directly in the write.csv() call. I can't find a path argument in the write.csv documentation. Is it possible to do this exclusively in write.csv, without using write.table() or having to download any packages?
I am writing around 300 .csv files in a script that runs automatically every day. The loop runs slower when using write.table() than when using write.csv(). The whole reason I want to include the path in the write.csv() call is to see if I can cut the execution time down any further.

I typically set my "out" path at the beginning and then just use paste() to build the full file name to save to.
path_out = 'C:\\Users\\user\\Desktop\\'
fileName = paste(path_out, 'my_file.csv', sep = '')
write.csv(dt, fileName)
or all within write.csv()
path_out = 'C:\\Users\\user\\Desktop\\'
write.csv(dt, paste(path_out, 'my_file.csv', sep = ''))

There is a specialized function for this: file.path:
path <- "C:/Users/user/Desktop"
write.csv(dt, file.path(path, "my_file.csv"), row.names=FALSE)
Quoting from ?file.path, its purpose is:
Construct the path to a file from components in a platform-independent way.
Some of the few things it does automatically (and paste doesn't):
Using a platform-specific path separator
Adding the path separator between the path and the file name
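For example, a quick illustration (the printed result is shown as a comment):
file.path("C:/Users/user/Desktop", "my_file.csv")
# [1] "C:/Users/user/Desktop/my_file.csv"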

Another way might be to build a wrapper function around write.csv and pass the arguments for write.csv through your wrapper.
write_csv_path <- function(dt, filename, sep, path){
  # use paste, not paste0: paste0 has no sep argument, so it would
  # silently treat sep as one more string to concatenate
  write.csv(dt, paste(path, filename, sep = sep))
}
Example
write_csv_path(dt = mtcars,filename = "file.csv",sep = "",path = ".\\")
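If you also want to forward write.csv's own options (row.names, quote, and so on), a variant of the wrapper using ... might look like this (a sketch, not part of the original answer; write_csv_path2 is a made-up name):
write_csv_path2 <- function(dt, filename, path, ...){
  # file.path inserts the separator itself, so no sep argument is needed
  write.csv(dt, file.path(path, filename), ...)
}
write_csv_path2(mtcars, "file.csv", ".", row.names = FALSE)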

In my case this works fine:
create a folder -> mmult.datas
copy its directory -> C:/Users/seyma/TP/tp.R/tp.R5 - Copy
give your .csv a name -> df.Bench.csv
do not forget to write your data.frame -> df
write.csv(df, file = "C:/Users/seyma/TP/tp.R/tp.R5 - Copy/mmult.datas/df.Bench.csv")

Related

Creating objects from all .xlsx documents in working directory

I am trying to create objects from all files in the working directory, named after the original files. I tried the following approach, but couldn't solve the problems that appeared.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx",file_objects,value = T))
for (i in xlsx_objects) {
xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried to paste the [i] item from xlsx_objects together with the path to the WD, but it only created a list of the file names of the docs in the WD.
I also found information that read.csv can read only one file at a time, but I guess that should be fine with a for loop, right? It reads only one file at a time.
Using lapply (as described in this forum) I was able to get the data into the environment, but the header argument didn't work, and I lost the names of my docs in that object, which does not have the desired structure. I am, however, looking to have these files in separate objects without calling every document explicitly.
IIUC, you could do something like:
library(purrr)  # for map()
files = list.files("PATH TO THE FILE", full.names = TRUE, pattern = '\\.xlsx$')
list_files = map(files, readxl::read_excel)
(You can't use read.csv to read Excel files.)
Also, I recommend reading about R Projects so you never have to use setwd() again; setwd() makes your code harder to reproduce down the pipeline.
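If you really do want each file as its own object named after the original file, a minimal sketch using assign() (the path is the question's placeholder):
library(readxl)
files <- list.files("PATH TO THE FILE", full.names = TRUE, pattern = "\\.xlsx$")
for (f in files) {
  # object name = file name without directory or extension
  assign(tools::file_path_sans_ext(basename(f)), read_xlsx(f))
}
That said, assign() into the global environment is usually discouraged in favour of keeping the data in a named list.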

Is there a way to pass an R object to read.csv?

I have an R function from a package that I need to pass a file path as an argument but it's expecting a csv and my file is an xlsx.
I've looked at the code for the function and it is using read.csv to load the file, but unfortunately I can't make any changes to the package.
Is there a good way to read in the xlsx and pass it to the function without writing it to a csv and having the function read it back in?
I came across the text argument for read.csv here:
Is there a way to use read.csv to read from a string value rather than a file in R?
This seems like might be part way there but as I said I am unable to alter the function.
Maybe you could construct your own function that checks whether the file is an xlsx, and in that case creates a temporary csv file, feeds it to your function, and deletes it. Something like
yourfunction = function(path){
  dat <- read.csv(path)
  head(dat)
}
library(readxl)
modified_function = function(path){
  if(grepl("\\.xlsx$", path)){
    tmp <- read_xlsx(path)
    tmp_path <- paste0(gsub("\\.xlsx$", "", path), "_tmp.csv")
    # row.names = FALSE avoids an extra row-names column on re-read
    write.csv(tmp, file = tmp_path, row.names = FALSE)
    output <- yourfunction(tmp_path)
    file.remove(tmp_path)
  } else {
    output <- yourfunction(path)
  }
  return(output)
}
If it is of help, here you can see how to modify only one function of a package: How to modify a function of a library in a module
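A small variant of the same idea: tempfile() gives a throwaway path in the session's temp directory, so the temporary csv is never written next to your data (a sketch under the same assumptions as above):
tmp_path <- tempfile(fileext = ".csv")  # e.g. ".../Rtmp.../file1a2b3c.csv"
write.csv(read_xlsx(path), tmp_path, row.names = FALSE)
output <- yourfunction(tmp_path)
unlink(tmp_path)  # clean up the temporary file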

Data table fread with zip file in other directory with spaces in the name

I am trying to read a csv in a zip file by using the command fread("unzip -cq file.zip") which works perfectly when the file is in my working directory.
But when I try the command specifying the path of the file, without changing the directory, say fread("unzip -cq C:/Users/My user/file.zip"), I get the following error: unzip: cannot find either C:/Users/My or C:/Users/My.zip.
This happens because there are spaces in my path, but what would be the workaround?
The only option I have thought of is to change to the directory where each file is located and read it from there, but this is not ideal.
I use shQuote for this, like...
fread_zip = function(fp, silent = FALSE){
  qfp = shQuote(fp)
  patt = "unzip -cq %s"
  thecall = sprintf(patt, qfp)
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
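For illustration, such a .tar.gz wrapper might look roughly like this (the exact gzip/tar invocation is an assumption and depends on the command-line tools available on your system):
fread_targz = function(fp, inner_file, silent = FALSE){
  # step 1: decompress the archive to stdout; step 2: extract the named member to stdout
  thecall = sprintf("gzip -dc %s | tar -xOf - %s", shQuote(fp), shQuote(inner_file))
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}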
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp,
                      unzip_dir = file.path(dirname(fp), sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))),
                      silent = FALSE,
                      do_cleanup = TRUE){
  # only tested on windows
  # fp should be the path to mycsvs.zip
  # unzip_dir should be used only for CSVs from inside the zip
  dir.create(unzip_dir, showWarnings = FALSE)
  # unzip
  unzip(fp, overwrite = TRUE, exdir = unzip_dir)
  # list files, read separately
  # not looking recursively, since csvs should be only one level deep
  fns = list.files(unzip_dir)
  if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
  res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
  if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
  res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
Try assigning the location to a variable and using paste to build the call to the zip file, like below:
myVar <- "C:/Users/Myuser/"
fread(paste0("unzip -cq ", myVar, "file.zip"))

How to read a csv file or load an excel workbook by ignoring some characters in the file path?

I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.
For example, here are 3 paths to those files:
G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx
G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx
G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx
I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.
Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where I need help). I know a loop is probably not the most efficient way, but I'm open to suggestions, should you have any:
require(XLConnect)
path = "G://User//Documents//daily_data//Op_Schedule_"
wd.seq = format(seq(as.Date("2014-01-01"), as.Date("2016-12-31"), "days"), format = "%Y%m%d")
scheduleList = rep(list(matrix(1, 1, 1)), length(wd.seq))
for(i in 1:length(wd.seq)) {
  wb = loadWorkbook(file = paste0(path, wd.seq[i], "???", ".xlsx"))
  scheduleList[[i]] = readWorksheet(wb, sheet = '=SCHEDULE', header = TRUE)
}
Thanks for reading and suggestions, if any.
Mathieu
I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the file names into a list using the list.files() function, for example
fileList <- list.files("G://User//Documents//daily_data//")
And then load the xlsx files looping through the list with a for loop
for(i in fileList) {
  loadWorkbook(file = i)
}
I haven't used the XLConnect functions before, so that exact code probably doesn't work, but the loop will iterate through all the files in that directory, and so you can construct your loading call using the i variable for the file name (it won't be an absolute path, though, so you might need paste to add the first part of the file path).
I realize there might be other files in the directory that are not Excel files; you could use grepl to select only the files containing "Op_Schedule_":
fileListClean <- fileList[grepl("Op_Schedule_",fileList)]
or perhaps only selecting .xlsx files in the directory:
fileListClean <- fileList[grepl(".xlsx",fileList)]
Edit to fit your reply:
Since you need to fit it to a sequence, you can do it as you did earlier:
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")
And then use grepl to pick only the files starting with those prefixes:
fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]
Full disclosure: the last part I got from this SO answer: grep using a character vector with multiple patterns
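Putting the pieces together, a hedged sketch of the full loop (note that XLConnect's loadWorkbook takes the file name as its first argument; the sheet name is the one from the question):
library(XLConnect)
dir_path <- "G://User//Documents//daily_data//"
wd.seq <- format(seq(as.Date("2014-01-01"), as.Date("2016-12-31"), "days"), format = "%Y%m%d")
fileList <- list.files(dir_path)
fileListClean <- fileList[grepl(paste(paste0("Op_Schedule_", wd.seq), collapse = "|"), fileList)]
scheduleList <- vector("list", length(fileListClean))
for (i in seq_along(fileListClean)) {
  wb <- loadWorkbook(paste0(dir_path, fileListClean[i]))
  scheduleList[[i]] <- readWorksheet(wb, sheet = '=SCHEDULE', header = TRUE)
}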

Problem creating dynamic file name in R

I'm working on a script in R that processes some data and writes an output file. I'd like that output file to be named in a way that reflects the input file, and I'd like something about the file to be unique so older files aren't overwritten.
So I thought to use a timestamp. But this isn't working the way I'd hoped, and I'd like to understand what's happening and how to do this correctly.
This is how I'm trying to name the file (file_base is the name of the input file):
now <- format(Sys.time(), "%b%d%H%M%S")
outputfile <- cat(file_base, "-", now, "-output.txt", sep = "")
The output of this pair of functions looks great. But executing 'outputfile' subsequently results in 'NULL' as output.
What's happening here and how can I create an output filename with the properties that I'd like?
You're confusing cat and paste: cat() prints its arguments to the console and returns NULL, while paste() returns the combined string. You want:
outputfile <- paste(file_base, "-", now, "-output.txt", sep = "")
You can also use the function sprintf(); it's a wrapper for the C function of the same name.
Example:
filepath <- file.path(outdir, sprintf("abcdefg_%s.rda", name))
You could also use the separator argument of paste:
outputfile <- paste(file_base, now, "output.txt", sep = "-")
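For the original example, a sketch that combines the timestamp with sprintf (file_base is the question's own variable):
now <- format(Sys.time(), "%b%d%H%M%S")  # e.g. "May20132025"
outputfile <- sprintf("%s-%s-output.txt", file_base, now)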
