R Markdown: input and output with relative paths

I am working on a project developed by our team. We share the code in a repository, and every team member uses his/her own machine with his/her own working directories. That is why we use relative paths in our projects. Usually we use something like
setwd("MyUser/MyProject/MyWD/myCodesDir") # local
...
MyReportingPath <- "../ReportsDir" # in repository
Now I try to render a markdown report to this directory:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
                  output_file = paste0(MyReportingPath, "/ReportingHTML.html"))
This doesn't work; it only works if I type in the full path of the output file ("/home/User/..../ReportingHTML.html").
This is the first issue I would like to clarify: is there any way to use relative paths with R Markdown?
The second issue: if I give a non-existent directory in output_file, pandoc throws an error instead of creating that directory with my output file. Is there any way to create the output directory dynamically (other than running system(paste0("mkdir ", reportPath), intern = TRUE) before rendering)?
P.S. It is important for me to render the markdown document from a separate R function, where I create the whole environment that is inherited by my Markdown document.

Trivial issue - since you're using paste0 you need to provide the / delimiter between your output directory and output file.
You wrote:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
                  output_file = paste0(MyReportingPath, "ReportingHTML.html"))
Instead, try:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
                  output_file = paste0(MyReportingPath, "/", "ReportingHTML.html"))
More broadly:
For your first issue (setting the path for the input file) I suggest using here::here(). If you need to navigate up from your working directory you can break the path down as follows:
parent_dir <- paste(head(unlist(strsplit(here::here(), "/", fixed = TRUE)), -1), collapse = "/")
grandparent_dir <- paste(head(unlist(strsplit(here::here(), "/", fixed = TRUE)), -2), collapse = "/")
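For what it's worth, dirname() climbs one level per call, so an equivalent, shorter sketch of the same idea is:
# dirname() drops one path component each time it is applied
parent_dir <- dirname(here::here())
grandparent_dir <- dirname(dirname(here::here()))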
However - it might be easier to set the working directory to a higher level, then build up your code and results directories, for example:
project_dir <- here::here()
codefile <- paste(project_dir, "code", "myreport.Rmd", sep = "/")
outfile <- paste(project_dir, "results", "myreport.html", sep = "/")
rmarkdown::render(input = codefile,
                  output_file = outfile)
For your second issue (creating the directory for output) - dir.create(MyReportingPath, recursive = TRUE) will create the output directory and any intermediate levels. You will get a warning if the directory already exists, which can be suppressed with showWarnings = FALSE.
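Putting the two pieces together, a minimal sketch (the "code" and "results" directory names are placeholders for your repository layout):
# build an absolute output path and make sure the directory exists before rendering
out_dir <- file.path(here::here(), "results")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
rmarkdown::render(input = file.path(here::here(), "code", "myreport.Rmd"),
                  output_file = file.path(out_dir, "myreport.html"))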

I just ran into this myself, and as it turns out, there's one additional complication here: You really do need to use an absolute path for the output if your input file being rendered isn't in your current working directory. You also need to use absolute paths for anything else during rendering, for example images like ![](path.png).
This looks to be because rmarkdown::render temporarily sets your working directory to the directory containing the input file, so it interprets a relative output path as relative to that directory, not your initial working directory. (EDIT: there's currently an open issue for this on GitHub.)
For example if you have this setup:
subdir/test.Rmd
outdir
And you do this:
rmarkdown::render(
  input = "subdir/test.Rmd",
  output_file = "outdir/out.html")
You get the error:
Error: The directory 'outdir' does not not exist.
Execution halted
Instead, you could do:
output_path <- file.path(normalizePath("."), "outdir/out.html")
rmarkdown::render(
  input = "subdir/test.Rmd",
  output_file = output_path)
...and that should work.
You might think you could just use normalizePath("outdir/out.html"), but you can't, because that function only works when the path already exists. You also might think you could do this:
rmarkdown::render(
  input = "subdir/test.Rmd",
  output_file = file.path(normalizePath("."), "outdir/out.html"))
but you can't, because R only gets around to interpreting the value of output_file once the working directory has already been changed.
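If you render often, a small wrapper avoids tripping over this. A sketch (render_to is a made-up name, not part of rmarkdown):
render_to <- function(input, output_file, ...) {
  # resolve against the caller's working directory, eagerly,
  # before render() temporarily switches to the input's directory
  out <- file.path(normalizePath(getwd()), output_file)
  dir.create(dirname(out), recursive = TRUE, showWarnings = FALSE)
  rmarkdown::render(input = input, output_file = out, ...)
}
render_to("subdir/test.Rmd", "outdir/out.html")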

Related

R renaming file extension

I have tried looking at File extension renaming in R and using the script there, without any luck. My question is very much the same.
I have a bunch of files with a file extension that I want to change. I have used the following code but cannot get the last step to work.
I know similar questions have been asked before but I'm simply stuck and therefore reaching out anyway.
startingDir<-"/Users/anders/Documents/Juni 2019/DATA"
endDir<-"/Users/anders/Documents/Juni 2019/DATA/formatted"
#List over files in startingDir with the extension .zipwblibcurl that I want to replace
old_files<-list.files(startingDir,pattern = "\\.zipwblibcurl")
#View(old_files)
#Renaming the file extension and making a new list i R changing the file extension from .zipwblibcurl to .zip
new_files <- gsub(".zipwblibcurl", ".zip", old_files)
#View(new_files)
#Replacing the old files in startingDir. Eventually I would like to move them to endDir. For simplicity I have just tried as in the other post, without any luck:...
file.rename( old_files, new_files)
After running file.rename I get the output FALSE for every entry.
The full answer here, including the comment from @StephaneLaurent: make sure that you have full.names = TRUE inside list.files(); otherwise only the file name is captured, not the path to the file.
Full working snippet:
old <- list.files(startingDir,
                  pattern = "\\.zipwblibcurl",
                  full.names = TRUE)
# replace the file names
new <- gsub(".zipwblibcurl", ".zip", old )
# Rename old files names to the new file names
file.rename(old, new)
Like @StéphaneLaurent said, it's most likely that R is looking for the files in the current working directory and can't find them. You can correct this by adding the directories explicitly:
file.rename(paste(startingDir, old_files, sep = "/"), paste(endDir, new_files, sep = "/"))

Unzip failing due to long name in zipped folder

I want to be able to read and edit spatial SQLite tables that are downloaded from a server. These come compressed.
These zip files contain a folder whose name encodes information about the model that was run, and as such the names can sometimes be quite long.
When this folder name gets too long, unzipping the archive fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of a way to create a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the offending subfolder
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code works along these lines:
list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE, include.dirs = TRUE)
zip_path <- file.path(getwd(), list_zips[i])
# extract next to the archive, into a folder named after the zip (minus ".zip")
unzip(zipfile = zip_path,
      exdir = substr(zip_path, 1, nchar(zip_path) - 4))
But I would prefer to load the spatial file directly, without unzipping, such as:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(file.path(getwd(), list_zips[i]), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
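One untested sketch, not from the thread: unz() returns a connection, but readOGR() expects a path on disk, so extracting only the .sqlite member with junkpaths = TRUE sidesteps the long subfolder name entirely:
library(rgdal)
# list the archive, pick the sqlite member, extract it flat into tempdir()
sq_path <- unzip(list_zips[i], list = TRUE)$Name[2]
unzip(list_zips[i], files = sq_path, exdir = tempdir(), junkpaths = TRUE)
vectorImport <- readOGR(dsn = file.path(tempdir(), basename(sq_path)),
                        layer = "micro_climate_grid")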

Data table fread with zip file in other directory with spaces in the name

I am trying to read a csv inside a zip file using the command fread("unzip -cq file.zip"), which works perfectly when the file is in my working directory.
But when I specify the path to the file without changing directory, say fread("unzip -cq C:/Users/My user/file.zip"), I get the error: unzip: cannot find either C:/Users/My or C:/Users/My.zip.
The reason this happens is that there are spaces in my path, but what would be the workaround?
The only option I have thought of is to change to the directory where each file is located and read it from there, but this is not ideal.
I use shQuote for this, like...
fread_zip = function(fp, silent = FALSE){
  qfp = shQuote(fp)
  patt = "unzip -cq %s"
  thecall = sprintf(patt, qfp)
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
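For example, with the path from the question:
# prints the shell call, then reads the csv into a data.table
dt <- fread_zip("C:/Users/My user/file.zip")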
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp,
                      unzip_dir = file.path(dirname(fp),
                                            sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))),
                      silent = FALSE,
                      do_cleanup = TRUE){
  # only tested on windows
  # fp should be the path to mycsvs.zip
  # unzip_dir should be used only for CSVs from inside the zip
  dir.create(unzip_dir, showWarnings = FALSE)
  # unzip
  unzip(fp, overwrite = TRUE, exdir = unzip_dir)
  # list files, read separately
  # not looking recursively, since csvs should be only one level deep
  fns = list.files(unzip_dir)
  if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
  res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
  if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
  res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
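Hypothetical usage (mycsvs.zip is a placeholder path):
# returns a named list of data.tables, one per csv inside the zip
tables <- fread_zips("C:/data/mycsvs.zip")
names(tables)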
Try assigning the location to a variable and using paste0 to build the call, like below:
myVar<-"C:/Users/Myuser/"
fread(paste0("unzip -cq ",myVar,"file.zip"))

R download.file() rename the downloaded file, if the filename already exists

In R, I am trying to download files off the internet using the download.file() command in a simple script (I am a complete newbie). The files download properly. However, if a file already exists at the download destination, I'd like to rename the downloaded file with an increment, as opposed to the overwrite which seems to be the default behaviour.
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
nse.destfile = paste0(nse.folder,"fo04FEB2016bhav.csv.zip")
download.file(nse.url,nse.destfile,mode = "wb",method = "libcurl")
Problem w.r.t. this specific code: if "fo04FEB2016bhav.csv.zip" already exists, how do I get, say, "fo04FEB2016bhav.csv(2).zip"?
A general answer to the problem (and not just for the code mentioned above) would be appreciated, as such a bottleneck could come up in other situations too.
The function below will automatically assign the filename based on the file being downloaded. It will check the folder you are downloading to for the presence of a similarly named file. If it finds a match, it will add an incrementation and download to the new filename.
ekstroem's suggestion to fiddle with the curl settings is probably a much better approach, but I wasn't clever enough to figure out how to make that work.
download_without_overwrite <- function(url, folder)
{
  filename <- basename(url)
  base <- tools::file_path_sans_ext(filename)
  ext <- tools::file_ext(filename)

  file_exists <- grepl(base, list.files(folder), fixed = TRUE)

  if (any(file_exists))
  {
    filename <- paste0(base, " (", sum(file_exists), ")", ".", ext)
  }

  download.file(url, file.path(folder, filename), mode = "wb", method = "libcurl")
}

download_without_overwrite(
  url = "https://raw.githubusercontent.com/nutterb/redcapAPI/master/README.md",
  folder = "[path_to_folder]")
Try this:
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
#Get file name from url, with file extention
fname.x <- gsub(".*/(.*)", "\\1", nse.url)
#Get file name from url, without file extention
fname <- gsub("(.*)\\.csv.*", "\\1", fname.x)
#Get xtention of file from url
xt <- gsub(".*(\\.csv.*)", "\\1", fname.x)
#How many times does the the file exist in folder
exist.times <- sum(grepl(fname, list.files(path = nse.folder)))
if(exist.times){
# if it does increment by 1
fname.x <- paste0(fname, "(", exist.times + 1, ")", xt)
}
nse.destfile = paste0(nse.folder, fname.x)
download.file(nse.url, nse.destfile, mode = "wb",method = "libcurl")
Issues
This approach will not work in cases where part of the file name already exists. For example, if you have url/test.csv.zip and the folder contains a file testABC1234blahblah.csv.zip, it will think the file already exists, so it will save it as test(2).csv.zip.
You will need to change the "How many times does the file exist in the folder" part of the code accordingly.
This is not a proper answer and shouldn't be considered as such, but the comment section above was too small to write it all.
I thought the -O -n options to curl could be used too, but now that I have looked at it more closely it turns out that this isn't implemented yet. wget, on the other hand, automatically increments the filename when downloading a file that already exists. However, setting method = "wget" doesn't work with download.file because you are forced to set the destination file name, and once you do that you overwrite the automatic file increments.
I like the solution that @Benjamin provided. Alternatively, you can use
system(paste0("wget ", nse.url))
to get the file through the system (provided that you have wget installed) and let wget handle the increment.

Why doesn't knitr respect RStudio project details?

In all other cases, when I am working within an RStudio project, I can make references relative to the project root in scripts. So I can, for example, dfX = read.csv("Data/somefile.csv"), where the folder Data is relative to my project root.
The same code in a knitr chunk does not find the file. I guess this is because knitr creates a bunch of temporary directories that it refers to relative to the file location. Is there an easy way to change this behavior? Obviously, I would not like to hardcode the entire path to the project folder -- I am aware that I can easily do this using knitr::opts_knit$set(root.dir = rootPath), but that completely breaks maintainability across machines and OSs.
Edit: This seems closely linked to this question.
Presumably you know the path to the package directory when you call 'knit', so how about:
ENV <- new.env()
assign("workingDirectory", getwd(), envir = ENV)
knitr::knit(...,
            # THE ENVIRONMENT IN WHICH THE CODE CHUNKS ARE TO BE EVALUATED
            envir = ENV)
Then in your Rmd file you can do:
```{r}
print(workingDirectory)
```
If you're searching for the location of the current install, you can use:
PATH = NULL
for (libPath in .libPaths())
  if ('myPackage' %in% list.dirs(libPath, FALSE, FALSE)){
    PATH = file.path(libPath, 'myPackage')
  }
if (is.null(PATH))
  stop('could not find package directory')

ENV <- new.env()
assign("workingDirectory", PATH, envir = ENV)
knitr::knit(...,
            # THE ENVIRONMENT IN WHICH THE CODE CHUNKS ARE TO BE EVALUATED
            envir = ENV)
My guess is that the document that you are "knitting" is in a subdirectory itself. It seems that, when you click "Knit PDF", RStudio or knitr will setwd() to the directory containing the file being knitted. So you may need to do something like dfX = read.csv("../Data/somefile.csv") to get the reference right.
I have a working example here.
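A possible middle ground, not from the original answers: here::here() resolves the project root at run time, so setting it as knitr's root.dir keeps chunk paths project-relative without hardcoding anything machine-specific. A one-line sketch for the document's setup chunk:
knitr::opts_knit$set(root.dir = here::here())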
