Can someone please explain what "extdata" means in R?
For instance, I was looking at the "cronR" library in R (used for automatically scheduling jobs), and came across the term "extdata":
library(cronR)
f <- system.file(package = "cronR", "extdata", "helloworld.R")
cmd <- cron_rscript(f)
cmd
cron_add(command = cmd, frequency = 'minutely',
id = 'test1', description = 'My process 1', tags = c('lab', 'xyz'))
cron_add(command = cmd, frequency = 'daily', at='7AM', id = 'test2')
cron_njobs()
cron_ls()
cron_clear(ask=TRUE)
cron_ls()
Similarly, the "taskscheduleR" package (also used for automatically scheduling jobs) also makes reference to "extdata":
library(taskscheduleR)
myscript <- system.file("extdata", "helloworld.R", package = "taskscheduleR")
## run script once within 62 seconds
taskscheduler_create(taskname = "myfancyscript", rscript = myscript,
schedule = "ONCE", starttime = format(Sys.time() + 62, "%H:%M"))
My Question: Can someone please explain what is "extdata"? Is this just some "formality" that needs to be added to the "system.file()" command? Can someone please explain its relevance here?
Thanks!
References:
https://cran.r-project.org/web/packages/cronR/cronR.pdf
https://cran.r-project.org/web/packages/taskscheduleR/vignettes/taskscheduleR.html
This is a convention, not a formally defined term. (However, it's a convention followed by the package authors and built into the package structure: files that a package keeps in inst/extdata/ in its sources end up in an extdata/ folder of the installed package. It's not something you can change unless you mess around with the package structure yourself.) "extdata" is presumably short for "external data".
However, this doesn't mean that you need to use "extdata" when you are structuring your own code; you only need it when finding the files that are included with the package. cron_rscript("~/my_cron_jobs/foo.R") should work fine (provided you actually have something there, and provided that the ~ (home directory) shortcut works across operating systems, which I think it does).
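A minimal sketch for checking that caveat on your own machine: path.expand() (base R) replaces a leading ~ with your home directory, and file.exists() tells you whether the script is actually there before you hand the path to cron_rscript(). The path ~/my_cron_jobs/foo.R is only the hypothetical example from the paragraph above.

```r
# Hypothetical location of your own script (not something shipped with cronR)
my_script <- path.expand("~/my_cron_jobs/foo.R")

# The leading "~" has been replaced by the home directory on this machine
print(my_script)

# Only pass the path on (e.g. to cronR::cron_rscript()) if the file exists
if (file.exists(my_script)) {
  message("Found script at ", my_script)
} else {
  message("No script at ", my_script, "; create it or fix the path first")
}
```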
system.file() takes a package argument, but otherwise strings its arguments together into a file path; i.e. system.file(package = "cronR", "extdata", "helloworld.R") means
look in the system folder that R has set up for the cronR package (in my case that is /usr/local/lib/R/site-library/cronR, but the precise location will vary by OS and configuration)
within that folder look in the extdata folder
within that folder look for helloworld.R
So this command will refer in my case to /usr/local/lib/R/site-library/cronR/extdata/helloworld.R.
Since "/" works as a path separator (at least when used from within R) for all current operating systems, you would get the same results from system.file(package="cronR", "extdata/helloworld.R")
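You can check these two points without installing cronR. The sketch below uses the base stats package as a stand-in; like every installed package, it has a Meta/package.rds file in its install folder. The mustWork = TRUE argument is part of system.file() itself: it turns the silent empty-string result for a missing file into an error.

```r
# Install folder of a package that is always available
pkg_dir <- system.file(package = "stats")

# Separate components and a single "/"-joined string give the same path
a <- system.file("Meta", "package.rds", package = "stats")
b <- system.file("Meta/package.rds", package = "stats")
identical(a, b)                                           # TRUE
identical(a, file.path(pkg_dir, "Meta", "package.rds"))   # TRUE

# A file that is not inside the package yields "" rather than an error ...
missing_file <- system.file("extdata", "helloworld.R", package = "stats")
missing_file == ""                                        # TRUE

# ... unless you ask system.file() to fail loudly
err <- tryCatch(
  system.file("extdata", "helloworld.R", package = "stats", mustWork = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)
```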
I have a bunch of sf objects I'd like to export to GDB from R. I'm running R 4.0.2 on Windows 10. In this case the sf objects are all vector point data. The main reasons to export to GDB are to keep longer field names (the shapefile truncation is very annoying), and because GDBs are more desirable storage locations for our workflows.
Yes, I know about the ArcGisBinding package. I've got it to work in a test script but it's pretty unstable - often crashing and requiring a restart of R. This is a problem, because the sf objects I'd like to export come after an already long Rmd that reads in, formats and cleans the data. So it's not a simple matter of re-running the script until arc.write doesn't break. I could break up the script, but then I'd still have to read in a bunch of shapefiles. One option I haven't yet explored is using reticulate to call a python script instead of trying to do everything in R, but we're trying to do our analysis all in one place, if possible.
I'm pretty sure I've managed to set up RPyGeo appropriately, first setting my python path using the reticulate package. I'm doing it this way because IT restrictions mean I can't edit PATH variables on my machine.
#package calls
library(sf)
library(spData)
library(reticulate)
#set python version in reticulate
py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
reticulate::use_python(python = py_path, required = TRUE)
#call RPyGeo
library(RPyGeo) # for potential point export
#output gdb
out.gdb <- "C:/LOCAL_PROJECTS/Output/Output.gdb"
#RPyGeo Parameters
# Note that, in order to use RPyGeo you need a working ArcMap or ArcGIS Pro installation on your computer.
# python path - note that this will change depending on which version of Arc one is using
# py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
arcpy <- rpygeo_build_env(workspace = out.gdb,
overwrite = TRUE,
extensions = c("Spatial","DataInteroperability"),
path = py_path)
I've tried a bunch of different tools to export an sf object, here using dummy data also used in the RPyGeo vignette
data(nz, package = "spData")
arcpy$Copy_management(in_data = nz,out_data = "nz_test")
arcpy$Copy_management(in_data = nz,out_data = file.path(out.gdb,"nz"))
arcpy$FeatureClassToGeodatabase_conversion(Input_Features = nz,Output_Geodatabase = out.gdb)
arcpy$FeatureClassToFeatureClass_conversion(in_features = nz,out_path = out.gdb,out_name = "nz")
arcpy$QuickExport_interop(Input = nz,Output = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = "nz")
Each time I get an error, for example:
Error in py_call_impl(callable, dots$args, dots$keywords) :
RuntimeError: Object: Error in executing tool
Detailed traceback:
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3232, in CopyFeatures
raise e
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3229, in CopyFeatures
retval = convertArcObjectToPythonObject(gp.CopyFeatures_management(*gp_fixargs((in_features, out_feature_class, config_keyword, spatial_grid_1, spatial_grid_2, spatial_grid_3), True)))
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\_base.py", line 511, in <lambda>
return lambda *args: val(*gp_fixargs(args, True))
I'm not an expert in ArcPy by any means. Nor am I an expert in tracing errors inside packages. Am I making a simple syntax mistake? Is there something else that I'm missing? Any help would be much appreciated!
I am trying to use R's taskscheduleR package to download data using a script every tenth of a minute (every 6 seconds). To do this, I have a script named getwmatadata.R which downloads data from an API, and I am trying to call this script using taskscheduleR based on the following link: https://github.com/bnosac/taskscheduleR
However, my script below is not working; I get the following error:
Error in taskscheduler_create(taskname = "wmatadata", rscript = wmatapinger, :
File does not exist
Below is how I'm trying to run taskscheduleR:
library(taskscheduleR)
wmatapinger <- system.file("extdata", "getwmatadata.R", package = "taskscheduleR")
taskscheduler_create(taskname = "wmatadata", rscript = wmatapinger, schedule = "MINUTE", starttime = "05:00", modifier = 0.1)
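Just configure the path to your script using file.path() and don't use system.file(): system.file() only finds files installed inside a package (here, taskscheduleR's own extdata folder), not your own scripts.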
Solution:
wmatapinger <- file.path("C:", "name_of_the_folder", "wmatapinger.R")
See ?file.path for how the path is constructed: file.path() joins its comma-separated arguments with a forward slash (/).
Your next line is fine and now it should work.
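This shows why the original call failed and what file.path() produces instead. system.file() only looks inside an installed package, so asking it for a script that lives elsewhere returns an empty string. taskscheduler_create() then reports that as a missing file. The sketch uses the base stats package as a stand-in for taskscheduleR so it runs anywhere. The folder name is the placeholder from the answer above.

```r
# Your script is not part of any package, so system.file() finds nothing
system.file("extdata", "getwmatadata.R", package = "stats")   # ""

# file.path() just joins its pieces with "/" (on Windows too)
wmatapinger <- file.path("C:", "name_of_the_folder", "wmatapinger.R")
wmatapinger   # "C:/name_of_the_folder/wmatapinger.R"
```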
I was getting the same error. Although it took several attempts (I kept getting the error "file does not exist"), I was finally able to solve it by scheduling it via the GUI add-in.
If you're using RStudio, go to Tools → Addins → "Schedule R scripts on…". This eventually worked for me.
Check whether your .R file exists at the path that you specified:
file.exists(wmatapinger)
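To fail early with a clearer message, you could wrap that check in a small helper. check_script() is a hypothetical name, not part of taskscheduleR. It also returns an absolute, normalized path, on the assumption that a relative path may not resolve from whatever working directory the scheduled task starts in.

```r
# Hypothetical helper: stop early if the script is missing, otherwise
# return its absolute, normalized path
check_script <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("R script not found: '", path, "'", call. = FALSE)
  }
  normalizePath(path, winslash = "/")
}

# Usage sketch (the path is a placeholder):
# wmatapinger <- check_script("C:/name_of_the_folder/wmatapinger.R")
```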
One possible solution that is easy to implement:
library(taskscheduleR)
taskscheduler_create(taskname = "ABC",
                     rscript = "C:/full/path/to/your_script.R",  # placeholder: full path of your script
                     schedule = "DAILY",
                     starttime = "23:45",
                     startdate = format(Sys.Date(), "%d/%m/%Y"))
I have an inconsistency issue which I cannot explain when running an R script. I am not able to produce a reproducible example because there is a whole set of files/functions called by the entry script.
Using Rscript or RStudio with R v3.1.2 I obtain the results I'm expecting; however, when calling R CMD BATCH from bash my script does not produce identical output. From bash, R seems to read the command line arguments correctly and reports them from the script, BUT only the Rscript and RStudio source methods seem to use the parameter correctly in my code.
The 2 command line calls are as follows:
Rscript ./script/forecast_category_script.R "category='razors'" "cores=4L"
R CMD BATCH --no-save "--args category='razors' cores=4L" ./script/forecast_category_script.R ~/data/output/out.out
Is there any obvious reason why these inconsistencies might be occurring? I'd prefer to use R CMD BATCH as it redirects output to a file and when I migrate my code to the university cluster as a batch job through the scheduler I'd like to be able to follow what it has done.
UPDATE: changing this line resolves it but why?
Previously I had the following line in there, basically so when I was testing I didn't keep reloading the huge dataset if it was already loaded in my RStudio environment:
if(!exists("spi")) spi = f_load.spi(category = category)
Replaced it with this:
spi = f_load.spi(category = category)
The underlying function f_load.spi remained the same, however:
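# Assumes spi is a data.table (the subsetting below uses data.table syntax)
# and that pth.data.storage is defined elsewhere in the project
f_load.spi = function(spi = NULL, category = "razors" , n=NULL) {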
# check if the data is pre-loaded
if (is.null(spi)) {
fil = paste0(pth.data.storage, "categories/", category, "/", category, ".sp_ss.interp.rds")
print(fil)
spi = readRDS(fil)
}
# subset to a specific set of items
if (!is.null(n)) {
fc.items = unique(spi$fc.item)
rnd = sample(1:length(fc.items), n)
spi = spi[fc.item %in% fc.items[rnd]]
}
spi
}
For some reason the category variable was not being passed through properly into the function and it was loading a different category (beer rather than razors) which was an enormous file and not suitable for testing.
This still doesn't explain why Rscript and R CMD BATCH behaved differently.
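It is possible that one of them is loading a previously saved workspace and using global variables from it. R CMD BATCH runs with --restore by default (you passed --no-save but not --no-restore), so it loads any .RData file in the working directory, whereas Rscript runs with --no-restore. A stale spi object restored that way would make if(!exists("spi")) skip the reload. Have you checked whether it matters which directory you are in, or whether there are any .RData files present? One way to ensure that you don't have any hidden variables is to clear the workspace at the beginning of each script, for example with rm(list=ls()) as the first line of your script, or to add --no-restore to the R CMD BATCH call.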
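A minimal simulation of that pitfall, using two environments in place of two R sessions. load_spi() and run_script() are hypothetical stand-ins (the real f_load.spi() reads an .rds file), and the string "data for beer" plays the role of the stale object restored from .RData.

```r
# Hypothetical stand-in for f_load.spi(): returns a label instead of reading an .rds file
load_spi <- function(category) paste("data for", category)

# Mirrors the original guard, but against an explicit environment
run_script <- function(env, category) {
  if (!exists("spi", envir = env, inherits = FALSE)) {
    assign("spi", load_spi(category), envir = env)
  }
  get("spi", envir = env)
}

# Fresh session, as with Rscript (which runs with --no-restore)
fresh <- new.env()
run_script(fresh, "razors")    # "data for razors"

# Session that restored a stale spi from .RData, as R CMD BATCH does by default
stale <- new.env()
assign("spi", "data for beer", envir = stale)
run_script(stale, "razors")    # "data for beer": the guard skipped the load
```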
Also, you can redirect printed output to a file from a script run with Rscript by using sink().
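A minimal sketch of sink() inside a script. The log file here is a temporary file standing in for something like ~/data/output/out.out. By default sink() only diverts standard output, so messages and warnings still go to the console.

```r
log_file <- tempfile(fileext = ".out")

sink(log_file)                  # start diverting printed output to the file
print("category = razors")
cat("cores:", 4L, "\n")
sink()                          # stop diverting; output goes to the console again

readLines(log_file)
```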
Is there a way to programmatically find the path of an R script inside the script itself?
I am asking this because I have several scripts that use RGtk2 and load a GUI from a .glade file.
In these scripts I am obliged to put a setwd("path/to/the/script") instruction at the beginning, otherwise the .glade file (which is in the same directory) will not be found.
This is fine, but if I move the script to a different directory or to another computer I have to change the path. I know, it's not a big deal, but it would be nice to have something like:
setwd(getScriptPath())
So, does a similar function exist?
This works for me:
getSrcDirectory(function(x) {x})
This defines an anonymous function (that does nothing) inside the script, and then determines the source directory of that function, which is the directory where the script is.
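One caveat: getSrcDirectory() reads the source reference attached to the anonymous function. R only records those when keep.source is on, which is the default in interactive sessions but not under Rscript. The sketch below sources the same throwaway script both ways; the temporary file exists purely for the demo.

```r
# Throwaway script that asks for its own directory
script <- tempfile(fileext = ".R")
writeLines("script_dir <- utils::getSrcDirectory(function(x) {x})", script)

# With source references kept, the anonymous function knows which file it came from
with_src <- new.env()
source(script, local = with_src, keep.source = TRUE)
print(with_src$script_dir)      # the directory containing the script

# Without them (the non-interactive default, e.g. under Rscript) the result is empty
without_src <- new.env()
source(script, local = without_src, keep.source = FALSE)
print(without_src$script_dir)   # character(0)
```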
For RStudio only:
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
This works when Running or Sourcing your file.
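rstudioapi::getActiveDocumentContext() only works inside a running RStudio session, so a guarded version can fall back gracefully elsewhere. rstudio_script_dir() is a hypothetical helper name. rstudioapi::isAvailable() is the package's own check for a live RStudio session. The sketch assumes that an unsaved editor tab reports an empty path.

```r
rstudio_script_dir <- function() {
  # rstudioapi may not be installed, and even if it is, it only works inside RStudio
  if (!requireNamespace("rstudioapi", quietly = TRUE) || !rstudioapi::isAvailable()) {
    return(NA_character_)
  }
  path <- rstudioapi::getActiveDocumentContext()$path
  # Untitled (never saved) documents have an empty path
  if (!nzchar(path)) NA_character_ else dirname(path)
}

script_dir <- rstudio_script_dir()
if (!is.na(script_dir)) setwd(script_dir)
```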
Use source("yourfile.R", chdir = TRUE)
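A self-contained sketch of what chdir = TRUE does. During the source() call, the working directory is temporarily switched to the script's folder, so a relative name like gui.glade resolves next to the script. The old directory is restored afterwards. The folder and file names are made up for the demo.

```r
# Build a throwaway project folder with a script and a file it reads by relative name
proj <- tempfile("proj_")
dir.create(proj)
writeLines("<glade placeholder>", file.path(proj, "gui.glade"))
writeLines('gui_text <- readLines("gui.glade")', file.path(proj, "main.R"))

old_wd <- getwd()
env <- new.env()
source(file.path(proj, "main.R"), local = env, chdir = TRUE)

print(env$gui_text)            # read via the relative path "gui.glade"
identical(getwd(), old_wd)     # TRUE: source() restored the working directory
```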
Exploit the implicit "--file" argument of Rscript
When calling the script using "Rscript" (Rscript doc) the full path of the script is given as a system parameter. The following function exploits this to extract the script directory:
getScriptPath <- function(){
cmd.args <- commandArgs()
m <- regexpr("(?<=^--file=).+", cmd.args, perl=TRUE)
script.dir <- dirname(regmatches(cmd.args, m))
if(length(script.dir) == 0) stop("can't determine script dir: please call the script with Rscript")
if(length(script.dir) > 1) stop("can't determine script dir: more than one '--file' argument detected")
return(script.dir)
}
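A variant of the function above can take the argument vector as a parameter, which lets you see what it parses and test it without launching Rscript. This is a sketch: get_script_dir() is a hypothetical name. The sample vector only imitates the shape of what commandArgs() returns under Rscript; the exact leading entries vary between R versions.

```r
get_script_dir <- function(args = commandArgs(trailingOnly = FALSE)) {
  # Keep only the value of the "--file=" argument that Rscript adds
  file_arg <- sub("^--file=", "", grep("^--file=", args, value = TRUE))
  if (length(file_arg) == 0) stop("can't determine script dir: please call the script with Rscript")
  if (length(file_arg) > 1) stop("can't determine script dir: more than one '--file' argument detected")
  # A relative --file (e.g. "Rscript main.R") gives a relative directory such as "."
  dirname(file_arg)
}

# Imitation of commandArgs() under `Rscript /home/me/proj/main.R --verbose`
fake_args <- c("/usr/lib/R/bin/exec/R", "--no-echo", "--no-restore",
               "--file=/home/me/proj/main.R", "--args", "--verbose")
get_script_dir(fake_args)   # "/home/me/proj"
```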
If you wrap your code in a package, you can always query parts of the package directory.
Here is an example from the RGtk2 package:
> system.file("ui", "demo.ui", package="RGtk2")
[1] "C:/opt/R/library/RGtk2/ui/demo.ui"
>
You can do the same with a directory inst/glade/ in your sources which will become a directory glade/ in the installed package -- and system.file() will compute the path for you when installed, irrespective of the OS.
This answer works fine for me:
script.dir <- dirname(sys.frame(1)$ofile)
Note: script must be sourced in order to return correct path
I found it in: https://support.rstudio.com/hc/communities/public/questions/200895567-can-user-obtain-the-path-of-current-Project-s-directory-
But I still don't understand what sys.frame(1)$ofile is. I didn't find anything about it in the R documentation. Can someone explain it?
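As far as base R's internals go, source() copies its file argument into an undocumented local variable named ofile. sys.frame(1) returns the environment of the first (outermost) call on the stack. When you type source("script.R") at the console, that first call is source() itself, so sys.frame(1)$ofile is the path you passed. The sketch below does not assume source() is frame 1; it looks for the innermost source() frame instead. The temporary file is purely for the demo.

```r
script <- tempfile(fileext = ".R")
writeLines(c(
  "# Which frames on the call stack belong to base::source()?",
  "n <- sys.nframe()",
  "is_src <- vapply(seq_len(n), function(i) identical(sys.function(i), base::source), logical(1))",
  "# Read ofile (source()'s copy of its file argument) from the innermost one;",
  "# when source() is called directly from the console this equals sys.frame(1)$ofile",
  "this_file <- get('ofile', envir = sys.frame(max(which(is_src))))",
  "script_dir <- dirname(this_file)"
), script)

env <- new.env()
source(script, local = env)
identical(env$this_file, script)   # TRUE
```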
#' current script dir
#' @param
#' @return
#' @examples
#' works with source() or in RStudio Run selection
#' @export
z.csd <- function() {
# http://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
# must work with source()
if (!is.null(res <- .thisfile_source())) res
else if (!is.null(res <- .thisfile_rscript())) dirname(res)
# http://stackoverflow.com/a/35842176/2292993
# RStudio only, can work without source()
else dirname(rstudioapi::getActiveDocumentContext()$path)
}
# Helper functions
.thisfile_source <- function() {
for (i in -(1:sys.nframe())) {
if (identical(sys.function(i), base::source))
return (normalizePath(sys.frame(i)$ofile))
}
NULL
}
.thisfile_rscript <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
cmdArgsTrailing <- commandArgs(trailingOnly = TRUE)
cmdArgs <- cmdArgs[seq.int(from=1, length.out=length(cmdArgs) - length(cmdArgsTrailing))]
res <- gsub("^(?:--file=(.*)|.*)$", "\\1", cmdArgs)
# If multiple --file arguments are given, R uses the last one
res <- tail(res[res != ""], 1)
if (length(res) > 0)
return (res)
NULL
}
A lot of these solutions are several years old. While some may still work, there are good reasons against utilizing each of them (see linked source below). I have the best solution (also from source): use the here library.
Original example code:
library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose_funicular/foofy/data")
df <- read.delim("raw_foofy_data.csv")
Revised code
library(ggplot2)
library(here)
df <- read.delim(here("data", "raw_foofy_data.csv"))
This solution is the most dynamic and robust because it works regardless of whether you are using the command line, RStudio, calling from an R script, etc. It is also extremely simple to use and is succinct.
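It may help to know what here() actually does. It does not look for the running script at all. It starts from the working directory at the time the package is loaded and walks up until it finds a project marker, such as an RStudio .Rproj file or a .here file. Below is a stripped-down, hypothetical re-implementation of that idea. find_project_root() is not the here package's code, and it checks only those two markers; the folder names are borrowed from the example above.

```r
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    # Marker files that identify the top of a project (only two of the markers here knows)
    has_marker <- file.exists(file.path(dir, ".here")) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_marker) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) stop("no project root found above ", start)
    dir <- parent
  }
}

# Build a demo project: <root>/.here and <root>/data/raw_foofy_data.csv
root <- tempfile("cuddly_broccoli_")
dir.create(file.path(root, "data"), recursive = TRUE)
invisible(file.create(file.path(root, ".here")))
write.csv(data.frame(x = 1:3), file.path(root, "data", "raw_foofy_data.csv"), row.names = FALSE)

# Starting from a subfolder still finds the same root
proj_root <- find_project_root(file.path(root, "data"))
df <- read.csv(file.path(proj_root, "data", "raw_foofy_data.csv"))
```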
Source: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
I have found something that works for me.
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
How about using system and shell commands? With the Windows version, I think that when you open the script in RStudio, it sets the current shell directory to the directory of the script. You might have to add cd C:\ or whatever drive you want to search (e.g. shell('dir C:\\*file_name /s', intern = TRUE), where \\ escapes the backslash). This only works for uniquely named files unless you further specify subdirectories (for Linux I started searching from /). In any case, if you know how to find something in the shell, this provides a template for finding it from within R and returning the directory. It should work whether you are sourcing or running the script, but I haven't fully explored the potential bugs.
#Get operating system
OS<-Sys.info()
win<-length(grep("Windows",OS))
lin<-length(grep("Linux",OS))
#Find path of data directory
#Linux Bash Commands
if(lin==1){
file_path<-system("find / -name 'file_name'", intern = TRUE)
data_directory<-gsub('/file_name',"",file_path)
}
#Windows Command Prompt Commands
if(win==1){
file_path<-shell('dir file_name /s', intern = TRUE)
file_path<-file_path[4]
file_path<-gsub(" Directory of ","",file_path)
file_path<-gsub("\\\\","/",file_path) # backslashes -> forward slashes
data_directory<-file_path
}
#Change working directory to location of data and sources
setwd(data_directory)
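If the goal is just to find the folder that contains a uniquely named file, base R can search without shelling out. That avoids parsing the OS-specific output of find or dir, including localized strings like "Directory of". This is a sketch: find_file_dir() is a hypothetical helper. Searching from a drive root with recursive = TRUE can be very slow, so start from the narrowest folder you can.

```r
find_file_dir <- function(file_name, search_root) {
  # List every file below search_root and keep exact name matches
  hits <- list.files(search_root, recursive = TRUE, full.names = TRUE)
  hits <- hits[basename(hits) == file_name]
  if (length(hits) == 0) stop("'", file_name, "' not found under ", search_root)
  if (length(hits) > 1) stop("'", file_name, "' is not unique under ", search_root)
  dirname(hits)
}

# Demo in a temporary tree: <tmp>/a/b/file_name
search_root <- tempfile("search_")
dir.create(file.path(search_root, "a", "b"), recursive = TRUE)
invisible(file.create(file.path(search_root, "a", "b", "file_name")))

data_directory <- find_file_dir("file_name", search_root)
```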
Thank you for the function, though I had to adjust it a little, as follows (Windows 10 with German system language, where dir prints "Verzeichnis von" instead of "Directory of"):
#Windows Command Prompt Commands
if(win==1){
file_path<-shell('dir file_name', intern = TRUE)
file_path<-file_path[4]
file_path<-gsub(" Verzeichnis von ","",file_path)
file_path<-chartr("\\","/",file_path)
data_directory<-file_path
}
In my case, I needed a way to copy the executing file to back up the original script together with its outputs. This is relatively important in research. What worked for me while running my script on the command line was a mixture of other solutions presented here, which looks like this:
library(scriptName)
file_dir <- gsub("\\", "/", fileSnapshot()$path, fixed=TRUE)
file.copy(from = file.path(file_dir, scriptName::current_filename()) ,
to = file.path(new_dir, scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name to help distinguish that file from the source, like this:
file.copy(from = file.path(file_dir, current_filename()),
          # timestamp without ":" so the file name is also valid on Windows
          to = file.path(new_dir, subDir, paste0(tools::file_path_sans_ext(current_filename()), "_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".R")))
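Here is a base-R sketch of the same backup step with a filesystem-safe timestamp. Pasting Sys.time() directly into a file name puts colons in it, which Windows does not allow, and it also places the timestamp after the .R extension. backup_script() and its arguments are hypothetical. You would pass it whatever script path you obtained, e.g. from scriptName::current_filename() or another method in this thread.

```r
backup_script <- function(script_path, backup_dir, time = Sys.time()) {
  dir.create(backup_dir, recursive = TRUE, showWarnings = FALSE)
  stamp <- format(time, "%Y%m%d_%H%M%S")          # no ":" in the name
  base <- tools::file_path_sans_ext(basename(script_path))
  dest <- file.path(backup_dir, paste0(base, "_", stamp, ".R"))
  # Refuse to overwrite an existing backup with the same timestamp
  if (!file.copy(script_path, dest, overwrite = FALSE)) {
    stop("could not copy ", script_path, " to ", dest)
  }
  dest
}

# Usage sketch with a throwaway script and backup folder
src <- tempfile(fileext = ".R")
writeLines("x <- 1", src)
copy <- backup_script(src, tempfile("backup_"))
```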
None of the solutions given so far work in all circumstances. Worse, many solutions use setwd(), and thus break code that expects the working directory to be, well, the working directory, i.e. the directory that the user of the code chose. (I realise that the question asks about setwd(), but this doesn’t change the fact that this is generally a bad idea.)
R simply has no built-in way to determine the path of the currently running piece of code.
A clean solution requires a systematic way of managing non-package code. That’s what ‘box’ does. With ‘box’, the directory relative to the currently executing code can be found trivially:
box::file()
However, that isn’t the purpose of ‘box’; it’s just a side-effect of what it actually does: it implements a proper, modern module system for R. This includes organising code in (nested) modules, and hence the ability to load code from modules relative to the currently running code.
To load code with ‘box’ you wouldn’t use e.g. source(file.path(box::file(), 'foo.r')). Instead, you’d use
box::use(./foo)
However, box::file() is still useful for locating data (i.e. OP’s use-case). So, for instance, to locate a file mygui.glade from the current module’s path, you would write:
glade_path = box::file('mygui.glade')
And (as long as you’re using ‘box’ modules) this always works, doesn’t require any hacks, and doesn’t use setwd.