Is it possible to stop `Rscript` cleaning up its `tempdir`? - r

I'm using R, via Rscript and H2O, but H2O is crashing. I want to review the logs, but the R tempdir that contains them seems to be removed when the R session ends (i.e. when the Rscript finishes).
Is it possible to tell R/Rscript not to remove the tmp folder it uses?

A workaround is to use on.exit() to copy the temporary files to a different directory before they are removed. An example function would look like this:
ranfunction <- function() {
  # Register everything in a single on.exit() call so the steps run in order
  on.exit({
    # Get the list of files in tempdir
    templist <- list.files(tempdir(), full.names = TRUE, pattern = "^file")
    # Create a new directory for the files to go to on exit
    dir1 <- file.path("G:", "testdir")
    dir.create(dir1, showWarnings = FALSE)
    # Copy each file in templist to the new directory
    file.copy(templist, dir1)
  })
  # ... the work that writes to tempdir() goes here ...
}
ranfunction()
Note that dir.create() warns if dir1 already exists, for example when the script is rerun; showWarnings = FALSE suppresses that warning. Files that were already copied are not overwritten unless you pass overwrite = TRUE to file.copy().

Related

Running a same script in sub folders

In my main folder I have many subfolders (AA, BB, CC, DD, etc.), and each one contains a script with the same name, run_script.R. I want to run this script in every folder; the number of folders can vary.
My code works, but it runs in the first folder only, and I want it to run in every folder.
Also, when I use setwd(folder), I get this error:
Error in setwd(folder) : cannot change working directory
data_folder <- "C:/Users/mosho/Desktop/New folder (2)/"
allfolders <- data.frame(Folders = list.dirs(path = data_folder, recursive = F, full.names = F))
r_scripts <- "run_script.R"
for (folder in allfolders$Folders) {
#setwd(folder)
message(folder)
source(paste0(data_folder,folder,"/",r_scripts))
}
You are on the right path. I made some minor tweaks to your script that resolve the issue. The points missing from your script are:
allfolders contains only the folder names, not the full paths. setwd() needs a path that is valid from the current working directory, so passing only the folder name fails unless the current working directory contains that folder. In any case, it is best practice to work with full path names.
Storing allfolders as a character vector rather than a data frame makes iterating over it simpler.
Below is my worked example.
I created some dummy folders (DIC01, DIC02, DIC03, ...) under the path "C:\Users\XXXXXX\Documents\TEST MAIN" and placed run_script.R inside each one. This run_script.R contains the simple code print("Hello World !!").
Next, I set the initial working directory to the path where all the folders are, i.e. "C:\Users\XXXXXX\Documents\TEST MAIN". Then I listed the directories under this path as a character vector instead of a data frame. Finally, a for loop iterates over the folder names; inside it, we set the working directory to each folder and source the R code.
data_folder <- "C:\\Users\\XXXXXX\\Documents\\TEST MAIN"
setwd(data_folder)
allfolders <- list.dirs(path = data_folder, recursive = F, full.names = F)
r_scripts <- "run_script.R"
for (folder in allfolders) {
print(folder)
setwd(paste0(data_folder,"\\",folder))
source(paste0(data_folder,"\\",folder,"\\",r_scripts))
}
After execution, the output shows the name of each directory followed by the result of its script.
I hope this resolves your problem. If so, please upvote the answer and let me know.
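A variant of the same loop that avoids calling setwd() by hand is sketched below. It assumes each subfolder contains a run_script.R; the temporary directory and the folder names AA and BB are made up for the demonstration. source(chdir = TRUE) temporarily switches the working directory to the script's folder while the script runs.

```r
# Build a throwaway folder tree so the example is self-contained
# (the folder names AA and BB are hypothetical).
data_folder <- file.path(tempdir(), "main_folder")
for (name in c("AA", "BB")) {
  dir.create(file.path(data_folder, name), recursive = TRUE, showWarnings = FALSE)
  writeLines("ran_in <- basename(getwd())",
             file.path(data_folder, name, "run_script.R"))
}

# full.names = TRUE returns paths that are valid from any working directory
allfolders <- list.dirs(data_folder, recursive = FALSE, full.names = TRUE)

visited <- character(0)
for (folder in allfolders) {
  script <- file.path(folder, "run_script.R")
  if (!file.exists(script)) next   # skip folders without the script
  # chdir = TRUE: the working directory is the script's folder while it runs
  source(script, chdir = TRUE)
  visited <- c(visited, ran_in)    # ran_in is set by each demo script
}
print(visited)
```

Because the working directory is restored after each source() call, the loop does not depend on where it was started from.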

Iteration over a non-existing file in the directory

I have around five files in my directory that I want to read in R. Each file has a name pattern: "filex.html", where x=1,2,3 and so on. However, a few files are missing. I wanted to create a loop to read all the files and, whenever a file does not exist, jump to the next file in the sequence. However, my loop stops whenever it encounters the first non-existing file.
Following is the loop.
ids = c(1:10)
for (i in ids) {
myurl = paste("mypage",i,".html")
myurl = gsub(" ","",myurl)
pointer = read_html(myurl)
if(is_null(pointer)){
next
}
}
This is the error.
Error: 'mypage3.html' does not exist in current working directory ('E:/My_projects/mydb').
How can I make my loop skip over the non-existing files?
Instead of looping over your ids vector, which may include non-existent files, try to lapply over a list of the actual files, obtained from list.files().
You can use a pattern to get only the HTML files: list.files(pattern = "\\.html$"). The pattern is a regular expression, not a glob, so "*.html" is not the right way to match the extension.
Here is an example:
html_files <- list.files(pattern = "\\.html$")
pages <- lapply(html_files, function(x) {
  read_html(x)
})
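If you would rather keep the original loop over ids, a file.exists() check before reading lets the loop skip the gaps. This is only a sketch: the scratch directory and the set of files that exist (1 and 3) are invented for the demonstration, and readLines() stands in for read_html() so that no package is needed.

```r
# Work in a scratch directory; only mypage1.html and mypage3.html exist here
scratch <- file.path(tempdir(), "html_demo")
dir.create(scratch, showWarnings = FALSE)
for (i in c(1, 3)) {
  writeLines("<html><body>page</body></html>",
             file.path(scratch, paste0("mypage", i, ".html")))
}

ids <- 1:5
pages <- list()
for (i in ids) {
  myurl <- file.path(scratch, paste0("mypage", i, ".html"))
  if (!file.exists(myurl)) {
    message("Skipping missing file: ", basename(myurl))
    next
  }
  # readLines() stands in for read_html() so the sketch needs no packages
  pages[[basename(myurl)]] <- readLines(myurl)
}
names(pages)
```

The check has to come before the read: in the original loop read_html() raised the error before the is_null() test was ever reached.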

Read all files in specific folder in R

I am trying to read all files in a specific sub-folder of the wd. I have been able to add a for loop successfully, but the loop only looks at files within the wd. I thought the command line:
directory <- 'folder.I.want.to.look.in'
would enable this but the script still only looks in the wd. However, the above command does help create a list of the correct files. I have included the script below that I have written but not sure what I need to modify to aim it at a specific sub-folder.
directory <- 'folder.I.want.to.look.in'
files <- list.files(path = directory)
out_file <- read_excel("file.to.be.used.in.output", col_names = TRUE)
for (filename in files){
show(filename)
filepath <- paste0(filename)
## Import data
data <- read_excel(filepath, skip = 8, col_names = TRUE)
data <- data[, -c(6:8)]
further script
}
The further script is irrelevant to this question and works fine. I just can't get the loop to look over each file in files from directory. Many thanks in advance
Set your base directory, then use it to create a vector of all the files with list.files, e.g.:
base_dir <- 'path/to/my/working/directory'
all_files <- list.files(base_dir, recursive = TRUE, full.names = TRUE)
Then just loop over all_files. By default, list.files has recursive = FALSE, i.e., it returns only the files and directory names directly inside the directory you specify, rather than descending into each subfolder. It also has full.names = FALSE by default, so the returned names are relative to base_dir; this is why read_excel looked for them in the working directory. Setting full.names = TRUE prepends base_dir to each path.
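Applied to the loop in the question, the fix might look like the sketch below. The scratch folder and its two CSV files are invented so that the example runs anywhere, and read.csv() stands in for read_excel(); the point is only that each path returned with full.names = TRUE can be opened directly.

```r
# Scratch folder standing in for 'folder.I.want.to.look.in' (hypothetical contents)
directory <- file.path(tempdir(), "folder_to_read")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(directory, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(directory, "b.csv"), row.names = FALSE)

# full.names = TRUE prepends the directory, so each path can be opened directly
files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

row_counts <- integer(0)
for (filepath in files) {
  data <- read.csv(filepath)   # read_excel(filepath, ...) in the original
  row_counts[basename(filepath)] <- nrow(data)
}
print(row_counts)
```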

No result from system2 command

I have a script that I am using to analyse some audio data in RStudio. The script runs with no errors; however, when I execute it, it simply does nothing. I think it may be an issue with the system2 command. Any help is much appreciated, as I am at a loss without even an error message to go by.
Here is my code:
# Set the directory containing the files
directory <- "D://Alto//Audio 1//Day 2//"
# The directory to store the results
base_output_directory <- "D://Ecoacoustics//Output//indices_output//"
# Get a list of audio files inside the directory
# (Get-ChildItem is just like ls, or dir)
files <- list.files(directory, pattern = "*.wav", full.names = TRUE)
# iterate through each file
for(file in files) {
message("Processing ", file)
# get just the name of the file
file_name <- basename(file)
# make a folder for results
output_directory <- normalizePath(file.path(base_output_directory, file_name))
dir.create(output_directory, recursive = TRUE)
# prepare command
command <- sprintf('audio2csv "%s" "Towsey.Acoustic.yml" "%s" ', file, output_directory)
# finally, execute the command
system2('C://Ecoacoustics//AnalysisPrograms//AnalysisPrograms.exe//', command)
}

Get the path of current script

I would like to set the working directory to the path of current script programmatically but first I need to get the path of current script.
So I would like to be able to do:
current_path = ...retrieve the path of current script ...
setwd(current_path)
Just like the RStudio menu does:
So far I tried:
initial.options <- commandArgs(trailingOnly = FALSE)
file.arg.name <- "--file="
script.name <- sub(file.arg.name, "", initial.options[grep(file.arg.name, initial.options)])
script.basename <- dirname(script.name)
script.name returns NULL
source("script.R", chdir = TRUE)
Returns:
Error in file(filename, "r", encoding = encoding) : cannot open the
connection In addition: Warning message: In file(filename, "r",
encoding = encoding) : cannot open file '/script.R': No such file or
directory
dirname(parent.frame(2)$ofile)
Returns: Error in dirname(parent.frame(2)$ofile) : a character vector argument expected
...because parent.frame is null
frame_files <- lapply(sys.frames(), function(x) x$ofile)
frame_files <- Filter(Negate(is.null), frame_files)
PATH <- dirname(frame_files[[length(frame_files)]])
Returns: Null because frame_files is a list of 0
thisFile <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
needle <- "--file="
match <- grep(needle, cmdArgs)
if (length(match) > 0) {
# Rscript
return(normalizePath(sub(needle, "", cmdArgs[match])))
} else {
# 'source'd via R console
return(normalizePath(sys.frames()[[1]]$ofile))
}
}
Returns: Error in path.expand(path) : invalid 'path' argument
Also I saw all answers from here, here, here and here.
No joy.
Working with RStudio 1.1.383
EDIT: It would be great if there was no need for an external library to achieve this.
In RStudio, you can get the path to the file currently shown in the source pane using
rstudioapi::getSourceEditorContext()$path
If you only want the directory, use
dirname(rstudioapi::getSourceEditorContext()$path)
If you want the name of the file that's been run by source(filename), that's a little harder. You need to look for the variable srcfile somewhere back in the stack. How far back depends on how you write things, but it's around 4 steps back: for example,
fi <- tempfile()
writeLines("f()", fi)
f <- function() print(sys.frame(-4)$srcfile)
source(fi)
fi
should print the same thing on the last two lines.
Update March 2019
Based on the answers by Alexis Lucattini and user2554330, this version works both from the command line and in RStudio. It also avoids the "as_tibble" deprecation message.
library(tidyverse)
getCurrentFileLocation <- function()
{
this_file <- commandArgs() %>%
tibble::enframe(name = NULL) %>%
tidyr::separate(col=value, into=c("key", "value"), sep="=", fill='right') %>%
dplyr::filter(key == "--file") %>%
dplyr::pull(value)
if (length(this_file)==0)
{
this_file <- rstudioapi::getSourceEditorContext()$path
}
return(dirname(this_file))
}
TLDR: The here package (available on CRAN) helps you build a path from a project's root directory. R projects configured with here() can be shared with colleagues working on different laptops or servers and paths built relative to the project's root directory will still work. The development version is at github.com/r-lib/here.
With git
You certainly store your R code in a directory. This directory is probably part of a git repository and/or an RStudio project. I would recommend building all paths relative to that project's root directory. For example, let's say that you have an R script that creates reusable plotting functions and an R Markdown notebook that loads that script and plots graphs in a nice (so nice) document. The project tree would look something like this:
├── notebooks
│   ├── analysis.Rmd
├── R
│   ├── prepare_data.R
│   ├── prepare_figures.R
From the analysis.Rmd notebook, you would import the plotting functions with here() like this:
source(file.path(here::here("R"), "prepare_figures.R"))
Why?
Hadley Wickham, in a Stack Overflow comment:
"You should never use setwd() in R code - it basically defeats the idea of using a working directory because you can no longer easily move your code between computers." – hadley, Nov 20 '10 at 23:44
From the Ode to the here package:
Do you:
Have setwd() in your scripts? PLEASE STOP DOING THAT.
This makes your script very fragile, hard-wired to exactly one time and place. As soon as you rename or move directories, it breaks. Or maybe you get a new computer? Or maybe someone else needs to run your code?
[...]
Classic problem presentation: Awkwardness around building paths and/or setting working directory in projects with subdirectories. Especially if you use R Markdown and knitr, which trips up alot of people with its default behavior of “working directory = directory where this file lives”. [...]
Install the here package:
install.packages("here")
library(here)
here()
here("construct","a","path")
Documentation of the here() function:
Starting with the current working directory during package load time,
here will walk the directory hierarchy upwards until it finds
a directory that satisfies at least one of the following conditions:
contains a file matching [.]Rproj$ with contents matching ^Version: in
the first line
[... other options ...]
contains a directory .git
Once established, the root directory doesn't change during the active
R session. here() then appends the arguments to the root directory.
The development version of the here package is available on github.
What about files outside the project directory?
If you are loading or sourcing files outside the project directory, the recommended way is to use an environment variable at the Operating System level. Other users of your R code on different laptops or servers would need to set the same environment variable. The advantage is that it is portable.
data_path <- Sys.getenv("PROJECT_DATA")
df <- read.csv(file.path(data_path, "file_name.csv"))
Note: There is a long list of environmental variables which can affect an R session.
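Since Sys.getenv() returns an empty string by default when the variable is unset, it can help to fail early with a clear message. The helper below is a hypothetical sketch (get_data_path is not part of any package); it uses the PROJECT_DATA variable from the example above and sets it to a temporary directory purely for demonstration.

```r
# Hypothetical helper: read a directory from an environment variable and
# stop with a clear message when it is unset or points nowhere.
get_data_path <- function(var = "PROJECT_DATA") {
  path <- Sys.getenv(var, unset = NA)
  if (is.na(path) || !nzchar(path)) {
    stop("Environment variable ", var, " is not set")
  }
  if (!dir.exists(path)) {
    stop("Directory named by ", var, " does not exist: ", path)
  }
  path
}

# Demonstration only: set the variable for this session
Sys.setenv(PROJECT_DATA = tempdir())
data_path <- get_data_path()
file.path(data_path, "file_name.csv")
```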
What about many projects sourcing each other?
It's time to create an R package.
If you're running a script with Rscript from the command line, e.g.
Rscript /path/to/script.R
the function below will assign /path/to/script.R to this_file:
library(tidyverse)
get_this_file <- function() {
commandArgs() %>%
tibble::enframe(name = NULL) %>%
tidyr::separate(
col = value, into = c("key", "value"), sep = "=", fill = "right"
) %>%
dplyr::filter(key == "--file") %>%
dplyr::pull(value)
}
this_file <- get_this_file()
print(this_file)
Here is a custom function to obtain the path of a file in R, RStudio, or from an Rscript:
stub <- function() {}
thisPath <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
if (length(grep("^-f$", cmdArgs)) > 0) {
# R console option
normalizePath(dirname(cmdArgs[grep("^-f", cmdArgs) + 1]))[1]
} else if (length(grep("^--file=", cmdArgs)) > 0) {
# Rscript/R console option
normalizePath(dirname(sub("^--file=", "", cmdArgs[grep("^--file=", cmdArgs)])))[1]
} else if (Sys.getenv("RSTUDIO") == "1") {
# RStudio
dirname(rstudioapi::getSourceEditorContext()$path)
} else if (is.null(attr(stub, "srcref")) == FALSE) {
# 'source'd via R console
dirname(normalizePath(attr(attr(stub, "srcref"), "srcfile")$filename))
} else {
stop("Cannot find file path")
}
}
https://gist.github.com/jasonsychau/ff6bc78a33bf3fd1c6bd4fa78bbf42e7
Another option to get the current script path is funr::get_script_path(), which does not require running your script from RStudio.
I had trouble with all of these because they rely on libraries that I couldn't use (because of packrat) until after setting the working directory (which was why I needed to get the path to begin with).
So, here's an approach that uses only base R when the script is run via Rscript (it falls back to rstudioapi inside RStudio). (EDITED to handle Windows \ characters in addition to / in paths.)
args = commandArgs()
scriptName = args[substr(args,1,7) == '--file=']
if (length(scriptName) == 0) {
scriptName <- rstudioapi::getSourceEditorContext()$path
} else {
scriptName <- substr(scriptName, 8, nchar(scriptName))
}
pathName = substr(
scriptName,
1,
nchar(scriptName) - nchar(strsplit(scriptName, '.*[/|\\]')[[1]][2])
)
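The same extraction can probably be written more compactly with sub() and dirname(), which avoids the manual substring arithmetic. The function name script_dir_from_args below is hypothetical, and the sketch covers only the Rscript case: it returns NA when no --file= argument is present instead of falling back to RStudio.

```r
# Hypothetical helper: extract the script directory from a vector of
# command-line arguments such as the one returned by commandArgs().
script_dir_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) {
    return(NA_character_)   # not started via Rscript
  }
  script <- sub("^--file=", "", file_arg[1])
  dirname(script)           # strips the file name, keeps the directory
}

# Passing the arguments explicitly makes the function easy to try out
script_dir_from_args(c("R", "--no-echo", "--file=/path/to/script.R"))
```

Taking the argument vector as a parameter, with commandArgs() as the default, keeps the function testable outside of an actual Rscript run.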
If you don't want to use (or have to remember) code, simply hover over the script and the path will appear
The following solves the problem for three cases: RStudio source Button, RStudio R console (source(...), if the file is still in the Source pane) or the OS console via Rscript:
this_file = gsub("--file=", "", commandArgs()[grepl("--file", commandArgs())])
if (length(this_file) > 0){
wd <- paste(head(strsplit(this_file, '[/|\\]')[[1]], -1), collapse = .Platform$file.sep)
}else{
wd <- dirname(rstudioapi::getSourceEditorContext()$path)
}
print(wd)
The following code gives the directory of the active script when run from RStudio. When run from the command line with the Rscript command, it falls back to the current working directory, which matches the script's directory only if Rscript is launched from there:
# Check that rstudioapi is installed before calling into it
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  wdir <- dirname(rstudioapi::getActiveDocumentContext()$path)
} else {
  wdir <- getwd()
}
setwd(wdir)

Resources