Is there a possibility to copy the executed lines or basically the script into the working directory?
My normal scenario is that I have standalone scripts which just need to be sourced within a working directory, and they will do everything I need.
After a few months I updated these scripts, and I would love to have a snapshot of the script as it was when I executed the source...
So basically file.copy(ITSELF, '.') or something like this.
I think this is what you're looking for:
file.copy(sys.frame(1)$ofile,
          to = file.path(dirname(sys.frame(1)$ofile),
                         paste0(Sys.Date(), ".R")))
This will take the current file and copy it to a new file in the same directory with the name of currentDate.R, so for example 2015-07-14.R
If you want to copy to the working directory instead of the original script directory, use
file.copy(sys.frame(1)$ofile,
          to = file.path(getwd(),
                         paste0(Sys.Date(), ".R")))
Just note that sys.frame(1)$ofile only works when a saved script is run via source(); typing or pasting the lines into the console will fail. It is also worth mentioning that this might not be the best practice. Looking into a version control system would probably be better.
Explanation:
TBH, I might not be the best person to explain this (I copied this idea from somewhere and use it sometimes), but I'll try. Basically, in order to have information about the script file, R needs to be running it as a file inside an environment that holds that information, and when that environment belongs to a source() call it contains the ofile variable. We use (1) to select the first frame after the global environment (which is 0), i.e. the frame of source(). When you run this from the terminal, there is no frame other than Global (hence the error message), since no file is being run: the commands are sent straight to the terminal.
To illustrate that, we can do a simple test:
> sys.frame(1)
Error in sys.frame(1) : not that many frames on the stack
But if we call that from another function:
> myf <- function() sys.frame(1)
> myf()
<environment: 0x0000000013ad7638>
Our function's environment doesn't have anything in it, so it exists but, in this case, does not have ofile:
> myf <- function() names(sys.frame(1))
> myf()
character(0)
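A small sketch of the same idea without source(): frames are simply the function calls currently on the stack, and ofile is just a local variable living in the frame of source(). The names depth, wrapper, fake_source and peek are made up for this illustration, and fake_source only imitates what source() does.

```r
# Number of function calls currently on the stack.
depth <- function() sys.nframe()
wrapper <- function() depth()   # adds one call between the caller and depth()

# Stand-in for source(): it owns a local variable called `ofile`.
fake_source <- function() {
  ofile <- "pretend-script.R"
  peek()
}

# Look one frame back (the caller's frame) and read its `ofile`.
peek <- function() sys.frame(-1)$ofile

direct  <- depth()
wrapped <- wrapper()
wrapped - direct   # calling through wrapper() adds exactly one frame
fake_source()      # peek() finds the caller's local `ofile`
```

This is why the lookup fails at the prompt: with no enclosing call, there is no frame that could hold an ofile variable.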
I just wanted to add my solution, since I decided to wrap the copy command in try() so that a failure cannot stop the rest of the script.
try({
    script_name <- sys.frame(1)$ofile
    copy_script_name <-
        paste0(sub('\\.R$', '', basename(script_name)),
               '_',
               format(Sys.time(), '%Y%m%d%H%M%S'),
               '.R')
    file.copy(script_name,
              copy_script_name)
})
This will copy the script into the current directory and add a timestamp to the filename. If something goes wrong, the rest of the script will still execute.
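The same idea can be wrapped in a small function that checks its input instead of relying on try(). This is only a sketch: snapshot_script and its arguments are hypothetical names, and it assumes the caller passes in the script path (for example sys.frame(1)$ofile from inside a sourced script).

```r
# Hypothetical helper: copy `script_path` into `dest_dir`, adding a timestamp
# to the file name. Returns the new path invisibly, or NA if nothing was copied.
snapshot_script <- function(script_path, dest_dir = getwd(),
                            stamp = format(Sys.time(), "%Y%m%d%H%M%S")) {
  if (is.null(script_path) || !file.exists(script_path)) {
    warning("No script file to snapshot; skipping copy.")
    return(invisible(NA_character_))
  }
  base   <- tools::file_path_sans_ext(basename(script_path))
  ext    <- tools::file_ext(script_path)
  suffix <- if (nzchar(ext)) paste0(".", ext) else ""
  dest   <- file.path(dest_dir, paste0(base, "_", stamp, suffix))
  ok <- file.copy(script_path, dest, overwrite = FALSE)
  if (!ok) warning("Copy failed: ", dest)
  invisible(if (ok) dest else NA_character_)
}

# Inside a script that is run with source(), one might call:
# snapshot_script(sys.frame(1)$ofile)
```

When the lines are pasted into the console, the caller would have no ofile to pass, and the helper only warns and carries on.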
I originally posted this in another thread, and I think it addresses your problem: https://stackoverflow.com/a/62781925/9076267
In my case, I needed a way to copy the executing file to back up the original script together with its outputs. This is relatively important in research.
What worked for me while running my script on the command line was a mixture of other solutions presented here, which looks like this:
library(scriptName)
file_dir <- paste0(gsub("\\", "/", fileSnapshot()$path, fixed=TRUE))
file.copy(from = file.path(file_dir, scriptName::current_filename()) ,
to = file.path(getwd(), scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name to help distinguish that file from the source, like this:
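# subDir is a subdirectory of the working directory and must already exist
file.copy(from = file.path(file_dir, scriptName::current_filename()),
          to = file.path(getwd(), subDir, paste0(scriptName::current_filename(), "_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".R")))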
I have a flexdashboard which is used by multiple users. They read, modify and write the same (csv) file. I haven't been able to figure out how to do this with a SQL connection so in the meantime (I need a working app) I would like to use a simple .csv file as a database. This should be fine since the users aren't likely to work on it the exact same time and loading and writing the full file is almost instant.
My strategy is therefore:
1-load file,
2-edit (edits are done in rhandsontable which is backconverted to a dataframe)
3-save:
(a)-loads file again (to get the latest data),
(b)-appends the edits from the rhandsontable and keeps the latest data (indicated by a timestamp)
(c)-write.csv
I'm thinking I should add something in (1) that checks whether the file is already in use/open (because another user is at (3)). So: check if open; if not, continue; else Sys.sleep(3) and try again.
Any ideas about how to do this in R? In Delphi it would be something like:
if fileinuse(filename) then sleep(3) else df<-read.csv
What's the R way?
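Steps (a) to (c) of the save strategy above could be sketched roughly as follows. The column names id and timestamp and the helper name merge_latest are assumptions made for this illustration; the real table would need some key column that identifies a row.

```r
# Hypothetical helper: combine the freshly re-read data with the user's edits
# and keep, for every key, only the row with the most recent timestamp.
merge_latest <- function(current, edits, key = "id", stamp = "timestamp") {
  both <- rbind(current, edits)
  both <- both[order(both[[stamp]], decreasing = TRUE), , drop = FALSE]
  both <- both[!duplicated(both[[key]]), , drop = FALSE]  # newest row per key
  both <- both[order(both[[key]]), , drop = FALSE]
  rownames(both) <- NULL
  both
}

current <- data.frame(
  id = c(1, 2), value = c("a", "b"),
  timestamp = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 10:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
edits <- data.frame(
  id = c(2, 3), value = c("B", "c"),
  timestamp = as.POSIXct(c("2024-01-01 11:00:00", "2024-01-01 11:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
merged <- merge_latest(current, edits)
merged
```

Here row 2 is taken from the edits because its timestamp is newer, row 1 is kept unchanged and row 3 is added. The result would then be passed to write.csv.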
Edit:
I'm starting with my edited answer as it's more elegant. This uses a shell command to test whether a file is available, as discussed in this question:
How to check in command-line if a given file or directory is locked (used by any process)?
This avoids loading and saving the file, and is therefore more efficient.
# Function to test availability (Windows only: shell() and "type nul" rely on cmd.exe)
IsInUse <- function(fName) {
    shell(paste("( type nul >> ", fName, " ) 2>nul && echo available || echo in use", sep=""), intern = TRUE)=="in use"
}
# Test availability
IsInUse("test.txt")
Original answer:
Interesting question! I did not find a way to check if a file is in use before trying to write to it. The solution below is far from elegant. It relies on a tryCatch function, and on reading and writing to a file to check if it is available (which can be quite slow depending on your file size).
# Function to check if the file is in use (relies on reading and writing which is inefficient)
IsInUse <- function(fName) {
    rData <- read.csv(fName)
    tryCatch(
        {
            write.csv(rData, file=fName, row.names = FALSE)
            return(FALSE)
        },
        error=function(cond) {
            return(TRUE)
        }
    )
}
# Loop to check if file is in use
while(IsInUse(fName)) {
    print("Still in use")
    Sys.sleep(0.1)
}
# Your action here
I also found the answer to the question "How to write trycatch in R" useful for making sense of the tryCatch function.
I'd be interested to see if anyone else has a more elegant suggestion!
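One portable alternative, offered only as a sketch, is an advisory lock: every writer first creates a lock directory next to the CSV and removes it when done. The helper names (acquire_lock, release_lock, with_lock) are hypothetical. The approach assumes that all users go through the same helper and that directory creation is atomic on the shared drive, which may not hold on every network file system.

```r
# Try to create the lock directory; dir.create() returns TRUE only for the
# caller that actually created it, so a FALSE means someone else holds the lock.
acquire_lock <- function(lock_dir, tries = 50, wait = 0.1) {
  for (i in seq_len(tries)) {
    if (dir.create(lock_dir, showWarnings = FALSE)) return(TRUE)
    Sys.sleep(wait)
  }
  FALSE
}

release_lock <- function(lock_dir) unlink(lock_dir, recursive = TRUE)

# Run `action` while holding the lock; the lock is released even on error.
with_lock <- function(lock_dir, action, ...) {
  if (!acquire_lock(lock_dir, ...)) stop("Could not get the lock: ", lock_dir)
  on.exit(release_lock(lock_dir), add = TRUE)
  action()
}

csv_path <- file.path(tempdir(), "shared.csv")
lock_dir <- paste0(csv_path, ".lock")
with_lock(lock_dir, function() {
  write.csv(data.frame(x = 1:3), csv_path, row.names = FALSE)
})
```

Unlike the read-and-rewrite test, this never touches the data file just to find out whether it is busy. A crashed session can leave a stale lock behind, so some clean-up rule would still be needed.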
Interesting question, indeed! Curious about an elegant solution too...
I've been searching for a while for a way to get the name of the currently executed script. Most answers I've seen were one of:
Use commandArgs() - but this won't work for me because in RStudio commandArgs() does not return the filepath
Define the name of the script as the top line and then use that in the rest of the script
I saw one mention of sys.frames() and found out that I can use sys.frame(1)$ofile to get the name of the currently executing script. I don't know much about these kinds of functions, so can anyone advise me whether that's a bad idea or when it can fail me?
Thanks
The problem is that R doesn't really run code as "scripts." When you "source" a file, it's basically like re-typing the contents of the file at the console. The exception is that functions can keep track of where they were sourced from.
So if you had a file like mycode.R that had
fn <- function(x) {
    x + 1 # A comment, kept as part of the source
}
and then you can do
source("mycode.R")
getSrcFilename(fn)
# [1] "mycode.R"
so in order to do that you just need to know a name of the function in the file. You could also make a function like this
gethisfilename <- function(z) {
    x <- eval(match.call()[[1]])
    getSrcFilename(x)
}
Assuming it's also in mycode.R, you can do
source("mycode.R")
gethisfilename()
# [1] "mycode.R"
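One caveat worth checking: getSrcFilename() relies on source references, and those are only stored when keep.source is on. As far as I know that option defaults to TRUE in interactive sessions only, so a non-interactive run may return an empty result. A minimal sketch that asks for source references explicitly (the file name mycode_demo.R and the environment env are made up for the example):

```r
# Write a tiny script to a temporary location.
script <- file.path(tempdir(), "mycode_demo.R")
writeLines(c("fn <- function(x) {", "  x + 1", "}"), script)

# Source it into its own environment and keep the source references.
env <- new.env()
source(script, local = env, keep.source = TRUE)

src_name <- utils::getSrcFilename(env$fn)
src_name   # file name only; pass full.names = TRUE for the path
```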
Actually I think it is a bad idea, as I explained in my comment here: if you place this code in file1.R and then you source("file1.R") from file2.R, this will actually return "file2.R" instead of "file1.R", where it is called from!
So, to overcome this, you need to use sys.frames() and go for this solution: https://stackoverflow.com/a/1816487/684229
this.file.name <- function () # https://stackoverflow.com/a/1816487
{
    frame_files <- lapply(sys.frames(), function(x) x$ofile)
    frame_files <- Filter(Negate(is.null), frame_files)
    frame_files[[length(frame_files)]]
}
Then you can use this.file.name() in any script and it will return the correct answer! It doesn't depend on how deep the "source-stack" is, nor on where the this.file.name() function is defined. It returns the name of the source file it is called from.
(and unlike MrFlick's interesting solution, this doesn't need any function to be defined in the file)
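To check the nesting claim, here is a small experiment with two temporary scripts, one sourcing the other. The file names inner_demo.R and outer_demo.R and the environment demo_env are invented for the test; this.file.name() is repeated so that the block runs on its own.

```r
this.file.name <- function() {
  frame_files <- lapply(sys.frames(), function(x) x$ofile)
  frame_files <- Filter(Negate(is.null), frame_files)
  frame_files[[length(frame_files)]]
}

# Both scripts are evaluated in one environment that can see the helper.
demo_env <- new.env()
demo_env$this.file.name <- this.file.name

inner <- file.path(tempdir(), "inner_demo.R")
outer <- file.path(tempdir(), "outer_demo.R")
writeLines("inner_seen <- this.file.name()", inner)
writeLines(c(
  paste0("source(", deparse(inner), ", local = TRUE)"),  # nested source()
  "outer_seen <- this.file.name()"
), outer)

source(outer, local = demo_env)
basename(demo_env$inner_seen)   # name recorded while the inner script ran
basename(demo_env$outer_seen)   # name recorded back in the outer script
```

Each script records its own name, because the function picks the innermost frame that has an ofile.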
I am not a programmer and have less than a month of experience with R but have been writing simple scripts that read from external CSV files for the past week. The function below, which reads data from a CSV file, was originally more complex but I repeatedly shortened it during troubleshooting until I was left with this:
newfunction <- function(input1, input2) {
    processingobject <- read.csv("processing-file.csv")
    print(head(processingobject))
}
I can print both the head and the entire processingobject within the script without a problem, but after the script ends processingobject no longer exists. It never appears in the RStudio global environment pane. Shouldn't processingobject still exist after the script terminates?
The script runs without displaying any error or warning messages. I tried assigning processingobject to a second variable: processingobject2 <- processingobject but the second variable doesn't exist after the script ends either. I also tried clearing the global environment and restarting RStudio, but that did not work either. If at the prompt after the script I type processingobject I get the message "Error: object 'processingobject' not found". The CSV file itself is perfectly normal as far as I can tell.
Obviously, I must be doing something very stupid. Please help me. Thanks.
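You need to return the result from your function and assign it to a variable when you call it: objects created inside a function are local to that call and disappear when it finishes. Reproducible examples are also always a good idea; they make it easier for people to help you out.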
ncol<- 10
nrow<- 100
x <- matrix(runif(nrow*ncol),nrow,ncol)
write.csv(x,file="example_data.csv")
newfunction <- function(input1) {
    processingobject <- read.csv("example_data.csv")
    result <- apply(processingobject,2,function(x)x*input1) #doing something to csv with input
    print(head(result))
    return(result)
}
newcsv <-newfunction(3)
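The scoping rule behind this can be seen in isolation with a tiny sketch. The names reads_only, returns_it and only_inside are made up for the illustration.

```r
# The data frame is created inside the function, so it is local to the call.
reads_only <- function() {
  only_inside <- data.frame(a = 1:3)
  invisible(NULL)              # nothing useful is handed back
}

# Same work, but the object is returned to the caller.
returns_it <- function() {
  only_inside <- data.frame(a = 1:3)
  only_inside                  # the last value is the return value
}

reads_only()
local_survived <- exists("only_inside")  # FALSE: the local object is gone
kept <- returns_it()                     # the caller keeps a copy under its own name
```

Only kept shows up in the environment pane, because the assignment happens outside the function.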
I have what I think is a common enough issue, on optimising workflow in R. Specifically, how can I avoid the common issue of having a folder full of output (plots, RData files, csv, etc.), without, after some time, having a clue where they came from or how they were produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure of what the best strategy is. So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function metainfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. In particular, getting the name of the file from which the plot is produced doesn't necessarily work (the solution I use is one provided by @hadley). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
    # message - character string - Any message to be written into the information
    #      file (e.g., data used).
    # filename - character string - the name of the txt file (including relative
    #      path). Should be the same as the output file it describes (RData,
    #      csv, pdf).
    #
    if (missing(filename) || is.null(filename))
    {
        stop('Provide an output filename - parameter filename.')
    }
    filename <- paste(filename, '.txt', sep='')
    # Try to get as close as possible to getting the file name from which the
    # function is called.
    source.file <- lapply(sys.frames(), function(x) x$ofile)
    source.file <- Filter(Negate(is.null), source.file)
    t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
                silent=TRUE)
    if (inherits(t.sf, 'try-error'))
    {
        source.file <- NULL
    }
    func <- deparse(sys.call(-1))
    # MetaInfo isn't always called from within another function, so func could
    # return as NULL or as general environment.
    if (any(grepl('eval', func, ignore.case=TRUE)))
    {
        func <- NULL
    }
    time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
    git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
    meta <- list(Message=message,
                 Source=paste(source.file, ' on ', time, sep=''),
                 Functions=func,
                 System=Sys.info(),
                 Session=sessionInfo(),
                 Git.hash=git.h)
    sink(file=filename)
    print(meta)
    sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')
RandomPlot <- function(x, y)
{
    fn <- 'random_plot'
    pdf(file=paste(fn, '.pdf', sep=''))
    plot(x, y)
    MetaInfo(message=NULL, filename=fn)
    dev.off()
}
x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
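The master-script idea could also be wrapped in a small runner that stops with a clear message when a step is missing. This is only a sketch: run_project is a hypothetical name, and the two demo scripts are written to a temporary folder just to make the example runnable.

```r
# Hypothetical runner: source the given scripts in order, into one environment.
run_project <- function(scripts, dir = ".", envir = globalenv()) {
  for (script in scripts) {
    path <- file.path(dir, script)
    if (!file.exists(path)) stop("Missing script: ", path)
    message("Running ", script)
    source(path, local = envir)
  }
  invisible(envir)
}

# Demo project with two steps; the second depends on the first.
project_dir <- tempfile("myproject")
dir.create(project_dir)
writeLines("raw <- 1:10", file.path(project_dir, "read-data.R"))
writeLines("clean <- raw[raw > 5]", file.path(project_dir, "clean-data.R"))

project_env <- new.env()
run_project(c("read-data.R", "clean-data.R"), dir = project_dir, envir = project_env)
project_env$clean
```

Keeping the order in one vector makes the dependency between steps explicit, which is the point of the single master script.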
There is a package called Project Template that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There is also a number of helpful documents like this one Workflow of statistical data analysis by Oliver Kirchkamp.
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question betrays a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file is produced. You should in fact be able to wipe out any output, and reproduce it by rerunning code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or Project Template, which I will try to pick up on. Thanks again for the suggestions @noah and @alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is a package developed by RStudio called pins that might address this problem.
I have a program in R. Sometimes when I save the history, the commands are not written to my history file. I have lost some history a few times and this really drives me crazy.
Any recommendation on how to avoid this?
First check your working directory (getwd()). savehistory() saves the history in the current working directory. And to be honest, you had better specify the filename, as the default is .Rhistory. Say:
savehistory('C:/MyWorkingDir/MySession.RHistory')
which allows you to :
loadhistory('C:/MyWorkingDir/MySession.RHistory')
So the history is not lost, it's just in a place and under a name you weren't aware of. See also ?history.
To clarify : the history is no more than a text file containing all commands of that current session. So it's a nice log of what you've done, but I almost never use it. I construct my "analysis log" myself by using scripts, as hinted in another answer.
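If you do want to keep the history, a dated file name avoids overwriting earlier sessions. A minimal sketch, assuming a hypothetical folder of your choice; history_path and save_dated_history are made-up names. savehistory() is only meaningful in interactive sessions and is not supported by every front end, hence the guard and the try().

```r
# Build a file name such as <dir>/session_2024-05-01_093000.Rhistory
history_path <- function(dir, when = Sys.time()) {
  file.path(dir, format(when, "session_%Y-%m-%d_%H%M%S.Rhistory"))
}

save_dated_history <- function(dir) {
  if (!interactive()) return(invisible(NULL))  # no history to save otherwise
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- history_path(dir)
  try(savehistory(path), silent = TRUE)
  invisible(path)
}

# Example call at the end of a session (the folder name is an assumption):
# save_dated_history("~/r-history")
```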
@Stedy has provided a workable solution to your immediate question. I would encourage you to learn how to use .R files and a proper text editor, or use an integrated development environment (see this SO page for suggestions). You can then source() in your .R file so that you can consistently replicate your analysis.
For even better replicability, invest the time into learning Sweave. You'll be glad you did.
Check the Rstudio_Desktop/history_database file - it stores every command for any working directory.
See here for more details How to save the whole sequence of commands from a specific day to a file?
Logging your console on a regular basis to dated files is handy. The package TeachingDemos has a great function for logging your console session, but it's written as a singleton, which is problematic for automatic logging, since you wouldn't be able to use that function to create teaching demos if you use it for logging. I re-used that function using a bit of meta-programming to make a copy of that functionality that I include in the .First function in my local .Rprofile, as follows:
.Logger <- (function(){
    # copy local versions of txtStart, txtStop and R2txt
    locStart <- TeachingDemos::txtStart
    locStop <- TeachingDemos::txtStop
    locR2txt <- TeachingDemos:::R2txt
    # create a local environment and link it to each function
    .e. <- new.env()
    .e.$R2txt.vars <- new.env()
    environment(locStart) <- .e.
    environment(locStop) <- .e.
    environment(locR2txt) <- .e.
    # reference the local functions in the calls to `addTaskCallback`
    # and `removeTaskCallback`
    body(locStart)[[length(body(locStart))-1]] <-
        substitute(addTaskCallback(locR2txt, name='locR2txt'))
    body(locStop)[[2]] <-
        substitute(removeTaskCallback('locR2txt'))
    list(start=function(logDir){
        op <- options()
        locStart(file.path(logDir,format(Sys.time(), "%Y_%m_%d_%H_%M_%S.txt")),
                 results=FALSE)
        options(op)
    }, stop = function(){
        op <- options()
        locStop()
        options(op)
    })
})()
.First <- function(){
    if( interactive() ){
        # JUST FOR FUN
        cat("\nWelcome",Sys.info()['login'],"at", date(), "\n")
        if('fortunes' %in% utils::installed.packages()[,1] )
            print(fortunes::fortune())
        # CONSTANTS
        TIME <- Sys.time()
        logDir <- "~/temp/Rconsole.logfiles"
        # CREATE THE TEMP DIRECTORY IF IT DOES NOT ALREADY EXIST
        dir.create(logDir, showWarnings = FALSE)
        # DELETE FILES OLDER THAN A WEEK
        for(fname in list.files(logDir))
            if(difftime(TIME,
                        file.info(file.path(logDir,fname))$mtime,
                        units="days") > 7 )
                file.remove(file.path(logDir,fname))
        # sink() A COPY OF THE TERMINAL OUTPUT TO A DATED LOG FILE
        if('TeachingDemos' %in% utils::installed.packages()[,1] )
            .Logger$start(logDir)
        else
            cat('install package `TeachingDemos` to enable console logging')
    }
}
.Last <- function(){
    .Logger$stop()
}
This causes a copy of the terminal contents to be written to a dated log file. The nice thing about having dated files is that if you use multiple R sessions the log files won't conflict (unless you start multiple interactive sessions in the same second).
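The "delete files older than a week" step can also be pulled out into a small vectorised helper, which makes it easier to try out on its own. This is a sketch with a made-up name (prune_old_logs). It only touches regular files directly inside the given folder.

```r
# Remove files in `log_dir` whose modification time is older than
# `max_age_days`. Returns the paths that were removed, invisibly.
prune_old_logs <- function(log_dir, max_age_days = 7, now = Sys.time()) {
  files <- list.files(log_dir, full.names = TRUE)
  if (length(files) == 0) return(invisible(character(0)))
  info <- file.info(files)
  age_days <- as.numeric(difftime(now, info$mtime, units = "days"))
  is_old <- !is.na(age_days) & !info$isdir & age_days > max_age_days
  old <- files[is_old]
  removed <- old[file.remove(old)]
  invisible(removed)
}

# Small demonstration in a temporary folder.
demo_dir <- tempfile("logs")
dir.create(demo_dir)
old_log <- file.path(demo_dir, "old.txt")
new_log <- file.path(demo_dir, "new.txt")
writeLines("old session", old_log)
writeLines("new session", new_log)
Sys.setFileTime(old_log, Sys.time() - 10 * 24 * 3600)  # pretend it is 10 days old
removed <- prune_old_logs(demo_dir)
basename(removed)
```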