R sometimes does not save my history

I have a program in R. Sometimes when I save the history, it does not get written to my history file. I have lost my history a few times and this really drives me crazy.
Any recommendation on how to avoid this?

First, check your working directory with getwd(). savehistory() saves the history in the current working directory, and it is better to specify the filename explicitly, as the default is .Rhistory. For example:
savehistory('C:/MyWorkingDir/MySession.RHistory')
which allows you to:
loadhistory('C:/MyWorkingDir/MySession.RHistory')
So the history is not lost, it's just in a place and under a name you weren't aware of. See also ?history.
To clarify: the history is no more than a text file containing all the commands of the current session. So it's a nice log of what you've done, but I almost never use it. I construct my "analysis log" myself by using scripts, as hinted in another answer.
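If the worry is losing the history in the first place, one option (a sketch on my part, not part of the original answer; the path is made up) is to save it explicitly on exit from your .Rprofile:
.Last <- function() {
    # save the interactive history to a fixed, known location
    if (interactive())
        try(savehistory('~/Rhistory/last_session.Rhistory'), silent = TRUE)
}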

@Stedy has provided a workable solution to your immediate question. I would encourage you to learn how to use .R files and a proper text editor, or to use an integrated development environment (see this SO page for suggestions). You can then source() your .R file so that you can consistently replicate your analysis.
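As a minimal, hypothetical sketch of that workflow (the file name and contents are made up for illustration): keep the analysis in a script and replay it with source() whenever you need it reproduced.
# write a tiny analysis script, then replay it in one go
writeLines(c("x <- rnorm(100)", "print(summary(x))"), "analysis.R")
source("analysis.R", echo = TRUE)  # echo = TRUE prints each command as it runs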
For even better replicability, invest the time into learning Sweave. You'll be glad you did.

Check the Rstudio_Desktop/history_database file - it stores every command for any working directory.
See this question for more details: How to save the whole sequence of commands from a specific day to a file?

Logging your console on a regular basis to dated files is handy. The TeachingDemos package has a great function for logging your console session, but it's written as a singleton, which is problematic for automatic logging: you wouldn't be able to use that function to create teaching demos if you also used it for logging. I re-used that function with a bit of meta-programming to make a copy of that functionality, which I include in the .First function in my local .Rprofile, as follows:
.Logger <- (function(){
    # copy local versions of txtStart, txtStop and the internal R2txt callback
    locStart <- TeachingDemos::txtStart
    locStop  <- TeachingDemos::txtStop
    locR2txt <- TeachingDemos:::R2txt
    # create a local environment and link it to each function
    .e. <- new.env()
    .e.$R2txt.vars <- new.env()
    environment(locStart) <- .e.
    environment(locStop)  <- .e.
    environment(locR2txt) <- .e.
    # reference the local functions in the calls to `addTaskCallback`
    # and `removeTaskCallback`
    body(locStart)[[length(body(locStart))-1]] <-
        substitute(addTaskCallback(locR2txt, name='locR2txt'))
    body(locStop)[[2]] <-
        substitute(removeTaskCallback('locR2txt'))
    list(start = function(logDir){
            op <- options()
            locStart(file.path(logDir, format(Sys.time(), "%Y_%m_%d_%H_%M_%S.txt")),
                     results=FALSE)
            options(op)
        },
        stop = function(){
            op <- options()
            locStop()
            options(op)
        })
})()
.First <- function(){
    if( interactive() ){
        # JUST FOR FUN
        cat("\nWelcome", Sys.info()['login'], "at", date(), "\n")
        if('fortunes' %in% utils::installed.packages()[,1] )
            print(fortunes::fortune())
        # CONSTANTS
        TIME <- Sys.time()
        logDir <- "~/temp/Rconsole.logfiles"
        # CREATE THE LOG DIRECTORY IF IT DOES NOT ALREADY EXIST
        dir.create(logDir, showWarnings = FALSE)
        # DELETE FILES OLDER THAN A WEEK
        for(fname in list.files(logDir))
            if(difftime(TIME,
                        file.info(file.path(logDir, fname))$mtime,
                        units="days") > 7 )
                file.remove(file.path(logDir, fname))
        # sink() A COPY OF THE TERMINAL OUTPUT TO A DATED LOG FILE
        if('TeachingDemos' %in% utils::installed.packages()[,1] )
            .Logger$start(logDir)
        else
            cat('install package `TeachingDemos` to enable console logging\n')
    }
}
.Last <- function(){
    .Logger$stop()
}
This writes a copy of the terminal contents to a dated log file. The nice thing about having dated files is that if you use multiple R sessions the log files won't conflict, unless you start multiple interactive sessions in the same second.

Related

R not remembering objects written within functions

I'm struggling to clearly explain this problem.
Essentially, something seems to have happened within the R environment: none of the code I write inside my functions is working and no data is being saved. If I type a command directly into the console it works (i.e. Monkey <- 0), but if I type it within a function, the value isn't stored when I run the function.
It could be that I'm missing a glaring error in the code, but I noticed the problem when I accidentally clicked on the debugger and tried to exit out of the Browse[1]> prompt which appeared.
Any ideas? This is driving me nuts.
corr <- function(directory, threshold=0) {
    directory <- paste(getwd(), "/", directory, "/", sep="")
    file.list <- list.files(directory)
    number <- 1:length(file.list)
    monkey <- c()
    for (i in number) {
        x <- paste(directory, file.list[i], sep="")
        y <- read.csv(x)
        t <- sum(complete.cases(y))
        if (t >= threshold) {
            correl <- cor(y$sulfate, y$nitrate, use='pairwise.complete.obs')
            monkey <- append(monkey, correl)
        }
    }
    #correl <- cor(newdata$sulfate, newdata$nitrate, use='pairwise.complete.obs')
    #summary(correl)
}
corr('specdata', 150)
monkey
It's a scoping issue: functions create their own environment, which is separate from the global environment.
Using <- assigns in the function's local environment. To save an object to the global environment, use <<-.
Here's some information on R environments.
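A small illustration of the difference (object names here are just for the example):
f <- function() {
    local_only <- 1   # assigned in f()'s own environment; gone when f() returns
    kept <<- 2        # <<- walks up and assigns in the global environment
}
f()
exists("local_only")  # FALSE
exists("kept")        # TRUE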
I suggest you take a look at a tutorial on using functions in R.
Briefly (and sorry for my rough explanation): objects that you define within a function are ONLY defined within that function, unless you explicitly export them, for example with the return() function (one of several possible approaches).
browser() is indeed used for debugging; it keeps you inside the function and allows you to access objects created inside the function.
In addition, to increase the probability of useful answers, I suggest that you post a self-contained, working piece of code that allows the issue to be reproduced quickly. Here you are reading some files we have no access to.
It seems to me you have to store the output yourself when you run your script:
corr_out <- corr('specdata', 150)
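In other words, the function has to hand its result back. A minimal, self-contained sketch of the idea (using the built-in airquality data instead of the specdata files, which we don't have):
count_complete <- function(df) {
    n <- sum(complete.cases(df))
    n                       # the last evaluated expression is the return value
}
out <- count_complete(airquality)  # store the output yourself
out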

import file when not in use (any more)

I have a flexdashboard which is used by multiple users. They read, modify and write the same (csv) file. I haven't been able to figure out how to do this with a SQL connection, so in the meantime (I need a working app) I would like to use a simple .csv file as a database. This should be fine since the users aren't likely to work on it at the exact same time, and loading and writing the full file is almost instant.
My strategy is therefore:
1 - load the file,
2 - edit (edits are done in rhandsontable, which is converted back to a data frame),
3 - save:
    (a) load the file again (to get the latest data),
    (b) append the edits from the rhandsontable and keep the latest data (indicated by a timestamp),
    (c) write.csv.
I'm thinking I should add something to step (1) so that it checks whether the file is already in use/open (because another user is at step (3)). So: check if open; if not, continue; else Sys.sleep(3) and try again.
Any ideas about how to do this in R? In Delphi it would be something like:
if fileinuse(filename) then sleep(3) else df<-read.csv
What's the R way?
Edit:
I'm starting with my edited answer as it's more elegant. It uses a shell command to test whether a file is available, as discussed in this question:
How to check in command-line if a given file or directory is locked (used by any process)?
This avoids loading and saving the file, and is therefore more efficient.
# Function to test availability (note: shell() means this is Windows-only)
IsInUse <- function(fName) {
    shell(paste("( type nul >> ", fName, " ) 2>nul && echo available || echo in use", sep=""),
          intern = TRUE) == "in use"
}
# Test availability
IsInUse("test.txt")
Original answer:
Interesting question! I did not find a way to check if a file is in use before trying to write to it. The solution below is far from elegant. It relies on a tryCatch function, and on reading and writing to a file to check if it is available (which can be quite slow depending on your file size).
# Function to check if the file is in use (relies on reading and writing, which is inefficient)
IsInUse <- function(fName) {
    rData <- read.csv(fName)
    tryCatch(
        {
            write.csv(rData, file=fName, row.names = FALSE)
            return(FALSE)
        },
        error = function(cond) {
            return(TRUE)
        }
    )
}
# Loop to check if file is in use
while(IsInUse(fName)) {
    print("Still in use")
    Sys.sleep(0.1)
}
# Your action here
I also found the answer to the question How to write trycatch in R useful for making sense of the tryCatch function.
I'd be interested to see if anyone else has a more elegant suggestion!
Interesting question, indeed! Curious about an elegant solution too...
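A possibly more elegant route, sketched here as an assumption rather than a tested solution, is the filelock package on CRAN, which provides advisory file locks (the file names below are placeholders):
library(filelock)

lck <- lock("database.csv.lock", timeout = 10000)  # wait up to 10 s for the lock
if (!is.null(lck)) {
    df <- read.csv("database.csv")
    # ... append the rhandsontable edits and keep the latest rows here ...
    write.csv(df, "database.csv", row.names = FALSE)
    unlock(lck)
} else {
    warning("Could not obtain the lock; another user is still writing the file.")
}
Note that the lock is advisory: it only helps if every reader and writer goes through the same lock file.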

Copy script after executing

Is there a way to copy the executed lines, or basically the script itself, into the working directory?
My normal scenario is: I have stand-alone scripts which just need to be sourced within a working directory, and they will do everything I need.
After a few months I make updates to these scripts, and I would love to have a snapshot of the script as it was when I executed the source...
So basically file.copy(ITSELF, '.') or something like this.
I think this is what you're looking for:
file.copy(sys.frame(1)$ofile,
          to = file.path(dirname(sys.frame(1)$ofile),
                         paste0(Sys.Date(), ".R")))
This will take the current file and copy it to a new file in the same directory with the name of currentDate.R, so for example 2015-07-14.R
If you want to copy to the working directory instead of the original script directory, use
file.copy(sys.frame(1)$ofile,
          to = file.path(getwd(),
                         paste0(Sys.Date(), ".R")))
Just note that the sys.frame(1)$ofile only works if a saved script is sourced, trying to run it in terminal will fail. It is worth mentioning though that this might not be the best practice. Perhaps looking into a version control system would be better.
Explanation:
TBH, I might not be the best person to explain this (I copied this idea from somewhere and use it sometimes), but I'll try. Basically, in order to have information about the script file, R needs to be running it as a file inside an environment that carries that information, and when that environment comes from a source() call it contains the ofile field. We use (1) to select the next environment (source()'s) after the global environment (which is 0). When you run this from the console, there is no frame/environment other than the global one (hence the error message), since no file is being run: the commands are sent straight to the interpreter.
To illustrate that, we can do a simple test:
> sys.frame(1)
Error in sys.frame(1) : not that many frames on the stack
But if we call that from another function:
> myf <- function() sys.frame(1)
> myf()
<environment: 0x0000000013ad7638>
Our function's environment doesn't have anything in it, so it exists but, in this case, does not have ofile:
> myf <- function() names(sys.frame(1))
> myf()
character(0)
I just wanted to add my solution, since I decided to wrap the copy command in a try() call... because I have the feeling I'm missing some control...
try({
    script_name <- sys.frame(1)$ofile
    copy_script_name <-
        paste0(sub('\\.R', '', basename(script_name)),
               '_',
               format(Sys.time(), '%Y%m%d%H%M%S'),
               '.R')
    file.copy(script_name, copy_script_name)
})
This will copy the script into the current directory and also add a timestamp to the filename. In case something goes wrong, the rest of the script will still execute.
I originally posted this in another thread, and I think it addresses your problem: https://stackoverflow.com/a/62781925/9076267
In my case, I needed a way to copy the executing file to back up the original script together with its outputs. This is relatively important in research.
What worked for me while running my script on the command line was a mixture of other solutions presented here, and it looks like this:
library(scriptName)
file_dir <- paste0(gsub("\\", "/", fileSnapshot()$path, fixed=TRUE))
file.copy(from = file.path(file_dir, scriptName::current_filename()),
          to   = file.path(getwd(), scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name, to help distinguish that file from the source, like this:
file.copy(from = file.path(current_dir, current_filename()),
          to   = file.path(getwd(), subDir, paste0(current_filename(), "_", Sys.time(), ".R")))

"File is still in use": How to free all resources / force deletion of files?

Question: how can I free all file handles / connections R is using? In Python, one can inspect which objects are still alive. Is there anything comparable in R?
Within a function, I create a directory with some files. At the end of the function, it should be deleted again. I am facing the problem that I am unable to delete the files, presumably because a file handle is still open. The example is with the MetaSKAT package, but I'm interested in a general solution. The example data can be found here: https://groups.google.com/group/skat_slee/attach/28a76339619d8358/Datasets.zip?part=4&authuser=0
# Code author: Seunggeun (Shawn) Lee
setwd('./Datasets')

foo <- function(dir.name) {
    ###### Preparation stuff ################################################
    if (!require(MetaSKAT)) {install.packages('MetaSKAT'); require(MetaSKAT)}
    dir.create(file.path('.', dir.name), showWarnings=F)
    dir.path <- paste("./", dir.name, sep="")
    file.copy(c("01.fam", "01.bed", "01.bim", "01_3.SetID"), dir.path)
    setwd(dir.path)

    FAM <- read.table("01.fam", header=FALSE)
    y <- FAM[,6]
    N.Sample <- length(y)
    x1 <- rnorm(N.Sample)
    x2 <- rbinom(N.Sample, 1, 0.5)
    obj <- SKAT_Null_Model(y ~ cbind(x1, x2))
    re  <- Generate_Meta_Files(obj, "01.bed", "01.bim", "01_3.SetID",
                               "01.MSSD", "01.MInfo", N.Sample)

    ###### Problem ##########################################################
    print(file.remove(list.files(), force = T)) # problem: cannot delete
    # curiously, sometimes there is 1, sometimes 2 FALSE...

    ###### my different tries to solve it ###################################
    rm(re)
    closeAllConnections()
    sink.number() # shows 0
    rm(list = ls())
    gc()

    ###### problem is still there ###########################################
    print(file.remove(list.files()))
    setwd('..')
    # print(unlink(dir.path, recursive = T)) # I finally want to delete the directory
}

debug(foo)
foo("temp2")
I am using RStudio. Even if I try to delete the file manually in Windows while R is still open, it tells me that the file is being used by a program. I can only delete it after I have closed R.
So how can I force R to free these files? I will try to solve the problem at the root and look at the source code of Generate_Meta_Files(), but I thought there must be a global function in R that forces everything to be freed. (Note: I am well aware that it does not make sense to create the files and delete them directly afterwards; it's just an example.)
Edit: After a hint, I tried it under Linux. It turns out that although it reports a problem with the deletion of one of the six files, everything is properly deleted, hence I guess this is a Windows-specific problem. Any hints as to what causes it?
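As for the general part of the question (the closest analogue to inspecting what is still alive in Python), R can at least list the connections it knows about; a small sketch, which admittedly will not help if the handle is held by compiled code, as seems to be the case here:
cons <- showConnections(all = FALSE)    # user-opened connections only
print(cons)
for (i in as.integer(rownames(cons)))
    close(getConnection(i))             # close each one explicitly
# Handles opened by C/C++ code inside a package do not show up here, which is
# why closeAllConnections() cannot release them.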

R: Improving workflow and keeping track of output

I have what I think is a common enough issue, on optimising workflow in R. Specifically, how can I avoid the common problem of ending up with a folder full of output (plots, RData files, csv, etc.) without, after some time, having a clue where it came from or how it was produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure of what the best strategy is. So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function MetaInfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, the function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. In particular, getting the name of the file from which the plot is produced doesn't necessarily work (the solution I use is one provided by @hadley in [1]). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
  # message  - character string - Any message to be written into the information
  #            file (e.g., data used).
  # filename - character string - the name of the txt file (including relative
  #            path). Should be the same as the output file it describes (RData,
  #            csv, pdf).
  #
  if (is.null(filename))
  {
    stop('Provide an output filename - parameter filename.')
  }
  filename <- paste(filename, '.txt', sep='')
  # Try to get as close as possible to getting the file name from which the
  # function is called.
  source.file <- lapply(sys.frames(), function(x) x$ofile)
  source.file <- Filter(Negate(is.null), source.file)
  t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
              silent=TRUE)
  if (class(t.sf) == 'try-error')
  {
    source.file <- NULL
  }
  func <- deparse(sys.call(-1))
  # MetaInfo isn't always called from within another function, so func could
  # return as NULL or as the general environment.
  if (any(grepl('eval', func, ignore.case=TRUE)))
  {
    func <- NULL
  }
  time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
  git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
  meta <- list(Message=message,
               Source=paste(source.file, ' on ', time, sep=''),
               Functions=func,
               System=Sys.info(),
               Session=sessionInfo(),
               Git.hash=git.h)
  sink(file=filename)
  print(meta)
  sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')

RandomPlot <- function(x, y)
{
  fn <- 'random_plot'
  pdf(file=paste(fn, '.pdf', sep=''))
  plot(x, y)
  MetaInfo(message=NULL, filename=fn)
  dev.off()
}

x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called ProjectTemplate that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There are also a number of helpful documents, like Workflow of statistical data analysis by Oliver Kirchkamp.
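For reference, getting started with ProjectTemplate looks roughly like this (a sketch; the project name is arbitrary):
install.packages("ProjectTemplate")
library(ProjectTemplate)
create.project("my-analysis")   # scaffolds data/, munge/, src/, reports/, config/, ...
setwd("my-analysis")
load.project()                  # runs the munge scripts and loads everything in data/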
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question betrays a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file was produced. You should in fact be able to wipe out any output, and reproduce it by rerunning code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or ProjectTemplate, which I will try to pick up on. Thanks again for the suggestions @noah and @alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is a package developed by RStudio called pins that might address this problem.
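A rough sketch of the pins idea, using the current API (the board location and pin name are placeholders): you write an object to a board once and read it back later or from other sessions.
library(pins)
board <- board_local()                  # or board_folder("some/shared/path")
pin_write(board, mtcars, name = "cars-data")
pin_read(board, "cars-data")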
