Saving R history from a particular session - r

I'm an instructor, and my students have expressed interest in a record of the code I run in class. Since this is often off-the-cuff, I would like to have an easy function I could run at the end of class which would save everything I had run in the session. I know savehistory() will save my entire history, but that is not what I'm looking for.
Similar to this question, except I believe I have a pedagogically sound reason for wanting the history, and I'm looking to limit what gets saved on a per-session basis, rather than on the basis of the number of lines.

I think if you invoke R with --no-restore-history (so that history from previous sessions isn't appended to the record for this one) and add
.Last <- function() {
  if (interactive())
    try(savehistory(paste0("~/.Rhistory_", Sys.time())))
}
to your .Rprofile, you should get self-contained, timestamped history files every time R closes normally.
The .Last function, when defined in the global environment, is called directly before normal close. See ?.Last
NB: this won't preserve your history in the case of a fatal error (crash) in R itself, though that shouldn't come up much in a teaching context.
NB2: the above code will generate file names with spaces in them. Depending on your OS, this could range from no big deal to hairpulling nightmarescape. If it's a problem, wrap Sys.time() in your favorite datetime formatting code, e.g. format(Sys.time(), "<format string>"), or something from lubridate (probably; I don't actually know, as I don't use it myself).
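For example, a minimal sketch of the .Rprofile snippet with a space-free timestamp (the format string is just one reasonable choice):
.Last <- function() {
  if (interactive())
    try(savehistory(paste0("~/.Rhistory_",
                           format(Sys.time(), "%Y-%m-%d_%H-%M-%S"))))
}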

In the development version of rite on GitHub (>= v0.3.6), you can use the sinkstart() function to dump all of your code, all of your results, or both to a little interactive Tcl/tk widget, from which you could then just copy or save the output.
To make it work you can do this:
devtools::install_github("leeper/rite")
library("rite")
sinkstart(print.eval = FALSE, prompt.echo = "", split = TRUE)
## any code here
sinkstop() # stop printing to the widget
You can dynamically change what is printed in the widget from the context (right-click) menu on the widget. You can also dynamically switch between sinkstart() and sinkstop() if you only want some code and/or results output to there.
Full Disclosure: This is a package I developed.

Related

R package: Refresh/Update internal raw data

I am developing a package in R (3.4.3) that has some internal data and I would like to be able to let users refresh it without having to do too much work. But let me be more specific.
The package contains several functions that rely on multiple parameters, but I am really only exporting wrapper functions that do all the work. Because there are a large number of parameters to pass to those wrapper functions, I put them in a .csv file and pass them as a list. The data is therefore a simple .csv file called param.csv, with the names of the parameters in one column and their values in the next column:
parameter_name,parameter_value
ticker, SPX Index
field, PX_LAST
rolling_window,252
upper_threshold,1
lower_threshold,-1
signal_positive,1
signal_neutral,0
signal_negative,-1
I process it by running
param.df <- read.csv(file = "param.csv", header = TRUE, sep = ",")
param.list <- split(param.df[, 2], param.df[, 1])
and I put the list inside the package like this
usethis::use_data(param.list, overwrite = TRUE)
Then, when I restart the R interactive session and execute
devtools::document()
devtools::load_all()
devtools::install()
require(pkg)
everything works out great, the data is available and can be passed to functions.
First problem: when I change param.csv, save it, and repeat the above four lines of code, the internal param.list is not updated. Is there something I am missing here? Or is it simply expected that the developer of the package runs usethis::use_data(param.list, overwrite = TRUE) each time the .csv file the data comes from changes?
Because it is intended to be a sort of map, users will want to tweak parameters for (manual) model calibration. To address this, I let users provide the functions with their own parameter list (as a .csv file like before) through a function get_param_list("path/to/file.csv") that does exactly the same processing as above and returns the list. De facto, the internal data param.list is treated as the default set of parameters.
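For reference, a minimal sketch of what such a helper might look like (get_param_list() is the name from the question; the body simply mirrors the read.csv()/split() processing shown above):
get_param_list <- function(path) {
  # same processing as for the internal default list
  param.df <- read.csv(file = path, header = TRUE, sep = ",")
  split(param.df[, 2], param.df[, 1])
}

# users pass their own parameters instead of the internal default
my_params <- get_param_list("path/to/file.csv")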
Second Problem: I'd like to let users modify this default list of parameters param.list from outside the package in a safe way.
So since the internal object is a list, users can simply modify the element of their choice from outside the package. But that is pretty dangerous in my mind, because users could
- forget parameters, so I have default values inside the functions for that case
- replace a parameter with a value of another type, which can provoke an error or, worse, side effects.
Is there a way to let users modify the internal list of parameters without opening up the package? For example, I had thought about a function that could replace the .csv file with a new one provided by the user, but this brings me back to the first problem: when I restart the R prompt and reinstall the package, nothing happens unless I run usethis::use_data(param.list, overwrite = TRUE) from inside the package. Would another type of data be more useful (e.g. making the data internal)?
This is my first question on this website; I realize it's quite long and might be taken as a coding-style question. But if anyone can spot an obvious mistake or misunderstanding of R package development on my side, that would really help me.
Cheers

How to use objects from global environment in Rstudio Markdown

I've seen similar questions on Stack Overflow but virtually no conclusive answers, and certainly no answer that worked for me.
What is the easiest way to access and use objects (regression fits, data frames, other objects) that live in the global R environment from an R Markdown (RStudio) document?
I find it surprising that there is no easy solution to this, given the tendency of the RStudio team to make things comfortable and effective.
Thanks in advance.
For better or worse, this omission is intentional. Relying on objects created outside the document makes your document less reproducible--that is, if your document needs data in the global environment, you can't just give someone (or yourself in two years) the document and data files and let them recreate it themselves.
For this reason, and in order to perform the render in the background, RStudio actually creates a separate R session to render the document. That background R session cannot see any of the environments in the interactive R session you see in RStudio.
The best way around this problem is to take the code you used to create the contents of your global environment and move it inside your document (you can use echo = FALSE if you don't want it to show up in the document). This makes your document self-contained and reproducible.
If you can't do that, there are a few approaches you can take to use the data in the global environment directly:
Instead of using the Knit HTML button, type rmarkdown::render("your_doc.Rmd") at the R console. This will knit in the current session instead of a background session. Alternatively:
Save your global environment to an .RData file prior to rendering (use R's save function), and load it in your document (see the sketch below).
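A minimal sketch of these two approaches (the file names are illustrative):
# Approach 1: knit in the current session instead of a background session
rmarkdown::render("your_doc.Rmd")

# Approach 2: snapshot the global environment before rendering...
save(list = ls(), file = "globals.RData")
# ...and, inside a chunk of the document, restore it with
# load("globals.RData")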
Well, in my case I found the following solution:
(1) Save your global environment to an .RData file in the same folder as your .Rmd file. (You just need to click the diskette icon in the "Environment" pane.)
(2) Write the following code in your R Markdown script:
load(file = "filename.RData") # loads the file that you saved before
and stop suffering.
In RStudio, go to 'Tools' > 'Global Options', open the 'R Markdown' tab, and under 'Evaluate chunks in directory' select 'Documents'; the R Markdown knitting engine will then access the global environment the way plain R code does. Hope this helps those searching for this info!
The thread is old but in case anyone's still looking for a solution (as I was):
You can pass an envir parameter to the render() (or knit()) function so that it can access objects from the environment it was called from.
rmarkdown::render(
  input = input_rmd,
  output_file = output_file,
  envir = parent.frame()
)
I have the same problem myself; some things are pretty time-consuming to reproduce every time.
I think there could be another answer: what if you save your environment with the save.image() function to a file other than the standard .RData one, and then bring it back with load()?
To be sure you are using the same data, use md5sum() from the tools package.
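A minimal sketch of that idea (the file name is just an example):
# save the current workspace to a non-default file
save.image("my_session.RData")

# later (e.g. inside the document), restore it
load("my_session.RData")

# optionally record the checksum so you can verify it is the same file
tools::md5sum("my_session.RData")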
Cheers, Cord
I think I solved this problem by referring to the package explicitly in the code that is being knitted. Using the yarrr package, for example, I loaded the data frame pirates using data(pirates). This worked fine at the console and within an RStudio code chunk, but with knitr it failed, following the pattern in the question above. If, however, I loaded the data into memory by creating an object with pirates <- yarrr::pirates, the document then knitted cleanly to HTML.
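In chunk form, that fix is just (assuming the yarrr package is installed):
# create the object explicitly instead of relying on data() having
# populated the interactive session
pirates <- yarrr::pirates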
You can load the script in the desired environment as follows:
```{r, include=FALSE}
source("your-script.R", local = knitr::knit_global())
# or sys.source("your-script.R", envir = knitr::knit_global())
```
Next in the R Markdown document, you can use objects created in these scripts (e.g., data objects or functions).
https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html
One option that I have not yet seen is the use of parameters.
This chapter goes through a simple example of how to do this.
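A minimal sketch of the parameterized-report approach (report.Rmd, df, and my_data_frame are illustrative names):
# In the YAML header of report.Rmd:
# ---
# params:
#   df: NULL
# ---
# Chunks then refer to params$df instead of a global object.

# Pass the object explicitly when rendering from the console:
rmarkdown::render("report.Rmd", params = list(df = my_data_frame))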

Automatically run R code when RStudio project is opened

I wrote an R function that updates the version number of a package in another question. I work a lot with GitHub and RStudio, and it would save me quite some time (plus be much more precise) if this function were automatically run every time I opened a certain project (or, better yet, on every git commit/push, but I assume that is harder to do). But I don't know how to do this, or whether it is even possible.
I could use .Rprofile to run R code every time I start R, so I could just update versions whenever I start R (or build in a check so it only updates the version if the date is not today, or something), but that seems like overkill.
You can make a separate .Rprofile for each project. You have to put it in the main directory of the project (http://www.rstudio.com/ide/docs/using/projects).
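For illustration, a minimal sketch of such a project-level .Rprofile; update_package_version() is a hypothetical stand-in for the versioning function from the question:
# <project>/.Rprofile -- sourced whenever R starts in this project directory.
# A project .Rprofile is used instead of the user-level one, so source the
# user profile explicitly if you still want it:
if (file.exists("~/.Rprofile")) source("~/.Rprofile")

if (interactive()) {
  # hypothetical helper that bumps the package version in DESCRIPTION
  try(update_package_version())
}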
Well, I would use .Rprofile for that. There is something to be said for being independent of the tool chain around you: knitr works from RStudio as well as without it, ditto for Rcpp/RInside etc.
You can hook into commit hooks for svn, either explicitly via hooks in the back end, or simply at your end by adding wrapper scripts. I presume you can do likewise with git, but I simply know much less about it. So to abstract this away, I would write myself a 'commitThis' or 'pushThis' or ... function that does the number increment, test run, code push and what have you.
If your code needs RStudio to be already running (e.g. because it's relying on some rstudioapi:: function), putting it directly in .Rprofile won't work (.Rprofile is executed before RStudio is available).
Instead, you could set a hook for "rstudio.sessionInit":
setHook(
  hookName = "rstudio.sessionInit",
  value = function(newSession) {
    if (newSession) {
      # your code goes here
    }
  },
  action = "append"
)

Cache expensive operations in R

A very simple question:
I am writing and running my R scripts using a text editor to make them reproducible, as has been suggested by several members of SO.
This approach is working very well for me, but I sometimes have to perform expensive operations (e.g. read.csv or reshape on 2M-row databases) that I'd rather cache in the R environment than re-run every time I run the script (which is usually many times as I progress and test new lines of code).
Is there a way to cache what a script does up to a certain point so every time I am only running the incremental lines of code (just as I would do by running R interactively)?
Thanks.
## load the file from disk only if it
## hasn't already been read into a variable
if (!exists("mytable")) {
  mytable <- read.csv(...)
}
Edit: fixed typo - thanks Dirk.
Some simple approaches are doable with some combination of:
exists("foo") to test whether a variable exists, and re-load or re-compute it if not
file.info("foo.Rd")$ctime, which you can compare to Sys.time(): if the cached file is newer than a given age, load it, else recompute (a sketch combining the two checks follows below).
There are also caching packages on CRAN that may be useful.
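A minimal sketch combining those two checks (cache_file, max_age_hours, and expensive_computation() are illustrative names):
cache_file <- "foo_cache.RData"   # illustrative cache file
max_age_hours <- 24               # accept a cached file up to a day old

if (exists("foo")) {
  # already in memory, nothing to do
} else if (file.exists(cache_file) &&
           difftime(Sys.time(), file.info(cache_file)$mtime,
                    units = "hours") < max_age_hours) {
  load(cache_file)                # restores 'foo' from the cached file
} else {
  foo <- expensive_computation()  # stand-in for the slow step
  save(foo, file = cache_file)
}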
After you do something you discover to be costly, save the results of that costly step in an R data file.
For example, if you loaded a csv into a data frame called myVeryLargeDataFrame and then created summary stats from that data frame into a df called VLDFSummary then you could do this:
save(myVeryLargeDataFrame, VLDFSummary,
     file = "~/myProject/cachedData/VLDF.RData",
     compress = "bzip2")
The compress option there is optional and to be used if you want to compress the file being written to disk. See ?save for more details.
After you save the RData file you can comment out the slow data loading and summary steps as well as the save step and simply load the data like this:
load("~/myProject/cachedData/VLDF.RData")
This answer is not editor dependent. It works the same for Emacs, TextMate, etc. You can save to any location on your computer. I recommend keeping the slow code in your R script file, however, so you can always know where your RData file came from and be able to recreate it from the source data if needed.
(Belated answer, but I began using SO a year after this question was posted.)
This is the basic idea behind memoization (or memoisation). I've got a long list of suggestions, especially the memoise and R.cache packages, in this query.
You could also take advantage of checkpointing, which is also addressed as part of that same list.
I think your use case mirrors my second: "memoization of monstrous calculations". :)
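As a minimal sketch of that memoization idea with the memoise package mentioned above (the disk cache assumes memoise >= 2.0 together with the cachem package; the function, file, and path names are illustrative):
library(memoise)

# stand-in for an expensive step you only want to run once per input
slow_summary <- function(path) {
  summary(read.csv(path))
}

# memoised wrapper; results are cached on disk and survive restarts
cached_summary <- memoise(slow_summary,
                          cache = cachem::cache_disk("~/.r_cache"))

first  <- cached_summary("big_file.csv")  # slow: computed and cached
second <- cached_summary("big_file.csv")  # fast: served from the cache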
Another trick I use is memory-mapped files, which I use a lot, to store data. The nice thing about this is that multiple R instances can access shared data, so I can have a lot of instances cracking at the same problem.
I want to do this too when I'm using Sweave. I'd suggest putting all of your expensive functions (loading and reshaping data) at the beginning of your code. Run that code, then save the workspace. Then, comment out the expensive functions, and load the workspace file with load(). This is, of course, riskier if you make unwanted changes to the workspace file, but in that event, you still have the code in comments if you want to start over from scratch.
Without going into too much detail, I usually follow one of three approaches:
Use assign to give a unique name to each important object throughout my execution, then include an if(exists(...)) get(...) at the top of each function to fetch the value or else recompute it (same as Dirk's suggestion; a sketch follows after this list).
Use cacheSweave with my Sweave documents. This does all the work for you of caching computations and retrieves them automatically. It's really trivial to use: just use the cacheSweave driver and add this flag to each block: <<..., cache=true>>=
Use save and load to save the environment at crucial moments, again making sure that all names are unique.
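A minimal sketch of the first approach, relying on R's lazy evaluation so the expression only runs when the cached object is missing (compute_or_fetch() and the model call are illustrative):
compute_or_fetch <- function(name, expr) {
  if (exists(name, envir = .GlobalEnv)) {
    get(name, envir = .GlobalEnv)        # reuse the cached object
  } else {
    value <- expr                        # 'expr' is only evaluated here
    assign(name, value, envir = .GlobalEnv)
    value
  }
}

fit <- compute_or_fetch("big_model_fit", lm(mpg ~ ., data = mtcars))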
The 'mustashe' package is great for this kind of problem. In addition to caching the results, it also can include links to dependencies so that the code is re-run if the dependencies change.
Disclosure: I wrote this tool ('mustashe'), though I do not make any financial gains from others using it. I made it for this exact purpose for my own work and want to share it with others.
Below is a simple example. The foo variable is created and "stashed" for later. If the same code is re-run, the foo variable is loaded from disk and added to the global environment.
library(mustashe)
stash("foo", {
foo <- some_long_running_opperation(1e3)
}
#> Stashing object.
The documentation has additional examples of more complex use-cases and a detailed explanation of how it works under the hood.
