For example:
data <- read.csv("data.csv")
a <- mean(data)
b <- sd(data)
and I save the workspace, and then quit.
Later, I open this workspace and forget what a and b were.
I want R to show me that a is the mean of the data and b is the standard deviation of the data.
How do I do that?
Thank you.
You could always store some attributes with your data like so:
x <- 1:10
a <- mean(x)
attr(a,"info") <- "mean of x"
> a
[1] 5.5
attr(,"info")
[1] "mean of x"
> attributes(a)
$info
[1] "mean of x"
An alternative, noted by #mnel below, is to use comment(). Comments are not printed by default, but can be accessed later in a similar fashion:
comment(a) <- "mean of x"
> comment(a)
[1] "mean of x"
A suggestion is to use the script feature of the R environment, rather than typing commands directly in the console.
The idea is that you can type commands, comments, and even gibberish text (stuff that doesn't conform to R syntax) in a script window, and using Ctrl-R (or one of the Run commands from the Edit menu) you send the current line, or whatever portion of the text is currently selected, to the R console window (just as if you had typed it there directly).
In this fashion, you can:
add voluminous comments as to the nature of the variables that you create
save the script along with the environment or independently.
In addition to implicitly keeping a record of how the variables came about, scripts have several other advantages: in particular, they can save a lot of typing, and they allow you to recreate everything "from scratch", verbatim or with a few modifications.
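A minimal sketch of such a documented script (the file and column names here are just illustrative):

## analysis.R -- reads data.csv and computes summary statistics
data <- read.csv("data.csv")
a <- mean(data$value)   # a: mean of the 'value' column
b <- sd(data$value)     # b: standard deviation of the 'value' column

Keeping this file next to the saved workspace makes it obvious later what a and b are.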
In general you won't be able to find out how an object was created from the object itself. Some object types will have a call element that may save the call used to create them.
lm objects have this property.
e.g.:
dd <- data.frame(y=runif(10), x= rnorm(10))
model <- lm(y~x,dd)
model$call
lm(formula = y ~ x, data = dd)
In this case mean and sd will not, as they return plain atomic vectors (which carry no record of the call that produced them).
You could look at the history to see if you can find the commands that created them (this is not ideal; it depends on your IDE and on how some environment variables are set up).
RStudio has a history tab that shows some subset of the previous commands called within a project.
You may also be able to press the up arrow key (this works in RGui on Windows, at least) to scroll through previously entered commands.
These history-based approaches require that you used the same computer and the same version of R.
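If you rely on the history, one small safeguard is to persist it explicitly; a minimal sketch (works in consoles that support command history, such as RGui or terminal R):

savehistory("analysis.Rhistory")   # write the current command history to a file
loadhistory("analysis.Rhistory")   # restore it in a later session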
Reproducible research or literate programming are the best ways to overcome these issues.
Related
I am trying to create a plot and eventually save it as a file. But because I am making a lot of changes and want to test them out, I want to be able to view and save the plot at the same time. I have looked at this page to do what I want, but on my system it does not seem to work as it is supposed to.
Here is my code:
png('Save.png')
sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
plot(Y ~ X, data = sample.df)
dev.copy(png, 'Save.png')
dev.off()
There are several issues (I am new to R so I might be missing something entirely):
(1) When I use png(), I cannot view the plot in RStudio, so I used dev.copy(), but it still does not let me view my plot in RStudio.
(2) Even after I use dev.off(), I cannot view the saved file until I close RStudio (it says "Windows Photo Viewer can't open this picture because the picture is being edited in another program"). I need to restart every time, which is very inconvenient.
What am I doing wrong, and how can I view the plot and the saved file without restarting RStudio every time? Thank you in advance!
Addition
Based on Love Tätting's comments, when I run dev.list(), this is what I get.
> png('Save.png')
>
> sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
+ X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
+ Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
>
> plot(Y ~ X, data = sample.df)
>
> dev.copy(png, 'Save.png')
png
3
> dev.off()
png
2
> dev.list()
png
2
> dev.off()
null device
1
> dev.list()
NULL
Why do I not get RStudioGD?
RStudio has its own device, "RStudioGD". You can see it with dev.list(), where it by default is the first and only one.
R decouples rendering from output backends through the abstraction of devices. Which ones you can use is platform- and environment-dependent. dev.list() shows the stack of currently open devices.
If I understand your problem correctly, you want to display the graph first in RStudio and then decide whether to save it or not. Depending on how often you save the image, you could use the 'Export' button in the plot pane in RStudio and save it manually.
Otherwise, your choice of trying to copy it would be the obvious one for me as well.
To my knowledge, the device abstraction in R does not let you encapsulate a device as an object that could, for example, be passed as an argument to a function that does the actual plotting. Since dev.set() takes an index as its argument, passing the index around depends on the current state of the device stack.
I have not come up with a clean solution to this myself, and have sometimes resorted to bracketing the plot-rendering code with a call to a particular device and saving right after, switching devices depending on a global.
So, if you can, use RStudio's export functionality. Otherwise, any abstraction would need to maintain the state of the global device stack and test it extensively, since it is global and you cannot direct a plot call to a particular device; a plot simply goes to the current device (to my knowledge).
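For completeness, a minimal sketch of that copy-after-viewing pattern, using the question's sample.df and assuming the RStudio plot pane is the current device:

plot(Y ~ X, data = sample.df)   # renders in the RStudio plot pane
dev.copy(png, 'Save.png')       # copy the current device to a png device
dev.off()                       # close the png device, writing the file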
Edit after OP comment
It seems you are experiencing somewhat different behaviour if you cannot view the file after dev.off() but also need to quit RStudio. For some plotting frameworks you need to call print on the graphical object for it to actually be written to the file. Perhaps RStudio does this at shutdown as part of the normal teardown of open devices? In that case the file should be empty if you forcibly look at its contents before quitting RStudio.
The other thing that sometimes works is to call dev.off() twice. I don't know exactly why, but sometimes more devices get created than I anticipated. After you have called dev.off(), what does dev.list() show?
Edit after OP's edit
I can see that you call png(); dev.copy(); dev.off(). This will leave you with one more device opened than closed: you will still have the first graphics device you opened, as the listing shows. You can simply remove the dev.copy() call. The image will be saved on dev.off(), and you should then be able to open it from the filesystem.
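In other words, the corrected sequence would be roughly:

png('Save.png')                 # open the file device (nothing is shown on screen)
plot(Y ~ X, data = sample.df)
dev.off()                       # close it; Save.png is now written and viewable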
As to why you cannot see the RStudio graphics device, I am not entirely sure. It might be that other code is messing with your device stack; I would check in a clean session whether it is there. From RStudio forums and other SO questions, there seem to have been plot-pane-related problems in RStudio that were resolved after updating RStudio to the latest version. If that is viable for you, I would try that.
I've just added support for RStudio's RStudioGD device to the developer's version of R.devices package (I'm the author). This will allow you to do the following in RStudio:
library("R.devices")
sample.df <- data.frame(
group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10)
)
figs <- devEval(c("RStudioGD", "png"), name = "foo", {
plot(Y ~ X, data = sample.df)
})
You can specify any set of output target types, e.g. c("RStudioGD", "png", "pdf", "x11"). The devices that output to file will by default write the files in folder figures/ with filenames as <name>.<ext>, e.g. figures/foo.png in the above example.
The value of the call, figs, holds references to all figures produced, e.g. figs$png. You can open them directly from R using the operator !. For example:
> figs$png
[1] "figures/foo.png"
> !figs$png
[1] "figures/foo.png"
The latter call should show the PNG file using your system's PNG viewer.
Until I submit these updates to CRAN, you can install the developer's version (2.15.1.9000) as:
remotes::install_github("HenrikBengtsson/R.devices#develop")
Question:
I'm using sys.source to source a script and keep its output in a new environment. However, that script itself source()'s some things as well.
When it sources functions, they (and their output) end up in R_GlobalEnv instead of in the environment specified by sys.source(). It seems the functions' enclosing and binding environments end up under R_GlobalEnv instead of what you specify in sys.source().
Is there a way like sys.source() to run a script and keep everything it makes in a separate environment? An ideal solution would not require modifying the scripts I'm sourcing and still have "chdir = TRUE" style functionality.
Example:
Running this should show you what I mean:
# setup an external folder
other.folder = tempdir()
# make a functions script, it just adds "1" to the argument.
# Note: the strange-looking "assign(x=" bit is important
# to what I'm actually doing, so any solution needs to be
# robust to this.
functions = file.path(other.folder, "functions.R")
writeLines("myfunction = function(a){assign(x=c('function.output'), a+1, pos = 1)}", functions)
# make a parent script, which source()'s functions.R
# and invokes it on some data, and then modifies that data
parent = file.path(other.folder, "parent.R")
writeLines("source('functions.R')\n
original.data=1\n
myfunction(original.data)\n
resulting.data = function.output + 1", parent)
# make a separate environment
myenv = new.env()
# source parent.R into that new environment,
# using chdir=TRUE so parent.R can find functions.R
sys.source(parent, myenv, chdir = TRUE)
# You can see "myfunction" and "function.output"
# end up in R_GlobalEnv.
# Whereas "original.data" and "resulting.data" end up in the intended environment.
ls(myenv)
More information (what I'm actually trying to do):
I have data from several similar experiments. I'm trying to keep everything in line with "reproducible research" ideals (for my own sanity if nothing else). So what I'm doing is keeping each experiment in its own folder. The folder contains the raw data, and all the metadata which describes each sample (treatment, genotype, etc.). The folder also contains the necessary R scripts to read the raw data, match it with metadata, process it, and output graphs and summary statistics. These are tied into a "mother script" which will do the whole process for each experiment.
This works really well, but if I want to do some meta-analysis or just compare results between experiments, there are some difficulties. Right now I am thinking the best way would be to run each experiment's "mother script" in its own environment, and then pull out the data from each environment to do my meta-analysis. An alternative approach might be running each mother script in its own R instance, saving the .RData files separately, and then re-loading them into new environments in a new instance. This seems kinda hacky though, and I feel like there's a more elegant solution.
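A minimal sketch of the intended per-experiment workflow (the folder and object names here are hypothetical):

experiments <- c("experiment1", "experiment2")   # one folder per experiment
envs <- lapply(experiments, function(dir) {
  env <- new.env()
  sys.source(file.path(dir, "mother.R"), envir = env, chdir = TRUE)
  env
})
## then pull comparable objects out of each environment for the meta-analysis,
## e.g. lapply(envs, function(e) get("summary.stats", envir = e))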
Is there a simple workflow to write tests that store objects as .rds or .rda so that future runs of a test can compare the result of code execution vs. the stored object? This would make it easy to check that functions that return somewhat complex values are still behaving as they should.
For example, something like:
test_obj(res <- lm(y ~ x, data.frame(x=1:3, y=5:7)))
which, if *extdata/test_obj.res.rds* doesn't exist, would create it in *inst/extdata/test_obj.res.rds* with res from above, but if it does exist, would compare (identical/all.equal, etc.) the newly generated object with the one recovered from the rds.
I would find such tests super useful, and I am a bit surprised that RUnit/svUnit / testthat don't implement something of the sort (I'm hoping they do, and I just haven't found it).
testthat::make_expectation is close, but I'd prefer to have an automated store/retrieve rds rather than copy paste the text representation to a file, which I think is how you're supposed to use testthat::make_expectation (I guess I could pipe stdout() to a .R file, but even then there is a bit of automation that could facilitate the process).
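For what it's worth, a bare-bones version of the helper described above might look like this (the function name and the use of all.equal are just for illustration):

test_obj <- function(object, file) {
  if (!file.exists(file)) {
    saveRDS(object, file)                      # first run: store the reference copy
    message("created reference: ", file)
    invisible(TRUE)
  } else {
    isTRUE(all.equal(object, readRDS(file)))   # later runs: compare against it
  }
}
## e.g., comparing just the fitted coefficients avoids spurious differences
## from environments stored inside full lm objects:
test_obj(coef(lm(y ~ x, data.frame(x = 1:3, y = 5:7))), "test_obj.res.rds")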
It only took me three years, but I wrote unitizer to resolve this issue. It is a unit-testing framework with an interactive UI that allows you to review test output and store or reject it with a single keystroke. It also streamlines the update/test/debug cycle by showing you a proper diff of failing tests and dropping you into those tests' evaluation environments for debugging in the interactive UI.
For example, if we have a matrix rotation function (courtesy #MatthewLundberg) we want to test:
# mx-rotate.R
rotate <- function(x) t(apply(x, 2, rev))
And a script with some tests:
# mx-test.R
mx <- matrix(1:9, 3)
rotate(mx)
rotate(rotate(mx))
rotate(rotate(rotate(mx)))
Then:
library(unitizer)
unitize('mx-test.R')
Will kick-off an interactive session that will allow you to review the results of the three rotation calls and accept them as tests if they work as expected.
There is a screencast demo available.
As of 2017, testthat has the feature expect_equal_to_reference, which does exactly what the question asks. I guess Hadley W. figured out a way.
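Usage is roughly as follows (the reference file is created on the first run and compared against on later runs; check the current testthat documentation for the exact arguments):

library(testthat)
res <- lm(y ~ x, data.frame(x = 1:3, y = 5:7))
expect_equal_to_reference(res, "test_obj.res.rds")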
I have what I think is a common enough issue with optimising workflow in R. Specifically, how can I avoid ending up with a folder full of output (plots, RData files, csv, etc.) and, after some time, no clue where it came from or how it was produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure what the best strategy is.

So far, I have tackled it in a rather unsophisticated (overkill) way: I created a function, MetaInfo (see below), that writes a text file with metadata under a given file name. The idea is that whenever a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), containing information on the system, session, packages loaded, R version, the function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. In particular, getting the file name from which the plot is produced doesn't necessarily work (the solution I use is one provided by #hadley in 1). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
# message - character string - Any message to be written into the information
# file (e.g., data used).
# filename - character string - the name of the txt file (including relative
# path). Should be the same as the output file it describes (RData,
# csv, pdf).
#
if (is.null(filename))
{
stop('Provide an output filename - parameter filename.')
}
filename <- paste(filename, '.txt', sep='')
# Try to get as close as possible to getting the file name from which the
# function is called.
source.file <- lapply(sys.frames(), function(x) x$ofile)
source.file <- Filter(Negate(is.null), source.file)
t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
silent=TRUE)
if (class(t.sf) == 'try-error')
{
source.file <- NULL
}
func <- deparse(sys.call(-1))
# MetaInfo isn't always called from within another function, so func could
# return as NULL or as general environment.
if (any(grepl('eval', func, ignore.case=TRUE)))
{
func <- NULL
}
time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
meta <- list(Message=message,
Source=paste(source.file, ' on ', time, sep=''),
Functions=func,
System=Sys.info(),
Session=sessionInfo(),
Git.hash=git.h)
sink(file=filename)
print(meta)
sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')
RandomPlot <- function(x, y)
{
fn <- 'random_plot'
pdf(file=paste(fn, '.pdf', sep=''))
plot(x, y)
MetaInfo(message=NULL, filename=fn)
dev.off()
}
x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project, each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc... all the way through. This should help keep everything organized, and if you get new data you just go to early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called ProjectTemplate that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There are also a number of helpful documents, such as Workflow of statistical data analysis by Oliver Kirchkamp.
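A minimal sketch of getting started with ProjectTemplate (the scaffolded directory names are the package's defaults; see its documentation for details):

install.packages("ProjectTemplate")
library(ProjectTemplate)
create.project("my-analysis")   # scaffolds data/, munge/, src/, reports/, etc.
setwd("my-analysis")
load.project()                  # reads the config, loads data, runs munging scripts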
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question belies a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file is produced. You should in fact be able to wipe out any output, and reproduce it by rerunning code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or Project Template, which I will try to pick up on. Thanks again for the suggestions #noah and #alex!
There is also now an R package called drake (Data Frames in R for Make), independent from Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of output files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is a package developed by RStudio called pins that might address this problem.
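A minimal sketch with the pins API (the API has changed across pins versions, so check the package documentation for your version):

library(pins)
board <- board_local()                      # a local board under the user data directory
pin_write(board, mtcars, "mtcars-example")  # store an object under a name
pin_read(board, "mtcars-example")           # retrieve it later, possibly in another session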
I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or region-wise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({
  i <- 1
  function() {
    if (dev.cur() > 1) {
      filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
      dev.copy2pdf(file = filename)
      i <<- i + 1
    }
  }
})
setHook('before.plot.new', savegraphs )
setHook('before.grid.newpage', savegraphs )
Now, just before you create a new graph, the current one will be saved into the graphs folder of the current working directory (make sure it exists). This means that if you add to a plot (lines, points, abline, etc.), the annotations will be included. However, you will need to run plot.new() for the last plot to be saved (and if you close the current graphics device without running another plot.new(), that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and possibly even with some complicated plots). I would not be surprised if the occasional extra plot shows up (when a plot is created internally to get some parameters and then immediately replaced with the one of interest). There are probably other things I have overlooked as well, but this should get you started.
You could write your own wrapper functions for your commonly used plot functions. The wrapper would produce both the on-screen display and a timestamped PDF version. You could source() this function in your ~/.Rprofile so that it's available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
dir.create(file.path("~","RPlots"))
my.chart <- xyplot(...)
trellis.device(device="windows",height = 8, width = 8)
print(my.chart)
trellis.device(device = "pdf",
file = file.path("~", "RPlots",
paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
".pdf", sep = "")),
paper = "letter", width = 8, height = 8)
print(my.chart)
dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.