Access cacheSweave objects from cache

I use Sweave and cacheSweave with pleasure.
Occasionally I have a document in which some section takes a really long time (like 20 minutes or something) and after processing I'd like to open up the objects it created and play around with them in an interactive session.
Does anyone know a method for doing that, presumably by fetching directly against the stashR database or something?

I would have preferred to post this as a comment, but I don't have an account here, so anyway:
The easiest way to do this is probably to put save.image() calls at strategic points in the .Rnw file, so that all objects created up to that point are saved. Then you can open a new instance of R and interact with the objects without altering the Sweave file. HTH.
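As a sketch (the chunk name and model call are hypothetical), a checkpoint inside the .Rnw could look like:

```r
<<slow-analysis>>=
fit <- run_long_model(dat)             # the 20-minute computation (hypothetical)
save.image(file = "checkpoint.RData")  # snapshot every object created so far
@
```

In a fresh interactive session, load("checkpoint.RData") then restores everything for exploration.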

Related

Is there a way of rendering rmarkdown to an object directly without saving to disk?

We are working on a project where, following an analysis of data using R, we use rmarkdown to render an HTML report which will be returned to users uploading the original dataset. This will be part of a complex online system involving multiple steps. One of the requirements is that the rmarkdown HTML will be serialized and saved in a SQL database for the system to return to users.
My question is: is there a way to render the markdown directly to an object in R to allow for direct serialisation? We want to avoid saving to disk unless absolutely needed, as there will be multiple parallel processes doing similar tasks and resources might be limited. From my research so far it doesn't seem possible, but I would appreciate any insight.
You are correct, it's not possible due to the architecture of rmarkdown.
If you have this level of control over your server, you could create a RAM disk, using part of your memory to simulate a hard drive. The actual hard disk won't be used.
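A sketch of that approach, assuming a Linux server with a tmpfs mount already set up at /mnt/ramdisk (both the mount point and file names are examples):

```r
# Render into the RAM-backed directory, then read the HTML bytes back
# as a raw vector, ready to be serialized into the SQL blob column.
out <- rmarkdown::render("report.Rmd", output_dir = "/mnt/ramdisk", quiet = TRUE)
html_raw <- readBin(out, what = "raw", n = file.size(out))
```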

Saving workspace image, in R

When closing R Studio at the end of a R session, I am asked via a dialog box: "Save workspace image to [working directory] ?"
What does that mean? If I choose to save the workspace image, where is it saved? I always choose not to save the workspace image; are there any disadvantages to saving it?
I looked on Stack Overflow but did not find posts explaining what the question means. I only found a question about how to disable the prompt (with no simple answers...): How to disable "Save workspace image?" prompt in R?
What does that mean?
It means that R saves a list of objects in your global environment (i.e. where your normal work happens) into a file. When R next loads, this list is by default restored (at least partially — there are cases where it won’t work).
A consequence is that restarting R does not give you a clean slate. Instead, your workspace is cluttered with existing stuff, which is generally not what you want. People then resort to all kinds of hacks to try to clean their workspace. But none of these hacks are reliable, and none are necessary if you simply don’t save/restore your workspace.
If I choose to save the workspace image, where is it saved?
R creates a (hidden) file called .RData in your current working directory.
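The save/restore cycle can also be driven manually, which makes explicit what the prompt is doing:

```r
save.image()     # writes .RData in getwd(), same as answering "Save"
rm(list = ls())  # simulate a restart with a clean global environment
load(".RData")   # what R does on startup when a .RData file is present
```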
I always choose not to save the workspace image; are there any disadvantages to saving it?
The advantage is that, under some circumstances, you avoid recomputing results when you continue your work later. However, there are other, better ways of achieving this. On the flip side, starting R without a clean slate has many disadvantages: Any new analysis you now start won’t be in a clean room, and it won’t be reproducible when executed again.
So you are doing the right thing by not saving the workspace! It's one of the rules of creating reproducible R code. For more information, I recommend Jenny Bryan's article on using R with a project-oriented workflow.
But having to manually reject saving the workspace every time is annoying and error-prone. You can disable the dialog box in the RStudio options.
The workspace will include any of your saved objects e.g. dataframes, matrices, functions etc.
Saving it into your working directory will allow you to load this back in next time you open up RStudio so you can continue exactly where you left off. No real disadvantage if you can recreate everything from your script next time and if your script doesn't take a long time to run.
The only thing I have to add here is that you should seriously consider that some people may be working on ongoing projects, i.e. things that aren't accomplished in one day, and thus must save their workspace image so as not to start from the beginning again.
I think best practice is: it's OK to save your workspace, but your code only really works if you can clear your entire workspace and then rerun it completely with no errors!

How to access the script/source history in RStudio?

I would like to access the history of what have been typed in the source panel in RStudio.
I'm interested in the way we learn and type code. Three things I would like to analyse are: i) the way a single person types code, ii) how different people type code, iii) the way a beginner improves at typing code.
Grabbing the history of commands is a satisfying first attempt, but I would like to reach a finer granularity and access the successive changes, even within a single line.
So, to be clear, I'm looking neither for the history of commands nor for a diff between different versions of an .R file.
What I would like to access is really the successive alterations to the source panel that are visible when you repeatedly press Ctrl+Z. I do not know if there is a more accurate word for what I describe, but again, what I'm interested in is how bits of code are added/moved/deleted/corrected/improved in the source panel but not necessarily passed to the console, and thus absent from the history of commands.
This must be saved somewhere/somehow by RStudio, as it is accessible by the latter. It may be saved in a quite hidden/private/memory/process/... way, and I have only a very vague idea of how a GUI works. I do not know if it would be easily accessible and then programmatically analysable, typically if we could save a file from it. Timestamps would be the cherry on top, but I would be happy without.
Do you have any idea how to access this history?
RStudio's source panel is essentially a view onto an Ace editor. As such, you'd need to access the editor's editSession and use getDocument or getWordRange along with the undo of the editSession's undoManager instance.
I don't think you'll be doing that from within RStudio without hacking on the RStudio code, unless the RStudio Addin API is made to pass through editor events in the future.
It might be easier to write a session recorder as changes are made rather than try to mess with the undo history. I imagine you could write an Addin that calls JavaScript to communicate over the existing RStudio port using the Ace editor's events (i.e. onChange).
As @GegznaV said, RStudio saves code history to an ".Rhistory" file. It's in the "Documents" folder on my computer. It's probably the same folder if you're using Windows. If you do not know the location, you can find the file by searching.
It also allows saving RStudio history to a file manually. There is a "Save" button in the History panel. So you can also get a timestamp. You can ask different users to save their code history after they have finished writing code. It may be indirectly useful to your research.

How to directly work with a data.frame in physical RData?

Do I have to
1) load a data.frame from the physical RData to the memory,
2) make changes,
3) save it back to the physical RData,
4) remove it from the memory to avoid conflicts?
Is there any way I can skip the load/save steps and make permanent changes to the physical RData directly? Is there a way to work with a data.frame the way one works with a SQLite/MySQL database? Or should I just use SQLite/MySQL (instead of a data.frame) as the data storage?
More thoughts: I think the major difference is that to work with SQLite/MySQL you establish a connection to the database, but to work with a data.frame from RData you make a copy in memory. The latter approach can create conflicts in complex programs. To avoid potential conflicts you have to save the data.frame and immediately remove it from memory every time you change it.
Thanks!
Instead of using load you may want to consider using attach. This can attach the saved data object to the search path without loading all the objects in it into the global environment. The data frame would then be available to use.
If you want to change the data frame then you would need to copy it to the global environment (will happen automatically for most editing) and then you would need to save it again (there would not be a simple way to save it into a .Rdata file that contains other objects).
When you are done you can use detach (but if you have made a copy in the global environment then you will still need to delete that copy).
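A sketch of the attach/detach workflow described above (file and object names are examples):

```r
df <- data.frame(x = 1:3)
save(df, file = "mydata.RData")
rm(df)

attach("mydata.RData")           # puts the file on the search path; nothing copied yet
head(df)                         # read-only use works directly
df$x <- df$x * 2                 # editing creates a copy in the global environment
save(df, file = "mydata.RData")  # write the modified copy back
detach("file:mydata.RData")
rm(df)                           # the global copy still needs deleting
```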
If you don't like typing the load/save commands (or attach/detach) each time then you could write your own function that goes through all the steps for you (and if the copy is only in the environment of the function then you don't need to worry about deleting it).
You may also want to consider different ways of storing your data. The typical .Rdata file works well for an all or nothing approach. The saveRDS and readRDS functions will save and read a single object (and do not force you to use the same name when reading it back in). The interfacing with a database approach is probably the best if you are making frequent changes to tables and want them stored outside of R.
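The saveRDS/readRDS pair mentioned above handles a single object and does not store its name (df here is an example object):

```r
saveRDS(df, "df.rds")    # one object per file, no name recorded
d2 <- readRDS("df.rds")  # bind the result to whatever name you like
```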

Store map key/values in a persistent file

I will be creating a structure more or less of the form:
```go
type FileState struct {
    LastModified int64
    Hash         string
    Path         string
}
```
I want to write these values to a file and read them in on subsequent calls. My initial plan is to read them into a map and lookup values (Hash and LastModified) using the key (Path). Is there a slick way of doing this in Go?
If not, what file format can you recommend? I have read about and experimented with some key/value file stores in previous projects, but not using Go. Right now my requirements are probably fairly simple, so a big database server system would be overkill. I just want something I can write to and read from quickly, easily, and portably (Windows, Mac, Linux). Because I have to deploy on multiple platforms, I am trying to keep my non-Go dependencies to a minimum.
I've considered XML, CSV, JSON. I've briefly looked at the gob package in Go and noticed a BSON package on the Go package dashboard, but I'm not sure if those apply.
My primary goal here is to get up and running quickly, which means the least amount of code I need to write along with ease of deployment.
As long as your entire data fits in memory, you shouldn't have a problem. Using an in-memory map and writing snapshots to disk regularly (e.g. by using the gob package) is a good idea. The Practical Go Programming talk by Andrew Gerrand uses this technique.
If you need to access those files with different programs, using a popular encoding like json or csv is probably a good idea. If you just have to access those file from within Go, I would use the excellent gob package, which has a lot of nice features.
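A minimal sketch of the snapshot approach with encoding/gob; the snapshot/restore helper names are mine, not from any library, and persistence to disk would just write the returned bytes with os.WriteFile:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type FileState struct {
	LastModified int64
	Hash         string
	Path         string
}

// snapshot gob-encodes the whole map into a byte slice.
func snapshot(states map[string]FileState) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(states)
	return buf.Bytes(), err
}

// restore decodes a snapshot back into a map, as a later run of the program would.
func restore(data []byte) (map[string]FileState, error) {
	var states map[string]FileState
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&states)
	return states, err
}

func main() {
	states := map[string]FileState{
		"/tmp/a.txt": {LastModified: 1700000000, Hash: "abc123", Path: "/tmp/a.txt"},
	}
	data, err := snapshot(states)
	if err != nil {
		panic(err)
	}
	restored, err := restore(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored["/tmp/a.txt"].Hash) // look up by the Path key
}
```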
As soon as your data becomes bigger, it's not a good idea to always write the whole database to disk on every change. Also, your data might not fit into the RAM anymore. In that case, you might want to take a look at the leveldb key-value database package by Nigel Tao, another Go developer. It's currently under active development (but not yet usable), but it will also offer some advanced features like transactions and automatic compression. Also, the read/write throughput should be quite good because of the leveldb design.
There's an ordered key-value persistence library for Go that I wrote, called gkvlite:
https://github.com/steveyen/gkvlite
JSON is very simple but makes bigger files because of the repeated variable names. XML has no advantage. You should go with CSV, which is really simple too. Your program will be less than a page of code.
But it depends, in fact, on your modifications. If you make a lot of modifications and must have them stored synchronously on disk, you may need something a little more complex than a single file. If your map is mainly read-only, or if you can afford to dump it to file rarely (not every second), a single CSV file alongside an in-memory map will keep things simple and efficient.
BTW, use Go's encoding/csv package to do this.
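A sketch of that CSV round trip (the writeCSV/readCSV helper names and the path/hash/mtime column order are my choices, not a standard):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

type FileState struct {
	LastModified int64
	Hash         string
	Path         string
}

// writeCSV serializes the map as one row per entry: path, hash, mtime.
func writeCSV(states map[string]FileState) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	for _, s := range states {
		row := []string{s.Path, s.Hash, strconv.FormatInt(s.LastModified, 10)}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

// readCSV rebuilds the map keyed by Path.
func readCSV(data string) (map[string]FileState, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	states := make(map[string]FileState, len(rows))
	for _, r := range rows {
		mtime, err := strconv.ParseInt(r[2], 10, 64)
		if err != nil {
			return nil, err
		}
		states[r[0]] = FileState{Path: r[0], Hash: r[1], LastModified: mtime}
	}
	return states, nil
}

func main() {
	out, _ := writeCSV(map[string]FileState{
		"a.txt": {LastModified: 42, Hash: "deadbeef", Path: "a.txt"},
	})
	fmt.Print(out) // one CSV row per map entry
}
```

On disk you would pair these with os.WriteFile and os.ReadFile on each snapshot.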
