I would like to access the history of what have been typed in the source panel in RStudio.
I'm interested in the way we learn and type code. Three things I would like to analyse are: i) the way a single person type code, ii) how different persons type code, iii) the way a beginner improve typing.
Grabbing the history of commands is quite satisfying as first attempt in this way but I would like to reach a finer granularity and thus access the successive changes, within a single line in a way.
So, to be clear, I'm neither looking for the history of commands or for a diff between different versions of and .R file.
What I would like to access is really the successive alterations to the source panel that are visible when you recursively press Ctrl+Z. I do not know if there is a more accurate word for what I describe, but again what I'm interested in is how bits of code are added/moved/deleted/corrected/improved in the source panel but not necessary passed to the Console and thus absent from the history of command.
This must be somewhere/somehow saved by RStudio as it is accessible by the later. This may be saved in a quite hidden/private/memory/process/... way and I have a very vague idea of how a GUI works. I do not know it if would be easily accessible, then programmaticaly analyzed, typically if we could save a file from it. Timestamps would be the cherry on top but I would be happy without.
Do you have idea how to access this history?
RStudio's source panel is essentially a view to an Ace Editor. As such you'd need to access the editor session's editSession and use getDocument or getWordRange along with the undo of the editSession's undoManager instance.
I don't think you'll be doing that from within RStudio without hacking on the RStudio code unless the RStudio Addin api is made to pass-thru editor events in the future.
It might be easier to write a session recorder as changes are made rather than try to mess with the undo history. I imagine you could write an Addin that calls a javascript to communicate over the existing RStudio port using the Ace Editor's events (ie. onChange).
As #GegznaV said, RStudio saves code history to a ".RHistory" file. It's in the "Documents" folder on my computer. It's probably the same folder if you're using Windows. If you do not know the location, you can find the file by searching.
It also allows saving RStudio history to a file manually. There is a "Save" button in the History panel. So you can also get a timestamp. You can ask different users to save their code history after they have finished writing code. It may be indirectly useful to your research.
Related
I am in the midst of writing some scripts to perform data analysis on large excel sheets faster than by hand. However, my company has a strict quality review system where the program used needs to be validated and secure (i.e. no one can edit it, there is proof of what code was run, etc.). So essentially I would like my code to be able to be ran by my coworkers without them being able to edit the script. I was also interested in inserting prompts that they can fill in (e.g. "Which column would you like to analyze?")
Is all of this possible? I have read a few things online about file permissions but I know that these can easily be changed by the user. I also read about obfuscators but am entirely unfamiliar with their use.
One thought I have is to use Rmarkdown as a method of displaying which lines were run for which results. However, I believe that document could be edited as well? This would also leave the issue of the script itself being able to be edited.
When closing R Studio at the end of a R session, I am asked via a dialog box: "Save workspace image to [working directory] ?"
What does that mean? If I choose to save the workspace image, where is it saved? I always choose not to save the workspace image, are there any disadvantages to save it?
I looked at stackoverflow but did not find posts explaining what does the question mean? I only find a question about how to disable the prompt (with no simple answers...): How to disable "Save workspace image?" prompt in R?
What does that mean?
It means that R saves a list of objects in your global environment (i.e. where your normal work happens) into a file. When R next loads, this list is by default restored (at least partially — there are cases where it won’t work).
A consequence is that restarting R does not give you a clean slate. Instead, your workspace is cluttered with existing stuff, which is generally not what you want. People then resort to all kinds of hacks to try to clean their workspace. But none of these hacks are reliable, and none are necessary if you simply don’t save/restore your workspace.
If I choose to save the workspace image, where is it saved?
R creates a (hidden) file called .RData in your current working directory.
I always choose not to save the workspace image, are there any disadvantages to save it?
The advantage is that, under some circumstances, you avoid recomputing results when you continue your work later. However, there are other, better ways of achieving this. On the flip side, starting R without a clean slate has many disadvantages: Any new analysis you now start won’t be in a clean room, and it won’t be reproducible when executed again.
So you are doing the right thing by not saving the workspace! It’s one of the rules of creating reproducible R code. For more information, I recommend Jenny Bryan’s article on using R with a Project-oriented workflow
But having to manually reject saving the workspace every time is annoying and error-prone. You can disable the dialog box in the RStudio options.
The workspace will include any of your saved objects e.g. dataframes, matrices, functions etc.
Saving it into your working directory will allow you to load this back in next time you open up RStudio so you can continue exactly where you left off. No real disadvantage if you can recreate everything from your script next time and if your script doesn't take a long time to run.
The only thing I have to add here is that you should consider seriously that some people may be working on ongoing projects, i.e. things that aren't accomplished in one day and thus must save their workspace image so as to not start from the beginning again.
I think, best practice is: its ok to save your workspace, but your code only really works if you can clear your entire workspace and then rerun it completely with no errors!
I'm processing data where the source is a manual update in Excel format which is provided monthly.
One of the quirk of the data is that some cancelled records are indicated either by the person keying in the data highlighting the cells in red, or changing the fonts to be strikeout. Unfortunately I have no control on the data entry source, so I have had to regularly manually search the file for red cells or strike out fonts, and manually clean them up (either delete, or add a column with status as Cancelled, depending on the usage).
Does anyone have suggestion on the best data cleaning practice for this? Is there an automated approach for this, or do I simply have to be resigned to documenting the steps, and executing them regularly?
For info my preferred tool is R, so if there is a way to clean it from within R, that would be best. I'm open to other approaches.
There are a few R packages for working with Excel, but filtering data based on formatting is going to involve using rcom and excel's COM interface. I'm not terribly familiar with either.
The route I would go is to write a VBA macro which would filter the data, wrap that macro in a VBS script, and call that script from the command line (or via R's system or shell functions)
The reason I would go that route is that both VBA and VBS are very easy to pick up if you have any familiarity at all with programming. COM on the other hand isn't something that people gain a level of comfort with very quickly.
VBA is what will give you access to the excel formatting. (Visual Basic for Applications). VBS is what you will need to automate the macro via the command line rather than from within Excel (Visual Basic Scripting Edition).
You can do this in Excel. Start by recording a macro, then put a colour based filter on the column you want (to select only the red cells), delete the rows and then stop recording of the macro.
This should give you a macro. You might need to make some small changes, you can google specific commands for more details.
I am doing some analysis in Rstudio and at the moment - as I am refreshing my knowledge of R after a few decades away from S - this involves writing lots of one-liner statements which operate on test datasets, and then inspecting/testing the output, then finally scaling it up when I've checked all the little bits work.
So my history is full of syntax errors and similar. But I am making progress every time I work, and each time I work there are statements that worked, that I want to save, in order to document the bits of the session that are worth saving. Is there any established way of extracting these from my history for re-use, in RStudio? Should I just scroll through after each session and copy and paste them into a textfile with a word processor? Or is there something more clever than that that I can do, staying within RStudio?
The easiest way to see your history, is to hit Ctrl-4 and that will bring up the history window. You can copy this to source and then edit it, or where ever. However, for what you are doing it is probably better to edit directly into a source window.
The setup I use is to have a script window open, and use ctrl-enter to run the current line.
To make this easier go into Tools>Options>Code Editing and ensure that "focus console after executing from source" is unchecked and your cursor will stay in the script after the line is executed.
You can now type your lines and edit them until they do what you want, then move on to the next when it works. Once you get to the end you have built up your script already. Also since your "history" is just their in front of you, it is much easier to skip back to older lines and rerun or modify them. If you want to run a block of code, simply highlight the block and hit ctrl-enter.
In the history panel in RStudio (top right panel), you can click "send to source" and it will copy the line you have selected over to whatever .R file you have open in the top left panel.
While I am writing .R functions I constantly need to manually write source("funcname.r") to get the changes reflected in the workspace. I am sure it must be possible to do this automatically. So what I would like would be just to make changes in my function, save the function and be able to use the new function in R workspace without manually "sourcing" this function. How can I do that?
UPDATE: I know about selecting appropriate lines of code and pressing CTRL+R in R Editor (RGui) or using Notepad++ and executing the lines into R. But this approach has a disadvantage of making my workspace console "muddled". I would like to stop this practice if at all possible.
You can use R studio which has a source on save option.
If you are prepared to package your functions into a package, you may enjoy exploring Hadley's devtools package. This provides a suite of tools to write, test and document
packages.
https://github.com/hadley/devtools
This approach offer many advantages, but mainly reloading the package with a minimum of retyping.
You will still have to type load_all("yourpackage") but I find this small amount of typing is small beer compared to the advantages of devtools.
For additional information, including how to setup devtools, have a look at https://github.com/hadley/devtools/wiki/development
If you're using Eclipse + StatET, you can press CTRL+R+S, which saves your script and sources it. As close to automatic as I can get.
If you can get your text editor to run a system command after it saves the file, then you could use something like AutoIt (on Windows) or a batch script (on UNIX-derivative) to pass a call to source off to all running copies of R. But that's a heck of a lot of work for not much gain.
Still, I think it's much more likely to work being event-driven on the text editor end vs. having R constantly scan for updates (or somehow interface with the OS's update-event-messaging-system).
This is likely not possible (automatically detecting disc changes without intervention or running at least one line).
R needs to read into memory functions, so a change on the disc wouldn't be reflected in the workspace without reloading your functions.
If you are into developing R functions, some amount of messiness during your development process will be likely inevitable, but perhaps I could suggest that you try writing an R-package to house your functions?
This has the advantage of being able to robustly document your functions, using lazy loading so that you have access to your functions/datasets immediately without sourcing them.
Don't be afraid of making a package, it's easy with package.skeleton() and doesn't have to go on CRAN but could be for your own personal use without distribution! Just have fun!
Try to accept some messiness during development knowing you are working towards your goal and fighting the good fight of code organization and documentation!
We are only imperfect people, in an imperfect world, but we mean well!