My R Markdown scripts have been getting quite large lately, especially the code inside the R chunks, which makes getting an overview of the whole script more and more tricky. Luckily, RStudio has the functionality to collapse, i.e. minimize, a code chunk to one line! However, as the chunks become more numerous, it takes time to close them all by hand.
Question: Is there a feature to close them all at once? Say, when starting to work on the script, and then reopen single chunks when needed.
PS: Wasn't sure whether to post this as a feature request on GitHub or here.
Edit > Folding > Collapse All
Alternatively, on Windows: Alt+O.
I'm not sure I can include reproducible code for this, given there are 4,000 lines of code and that may be part of the problem, but let me try to explain my question the best I can:
I love using beepr to play an audible sound when a bunch of code is done processing. If my computer is taking a while to run it, I'll go look at a different screen or do something else in the room while it's thinking.
I have a large .rmd file. It's 4,187 lines long and beep() is on line 4185. I made sure it was nowhere else in the document using Ctrl+F. When I "run all", the beep goes off when I'm about this far through the document:
And then it continues thinking for another few minutes before it's done. This defeats the entire purpose of beepr.
So I guess my question is: is this a known problem? Is there anything particular to a .rmd document that does this? Any known fixes?
The {knitr} man page says:
This function takes an input file, extracts the R code in it according to a list of patterns, evaluates the code and writes the output in another file.
So what you are observing is due to the fact that all the R code in the .rmd gets evaluated before the whole process is finished. The sound plays when the beepr line is executed, which happens (it is an R code chunk) before the document is processed by pandoc (or similar). I would advise you to put the beeper outside of the .rmd itself, to trigger it after the process has finished. Write a short R script:
knit("my.rmd")
Sys.sleep(1)
beepr()
This makes sure the beep only starts after the document is created (the Sys.sleep is just to be on the safe side and probably not necessary).
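If you use the rmarkdown workflow instead of calling knit() directly, the same idea applies; a minimal sketch, assuming your file is named my.Rmd:
library(rmarkdown)
library(beepr)
render("my.Rmd")  # knitr and pandoc both run to completion here
beep()            # fires only once the finished output file exists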
Despite numerous searches, I can't seem to find a clear explanation as to what "Source on Save" means in RStudio.
I have tried ?source and the explanation there isn't clear, either.
As far as I can tell, it seems to run the script when I hit Save, but I don't understand the relevance/significance of it.
In simple terms, what exactly does Source on Save do and why would/should I use it?
This is kind of a shortcut to save and execute your code: you type something, save the script, and it is automatically sourced.
Very useful for short scripts, but very annoying for time-consuming, longer scripts.
So sourcing is basically running each line of your file.
EDIT:
So, thinking of a scenario where this might be useful...
Say you are developing a function which you will later put into a package. So you write this function in a separate file, but execute it in the console for testing...
Normally, you would have to source the whole function again whenever you changed something. With "Source on Save", the function is re-sourced automatically on every save, and you can use Ctrl + 2 to jump to the console and test the function directly, as in the sketch below.
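To make that concrete, here is a toy setup (the file and function names are made up):
# Helperfunctions.R -- re-sourced automatically on every save
double_it <- function(x) {
  2 * x
}
With "Source on Save" ticked, every save re-runs the file, so double_it(21) is immediately callable from the console without typing source("Helperfunctions.R") yourself.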
Since I am working with R, my datasets are much bigger now. But I remember starting to code in Python and vi: I updated my settings to execute the code on save, since those little scripts were done in less than 10 seconds...
So maybe it is just not standard to work with small datasets... But I can still recommend, for development, using only 10% of a normal dataset. It will speed up graphics creation and a lot of other things as well; test with the complete dataset every now and then.
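In base R that could look like this (df stands for whatever data frame you are working with):
set.seed(1)  # make the development subsample reproducible
dev_df <- df[sample(nrow(df), floor(0.1 * nrow(df))), ]  # 10% of the rows
Develop against dev_df, and swap the full df back in for the occasional complete run.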
I have a script with self-written functions (no plots). When I copy-paste that script into the RStudio console, it takes ages to execute, but when I use source("Helperfunctions.R") it doesn't take more than a second.
Question: Where does the difference in speed come from?
I am aware of two differences between running code via the source() function vs. entering code at the RStudio console:
From ?source:
Since expressions are not executed at the top level, auto-printing is not done.
The way I understand this: source() will not auto-print results, so e.g. a lattice or ggplot2 object on its own line will not be drawn (unless wrapped in print()), while code entered at the RStudio console auto-prints everything. I'm sure this affects execution speed to some degree, but it seems irrelevant in my case, because there are barely any plot calls.
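For what it's worth, the difference is easy to demonstrate with a grid-based plot (a toy example, not from my actual script):
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p         # auto-printed at the console, but silent under source()
print(p)  # drawn in both cases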
And:
(...) the complete file is parsed before any of it is run
I have been working with R for a while now, but I'm not sure whether this is relevant to the speed issue I'm having. Is it possible that completely parsing all the code "before any of it is run" speeds up the execution of my helper-functions script by a factor of a hundred?
Edit: I'm using R version 3.2.3.
The issue is not source() vs. code entered at the console. Instead, it is an issue of how RStudio sends code from the source pane to the console.
When I copy the content of Helperfunctions.R and run it in RGui (instead of RStudio), the code is executed with nearly the same speed as when I use source("Helperfunctions.R") in RStudio.
Apparently, lines of code always (?) require more execution time in RStudio than in RGui. Even though you may not usually notice the time difference when executing a couple of lines in the console, it seems to make a huge difference when, say, 3,000 lines of code are executed in the RStudio console at once.
My understanding is that upon using source("Helperfunctions.R") in the RStudio source pane, the code is not actually sent to the RStudio console (which would have been slow), but is evaluated directly by R.
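If you want to see the gap yourself, a rough way to compare (Helperfunctions.R stands for your own script):
system.time(source("Helperfunctions.R"))  # parsed as a whole, evaluated directly
# versus selecting all lines in the source pane and sending them to the
# console with Ctrl + Enter, timing that run by hand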
When I run a large R script (it works as expected and produces a correct PDF at the end; base plotting plus beeswarm, and the last line of the script is dev.off()), I notice that the PDF is finished after ~3 seconds and can even be opened in other applications, long before the console output (merely a few integer values plus the echo of ~400 lines of code) is finished (~20 seconds). No errors are reported. In between, the echo stops and does nothing for seconds.
I work with RStudio v0.97.551, R version 3.0.1, on Windows 7.
gc() and closing/restarting R did not help, and the data structures used are not big anyway (5 data frames with up to 60 observations and 64 numeric or short character variables). The available memory should be sufficient (according to the Task Manager, around 4 GB throughout), but the CPU is busy during that time.
I agree this is not reproducible for other people without the script, which is however too large to post, but maybe someone has experienced the same problem, or even has an explanation or a suggestion of what to check? Thanks in advance!
EDIT:
I ran exactly the same code directly in R 3.0.1 (without RStudio), and the problem was gone, suggesting the problem is related to RStudio. I added the rstudio tag, but I am not sure whether I am now supposed to move this question somewhere else?
Recently I came across a similar problem: running code from RStudio becomes very slow, even when it is executing something as simple as example('plot'). After searching around, this post pointed me to the right place and eventually led to a workaround: resetting RStudio by renaming the RStudio-Desktop directory. The exact way to do so depends on the OS you are using, and you can find detailed instructions here. I just tried it, and it works.
I understandably broke the cache when updating a chunk (however, the result should be the same; the changes were cosmetic). But I do not want to run the chunk again, because it takes a week to run. How can I change the cache so that the new code thinks the cache still holds?
I think I just need to change the file names in the cache folder, but I don't know what to change them to without running the code, because knitr only writes the files after the chunk has completed successfully.
Another motivation is that the knitr cache can be invalidated when using different knitr versions. This happened to me between 1.5 and 1.5.33, the development version. Also see: R knitr: is it possible to use cached results across different machines?. I think a solution to the above would help with this as well.
Using the knitr cache to store the results of a week-long simulation sounds a bit crazy, and susceptible to disaster.
My suggestion for a safer workflow is:
Run the simulation and store the results in a file (csv, rda, whatever is suitable).
Load that data inside a chunk (probably with echo = FALSE) near the start of your knitr report.
Now simulating and reporting are decoupled.
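A minimal sketch of that split (file names and the simulation function are placeholders; I use .rds here, but any of the formats above would do):
# simulate.R -- run once, outside of knitr
results <- run_week_long_simulation()  # your actual long computation
saveRDS(results, "results.rds")
# then, in a chunk with echo = FALSE near the top of the .Rmd:
results <- readRDS("results.rds")
Re-knitting the report now only costs the readRDS() call, and a cosmetic edit to a chunk can never silently re-trigger the simulation.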