I'm working in an R Markdown document. After creating a number of outputs (ggplots, ggmaps, base R plots, kable tables), I go on to work further down in the document, but after a period of time (maybe 10-20 minutes) I scroll back up to find my visual outputs gone. They are not hidden (via Expand/Collapse Output) but gone entirely, requiring me to rerun the cell to show them again.
This occurs after a period of time, whether I am running individual cells or restarting R and rerunning the entire file.
Is there a setting I'm missing in my notebook or in R? Many thanks
Chunk output inline setting not working.
I started R for my daily work, and when I ran some code (simply displaying a subset of a data frame), nothing was shown: not inline, not in the console, and not in the viewer. I have changed absolutely nothing in my settings, didn't even touch them. It says "Chunk output inline" anyway.
It had been working normally for a year, and then from one day to the next this started happening. If I execute the subsetting code in the console, it displays the subset in the console. But if I execute it in the chunk, nothing is displayed anywhere.
Thanks in advance
I am new to R Markdown. Apologies if the question has an obvious answer that I missed.
Context:
I am working with a large dataset in R Markdown (roughly 90 million rows) to produce a short report. While working on the file formatting, I want to knit the final HTML document frequently (e.g., after making a change) to look at the formatting.
Problem:
The problem is that the dataset takes a long time to load, so each knit takes a long time to execute (roughly five to ten minutes). I do need all of the data, so loading a smaller file isn't a workable option. Of course, I am able to run the individual chunks since the data are loaded into the global environment, but formatting is incredibly onerous because it is difficult to visualize the result of formatting changes without looking at the knitted product.
Attempts to solve the issue:
After some research, I found and tried cache = TRUE and cache.extra = file.mtime('my-precious.csv') (as per this section of Yihui's Bookdown). However, this option didn't work, as it resulted in the following error:
```
Error in lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, :
  long vectors not supported yet: connections.c:6073
Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> lazyLoadDBinsertVariable
```
To overcome this error, I added cache.lazy = FALSE to the chunk options (as mentioned here). Unfortunately, while the code then ran, the time it took to knit the document did not go down.
My limited understanding of this process is that setting cache = TRUE and cache.extra = file.mtime('my-precious.csv') causes a code chunk's results to be cached, so that the next time the file is knit, the results from the previous run are loaded. However, because my file is too large, cache = TRUE alone doesn't work, so I have to use cache.lazy = FALSE, which reverses part of what cache = TRUE does. In the end, the dataset is loaded into memory each time I knit the file, lengthening the time it takes to knit the document.
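For reference, the combination of options described above would be written in the chunk header like this (a sketch; the chunk label and read.csv call are placeholders, and my-precious.csv is the file name from the question):

```{r load-data, cache=TRUE, cache.lazy=FALSE, cache.extra=file.mtime('my-precious.csv')}
my_data <- read.csv('my-precious.csv')
```

With cache.extra tied to file.mtime(), the cache is invalidated only when the CSV file on disk changes.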
Questions to which I seek answers from the R community:
Is there a way to cache the data-loading chunk in R Markdown when the file size is large (~90 million rows)?
Is there a (better) way to circumvent the time-intensive data-loading process every time I knit the R Markdown file?
Is my understanding of the cache = TRUE method of circumventing the time-intensive data-loading process correct? And if it isn't, why didn't the cache = TRUE method work for me?
Any help is appreciated.
Is there a (better) way to circumvent the time-intensive data-loading process every time I knit the R Markdown file?
Yes: perform your computations outside of the R Markdown report.
Plots can be saved to files and included in the report via knitr::include_graphics(myfile).
Tables can be saved into smaller summary files, loaded via data.table::fread, and displayed via knitr::kable.
Note that if you need to print tables in a loop, you should specify the results='asis' chunk option:
```{r my_chunk_label, results='asis', echo=FALSE}
# print() plus results='asis' emits the raw table markup into the document
for (i in seq_along(data_full)) {
  print(knitr::kable(data_full[[i]]))
  cat('\n')
}
```
Run your expensive computations once, save the results. Consume these results with a light Rmarkdown report that is easy to format.
If you still have large csv files to load, use data.table::fread, which is much more efficient than the base functions.
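The precompute-then-consume pattern above can be sketched in base R (file names and the precompute() helper are illustrative, not from the original answer; the toy data frame stands in for the expensive 90-million-row load):

```r
# precompute.R -- run once, outside the report
precompute <- function(outdir = ".") {
  # Stand-in for the expensive step (loading the huge CSV).
  big <- data.frame(group = rep(letters[1:3], each = 1000),
                    value = rnorm(3000))

  # 1. Save the plot as an image the report can include.
  png(file.path(outdir, "by-group.png"), width = 600, height = 400)
  boxplot(value ~ group, data = big)
  dev.off()

  # 2. Save a small summary table the report can load quickly.
  smry <- aggregate(value ~ group, data = big, FUN = mean)
  write.csv(smry, file.path(outdir, "summary.csv"), row.names = FALSE)

  invisible(file.path(outdir, c("by-group.png", "summary.csv")))
}
```

In the Rmd, the chunks then reduce to knitr::include_graphics("by-group.png") and knitr::kable(read.csv("summary.csv")) (or data.table::fread for larger summaries), so knitting is nearly instant.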
I actually posted a similar question not so long ago. You're not alone.
In an R Markdown file, you can see the sample text below in red. It prints in the console and in the RStudio notebook output (using View()) up to 10,000 lines after running my code.
However, the total text is 20,000 lines long. I can't find anything online explaining how to increase the number of lines you can view in R, and I need to access all of it. Can anyone help? I basically want to view all of the text.
Note that my code took hours to run, RStudio crashed when the code finished executing, and the information I need is saved in red. Hence, I can't re-run the code without the same problem occurring.
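Since the object is still in the workspace, one way to get at all of it without rerunning anything is to raise R's print limit and, more robustly, dump the object to a text file. This is a sketch; the red vector below is a stand-in for the asker's 20,000-line object:

```r
# Raise the cap on how many entries print() will show
# (the default max.print is 99999 entries, and front ends may truncate sooner).
options(max.print = 1e6)

# More robustly, write the full object to a file and open it in any editor.
red <- sprintf("line %d", 1:20000)            # stand-in for the real object
out <- file.path(tempdir(), "red-output.txt")
writeLines(red, out)
length(readLines(out))                         # every line is on disk
```

Writing to a file sidesteps both the console limit and the notebook output limit entirely.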
I have a chunk of SQL in my R Markdown / Notebook Document:
```{sql output.var = "df"}
SELECT * FROM FakeData
WHERE Date >= '2017-01-01'
```
It takes literally 5 minutes to run. Is there an easy way to cache the result of the query without knitting the document or writing the result to a CSV?
I'd perhaps like the cache to live for a couple of hours, or maybe a day (is there a way to adjust this as well?)
If you put cache=TRUE in the chunk options and you are working in RStudio, you can select a section of code and run it directly using the green arrow in the upper right of the R Markdown/knitr chunk.
```{sql output.var = "df", cache=TRUE}
SELECT * FROM FakeData
WHERE Date >= '2017-01-01'
```
Also, I tend to run a regular R script in another window with EVERYTHING I am going to use in knitr. I find that I have fewer issues with package availability and caching if the data are stored in the global environment.
If you do it this way and run with cache=TRUE, you will definitely be able to save the data on the first run and skip the waiting the next time around.
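If you want explicit control over how long the cache lives (a couple of hours, or a day), a simple alternative is to cache the query result yourself in an .rds file and only rerun the query when the file is older than a chosen age. The cached() helper below is a sketch, not part of the original answer, and the DBI connection con and the query are assumptions standing in for the asker's setup:

```r
# Return the value cached at `path` if it is younger than `max_age_secs`;
# otherwise call `compute()`, save the result, and return it.
cached <- function(path, max_age_secs, compute) {
  if (file.exists(path) &&
      difftime(Sys.time(), file.mtime(path), units = "secs") < max_age_secs) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Hypothetical usage: cache the slow query for two hours.
# df <- cached("fakedata.rds", max_age_secs = 2 * 3600, function() {
#   DBI::dbGetQuery(con, "SELECT * FROM FakeData WHERE Date >= '2017-01-01'")
# })
```

Deleting the .rds file (or lowering max_age_secs) forces a fresh query on the next run.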
I'm trying to build a webpage displaying a few graphs. I want to use R to build the graphs, and I want these graphs to automatically update themselves periodically. For example, a webpage showing a graph of the stock price of a particular company over time, which refreshes every day with new data.
What is the best way to approach this? Using Rook to run the R scripts? Can I use it along with Markdown, for example, to make the HTML webpage? Or do you suggest something else?
You can make your plots in an R file and write your webpage in markdown, in which you call your R objects and plots. Alternatively, you can run the R code directly in the markdown file. With the knit2html function of the knitr package you can create the HTML page with the desired output. You can find basic examples on the knitr webpage.
You can schedule these file(s) on your own computer or on a server to update the data in the HTML output every day. If you have a machine that runs on Windows, you can use the Windows Task Scheduler to run a batch file.
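As a sketch of the scheduling step (the file names build_report.R and report.Rmd are hypothetical), the rebuild can be a one-line R script rerun daily by cron on Linux/macOS, or by an equivalent Task Scheduler job on Windows:

```
# build_report.R -- regenerate the HTML page from the R Markdown source
knitr::knit2html("report.Rmd")

# crontab entry: rebuild every day at 06:00
# 0 6 * * * Rscript /path/to/build_report.R
```

Each run re-executes the R chunks, so the graphs are redrawn with whatever data is current at build time.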
EDIT:
Here you can find a worked out example.