r system.time() causes my r session to hang - r

I'm running Rstudio on a mac and my code slows down greatly when I place it between the brackets of a system.time command.
However system.time appears to report the actual time it takes to run the code without being placed in system.time(). Thus although it takes 2 or 3 minutes for the command to run within system.time(), it will report only a few seconds of elapsed time.
I'm not sure how to diagnose this behavior further.
One possible cause might be that I'm working with very large data tables and running efficient data.table commands that would take a long time to run in base r. Would this interfere with system.time()?

Try running it in the RGUI. One issue with RStudio (I have seen this same problem on my code) is that RStudio will want to update its Environment after each execution and this may take some time. I usually try to run large batch jobs in RGUI to avoid this issue.
Try it and report back.

Related

Issue with applying str_length to a dataframe

I created a simple R Script that is run on a monthly basis by colleagues.
This script brings in a fairly chunky RDS file that has around 2.6M observations and 521 variables.
Against this file the following two commands are run:
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)
Latest$LENGTH <- str_length(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME)
This script has run perfectly for the last three years, but today, for some reason, it is now failing for all three people tasked to run it and has indeed fallen over for myself too.
The error message received is
Error: cannot allocate vector of size 10.0 Mb
At first I assumed that their computers were running out of memory, or they were not using 64Bit R, or some other reason such as not restarting their computers, etc.
It turns out though that they have plenty of memory available, have restarted their computers, are using 64 Bit R in R Studio and all are using different versions of R Studio/R.
I tried running the process myself, my computer has 32GB of Ram and 768GB of Hard Drive space free. I am getting the same error message.......
So, must be a corrupt source file I figure. Try last months file which all ran just fine last month for everyone and same error.
Maybe just try stringr package instead then, move around the problem that way. Nope, no dice, exact same error message.
I have to admit I'm stumped. I have tried gc(), tried previous versions of the file, tried cutting the file in half and running it that way, it just flat out refuses to run.
Anyone know of an alternative to stringr/base R commands to get the length of a character string as a new variable and to get a substring as a new variable?
What about rm(list=ls()) before running, and memory.limit(size = 16265*4) (or another big number) ?

R/Rstudio Frustration - Can't Stop Code Execution

I often include View() statements in my R scripts. If I accidentally forget the closing bracket at the end of the line, and then run the line of code from the script window using ctrl-enter, R just keeps trying to execute the remainder of my script. I don't know why it does that (rather than using the + symbol to prompt me to provide further input).
Moreover, I've tried to stop this by setting break points in my code - I can click on the LHS of the page and a little red circle appears. But the breakpoints don't seem to work - R just ignores them and keeps going.
The only way I can get out of it is by killing the process in the Windows task manager and then going back in afterwards. But it's wasting a lot of time.
Does anyone know how I can fix this please?
Thank you.
In effect, what your function is processing looks like that:
... %>% View(
lm(am~cyl, mtcars)
...
...
As R can't find the bracket for ) it includes remaining statements as input to View and searches for the bracket.
Solutions
Kind of depends on what you want to do with those scripts but if the intention is to run them in the background consider using callr. This package lets you run R from R and offers kill methods to kill the process you started that way.
On Windows pressing Esc should enable you to get back to the console but if it's a memory intense process it may be difficult.
You may try pressing Ctrl+c in order to kill the process.

Is there a way to run R code in the background while continue to work in the same session?

I want to know if there is a way to run R code (train, mutate, search, ...) in the background, without the need to wait for execution to end or to manually transfer related data to a new session.
Multiple tabs in R-Studio or running multiple sessions in a Jupyer notebook from localhost:8888, localhost:8889, localhost:8890, localhost:8891, etc. is another crude way.
Be mindful of system compute strengths/limitations.

Why is source speed different from RStudio console line code?

I have a script with self-written functions (no plots). When I copy-paste that script into the R-Studio console, it takes ages to execute, but when I use source("Helperfunctions.R") it doesn't take more than a second.
Question: Where does the difference in speed come from?
I am aware of two differences between running code via the source() function vs. entering code at the R-Studio console:
From ?source:
Since expressions are not executed at the top level, auto-printing is not done.
The way I understand this: source() will not plot graphs (unless made specific with e.g. print(plot)), while the R Studio console codes will always plot graphs. I'm sure this will affect the speed of execution to a certain degree, but this seems irrelevant in my case, because there are barely any plot calls.
And:
(...) the complete file is parsed before any of it is run
I have been working with R for a while now, but I'm not sure whether this relevant for the speed-issue I'm having. Is it possible that completely parsing all code "before any of it is run" speeds up the execution of my helper functions script by a factor of a hundred?
Edit: I'm using R version 3.2.3.
The issue is not source() vs. console line code. Instead, it is an issue of how RStudio sends code from the source pane to the console.
When I copy the content of Helperfunctions.R and run it in RGui (instead of RStudio), the code is executed with nearly the same speed as when I use source("Helperfunctions.R") in RStudio.
Apparently, lines of code always (?) require more execution time in RStudio than in RGui. Even though you may usually not notice the time-difference when executing a couple of lines in the console, it seems to make a huge difference when, say, 3.000 lines of code are being executed in the R Studio console at once.
My understanding is that upon using source("Helperfunctions.R") in the RStudio source pane, the code is not actually sent to the RStudio console (which would have been slow), but is actually executed directly in the R language.

R console unexpectedly slow, long behind job (PDF output) is finished

When I run a large R scripts (works nicely as expected, basically produces a correct PDF at the end of the script (base plotting plus beeswarm, last line of script is dev.off()), I notice that the PDF is finished after ~3 seconds and can even be opened in other applications, long before the console output (merely few integer values and echo of code ~400 lines) is finished (~20 seconds). There are no errors reported. In between, the echo stops and does nothing for seconds.
I work with R Studio V0.97.551, R version 3.0.1, on Win-7.
gc() or close and restart R did not help, and the data structures used are not big anyway (5 dataframes with up to 60 obs and 64 numeric or short character variables). The available memory should be sufficient (according to task manager, around 4 GB throughout), but CPU is busy during that time.
I agree this is not reproducible for other people w/o the script, which is however too large to post, but maybe someone has experienced the same problem or even an explanation or suggestion what to check? Thanks in advance!
EDIT:
I run exactly the same code directly in R 3.0.1 (w/o RStudio), and the problem was gone, suggesting the problem is related to RStudio. I added the tag RStudio, but I am not sure if I am now supposed to move this question somewhere else?
Recently I came across similar problem--running from RStudio becomes very slow, even when it is executing something as simple as example('plot'). After searching around, this post pointed me to the right place that eventually led to a workaround: resetting RStudio by renaming the "RStudio-Desktop Directory". The exact way to do so depends upon the OS you are using, and you could find the detail instruction here. I just tried it, and it works.

Resources