R console unexpectedly slow, long after job (PDF output) is finished

When I run a large R script, it works as expected and produces a correct PDF at the end (base plotting plus beeswarm; the last line of the script is dev.off()). However, the PDF is finished after ~3 seconds and can even be opened in other applications long before the console output (merely a few integer values and the echo of ~400 lines of code) is finished (~20 seconds). No errors are reported. In between, the echo stops and does nothing for several seconds.
I work with RStudio v0.97.551 and R version 3.0.1 on Windows 7.
Neither gc() nor closing and restarting R helped, and the data structures used are not big anyway (5 data frames with up to 60 observations and 64 numeric or short character variables). The available memory should be sufficient (around 4 GB throughout, according to Task Manager), but the CPU is busy during that time.
I realize this is not reproducible for other people without the script, which is too large to post, but maybe someone has experienced the same problem or has an explanation or a suggestion for what to check? Thanks in advance!
EDIT:
I ran exactly the same code directly in R 3.0.1 (without RStudio), and the problem was gone, suggesting that it is related to RStudio. I added the RStudio tag, but I am not sure whether I am now supposed to move this question somewhere else.

Recently I came across a similar problem: running code from RStudio became very slow, even when executing something as simple as example('plot'). After searching around, I found this post, which pointed me to the right place and eventually led to a workaround: resetting RStudio by renaming the "RStudio-Desktop Directory". The exact way to do so depends on the OS you are using, and you can find detailed instructions here. I just tried it, and it works.

Related

Issue with applying str_length to a dataframe

I created a simple R Script that is run on a monthly basis by colleagues.
This script brings in a fairly chunky RDS file that has around 2.6M observations and 521 variables.
Against this file the following two commands are run:
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)
Latest$LENGTH <- str_length(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME)
This script ran perfectly for the last three years, but today, for some reason, it is failing for all three people tasked with running it, and it has fallen over for me too.
The error message received is:
Error: cannot allocate vector of size 10.0 Mb
At first I assumed that their computers were running out of memory, that they were not using 64-bit R, or some other reason such as not having restarted their computers.
It turns out, though, that they have plenty of memory available, have restarted their computers, and are using 64-bit R in RStudio, and all of them are using different versions of RStudio/R.
I tried running the process myself; my computer has 32 GB of RAM and 768 GB of free hard drive space. I get the same error message.
So I figured it must be a corrupt source file. I tried last month's file, which ran fine for everyone last month, and got the same error.
Maybe I could just try the stringr package instead and work around the problem that way. Nope, no dice: exactly the same error message.
I have to admit I'm stumped. I have tried gc(), tried previous versions of the file, and tried cutting the file in half and running it that way; it just flat out refuses to run.
Anyone know of an alternative to stringr/base R commands to get the length of a character string as a new variable and to get a substring as a new variable?
What about running rm(list = ls()) beforehand, and memory.limit(size = 16265*4) (or another large number)?
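On the question's last point: base R already provides substr() (used in the first command) and nchar(), which returns string lengths without stringr. Below is a minimal sketch, assuming a small hypothetical stand-in for `Latest` that reuses the two column names from the question; it shows the base R alternative to str_length() and does not by itself fix the allocation error.

```r
# Hypothetical stand-in for the 'Latest' data frame (column names from the question)
Latest <- data.frame(
  SUB_BUSINESS_UNIT_CODE = c("AB123", "CD456", NA),
  POLICYHOLDER_COMPANY_NAME_LAST_NAME = c("Smith", "Acme Ltd", NA),
  stringsAsFactors = FALSE
)

# Base R substring: first two characters of the unit code (NA stays NA)
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)

# Base R string length instead of stringr::str_length().
# as.character() guards against factor columns, which nchar() rejects;
# with type = "chars", missing names give NA rather than a length of 2.
Latest$LENGTH <- nchar(
  as.character(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME),
  type = "chars"
)

Latest[, c("MFU", "LENGTH")]
```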

R system.time() causes my R session to hang

I'm running RStudio on a Mac, and my code slows down greatly when I place it between the brackets of a system.time() call.
However, system.time() appears to report the time the code takes to run without being wrapped in system.time(). Thus, although the command takes 2 or 3 minutes to run within system.time(), it reports only a few seconds of elapsed time.
I'm not sure how to diagnose this behavior further.
One possible cause might be that I'm working with very large data tables and running efficient data.table commands that would take a long time in base R. Would this interfere with system.time()?
Try running it in RGui. One issue with RStudio (I have seen the same problem with my own code) is that RStudio updates its Environment pane after each execution, and this may take some time. I usually run large batch jobs in RGui to avoid this issue.
Try it and report back.
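In the same spirit as running large batch jobs outside RStudio, here is a minimal sketch that runs a script in a separate, non-interactive R process via Rscript, so no IDE pane refreshes are involved, and times the whole run from the calling session. The helper `run_outside_ide()` and the throwaway script are hypothetical; the measured time also includes the child R process's start-up.

```r
# Hypothetical helper: run an R script in a fresh, non-interactive R process
# (outside RStudio) and return its printed output and wall-clock time (seconds).
run_outside_ide <- function(script) {
  rscript <- file.path(R.home("bin"), "Rscript")
  timing <- system.time(
    # `out` is assigned in this function's frame while system.time() times it
    out <- system2(rscript, args = shQuote(script), stdout = TRUE, stderr = TRUE)
  )
  list(output = out, elapsed = timing[["elapsed"]])
}

# Throwaway script standing in for a large batch job
script <- tempfile(fileext = ".R")
writeLines(c("x <- sum(as.numeric(1:1000))", "cat('done', x, '\\n')"), script)

res <- run_outside_ide(script)
res$output   # lines printed by the child process
res$elapsed  # seconds, including R start-up time
```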

read.csv crashes RStudio

Help me figure out what I am doing wrong!
I have about 20 .csv files (product feeds) online. I used to be able to fetch them all, but now R crashes if I fetch more than one or two. Each file is about 50K rows by 30 columns.
I guess it's a memory issue, but I've tried on a different computer with exactly the same result.
Could some formatting in the files make R use too much memory? Or what else could it be?
If I run one of these, everything is fine. Two sometimes work. With three, it almost certainly crashes:
a <- read.csv("URL1")
b <- read.csv("URL2")
c <- read.csv("URL3")
I have tried specifying all sorts of stuff like:
d <- read.csv("URL4",skipNul=TRUE,sep=",",stringsAsFactors=FALSE,header=TRUE)
I keep getting this message:
R session aborted.
R encountered a fatal error.
The session was terminated.
We have some commercial software where I can fetch the same files without issues, so the files should be fine.
And my script ran twice daily for several months without issues.
R version 3.6.1
Platform: x86_64-apple-darwin15.6.0 (64-bit)
I have had this issue as well, but with read_csv(). I haven't figured out the exact cause yet, but my best guess is that reading a file and assigning it to a variable at the same time is too much for the memory or CPU to handle.
Stemming from that guess, I tried this method and it has worked perfectly for me:
library(dplyr)
a <- read.csv("URL1") %>% as_tibble()
# you can use other data types instead of tibble. that is just my example.
The whole idea is to separate the reading step from the writing (assignment) step with a pipe. This makes sure that one finishes before the next can start.
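A minimal sketch for fetching many feeds one at a time, assuming they are readable with read.csv(); `read_feeds()` is a hypothetical helper, and the temporary files stand in for "URL1", "URL2", and so on (read.csv() accepts both URLs and file paths). It records ordinary read errors instead of stopping. Note that tryCatch() cannot intercept a fatal "R session aborted" crash; it only helps with regular R errors such as an unreachable URL.

```r
# Hypothetical helper: read several CSV sources (URLs or file paths) one at a
# time into a named list; an ordinary read error is reported and stored as NULL.
read_feeds <- function(sources) {
  feeds <- vector("list", length(sources))
  names(feeds) <- sources
  for (i in seq_along(sources)) {
    # `feeds[i] <- list(...)` keeps a NULL result in place;
    # `feeds[[i]] <- NULL` would delete the element instead
    feeds[i] <- list(tryCatch(
      read.csv(sources[[i]], stringsAsFactors = FALSE),
      error = function(e) {
        message("Could not read ", sources[[i]], ": ", conditionMessage(e))
        NULL
      }
    ))
  }
  feeds
}

# Two small temporary files standing in for "URL1" and "URL2"
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, price = c(9.5, 12, 7.25)), f1, row.names = FALSE)
write.csv(data.frame(id = 4:5, price = c(3, 4.5)), f2, row.names = FALSE)

feeds <- read_feeds(c(f1, f2))
sapply(feeds, nrow)
```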

Why is source speed different from RStudio console line code?

I have a script with self-written functions (no plots). When I copy and paste the script into the RStudio console, it takes ages to execute, but when I use source("Helperfunctions.R"), it takes no more than a second.
Question: Where does the difference in speed come from?
I am aware of two differences between running code via the source() function and entering code at the RStudio console:
From ?source:
Since expressions are not executed at the top level, auto-printing is not done.
The way I understand this: source() will not plot graphs (unless explicitly requested, e.g. with print(plot)), while code in the RStudio console always plots graphs. I'm sure this affects the speed of execution to some degree, but it seems irrelevant in my case, because there are barely any plot calls.
And:
(...) the complete file is parsed before any of it is run
I have been working with R for a while now, but I'm not sure whether this is relevant to the speed issue I'm having. Is it possible that parsing all the code "before any of it is run" speeds up the execution of my helper-functions script by a factor of a hundred?
Edit: I'm using R version 3.2.3.
The issue is not source() vs. console line code. Instead, it is an issue of how RStudio sends code from the source pane to the console.
When I copy the content of Helperfunctions.R and run it in RGui (instead of RStudio), the code is executed with nearly the same speed as when I use source("Helperfunctions.R") in RStudio.
Apparently, lines of code always (?) require more execution time in RStudio than in RGui. Even though you may not usually notice the difference when executing a couple of lines in the console, it seems to make a huge difference when, say, 3,000 lines of code are executed in the RStudio console at once.
My understanding is that when source("Helperfunctions.R") is used from the RStudio source pane, the code is not actually sent to the RStudio console (which would be slow) but is evaluated directly by R.
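To see the auto-printing difference quoted from ?source in isolation, here is a small sketch; the temporary file is a hypothetical stand-in for Helperfunctions.R. With the default arguments, a bare `x` in the sourced file prints nothing, whereas print.eval = TRUE restores console-like printing of visible values.

```r
# Write a tiny script to a temporary file (a stand-in for Helperfunctions.R)
f <- tempfile(fileext = ".R")
writeLines(c("x <- 42", "x"), f)

# Default source(): the bare `x` on line 2 is not auto-printed
quiet <- capture.output(source(f))

# print.eval = TRUE prints visible values, as the console would
loud <- capture.output(source(f, print.eval = TRUE))

quiet  # character(0)
loud   # "[1] 42"
```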

RStudio does not display any output in console after entering code

The problem is that when I run code, nothing is returned in the console: it does run the code, but it does not show any output.
For example, if I write
v <- c(1, 2, 3, 4, 5)
v
I would expect in return
[1] 1 2 3 4 5
But it's not working.
I have RStudio version 0.98.1079 and R version 3.1.1.
Possibility 1 (before the + sign was mentioned): I was wondering whether you had been doing a tutorial that demonstrated the sink function, and you hadn't yet reached the point where it was reversed.
> sink('out.txt') # diverts all output to a disk file
> v <- c(1,2)
> v # output went to file
> sink() # sets the output back to the console
> v
[1] 1 2
Another way would be to call closeAllConnections:
> sink('out.txt')
> v
> v
> closeAllConnections()
> v
[1] 1 2
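If you do not know how many sink() calls are still active, sink.number() tells you. A minimal sketch, where `reset_sinks()` is a hypothetical helper: it pops output diversions until only `keep` remain, restoring console output without closing other open connections (which closeAllConnections() would also do).

```r
# Hypothetical helper: remove output diversions until only `keep` remain.
# At an interactive console, reset_sinks() (keep = 0) sends all output
# back to the console.
reset_sinks <- function(keep = 0) {
  while (sink.number() > keep) sink()
  invisible(sink.number())
}

n_before <- sink.number()  # usually 0 at the console
sink(tempfile())           # divert output, as in the example above
print("this line goes to the file, not the console")
reset_sinks(keep = n_before)
print("back on the console")
```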
Possibility 2: The lack of response with a "+" showing at the RStudio console is a sign that the R parser "thinks" the entered text is not yet a complete R command. It may indicate that you haven't typed a closing bracket or parenthesis. If typing one or two of those doesn't help and you keep getting more +'s, you may succeed by pressing the Esc key. If it shows up immediately after a restart, you should check your code for correctness and make sure that the .RData file is deleted from your working directory. If you don't know what that means, you may need to search for the methods appropriate to your operating system. You could also have an error in the code of one of your .Rprofile files.
In any case, these two possibilities have nothing to do with RStudio per se and everything to do with the typical behavior of an R console session in pretty much any IDE.
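To check a piece of code for the unfinished-expression situation without pasting it into the console, parse() can be used. A minimal sketch, where `is_incomplete()` is a hypothetical helper; it assumes an English locale, since it matches the text of parse()'s "unexpected end of input" error, and it does not cover other syntax errors such as an unterminated string.

```r
# Hypothetical helper: TRUE if `code` is an unfinished R expression, i.e. the
# situation in which the console keeps prompting with "+".
# Relies on the English text of parse()'s error message.
is_incomplete <- function(code) {
  res <- tryCatch(parse(text = code), error = function(e) e)
  inherits(res, "error") &&
    grepl("unexpected end of input", conditionMessage(res), fixed = TRUE)
}

is_incomplete("v <- c(1, 2, 3")      # TRUE: missing ")"
is_incomplete("f <- function(x) {")  # TRUE: missing "}"
is_incomplete("v <- c(1, 2, 3)")     # FALSE: complete expression
```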
Do the lines still start with a "+"? It is also possible that you forgot to close the braces of a function. Try "}".
I had the same issue, and none of the tips mentioned here worked.
Session > Restart R did the trick for me, possibly suggesting that I had a similar problem to andrewH but was not patient enough to wait for R to behave again.
This is a very old question, but I just had the same problem with a different cause, so I thought I would describe it here in case it is useful to someone else. I was getting the regular command prompt, with nothing more, no matter what I typed at the command line. I tried multiple returns, Escape, sink, traceback, and closeAllConnections (which did give me a response, "error: unexpected ) in ()", but then went back to the command prompt and ignored a second traceback).
Anyway, after half an hour or so of pulling my hair out, up popped "View(Mid2)". Mid2 is a tibble with 8.5 million observations of 88 numeric variables. I must have clicked it in the Environment pane accidentally. I suppose it just took that long for the viewer to render it. I assume that all the other things I did then hit at once, because RStudio crashed immediately afterwards.
The interesting thing about this particular version of the problem is what didn't happen. The red stop sign in the upper right of the console window, which lights up when R is busy, didn't light. That is unfortunate, but understandable if the RStudio viewer is a different process. Also, when my computer is working hard on a really big computation or I/O task, the fan usually starts, but it didn't. I don't know why. I took its absence, incorrectly, to mean that no such computation was underway.
If the lines in the console start with "+":
Save your work, close RStudio (or whatever tool you are using), and start it again. That worked for me.
If you are using RStudio Cloud, refreshing or reopening won't work.
The only clue from the posts and answers above is that your console always starts with '+'.
In my case, I tried all possible closing braces,
and ")" worked for me when I typed it into the console and pressed Enter.
The sink() function did nothing in RStudio Cloud.
A simple mistake might also cause this problem:
a rather lengthy command left abandoned in the console blocks the appearance of the result line.
The console then only shows that line, and results from any code run from the source pane do not appear.
To solve this, switch to the console, remove any remaining command, and try again.
Experiencing an unresponsive console, as described here, was devastating for me. Although I tried every trick explained on this page, none of them worked for me. At last I clicked the "To console" option just below the Environment, History, Connections, and Tutorial tabs in RStudio, and that solved the puzzle for me.
The best solution I've found is closeAllConnections() and/or sink(), which almost always work.
But as a stopgap measure, View()'ing always works. It's a bit of a pain, but wrap whatever you wanted to print in View() and you can see it.
