How to open up a matrix that's running into an error - r

I am running into an error on a big job in R. I running it as an R script. I keep getting the error that Error in chol.default(F.mat) :
the leading minor of order 1 is not positive definite.
I normally run my job in a qsub but that only gives me an error output but I can't poke around. I then tried running my job locally but my 4gb Macbook was completely overwhelmed.
Now I am trying using screen name and running it on a screen with options(error=recover). Now I am running into the same error as above but I don't know how to access the data frames. I get recover called non-interactively; frames dumped, use debugger() to view but then I get put into my bash line and I don't know how to open up the data frame.
Any ideas?

This is a bit awkward since (1) it's more or less remote debugging and (2) I don't actually ever try to debug non-interactively myself, but: it seems that
options(error=function() dump.frames(to.file=TRUE)) might be worth trying?
After your frames dump to a file (last.dump.rda in the working directory,by default), you should be able to run load("last.dump.rda"); debugger(last.dump) to get back to the debugging environment.
Two caveats:
I haven't actually tested this, just read & interpreted ?dump.frames;
I strongly recommend that you test this with short test runs, either running your original code on a small subset of your data or setting a mini-test script something like
options(error=function() dump.frames(to.file=TRUE))
Sys.sleep(60)
stop("testing error exit")

Related

Issue with applying str_length to a dataframe

I created a simple R Script that is run on a monthly basis by colleagues.
This script brings in a fairly chunky RDS file that has around 2.6M observations and 521 variables.
Against this file the following two commands are run:
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)
Latest$LENGTH <- str_length(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME)
This script has run perfectly for the last three years, but today, for some reason, it is now failing for all three people tasked to run it and has indeed fallen over for myself too.
The error message received is
Error: cannot allocate vector of size 10.0 Mb
At first I assumed that their computers were running out of memory, or they were not using 64Bit R, or some other reason such as not restarting their computers, etc.
It turns out though that they have plenty of memory available, have restarted their computers, are using 64 Bit R in R Studio and all are using different versions of R Studio/R.
I tried running the process myself, my computer has 32GB of Ram and 768GB of Hard Drive space free. I am getting the same error message.......
So, must be a corrupt source file I figure. Try last months file which all ran just fine last month for everyone and same error.
Maybe just try stringr package instead then, move around the problem that way. Nope, no dice, exact same error message.
I have to admit I'm stumped. I have tried gc(), tried previous versions of the file, tried cutting the file in half and running it that way, it just flat out refuses to run.
Anyone know of an alternative to stringr/base R commands to get the length of a character string as a new variable and to get a substring as a new variable?
What about rm(list=ls()) before running, and memory.limit(size = 16265*4) (or another big number) ?

How do you change memory available for data storage settings in R using OSX command line?

I'm very new to R and I'm having trouble changing the memory available for data storage setting "--max-ppsize". From reading other posts, the error that resulted from running a line of my code (Error: protect(): protection stack overflow) indicates that I should change this to the maximum allowed value (--max-ppsize = 500000) using command line. This threw an error because I'm trying to run the Rtsne package on a data set that is very large. I don't fully understand how to run command line for R in OSX terminal. I can launch R in terminal, but from there I'm not exactly sure what to do. Any help would be very much appreciated.
So this ended up being a fairly simple fix with a fresh brain the next day. I didn't end up changing the --max-ppsize. I read another post's comments section and saw that I needed to use a data matrix (used data.matrix to change my data frame to a matrix) for the function I was trying to run. Hope this helps someone else out.

Extract what was printed deep in the R console

When I execute commands in R, the output is printed in the console. After some threshold (I guess, some maximum number of lines), the R console no longer shows the first commands and their output. I cannot scroll up that far because it is simply no longer there.
How can I access this "early" output if it has disappeared from the console?
I care mostly about error messages and messages generated by my own script. I do use a script file and save my results to a file, if anyone wonders, but this still does not help solve my problem.
(I have tried saving the R workspace and R history and then loading it again, but did not know what to do next and was not able to find what I needed...)

R - "Browser()" on Error?

I have been using some R libraries to analyze some large data recently, and I find myself frustrated by waiting several hours for the beginning of an analysis, just to get to the end and receive some trivial error, like that I did not install a prerequisite library, or that one of my parameters was wrong. So, then I have to start all over, do the exact same analysis, generate the same variables that it had when it died, and wait a long time. Please note that these are not handled exceptions--they are fatal errors from R.
This is just a thought--and perhaps it is too good to be true, so please at least explain why it wouldn't work--but is there any way to cause R to execute "browser()" in the environment whenever it has a fatal error? For example, say it is executing a script, and encounters "require(notInstalledYet)". Instead of just dying, and losing all the variables in the memory, it would be great if it would give me a browser() at the place it died, so that I could at least save the variables, and at best, fix the problem (e.g. install the library) and try again.
You can change the error option to open a browser on error
options(error=browser)
the default is
options(error=NULL)

R console unexpectedly slow, long behind job (PDF output) is finished

When I run a large R scripts (works nicely as expected, basically produces a correct PDF at the end of the script (base plotting plus beeswarm, last line of script is dev.off()), I notice that the PDF is finished after ~3 seconds and can even be opened in other applications, long before the console output (merely few integer values and echo of code ~400 lines) is finished (~20 seconds). There are no errors reported. In between, the echo stops and does nothing for seconds.
I work with R Studio V0.97.551, R version 3.0.1, on Win-7.
gc() or close and restart R did not help, and the data structures used are not big anyway (5 dataframes with up to 60 obs and 64 numeric or short character variables). The available memory should be sufficient (according to task manager, around 4 GB throughout), but CPU is busy during that time.
I agree this is not reproducible for other people w/o the script, which is however too large to post, but maybe someone has experienced the same problem or even an explanation or suggestion what to check? Thanks in advance!
EDIT:
I run exactly the same code directly in R 3.0.1 (w/o RStudio), and the problem was gone, suggesting the problem is related to RStudio. I added the tag RStudio, but I am not sure if I am now supposed to move this question somewhere else?
Recently I came across similar problem--running from RStudio becomes very slow, even when it is executing something as simple as example('plot'). After searching around, this post pointed me to the right place that eventually led to a workaround: resetting RStudio by renaming the "RStudio-Desktop Directory". The exact way to do so depends upon the OS you are using, and you could find the detail instruction here. I just tried it, and it works.

Resources