How do you change memory available for data storage settings in R using OSX command line? - r

I'm very new to R and I'm having trouble changing the memory available for data storage setting "--max-ppsize". A line of my code failed with "Error: protect(): protection stack overflow", and from reading other posts, this indicates that I should raise the setting to the maximum allowed value (--max-ppsize=500000) from the command line. The error came up because I'm trying to run the Rtsne package on a very large data set. I don't fully understand how to use the command line for R in the OSX terminal. I can launch R in the terminal, but from there I'm not sure what to do. Any help would be very much appreciated.

So this ended up being a fairly simple fix with a fresh brain the next day. I didn't end up changing --max-ppsize. I read the comments on another post and saw that the function I was trying to run needed a matrix, so I used data.matrix to convert my data frame to a matrix. Hope this helps someone else out.
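A minimal sketch of the conversion described above, using a small made-up data frame (the names df and m are illustrative, not from the original post). For reference, the startup flag itself is passed when launching R from the terminal, as in R --max-ppsize=500000, although it was not needed here.

```r
# Hypothetical toy data standing in for the real (much larger) data set
df <- data.frame(x = c(1.5, 2.5, 3.5), y = c(10L, 20L, 30L))

# data.matrix() converts every column to numeric and returns a matrix
m <- data.matrix(df)

is.matrix(m)   # TRUE
dim(m)         # 3 rows, 2 columns

# The matrix, rather than the data frame, is then what gets passed on,
# e.g. Rtsne::Rtsne(m) on a data set large enough for the chosen perplexity
```

Note that data.matrix() turns factor columns into their integer codes, so it is worth checking the column types before converting.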

Related

I cannot obtain scores for a metaMDS object in RStudio (package: vegan)

I'm using vegan 2.6.4 in RStudio, and have had an unusual error message pop up when I run the following:
nmds11 = metaMDS(m_com11, distance = "bray")
data.scores11 = as.data.frame(scores(nmds11)$sites)
Error in UseMethod("scores") :
no applicable method for 'scores' applied to an object of class "c('metaMDS', 'monoMDS')"
I can safely say this has never happened to me, and I was using the exact same code on a different dataset 5 minutes ago with no issues. I have also previously run this same script on at least a dozen other matrices with no errors.
I have tried calling scores.metaMDS as suggested when looking up the scores function (to help specify what type of object I'm trying to get scores from), but that function apparently does not exist. I've also tried running some old scripts that always worked in the past, with the same unfortunate results.
Any idea what I can do to address this?
Try using vegan::scores(); it could be that some other package you have loaded also has a scores() generic that is overwriting vegan::scores(). You can also try the much more specific vegan:::scores.metaMDS() if the whole S3 system has gotten clobbered.
Beyond that, restart R (in RStudio, find the Restart R option in the menus) so you get a clean session and try running your code again.
I tried vegan:::scores.metaMDS() without restarting RStudio and it works! Thanks!
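The masking problem described in the answer can be sketched without vegan installed. The example below uses stats::sd as a stand-in for vegan::scores and a deliberately broken sd() in the workspace as a stand-in for the other package's function; both stand-ins are assumptions made for illustration.

```r
# Stand-in for a masking definition: a workspace-level sd() that hides stats::sd()
sd <- function(x) stop("masked version called")

# The unqualified call now hits the masking definition and fails
masked_result <- tryCatch(sd(c(1, 2, 3)), error = function(e) conditionMessage(e))

# The namespace-qualified call bypasses the mask, just as vegan::scores() would
explicit_result <- stats::sd(c(1, 2, 3))

# Remove the masking definition so later code sees stats::sd() again
rm(sd)
```

In the vegan case the equivalent call would be vegan::scores(nmds11), and conflicts() lists names that are defined in more than one place on the search path.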

Issue with applying str_length to a dataframe

I created a simple R Script that is run on a monthly basis by colleagues.
This script brings in a fairly chunky RDS file that has around 2.6M observations and 521 variables.
Against this file the following two commands are run:
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)
Latest$LENGTH <- str_length(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME)
This script has run perfectly for the last three years, but today, for some reason, it is failing for all three people tasked to run it, and it has fallen over for me too.
The error message received is
Error: cannot allocate vector of size 10.0 Mb
At first I assumed that their computers were running out of memory, or they were not using 64Bit R, or some other reason such as not restarting their computers, etc.
It turns out though that they have plenty of memory available, have restarted their computers, are using 64 Bit R in R Studio and all are using different versions of R Studio/R.
I tried running the process myself. My computer has 32GB of RAM and 768GB of hard drive space free, and I am getting the same error message.
So it must be a corrupt source file, I figured. I tried last month's file, which ran just fine last month for everyone, and got the same error.
Maybe just try stringr package instead then, move around the problem that way. Nope, no dice, exact same error message.
I have to admit I'm stumped. I have tried gc(), tried previous versions of the file, tried cutting the file in half and running it that way, it just flat out refuses to run.
Anyone know of an alternative to stringr/base R commands to get the length of a character string as a new variable and to get a substring as a new variable?
What about running rm(list=ls()) before the script, and memory.limit(size = 16265*4) (or another big number)? Note that memory.limit() is Windows-only and is no longer supported as of R 4.2.
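For the two derived columns themselves, base R's nchar() and substr() do the same job without stringr. The sketch below uses a tiny made-up data frame with the same column names as the question; it shows the calls only and makes no claim about why the allocation fails on the real file.

```r
# Hypothetical three-row stand-in for the real 2.6M-row data set
Latest <- data.frame(
  SUB_BUSINESS_UNIT_CODE = c("AB123", "CD456", "EF789"),
  POLICYHOLDER_COMPANY_NAME_LAST_NAME = c("Smith", "Acme Ltd", NA),
  stringsAsFactors = FALSE
)

# First two characters of the business unit code
Latest$MFU <- substr(Latest$SUB_BUSINESS_UNIT_CODE, 1, 2)

# String length in characters; for type = "chars", a missing string gives NA
Latest$LENGTH <- nchar(Latest$POLICYHOLDER_COMPANY_NAME_LAST_NAME, type = "chars")
```

If the columns turn out to be factors rather than character vectors, wrapping them in as.character() first avoids an error from nchar().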

How to open up a matrix that's running into an error

I am running into an error on a big job in R, which I am running as an R script. I keep getting the following error:
Error in chol.default(F.mat) :
the leading minor of order 1 is not positive definite.
I normally submit my job with qsub, but that only gives me the error output and I can't poke around. I then tried running my job locally, but my 4 GB MacBook was completely overwhelmed.
Now I am trying to run it inside a screen session with options(error=recover). I get the same error as above, but I don't know how to access the data frames. I get "recover called non-interactively; frames dumped, use debugger() to view", but then I am dropped back to my bash prompt and I don't know how to open up the data frame.
Any ideas?
This is a bit awkward since (1) it's more or less remote debugging and (2) I don't actually ever try to debug non-interactively myself, but: it seems that
options(error=function() dump.frames(to.file=TRUE)) might be worth trying?
After your frames dump to a file (last.dump.rda in the working directory, by default), you should be able to run load("last.dump.rda"); debugger(last.dump) to get back to the debugging environment.
Two caveats:
I haven't actually tested this, just read & interpreted ?dump.frames;
I strongly recommend that you test this with short test runs, either running your original code on a small subset of your data or setting up a mini-test script something like
options(error=function() dump.frames(to.file=TRUE))
Sys.sleep(60)
stop("testing error exit")
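The dump-and-reload round trip can also be tried inside a single session. The sketch below is an assumption-laden stand-in for a real batch failure: it triggers dump.frames() from a calling handler rather than from options(error=...), writes to a temporary directory, and uses the made-up names f, g and demo.dump.

```r
# Work in a temporary directory so the dump file does not clutter the project
old_wd <- setwd(tempdir())

# Hypothetical failing job: f() calls g(), which raises the error
g <- function(x) stop("testing error exit")
f <- function(x) g(x)

# Dump the call stack to demo.dump.rda at the moment the error is signalled
msg <- tryCatch(
  withCallingHandlers(
    f(1),
    error = function(e) dump.frames(dumpto = "demo.dump", to.file = TRUE)
  ),
  error = function(e) conditionMessage(e)
)

# Later (possibly in a new session): reload the dump
load("demo.dump.rda")
dump_class <- class(demo.dump)

# In an interactive session, debugger(demo.dump) would now let you browse the frames

setwd(old_wd)
```

For real batch runs, the examples in ?dump.frames suggest a setting along the lines of options(error = quote({dump.frames(to.file = TRUE); q(status = 1)})), so that R exits after dumping.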

R - "Browser()" on Error?

I have been using some R libraries to analyze some large data recently, and I find myself frustrated by waiting several hours for the beginning of an analysis, just to get to the end and receive some trivial error, like that I did not install a prerequisite library, or that one of my parameters was wrong. So, then I have to start all over, do the exact same analysis, generate the same variables that it had when it died, and wait a long time. Please note that these are not handled exceptions--they are fatal errors from R.
This is just a thought--and perhaps it is too good to be true, so please at least explain why it wouldn't work--but is there any way to cause R to execute "browser()" in the environment whenever it has a fatal error? For example, say it is executing a script, and encounters "require(notInstalledYet)". Instead of just dying, and losing all the variables in the memory, it would be great if it would give me a browser() at the place it died, so that I could at least save the variables, and at best, fix the problem (e.g. install the library) and try again.
You can change the error option to open a browser on error
options(error=browser)
the default is
options(error=NULL)
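A small sketch of switching the handler on for a debugging session and putting the previous setting back afterwards. The variable names are illustrative; options() returns the old values invisibly, which is what makes the restore step possible.

```r
# Switch to the browser handler and remember what was set before
previous <- options(error = browser)

# While this is in effect, an uncaught error in an interactive session
# opens a browser prompt instead of simply returning to the top level
handler_while_debugging <- getOption("error")

# Restore the earlier setting (NULL by default) when done
options(previous)
```

options(error = recover) is a closely related setting that lets you pick which frame of the call stack to inspect.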

Slow or stacking file.choose() in R

When I have more data loaded in R, I have difficulty choosing a new file via file.choose() and then loading it via read.csv(). I never get as far as read.csv(), because the file.choose function hangs and R "crashes", reporting something like "an unidentified error occurred and R must restart".
I'm using RStudio and running this on Windows 7. The hardware is up to date.
Could someone explain why this is happening and what the remedy would be? Are there other options for selecting a file? I know I can put the path directly into the read.csv command, but the file is different every time.
EDIT:
The error just happened again. I cannot reproduce it reliably; it only happens, with high likelihood, when the conditions for it are met.
The error reads: "R Session Aborted. R encountered a fatal error. The session was terminated." The window then offers "Start New Session".
EDIT 2:
I would like to rephrase my question. Is there another option, such as a command or package, for choosing a file, other than file.choose()?
The error cannot be reproduced, so I cannot expect anyone to comment on it reliably. But if this has happened to someone in the past and they solved it, I would like to hear about it. Thanks.
EDIT 3: Further to the error: I have just spotted a sentence in red in the Console: Error: Unable to provide connection with R
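One non-interactive way around a file dialog, sketched under the assumption that the file to load is always the most recently modified CSV in a known folder. The helper newest_csv and the demo folder are made up for illustration. Interactive alternatives that exist are utils::choose.files() on Windows and tcltk::tk_choose.files().

```r
# Hypothetical helper: return the most recently modified CSV in a directory
newest_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("no CSV files found in ", dir)
  files[which.max(as.numeric(file.mtime(files)))]
}

# Demo setup in a temporary folder with one older and one newer file
demo_dir <- file.path(tempdir(), "file_choose_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "old.csv")
new_file <- file.path(demo_dir, "new.csv")
write.csv(data.frame(a = 1), old_file, row.names = FALSE)
write.csv(data.frame(a = 2), new_file, row.names = FALSE)
Sys.setFileTime(old_file, Sys.time() - 3600)  # make old.csv an hour older

# Pick the newest file and read it, with no dialog involved
picked <- newest_csv(demo_dir)
dat <- read.csv(picked)
```

This only fits workflows where "the newest file in the folder" really is the one wanted; otherwise one of the interactive choosers above is the closer replacement.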
