glm running out of memory in 64-bit R?

I am trying to run glm on a dataset with 255,001 data points, but it fails with
Error: cannot allocate vector of size 10.0 Gb
This is very strange because when I start up R, I see the message
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
This seems to indicate that I'm running a 64-bit version of R, and I read that the memory limit for 64-bit versions of R on Unix is on the order of 128 Tb.
Furthermore, I have successfully run glm logistic regression on very similar datasets that are twice as large without any problem.
How can I reconcile these facts, and how can I get R to hold large objects in memory?

It turns out there was a bug in my code: when reading in the data, I set header=FALSE instead of header=TRUE. The header row was therefore read as a data row, which coerced the numeric columns to factors, and glm then presumably tried to build an enormous model matrix from them. Changing this fixed the problem.
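For anyone who lands here with the same symptom, a quick sanity check right after reading the file (the file name and the response column y below are made up) is to look at the column types and the width of the design matrix:

dat <- read.table("mydata.csv", sep = ",", header = TRUE)  # header = FALSE was the bug here
str(dat)                       # numeric columns showing up as character/factor is the red flag
dim(model.matrix(y ~ ., dat))  # number of columns in the design matrix glm() would build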

Related

How can I determine and increase the memory allocated to R on a Mac

Variants of this question have been asked before (e.g., here, here, here, and here), but none of the suggested solutions works for me.
R returns an error message ("Error: vector memory exhausted (limit reached?)"), even though there is available memory on my computer (a 2019 MacBook Pro with 16 GB memory), as indicated by the Memory Pressure monitor in the Memory tab of the Activity Monitor.
I have tried setting the memory limit both from the command line (by editing .Renviron via open .Renviron) and from RStudio (using usethis::edit_r_environ()), as suggested here. Neither solution works.
Has anybody found other solutions to this problem? Also, is there a way to determine, in RStudio, the maximum memory allocated? Sys.getenv() does not return this information.
I do not encounter this problem in base R, only in RStudio.
Session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
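For the "how do I see the current limit" part, a minimal check on R 4.0.x, assuming the cap comes from the R_MAX_VSIZE environment variable that the macOS build consults (the 16Gb value is just an example for a 16 GB machine):

mem.maxVSize()             # R >= 4.0.0: report the current vector-heap limit
Sys.getenv("R_MAX_VSIZE")  # "" means no explicit cap has been set in .Renviron or the shell

# raise the cap for future sessions by appending to ~/.Renviron, then restart R/RStudio
cat("R_MAX_VSIZE=16Gb\n", file = "~/.Renviron", append = TRUE)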

cannot allocate vector of size xxx even after upgrading server instance

I could use some troubleshooting advice. I'm trying to use the block() function from the {blockTools} package on a data frame with only 45k observations and 14 variables (4 of which I'm trying to block on). I got an error that R could not allocate a vector of size X, so I upgraded the AWS instance, which doubled the memory. I restarted the instance and tried running it again.
I'm still getting the error and can't figure out why, seeing as I doubled the memory. Does R on Linux require me to specify how much memory should be available? Any other troubleshooting tips?
FWIW, I'm running rm(list=ls()), loading only the dataframe I need, and throwing in gc() for good measure.
What else can I try?
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
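A few quick checks that may narrow this down (df stands in for whatever the data frame is called, and free -h assumes a standard Ubuntu install):

system("free -h")                     # confirm the resized instance's RAM is actually visible
print(object.size(df), units = "Mb")  # how big the input data really is
gc()                                  # the "max used" column shows R's peak memory so far

As far as I know, R on Linux does not impose a configurable memory cap of its own (memory.limit() is a Windows-only mechanism), so if the allocation still fails after doubling the RAM, the working set that block() builds is most likely just larger than the instance.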

R vector memory exhausted

I am currently using RStudio on my MacBook Pro.
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
When using the agnes() function from the cluster package, I received the error message:
Error: vector memory exhausted (limit reached?)
To solve this, I followed the steps mentioned in the answer to the following question: R on MacOS Error: vector memory exhausted (limit reached?)
Now, running the same function, I get an "R session aborted" message: R encountered a fatal error. The session was terminated.
Any other solutions?
AGNES needs at least two copies of a distance matrix.
Now, if you have 100,000 instances at double precision (8 bytes each), two full copies of the distance matrix mean memory usage on the order of 2 × 100,000 × 100,000 × 8 = 160,000,000,000 bytes. That is 160 GB.
That does not include the input data or any overhead. If you are lucky, the R version of AGNES only stores the upper triangular matrix, which would reduce this by a factor of 2. But OTOH, if it did, it would likely run into an integer overflow at around 64k objects.
So you probably need to choose a different algorithm than AGNES, or reduce your data first.
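A rough sketch of that arithmetic, plus one way to "reduce your data first" by clustering a random subsample (x stands in for the numeric data matrix):

n <- 100000
n * n * 8 / 1e9              # one full double-precision n x n matrix: ~80 GB
n * (n - 1) / 2 * 8 / 1e9    # lower triangle only, as dist() stores it: ~40 GB

# cluster a subsample instead of all 100k rows
library(cluster)
set.seed(42)
idx <- sample(nrow(x), 5000)
ag  <- agnes(x[idx, ], method = "average")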

Why is R reported to use much more memory by Windows than by itself?

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
On a 32 GB system, I got this error when creating a distance matrix:
df <- remove_duplicates_quanteda(dfm, df)
Error: cannot allocate vector of size 1.3 Gb
Looking inside my environment, there is little reason for concern:
print(object.size(x = lapply(ls(), get)), units = "Mb")
96.5 Mb
However, Windows reports a much larger figure for the R process.
What is the reason for this difference? Is there a way to find out?
Hadley puts it pretty simply in Advanced R:
This number won't agree with the amount of memory reported by your operating system for a number of reasons:
It only includes objects created by R, not the R interpreter itself.
Both R and the operating system are lazy: they won't reclaim memory until it's actually needed. R might be holding on to memory because the OS hasn't yet asked for it back.
R counts the memory occupied by objects but there may be gaps due to deleted objects. This problem is known as memory fragmentation.
For more information, see the section about memory in Advanced R.
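To see the gap the quote describes, it can help to compare R's own accounting with a heap-level view, reusing the object.size() call from the question (lobstr is a CRAN package and is assumed to be installed):

print(object.size(x = lapply(ls(), get)), units = "Mb")  # just the objects in the global environment
gc()                  # "used"/"max used" cover R's whole heap, not only your objects
lobstr::mem_used()    # total memory tracked by R; closer to, but still below, what Windows reports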

What causes the error when using ggplot plot on different OSes?

I'm getting a strange error when loading .Rdata produced on one OS and transferred to another. On a Windows machine I generate a large number of plots and store them in a .Rdata file. I then transfer them to a linux server running CentOS 5 and access them by loading the file and recalling the plot.
When I run the following lines on CentOS I get an error:
library(ggplot2)
load('mydata.Rdata')
p
Error in UseMethod("facet_train_layout") :
no applicable method for 'facet_train_layout' applied to an object of class "c('proto', 'environment')"
The Windows 7 OS that was used to produce this .Rdata file is using the following version of R:
R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)
The CentOS system that produces an error is as follows:
R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i686-pc-linux-gnu (32-bit)
The ggplot2 version on both systems is ggplot2_0.9.2.1. This process has worked fine for the last six months; the only problem came today, after an update, and I don't understand what is going wrong.
Both versions of R were upgraded to 2.14.2, and the ggplot2 package to 0.9.2.1.
I presume the old version was 0.8.9 or below. There was a fundamental change with version 0.9.0:
FACETS
Converted from proto to S3 objects, and class methods (somewhat) documented in facet.r. This should make it easier to develop new types of facetting specifications.
See http://cran.r-project.org/web/packages/ggplot2/NEWS for more information
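If regenerating the plots is an option, a more version-proof workflow is to ship the data and rebuild the plot on the target machine rather than saving the ggplot object itself (object and column names below are made up):

# on the Windows machine: save only the data
save(plot_df, file = "mydata.Rdata")

# on the CentOS machine: rebuild the plot with whatever ggplot2 version is installed there
library(ggplot2)
load("mydata.Rdata")
p <- ggplot(plot_df, aes(x = x, y = y)) + geom_point() + facet_wrap(~ group)
print(p)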
