I am doing logistic regression on a data set with dimensions 190000 x 53 (91 MB), a mix of categorical and numeric data,
but I am facing serious memory issues: R hangs every time I run the logistic regression code.
I have tried reducing the sample to only 10% of the data, but R got stuck again:
Error: cannot allocate vector of size 12.9 Gb
I have tried to increase the memory using memory.limit() and memory.size(), but it has not gotten me anywhere:
> memory.size()
[1] 152.98
> memory.limit()
[1] 1.759219e+13
I have 8 GB of physical RAM, but if I try to raise the limit to 16 GB using memory.limit() I get this warning:
> memory.limit(size=16000)
[1] 1.759219e+13
Warning message:
In memory.limit(size = 16000) : cannot decrease memory limit: ignored
Can anyone let me know how to solve this problem? I am using R 4.0.2 on Windows 10 (64-bit).
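One approach that is often suggested for this situation is to avoid building the full model matrix in one go and instead fit the model in chunks with bigglm() from the biglm package. A minimal sketch, assuming the data are already loaded into a data frame called dat with a 0/1 response y; the formula and chunk size below are placeholders, not anything from the question:

library(biglm)

# Placeholder formula; substitute the real response and the 53 columns.
f <- y ~ x1 + x2 + x3

# bigglm() updates the fit `chunksize` rows at a time, so only that slice
# of the expanded model matrix has to be in memory during each update.
fit <- bigglm(f, data = dat, family = binomial(), chunksize = 10000)
summary(fit)

If even the raw data frame does not fit comfortably, biglm can also feed bigglm() from a chunk-reading function or a database connection rather than an in-memory data frame.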
I have a computer with 8 GB of RAM, and R is giving me the error below:
cannot allocate a vector of size 23.7 GB.
I have tried memory.limit(size=), but it didn't work.
Is there any way to work it out? Thanks
I am trying to run a binary logistic regression in R on a very large set of data, and I keep running into memory problems. I have tried many different packages to circumvent the issue, but am still stuck; I thought packages such as caret and biglm would help, but they gave me the same memory error. Why is it that when I start with a dataset of 300,000 rows and 300 columns and subset it to 50,000 rows and 120 columns, it still requires the same amount of memory? It makes no sense. I have no way of replicating the data since it is sensitive information, but most of the variables are factors. Below are some examples of what I have tried:
model = bigglm(f, data = reg, na.action = na.pass, family = binomial(link=logit), chunksize = 5000)
But I get:
Error: cannot allocate vector of size 128.7 Gb
MyControl <- trainControl(method = "repeatedcv", index = MyFolds,
                          summaryFunction = twoClassSummary, classProbs = TRUE)
fit = train(f, data = reg, family = binomial, trControl = MyControl)
The error message "Error: cannot allocate vector of size 128.7 Gb" doesn't mean that R could not allocate a total of 128.7 GB of memory.
Quoting Patrick Burns:
"It is because R has already allocated a lot of memory successfully.
The error message is about how much memory R was going after at the
point where it failed".
So it is your interpretation of the error that is wrong. Even though the sizes of the two problems might be very different, both are probably simply too big for your computer, and the amount of memory displayed in the error message says nothing about the overall size of your problem.
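To see why even a 50,000 x 120 subset can still be enormous by the time it reaches glm(), bigglm() or caret, it helps to look at the model matrix rather than the data frame: every factor is expanded into one dummy column per level (minus one). A small, entirely made-up illustration:

# Toy data: two factor predictors, one with many levels.
set.seed(1)
toy <- data.frame(y  = rbinom(1000, 1, 0.5),
                  f1 = factor(sample(letters, 1000, replace = TRUE)),
                  f2 = factor(sample(1:500, 1000, replace = TRUE)))

mm <- model.matrix(y ~ ., data = toy)
dim(mm)                               # hundreds of columns from just two factor predictors
print(object.size(mm), units = "MB")

Scaled up to 50,000 rows and 120 factor-heavy columns, the dense model matrix alone can run to gigabytes before the fitting routine allocates its own working copies, which is consistent with the allocation failures above.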
I am trying to convert a data.frame with numeric, nominal, and NA values into a dissimilarity matrix using the daisy function from the cluster package in R. My goal is to create the dissimilarity matrix before applying k-means clustering for customer segmentation. The data.frame has 133,153 rows and 36 columns. Here's my machine:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
How can I fix the daisy error?
Since the Windows computer has only 3 GB of RAM, I increased the virtual memory to 100 GB, hoping that would be enough to create the matrix, but it didn't work: I still got a couple of memory errors. I've looked into other R packages for getting around the memory problem, but they don't help. I cannot use bigmemory with the biganalytics package because it only accepts numeric matrices. The clara and ff packages also accept only numeric matrices.
The cluster package on CRAN suggests the Gower similarity coefficient as a distance measure before applying k-means, and the Gower coefficient handles numeric, nominal, and NA values.
# Read the raw customer data (the file has no header row)
Store1 <- read.csv("/Users/scdavis6/Documents/Work/Client1.csv", header = FALSE)
df <- as.data.frame(Store1)
save(df, file = "df.Rda")

library(cluster)
# Gower dissimilarity; columns 1:35 are ratio-scaled values treated as ordinal
daisy1 <- daisy(df, metric = "gower", type = list(ordratio = c(1:35)))
# Error in daisy(df, metric = "gower", type = list(ordratio = c(1:35))) :
#   long vectors (argument 11) are not supported in .C
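For scale, a quick back-of-the-envelope calculation shows what this call is being asked to produce:

n <- 133153
entries <- n * (n - 1) / 2   # lower triangle of the dissimilarity matrix
entries                      # ~8.86e9 values, above the ~2.1e9 (2^31 - 1)
                             # limit the old .C interface can pass, hence
                             # the "long vectors ... not supported" error
entries * 8 / 1024^3         # ~66 GB just to store them as doubles

So the error is not a tuning problem: the full 133,153-row dissimilarity object is both too long for the .C call inside daisy() and far larger than the 3 GB machine's RAM.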
**EDIT:** I have RStudio linked to Amazon Web Services' (AWS) r3.8xlarge instance with 244 GB of memory and 32 vCPUs. I tried creating the daisy matrix on my own computer first, but did not have enough RAM.
**EDIT 2:** I used the clara function for clustering the dataset:
# clara clusters via repeated subsampling (50 samples) instead of building
# one full dissimilarity matrix
clara2 <- clara(df, 3, metric = "euclidean", stand = FALSE, samples = 50,
                rngR = FALSE, pamLike = TRUE)
If you have a lot of data, use algorithms that do not require O(n^2) memory. Swapping to disk will kill performance; it is not a sensible option.
Instead, either reduce your data set size or use index acceleration to avoid the O(n^2) memory cost. (And it's not only O(n^2) memory but also O(n^2) distance computations, which will take a long time!)
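If keeping the Gower metric matters, one way to follow the "reduce your data set size" route is to cluster a random subsample with pam() and keep its medoids, which is roughly what clara() does internally for Euclidean distances. A sketch only, with an arbitrary sample size and k, assuming df is the data frame from the question above:

library(cluster)

set.seed(42)
idx   <- sample(nrow(df), 5000)              # manageable subsample
d_sub <- daisy(df[idx, ], metric = "gower")  # ~100 MB instead of ~66 GB
fit   <- pam(d_sub, k = 3)                   # k-medoids on the subsample

head(fit$clustering)      # cluster labels for the sampled rows
df[idx[fit$id.med], ]     # the medoid rows, usable as segment profiles

The remaining rows can then be assigned to whichever medoid they are closest to under the same Gower distance.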
I am running a zero-inflated regression model on a data frame of 498,501 rows, on a 32 GB Linux machine with a 2.6 GHz CPU. I used the following command:
library(pscl)
zeroinfl(response ~ predictor1 + predictor2, data=dataframe, dist = "negbin", EM = TRUE)
R has now been computing for more than 48 hours with no warning or error message.
I am a bit puzzled, since a traditional lm() on the same data returns almost instantaneously. Should I suspect some problem with my command, or is it just a very slow function?
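It may simply be a very slow combination (a zero-inflated negative binomial with EM = TRUE on half a million rows is far more work than lm()), but a quick way to tell is to time the same call on increasing random subsamples and watch how the runtime grows. A sketch using the object names from the question:

library(pscl)

set.seed(1)
for (n in c(5000, 20000, 50000)) {
  sub <- dataframe[sample(nrow(dataframe), n), ]
  cat("n =", n, "\n")
  print(system.time(
    zeroinfl(response ~ predictor1 + predictor2, data = sub,
             dist = "negbin", EM = TRUE)
  ))
}

If the timings already climb steeply, it is also worth checking whether EM = TRUE is really needed, since having zeroinfl() estimate its starting values by EM can be considerably slower than the default optimisation.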
I'm relatively new to R and am currently trying to run a SIMPROF analysis (clustsig package) on a small dataset of 1000 observations and 24 variables. After ~30 iterations I receive the following error:
Error: cannot allocate vector of size 1.3 Mb.
In addition: There were 39 warnings (use warnings() to see them)
All the additional warnings relate to R reaching its total allocation of 8183 MB.
The method I'm using to run the analysis is below:
library(clustsig)
Data <- read.csv(file.choose(), header = TRUE, colClasses = "numeric")
Matrix <- function(Data) vegan::vegdist(Data, method = "gower")
SimprofOutput <- simprof(Data, num.expected = 1000, num.simulated = 999,
                         method.cluster = "average", method.distance = Matrix,
                         alpha = 0.10, silent = FALSE, increment = 100)
I'm wondering whether anybody else has had trouble running a SIMPROF analysis, or has any ideas on how to stop R from running out of RAM. I'm running 64-bit Windows 7 Enterprise and R 2.15.1 on a machine with 8 GB of RAM.
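One thing that may be worth ruling out, given that the warnings mention hitting the full 8183 MB allocation, is the number of permutations: num.expected = 1000 and num.simulated = 999 mean on the order of two thousand permuted data sets (each with its own Gower distance computation) for every node that gets tested. A hedged sketch of a much smaller trial run, with arbitrary reduced counts that trade away precision in the SIMPROF significance tests:

library(clustsig)

gc()   # clear out whatever earlier attempts left behind

TrialRun <- simprof(Data, num.expected = 100, num.simulated = 99,
                    method.cluster = "average", method.distance = Matrix,
                    alpha = 0.10, silent = FALSE, increment = 100)

If the trial run completes with memory to spare, the permutation counts can be stepped back up toward the original values until the 8 GB machine becomes the limit again.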