R: Cannot allocate memory though memory seems to be available

After running several models I need to run a system() command in my R script to shut down my EC2 instance, but when I get to that point I get:
cannot popen 'ls', probable reason 'Cannot allocate memory'
Note: for this question I even tried a plain ls, which did not work either.
The flow of my script is the following:
1. Load a model (about 2 GB)
2. Mine documents and write the results to a MySQL database
3. Repeat steps 1-2 around 20 times with different models, averaging 2 GB each
4. Terminate the instance
At this point I need to call system("sudo shutdown -h now"). With the plain call nothing happens, and with system("sudo shutdown -h now", intern=TRUE) I get the allocation error.
I tried rm() for all my objects just before calling the shutdown, but the same error persists.
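One thing that may be worth trying right before the shutdown call (a sketch; the key point is that rm() only removes references, while a subsequent gc() actually releases the memory, and system() has to fork() the whole R process):

```r
## Drop all objects (including hidden ones) and force a garbage
## collection so the fork() behind system() has less to duplicate:
rm(list = ls(all.names = TRUE))
gc()
## then the actual call (commented out here for safety):
# system("sudo shutdown -h now")
```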
Here is some data on my system, which is a large EC2 Ubuntu instance:
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] RTextTools_1.3.9 tau_0.0-15 glmnet_1.8 Matrix_1.0-6
[5] lattice_0.20-10 maxent_1.3.2 Rcpp_0.9.13 caTools_1.13
[9] bitops_1.0-4.1 ipred_0.8-13 prodlim_1.3.2 KernSmooth_2.23-8
[13] survival_2.36-14 mlbench_2.1-1 MASS_7.3-21 rpart_3.1-54
[17] e1071_1.6-1 class_7.3-4 tm_0.5-7.3 nnet_7.3-4
[21] tree_1.0-31 randomForest_4.6-6 SparseM_0.96 RMySQL_0.9-3
[25] ggplot2_0.9.1 DBI_0.2-5
loaded via a namespace (and not attached):
[1] colorspace_1.1-2 dichromat_1.2-4 digest_0.5.2 grid_2.15.1
[5] labeling_0.2 memoise_0.1 munsell_0.3 plyr_1.7.1
[9] proto_0.3-9.2 RColorBrewer_1.0-5 reshape2_1.2.1 scales_0.2.1
[13] slam_0.1-25 stringr_0.6.1
gc() returns
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1143171 61.1 5234604 279.6 5268036 281.4
Vcells 1055057 8.1 465891772 3554.5 767962930 5859.1
I noticed that if I run just 1 model instead of the 20 it works fine, so it might be that memory is not getting freed after each run, even though I rm() the used objects.
I also noticed that if I close R, restart it and then call system(), it works. If there is a way to restart R from within R, maybe I can add that to my script.sh flow.
What would be the appropriate way to clean up all of my objects and free the memory on each loop, so that there is no memory issue when I need to call the system() commands?
Any tip in the right direction will be much appreciated!
Thanks
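Another pattern that sidesteps the problem entirely (a sketch only; mine_documents.R and the model file names are hypothetical): run each model in its own short-lived Rscript child process, so all of its memory is returned to the OS when the child exits and the master session stays small.

```r
## Hypothetical driver: one fresh R process per model.
models <- sprintf("model_%02d.RData", 1:20)
for (m in models) {
  status <- system2("Rscript", c("mine_documents.R", m))
  if (status != 0) warning("run failed for ", m)
}
## The master session never held a model, so fork() is cheap now:
# system("sudo shutdown -h now")
```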

I'm just posting this because it's too long to fit in the comments. Since you haven't included any code, it's pretty hard to give advice. But, here is some code that maybe you can think about.
wd <- getwd()
assign(".First", function() {
  require("plyr")        # and whatever other packages you're using
  file.remove(".RData")  # it has already been loaded
  rm(".Last", pos = .GlobalEnv)  # otherwise you won't be able to quit R without it restarting
  setwd(wd)
}, pos = .GlobalEnv)
assign(".Last", function() {
  system("R --no-site-file --no-init-file --quiet")
}, pos = .GlobalEnv)
save.image()  # or only save the things you want to be reloaded
q("no")
The idea is that you save the things you need in a file called .RData. You create a .Last function that will be run when you quit R; it starts a new session of R. And you create a .First function that will be run as soon as R restarts; it loads the packages you need and cleans up.
Now you can quit R and it will restart, loading the things you need.
(q("no") means don't save, but you have already saved everything you need in .RData, which will be loaded when R restarts.)

Related

How can I change the temp folder where sqlite creates etilqs files in R on Ubuntu Linux?

I'm running sqldf in R on Ubuntu to select certain IDs from a big table with gigabytes of data, and according to inotifywait monitoring file changes, the process is creating temporary etilqs files under /var/tmp. However, my /var/tmp is on a small disk, and this occasionally causes R to error out. I found a thread on how to change the temp folder location for sqlite on Windows, but I could not figure out how to make it work under Linux.
library(sqldf)
customer_extr <- sqldf("select b.*, a.year, a.name from product as b left join customer as a on a.ID = b.ID", dbname = "/home/userName/customer.db")
It seems to me that sqlite searches for a temporary file storage location (NOT R's tempfile(), whose location I can already choose with tmpdir=) in the following order:
1. The directory set by PRAGMA temp_store_directory or by the sqlite3_temp_directory global variable
2. The SQLITE_TMPDIR environment variable
3. The TMPDIR environment variable
4. /var/tmp
5. /usr/tmp
6. /tmp
7. The current working directory (".")
I tried a few options but none of them seemed to work:
set temp_store_directory:
con <- dbConnect(dbDriver("SQLite"), dbname = "/home/userName/customer.db")
dbGetQuery(con, "PRAGMA temp_store_directory = '/mnt/tmp'")
But this errors out:
Error in rsqlite_send_query(conn@ptr, statement) : basic_string::resize
Checking afterwards shows that temp_store_directory is still not set:
Sys.getenv('temp_store_directory')
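As an aside, a sketch that may be worth trying (assuming a DBI version recent enough to provide dbExecute()): PRAGMA assignments return no result set, so sending them through dbExecute() rather than dbGetQuery() avoids the query-shaped code path that seems to be failing here.

```r
## Send PRAGMA statements with dbExecute(); read settings back with
## dbGetQuery(). Guarded so it only runs if RSQLite is installed.
if (requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbExecute(con, "PRAGMA temp_store = 2")      # temp objects in memory
  print(DBI::dbGetQuery(con, "PRAGMA temp_store"))  # read the value back
  DBI::dbDisconnect(con)
}
```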
Before starting R, I set the environment variables to point at the desired temp folder, /mnt/tmp:
export SQLITE_TMPDIR=/mnt/tmp
export TMPDIR=/mnt/tmp
I verified that these were set successfully with
echo $SQLITE_TMPDIR
echo $TMPDIR
under Linux, and
Sys.getenv('SQLITE_TMPDIR')
Sys.getenv('TMPDIR')
in R.
However, my sqldf step still writes etilqs files to /var/tmp.
I tried to run
dbGetQuery(con, "PRAGMA temp_store = 2")
to instruct sqlite to save temporary files in memory. However, it's still writing etilqs files to /var/tmp.
I thought about creating a symbolic link for /var/tmp to point to /mnt/tmp, but to do that I think I would have to delete the folder /var/tmp first. This is not ideal since it's a shared Linux server, and the disk for /mnt/tmp sometimes gets unmounted. I am not sure whether this would cause any trouble for other applications and users.
I don't know how to check/change the sqlite3_temp_directory global variable in R.
This is my session info:
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sqldf_0.4-10 RSQLite_1.1 gsubfn_0.6-6 proto_1.0.0
loaded via a namespace (and not attached):
[1] DBI_0.5-1 memoise_1.0.0 Rcpp_0.12.8 digest_0.6.10 chron_2.3-47
I can try upgrading my OS disk to a larger drive but isn't there a way to tell sqlite in R under Linux to write temporary files somewhere else? Any suggestions would be highly appreciated!
You can get R to use a different temporary directory; it respects several environment variable settings:
edd@max:~$ Rscript -e 'print(tempdir())' # default
[1] "/tmp/RtmpUdPCFL"
edd@max:~$ TMPDIR="." Rscript -e 'print(tempdir())' # overridden
[1] "./RtmpsJk2lP"
edd@max:~$
We would have to look at the sources of the RSQLite and/or sqldf packages to see whether they use their own settings or take the value from R. If it is the latter, as I suspect is the case at least for sqldf, then you have a way in.
But do remember to set TMPDIR (or the like) before you start R.
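One detail that is easy to verify and easy to trip over: R fixes tempdir() once at startup, so exporting TMPDIR after the session has started changes the environment variable but not the session's temp directory.

```r
## tempdir() is chosen when R starts; changing TMPDIR mid-session is
## too late for this session ("/mnt/tmp" is just a hypothetical path):
old <- tempdir()
Sys.setenv(TMPDIR = "/mnt/tmp")
identical(tempdir(), old)   # still TRUE
```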

Rscript not finding library

I have a problem when running R scripts on a Unix cluster as a batch job: when the script tries to load libraries, R cannot find them. I'll give you an example, using a basic R script named sess.R:
print(.libPaths())
library("gtools")
print(sessionInfo())
If I just run this script from the command line using the command:
$ Rscript sess.R
I get the following output:
[1] "/usr/lib64/R/library" "/usr/share/R/library"
R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] gtools_3.5.0
The library "gtools" is loaded correctly; the script works. However, if I write a simple batch job (I will include a couple of optional parameters, including an error file and an output file) like:
#!/bin/bash
#SBATCH --output=sess.out
#SBATCH --error=sess.err
Rscript sess.R
The job fails after a second. The two output files I get are, of course, sess.out and sess.err.
sess.out contains the library directories:
[1] "/usr/lib64/R/library" "/usr/share/R/library"
which seem to be the same as when running Rscript from the command line, so no error there. However, there is no sessionInfo() output, since the script was terminated. The sess.err file contains the following error:
Error in library("gtools") : there is no package called ‘gtools’
Execution halted
So it seems R cannot find gtools in this situation, even though the library path is the same...
Am I missing something? Is there an error I don't see somewhere? Or is it a problem with the cluster settings?
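It might help to narrow this down with a small diagnostic run inside the batch job (a sketch; which variables matter depends on your site's setup): batch jobs often start with a stripped environment, so an R_LIBS/R_LIBS_USER setting or a module load that your interactive shell gets may be missing.

```r
## Print everything that determines where library() searches, plus
## whether R can actually see the package:
print(.libPaths())
print(Sys.getenv(c("R_LIBS", "R_LIBS_USER", "R_LIBS_SITE", "HOME")))
print(find.package("gtools", quiet = TRUE))  # character(0) if not visible
```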

R `dev.new()` freezes

This just started to occur: when I type the command dev.new(), the window stays frozen and I can't Ctrl+C to stop it; I have to kill the R process from another terminal. I am running 64-bit CentOS 6.7 and R 3.2.1. Here is the output from sessionInfo():
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 LC_MONETARY=en_US.utf8
[6] LC_MESSAGES=en_US.utf8 LC_PAPER=en_US.utf8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] ggplot2_1.0.1 data.table_1.9.4 plyr_1.8.3 reshape2_1.4.1 vimcom_0.9-9 setwidth_1.0-4 colorout_1.1-0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 digest_0.6.8 MASS_7.3-44 chron_2.3-47 grid_3.2.1 gtable_0.1.2 magrittr_1.5 scales_0.3.0
[9] stringi_0.5-5 proto_0.3-10 tools_3.2.1 stringr_1.0.0 munsell_0.4.2 colorspace_1.2-6
Has anybody come across this issue? Perhaps of relevance: I was able to use a GUI application that creates plot devices as a cairoDevice with no issues.
EDIT: A bit more info - when running R --vanilla, the same behavior occurs. Same with calling plot directly (e.g. plot(rnorm(1e2))), and making a call to ggplot.
EDIT 2: in case this wasn't confusing enough, I am able to plot without issue on my home system (where sessionInfo gives the same output, aside from some packages loaded via a namespace). I believe the same CentOS packages are installed, as well.
EDIT 3: to add a bit more info, in addition to the (RGtk2) GUI that I mentioned still works, I can call Cairo from the command line directly and plotting works without issue that way. So it seems to be specific to base plotting.
I had the same problem on SL6.7. This is not an R problem; rather, the xorg-x11-server-Xorg update broke it.
Just downgrade the package and restart your X session, and you can plot again.
~$ yum downgrade http://ftp.scientificlinux.org/linux/scientific/6.6/x86_64/updates/security/xorg-x11-server-Xorg-1.15.0-26.sl6.x86_64.rpm
To make this permanent, disable upgrades of the package in yum.conf:
~$ echo "exclude=xorg-x11-server-Xorg" >> /etc/yum.conf
Actually, the issue with R was not really a bug in xorg-x11-server. Its update ("Fix backing store's Always mode") revealed a bug in the X11 module of R. More details can be found in R's Bugzilla (see comment 5 in particular):
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16497
A couple of patches to fix the issue have been proposed.
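For anyone hitting a similar freeze, two quick checks from inside R can confirm that the X11 path is what's involved (diagnosis only, not a fix):

```r
## Does this build of R have X11 support, and which display would the
## x11() device try to open?
print(capabilities("X11"))
print(Sys.getenv("DISPLAY"))
```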

Trouble with packrat corrupting R functioning

I installed the package packrat at some point, used it perhaps once and moved on with my life.
However, despite not having loaded it in months, it remains a nuisance to my regular R usage.
Seemingly at random, my R session within RStudio will fail with errors at certain operations, especially package installation. Here's the most recent error message (after running parallel::makeCluster(parallel::detectCores())):
Error in file(filename, "r", encoding = encoding) : cannot open the connection
Calls: source -> file
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
  cannot open file 'packrat/init.R': No such file or directory
Execution halted
I checked all of the folders on .libPaths() and I don't even have packrat installed anymore. Why on earth is R still trying to carry out packrat operations? And how can I stop this?
My duct-tape solution so far is simply to close and reopen RStudio, which works like a charm for the package-installation issues.
However, I cannot seem to get around this for makeCluster(detectCores()) in one particular .R script I've got. It works perfectly fine in another script for another project.
Background:
sessionInfo()
# R version 3.2.2 (2015-08-14)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 14.04.2 LTS
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
# [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8
# [10] LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods base
# other attached packages:
# [1] doParallel_1.0.8 iterators_1.0.7 foreach_1.4.2 geosphere_1.4-3 xlsx_0.5.7 xlsxjars_0.6.1
# [7] rJava_0.9-6 xtable_1.7-4 sandwich_2.3-3 texreg_1.35 maptools_0.8-36 sp_1.1-1
# [13] ggmap_2.5.2 ggplot2_1.0.1 data.table_1.9.5
# loaded via a namespace (and not attached):
# [1] Rcpp_0.11.6 plyr_1.8.3 tools_3.2.2 digest_0.6.8 gtable_0.1.2
# [6] lattice_0.20-33 png_0.1-7 mapproj_1.2-4 proto_0.3-10 stringr_1.0.0
# [11] RgoogleMaps_1.2.0.7 maps_2.3-11 grid_3.2.2 jpeg_0.1-8 foreign_0.8-66
# [16] RJSONIO_1.3-0 reshape2_1.4.1 magrittr_1.5 codetools_0.2-11 scales_0.2.5
# [21] MASS_7.3-43 colorspace_1.2-6 stringi_0.5-9003 munsell_0.4.2 chron_2.3-47
# [26] rjson_0.2.15 zoo_1.7-12
Update 1:
Installing packrat had no effect. Running packrat::init() resulted in an error before finishing; nothing changed.
Update 2:
I've isolated the problem by identifying that it's the working directory that's causing the issues. What in this working directory might be causing the problems? Some residual file from having run packrat here previously?
Through further trial and error, given the prods of @BondedDust, I finally appear to have solved the issue. Having previously tried to use packrat in this particular working directory appears to have left some vestiges behind, despite later uninstalling packrat.
In particular, packrat edits your local .Rprofile (original credit due to @zerweck and @snaut), which is source()d on R startup in that directory.
If you use the .Rprofile to store some local configuration, you should edit the file and remove the packrat lines (or any lines you don't recognize); otherwise, you can just delete the file to restore your project to working as expected.
Check your HOME directory for an unintentional .Rprofile.
Packrat may have put it there if you tried to packrat::init() in HOME.
install.packages() with packrat looks for .Rprofile when run. The behavior I've observed has it prioritizing the HOME .Rprofile over the getwd() one, causing the error:
cannot open file 'packrat/init.R': No such file or directory
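A small helper along these lines (a sketch; the function name is made up) can show which startup file is pulling packrat in, before you delete anything:

```r
## Return any packrat-related lines from a startup file, or
## character(0) if the file is absent or clean:
find_packrat_lines <- function(path) {
  if (!file.exists(path)) return(character(0))
  grep("packrat", readLines(path), value = TRUE, fixed = TRUE)
}

find_packrat_lines(file.path(getwd(), ".Rprofile"))
find_packrat_lines(file.path(Sys.getenv("HOME"), ".Rprofile"))
```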

KnitR: issue with cache and figures

I have a piece of code that was working several months ago but not anymore. I changed the directory, but the data is the same.
My issue is that the cache is never saved. I tried different methods: dep_prev(), dep_auto() + autodep, and dependson, plus various debugging approaches, but nothing seems to work. The computation is still performed on each run, and there is nothing in the cache folder.
Another issue that appeared is that my figures don't make it into my pdf file anymore. They pop up when the R code is executed, but they are not saved (this is probably linked to the cache problem).
In my code I set up everything in the first chunk and then don't add any options to the following chunks. Here is the code of my first chunk:
<<setupKnitr, include=FALSE, cache=FALSE>>=
library(knitr)
## set global chunk options
opts_chunk$set(fig.path = 'figures_knitR/', cache.path = 'cache/',
               fig.align = 'center', fig.show = 'asis', par = TRUE,
               tidy = TRUE, cache = TRUE, autodep = TRUE,
               fig.keep = 'all', dev = 'tikz', echo = TRUE, eval = TRUE)
opts_knit$set(output.dir = getwd())
dep_auto()
options(formatR.arrow = TRUE, width = 68, digits = 4)
options(tikzDefaultEngine = 'pdftex')
@
Also, I found something strange when looking at the dep_auto() function: it uses the functions valid_path() and parse_objects(), which are not known in my R session; this might be the cause of the problem. And I cannot find the package they originate from.
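Those two functions are most likely unexported internals of knitr itself, which is why they don't show up in a normal session; if knitr is installed, that can be checked against its namespace (a sketch, hedged on the installed knitr version actually containing them):

```r
## Unexported objects live in a package's namespace, not on the search
## path; exists() against the namespace finds them anyway:
if (requireNamespace("knitr", quietly = TRUE)) {
  print(exists("valid_path", envir = asNamespace("knitr")))
  print(exists("parse_objects", envir = asNamespace("knitr")))
}
```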
here is my sessionInfo
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] jerdev_0.1 knitr_1.10.5
loaded via a namespace (and not attached):
[1] tools_3.2.0 tcltk_3.2.0
Thank you very much for your help!
