Extend the limit of memory allocated to R on a server - r

I am running a species distribution model using R on a server. I am saving my entire R environment; however, when I try to visualize and plot the saved objects I get this error:
Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") : cannot open file '/localscratch/anandam.9761522.0/RtmpnCnH0y/raster/r_tmp_2020-07-13_195526_260024_04625.gri': No such file or directory
It seems that some important information was saved in a temporary directory and I cannot access it after the analysis is done. Is that right?
One possible solution seems to be raster::readAll (read all values from a raster file associated with a Raster* object into memory). However, when I use it I get "Error: cannot allocate vector of size 12.5Gb". I tried to extend the limit of memory allocated to R on the cluster, but the memory does not seem to change. When I do the same on my MacBook Pro it works, using memory_limit{ulimit}. Is there another way to increase the memory available to R on Linux? Or is there another way to save all my R objects without using a temporary directory, so I can recover everything after the analysis is finished?
library(raster)
# List the .tif files with the selected predictors
predictors1 <- list.files(path = "/home/.../predictors_test", pattern = "\\.tif$", full.names = TRUE)
# Create a RasterLayer object for each file in the list
predictors2 <- lapply(predictors1, raster)
# Stack the predictors
Predictors <- stack(predictors2)
# Read all the values of the Predictors stack into memory
Predictors <- readAll(Predictors)

You say you are "saving all my R environment". That is a nice shortcut that generally works, but I think it is almost always a bad idea. Things are much clearer if you instead explicitly save the data you need to keep in files, and read them again as needed. In most cases you can use saveRDS/readRDS.
However, you cannot use saveRDS for Raster* objects that point to a file in the temp folder. All you would be saving is an object that points to a file that disappeared when your session ended (and that is why saving sessions does not work either).
The best approach is to avoid these temp files by using the filename= argument that most functions in the raster package provide, at least for the last processing step. Alternatively, you can use writeRaster to save what you want to keep after you are done processing.
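For example, a minimal sketch of both options, using the small sample raster that ships with the raster package (the output file names here are just placeholders):
library(raster)
r <- raster(system.file("external/test.grd", package = "raster"))
# Option 1: write the result of a processing step straight to a permanent
# file via the filename= argument, so no session temp file is created
r2 <- calc(r, fun = sqrt, filename = "r_sqrt.grd", overwrite = TRUE)
# Option 2: process first, then persist what you want to keep with writeRaster
r3 <- writeRaster(sqrt(r), filename = "r_sqrt.tif", overwrite = TRUE)
# In a later session, simply point at the saved file again
r2 <- raster("r_sqrt.grd")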

Related

Does R have an equivalent to Python's io for saving file-like objects to memory?

In Python we can import io and then make a file-like object with some_variable = io.BytesIO(), then download any type of file into it and interact with it as if it were a locally saved file, except that it's in memory. Does R have something like that? To be clear, I'm not asking about what any particular OS does when you save an R object to a temp file.
This is more or less a duplicate of "Can I write to and access a file in memory in R?", but that question is about 9 years old, so maybe the functionality exists now, either in base R or in a package.
Yes, readBin.
# read the whole file into a raw vector in memory
readBin("/path", raw(), file.info("/path")$size)
This is a working example:
tfile <- tempfile()
# serialize the iris data frame to a raw vector and write it to disk
writeBin(serialize(iris, NULL), tfile)
# read the file back into memory as a raw vector
x <- readBin(tfile, raw(), file.info(tfile)$size)
# reconstruct the original object from the raw vector
unserialize(x)
…and you get back your iris data.
This is just an example; for R objects it is far more convenient to use saveRDS()/readRDS().
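For comparison, a minimal sketch of the same round trip with saveRDS()/readRDS():
tfile <- tempfile(fileext = ".rds")
# saveRDS does the serialization and file writing in one step
saveRDS(iris, tfile)
# readRDS reads the file back and reconstructs the object
identical(readRDS(tfile), iris)  # TRUE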
However, if the object is, say, an image you want to analyse, readBin gives you its raw in-memory representation.
For text files, you should then use:
rawToChar(x)
but again there are readLines(), read.table(), etc., for these tasks.
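For instance, a quick sketch of round-tripping a text file through a raw vector:
tfile <- tempfile(fileext = ".txt")
writeLines(c("hello", "world"), tfile)
# raw in-memory representation of the text file
x <- readBin(tfile, raw(), file.info(tfile)$size)
# convert the raw bytes back into a character string
rawToChar(x)  # "hello\nworld\n"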

How to save raster data in R object format?

I don't know how to deal with save.image() and saveRDS() with raster data in R. I understand that the raster package opens a connection to the image file with the raster() function, so it doesn't really load the file into the R workspace.
I want to save my workspace (data.frame, list, raster, etc.) with the save.image() function (or similar) and open it on a different computer. If I try to plot or process a raster object saved on a different computer, I always get the same issue:
Error in .local(.Object, ...) :
`C:\path\to\file.tif' does not exist in the file system,
and is not recognised as a supported dataset name.
Is there a way to save a raster object (opened as an external file) in an R format? I don't mean raster formats such as TIFF or grid.
At your own risk, you can use the readAll function to load the raster into memory before saving, e.g.
r <- raster(system.file("external/test.grd", package="raster"))
r <- readAll(r) # force the values into memory
save(r, file = 'r.RData')
It can then be loaded on a different machine with
load('r.RData')
Beware: this will be problematic for very large rasters on memory-limited systems.
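If you want to be sure the values actually made it into memory before saving, raster's inMemory() function can serve as a quick check:
library(raster)
r <- raster(system.file("external/test.grd", package = "raster"))
inMemory(r)  # FALSE: the object still points at the file on disk
r <- readAll(r)
inMemory(r)  # TRUE: now it is safe to save() and move to another machine
save(r, file = "r.RData")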
You can save rasters, like other R objects, using the save command.
save(r,file="r.Rdata")
On a different computer, you can load that file using
load("r.Rdata")
which will bring back the raster r in your workspace.
I have tried this across Windows and Linux and it has never given me problems.

Error while parsing a very large (10 GB) XML file in R, using the XML package

Context
I'm currently working on a project involving OSM (OpenStreetMap) data. In order to manipulate geographic objects, I have to convert the data (an OSM XML file) into an R object. The osmar package lets me do this, but it fails to parse the raw XML data.
The error
Error in paste(file, collapse = "\n") : result would exceed 2^31-1 bytes
The code
require(osmar)
osmar_obj <- get_osm("anything", source = osmsource_file("my filename"))
Inside the get_osm function, the code calls ret <- xmlParse(raw), which triggers the error after a few seconds.
The question
How am I supposed to read a large XML file (here 10 GB), knowing that I have 64 GB of memory?
Thanks a lot !
This is the solution I came up with, even though it is not 100% satisfying.
Transform the .osm file by removing every newline (but the last) in your shell (see the sketch below).
Run the exact same code as before, skipping the paste that is no longer needed (since you just did the equivalent in the shell).
Profit :)
Obviously, I'm not very happy with it, because modifying the data file in the shell is more of a trick than an actual solution :(
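As an illustration, step 1 could be driven from R with system(); this is just a sketch that assumes a Unix shell with tr available, and input.osm / input_oneline.osm are made-up file names:
# tr streams the file, so the 10 GB are never loaded into R's memory;
# '\\n' passes the two characters backslash-n to tr, which reads them as newline,
# and the trailing echo appends the one final newline we want to keep
system("tr -d '\\n' < input.osm > input_oneline.osm && echo >> input_oneline.osm")
# then run the same osmar code as before on the transformed file:
# osmar_obj <- get_osm("anything", source = osmsource_file("input_oneline.osm"))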

Use readOGR to load a large spatial file in R

For my processing in R I want to read in a 20-gigabyte file. It is an XML file.
In R I cannot load it with readOGR, since it is too big; it gives me the error "cannot allocate vector of size 99.8 Mb".
Since my file is too big, the logical next step in my mind would be to split it. But since I cannot open it in R or in any other GIS package at hand, I cannot split the file before loading it. I am already using the best PC available to me.
Is there a solution?
UPDATE BECAUSE OF COMMENT
If I use head(), my line looks like the one below. Unfortunately it does not work.
headfive <- head(readOGR('file.xml', layer = 'layername'), 5)

Running jobs in background in R

I am working with a 250 by 250 matrix. However, it takes a very long time to compute: at least an hour.
Is it possible to store this matrix in memory in R, such that every time I open up R, it is already there?
Ideally, I would also like to know whether it is possible to run a job in the background in R, so that I don't have to wait an hour to get the matrix out and be able to play around with it.
1) You can save the R workspace when closing R. Usually R asks "Save workspace image?" when you close it. If you answer "Yes", it saves the workspace in a file named ".RData" and loads it when starting a new R instance.
2) The better (safer) option is to save the matrix explicitly. There are several ways to do this. One option is to save it as an Rdata file:
save(m, file = "matrix.Rdata")
where m is your matrix.
You can load the matrix at any time with
load("matrix.Rdata")
provided you are in the same working directory.
3) There is no such option as background computing within an R session, but you can open several R instances: do the computation in one instance, and do something else in another.
What would help is to output the matrix to a file once you have computed it, and then parse that file every time you open R. Write yourself a computeMatrix() function or script to produce a file with the matrix stored in a sensible format. Also write yourself a loadMatrix() function or script to read in that file and load the matrix into memory for use; then call or run loadMatrix every time you start R and want to use the matrix (see the sketch below).
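A minimal sketch of that pattern; computeMatrix() and loadMatrix() are the hypothetical helpers named above, with the hour-long computation stubbed out by a placeholder:
# compute the matrix once and store it on disk in a sensible format
computeMatrix <- function(file = "matrix.rds") {
  m <- matrix(rnorm(250 * 250), nrow = 250)  # placeholder for the slow computation
  saveRDS(m, file)
  invisible(m)
}
# read the stored matrix back into memory at the start of a session
loadMatrix <- function(file = "matrix.rds") {
  readRDS(file)
}
computeMatrix()    # run once (the slow step)
m <- loadMatrix()  # cheap in every later session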
In terms of running an R job in the background, you can run an R script from the command line with the syntax "R CMD BATCH scriptName", with scriptName replaced by the name of your script.
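For example, such a script might look like this (compute_matrix.R is a made-up name):
# compute_matrix.R -- run in the background with:  R CMD BATCH compute_matrix.R
m <- matrix(rnorm(250 * 250), nrow = 250)  # placeholder for the hour-long computation
saveRDS(m, "matrix.rds")                   # store the result for later sessions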
It might be better to use the ff package and save the matrix as an ff object. This means that the actual matrix will be saved on the disk in an efficient manner, then when you start a new R session you can point to that same file without loading the entire matrix into memory. When you need part of the matrix, only the part you need will be loaded so it will be much quicker. Even if you need the entire matrix loaded into memory it should load faster than reading a text file.
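A rough sketch of how that could look; treat the exact ff calls here as an assumption, and matrix.ff / mat as made-up names:
library(ff)
# create a disk-backed 250 x 250 matrix; the values live in matrix.ff on disk
m <- ff(vmode = "double", dim = c(250, 250), filename = "matrix.ff")
m[1:250, 1:250] <- rnorm(250 * 250)  # placeholder for the real computation
# save the ff metadata so the same file can be reopened later
ffsave(m, file = "mat")
# in a new R session: reattach without reading the whole matrix into RAM
# library(ff); ffload("mat"); m[1:5, 1:5]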
