Issue using saveRDS() with raster objects - r

I'm trying to use saveRDS() to save a large number of lists, each containing a raster layer and a list of metadata. It worked fine when the raster layer was extracted from a NetCDF file, but when the original file is an ASCII file, saveRDS() only writes a pointer to the original file instead of writing the values into the output file.
Here's a condensed version of what's going on:
require(raster)
mf <- raster('myfile.asc')
meta <- list(mylonglistofmetadata)
res <- list(mf, meta)
saveRDS(res, 'myresult.Rdata')
myresult.Rdata is now simply a 33 KB pointer to myfile.asc, when I really would like it to store the values so it will still work after I delete myfile.asc (it should then be about 15 MB).
In contrast, for other files in ncdf format:
require(ncdf4)
require(raster)
ff <- 'myfile2.nc'
nc <- nc_open(ff)
meta <- list(mylonglistofmetadata)
res <- list(nc, meta)
saveRDS(res, 'myresult2.Rdata')
Here, myresult2.Rdata is storing everything just like I want it to, so my guess is that the issue arises with the raster package?
Does anyone have an idea how to fix this? I would prefer not to use writeRaster(), since I'm trying to keep the metadata together with the data and use the same format as in my batch extracted from NetCDF files to ease later processing.

The short answer is that you can do:
mf <- raster('myfile.asc')
mf <- readAll(mf)
mf
Now, the values are in memory and will be saved to the .RData file
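A minimal round-trip sketch of that fix (file names are placeholders; it assumes meta is your metadata list):
library(raster)
mf <- readAll(raster('myfile.asc')) # pull the cell values into memory
res <- list(mf, meta)
saveRDS(res, 'myresult.Rdata')
# later, even after 'myfile.asc' has been deleted:
res2 <- readRDS('myresult.Rdata')
res2[[1]] # RasterLayer with its values stored inside the saved file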
Also note that:
You can save metadata with the data via writeRaster (see ?raster::metadata).
You can access ncdf files (with geographic data) via raster('myfile2.nc').
Your example for the ncdf file is not informative, as you do not actually use nc for anything. If you replaced mf with nc, it would not work either after you removed 'myfile2.nc'.

Related

Working with Temp raster from s3 bucket acquired using s3read_using function

I'm pulling a .tif from an s3 bucket using the aws.s3 R package
test_tif <- s3read_using(FUN = raster, object = "test_tif.tif", bucket = "bucketname")
This is placing the raster in my Global Environment: test_tif
When I go to perform any sort of raster-based operation, I get a repeated error:
Error in .local(.Object, ...) :
no further error codes or warnings
Looking at the structure of the raster, there is nothing different compared with the same .tif read in from a local directory.
The only difference is that one is saved as a temp file.
Any ideas on how to work around this?
Using s3read_using is a must, as this will eventually be incorporated into a Shiny app.
Thanks.
What I see is that s3read_using downloads the file (with save_object), applies the function with the file as its argument, and then deletes the file. That works if the function reads the data into memory. But the raster method only reads the metadata from the file; the actual values are read later, as needed.
So if I do
r <- s3read_using(FUN = raster, object = "test.tif", bucket = "bucketname")
f <- filename(r)
#"C:\\temp\\RtmpcbsI2z\\file9b846977650.tif"
file.exists(f)
#[1] FALSE
The file is gone, and you cannot do anything with the RasterLayer r.
A workaround could be to read all the values immediately. If that is not possible, you could also multiply the values by 1. This would have a similar effect, unless the files are very large, in which case it would create a (more) permanent temp file.
rr <- s3read_using(FUN = function(f) readAll(raster(f)), object = "test.tif", bucket = "bucketname")
# or
rr <- s3read_using(FUN = function(f) raster(f) * 1, object = "test.tif", bucket = "bucketname")
But in that case you might as well use the save_object function --- which is what you want to avoid.
Perhaps you can instead use Cloud Optimized GeoTIFFs and access them like this: "/vsicurl/https://mybucket/test.tif". You should be able to restrict access to your domain only. Also, the terra package might give you better performance than raster.
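A minimal sketch of that approach with terra (the bucket URL is a placeholder; it assumes the object is readable over HTTPS from where the app runs):
library(terra)
r <- rast("/vsicurl/https://mybucket.s3.amazonaws.com/test_tif.tif") # hypothetical URL
r # only metadata is read up front; cell values are streamed as needed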

How can I perform the same set of commands separately for each file in the same file path?

I am trying to extract a subset from multiple .grb2 files in the same file path and write it to a CSV. I am able to do it for one (or a few) by using the following set of commands:
GRIB <- brick("tmp2m.1989102800.time.grb2")
GRIB <- as.array(GRIB)
readGDAL("tmp2m.1989102800.time.grb2")
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,"tmp1.csv")
The above set of commands extracts, to CSV, temperature values for a specific latitude "51" and longitude "27", as well as for a specific time range "c(261:1232)".
Now I have hundreds of these files (with different file names, of course) in the same directory, and I want to do the same for all of them. As you know better than I do, I cannot do this one by one, changing the file name each time.
I have struggled a lot with this, but so far I have not managed to do it. Since I am new to R and my knowledge is limited, I would very much appreciate any possible help with this.
The simplest way would be to use a normal for loop:
library(raster)
library(rgdal)
path <- "your file path here"
input.file.names <- dir(path, pattern = "\\.grb2$")
output.file.names <- paste0(tools::file_path_sans_ext(input.file.names), ".csv")
for (i in seq_along(input.file.names)) {
  GRIB <- brick(file.path(path, input.file.names[i]))
  GRIB <- as.array(GRIB)
  readGDAL(file.path(path, input.file.names[i])) # kept from the original script; its result is not used
  tmp2m.6hr <- GRIB[51, 27, c(261:1232)]
  str(tmp2m.6hr)
  tmp2m.data <- data.frame(tmp2m.6hr)
  write.csv(tmp2m.data, file.path(path, output.file.names[i]))
}
You could of course turn the body of the for loop into a function and then use the standard lapply or the map function from purrr (a sketch follows below).
Note that this code will write out separate CSV files. If you want to append the data to a single file, you should check out write.table.
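A minimal sketch of the lapply route mentioned above (untested; it reuses path and input.file.names from the loop version, and the same indices):
library(raster)
extract_tmp2m <- function(f) {
  GRIB <- as.array(brick(f))
  tmp2m.6hr <- GRIB[51, 27, 261:1232]
  write.csv(data.frame(tmp2m.6hr), paste0(tools::file_path_sans_ext(f), ".csv"))
  tmp2m.6hr
}
results <- lapply(file.path(path, input.file.names), extract_tmp2m)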

Package a large data set

Column-wise storage in the inst/extdata directory of a package, as suggested by Jan, is now implemented in the dfunbind package.
I'm using the data-raw idiom to make entire analyses from the raw data to the results reproducible. For this, datasets are first wrapped in R packages which can then be loaded with library().
One of the datasets I'm using is largish, around 8 million observations with about 80 attributes. For my current analysis I only need a small fraction of the attributes, but I'd like to package the entire dataset anyway.
Now, if it is simply packaged as a data frame (e.g., with devtools::use_data()), it will be loaded in its entirety when first accessing it. What would be the best approach to package this kind of data so that I can lazy-load at the column level? (Only those columns which I'm actually accessing are loaded, the others happily stay on disk and don't occupy RAM.) Would the ff package help? Can anyone point me to a working example?
I think I would store the data in inst/extdata. Then create a couple of functions in your package that can read and return parts of that data. In your functions you can get the path to your data using system.file("extdata", "yourfile", package = "yourpackage") (as on the page you linked to).
The question then is in what format you store your data and how you obtain selections from it without reading the data into memory. For that, there are a large number of options. To name some:
sqlite: Store your data in an SQLite database. You can then perform queries on this data using the RSQLite package (see the sketch after this list).
ff: Store your data in ff objects (e.g. save using the save.ffdf function from ffbase; use load.ffdf to load again). ff doesn't handle character fields well (they are always converted to factors). And in theory the files are not cross-platform, although as long as you stay on Intel platforms you should be OK.
CSV: Store your data in a plain old CSV file. You can then make selections from this file using the LaF package. The performance will probably be less than with ff, but might be good enough.
RDS: Store each of your columns in a separate RDS file (using saveRDS) and load them using readRDS. The advantage is that you do not depend on any R packages. This is fast. The disadvantage is that you cannot do row selections (but that does not seem to be needed here).
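For the sqlite option above, a minimal sketch (database, table, and column names are placeholders):
library(DBI)
db <- dbConnect(RSQLite::SQLite(),
                system.file("extdata", "mydata.sqlite", package = "yourpackage"))
dat <- dbGetQuery(db, "SELECT col1, col2 FROM mydata") # read only the columns you need
dbDisconnect(db)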
If you only want to select columns, I would go with RDS.
A rough example using RDS
The following code creates an example package containing the iris data set:
load_data <- function(dataset, columns) {
  result <- vector("list", length(columns))
  for (i in seq_along(columns)) {
    col <- columns[i]
    fn <- system.file("extdata", dataset, paste0(col, ".RDS"), package = "lazyload")
    result[[i]] <- readRDS(fn)
  }
  names(result) <- columns
  as.data.frame(result)
}
store_data <- function(package, name, data) {
  dir <- file.path(package, "inst", "extdata", name)
  dir.create(dir, recursive = TRUE)
  for (col in names(data)) {
    saveRDS(data[[col]], file.path(dir, paste0(col, ".RDS")))
  }
}
packagename <- "lazyload"
package.skeleton(packagename, "load_data")
store_data(packagename, "iris", iris)
After building and installing the package (you'll need to fix the documentation, e.g. delete it) you can do:
library(lazyload)
data <- load_data("iris", "Sepal.Width")
To load the Sepal.Width column of the iris data set.
Of course this is a very simple implementation of load_data: no error handling, it assumes all requested columns exist, it does not know which columns exist, and it does not know which data sets exist.

raster images stacked recursively

I have the following problem, please.
I need to read raster images recursively, stack them, and store them in files with different names (e.g. name1.tiff, name2.tiff, ...).
I tried the following:
for (i in 10) {
  fn <- system.file("external/test.grd", package = "raster")
  fn <- stack(fn) # not sure if this idea can work
  fnSTACK[,,i] <- fn
}
Here I expect a result of the form:
dim(fnSTACK)
[1] 115  80  10
or something like that, but it didn't work.
Actually, I have around 300 images that I have to store with different names.
The purpose is to extract time series information (if you know another method or have suggestions, I would appreciate it).
Any suggestions are welcomed. Thank you in advance for your time.
What I would first do is put all your *.tiff in a single folder. Then read all their names into a list. Stack them and then write a multi-layered raster. I'm assuming all the images have the same extent and projection.
### Load necessary packages
library(tiff)
library(raster)
library(sp)
library(rgdal) # I can't recall exactly which packages you need, so this is probably
library(grid)  # overkill
library(car)
############ function extracts the last n characters from a string
############ without counting the last m
subs <- function(x, n = 1, m = 0) {
  substr(x, nchar(x) - n - m + 1, nchar(x) - m)
}
setwd("your working directory path") # set your wd to where all your images are
filez <- list.files()                # vector with all the files in the wd
no <- length(filez)                  # number of files found
imagestack <- stack()                # initialize your raster stack
for (i in 1:no) {
  if (subs(filez[i], 4) == "tiff") {
    image <- raster(filez[i])        # fill up the raster stack with only the tiffs
    imagestack <- addLayer(imagestack, image)
  }
}
writeRaster(imagestack, filename = "output path", options = "INTERLEAVE=BAND", overwrite = TRUE)
# write stack
I did not try this, but it should work.
Your question is rather vague and it would have helped if you had provided a full example script such that it could be more easily understood. You say you need to read several (probably not recursively?) raster images (files, presumably) and create a stack. Then you need to store them in files with different names. That sounds like copying the files to new files with a different names, and there are R functions for that, but that is probably not what you intended to ask.
If you have a bunch of files (with full path names or in the working directory), e.g. from list.files()
f <- system.file("external/test.grd", package = "raster")
ff <- rep(f, 10)
you can do
library(raster)
s <- stack(ff)
I am assuming that you simply need this stack for operations in R (it is an object, but not a file). You can extract the values in many ways (see the help files and vignette of the raster package). If you want a three dimensional array, you can do
a <- as.array(s)
dim(a)
[1] 115 80 10
thanks "JEquihua" for your suggestion, just need to add the initial variable before addLayer ie:
for (i in 1:no) {
  if (subs(filez[i], 4) == "tiff") {
    image <- raster(filez[i]) # fill up raster stack with only the tiffs
    imagestack <- addLayer(imagestack, image)
  }
}
And sorry "RobertH", I'm newbie about R. I will be ask, more sure or exact by next time.
Also, any suggestions for extracting data from time series of MODIS images stacked. Or examples of libraries: "rts ()", "ndvits ()" or "bfast ()"
Greetings to the entire community.
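For extracting a time series at a point from such a stack, a minimal sketch with raster::extract (paths and coordinates are placeholders):
library(raster)
s <- stack(list.files("/PATH/of/DATA/", pattern = "\\.tif$", full.names = TRUE))
pt <- cbind(-60.5, -3.2)    # hypothetical lon/lat; must match the rasters' CRS
ts_values <- extract(s, pt) # one value per layer, i.e. one per time step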
Another method for stacking
library(raster)
flist <- list.files("/PATH/of/DATA/", pattern = "NDVI",
                    recursive = TRUE, full.names = TRUE)
data_stack <- stack(flist)

Which is the best method to apply a script repetitively to n .csv files in R?

My situation:
I have a number of CSV files that all share the same suffix before the .csv extension, but the first two characters of the file name are different (i.e. AA01.csv, AB01.csv, AC01.csv, etc.)
I have an R script which I would like to run on each file. This script essentially extracts the data from the .csv, assigns it to vectors, and converts it into time series objects (for example, an AA01 xts time series object, an AB01 xts object).
What I would like to achieve:
Embed the script within a larger loop (or as appropriate) to sequentially run over each file and apply the script
Remove the intermediate objects created (see code snippet below)
Leave me with the final xts objects created from each raw data file (ie AA01 to AC01 etc as Values / Vectors etc)
What would be the right way to embed this script in R? Sorry, but I am a programming noob!
My script code is below; the column headings in each CSV are DATE, TIME, VALUE.
# load the time series packages used below
library(timeDate)
library(timeSeries)
library(xts)
# Pull in data from the file system and attach it
AA01raw <- read.csv("AA01.csv")
attach(AA01raw)
# format the data for time series work
cdt <- as.character(Date)
ctm <- as.character(Time)
tfrm <- timeDate(paste(cdt, ctm), format = "%Y/%m/%d %H:%M:%S")
val <- as.matrix(Value)
aa01tsobj <- timeSeries(val, tfrm)
# convert the timeSeries object to an xts object
aa01xtsobj <- as.xts(aa01tsobj)
# remove all the intermediate objects to leave the final xts object
rm(cdt)
rm(ctm)
rm(aa01tsobj)
rm(tfrm)
gc()
and then repeat this on each .csv file until all the xts objects are extracted.
i.e., what we would end up with in R, ready for further applications, is:
aa01xtsobj, ab01xtsobj, ac01xtsobj, ... etc.
Any help on how to do this would be very much appreciated.
Be sure to use R's dir() function to produce the list of filenames instead of entering them manually.
filenames <- dir(pattern = "01\\.csv$")
for( i in 1:length(filenames) )
{
...
I find a for loop and lists are good enough for stuff like this. Once you have a working set of code it's easy enough to move from a loop to a function which can be sapply-ed or similar, but that kind of vectorization is idiosyncratic anyway and probably not useful outside of private one-liners.
You probably want to avoid assigning to multiple objects with different names in the workspace (this is a FAQ which usually comes up as "how do I assign() ...").
Please beware: my code is untested.
A vector of file names, and a list with a named element for each file.
files <- c("AA01.csv", "AA02.csv")
lst <- vector("list", length(files))
names(lst) <- files
Loop over each file.
library(timeDate)
library(timeSeries)
library(xts)
for (i in seq_along(files)) {
  ## read strings as character
  tmp <- read.csv(files[i], stringsAsFactors = FALSE)
  ## convert to 'timeDate' (column names Date/Time/Value follow the question's code)
  tmp$tfrm <- timeDate(paste(tmp$Date, tmp$Time), format = "%Y/%m/%d %H:%M:%S")
  ## create timeSeries object
  obj <- timeSeries(as.matrix(tmp$Value), tmp$tfrm)
  ## store the object in the list as xts, by name
  lst[[files[i]]] <- as.xts(obj)
}
## clean up
rm(tmp, files, obj)
Now all the read objects are in lst, but you'll want to test that the file is available, that it was read correctly, and you may want to modify the names to be more sensible than just the file name.
Print out the first object by name index from the list:
lst[[files[1]]]
