writeRaster output file size - r

I have a function that reads a multi-band image in as a raster brick object, iterates through the bands doing various calculations, and then writes the raster out as a new .tif. All of this works fine, but the file size of the new image file is roughly four times greater (I assume because the original image has 4 bands). I'm wondering if there's a parameter in the writeRaster() function that I'm unaware of, or if there's some other way I can ensure that the output image is basically the same file size as the input.
Original file size is 134 MB; output ranges from 471 to 530 MB or so, depending on format.
Simplified code:
library(rgdal)
library(raster)
path = "/Volumes/ENVI Standard Files/"
img = "qb_tile.img"
imageCorrection = function(path, img){
  raster = brick(paste0(path, img))
  raster = reclassify(raster, cbind(0, NA))
  for(i in 1:nlayers(raster)){
    raster[[i]] = raster[[i]] - minValue(raster[[i]])
  }
  writeRaster(raster, paste0(path, img, "_process.tif"), format = "GTiff", overwrite = TRUE)
}

You can set the default datatype for writing rasters with rasterOptions(), as follows:
rasterOptions(datatype="INT2U")
Or directly in the writeRaster call:
writeRaster(yourRas, "path/to/raster.tif", datatype="INT2U", options="COMPRESS=LZW")
Also note the options argument, where you can specify GDAL creation options such as compression.
Usually when I export integer rasters from R, I make sure that the values really are integers and not floats, since writing float values with an integer datatype can result in an empty raster. Try the following before exporting:
ras <- as.integer(ras)
Please note: also check for negative values in your raster; use INT2S if you have values below zero, since INT2U cannot store them.
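Putting the advice together, a minimal sketch of the original writeRaster call with an explicit integer datatype and LZW compression (assuming the band values fit in unsigned 16-bit integers after the minimum is subtracted):
writeRaster(raster, paste0(path, img, "_process.tif"), format = "GTiff",
            datatype = "INT2U", options = "COMPRESS=LZW", overwrite = TRUE)
With a 4-band integer input this should bring the output back near the input's size, and the compression may shrink it further.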

Related

How do I get mean intensity of TIFF files in a tibble?

I am using the following code to get TIFF files into R for analysis:
library(magick)
tiffiles<-list.files("C:/Users/folder_with_multiple_tifs/", pattern = "*.tif", full.names=TRUE)
importedtifs<-c()
for(file in tiffiles) {importedtifs<-append(importedtifs, image_read(file))}
importedtifs
This gives me a tibble with each row corresponding to a TIFF file. I can then use mean(as.integer(importedtifs[[1]])) to get the average pixel intensity of the first TIFF. It is a small positive number for the images I am working with.
I would like to have a single command that returns the mean pixel intensity of each individual TIFF in the tibble. When I try lapply(importedtifs, function(x) mean(as.integer(x))), I get a large negative number, which is not the pixel intensity.
Is there a way to do this? I don't understand exactly how the tibble is storing the data for each TIFF.
DaveArmstrong's solution works. The variation below collects the means in a vector that can be manipulated downstream:
means <- c()
for(i in 1:length(importedtifs)){
  means <- c(means, mean(as.integer(importedtifs[[i]])))
}
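A single command in the same spirit, iterating over indices so that each element is extracted with [[ as above (a sketch, assuming importedtifs is the image vector created earlier):
sapply(seq_along(importedtifs), function(i) mean(as.integer(importedtifs[[i]])))
This returns a numeric vector with one mean per TIFF.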

Normalize RasterLayer as Matrix to use as Clip Frame

I was assigned the task of clipping a raster from a .nc file to the extent of a .tif file.
edit (from comment):
I want to extract temperature info from the .nc because I need to check the yearly mean temperature of a specific region. To be comparable, the comparison has to occur on exactly the same area. The .nc file is larger than the previously checked area, so I need to "clip" it to the extent of a .tif I have. The .tif data is in the form 0|1; where it is 0 (or where the .tif is smaller than the .nc), the .nc data should be clipped. In the end I want to keep the .nc data at the extent of the .tif while still retaining its resolution and projection. (The .tif and .nc have different projections and pixel sizes.)
Now ordinarily that wouldn't be a problem, as I could use raster::crop. This doesn't deal with different projections and different pixel sizes/resolutions, though. (I still used it to generate an approximation, but it is not precise enough for the final information, as can be seen in the code snippet below.) The obvious way to generate a more reliable dataset would be to first homogenize the datasets with a method like raster::projectRaster or sp::spTransform (the latter was added in an edit to the original question), but this approach takes too much time, as I have to do this for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller raster "clip_frame" and then just multiply it with the "nc_to_clip" raster. Doing so should prevent any errors through map projections or other factors. This makes a lot of sense to me in theory, but I have no idea how to do it in practice. I would be very grateful for any kind of hint, code snippet, or other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem, if so please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop function):
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list<-list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip <- raster("ABC.nc", varname="GST")
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster <- function(input_clip_raster, input_clip_frame){ # input_clip_raster = raster to clip, input_clip_frame = clipping frame
  r2_coord <- rasterToPoints(input_clip_raster, spatial = TRUE) # step 1 to extract coordinates
  map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
  temp <- raster::extract(map_clip, r2_coord@coords) # step 2: extract values at the coordinates
  temp_C <- temp * 0.01 - 273.15 # convert Kelvin*100 to Celsius
  temp_C <- na.omit(temp_C)
  return_list <- list(map_clip, mean(temp_C))
  return(return_list)
}
mean_tempC <- lapply(nc_to_clip, mean_temp_from_raster, clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster("ABC.nc", varname="GST")
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)
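The point is that only the clipping geometry is reprojected: the extent of the .tif is turned into a polygon, transformed into the CRS of the .nc, and the crop happens there, so the .nc keeps its original resolution and projection. A possible continuation, reusing the Kelvin*100 conversion from the question (a sketch):
vals <- getValues(clipped) * 0.01 - 273.15  # convert Kelvin*100 to Celsius
mean(vals, na.rm = TRUE)                    # yearly mean for the clipped region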

efficient use of raster functions in r

I have 500+ points in a SpatialPointsDataFrame object; I have a 1.7GB (200,000 rows x 200,000 cols) raster object. I want to have a tabulation of the values of the raster cells within a buffer around each of the 500+ points.
I have managed to achieve that with the code below (I got a lot of inspiration from here.). However, it is slow to run and I would like to make it run faster. It actually runs OK for buffers with "small" widths, say 5km or even 15km (~1 million cells), but it becomes super slow when the buffer increases to, say, 100km (~42 million cells).
I could easily improve on the loop below by using something from the apply family and/or a parallel loop. But my suspicion is that it is slow because the raster package writes 400MB+ temporary files for each iteration of the loop.
# packages
library(rgeos)
library(raster)
library(rgdal)
myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)
myFunction = function(polygon_obj, raster_obj) {
  # this function returns a tabulation of the values of raster cells
  # inside a polygon (buffer)
  # crop to extent of polygon
  clip1 = crop(raster_obj, extent(polygon_obj))
  # crop to polygon edge & convert to raster
  clip2 = rasterize(polygon_obj, clip1, mask = TRUE)
  # much faster than extract
  ext = getValues(clip2)
  # tabulate the values of the raster in the polygon
  tab = table(ext)
  return(tab)
}
# loop over the points
ids = unique(myPoints$ID)
for (id in ids) {
  # select point
  myPoint = myPoints[myPoints$ID == id, ]
  # create buffer
  myPolygon = gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth)
  # extract the data I want (projections, etc. are fine)
  tab = myFunction(myPolygon, myRaster)
  # do stuff with tab ...
}
My questions:
Am I right to partially blame the writing operations? If I managed to avoid all those writing operations, would this code run faster? I have access to a machine with 32GB of RAM, so I guess it is safe to assume I could load the raster into memory and avoid writing temporary files?
What else could I do to improve efficiency in this code?
I think you should approach it like this:
library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)
And then something like:
etab <- sapply(e, table)
It is hard to answer your question #1, as we do not know enough about your data (we do not know how many cells are covered by a "100 km" buffer). But you can set options about when to write to file with the rasterOptions function. You notice that getValues is faster than extract, based on the post you link to, but I think that is wrong, or at least not very important. The combination of crop, rasterize, and getValues should have similar performance to extract (which does almost exactly that under the hood). If you go this route anyway, you should pass an empty RasterLayer, created with raster(myRaster), for faster cropping.
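For illustration, a sketch of the rasterOptions route mentioned above (the exact units and defaults of these options differ between raster versions, so treat the values as assumptions to tune):
library(raster)
# allow larger objects in memory before raster falls back to temp files
rasterOptions(maxmemory = 1e9, chunksize = 1e8)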

Using Juno/LT to run Julia Code, Error, `getindex` has no method matching getindex(::DataFrame, ::ASCIIString)

Below is the first portion of the code I am using. The intention of this code is, when given image files in .bmp format, to correctly identify the letter shown in each image.
#Install required packages
Pkg.add("Images")
Pkg.add("DataFrames")
using Images
using DataFrames
#typeData could be either "train" or "test".
#labelsInfo should contain the IDs of each image to be read
#The images in the trainResized and testResized data files
#are 20x20 pixels, so imageSize is set to 400.
#path should be set to the location of the data files.
function read_data(typeData, labelsInfo, imageSize, path)
    #Initialize x matrix
    x = zeros(size(labelsInfo, 1), imageSize)
    for (index, idImage) in enumerate(labelsInfoTrain["ID"])
        #Read image file
        nameFile = "$(path)/$(typeData)Resized/$(idImage).Bmp"
        img = imread(nameFile)
        #Convert img to float values
        temp = float32sc(img)
        #Convert color images to gray images
        #by taking the average of the color scales.
        if ndims(temp) == 3
            temp = mean(temp.data, 1)
        end
        #Transform image matrix to a vector and store
        #it in data matrix
        x[index, :] = reshape(temp, 1, imageSize)
    end
    return x
end
imageSize = 400 # 20 x 20 pixels
#Set location of data files , folders
#Probably will need to set this path to which folder your files are in
path = "C:\\Users\\Aaron\\Downloads\\Math512Project"
#Read information about test data ( IDs )
labelsInfoTest = readtable("$(path)/sampleSubmissionJulia.csv")
#Read information about training data, IDs.
labelsInfoTrain = readtable("$(path)/trainLabels.csv")
#Read training matrix
xTrain = read_data("train", labelsInfoTrain, imageSize, path)
The error that I am facing occurs when my program reaches the very last line of code above, which reads:
xTrain = read_data("train", labelsInfoTrain, imageSize, path)
I receive an error saying: getindex has no method matching getindex(::DataFrame, ::ASCIIString) in read_data at benchmarkJeff.jl:18
which refers to the line of code:
for (index, idImage) in enumerate(labelsInfoTrain["ID"])
Some research online has given me insight that the problem has to do with a conflict between the DataFrames package and the Images package. I was recommended to change the "ID" in my code to [:ID], but this does not solve the problem; rather, it causes another error. I was wondering if anyone knew how to fix this problem or what exactly the problem is with my code. I get the same error when running the code at the Julia 0.4.0 command line. Looking forward to hearing from you.

Calculating percentile across netcdf files fails

I have five netcdf files where each file contains data for a time section. I want to calculate the 98th percentile for the whole timespan for each cell individually.
The accumulated file size for the netcdf files is around 250 MB.
My approach is this:
library(raster)
fileType="\\.nc$"
filenameList <- list.files(path=getwd(), pattern=fileType, full.names=F, recursive=FALSE)
#rasterStack for all layers
rasterStack <- stack()
#stack all data
for(i in 1:length(filenameList)){
  filename <- filenameList[i]
  stack.temp <- stack(filename)
  rasterStack <- stack(rasterStack, stack.temp)
}
#calculate raster containing the 98th percentiles
result <- calc(rasterStack, fun = function(x) {quantile(x,probs = .98,na.rm=TRUE)} )
However, I get this error:
Error in ncdf4::nc_close(x@file@con) :
  no slot of name "con" for this object of class ".RasterFile"
The stacking section of my code works; the crash happens during the calc function.
Do you have any idea where this might come from? Is it maybe an issue of where the data is stored (memory/disk)?
Strange, I generated some dummy data and it seems to work just fine, so it does not seem to be your method. 250MB is not overly huge. I would clip a small piece of each raster and test if it works.
dat<-matrix(rnorm(16), 4, 4)
r1<-raster(dat)
r2<-r1*2
r3<-r2+1
r4<-r3+4
rStack <- stack(r1,r2,r3,r4)
result <- calc(rStack, fun = function(x) {quantile(x,probs = .98)} )
Perhaps this is related to the odd way you create a RasterStack. You should simply do:
filenames <- list.files(pattern="\\.nc$")
rasterStack <- stack(filenames)
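With that, the whole computation collapses to the following (same quantile call as in the question):
result <- calc(stack(list.files(pattern = "\\.nc$")),
               fun = function(x) quantile(x, probs = .98, na.rm = TRUE))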
