I have been struggling with this for hours.
I have a shapefile (called "shp") containing 177 polygons, i.e. 177 counties. This shapefile is overlaid on a raster (called "ras") whose pixels have different pollution values.
Now I would like to extract all pixel values and their number of occurrences for each polygon.
This is exactly what the QGIS "Zonal histogram" tool does, but I would like to do the same thing in R.
I tried the extract() function and managed to get a mean value per county, which is already a first step, but I would like the distribution of pixel values (a histogram) for each county.
Could someone give me a hand?
Many thanks,
Marie-Laure
Thanks a lot for your help. Next time I promise I will be careful and explain my issue in more detail.
With your help I managed to find a solution.
I also used this website: http://zevross.com/blog/2015/03/30/map-and-analyze-raster-data-in-r/
Note: first I had to uninstall the "tidyr" package, because its extract() function conflicts with the extract() function from raster.
In case it can help someone, here is the final code:
# Libraries loading
# (raster attaches sp; rgdal is retired and current raster versions no longer need it)
library(raster)
library(sp)
# raster layer import
ras <- raster("C:/*.tif")
# shapefile layer import
shp <- shapefile("C:/*.shp")
# Extract the raster pixel values for each county
# (raster:: avoids the clash with tidyr::extract if tidyr is loaded)
ext <- raster::extract(ras, shp, method = 'simple')
# Function to tabulate pixel values by region & return a data frame
tabFunc <- function(indx, extracted, region, regname) {
  # count how many times each pixel value occurs in polygon 'indx'
  dat <- as.data.frame(table(extracted[[indx]]))
  # label the counts with that polygon's identifier
  dat$name <- region[[regname]][[indx]]
  return(dat)
}
# run through each county & compute a table of the number
# of raster cells by pixel value. ("CODE" is the county code)
tabs <- lapply(seq_along(ext), tabFunc, ext, shp, "CODE")
# assemble into one data frame
df <- do.call(rbind, tabs)
# to see the data frame in R
print(df)
# table export
write.csv(df, "C:/*.csv", row.names = FALSE)
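If a wide table like the QGIS zonal histogram output is preferred (one row per county, one column per pixel value, cells holding pixel counts), the extracted values can also be cross-tabulated in one step. This is a minimal base-R sketch: the list `ext` below is a small hypothetical stand-in for what extract() returns, and the codes "A01" to "A03" are made-up county codes, so the snippet runs on its own.

```r
# Hypothetical stand-in for the list returned by raster::extract(ras, shp):
# one vector of pixel values per polygon (county), possibly containing NA.
ext <- list(c(1, 1, 2, 3, NA), c(2, 2, 2), c(3, 1))
codes <- c("A01", "A02", "A03")  # hypothetical values of shp$CODE

# Long format: one entry per pixel, labelled with its county code
zone  <- rep(codes, lengths(ext))
value <- unlist(ext)

# Cross-tabulate: rows = counties, columns = pixel values, cells = pixel counts.
# table() drops NA pixel values by default.
zh <- table(zone, value)

# Convert to a plain wide data frame, keeping the county code as a column
zh_wide <- as.data.frame.matrix(zh)
zh_wide$CODE <- rownames(zh_wide)
print(zh_wide)
```

With the real data, `codes` would be `shp$CODE` and `ext` the list from extract(); values that never occur in a county simply get a count of 0 in its row.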
Here is a minimal, self-contained, reproducible example (taken almost literally from ?raster::extract, so not difficult to make):
library(raster)
r <- raster(ncol=36, nrow=18, vals=rep(1:9, 72))
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)
Now you can do:
v <- extract(r, polys)
par(mfrow=c(1,2))
z <- lapply(v, hist)
Or, a bit fancier:
mains <- c("first", "second")
par(mfrow=c(1,2))
z <- lapply(seq_along(v), function(i) hist(v[[i]], main=mains[i]))
Or, if you want a barplot:
z <- lapply(seq_along(v), function(i) barplot(table(v[[i]]), main=mains[i]))
My problem is simple. I have found a very good package called adehabitat in R. To use it, I need to transform my data into a specifically structured object containing raster map data and the coordinates of an animal. To see it, please type:
# example data in the adehabitat package
data(bauges)
bauges
str(bauges)
How do I convert my data (below) into such a structure? I figured out how to convert $locs into SpatialPoints, but I don't know how to convert the map (in my example, the raster values are categorical codes for individual habitat types, i.e. not a continuous variable).
# My example data:
library(raster)
library(adehabitatHS)
# map
habitat_type_temp <- matrix(c(1,1,1,1,1,1,1,1,2,2,
1,1,2,2,1,1,1,2,2,2,
1,2,2,2,3,3,3,2,2,2,
2,2,2,1,1,1,3,2,2,1,
2,2,1,1,1,1,3,2,1,1,
2,1,1,1,1,1,3,3,1,1,
2,1,1,1,1,3,3,3,3,1,
1,1,1,1,1,1,1,3,3,3), 10)
habitat_type <- t(habitat_type_temp)
# coordinates
animal_coords <- data.frame(x = c(2,4,5,5,6,9),
y = c(2,8,3,2,4,3))
# see the situation
plot(raster(habitat_type, xmn=1, xmx=10, ymn=1, ymx=8))
points(animal_coords$x, animal_coords$y)
# creating object which could be manipulated in adehabitat package
my.hab <- list()
my.hab$map <- SpatialPixelsDataFrame(...)
my.hab$locs <- SpatialPoints(animal_coords)
Is it even possible to insert such manually fabricated data into this specific type of object, or do I need an original TIFF with a specific CRS?
You could simply place the grid at an arbitrary location to produce the SpatialPixelsDataFrame; the coordinates below are roughly in Iowa:
library(sp)
# 8 x-positions by 10 y-positions, 0.01 degree apart, near 93W 42N
# (west longitudes are negative)
x <- -93 + rep(1:8, each=10)/100
y <- rep(seq(42.01,42.1,by=0.01), 8)
z <- c(1,1,1,1,1,1,1,1,2,2,
1,1,2,2,1,1,1,2,2,2,
1,2,2,2,3,3,3,2,2,2,
2,2,2,1,1,1,3,2,2,1,
2,2,1,1,1,1,3,2,1,1,
2,1,1,1,1,1,3,3,1,1,
2,1,1,1,1,3,3,3,3,1,
1,1,1,1,1,1,1,3,3,3)
xy.df <- data.frame(x,y)
# geographic (longitude/latitude) CRS, matching the coordinates above
llCRS <- CRS("+proj=longlat +datum=WGS84")
xy.coords <- SpatialPixels(SpatialPoints(xy.df, proj4string = llCRS))
xyz <- data.frame(x, y, z)
xyz.spdf <- SpatialPixelsDataFrame(xy.coords, xyz)
plot(xyz.spdf)
Your SpatialPoints would have to be shifted into the same coordinates in the same way.
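On the question of whether a CRS or an original TIFF is needed, here is a sketch suggesting that neither is strictly required, as long as the map and the locations share one coordinate frame. It rests on my own assumption (not part of the answer above): unit-sized cells centred on integer coordinates, with x = column number and y running from 8 for the top row down to 1 for the bottom row, so the question's animal_coords can be used unchanged. No CRS is set on either object. I have not checked whether adehabitatHS functions expect the habitat codes as a factor or as numbers; factor() is used here only because the codes are categorical.

```r
library(sp)

# The question's habitat map: 8 rows x 10 columns of categorical codes
habitat_type <- t(matrix(c(1,1,1,1,1,1,1,1,2,2,
                           1,1,2,2,1,1,1,2,2,2,
                           1,2,2,2,3,3,3,2,2,2,
                           2,2,2,1,1,1,3,2,2,1,
                           2,2,1,1,1,1,3,2,1,1,
                           2,1,1,1,1,1,3,3,1,1,
                           2,1,1,1,1,3,3,3,3,1,
                           1,1,1,1,1,1,1,3,3,3), 10))
animal_coords <- data.frame(x = c(2,4,5,5,6,9), y = c(2,8,3,2,4,3))

# Assumed frame: cell centres at integer coordinates, x = column (1..10),
# y = 8 for the top row down to 1. expand.grid() varies x fastest, which
# matches reading the matrix row by row from the top.
xy  <- expand.grid(x = as.numeric(1:10), y = as.numeric(8:1))
hab <- data.frame(habitat = factor(as.vector(t(habitat_type))))

my.hab <- list()
my.hab$map  <- SpatialPixelsDataFrame(points = xy, data = hab)  # CRS left as NA
my.hab$locs <- SpatialPoints(animal_coords)                     # CRS left as NA

# Habitat type under each animal location
over(my.hab$locs, my.hab$map)
```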
Managed to solve the problem now.
I have a set of around 50 thousand points that have coordinates and one value associated with them. I would like to place the points into a grid and average the associated values of all points that fall into each grid square. So I want to end up with an object that identifies each grid square and gives the average inside it.
I have the data in a SpatialPointsDataFrame and a SpatialGrid object, if that helps.
Edit to improve the question: I have definitely done some searching. Sorry about the initial state of the question; I had only framed it inside my own head and hadn't had to communicate it to anyone else before...
Here is example data that hopefully illustrates the problem more clearly
##make some data
library(sp)
longi <- runif(100,0,10)
lati <- runif(100,0,10)
value <- runif(100,20,30)  # one value per point
##put in data frame then change to spatial data frame
df <- data.frame("lon"=longi,"lat"=lati,"val"=value)
coordinates(df) <- c("lon","lat")
proj4string(df) <- CRS("+proj=longlat")
##create a grid that bounds the data
grd <- GridTopology(cellcentre.offset=bbox(df)[,1],
cellsize=c(1,1),cells.dim=c(11,11))
sg <- SpatialGrid(grd)
Then I hope to get an object (a vector, data frame, or list) that gives me the average value in each grid cell, together with some way of identifying which cell it is.
Solution
##convert the grid into polygons##
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
##can now use the function over to select the points in each cell and average them
##(cells without any points get NaN)
results <- rep(0, length(polys))
for(i in seq_len(length(polys))) {
  results[i] <- mean(df$val[which(!is.na(over(x=df,y=polys[i])))])
}
My question now is whether this is the best way to do it, or whether there is a more efficient way.
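On efficiency: the loop above calls over() once per grid cell. A sketch of a vectorized alternative, assuming the same kind of data as in the question: over() with a SpatialGrid returns, for every point, the index of the grid cell containing it, and tapply() then averages per cell in a single pass. The CRS is left unset on both objects here because over() requires identical CRSs; with the question's longlat df, the grid would need the same proj4string (for example via the proj4string argument of SpatialGrid()). set.seed() only makes the example reproducible.

```r
library(sp)

# Example data as in the question (one value per point)
set.seed(42)
df <- data.frame(lon = runif(100, 0, 10), lat = runif(100, 0, 10),
                 val = runif(100, 20, 30))
coordinates(df) <- c("lon", "lat")

# 11 x 11 grid of 1 x 1 cells whose first cell centre is at the data minimum
grd <- GridTopology(cellcentre.offset = bbox(df)[, 1],
                    cellsize = c(1, 1), cells.dim = c(11, 11))
sg <- SpatialGrid(grd)

# Index of the grid cell containing each point (NA if outside the grid)
cell <- over(df, sg)

# Mean value per occupied cell, with the cell index and cell-centre coordinates
cell_means <- tapply(df$val, cell, mean)
res <- data.frame(cell = as.integer(names(cell_means)),
                  mean_val = as.vector(cell_means))
res <- cbind(res, coordinates(sg)[res$cell, , drop = FALSE])
head(res)
```

Only cells that contain at least one point appear in `res`; empty cells can be added back by matching against `seq_len(prod(grd@cells.dim))` if needed.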
Your description is vague at best. Please try to ask more specific questions, preferably with code illustrating what you have already tried. Averaging a single value in your point data or a single raster cell makes absolutely no sense.
The best guess at an answer I can provide is to use raster's extract() to assign the raster values to an sp point object and then use tapply() to aggregate the values by the grouping values in the points. You can use the coordinates of the points to identify the cell location or, alternatively, the cell numbers returned by extract() (as in the example below).
library(raster)
library(sp)
# Create example data
r <- raster(ncol=500, nrow=500)
r[] <- runif(ncell(r))
pts <- sampleRandom(r, 100, sp=TRUE)
# Add a grouping value to points
pts@data <- data.frame(ID=rownames(pts@data), group=c( rep(1,25),rep(2,25),
rep(3,25),rep(4,25)) )
# Extract raster values and add to the @data slot data frame. Note, the "cells"
# column indicates the cell index in the raster.
pts@data <- data.frame(pts@data, extract(r, pts, cellnumbers=TRUE))
head(pts@data)
# Use tapply to calculate group means
tapply(pts@data$layer, pts@data$group, FUN=mean)
I have two 30m x 30m raster files from which I would like to sample points. Prior to sampling, I would like to remove the clouded areas from the images. I turned to R and Robert Hijmans' raster package for the task.
Using the drawPoly(sp=TRUE) command, I drew 18 different polygons. The function did not seem to allow 18 polygons as one sp object, so I drew them all separately. I then gave the polygons a proj4string matching the rasters' and put them into a list. I ran the list through lapply() to convert them to rasters (with the rasterize function in Hijmans' package), with the polygon areas set to NA and the rest of the image set to 1.
My end goal is one raster layer with the 18 areas set to NA. I have tried stacking the list of rasterized polygons and subsetting the stack to set a new raster to NA in the same areas. My reproducible code is below.
library(raster)
r1 <- raster(nrow=50, ncol = 50)
r1[] <- 1
r1[4:10,] <- NA
r2 <- raster(nrow=50, ncol = 50)
r2[] <- 1
r2[9:15,] <- NA
r3 <- raster(nrow=50, ncol = 50)
r3[] <- 1
r3[24:39,] <- NA
r4 <- raster(nrow=50, ncol = 50)
r4[] <- 1
s <- stack(r1, r2, r3)
test.a.cool <- calc(s, function(x){r4[is.na(x)==1] <- NA})
For whatever reason, the darn test.a.cool is a blank plot, whereas I'm aiming for a raster in which every cell equals 1 except for the cells that are NA in the stack s.
Any tips?
Thanks.
Doing sum(s) will work, as sum() returns NA for any grid cell with even one NA value in the stack.
To see that it works, compare the figures produced by the following:
plot(s)
plot(sum(s))
I posted this question on the R-sig-geo mailing list as well, and received a response from the package author. The two simplest solutions:
Use the sp package to rbind my polygons into one SpatialPolygons object, then rasterize it.
p <- rbind(p1, p2, p3, makeUniqueIDs = TRUE)  # ...and so on for all 18 polygons
r4 <- raster(nrow=50, ncol = 50)
r4[] <- 1
mask <- rasterize(p, r4)
mask[mask %in% 1:18] <- 1
#The above code produces a single raster file with
#my polygons as unique values, ready for masking.
And the second simple solution, as just pointed out by Josh O'Brien:
m <- sum(s)
test <- mask(r4, m)
The R community rocks. Problem solved (twice) within an hour. Thanks.
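To actually blank out the clouds in an image with the rasterized polygons, mask() with inverse = TRUE sets every cell that is not NA in the mask (i.e. every cell covered by a polygon) to NA. A minimal sketch on toy data: `img` is a hypothetical stand-in for one of the images, and the two rectangles are hypothetical stand-ins for polygons drawn with drawPoly(sp = TRUE).

```r
library(raster)

# Toy stand-in for one image: a 50 x 50 global raster with all cells = 1
img <- raster(nrow = 50, ncol = 50)
img[] <- 1

# Two hypothetical "cloud" polygons, combined into one SpatialPolygons object
p1 <- spPolygons(rbind(c(-100, 0), c(-50, 0), c(-50, 40), c(-100, 40), c(-100, 0)))
p2 <- spPolygons(rbind(c(20, -60), c(80, -60), c(80, -20), c(20, -20), c(20, -60)))
p <- rbind(p1, p2, makeUniqueIDs = TRUE)
crs(p) <- crs(img)

# Cells covered by a polygon get that polygon's index; all other cells are NA
clouds <- rasterize(p, img)

# inverse = TRUE: cells that are NOT NA in 'clouds' become NA, the rest keep 1
cloud_free <- mask(img, clouds, inverse = TRUE)
# plot(cloud_free) to inspect the result
```

mask() also accepts the SpatialPolygons object directly (`mask(img, p, inverse = TRUE)`), which skips the explicit rasterize() step.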
I'm not familiar with the package you are using; however, looking at the final line of your code, I think the issue might be here:
function(x){r4[is.na(x)==1] <- NA})
It doesn't look like calc() will do much with that. The function sets the values of r4 indexed by the NAs of x to NA, but what it returns is the value of that assignment (NA), not r4.
What then? If anything, maybe:
function(x){r4[is.na(x)==1] <- NA; return(r4) })
Although, it's not clear if that is even what you are after.
You were on the right track. The [ operator is defined for rasters and raster stacks, so you could just use the single line:
r4[ any(is.na(s) ) ] <- NA
plot(r4)
If you wanted to use calc you could have used it like this:
# 1 where no layer is NA, 0 where at least one layer is NA
r4 <- calc( s, function(x){ ( ! any( is.na(x) ) ) } )
# turn the 0s into NA
r4[r4 == 0] <- NA
plot(r4)
I was wondering if there is a function in R that scales down matrices exactly like image resizing does. The MATLAB function imresize() is exactly what I'm looking for (I believe it takes the average of the surrounding points, but I am not sure of this), and I am wondering if there is an R equivalent.
This question has been posted before on this forum, but with reference to MATLAB, not R:
Matlab "Scale Down" a Vector with Averages
The post starting with "Any reason why you can't use the imresize() function?" is exactly what I am looking for, but in R, not MATLAB.
Say I have a latitude-longitude grid of temperatures around the world, represented by a 64 x 128 matrix. Now say I would like the same data in a new matrix, but rescaled to a 71 x 114 grid of temperatures around the world. A function that would allow me to do so is what I'm looking for (again, the imresize() function, but in R, not MATLAB).
Thank you.
Steve
One way to do this is by using the function resample(), from the raster package.
I'll first show how you could use it to rescale your grid, and then give an easier-to-inspect example of its application to smaller raster objects.
Use resample() to resize matrices
library(raster)
m <- matrix(seq_len(64*128), nrow=64, ncol=128, byrow=TRUE)
## Convert matrix to a raster with geographical coordinates
r <- raster(m)
extent(r) <- extent(c(-180, 180, -90, 90))
## Create a raster with the desired dimensions, and resample into it
s <- raster(nrow=71, ncol=114)
s <- resample(r,s)
## Convert resampled raster back to a matrix
m2 <- as.matrix(s)
Visually confirm that resample() does what you'd like:
library(raster)
## Original data (4x4)
rr <- raster(ncol=4, nrow=4)
rr[] <- 1:16
## Resize to 5x5
ss <- raster(ncol=5, nrow=5)
ss <- resample(rr, ss)
## Resize to 3x3
tt <- raster(ncol=3, nrow=3)
tt <- resample(rr, tt)
## Plot for comparison
par(mfcol=c(2,2))
plot(rr, main="original data")
plot(ss, main="resampled to 5-by-5")
plot(tt, main="resampled to 3-by-3")
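resample() interpolates between cell centres (bilinear by default). If what you want is the block averaging described in the linked MATLAB question, and the new size divides the old one evenly, raster's aggregate() with fun = mean averages non-overlapping blocks of cells. A minimal sketch on the same 4 x 4 example; note that it only handles integer shrink factors, so it would not cover a 64 x 128 to 71 x 114 change.

```r
library(raster)

# The 4 x 4 example from above: cells filled row by row with 1..16
rr <- raster(ncol = 4, nrow = 4)
rr[] <- 1:16

# Average non-overlapping 2 x 2 blocks -> 2 x 2 raster
agg <- aggregate(rr, fact = 2, fun = mean)

# Top row: 3.5, 5.5; bottom row: 11.5, 13.5
as.matrix(agg)
```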
The answer posted by Josh O'Brien is OK and it helped me as a starting point, but this approach was too slow since I had a huge list of data. The method below is a good alternative: it uses the fields package (bilinear interpolation via interp.surface()) and works much faster.
Functions
rescale <- function(x, newrange=range(x)){
xrange <- range(x)
mfac <- (newrange[2]-newrange[1])/(xrange[2]-xrange[1])
newrange[1]+(x-xrange[1])*mfac
}
ResizeMat <- function(mat, ndim=dim(mat)){
if(!require(fields)) stop("`fields` required.")
# input object
odim <- dim(mat)
obj <- list(x= 1:odim[1], y=1:odim[2], z= mat)
# output object
ans <- matrix(NA, nrow=ndim[1], ncol=ndim[2])
ndim <- dim(ans)
# rescaling
ncord <- as.matrix(expand.grid(seq_len(ndim[1]), seq_len(ndim[2])))
loc <- ncord
loc[,1] = rescale(ncord[,1], c(1,odim[1]))
loc[,2] = rescale(ncord[,2], c(1,odim[2]))
# interpolation
ans[ncord] <- interp.surface(obj, loc)
ans
}
Let's look at how it works:
## Original data (4x4)
rr <- matrix(1:16, ncol=4, nrow=4)
ss <- ResizeMat(rr, c(5,5))
tt <- ResizeMat(rr, c(3,3))
## Plot for comparison
par(mfcol=c(2,2), mar=c(1,1,2,1))
image(rr, main="original data", axes=FALSE)
image(ss, main="resampled to 5-by-5", axes=FALSE)
image(tt, main="resampled to 3-by-3", axes=FALSE)