Managed to solve the problem now
I have a set of around 50 thousand points that have coordinates and one value associated with them. I would like to place the points into a grid, averaging the associated value of all points that fall into each grid square. So I want to end up with an object that identifies each grid square and gives the average inside it.
I have the data in a spatial points data frame and a spatial grid object if that helps.
Improving the question: I have definitely done some searching; sorry about the initial state of the question. I had only framed the problem in my own head and hadn't had to communicate it to anyone else before...
Here is example data that hopefully illustrates the problem more clearly
##make some data
library(sp)
longi <- runif(100, 0, 10)
lati <- runif(100, 0, 10)
value <- runif(100, 20, 30)
##put in data frame then change to spatial data frame
df <- data.frame("lon"=longi,"lat"=lati,"val"=value)
coordinates(df) <- c("lon","lat")
proj4string(df) <- CRS("+proj=longlat")
##create a grid that bounds the data
grd <- GridTopology(cellcentre.offset=bbox(df)[,1],
                    cellsize=c(1,1), cells.dim=c(11,11))
sg <- SpatialGrid(grd)
Then I hope to get an object, be it a vector, data frame, or list, that gives me the average of value in each grid cell/square and some way of identifying which cell it is.
Solution
##convert the grid into a polygon##
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
##can now use the function over to select the correct points and average them
results <- rep(0, length(polys))
for (i in 1:length(polys)) {
  results[i] <- mean(df$val[which(!is.na(over(x=df, y=polys[i])))])
}
My question now is whether this is the best way to do it, or is there a more efficient way?
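For comparison, a vectorized sketch (not from the original post) that avoids the explicit loop: a single over() call returns, for each point, the index of the polygon it falls in, and tapply() then averages per index.
## sketch: one over() call instead of a per-polygon loop (uses the df and polys objects above)
idx <- over(df, polys)                   # polygon index for each point, NA if outside
cell_means <- tapply(df$val, idx, mean)  # named by polygon index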
Your description is vague at best. Please try to ask a more specific question, preferably with code illustrating what you have already tried. Averaging a single value in your point data or a single raster cell makes absolutely no sense.
The best guess at an answer I can provide is to use raster extract() to assign the raster values to an sp point object and then use tapply() to aggregate the values by your grouping variable in the points. You can use the coordinates of the points to identify cell location or, alternatively, the cell numbers returned by extract (as in the example below).
require(raster)
require(sp)
# Create example data
r <- raster(ncol=500, nrow=500)
r[] <- runif(ncell(r))
pts <- sampleRandom(r, 100, sp=TRUE)
# Add a grouping value to points
pts@data <- data.frame(ID=rownames(pts@data), group=c(rep(1,25), rep(2,25),
                                                      rep(3,25), rep(4,25)))
# Extract raster values and add to the @data slot data frame. Note, the "cells"
# attribute indicates the cell index in the raster.
pts@data <- data.frame(pts@data, extract(r, pts, cellnumbers=TRUE))
head(pts@data)
# Use tapply to calculate group means
tapply(pts@data$layer, pts@data$group, FUN=mean)
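Going back to the original per-cell question, a sketch (not part of this answer) using rasterize() from the raster package, which averages the point values falling in each cell of a target raster; it assumes the df object with the val column from the question.
# sketch: mean of point values per raster cell via rasterize(..., fun=mean)
cell_means_ras <- rasterize(df, raster(extent(df), nrows=11, ncols=11),
                            field="val", fun=mean)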
Related
I want to create a weight matrix based on distance. My code for the moment looks as follows and works for a smaller sample of the data. However, with the large dataset (569424 individuals in 24077 locations) it doesn't go through. The problem arises at the nb2blocknb function. So my question would be: how can I optimize my code for large datasets?
library(spdep)   # provides dnearneigh(), nb2blocknb(), nb2listw()
# load all survey data
DHS <- read.csv("Daten/final.csv")
attach(DHS)
# define coordinates matrix
coormat <- cbind(DHS$location, DHS$lon_s, DHS$lat_s)
coorm <- cbind(DHS$lon_s, DHS$lat_s)
colnames(coormat) <- c("location", "lon_s", "lat_s")
coo <- cbind(unique(coormat))
c <- as.data.frame(coo)
coor <- cbind(c$lon_s, c$lat_s)
# get a list of neighbouring locations that are within 50 km of each other
neighbor <- dnearneigh(coor, d1 = 0, d2 = 50, row.names=c$location, longlat=TRUE, bound=c("GE", "LE"))
# get neighborhood list on individual level
nb <- nb2blocknb(neighbor, as.character(DHS$location))
# weight matrix in list format
nbweights.lw <- nb2listw(nb, style="B", zero.policy=TRUE)
Thanks a lot for your help!
You're trying to make about 1.3e10 distance calculations; the result would be gigabytes in size.
I think you'd want to limit either the maximum distance or the number of nearest neighbors you're looking for. Try nn2 from the RANN package:
library('RANN')
nearest_neighbours_w_distance <- nn2(coordinatesA, coordinatesB, 10)
Note that this operation is not symmetric (switching coordinatesA and coordinatesB gives different results).
Also, you would first have to convert your GPS coordinates to a coordinate reference system in which you can calculate Euclidean distances, for example UTM (code not tested):
library("sp")
gps2utm<-function(gps_coordinates_matrix,utmzone){
spdf<-SpatialPointsDataFrame(gps_coordinates_matrix[,1],gps_coordinates_matrix[,2])
proj4string(spdf) <- CRS("+proj=longlat +datum=WGS84")
return(spTransform(spdf, CRS(paste0("+proj=utm +zone=",utmzone," ellps=WGS84"))))
}
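A hypothetical usage sketch (the UTM zone and the random test coordinates are placeholders, not from the question): project the points, then ask nn2 for the 10 nearest neighbours of every point.
## hypothetical usage sketch: placeholder zone and coordinates
library(RANN)
set.seed(42)
gps <- cbind(lon = runif(1000, 34, 35), lat = runif(1000, -1, 0))
utm_pts <- gps2utm(gps, 36)
nn <- nn2(coordinates(utm_pts), k = 10)   # nn$nn.idx = neighbour indices, nn$nn.dists in metres
## note: the first neighbour of each point is the point itself (distance 0)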
I am trying to do universal cokriging in R with the gstat package. I have a script that I was helped with, but now I'm stuck and can't ask for assistance from the original source.
The problem is that I can't change the output resolution of the cokriged data. I would like to import the interpolated map into ArcMap, and point-to-raster leaves me with a very low resolution.
My script is as follows:
library(raster)
library(gstat)
library(sp)
library(rgdal)
library(FitAR)
Loading my dataset, which contains coordinates and sampled values:
kova<-read.table("katvus_point_modif3.txt",sep=" ",header=T)
coordinates(kova)=~POINT_X+POINT_Y
Loading depth values at the same coordinates as above; this is my covariate:
Sygavus<-read.table("sygavus_point_cokrig.txt",sep=" ",header=T)
coordinates(Sygavus)=~POINT_X+POINT_Y
overlay <- over(kova,Sygavus)
kova$Sygavus <- overlay$Sygavus
This is supposed to set the boundary for my interpolation, the file is an exported shapefile from ArcMap:
border <- shapefile("area_2014.shp")
projection(kova)=projection(border)
This is supposed to create a grid for cokriging, and res= should let me specify what resolution I want the output to be, but no matter what number I use, the output does not change.
grid <- spsample(border,type="regular",res=25)
I remove overlapping points:
zero <- zerodist(kova)
kova <- kova[-zero[,2],]
I load in the depth covariate raster file. This is a depth raster exported from ArcMap in ASCII form:
depth <- raster("htp_depth_covar.asc")
projection(depth)=projection(border)
overlay <- extract(depth,kova)
kova$depth <- overlay
I remove NA values from the overlaid depth values (these values should be the same as the previously loaded depth covariate table at the respective coordinates, but if I leave that part out, the script stops working):
kova <- kova[!is.na(kova$depth),]
kova.gstat <- gstat(id="Kova",formula=kova~depth,data=kova)
kova.gstat <- gstat(kova.gstat,id="Sygavus",formula=Sygavus~depth,data=kova)
var.kova <- variogram(kova.gstat)
plot(var.kova)
kova.gstat <- gstat(kova.gstat,id=c("Kova","Sygavus"),model=vgm(psill=cov(kova$kova,kova$Sygavus),model="Mat",range=12000,nugget=0))
kova.gstat <- fit.lmc(var.kova,kova.gstat,model=vgm(psill=cov(kova$kova,kova$Sygavus),model="Mat",range=12000,nugget=0))
plot(var.kova,kova.gstat$model)
overlay <- extract(depth,grid)
grid <- as.data.frame(grid)
grid$depth <- overlay
coordinates(grid)=~x1+x2
projection(grid)=projection(border)
krige <- predict.gstat(kova.gstat,grid)
spplot(krige,c("Kova.pred"))
write.table(krige, "kova.raster1.ck.csv", sep=";", dec=",", row.names=F)
Any help in understanding the gstat cokriging and the script overall would be greatly appreciated!
Because you don't provide a reproducible example I can only guess, but I think spsample ignores the res=25 argument. Try n=1000 instead and then increase that value to get a higher resolution.
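Alternatively, spsample accepts a cellsize argument, which maps more directly to a target resolution; a sketch, assuming border is the study-area polygon from the question and 25 is in map units:
## sketch: request a regular grid with an explicit cell size instead of a point count
grid <- spsample(border, type = "regular", cellsize = c(25, 25))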
I've been running into all sorts of issues using ArcGIS ZonalStats and thought R could be a great way to do this. That said, I'm fairly new to R, but I have a coding background.
The situation is that I have several rasters and a polygon shape file with many features of different sizes (though all features are bigger than a raster cell and the polygon features are aligned to the raster).
I've figured out how to get the mean value for each polygon feature using the raster library with extract:
#load packages required
require(rgdal)
require(sp)
require(raster)
require(maptools)
# ---Set the working directory-------
datdir <- "/test_data/"
#Read in a ESRI grid of water depth
ras <- readGDAL("test_data/raster/pl_sm_rp1000/w001001.adf")
#convert it to a format recognizable by the raster package
ras <- raster(ras)
#read in polygon shape file
proxNA <- readShapePoly("test_data/proxy/PL_proxy_WD_NA_test")
#plot raster and shp
plot(ras)
plot(proxNA)
#calc mean depth per polygon feature
#unweighted - only assigns a cell to a feature if the cell centroid falls in that feature
proxNA@data$RP1000 <- extract(ras, proxNA, fun = mean, na.rm = TRUE, weights = FALSE)
#check results
head(proxNA)
#plot depth values
spplot(proxNA[,'RP1000'])
The issue I have is that I also need an area-based ratio between the area of the polygon and all non-NA cells in the same polygon. I know what the cell size of the raster is and I can get the area for each polygon, but the missing link is the count of all non-NA cells in each feature. I managed to get the cell numbers of all the cells in each polygon with proxNA@data$Cnumb1000 <- cellFromPolygon(ras, proxNA), and I'm sure there is a way to get the actual value of each raster cell, which would then require a loop to count the non-NA cells, etc.
BUT, I'm sure there is a much better and quicker way to do that! If any of you has an idea or can point me in the right direction, I would be very grateful!
I do not have access to your files, but based on what you described, this should work:
library(raster)
mask_layer <- shapefile(paste0(shapedir, "AOI.shp"))
original_raster <- raster(paste0(template_raster_dir, "temp_raster_DecDeg250.tif"))
nonNA_raster <- !is.na(original_raster)
masked_img <- mask(nonNA_raster, mask_layer)  # based on centroid location of cells
nonNA_count <- cellStats(masked_img, sum)
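For a non-NA count per polygon feature rather than for a single AOI, one option (a sketch, assuming the ras and proxNA objects from the question) is to let extract() return the raw cell values per feature and count the non-NA ones:
# sketch: non-NA cell count and area ratio per polygon feature
vals_per_poly <- extract(ras, proxNA)   # list with one vector of cell values per feature
proxNA@data$nonNA_cells <- sapply(vals_per_poly, function(v) sum(!is.na(v)))
# ratio of non-NA raster area to feature area (assumes a projected CRS so units match)
proxNA@data$ratio <- (proxNA@data$nonNA_cells * prod(res(ras))) / area(proxNA)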
I have a SpatialPointsDataFrame spo (covering an irregularly shaped area of interest). The data are not on a regular grid due to a CRS transformation.
My goal is a raster with predefined resolution and extent of the area of interest ( more spatial point data are to be mapped on this master raster) .
Problems start when I
rasterize(spo, raster(ncol, nrow, extent, crs), spo$param)
I need to adjust nrow and ncol so that I won't get moiré patterns of NAs within my area of interest. I can't use a predefined (higher) resolution, since rasterize has no interpolation capabilities.
As a solution, I thought I might need some kind of SpatialPixelsDataFrame spi that covers my whole area of interest (just like meuse.grid in library(sp); data(meuse.grid)) and serves as a master grid. Then I can use it to interpolate my data, e.g.
idw(param~1,spo,spi)
and by this, get full cover of my area of interest at my chosen resolution.
But how can a SpatialPixelsDataFrame be produced from the point data?
So in my view, the question boils down to: How to produce meuse.grid from meuse dataset?
Maybe I'm taking the wrong approach here, so please let me know if what I'm after can be achieved more easily in a different way.
If you have a polygon that defines the boundary of your region of interest (which you should), then it is straightforward. One approach is to use the polygrid function from geoR, which itself is just a wrapper for SpatialPoints, expand.grid and overlay.
Let's assume that you have a polygon that defines your region of interest, called ROI.
In this case I will create one from meuse.grid:
library(sp)
library(geoR)   # for polygrid()
data(meuse.grid)
coordinates(meuse.grid) = ~x+y
x <- chull(meuse.grid@coords)
borders <- meuse.grid@coords[c(x, x[1]), ]
ROI <- SpatialPolygons(list(Polygons(list(Polygon(borders)), ID = 'border')))
In reality, to use polygrid you only need the coordinates of the polygon that define your region of interest.
To create a 10-m grid covering the area of this ROI, you can build a call to polygrid:
# get the bounding box for ROI and convert to a list
bboxROI <- apply(bbox(ROI), 1, as.list)
# create a sequence from min(x) to max(x) in each dimension
seqs <- lapply(bboxROI, function(x) seq(x$min, x$max, by = 10))
# rename to xgrid and ygrid
names(seqs) <- c('xgrid', 'ygrid')
thegrid <- do.call(polygrid, c(seqs, borders = list(ROI@polygons[[1]]@Polygons[[1]]@coords)))
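To connect this back to the original question, the polygrid result can be promoted to a gridded sp object and used as the master grid; a sketch, assuming the first two columns of thegrid hold the x and y coordinates:
# sketch: turn the polygrid output into SpatialPixels for use as 'newdata' in idw()/krige()
spi <- SpatialPixels(SpatialPoints(as.matrix(thegrid[, 1:2])))
# e.g. with hypothetical point data 'spo' carrying a column 'param':
# out <- idw(param ~ 1, spo, spi)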
I am fairly new to R, but not to ArcView. I am plotting some two-mode data, and want to convert the plot to a shapefile. Specifically, I would like to convert the vertices and the edges, if possible, so that I can get the same plot to display in ArcView, along with the attributes.
I've installed the package "shapefiles", and I see the convert.to.shapefile command, but the help doesn't talk about how to assign XY coords to the vertices.
Thank you,
Tim
Ok, I'm making a couple of assumptions here, but I read the question as you're looking to assign spatial coordinates to a bipartite graph and export both the vertices and edges as point shapefiles and polylines for use in ArcGIS.
This solution is a little kludgey, but will make shapefiles with coordinate limits xmin, ymin and xmax, ymax of -0.5 and +0.5. It will be up to you to decide on the graph layout algorithm (e.g. Kamada-Kawai), and to project the shapefiles into the desired coordinate system once they are in ArcGIS, as per @gsk3's suggestion. Additional attributes for the vertices and edges can be added where the points.data and edge.data data frames are created.
library(igraph)
library(shapefiles)
# Create dummy incidence matrix
inc <- matrix(sample(0:1, 15, repl=TRUE), 3, 5)
colnames(inc) <- c(1:5) # Person ID
rownames(inc) <- letters[1:3] # Event
# Create bipartite graph
g.bipartite <- graph.incidence(inc, mode="in", add.names=TRUE)
# Plot figure to get xy coordinates for vertices
tk <- tkplot(g.bipartite, canvas.width=500, canvas.height=500)
tkcoords <- tkplot.getcoords(tk, norm=TRUE) # Get coordinates of nodes centered on 0 with +/-0.5 for max and min values
# Create point shapefile for nodes
n.points <- nrow(tkcoords)
points.attr <- data.frame(Id=1:n.points, X=tkcoords[,1], Y=tkcoords[,2])
points.data <- data.frame(Id=points.attr$Id, Name=paste("Vertex", 1:n.points, sep=""))
points.shp <- convert.to.shapefile(points.attr, points.data, "Id", 1)
write.shapefile(points.shp, "~/Desktop/points", arcgis=TRUE)
# Create polylines for edges in this example from incidence matrix
n.edges <- sum(inc) # number of edges based on incidence matrix
Id <- rep(1:n.edges,each=2) # Generate Id number for edges.
From.nodes <- g.bipartite[[4]]+1 # Get position of "From" vertices in incidence matrix
To.nodes <- g.bipartite[[3]]-max(From.nodes)+1 # Get position of "To" vertices in incidence matrix
# Generate index where position alternates between "From.node" to "To.node"
node.index <- matrix(t(matrix(c(From.nodes, To.nodes), ncol=2)))
edge.attr <- data.frame(Id, X=tkcoords[node.index, 1], Y=tkcoords[node.index, 2])
edge.data <- data.frame(Id=1:n.edges, Name=paste("Edge", 1:n.edges, sep=""))
edge.shp <- convert.to.shapefile(edge.attr, edge.data, "Id", 3)
write.shapefile(edge.shp, "~/Desktop/edges", arcgis=TRUE)
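If you want a quick sanity check before opening the files in ArcGIS, the shapefiles package can read them back in; a short sketch using the paths written above:
# optional check: read the written shapefiles back into R
chk_points <- read.shapefile("~/Desktop/points")
chk_edges <- read.shapefile("~/Desktop/edges")
str(chk_points$dbf)   # attribute table for the vertex layer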
Hope this helps.
I'm going to take a stab at this based on a wild guess as to what your data looks like.
Basically you'll want to coerce the data into a data.frame with two columns containing the x and y coordinates (or lat/long, or whatever).
library(sp)
data(meuse.grid)
class(meuse.grid)
coordinates(meuse.grid) <- ~x+y
class(meuse.grid)
Once you have it as a SpatialPointsDataFrame, sp and maptools provide some decent functionality, including exporting shapefiles (writePointsShape is in maptools):
library(maptools)
writePointsShape(meuse.grid, "/home/myfiles/wherever/myshape.shp")
Relevant help files the examples are drawn from:
coordinates
SpatialPointsDataFrame
readShapePoints
At least a few years ago when I last used sp, it was great about projection and very bad about writing projection information to the shapefile. So it's best to leave the coordinates untransformed and manually tell Arc what projection it is. Or use writeOGR rather than writePointsShape.
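A minimal writeOGR sketch, assuming rgdal is installed and using the meuse.grid object from above (the output folder is a placeholder; a .prj file is only written if a projection is set on the object):
# sketch: export with writeOGR so projection information, when present, is kept
library(rgdal)
writeOGR(meuse.grid, dsn = "/home/myfiles/wherever", layer = "myshape",
         driver = "ESRI Shapefile")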