Spatial polygon dataset + subset - r

I'm working with a spatial polygon dataframe.
data can be downloaded here:
http://geoportal.statistics.gov.uk/datasets/lower-layer-super-output-areas-december-2011-super-generalised-clipped-boundaries-in-england-and-wales
This contains the lower layer output area (lsoa) for England and Wales.
I need to subset the dataframe in order to keep only the polygons for the london lsoa11cd.
I have a list of lsoa11cd for London.
These are between E01000001 and E01004765. I'm not sure how to proceed to subset the spatial polygons (see image attached). Find below an attempt which does not work.
london <- shapefile[substr(shapefile@data$lsoa11cd, -7 , -1) <= 1004765, ]
london <- london[substr(london@data$lsoa11cd, -7 , -1) >= 1000001, ]

If I'm interpreting your question correctly, this should work nicely:
Use the shapefile function from the raster package to read-in the shapefile:
library(raster)
# Read-in the data. This will create a SpatialPolygonsDataFrame with 34,753 features
s <- shapefile('Lower_Layer_Super_Output_Areas_December_2011_Super_Generalised_Clipped__Boundaries_in_England_and_Wales.shp')
It looks like all of the lsoa11cd values have a letter and a number as the first two characters in the string. Let's first subset the data to keep only those with 'E' as the first character of their lsoa11cd value.
s <- s[grepl("^E", s$lsoa11cd), ]
Now we can remove the first two characters from each lsoa11cd string and convert to a numeric variable for easier subsetting as follows:
s$lsoa11cd <- as.numeric(substring(s$lsoa11cd, 3))
Then you can simply subset within the range you've specified:
s <- s[s$lsoa11cd %in% 1000001:1004765, ]
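The steps above overwrite the lsoa11cd column with numbers. A minimal sketch of the same range filter that leaves the original codes intact is shown below, using a plain character vector as a stand-in for s$lsoa11cd (the example codes are made up for illustration). The resulting logical vector could then be used as the row index, e.g. s[keep, ].

```r
# Hypothetical codes standing in for s$lsoa11cd
codes <- c("E01000001", "E01004765", "E01004766", "W01000001")

# Keep only English codes, then compare the numeric part after the first two characters
is_england <- startsWith(codes, "E")
code_num <- as.numeric(substring(codes, 3))
keep <- is_england & code_num >= 1000001 & code_num <= 1004765

codes[keep]
```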

Related

Assign column from an existing dataframe as an attribute column in spatial points vector

I'm trying to include a column from an existing dataframe in a spatial points vector as an attribute, but I'm having no success.
My data is something like this:
ID x y dsp section
136 592251.4 7775385 -0.0000000002806002 top
726 592319.1 7775182 -0.0000000002805585 top
130 592170.2 7775385 -0.0018586431397125 center
1074 592278.5 7775060 NA center
And I create the spatial points from this same data, using x and y info:
pontos <- vect(cbind(amostragem$x,amostragem$y))
crs(pontos) <- "epsg:32723"
pontos <- project(pontos,worldDEM)
It's totally functional but when I assign a column that is not a longitude/latitude (x,y) information (the "Section" column is my interest for classification) the vector loses its spatial points characteristics.
This led me to work outside R, since I had no time to lose and there were only a few points:
I exported the vector as a shapefile and went to QGIS ("attribute table"), manually adding the column to the shp and filling the points (rows) with the information that I wanted.
It works perfectly, so I read the edited shp back into R and applied extract using a raster and the edited points. Now I have the attribute as a column. As this is not the smartest approach for a big volume of points, I want to make this work in R too. Any thoughts?
Thanks for your help.
You could use the sf package to create a spatial data frame. The call to st_as_sf does this and converts the x and y coordinates to a geometry column which is retained when functions in the dplyr package are called.
library(sf)
library(dplyr)
amostragem <- read.table(text="ID x y dsp section
136 592251.4 7775385 -0.0000000002806002 top
726 592319.1 7775182 -0.0000000002805585 top
130 592170.2 7775385 -0.0018586431397125 center
1074 592278.5 7775060 NA center", header=T)
amostragem_sf <- amostragem %>% st_as_sf(coords = c('x', 'y'), crs=st_crs('epsg:32723'))
names(amostragem_sf)
#[1] "ID" "dsp" "section" "geometry"
class(amostragem_sf)
#[1] "sf" "data.frame"
# add a new column
amostragem_sf <- amostragem_sf %>% mutate(new_column = paste(ID, section))
# new column added
names(amostragem_sf)
#[1] "ID" "dsp" "section" "geometry" "new_column"
# it's still a spatial data frame
class(amostragem_sf)
#[1] "sf" "data.frame"
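Since the question builds its points with terra, a sketch that stays within terra may also be useful. It assumes the data frame method of vect(), which takes the names of the coordinate columns through its geom argument and keeps the remaining columns as attributes. The data frame below simply retypes the sample rows from the question.

```r
library(terra)

# Sample rows from the question
amostragem <- data.frame(
  ID = c(136, 726, 130, 1074),
  x = c(592251.4, 592319.1, 592170.2, 592278.5),
  y = c(7775385, 7775182, 7775385, 7775060),
  dsp = c(-0.0000000002806002, -0.0000000002805585, -0.0018586431397125, NA),
  section = c("top", "top", "center", "center")
)

# Build the points from the whole data frame so the other columns become attributes
pontos <- vect(amostragem, geom = c("x", "y"), crs = "epsg:32723")

names(pontos)   # attribute columns carried along with the geometry
pontos$section  # the classification column is available on the vector
```

The project() call from the question can then be applied to pontos as before, and the attributes should travel with the points.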

How to calculate monthly NDVI raster with R

NDVI RasterStack
I have a 15day 1981-2015 NDVI RasterStack.
I need to calculate monthly NDVI with the 15d data.
I want to know how to calculate the mean of the rasters that share the same month (MM) in their names, producing a new monthly 1981-2015 RasterStack.
I appreciate your help! Thank you very much.
The layer names follow the pattern XYYYY.MM.DD.
I have recently been working on a solution to the same problem, so this should work for you as well.
You want to create a separate variable that contains the dates from your layer names.
#this removes the "X" character from the name leaving only the dates
layer_name <- sub('.', '', names(NDVI_stack))
#install.packages("lubridate")
library(lubridate)
layer_name <- ymd(layer_name)
#Create the indices for stackApply, which takes the mean of all layers that fall in the same month of the same year.
indices <- format(layer_name, format = "%Y.%m")
NDVI_mean <- stackApply(NDVI_stack, indices, mean)
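As a quick check of the grouping logic, the sketch below builds year-month keys from a few made-up layer names (hypothetical, following the XYYYY.MM.DD pattern) using base R only. As far as I recall, the stackApply documentation describes indices as an integer vector, so the last step converts the keys to consecutive integer group ids, which may be the safer thing to pass.

```r
# Hypothetical layer names in the XYYYY.MM.DD pattern
layer_names <- c("X1981.07.01", "X1981.07.16", "X1981.08.01",
                 "X1981.08.16", "X1982.07.01")

# Parse the dates directly; the leading "X" is matched as a literal character
dates <- as.Date(layer_names, format = "X%Y.%m.%d")

# One key per year-month, so July 1981 and July 1982 stay separate
ym <- format(dates, "%Y.%m")

# Consecutive integer group ids in order of first appearance
group_id <- as.integer(factor(ym, levels = unique(ym)))
group_id
```

With a real stack, group_id would take the place of indices in the stackApply call, and unique(ym) gives the labels for the resulting monthly layers.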

R - combine character and numeric columns in SpatialPolygonsDataFrame

I would like to find an efficient way to combine certain character + numeric column values in a list of SpatialPolygonsDataFrame objects. Here is reproducible data:
library(maptools) ## For wrld_simpl
library(sp)
## Example SpatialPolygonsDataFrames (SPDF)
data(wrld_simpl) #polygon of world countries
spdf1 <- wrld_simpl[1:25,] #country subset 1
spdf2 <- wrld_simpl[26:36,] #subset 2
spdf3 <- wrld_simpl[36:50,] #subset 3
#make list of SPDF objects
spdfl<-list()
spdfl[[1]]<-spdf1
spdfl[[2]]<-spdf2
spdfl[[3]]<-spdf3
#view data (attribute table) for one list element
spdfl[[1]]@data
What I would like to do is add another column that is a combination of the FIPS, REGION, and SUBREGION columns, separated by an underscore ('_'). I know how to add+name a new column to each SPDF object in the list as done in the loop below, but I don't know how to get the desired column row entry:
#add new 'unique.id' column to SPDF
for (i in 1:length(spdfl)){
spdfl[[i]]@data["unique.id"] = ""
}
The row entries for the new unique.id column would be in this format: FIPS_REGION_SUBREGION. For example, for the ATG polygon feature in spdfl[[1]], I would like the 'unique.id' column to have an entry like this:
unique.id
AC_19_29
Please advise on how to do this for all features in the SPDF list.
spdfl[[1]]@data$unique.id<-
paste(spdfl[[1]]@data$FIPS,spdfl[[1]]@data$REGION,spdfl[[1]]@data$SUBREGION,sep="_")
Edit: for your desired looping behavior:
for (i in 1:length(spdfl)){
spdfl[[i]]@data$unique.id<-
paste(spdfl[[i]]@data$FIPS,spdfl[[i]]@data$REGION,
spdfl[[i]]@data$SUBREGION,sep="_")
}
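The same pattern can be written with lapply instead of an explicit loop. The sketch below uses a list of plain data frames as a stand-in for the attribute tables (the values are illustrative, not taken from wrld_simpl). For the real SPDF list, the function body would assign to x@data$unique.id and return x.

```r
# Hypothetical attribute tables standing in for each SPDF's @data slot
dfl <- list(
  data.frame(FIPS = c("AC", "AG"), REGION = c(19, 2), SUBREGION = c(29, 15)),
  data.frame(FIPS = "AJ", REGION = 142, SUBREGION = 145)
)

# Add the combined id to every table and return the modified table
dfl <- lapply(dfl, function(d) {
  d$unique.id <- paste(d$FIPS, d$REGION, d$SUBREGION, sep = "_")
  d
})

dfl[[1]]$unique.id
```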

extracting cell numbers from multiple counties in R

I'm new to R so please excuse any terminology mistakes... I'm trying to extract the cell numbers for every county in the state of Oklahoma and paste them on top of each other so that I can use them to look at different temperatures throughout Oklahoma state. I have a shapefile of counties in the US, so I made a vector of all the county ID numbers for the state of OK. I then tried to extract the cell numbers and max temp values for every county in a loop. That extract line that I wrote works when I do it one county at a time, I think it's the okcounty=rbind line that's the problem but I don't know what the best way to do this is.
Thank you for your help! I really appreciate it.
okcounties=which(counties$STATE_NAME=="Oklahoma") #contains 58 counties
county = NULL
for (i in 1:58){
countyvalues=extract(OK.tmax[[1]], extent(counties[okcounties[i],]), cellnumbers=T)
county=rbind(county, countyvalues) #add data from each of 58 counties
}
I am finding your code a bit confusing and can see a few places it is going wrong. You are overthinking things a bit. I am not sure why you are extracting cellnumbers and not just taking advantage of extract and the stack object.
The "okcounties" object could be an sp class subset of the counties object that you could pass directly to extract, e.g., okcounties <- counties[counties$STATE_NAME=="Oklahoma",] .
If you drop the call to extent, which is returning a bounding box for each county and not the county boundary, things get much simpler. To leverage the stack you could just let extract provide a data.frame of the raster values. Here is a worked example on synthetic data. I approximated your object naming convention for this example. The final object "ok.county" I believe would be the same as the "county" object that you are trying to create.
First, let's create some example data and plot
library(raster)
library(sp)
# create polygons
p <- raster(nrow=10, ncol=10)
p[] <- runif(ncell(p)) * 10
counties <- rasterToPolygons(p, fun=function(x){x > 9})
counties$county <- paste0("county",1:nrow(counties))
counties$STATE_NAME <- c(rep("CA",3),
rep("OK",nrow(counties)-3))
# Create raster stack
r <- raster(nrow=100, ncol=100)
r[] <- runif(ncell(r), 40,70)
r <- stack(r, r+5, r+10) # stack
names(r) <- c("June", "July", "Aug")
plot(r[[1]])
plot(counties, add=TRUE, lwd=4)
We can use an index to subset to the state we are interested in.
ok <- counties[counties@data$STATE_NAME == "OK",]
Now we can use extract on the entire raster stack. The resulting object will be a list where each polygon has its own element in the list containing a data.frame. Each column of the data.frame represents a layer in the raster stack object.
ok.county <- extract(r, ok)
class(ok.county)
head(ok.county[[1]])
However, if you want to collapse the list into a single data.frame, unique polygon identifiers are missing. Here we are going to use the county column in the SpatialPolygonsDataFrame object. Since the list is ordered the same as the polygon object passed to extract, you can assign unique values from that polygon object. In your case it would likely be the county names and the method would follow the same as the example.
cnames <- ok@data$county # names from the subset passed to extract, in the same order as the list
for(i in 1:length(ok.county)) {
ok.county[[i]] <- data.frame(county = cnames[i], ok.county[[i]])
}
head(ok.county[[1]])
Now that we have a unique identifier assigned to each data.frame in the list we can collapse it using do.call.
ok.county <- as.data.frame(do.call("rbind", ok.county))
str(ok.county)
Using an apply function we can pull the maximum value for a given column (time-period) for each unique ID.
tapply(ok.county[,"June"], ok.county$county, max)
As to your original code, something like this would work (obviously, not tested) but there is no unique polygon ID tying results back to the county and it is still the bounding box of the county and not the polygon boundaries.
okcounties <- counties[counties$STATE_NAME=="Oklahoma",]
county = NULL
for (i in 1:nrow(okcounties)){
county <- rbind(county, extract(OK.tmax[[1]],
extent(okcounties[i,]), cellnumbers=T))
}

Extract data using a matching matrix pair of data in R

I have two data sets with latitude, longitude, and temperature data. One data set corresponds to a geographic region of interest with the corresponding lat/long pairs that form the boundary and contents of the region (Matrix Dimension = 4518x2)
The other data set contains lat/long and temperature data for a larger region that envelopes the region of interest (Matrix Dimension = 10875x3).
My question is: How do you extract the appropriate row data (lat, long, temperature) from the 2nd data set that matches the first data set's lat/long data?
I've tried a variety of "for loops," "subset," and "unique" commands but I can't obtain the matching temperature data.
Thanks in advance!
10/31 Edit: I forgot to mention that I'm using "R" to process this data.
The lat/long data for the region of interest was provided as a list of 4,518 files containing the lat/long coordinates in the name of each file:
x<- dir()
lenx<- length(x)
g <- strsplit(x, "_")
coord1 <- matrix(NA,nrow=lenx, ncol=1)
coord2 <- matrix(NA,nrow=lenx, ncol=1)
for(i in 1:lenx) {
coord1[i,1] <- unlist(g)[2+3*(i-1)]
coord2[i,1] <- unlist(g)[3+3*(i-1)]
}
coord1<-as.numeric(coord1)
coord2<-as.numeric(coord2)
coord<- cbind(coord1, coord2)
The lat/long and temperature data were obtained from an NCDF file with temperature data for 10,875 lat/long pairs:
long<- tempcd$var[["Temp"]]$size[1]
lat<- tempcd$var[["Temp"]]$size[2]
time<- tempcd$var[["Temp"]]$size[3]
proj<- tempcd$var[["Temp"]]$size[4]
temp<- matrix(NA, nrow=lat*long, ncol = time)
lat_c<- matrix(NA, nrow=lat*long, ncol=1)
long_c<- matrix(NA, nrow=lat*long, ncol =1)
counter<- 1
for(i in 1:lat){
for(j in 1:long){
temp[counter,]<-get.var.ncdf(precipcd, varid= "Prcp", count = c(1,1,time,1), start=c(j,i,1,1))
counter<- counter+1
}
}
temp_gcm <- cbind(lat_c, long_c, temp)
So now the question is how do you remove values from "temp_gcm" that correspond to lat/long data pairs from "coord?"
Noe,
I can think of a number of ways you could do this. The simplest, albeit not the most efficient would be to make use of R's which() function, which takes a logical argument, while iterating over the data frame which you want to apply the matches to. Of course, this is assuming that there can be at most a single match in the larger data set. Based on your data sets, I would do something like this:
# Assumes temp_gcm and coord are data frames with columns named lat_c, long_c, temp
# and coord1, coord2 respectively (attach() does not work on plain matrices)
attach(temp_gcm) # adds the temp_gcm column names to the search path
attach(coord) # adds the coord column names to the search path
matched.temp = vector(length = nrow(coord)) # To store matching results
for (i in seq_len(nrow(coord))) { # iterate over the rows of coord
matched.temp[i] = temp[which(lat_c == coord1[i] & long_c == coord2[i])]
}
# Now add the results column to the coord data frame (indexes match)
coord$temperature = matched.temp
The function which(lat_c == coord1[i] & long_c == coord2[i]) returns a vector of all rows in the dataframe temp_gcm which satisfy lat_c and long_c matching coord1 and coord2 respectively from row i in the iteration (NOTE: I'm assuming this vector will only have length 1, i.e. there is only 1 possible match). matched.temp[i] will then be assigned the value from the column temp in the dataframe temp_gcm which satisfied the logical condition. Note that the goal in doing this is that we create a vector which has matched values that correspond by index to the rows of the dataframe coord.
I hope this helps. Note that this is a rudimentary approach, and I would advise looking up the function merge() as well as apply() to do this in a more succinct manner.
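A minimal sketch of the merge() route is below. It assumes both data sets have been converted to data frames whose coordinate columns share the same names; the column names lat and long and all of the values are made up for illustration.

```r
# Hypothetical region of interest (coordinate pairs only)
coord <- data.frame(lat = c(35.5, 36.0), long = c(-97.5, -98.0))

# Hypothetical larger data set with a temperature column
temp_gcm <- data.frame(
  lat = c(35.0, 35.5, 36.0, 36.5),
  long = c(-97.0, -97.5, -98.0, -98.5),
  temp = c(20.1, 21.3, 19.8, 18.4)
)

# Inner join on both coordinate columns keeps only rows present in coord
matched <- merge(coord, temp_gcm, by = c("lat", "long"))
matched
```

Note that merge() matches on exact equality, so coordinates parsed from file names and coordinates read from the NCDF file may need rounding to a common precision before joining.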
I added an additional column full of zeros to use as the resultant for an IF statement. "x" is the number of rows in temp_gcm. "y" is the number of columns (representative of time steps). "temp_s" is the standardized temperature data
indicator<- matrix(0, nrow = x, ncol = 1)
precip_s<- cbind(precip_s, indicator)
temp_s<- cbind(temp_s, indicator)
for(aa in 1:x){
current_lat<-latitudes[aa,1] #Latitudes corresponding to larger area
current_long<- longitudes[aa,1] #Longitudes corresponding to larger area
for(ab in 1:lenx){ #lenx corresponds to nrow(coord)
if(current_lat == coord[ab,1] & current_long == coord[ab,2]) {
precip_s[aa,(y/12+1)]<-1 #y/12+1 corresponds to "indicator column"
temp_s[aa,(y/12+1)]<-1
}
}
}
precip_s<- precip_s[precip_s[,(y/12+1)]>0,] #Removes rows with "0"s remaining in "indicator" column
temp_s<- temp_s[temp_s[,(y/12+1)]>0,]
precip_s<- precip_s[,-(y/12+1)] #Removes "indicator" column
temp_s<- temp_s[,-(y/12+1)]

Resources