I have a sf object as in the example here:
library(sf)
fname <- system.file("shape/nc.shp", package="sf")
nc <- st_read(fname)
plot(nc[1])
Created on 2021-04-15 by the reprex package (v2.0.0)
I want to subset my data so that I end up with four separate sf objects, one for each (approximate) quadrant of the extent.
For the data I am working with now, a row-based subset like nc[1:50, ] doesn't make sense since the rows are randomly ordered; it reduces the number of features but not the extent. I even tried group_by(geom), which didn't work for me.
Can you help me here with this part using nc data as example?
I suggest you assign your objects to quadrants via sf::st_join().
It has a very helpful argument, largest, which ensures that small polygons are not duplicated but are assigned to the quadrant containing the largest share of their area. So NC keeps all 100 counties and no duplicates are created.
To create the quadrants object, consider applying sf::st_make_grid() to the bounding box of your spatial object, specifying a two-by-two split.
For a full workflow consider the following code:
library(sf)
fname <- system.file("shape/nc.shp", package="sf")
nc <- st_read(fname)
plot(nc[1])
# create quadrants (the pipe and mutate() come from dplyr)
library(dplyr)
quads <- st_bbox(nc) %>%
  st_make_grid(n = 2) %>%          # a 2 x 2 grid over the bounding box
  st_as_sf(crs = st_crs(nc)) %>%
  mutate(quad_id = 1:4)            # one id per quadrant
# a visual check
plot(st_geometry(nc))
plot(st_geometry(quads), add = TRUE)
# intersect NC with the quadrants
nc_intersected <- st_join(nc,
                          quads,
                          largest = TRUE) # do *not* multiply polygons!
# a visual check
plot(nc_intersected["quad_id"])
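To actually get the four separate sf objects the question asks for, you can then split the result by quadrant id (a small sketch building on the nc_intersected object above):
quad_list <- split(nc_intersected, nc_intersected$quad_id) # a list of four sf objects
quad_list[[1]] # e.g., the first quadrant as its own sf object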
I'm trying to include a column from an existing data frame as an attribute in a spatial points vector, but with no success.
My data looks something like this:
ID x y dsp section
136 592251.4 7775385 -0.0000000002806002 top
726 592319.1 7775182 -0.0000000002805585 top
130 592170.2 7775385 -0.0018586431397125 center
1074 592278.5 7775060 NA center
And I create the spatial points from this same data, using x and y info:
library(terra)
pontos <- vect(cbind(amostragem$x, amostragem$y))
crs(pontos) <- "epsg:32723"
pontos <- project(pontos, worldDEM)
It's totally functional, but when I assign a column that is not longitude/latitude (x, y) information (the "section" column is the one I want for classification), the vector loses its spatial points characteristics.
This led me to work outside R, since I had no time to lose and it was a small number of points:
I exported the vector as a shapefile and went to QGIS ("attribute table"), manually adding the column to the shp and filling the points (rows) with the information I wanted.
It works perfectly, so I read the edited shp back into R and applied extract using a raster and the edited points. Now I have the attribute as a column. As this is not the smartest approach for a large volume of points, I want to make this work in R too. Any thoughts?
Thanks for your help.
You could use the sf package to create a spatial data frame. The call to st_as_sf does this and converts the x and y coordinates to a geometry column which is retained when functions in the dplyr package are called.
library(sf)
library(dplyr)
amostragem <- read.table(text="ID x y dsp section
136 592251.4 7775385 -0.0000000002806002 top
726 592319.1 7775182 -0.0000000002805585 top
130 592170.2 7775385 -0.0018586431397125 center
1074 592278.5 7775060 NA center", header=T)
amostragem_sf <- amostragem %>% st_as_sf(coords = c('x', 'y'), crs=st_crs('epsg:32723'))
names(amostragem_sf)
#[1] "ID" "dsp" "section" "geometry"
class(amostragem_sf)
#[1] "sf" "data.frame"
# add a new column
amostragem_sf <- amostragem_sf %>% mutate(new_column = paste(ID, section))
# new column added
names(amostragem_sf)
#[1] "ID" "dsp" "section" "geometry" "new_column"
# it's still a spatial data frame
class(amostragem_sf)
#[1] "sf" "data.frame"
I am trying to perform a social network analysis of congressional roll call data. The data comes as a CSV from voteview.com and has the following format:
[image: format of the CSV]
There is a large number of unique bills (identified by roll number) that I need to loop through to see how often politicians (identified by icpsr) agree in their votes (recorded in cast_code).
However, I am unsure how to loop through this data frame, check whether two politicians voted the same way on a given bill, and add the result to a new data frame with three columns: [politician 1 | politician 2 | weight (how many times they voted the same way on unique bills)].
I have produced the following code when there was just a single bill being considered, which was able to get me a network map:
#1. creating a dataframe with all the yayers and one with all the nayers
yay_list <- S117 %>% filter(cast_code == '1')
nay_list <- S117 %>% filter(cast_code == '6')
#2. a list of the icpsr numbers who agree for yay and nay
y_list <- list(yay_list$icpsr)
n_list <- list(nay_list$icpsr)
#3. trying to use this list to make an igraph graph - BUT it does not recognise it
# I am not sure where to go next
make_ring(yay_list)
a1 <- as_adj_list(y_list)
#4. Alternative method - using only columns for icpsr & cast_code
# this will make an edge/adjacency style data frame
foo <- S117[, c("icpsr", "cast_code")]
library(plyr)
# define a function returning the edges for a single group
group.edges <- function(x) {
edges.matrix <- t(combn(x, 2))
colnames(edges.matrix) <- c("Sen_A", "Sen_B")
edges.df <- as.data.frame(edges.matrix)
return(edges.df)
}
# apply the function above to each group and bind altogether
all.edges <- do.call(rbind, lapply(unstack(foo), group.edges))
# add weights if needed
#all.edges$weight <- 1
#all.edges <- aggregate(weight ~ Sen_A + Sen_B, all.edges, sum)
all.edges
#convert to a dataframe for igraph
df <- data.frame(all.edges)
df
# use igraph functions on the new data frame and plot
g <- graph_from_data_frame(df)
print(g, e=TRUE, v=TRUE)
plot(g)
# a plot is produced, which is good, but I do not know how to do this for
# a situation where there are multiple bills - it seems very complicated
Does anyone have advice on how to create a similar edge list data frame, ideally with weights, given that there are many bills in the data frame, not just one?
The weight should show how many times two politicians voted the same way (either yay or nay) across unique bills.
Thanks!
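One way to generalise the single-bill approach above to many bills is to build the agreeing pairs per bill and vote value, then count how often each pair recurs. A minimal sketch, assuming S117 has columns icpsr, rollnumber, and cast_code (the voteview names; adjust if yours differ):
library(igraph)
# keep only yea (1) and nay (6) votes
votes <- S117[S117$cast_code %in% c(1, 6), ]
# all pairs of senators who voted the same way on one bill
pair_up <- function(ids) {
  if (length(ids) < 2) return(NULL)
  m <- t(combn(sort(ids), 2))   # sort() keeps pairs in a canonical order
  data.frame(Sen_A = m[, 1], Sen_B = m[, 2])
}
# split by bill and vote value, list the agreeing pairs, then count repeats
groups <- split(votes$icpsr, list(votes$rollnumber, votes$cast_code), drop = TRUE)
pairs <- do.call(rbind, lapply(groups, pair_up))
pairs$weight <- 1
edges <- aggregate(weight ~ Sen_A + Sen_B, pairs, sum)
g <- graph_from_data_frame(edges, directed = FALSE)
plot(g, edge.width = E(g)$weight)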
I'm working with a spatial polygon dataframe.
data can be downloaded here:
http://geoportal.statistics.gov.uk/datasets/lower-layer-super-output-areas-december-2011-super-generalised-clipped-boundaries-in-england-and-wales
This contains the lower layer output area (lsoa) for England and Wales.
I need to subset the dataframe in order to keep only the polygons for the london lsoa11cd.
I have a list of lsoa11cd for London.
These are between E01000001 and E01004765. I'm not sure how to proceed to subset the spatial polygons (see image attached). Find below an attempt which does not work.
london <- shapefile[substr(shapefile@data$lsoa11cd, -7, -1) <= 1004765, ]
london <- london[substr(london@data$lsoa11cd, -7, -1) >= 1000001, ]
If I'm interpreting your question correctly, this should work nicely:
Use the shapefile function from the raster package to read-in the shapefile:
library(raster)
# Read-in the data. This will create a SpatialPolygonsDataFrame with 34,753 features
s <- shapefile('Lower_Layer_Super_Output_Areas_December_2011_Super_Generalised_Clipped__Boundaries_in_England_and_Wales.shp')
It looks like all of the lsoa11cd values have a letter and a number as the first two characters of the string. Let's first subset the data to keep only those whose lsoa11cd value starts with 'E'.
s <- s[grep("^E", s$lsoa11cd), ]
Now we can remove the first two characters from each lsoa11cd string and convert to a numeric variable for easier subsetting as follows:
s$lsoa11cd <- as.numeric(substring(s$lsoa11cd, 3))
Then you can simply subset within the range you've specified:
s <- s[s$lsoa11cd %in% 1000001:1004765, ]
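For completeness, the same subset can be written with sf rather than raster/sp; a sketch assuming the same shapefile:
library(sf)
s <- st_read('Lower_Layer_Super_Output_Areas_December_2011_Super_Generalised_Clipped__Boundaries_in_England_and_Wales.shp')
london <- s[grepl("^E", s$lsoa11cd) &
            as.numeric(substring(s$lsoa11cd, 3)) %in% 1000001:1004765, ]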
I am trying to find, for each location in dataset 1, the nearest location in dataset 2. The two data sets are different sizes. I've looked into using the haversine function, but I'm unsure what to do after that.
Since you have not provided a sample of your data, I am going to use the oregon.tract data set from the UScensus2000tract library as a reproducible example.
Here is a fast data.table solution, adapted from another answer.
# load libraries
library(data.table)
library(geosphere)
library(UScensus2000tract)
library(rgeos)
Now let's create a new data.table with all possible pair combinations of origins (census centroids) and destinations (facilities).
# get all combinations of origin and destination pairs
# Note that I'm considering here that the distance from A -> B is
# equal to the distance from B -> A
odmatrix <- CJ(Datatwo$Code_A, Dataone$Code_B)
names(odmatrix) <- c('Code_A', 'Code_B') # update column names
# add coordinates of Datatwo centroids (origin)
odmatrix[Datatwo, c('lat_orig', 'long_orig') := list(i.Latitude, i.Longitude), on = "Code_A"]
# add coordinates of facilities (destination)
odmatrix[Dataone, c('lat_dest', 'long_dest') := list(i.Latitude, i.Longitude), on = "Code_B"]
Now you just need to:
# calculate distances
odmatrix[, dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2),
                                 matrix(c(long_dest, lat_dest), ncol = 2))]
# and get the nearest destination for each origin
odmatrix[, .(Code_B = Code_B[which.min(dist)],
             dist = min(dist)),
         by = Code_A]
### Prepare data for this reproducible example
# load data
data("oregon.tract")
# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))
# Convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]
# get two data.frames equivalent to your census and facility data frames
Datatwo <- copy(centroids)
Dataone <- copy(centroids)
names(Datatwo) <- c('Code_A', 'Longitude', 'Latitude')
names(Dataone) <- c('Code_B', 'Longitude', 'Latitude')
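A more recent alternative, for reference: sf can do the nearest-neighbour search directly, without building the full origin-destination matrix. A sketch assuming the Datatwo/Dataone objects above and lon/lat coordinates (crs = 4326 is an assumption here):
library(sf)
orig <- st_as_sf(Datatwo, coords = c("Longitude", "Latitude"), crs = 4326)
dest <- st_as_sf(Dataone, coords = c("Longitude", "Latitude"), crs = 4326)
# index of the nearest destination for each origin, then the distance to it
nearest <- st_nearest_feature(orig, dest)
dists <- st_distance(orig, dest[nearest, ], by_element = TRUE)
result <- data.frame(Code_A = Datatwo$Code_A,
                     Code_B = Dataone$Code_B[nearest],
                     dist = as.numeric(dists))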
I'm new to R, so please excuse any terminology mistakes. I'm trying to extract the cell numbers for every county in the state of Oklahoma and stack them on top of each other so that I can look at different temperatures across the state. I have a shapefile of US counties, so I made a vector of the county ID numbers for Oklahoma, and then tried to extract the cell numbers and max temp values for every county in a loop. The extract line works when I run it one county at a time; I think the okcounty=rbind line is the problem, but I don't know the best way to do this.
Thank you for your help! I really appreciate it.
okcounties <- which(counties$STATE_NAME == "Oklahoma") # contains 58 counties
county <- NULL
for (i in 1:58) {
  countyvalues <- extract(OK.tmax[[1]], extent(counties[okcounties[i], ]), cellnumbers = TRUE)
  county <- rbind(county, countyvalues) # add data from each of 58 counties
}
I find your code a bit confusing and can see a few places where it goes wrong. You are overthinking things a bit; I am not sure why you are extracting cell numbers rather than just taking advantage of extract and the stack object.
The "okcounties" object could be an sp-class subset of the counties object that you pass directly to extract, e.g., okcounties <- counties[counties$STATE_NAME == "Oklahoma", ].
If you drop the call to extent, which returns the bounding box of each county rather than the county boundary, things get much simpler. To leverage the stack, you can just let extract return a data.frame of the raster values. Here is a worked example on synthetic data, approximating your object naming convention. The final object "ok.county" should correspond to the "county" object you are trying to create.
First, let's create some example data and plot
library(raster)
library(sp)
# create polygons
p <- raster(nrow=10, ncol=10)
p[] <- runif(ncell(p)) * 10
counties <- rasterToPolygons(p, fun=function(x){x > 9})
counties$county <- paste0("county",1:nrow(counties))
counties$STATE_NAME <- c(rep("CA", 3), rep("OK", nrow(counties) - 3))
# Create raster stack
r <- raster(nrow=100, ncol=100)
r[] <- runif(ncell(r), 40,70)
r <- stack(r, r+5, r+10) # stack
names(r) <- c("June", "July", "Aug")
plot(r[[1]])
plot(p, add=TRUE, lwd=4)
We can use an index to subset to the state we are interested in.
ok <- counties[counties@data$STATE_NAME == "OK", ]
Now we can use extract on the entire raster stack. The resulting object will be a list where each polygon has its own element in the list containing a data.frame. Each column of the data.frame represents a layer in the raster stack object.
ok.county <- extract(r, ok)
class(ok.county)
head(ok.county[[1]])
However, if you want to collapse the list into a single data.frame, unique polygon identifiers are missing. Here we are going to use the ID column in the SpatialPolygonsDataFrame object. Since the list is ordered the same as the polygon object you can assign unique values from the polygon object. In your case it would likely be the county names and the method would follow the same as the example.
cnames <- unique(ok@data$county)
for(i in 1:length(ok.county)) {
ok.county[[i]] <- data.frame(county = cnames[i], ok.county[[i]])
}
head(ok.county[[1]])
Now that we have a unique identifier assigned to each data.frame in the list we can collapse it using do.call.
ok.county <- as.data.frame(do.call("rbind", ok.county))
str(ok.county)
Using an apply function we can pull the maximum value for a given column (time-period) for each unique ID.
tapply(ok.county[,"June"], ok.county$county, max)
As to your original code, something like this would work (obviously, not tested), but there is no unique polygon ID tying the results back to each county, and it still uses the bounding box of the county rather than the polygon boundary.
okcounties <- counties[counties$STATE_NAME == "Oklahoma", ]
county <- NULL
for (i in 1:nrow(okcounties)) {
  county <- rbind(county, extract(OK.tmax[[1]], extent(okcounties[i, ]),
                                  cellnumbers = TRUE))
}