Losing data using st_join - r

I am trying to perform a spatial join of two sf shapefiles. I am losing all information from the second data set (i.e. output_inmap). Whichever dataset is placed second will return all NA values. Anyone know what could be happening?
output_inmap <- st_read("processed/ceidars_data_inmap.shp")
output_inmap <-st_transform(output_inmap, crs=3310)
unzip("census-tract.zip")
census_tracts <- st_read("census-tract/tl_2019_06_tract.shp")
st_transform(census_tracts, crs = 3310)
st_transform(output_inmap, crs = 3310)
TC_1<- st_join(census_tracts, output_inmap)

Your second st_transform (of the census tracts) is leading nowhere, because its result is never assigned back to census_tracts, so the object keeps its original CRS. Consider this code (slightly adjusted to use a dplyr-style pipe) to ensure both spatial objects are in the same CRS.
You may also consider setting the left parameter of the sf::st_join() call (TRUE by default) to FALSE; this changes the behaviour from a left (preserving) join to an inner (filtering) join. Sometimes this makes for more concise code.
library(sf)
library(dplyr)
output_inmap <- st_read("processed/ceidars_data_inmap.shp") %>%
  st_transform(crs = 3310)
unzip("census-tract.zip")
census_tracts <- st_read("census-tract/tl_2019_06_tract.shp") %>%
  st_transform(crs = 3310)
TC_1 <- st_join(census_tracts, output_inmap)
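To see why an unassigned st_transform() call has no effect, here is a minimal sketch on made-up toy data. The square "tract", the single point, and all object names are assumptions for illustration, not the asker's data. It also shows the left = FALSE variant of st_join().

```r
library(sf)

# Hypothetical toy data in EPSG:4326: one square "tract" and one point inside it
square <- st_polygon(list(rbind(c(-120, 36), c(-119, 36), c(-119, 37),
                                c(-120, 37), c(-120, 36))))
tracts <- st_sf(tract = "A", geometry = st_sfc(square, crs = 4326))
sites  <- st_sf(value = 1, geometry = st_sfc(st_point(c(-119.5, 36.5)), crs = 4326))

# st_transform() returns a new object; without assignment the input is unchanged
invisible(st_transform(tracts, crs = 3310))
crs_unchanged <- st_crs(tracts) == st_crs(4326)

# Assign the results so both layers really share a CRS before joining
tracts_3310 <- st_transform(tracts, crs = 3310)
sites_3310  <- st_transform(sites, crs = 3310)
same_crs <- st_crs(tracts_3310) == st_crs(sites_3310)

# Left join keeps every tract (NA where nothing matches); inner join drops them
left_join_result  <- st_join(tracts_3310, sites_3310)
inner_join_result <- st_join(tracts_3310, sites_3310, left = FALSE)
```

Comparing st_crs() of both layers right before the join is a cheap way to catch this kind of mistake.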

Related

Aggregating values of overlapping polygons within a layer using the origins value from st_intersection

I would like to take a spatial layer with overlapping polygons and transform it so that a layer without any overlaps is created, and the attributes of the overlapping polygons aggregated as a list, or a sum (will vary depending on the field and goal of analysis). Based on these posts https://github.com/r-spatial/sf/issues/1230 , summarise attributes from sf::st_intersection() where geometries overlaps
I think that I should be able to do this using the origins field created when st_intersection() is used with a single input.
An example with some dummy data:
library(sf)
library(dplyr)
library(ggplot2)
set.seed(123)
p1 = st_cast(st_sfc(st_multipoint(cbind(runif(10), runif(10)))), "POINT")
b1 = st_buffer(p1, .15)
b1d = data.frame(id = 1:10, class = rep(c("high", "low"), times = 5), value = rep(c(10, 5), times = 5))
b1d$geometry = b1
b1d = st_as_sf(b1d)
b1d_intersect <- st_intersection(b1d) %>%
  mutate(id_int = seq(from = 1, to = nrow(.), by = 1)) %>%
  st_collection_extract()
ggplot(b1d) + geom_sf(aes(fill = class), alpha = 0.5)
ggplot(b1d_intersect) + geom_sf(aes(fill = class), alpha = 0.5)
In my desired output, each polygon would aggregate the attribute values of the polygons that overlap there, something like:
b1d_intersect_values <- b1d_intersect %>%
  group_by(rownames(.)) %>%
  summarise(Class_List = paste0(class, collapse = "; "), value_sum = sum(value))
but that actually works by linking back to the original b1d layer using the origins field? I feel like I need to use map() somewhere in there.
I've also tried to tackle it using st_join, but that doesn't work quite right, as it assigns multiple classes to polygons where n.overlaps=1. It also seems as though it would take longer than using an index when applied to large data sets, as it would require the additional spatial processing step with st_join.
b1d_intersect_values_2 <- st_join(b1d_intersect, b1d) %>%
  group_by(id_int, n.overlaps) %>%
  summarise(class_list = paste0((class.y), collapse = "; "), value_sum = sum(value.y))
Any suggestions on how to proceed would be appreciated.
Figured it out eventually, answering myself so nobody wastes time or effort, and in case helpful to anyone else.
library(purrr)
# origins holds row indices into the original b1d layer, so index b1d explicitly
b1d_intersect_values <- b1d_intersect %>%
  mutate(class_list = map_chr(origins, ~ paste0(b1d$class[.x], collapse = "; ")),
         value_sum = map_dbl(origins, ~ sum(b1d$value[.x])))
ggplot(b1d_intersect_values) + geom_sf(aes(fill = class_list, alpha = value_sum))

R Adding polygon name data to point data

I have two sets of data: point-level data and polygon data. I am aiming to add the name of the polygon in which a point is located as an extra column on the point-level data.
I have found and used the code below utilising the sf library.
new_point_data <- point_data %>% mutate(
intersection = as.integer(st_intersects(geometry, polygon_data))
, area = if_else(is.na(intersection), polygon_data$Name[intersection])
This works in 90% of cases; however, when a point intersects more than one polygon, the code does not bring any data back. I'm assuming this is because it returns two (or more) values and cannot determine which to use. How can I update this to select any one value, e.g. the first?
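One possible approach, sketched on made-up data: keep the list returned by st_intersects() and pick the first index per point explicitly. The two overlapping squares, the three points, and the helper names sq and first_hit are assumptions for illustration; only point_data, polygon_data, Name and area come from the question.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with lower-left corner (x0, y0)
sq <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Toy data: squares A and B overlap; the points fall in A only, in both, and in neither
polygon_data <- st_sf(Name = c("A", "B"), geometry = st_sfc(sq(0, 0, 2), sq(1, 1, 2)))
point_data <- st_sf(id = 1:3,
                    geometry = st_sfc(st_point(c(0.5, 0.5)),
                                      st_point(c(1.5, 1.5)),
                                      st_point(c(5, 5))))

# st_intersects() returns one integer vector of polygon indices per point
hits <- st_intersects(point_data, polygon_data)

# Take the first matching polygon, or NA when the point hits nothing
first_hit <- vapply(hits, function(i) if (length(i) > 0) i[1] else NA_integer_, integer(1))
point_data$area <- polygon_data$Name[first_hit]
```

"First" here simply means the lowest row index in polygon_data; if a specific polygon should win, sort polygon_data accordingly beforehand.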

How to subset a large SpatialPolygonsDataFrame

I want to calculate the area of a wildfire. I tried this by subtracting the NDVI calculated on a Landsat image before the fire from the NDVI of another image after the fire, to see where the NDVI was reduced. However, the NDVI has changed not only in the burned areas; there are also many random differences. I used rasterToPolygons to create a large SpatialPolygonsDataFrame containing all areas where NDVI after - NDVI before < 0.
Now I want to remove all the polygons with an area below a certain threshold value. However, I cannot find a way to subset the large SpatialPolygonsDataFrame.
I found an example on how to get a list of the polygons with an area above the threshold (where burned_poly is the large SpatialPolygonsDataFrame):
pols <- lapply(burned_poly@polygons, slot, "Polygons")
pols_areas <- lapply(pols[[2]], function(x) slot(x, "area"))
However, accessing the large SpatialPolygonsDataFrame like this
bp <- burned_poly@polygons[[1]]@Polygons[pols_areas >= 9000]
gives me a list which I am currently unable to coerce into a SpatialPolygonsDataFrame.
Can someone tell me how to do this last step (I have trouble with the Sf argument of which I don't know what it is in the SpatialPolygonsDataFrame function), or maybe there is a different and better approach to extract the fire extent as a polygon?
Alright, I think I have found a way, thanks to Orlando's suggestion to use sf.
I converted my large SpatialPolygonsDataFrame object to an sf object via st_as_sf(), which gave me a multipolygon. This sfc_MULTIPOLYGON object can be split into single polygons using st_cast(), and the resulting object can be subset like a data.frame.
bp_sf <- st_as_sf(burned_poly)
bps_sf <- st_cast(bp_sf, "POLYGON")
BpSf <- bps_sf[as.numeric(st_area(bps_sf))>=10000,]
If you are using the simple features sf library you can use functions from the tidyverse. Filtering data is a matter of using the filter() function. Notice that you can convert your objects to sf using st_as_sf(). See: https://r-spatial.github.io/sf/reference/st_as_sf.html and How to filter an R simple features collection using sf methods like st_intersects()?
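As a minimal sketch of the filter() approach on an sf object: the three square patches, the sq helper, the area_m2 column and the 9000 threshold below are made up for illustration (EPSG:3310 is assumed only to get areas in square metres).

```r
library(sf)
library(dplyr)

# Hypothetical helper: a square of the given side length starting at x0
sq <- function(x0, side) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + side, 0), c(x0 + side, side),
                        c(x0, side), c(x0, 0))))
}

# Toy patches with areas of 100, 10000 and 22500 square metres
patches <- st_sf(id = 1:3,
                 geometry = st_sfc(sq(0, 10), sq(100, 100), sq(300, 150), crs = 3310))

# st_area() returns a units vector; as.numeric() drops the units for comparison
burned <- patches %>%
  mutate(area_m2 = as.numeric(st_area(geometry))) %>%
  filter(area_m2 >= 9000)
```

For an object coming from sp, st_as_sf() followed by st_cast(x, "POLYGON") would be applied first, so that each polygon gets its own row.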

Choropleth Plotting polygons with ggplot2 R on a map

I realise this has been asked about 100 times prior, but none of the answers I've read so far on SO seem to fit my problem.
I have data. I have the lat and lon values. I've read around about something called sp and made a bunch of shape objects in a dataframe. I have matched this dataframe with the variable I am interested in mapping.
I cannot for the life of me figure out how the hell to get ggplot2 to draw polygons. Sometimes it wants explicit x,y values (which are a PART of the shape anyway, so seems redundant), or some other shape files externally which I don't actually have. Short of colouring it in with highlighters, I'm at a loss.
If I take an individual sps object (built with the following function after importing, cleaning, and wrangling a shitload of data)
createShape = function(sub){
  # This function takes the list of lat/lng values and returns a SHAPE which should be plottable on ggmap/ggplot
  tempData = as.data.frame(do.call(rbind, as.list(VICshapes[which(VICshapes$Suburb==sub),] %>% select(coords))[[1]][[1]]))
  names(tempData) = c('lat', 'lng')
  p = Polygon(tempData)
  ps = Polygons(list(p), 1)
  sps = SpatialPolygons(list(ps))
  return(sps)
}
These shapes are then stored in the same dataframe as my data - which only this afternoon for some reason, I can't even look at, as trying to look at it yields the following error.
head(plotdata)
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
I realise I'm really annoyed at this now, but I've about 70% of a grade riding on this, and my university has nobody capable of assisting.
I have pasted the first few rows of data here - https://pastebin.com/vFqy5m5U - apparently you can't print data with an S4 object - the shape file that I'm trying to plot.
Anyway. I'm trying to plot each of those shapes onto a map. Polygons want an x,y value. I don't have ANY OTHER SHAPE FILES. I created them based on a giant list of lat and long values, and the code chunk above. I'm genuinely at a loss here and don't know what question to even ask. I have the variable of interest based on locality, and the shape for each locality. What am I missing?
edit: I've pasted the summary data (BEFORE making them into shapes) here. It's a massive list of lat/lng values for EACH tile/area, so it's pretty big...
Answered on gis.stackexchange.com (link not provided).
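The linked answer is not reproduced here. As a sketch of one common route (not necessarily what that answer proposed), polygons can be built directly from lng/lat vertices with sf and drawn with geom_sf(), which needs no explicit x/y aesthetics. The coords table, the make_polygon helper and the value column are hypothetical.

```r
library(sf)
library(ggplot2)

# Hypothetical input: one row per vertex, with the first vertex repeated to close each ring
coords <- data.frame(
  Suburb = rep(c("A", "B"), each = 5),
  lng = c(144.9, 145.0, 145.0, 144.9, 144.9, 145.0, 145.1, 145.1, 145.0, 145.0),
  lat = c(-37.8, -37.8, -37.7, -37.7, -37.8, -37.8, -37.8, -37.7, -37.7, -37.8)
)

# sf expects x = longitude, y = latitude, in that column order
make_polygon <- function(df) st_polygon(list(as.matrix(df[, c("lng", "lat")])))
geoms <- lapply(split(coords, coords$Suburb), make_polygon)

# One row per suburb, with the geometry stored as a list-column
shapes <- st_sf(Suburb = names(geoms), geometry = st_sfc(unname(geoms), crs = 4326))
shapes$value <- c(10, 25)  # made-up variable of interest

# geom_sf() reads the geometry column itself
p <- ggplot(shapes) + geom_sf(aes(fill = value))
```

Because the result is an ordinary data frame with a geometry column, head(shapes) prints normally, unlike a data frame holding S4 SpatialPolygons objects.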

Creating SpatialLinesDataFrame from SpatialLines object and basic df

Using leaflet, I'm trying to plot some lines and set their color based on a 'speed' variable. My data start at an encoded polyline level (i.e. a series of lat/long points, encoded as an alphanumeric string) with a single speed value for each EPL.
I'm able to decode the polylines to get a series of lat/long points (thanks to Max, here), and I'm able to create segments from those series of points and format them as a SpatialLines object (thanks to Kyle Walker, here).
My problem: I can plot the lines properly using leaflet, but I can't join the SpatialLines object to the base data to create a SpatialLinesDataFrame, and so I can't code the line color based on the speed var. I suspect the issue is that the IDs I'm assigning SL segments aren't matching to those present in the base df.
The objects I've tried to join, with SpatialLinesDataFrame():
"sl_object", a SpatialLines object with ~140 observations, one for each segment; I'm using Kyle's code, linked above, with one key change - instead of creating an arbitrary iterative ID value for each segment, I'm pulling the associated ID from my base data. (Or at least I'm trying to.) So, I've replaced:
id <- paste0("line", as.character(p))
with
lguy <- data.frame(paths[[p]][1])
id <- unique(lguy[,1])
"speed_object", a df with ~140 observations of a single speed var and row.names set to the same id var that I thought I created in the SL object above. (The number of observations will never exceed but may be smaller than the number of segments in the SL object.)
My joining code:
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object)
And the result:
row.names of data and Lines IDs do not match
Thanks, all. I'm posting this in part because I've seen some similar questions - including some referring specifically to changing the ID output of Kyle's great tool - and haven't been able to find a good answer.
EDIT: Including data samples.
From sl_obj, a single segment:
print(sl_obj)
Slot "ID":
[1] "4763655"
[[151]]
An object of class "Lines"
Slot "Lines":
[[1]]
An object of class "Line"
Slot "coords":
lon lat
1955 -74.05228 40.60397
1956 -74.05021 40.60465
1957 -74.04182 40.60737
1958 -74.03997 40.60795
1959 -74.03919 40.60821
And the corresponding record from speed_obj:
row.names speed
... ...
4763657 44.74
4763655 34.8 # this one matches the ID above
4616250 57.79
... ...
To get rid of this error message, either make the row.names of data and Lines IDs match by preparing sl_object and/or speed_object, or, in case you are certain that they should be matched in the order they appear, use
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object, match.ID = FALSE)
This is documented in ?SpatialLinesDataFrame.
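A minimal sketch of the first option, with IDs and row names that match but appear in a different order. The two short segments are made up (coordinates and IDs borrowed from the sample output in the question), and line_ids and ids_match are hypothetical names.

```r
library(sp)

# Hypothetical two-segment SpatialLines object with explicit IDs
seg_a <- Lines(list(Line(cbind(lon = c(-74.05228, -74.05021),
                               lat = c(40.60397, 40.60465)))), ID = "4763655")
seg_b <- Lines(list(Line(cbind(lon = c(-74.04182, -74.03997),
                               lat = c(40.60737, 40.60795)))), ID = "4763657")
sl_object <- SpatialLines(list(seg_a, seg_b))

# Rows deliberately in a different order than the Lines; row.names carry the IDs
speed_object <- data.frame(speed = c(44.74, 34.8),
                           row.names = c("4763657", "4763655"))

# Check up front that both sides use the same set of IDs
line_ids <- sapply(slot(sl_object, "lines"), slot, "ID")
ids_match <- setequal(line_ids, row.names(speed_object))

# With match.ID = TRUE the data rows are reordered to follow the Lines IDs
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object, match.ID = TRUE)
```

Running the setequal() check before the constructor call makes a mismatch easy to spot and inspect.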
All right, I figured it out. The error arose because my speed_obj didn't have as many rows as my sl_obj has Lines, as mentioned here ("data = object of class data.frame; the number of rows in data should equal the number of Lines elements in sl").
Resolution: used a quick loop to pull out all of the unique lines IDs, then performed a left join against that list of uniques to create an exhaustive speed_obj (with NAs, which seem to be OK).
library(plyr)  # join() is assumed to be plyr::join(), which does a left join by default
ids <- data.frame()
for (i in seq_along(sl_obj@lines)) {
  id <- data.frame(sl_obj@lines[[i]]@ID)
  ids <- rbind(ids, id)
}
colnames(ids)[1] <- "linkId"
speed_full <- join(ids, speed_obj)
# drop the ID column from the joined (full-length) table, not from speed_obj
speed_full_short <- data.frame(speed_full[, c(-1)])
row.names(speed_full_short) <- speed_full$linkId
splndf <- SpatialLinesDataFrame(sl_obj, data = speed_full_short, match.ID = TRUE)
Works fine now!
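The same padding can be sketched without the loop, using base R match() in place of join() (a swapped-in technique, not the poster's code). The three toy segments, the mk helper and the speed column name are assumptions for illustration.

```r
library(sp)

# Hypothetical helper: a short two-point segment with the given ID
mk <- function(id, x) Lines(list(Line(cbind(c(x, x + 1), c(0, 1)))), ID = id)
sl_obj <- SpatialLines(list(mk("4763655", 0), mk("4763657", 2), mk("4616250", 4)))

# Speeds exist for only two of the three segments
speed_obj <- data.frame(linkId = c("4763657", "4763655"), speed = c(44.74, 34.8))

# Pull every Lines ID in one call instead of growing a data frame in a loop
ids <- sapply(slot(sl_obj, "lines"), slot, "ID")

# match() yields NA for IDs without a speed, so the table gets one row per segment
speed_full <- data.frame(speed = speed_obj$speed[match(ids, speed_obj$linkId)],
                         row.names = ids)

splndf <- SpatialLinesDataFrame(sl_obj, data = speed_full, match.ID = TRUE)
```

match() is used rather than indexing the data frame by row name, because data frame row-name indexing allows partial matches.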
I may have deciphered the issue.
When I pull in my spatial lines data and check its class, it reads as "SpatialLinesDataFrame", even though I know it's a simple linear shapefile. I'm using readOGR to bring the data in, and I believe this is where the conversion occurs. With that in mind, the speed assignment is relatively easy.
sl_object$speed <- speed_object[ match( sl_object$ID , row.names( speed_object ) ) , "speed" ]
This should do the trick, as I'm willing to bet your class(sl_object) is "SpatialLinesDataFrame".
EDIT: I had received the same error as OP, driving me to check class()
I am under the impression that the error that was populated for you is because you were trying to coerce a data frame into a data frame and R wasn't a fan of that.
