How to filter for nearby geometries in R

I want to create a subset of census data for nearby political boundaries. For example, starting from zip code 17954, how do I filter a data frame containing all zip codes in the US down to just the zip codes within a 10-mile radius?
I believe I would be using tigris and dplyr for this. From the documentation it seems I should use sf's st_filter() function to do so.
states = unique(FoodAccessResearchAtlasData2019$State)
state_list = vector("list", length = length(states))
for (i in seq_along(states)) {
  # the tigris argument cb = TRUE doesn't seem to work with the year argument,
  # so this loop works around that
  dat = zctas(state = states[i], year = 2010)
  state_list[[i]] = dat
}
# combining into one large zip code data frame
zipcodes = do.call(rbind, state_list)

nearby_zips = zipcodes %>%
  st_filter(zipcodes$ZCTA5CE10[17603],          # this is what I imagine is the point to filter from
            .predicate = st_is_within_distance, # honestly just copying from the documentation
            dist = 10)                          # within 10 miles? I don't know the base unit for dist
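A minimal sketch of one possible answer (an editorial addition, not from the original post): it assumes the zipcodes object built above and a reasonably recent sf, where distances for unprojected ZCTA geometries are computed on the sphere and dist is taken in metres.
library(sf)
library(dplyr)
# Sketch only: take the geometry of the target ZCTA, then keep every ZCTA
# within 10 miles (~16093 m) of it. ZCTA5CE10 is a character column, so
# compare against the zip code as a string rather than indexing by row number.
target <- zipcodes %>% filter(ZCTA5CE10 == "17954")
nearby_zips <- zipcodes %>%
  st_filter(target,
            .predicate = st_is_within_distance,
            dist = 16093)  # 10 miles expressed in metres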

Convert List of lists to data frame where each list within the list are the results from using Sapply + decompose on multiple columns

This is my first project in a coding environment, so I may not phrase things accurately. I am building an ARIMA forecast.
I want to forecast for multiple sectors (business areas) at a time. Using help forums I have managed to write code that takes my time series data as input, fits the model, and sends the outputs to CSV. I am happy with this.
My problem is that I would also like to capture the results from the decomposition analysis at a sector level. Currently, when I use a solution I found elsewhere, it outputs to CSV in a format that is unusable: everything is spread across rows, and the different lists end up half in one row and half in another.
Thanks in advance!
My current solution (probably not super efficient, but as I say, cobbled together from forum tips):
## Clean data down to a time series (TS)
NLDemand <- read_excel("TS Demand 2018 + Non London no lockdown.xlsx")
NLDemand <- as_tibble(NLDemand)
NLDemand <- na.omit(NLDemand)
NLDemand <- subset(NLDemand, select = -c(Month,Year))
NLDemand <- subset(NLDemand, select = -c(YearMonth))
## this gets the data to a point where each column has a header of business sector and the time series data underneath it, with no categorical columns left, e.g.:
Sector 1a, sector1b, sector...
500,450,300
450,500,350
...,...,...
## Season capture for all sectors
tsData <- sapply(NLDemand, FUN = ts, simplify = FALSE, USE.NAMES = TRUE, start = c(2018, 1), frequency = 12)
tsData
timeseriescomponents <- sapply(tsData, FUN = decompose, simplify = FALSE, USE.NAMES = TRUE)
timeseriescomponents
This produces a list of lists, where each sublist contains the decomposed elements of that sector's time series.
## Convert all season captures to the same length
TSC <- list(timeseriescomponents[1:41])
n.obs <- sapply(TSC, length)
seq.max <- seq_len(max(n.obs))
mat <- t(sapply(TSC, "[", i = seq.max ))
##Export to CSV
write.csv(mat, "Non london 2018 + S-T componants.csv", row.names=FALSE)
What I want as an output is a table that shows each component as a column.
(Images in the original post showed the desired output format and a sample of the current output.)
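One possible sketch (an editorial addition, not from the original post) for getting each component as a column: it assumes timeseriescomponents is the named list built above and flattens it into one long data frame, one row per sector and period.
# Sketch only: decompose() returns x (observed), trend, seasonal and random
# components; bind them into a single data frame and write it to CSV.
component_rows <- lapply(names(timeseriescomponents), function(sector) {
  dec <- timeseriescomponents[[sector]]
  data.frame(
    sector   = sector,
    period   = as.numeric(time(dec$x)),
    observed = as.numeric(dec$x),
    trend    = as.numeric(dec$trend),
    seasonal = as.numeric(dec$seasonal),
    random   = as.numeric(dec$random)
  )
})
component_df <- do.call(rbind, component_rows)
write.csv(component_df, "Non london 2018 + S-T components.csv", row.names = FALSE)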

How to create a function to retrieve multiple cities historical weather using R and DARKSKY api?

I'm trying to retrieve historical weather data for 100 cities in R using the DARKSKY API.
The following code works to get historical data for one city; however, I'm having trouble writing a loop that goes through a list of 100 latitudes and longitudes and spits out the data.
weather <- function(Long, Lat) {
  a <- seq(Sys.Date() - 10, Sys.Date(), "1 day") %>%
    map(~ get_forecast_for(Long, Lat, .x, units = 'si')) %>%
    map_df('daily')
  write.csv(a, "blah blah")
}
weather(52.6983,-1.0735)
My initial thought was to upload a csv file with all the longitudes and latitudes I require, set them as variables, and then map them to the function above.
data <- read.csv("blah blah")
Long <- data$Longitude
Lat <- data$Latitude
map(c("Long","Lat"),weather)
But it keeps bringing back error messages.
Can anyone help please?
Thank you
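(As a side note, not from the original thread: the immediate error in the attempt above is that map(c("Long","Lat"), weather) passes the literal strings "Long" and "Lat" rather than the two numeric vectors. A minimal sketch of iterating over both vectors with the weather() function defined above would be:)
library(purrr)
# Sketch only: call weather() once per coordinate pair.
# map2() walks the two vectors in parallel; Map(weather, Long, Lat) is a base R equivalent.
map2(Long, Lat, weather)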
You are almost there. There are a couple of things needed to iterate the get_forecast_for function by rows. From the purrr package, the pmap function is good for repeating a function across the rows of a data frame, whereas imap iterates over a vector together with its names or indices.
Using this approach, I wrote two functions: weather_at_coords and weather. weather_at_coords sends a request to the DarkSky API for the weather at a specific location in a given time range (i.e., the last ten days). The weather function repeats that function by row.
I saw that you wanted the nested object daily, so I wrote the function to extract that list from the response. I'm assuming that you also want the results in a data.frame, so I added bind_rows. I added a column id so that rows can be properly linked to a location (you can add any columns that you like).
# pkgs
library(tidyverse)
library(darksky)
# set API Key: free from https://darksky.net/dev
darksky::darksky_api_key()
# Forecast at a given point and time period
weather_at_coords <- function(...) {
  d <- rlang::list2(...)
  time <- seq(Sys.Date() - 10, Sys.Date(), "1 day")
  response <- imap(time, ~ darksky::get_forecast_for(d$lat, d$lon, .x, units = "si")[["daily"]])
  out <- bind_rows(response) %>% mutate(id = d$id)
  return(out)
}
# primary function (iterates across rows)
weather <- function(data) {
  result <- pmap(data, ~ weather_at_coords(...))
  return(bind_rows(result))
}
# sample data
d <- data.frame(
  id = c("a", "b"),
  lat = c(37.8267, 34.8267),
  lon = c(-122.423, -120.423)
)
# run
x <- weather(d)
x
Notes
Make sure you have the rlang package installed
Adjust the lat and lon variable names as required.

How to write a loop that creates a cropped raster for every ID of a shapefile, using a base raster?

I'm still new to R and don't know how to create a loop for my work process to make it more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif), plus a shapefile for lakes and one for buffers, each with 10 IDs in its attribute table.
In the script below I created a new raster file for all values of the lake and the buffer shapefile, using the data from the DEM raster. This works fine.
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_5m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want to have a raster file for every ID of the lake and the buffer shapefile separately, so 2x10 files.
I thought it would be best to write a loop for this, but my skills are not good enough so far to do this.
Other questions didn't bring me to a solution so far; I tried to use this to help me.
Alternatively, I could take the end-product tif from the script above and split it back into files for every ID.
I want to write the loop rather than do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile containing more values.
I found a solution now, by extracting the data by ID.
It creates a large list with 11 elements containing all values for each ID, which is sufficient for my further work. You can also directly compute the mean, max, min, etc. values of each element (so each ID).
k <- Barrow_DTLB$ID  # k = vector of IDs (one per row)
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun = mean)
Maybe this solution is also helpful for a few who already viewed the question.
I think this should work:
for (i in unique(raster_lake)) {
  r <- raster_lake
  r[!(values(r) == i)] <- NA
  r <- trim(r)
  writeRaster(r, paste0("raster_lake_", i, ".tif"))
}
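An alternative sketch (an editorial addition, not from the original answers, and untested) that loops over the polygon IDs directly, assuming Barrow_DTLB has an ID column with one feature per ID; the buffer shapefile could be handled the same way.
# Sketch only: crop and mask the DEM to each lake polygon, writing one GeoTIFF per ID.
for (id in unique(Barrow_DTLB$ID)) {
  poly_i <- Barrow_DTLB[Barrow_DTLB$ID == id, ]
  r_i    <- crop(Barrow_5m, extent(poly_i))
  r_i    <- rasterize(poly_i, r_i, mask = TRUE)
  writeRaster(r_i, paste0("raster_lake_", id, ".tif"), overwrite = TRUE)
}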

How to group repeating sequences of numbers using R

The simplest description of what I am trying to do is that I have a column in a data.frame like 1,2,3,...,n, 1,2,3,...,n,... and I want to group the first 1...n as 1, the second 1...n as 2, and so on.
The full context is: I am using the R spcosa package to do equal-area stratification composite sampling on parcels of land. I start with a shapefile from a GIS that contains a number of polygons (land parcels). The end result I want is a GIS file with the strata and sample locations, with each stratum and sample location labeled by land parcel, stratum and sample id. So far I can do all of this except one bit, which is identifying the stratum that each sample belongs to and including it in the sample label. The sample label needs to look like "parcel#-strata#-composite#" (where # is the number). In practice I don't need this actual label, just these values as separate attributes in the GIS file.
The basic workflow is as follows.
For each individual polygon, I use spcosa::stratify to divide it into a number of equal-area strata like
strata.CSEA <- stratify(poly[i,], nStrata = n, nTry = 1, equalArea = TRUE, nGridCells = x)
Note spcosa::stratify generates a CompactStratificationEqualArea object. I coerce this to a SpatialPixelsDataFrame and then use rasterToPolygons to be able to output it as a GIS file.
I then generate the sample locations as follows:
samples.SPRC <- spsample(strata.CSEA, n = n, type = "composite")
spcosa::spsample creates a SamplingPatternRandomComposite object. I coerce this to a SpatialPointsDataFrame
samples.SPDF <- as(samples.SPRC, "SpatialPointsDataFrame")
and add two columns to the @data slot
samples.SPDF@data$Strata <- "this is the bit I can't do yet"
samples.SPDF@data$CEA <- poly[i,]$name
I can then write samples.SPDF as a GIS file (i.e. with writeOGR) with all the wanted attributes.
As above, the part I can't sort out is how the sample ids relate to the strata ids. The sample points are a vector like 1,2,3...n, 1,2,3...n,... How do I extract which sample goes with which stratum? As the actual strata numbers are arbitrary, I could just group (as per my simple question above), but ideally I would like to use the numbering of the actual strata so everything lines up.
To give any contributors access to a hands on example I copy below the code from the spcosa documentation slightly modified to generate the correct objects.
# Note: the example below requires the 'rgdal'-package You may consider the 'maptools'-package as an alternative
if (require(rgdal)) {
  # read a vector representation of the `Farmsum' field
  shpFarmsum <- readOGR(
    dsn = system.file("maps", package = "spcosa"),
    layer = "farmsum"
  )
  # stratify `Farmsum' into 50 strata
  # NB: increase argument 'nTry' to get better results
  set.seed(314)
  myStratification <- stratify(shpFarmsum, nStrata = 50, nTry = 1, equalArea = TRUE)
  # sample two sampling units per stratum
  mySamplingPattern <- spsample(myStratification, n = 2, type = "composite")
  # plot the resulting sampling pattern on
  # top of the stratification
  plot(myStratification, mySamplingPattern)
}
Maybe the order() function can help you:
n <- 10
dat <- data.frame(col1 = rep(1:n, 2), col2 = rnorm(2*n))
head(dat)
dat[order(dat$col1), ]
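(For the simple grouping described at the top of the question, i.e. labelling the first 1...n run as group 1, the second as group 2, and so on, here is a minimal sketch assuming each run restarts at 1; this is an editorial addition, not from the original answers.)
# Sketch only: a new run starts wherever the counter resets to 1,
# so the cumulative count of those reset points gives the group number.
dat$group <- cumsum(dat$col1 == 1)
head(dat)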
I did not get where the "ID" (1,2,3...n) is to be found, so let's assume you have your SpatialPolygonsDataFrame called shpFarmsum with an attribute data column "ID". You can access this column via shpFarmsum$ID. Therefore, if you want to create individual subsets for each ID, this is one way to go:
for (i in unique(shpFarmsum$ID)) {
  tempSubset <- shpFarmsum[shpFarmsum$ID == i, ]
  writeOGR(tempSubset, ".", paste0("subset_", i), driver = "ESRI Shapefile")
}
I added the writeOGR(...) line so all subsets are written to your working directory. However, you can change this line or add further analysis inside the for loop.
How it works
unique(shpFarmsum$ID) extracts all occurring IDs (comparable to your 1,2,3...n).
In each repetition of the for loop, another of these IDs is used to create a subset of the whole SpatialPolygonsDataFrame, which you can use for further analysis.

How can I batch geocode street addresses from a csv file in R?

Edit: answer below.
Batch geocoding can be done like this using ggmap (file names are mine). Code was adapted from David Smith's Revolutions blog post.
library(ggmap)
#Read in csv file
FDNYHouse = read.csv("Path to your csv file here.csv")
#Get column header names if you don't already have them
names(FDNYHouse)
#Create a file of just addresses that need to be geocoded
#You can require a state by replacing State below with New York if state was missing
#Everything inside paste() is a column header from the csv file
FDNYAddresses = with(FDNYHouse, paste(FacilityAddress, Borough, State, sep = ","))
#Now we can geocode the addresses
FDNYLocations = geocode(FDNYAddresses)
#The FDNYLocations file will have a lon and lat column representing your geocoded data
#My next problem is getting the shape file projection to match my geocoded points
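(A possible sketch for that last projection step, added editorially and not part of the original post: it assumes the sp/rgdal toolchain and a hypothetical censusTracts object holding the tract shapefile.)
library(sp)
library(rgdal)  # provides the projection machinery used by spTransform
# Sketch only: promote the geocoded lon/lat columns to spatial points (Google
# returns WGS84 coordinates) and re-project them to the tract shapefile's CRS.
# `censusTracts` is a hypothetical SpatialPolygonsDataFrame read from the NYC shapefile.
coordinates(FDNYLocations) <- ~ lon + lat
proj4string(FDNYLocations) <- CRS("+proj=longlat +datum=WGS84")
FDNYLocations_proj <- spTransform(FDNYLocations, CRS(proj4string(censusTracts)))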
I have created a census tract map of NYC using ggplot2 and a shape file. Next, I'd like to create a data frame using street addresses of fire houses to lay over the top of the map using a csv file I downloaded here:
FDNY Firehouse Locations
The shapefile for census tracts is located here (it's the 2010 version in black):
NYC Shape File
My problems are that the data doesn't list city and state, and that I don't know how to write a function that can grab these addresses and geocode them with Google using something like ggmap.
Any advice or nudges in the right direction would be appreciated. I'm new to R and stackoverflow so go easy on me.
Edit: Did anyone who marked this as already asked either A) look at my actual data or B) realize that the question you think I repeated is 3 years old? Guess nothing new has happened in R in the last 3 years right? The world is flat, move along folks. /rant
I can use ggmap and the geocode() function to get lat and lon without creating a function to do it.
#As an example
install.packages("ggmap")
library(ggmap)
geocode("San Francisco")
The problem, again, is how to tell R to read my csv file, which is missing city and state data, so that it can create the 200+ lat and lon measurements I need without me having to geocode one address at a time.
The second issue is then taking this data, making a data frame and adding it to the NYC shape file I already have.
That answer from 3 years ago is complicated and confusing for someone without the experience most people who looked at this post have...I also believe it doesn't answer my question.
I recently solved a similar problem. Below are two pieces of code. The first function converts addresses to lat/lon (if you can't abide by Google's terms of use, look for the Data Science Toolkit as a good standalone alternative for geo-coding.) The second function looks at a given lat/lon pair and determines which polygon (Census tract) contains those coordinates. Very useful for doing choropleth maps.
library("RJSONIO") #Load Library
library("plyr")
library("RODBC")
library(maptools)
getGeoCode <- function(gcStr) {
  gcStr <- gsub(' ', '%20', gcStr) # Encode URL parameters
  # Open connection
  connectStr <- paste('http://maps.googleapis.com/maps/api/geocode/json?address=', gcStr, sep = "")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con, warn = FALSE), collapse = ""))
  close(con)
  # Flatten the received JSON
  data.json <- unlist(data.json)
  if (data.json["status"] == "OK" && data.json["results.geometry.location_type"] == "ROOFTOP") {
    address <- data.json["results.formatted_address"]
    lat <- data.json["results.geometry.location.lat"]
    lon <- data.json["results.geometry.location.lng"]
    gcodes <- data.frame("Address" = address, "Lon" = as.numeric(lon), "Lat" = as.numeric(lat))
    return(gcodes)
  } else return()
}
# Testing...
geoCodes <- getGeoCode("Palo Alto,California")
geoCodes
# "-122.1430195" "37.4418834"
# Required for TractLookup
Washington <- readShapePoly("g:/USCensus/tl_2012_53_tract/tl_2012_53_tract")
# US Census tract files (includes shape and data files)
tractLookup <- function(x) {
  # pt <- SpatialPoints(data.frame(x = -80.1, y = 26.3))
  pt <- SpatialPoints(data.frame(x = x$Lon, y = x$Lat))
  Mapping <- over(pt, Washington) # what index number does pt fall inside?
  Mapping <- data.frame(
    "GEOID"      = as.character(Mapping$GEOID),
    "State"      = as.character(Mapping$STATEFP),
    "County"     = as.character(Mapping$COUNTYFP),
    "Tract"      = as.character(Mapping$TRACTCE),
    "Tract_Name" = as.character(Mapping$NAME),
    "INTPTLAT"   = as.character(Mapping$INTPTLAT),
    "INTPTLON"   = as.character(Mapping$INTPTLON),
    stringsAsFactors = FALSE)
  Mapping[is.na(Mapping)] <- "NULL"
  return(Mapping)
}
tractLookup(data.frame("Lon" = -122, "Lat" = 47.5))
# GEOID State County Tract Tract_Name INTPTLAT INTPTLON
# 1 53033032102 53 033 032102 321.02 +47.4851507 -121.9657839
Looking at the New York fire department shape file, you should be able to change the mapping statement to look for and return the appropriate fields in place of the GEOID and tract information from the standard US Census shape file in my example.
Try it this way.
# Geocoding a csv column of "addresses" in R
#load ggmap
library(ggmap)
# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
for (i in 1:nrow(origAddress)) {
  # Print("Working...")
  result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])
}
# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
