R Leaflet: change density to a column name of my own

I have been working with leaflet in R.
https://rstudio.github.io/leaflet/choropleths.html
The US map above shows the density of each state. The data is in GeoJSON format. I want to remove the density variable and pass my own column name with its corresponding value. (For example, when I hover over New Mexico I get the density as 17.16 (density: 17.16); instead I want to display (mycolumnname: value).)

This is a pretty common need in working with leaflet. There are a few ways to do this, but this is the simplest in my mind:
All of the information you would like to plot is stored in the data slot of the SpatialPolygonsDataFrame, found at states@data, which you can see by looking at the head of this data frame:
I made a traditional R data frame using the state names from the original SpatialPolygonsDataFrame (named states in your code above) and created my_var.
a <- data.frame(States = states@data$name)
a$my_var <- round(runif(52, 15, 185), 2)
This is the first few rows of my new data frame, which is like yours but has data OTHER than density in it.
head(a)
      States my_var
1    Alabama 120.33
2     Alaska 179.41
3    Arizona  67.92
4   Arkansas  30.57
5 California  72.26
6   Colorado  56.33
Now that you have this data frame, you can load the maptools library and column-bind it to the polygons with spCbind as follows:
states2 <- spCbind(states, a$my_var)
Now look at the head of states2 (you could instead name it states and replace the original SpatialPolygonsDataFrame; I kept both to compare before and after):
head(states2@data)
  id       name density data.my_var
0 01    Alabama  94.650       58.01
1 02     Alaska   1.264       99.01
2 04    Arizona  57.050       81.05
3 05   Arkansas  56.430      124.68
4 06 California 241.700      138.19
5 08   Colorado  49.330      103.78
This added the data.my_var variable to the spatial data frame. Now you can use find/replace to go through your code and replace each reference to density with data.my_var, and the new variable will be used.
Important things to consider
Your data has 50 state names but the spatial data frame has 52, so you will need to add the missing regions to your data frame before binding. The two must be the same length AND in the same order.
If you grab the names like this:
a <- data.frame(States = states@data$name)
from the states object, you can then left-merge your data onto it by States. Every region in a is kept, and the cells for regions that have no data in your data set remain empty (NA).
Use merge to be sure that the data lines up properly (this assumes the state-name column in your_data is also called States). Note that merge() sorts the result by the key column by default, so check the row order afterwards.
a <- merge(a, your_data, by = "States", all.x = TRUE)
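If the merge reorders rows, one alternative is to align the values with match() instead, which keeps the polygon order by construction. The following is only a sketch: poly_names stands in for states@data$name, and your_data and its columns are hypothetical.

```r
# Hypothetical stand-ins: poly_names plays the role of states@data$name,
# your_data is a user table covering only some regions, in a different order.
poly_names <- c("Alabama", "Alaska", "Arizona",
                "District of Columbia", "Puerto Rico")
your_data <- data.frame(States = c("Arizona", "Alabama", "Alaska"),
                        my_var = c(67.92, 120.33, 179.41),
                        stringsAsFactors = FALSE)

a <- data.frame(States = poly_names, stringsAsFactors = FALSE)

# match() gives, for each polygon name, its row in your_data (NA if absent),
# so a keeps the polygon order and regions without data stay NA.
a$my_var <- your_data$my_var[match(a$States, your_data$States)]
```

Because a is built from the polygon names and never re-sorted, a$my_var can be bound to the spatial object without a separate order check.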
Also, once they are merged and you have checked that states@data$name is in the same order as a$States, you can use any name you want as the new heading in the SpatialPolygonsDataFrame by extracting the data into a vector with that name before binding:
my_var <- a$my_var
states2 <- spCbind(states, my_var)
This will leave you with a data frame which looks like this:
  id    name density my_var
0 01 Alabama  94.650  58.01
1 02  Alaska   1.264  99.01
This is easier to address as a column name from inside leaflet without long strings.
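To get the hover text to read (mycolumnname: value), the label strings can be built from the new column. This is a minimal sketch on a plain data frame standing in for states2@data; col_label is a hypothetical display name.

```r
# Stand-in for states2@data after the new column has been bound
d <- data.frame(name = c("Alabama", "Alaska"),
                my_var = c(58.01, 99.01),
                stringsAsFactors = FALSE)

col_label <- "mycolumnname"  # hypothetical display name for the hover text

# One HTML string per polygon: bold state name, then "label: value"
labels <- sprintf("<strong>%s</strong><br/>%s: %g",
                  d$name, col_label, d$my_var)
```

In the choropleth tutorial linked in the question, strings like these are wrapped with htmltools::HTML and passed to the label argument of addPolygons; the same should apply here with d replaced by the real spatial object.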

Related

Convert lat and long dataframe to multiple spatial polygons in R

I have two problems I'm trying to solve, the first issue is the main one. Hopefully I've explained the second one decently.
1) My initial issue is creating a spatial polygons data frame from a tibble. For example, I have a tibble that outlines U.S. states, from the urbnmapr library, and I want to be able to plot spatial polygons for all 50 states. (Note: I have already made a map from these data in ggplot, but I specifically want spatial polygons to plot and animate in leaflet):
> states <- urbnmapr::states
> states
# A tibble: 83,933 x 10
long lat order hole piece group state_fips state_abbv state_name fips
<dbl> <dbl> <int> <lgl> <fct> <fct> <chr> <chr> <chr> <chr>
1 -88.5 31.9 1 FALSE 1 01.1 01 AL Alabama 01
2 -88.5 31.9 2 FALSE 1 01.1 01 AL Alabama 01
3 -88.5 31.9 3 FALSE 1 01.1 01 AL Alabama 01
...
2) Once I do this, I will want to join additional data from a separate tibble to the spatial polygons by state name. What would be the best way to do that if I have different data for each year? For example, for the 50 states I have three years of data, so would I create 150 different polygons for the states across years, or keep 50 state polygons with all the information in each so that I can make 3 different plots of all states for the different years?
I can suggest the following (untested, because I don't have access to the urbnmapr package with my R version).
Problem 1
If you specifically want polygons, I think the best would be to join a dataframe to an object that comes from a shapefile.
If you still want to do it on your own, you need to do two things:
Convert your tibble into a spatial object with a point geometry
Aggregate points by state
The sf package can do both. For the first step (the easy one), use the st_as_sf function.
library(sf)
library(dplyr)  # provides %>%, group_by and summarise
# The coordinate columns in the tibble are named long and lat
states_spat <- states %>% st_as_sf(coords = c("long", "lat"))
For the second step, you need to aggregate the geometries. What I can suggest gives you a MULTIPOINT geometry, not polygons. To convert these into polygons, this thread may help
states_spat <- states_spat %>% group_by(state_name) %>%
dplyr::summarise(x = n())
Problem 2
That's a standard join on an attribute shared by your data and the spatial object (e.g. a state code). merge or the *_join functions from dplyr work with sf data as they do with tibbles. You have elements there
By the way, I think joining your data to existing polygons is a better route than creating your own polygons from a series of points.
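For the multi-year part of Problem 2, one option is to keep a single set of 50 shapes and join only the year being drawn. The sketch below uses plain data frames as a stand-in for the spatial object; yearly, its columns, and join_year are hypothetical names.

```r
# Stand-in for the attribute table of a spatial object (one row per state)
shapes <- data.frame(state_name = c("Alabama", "Alaska"),
                     stringsAsFactors = FALSE)

# Hypothetical yearly data in long format: one row per state and year
yearly <- data.frame(
  state_name = rep(c("Alabama", "Alaska"), times = 3),
  year = rep(2015:2017, each = 2),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Keep one set of shapes and attach only the rows for the requested year
join_year <- function(shapes, yearly, yr) {
  merge(shapes, yearly[yearly$year == yr, ], by = "state_name", all.x = TRUE)
}

map_2016 <- join_year(shapes, yearly, 2016)
```

This keeps the geometry stored once; each of the three plots calls join_year with a different year rather than tripling the polygons.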

Create data frame of names of 50 states in R

I'm working on a problem where I'm trying to map each state to a region for some data analysis. It seems the first thing I need to do is create a dataframe containing the names of all 50 states. Is there a way to do this without explicitly naming each state and inputting it into a row in the dataframe?
Sample data:
region_key <- as.data.frame("")
colnames(region_key) <- c("state")
region_key$region <- ""
region_key$state <- "AL"
I create an empty data frame, create a "state" and "region" column, then populate the state two letter abbreviations in the above fashion. Is there a way to both populate the data frame with the state abbreviations and classify by region (e.g. Alabama would be "South")?
Expected output:
head(region_key)
state region
1 AL South
Thanks in advance for your help!
Figured out my problem based on the comment from @alistair, thank you.
Sample data:
region_key <- data.frame(state.abb, state.region)
head(region_key)
state.abb state.region
1 AL South
2 AK West
3 AZ West
4 AR South
5 CA West
6 CO West
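A slightly fuller sketch, using only the state.abb, state.name and state.region datasets that ship with base R, adds the full name and shows how a region could be looked up by abbreviation. The column names and the lookup vector are my own choices.

```r
# Build the key from the datasets bundled with base R
region_key <- data.frame(
  state = state.abb,
  name = state.name,
  region = as.character(state.region),
  stringsAsFactors = FALSE
)

# Look up the region for a few abbreviations
lookup <- region_key$region[match(c("AL", "CA"), region_key$state)]
```

These built-in vectors cover the 50 states only, so the District of Columbia and the territories would have to be added by hand.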

How do I replace values in an R dataframe column with a corresponding value?

Ok, so I have a dataframe that I downloaded from Pew Research Center. One of the columns (called 'cregion') contains a series of numbers from 1-56, with each number corresponding to a geographic location in the U.S. Most of these locations are states, and the additional 6 are at the sub-state level. So, for example, the number '1' corresponds to 'Alabama', and '11' corresponds to the 'District Of Columbia'.
What I'd like to do is replace each of those numbers in the 'cregion' column with the ACTUAL name of the region it corresponds to. Unfortunately, there is no column in this data frame that I can use to swap the values, as the key for which number corresponds to which region exists completely separately (word document). I'm new to R and while I've been searching for a few hours for the best way to go about this, I can't seem to find a method that would work (or I just don't understand the explanation). Can anybody suggest a method to me?
If you have a vector of the state names as strings called statevec whose ith element corresponds to cregion i, and your data frame is named dat, just do
dat <- data.frame(cregion = sample(1:50), stuff = runif(50))
head(dat)
# cregion stuff
#1 25 0.665843896
#2 11 0.144631131
#3 13 0.691616240
#4 28 0.507454243
#5 9 0.416535139
#6 30 0.004196311
statevec <- state.name
dat$cregion <- statevec[dat$cregion]
head(dat)
# cregion stuff
#1 Missouri 0.665843896
#2 Hawaii 0.144631131
#3 Illinois 0.691616240
#4 Nevada 0.507454243
#5 Florida 0.416535139
#6 New Jersey 0.004196311
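Since the real key has 56 entries that do not line up with state.name (11 is the District Of Columbia, for instance), the key could be typed into its own lookup table and matched by code. This is a sketch with a two-row hypothetical excerpt of the key; only the codes 1 and 11 are taken from the question.

```r
# Hypothetical excerpt of the code book; the real key has 56 rows
key <- data.frame(code = c(1, 11),
                  label = c("Alabama", "District Of Columbia"),
                  stringsAsFactors = FALSE)

dat <- data.frame(cregion = c(11, 1, 11, 7))

# match() finds each code's row in the key; codes missing from the key give NA
dat$cregion_name <- key$label[match(dat$cregion, key$code)]
```

Writing to a new column instead of overwriting cregion keeps the original codes available for checking.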

Merge spatial point dataset with Spatial grid dataset using R. (Master dataset is in SP Points format)

I am working on spatial datasets using R.
Data Description
My master dataset is in SpatialPointsDataFrame format and has surface temperature data (column names - "ruralLSTday", "ruralLSTnight") for every month. Data snippet is shown below:
Master Data - (in SpatialPointsDataFrame format)
TOWN_ID ruralLSTday ruralLSTnight year month
2920006.11 2920006 303.6800 289.6400 2001 0
2920019.11 2920019 302.6071 289.0357 2001 0
2920015.11 2920015 303.4167 290.2083 2001 0
3214002.11 3214002 274.9762 293.5325 2001 0
3214003.11 3214003 216.0267 293.8704 2001 0
3207010.11 3207010 232.6923 295.5429 2001 0
Coordinates:
longitude latitude
2802003.11 78.10401 18.66295
2802001.11 77.89019 18.66485
2803003.11 79.14883 18.42483
2809002.11 79.55173 18.00016
2820004.11 78.86179 14.47118
I want to add columns in the above data about rainfall and air temperature - This data is present in SpatialGridDataFrame in the table "secondary_data" for every month. Snippet of "secondary_data" is shown below:
Secondary Data - (in SpatialGridDataFrame format)
month meant.69_73 rainfall.69_73
1 1 25.40968 0.6283871
2 2 26.19570 0.4580542
3 3 27.48942 1.0800000
4 4 28.21407 4.9440000
5 5 27.98987 9.3780645
Coordinates:
longitude latitude
[1,] 76.5 8.5
[2,] 76.5 8.5
[3,] 76.5 8.5
[4,] 76.5 8.5
[5,] 76.5 8.5
Question
How do I add the columns from the secondary data to my master data by matching on latitude, longitude and month? Currently the latitude/longitude information in the two tables above will not match exactly, as the master data is a set of points and the secondary data is a grid.
Is there a way to find the square of the grid on the "Secondary Data" that the lat/long of my master data falls into, and interpolate?
If your SpatialPointsDataFrame object is called x, and your SpatialGridDataFrame is called y, then
x <- cbind(x, over(x, y))
will add the attributes (grid cell values) of y matching to the locations of x, to the attributes of x. Match is done by point-in-grid cell.
Interpolation is a different question; a simple way would be inverse distance with the four nearest neighbours, e.g. by
library(gstat)
x = idw(meant.69_73~1, y, x, nmax = 4)
Whether you want one or the other really depends on what your grid cells mean: do they refer to (i) the point value at the grid cell center, (ii) a value that is constant throughout the grid cell, or (iii) an average value over the whole grid cell? First case: interpolate; second: use over; third: use area-to-point interpolation (not explained here).
The R package raster offers similar functionality, but uses different function names.
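The question also asks to match on month. To illustrate the point-in-grid-cell idea together with the month key, here is a base R sketch on plain data frames. It assumes a regular 1-degree grid whose cell centres sit at x.5 (as the 76.5, 8.5 coordinates suggest); the sample values, snap, and the cell_lon/cell_lat columns are hypothetical.

```r
# Assumed grid geometry: 1-degree cells with centres at x.5 (e.g. 76.5, 8.5)
res <- 1
snap <- function(v) floor(v / res) * res + res / 2  # centre of containing cell

# Hypothetical point data (stand-in for the master data)
points <- data.frame(TOWN_ID = c(1, 2),
                     longitude = c(76.2, 77.9),
                     latitude = c(8.7, 8.1),
                     month = c(1, 2))

# Hypothetical grid data: one row per cell centre and month
grid <- data.frame(longitude = c(76.5, 76.5, 77.5, 77.5),
                   latitude = 8.5,
                   month = c(1, 2, 1, 2),
                   rainfall = c(0.63, 0.46, 1.08, 4.94))

# Snap each point to its cell centre, then join on cell and month
points$cell_lon <- snap(points$longitude)
points$cell_lat <- snap(points$latitude)
merged <- merge(points, grid,
                by.x = c("cell_lon", "cell_lat", "month"),
                by.y = c("longitude", "latitude", "month"),
                all.x = TRUE)
```

With real sp objects, over() does the cell lookup for you; this sketch only shows the logic, and it treats each value as constant throughout its cell (case ii).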

What's the smart way to aggregate data?

Suppose there is a dataset of different regions, each region a subset of a state, and some outcome variable:
regions <- c("Michigan, Eastern",
"Michigan, Western",
"Minnesota",
"Mississippi, Northern",
"Mississippi, Southern",
"Missouri, Eastern",
"Missouri, Western")
set.seed(123)
outcome <- rpois(7, 12)
testset <- data.frame(regions,outcome)
regions outcome
1 Michigan, Eastern 10
2 Michigan, Western 11
3 Minnesota 17
4 Mississippi, Northern 12
5 Mississippi, Southern 12
6 Missouri, Eastern 17
7 Missouri, Western 13
A useful tool would aggregate each region and add, or take the mean or maximum, etc. of outcome by region and generate a new data frame for state. A sum, for example, would output this:
state outcome
1 Michigan 21
3 Minnesota 17
4 Mississippi 24
6 Missouri 30
The aggregate() function won't solve this problem. Is there something else in R that is built for this? It seems like grep could be used to generate the new column "states" as part of an application specific program. Seems like this would already be out there somewhere though.
The reason this isn't straightforward is that the structure of your data is not consistent, so you couldn't build a library function just for it.
Your state, region column is basically an index column, and you want to index across part of it. tapply is designed for this, but there's no reason to build in a function to do it automatically for this specific scenario. You could do it without creating the column though
tapply(outcome,gsub(",.*$","",testset$regions),sum)
The gsub call strips the comma and everything after it, leaving only the state name, which tapply then uses as the index.
PS: you have a slight typo in your example, your data.frame should be
testset <- data.frame(regions,outcome)
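For a data frame result like the one shown in the question, aggregate() can also be used once the state has been derived as its own column. A minimal sketch, with the outcome values copied from the printed example rather than regenerated:

```r
regions <- c("Michigan, Eastern", "Michigan, Western", "Minnesota",
             "Mississippi, Northern", "Mississippi, Southern",
             "Missouri, Eastern", "Missouri, Western")
outcome <- c(10, 11, 17, 12, 12, 17, 13)  # copied from the printed example
testset <- data.frame(regions, outcome, stringsAsFactors = FALSE)

# Derive the state by dropping the comma and everything after it
testset$state <- sub(",.*$", "", testset$regions)

# Sum the outcome within each state; FUN could equally be mean or max
by_state <- aggregate(outcome ~ state, data = testset, FUN = sum)
```

The result has one row per state with columns state and outcome.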
