How to troubleshoot mislabeling of provinces in my shapefile in R?

I have a shapefile of the Philippines with the correct labels for each province. After removing the provinces I won't be using, aggregating my data into a single data frame, and attaching my covariates to the shapefile, I run into trouble: when I use tmap to create maps, the provinces are mislabeled, so the wrong data ends up attached to the wrong provinces. I am doing a spatio-temporal analysis with these data, so it is important that the provinces are in the correct locations.
I have tried reprojecting parts of the shapefile, but it doesn't seem to help.
#reading in the shapefile
library(rgdal)
shp <- readOGR(".", "province.csv")

#removing provinces not in my data from the shapefile
myshp82 <- shp
shp@data$prov <- as.character(shp@data$prov)
ind <- shp@data$prov %in% mydata$prov
shp.subset <- shp[ind, ]
#attaching covariates to the shapefile for plotting; myagg is my data frame.
#The shapefiles are divided into four different time periods.
myagg_time1 <- myagg[myagg$period == 1, ]
myagg_time2 <- myagg[myagg$period == 2, ]
myagg_time3 <- myagg[myagg$period == 3, ]
myagg_time4 <- myagg[myagg$period == 4, ]

myshptime1 <- myshptime2 <- myshptime3 <- myshptime4 <- shp
myshptime1@data <- merge(myshptime1@data, myagg_time1, by = 'prov', all.x = TRUE)
myshptime2@data <- merge(myshptime2@data, myagg_time2, by = 'prov', all.x = TRUE)
myshptime3@data <- merge(myshptime3@data, myagg_time3, by = 'prov', all.x = TRUE)
myshptime4@data <- merge(myshptime4@data, myagg_time4, by = 'prov', all.x = TRUE)
#descriptive maps. Here's the code I've been using for one of the maps.
Per1 <- tm_shape(myshptime1) +
  tm_polygons(c('total_incomeMed', 'IRA_depMean', 'pov'),
              title = c('Total Income', 'IRA', 'Poverty (%)')) +
  tm_facets(sync = TRUE, ncol = 3)
#sample data from my data frame "myagg"; the first (unnamed) column holds the province row names.
period counts total_income_MED IRA_depMean
Agusan del Norte.1 1 2 119.33052 0.8939136
Agusan del Norte.2 2 0 280.96928 0.8939136
Agusan del Norte.3 3 1 368.30082 0.8939136
Agusan del Norte.4 4 0 368.30082 0.8950379
Aklan.5 1 0 129.63132 0.8716863
Aklan.6 2 3 282.95535 0.8716863
Aklan.7 3 3 460.29969 0.8716863
Aklan.8 4 0 460.29969 0.8437920
Albay.9 1 0 280.12221 0.8696165
Albay.10 2 3 453.05098 0.8696165
Albay.11 3 1 720.40732 0.8696165
Albay.12 4 0 720.40732 0.8254676
Essentially, the above tmap code creates three maps for this time period side by side, one for each covariate ('total_incomeMed', 'IRA_depMean', 'pov'). That part works, but the provinces are mislabeled, and since the data is tied to the province name it ends up on the wrong polygons. I just need the provinces properly labeled!
Sorry if this doesn't make sense. Happy to clarify more if needed.
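For reference, one check that would narrow this down (a sketch using the objects above; base merge() sorts its result by the by column rather than preserving the original row order of @data, which silently detaches rows from their polygons):

#did the province order survive the merge?
before <- as.character(shp@data$prov)
merged <- merge(shp@data, myagg_time1, by = 'prov', all.x = TRUE)
identical(as.character(merged$prov), before)   #FALSE means the rows were reordered

#an order-preserving alternative: index the covariates with match() instead
#(column names taken from the sample data above)
idx <- match(before, as.character(myagg_time1$prov))
myshptime1@data <- cbind(shp@data, myagg_time1[idx, c('counts', 'total_income_MED', 'IRA_depMean')])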

Related

Convert lat and long dataframe to multiple spatial polygons in R

I have two problems I'm trying to solve; the first is the main one. Hopefully I've explained the second one decently.
1) My initial issue is creating a spatial polygon data frame from a tibble. For example, I have a tibble from the urbnmapr library that outlines the U.S. states, and I want to be able to plot spatial polygons for all 50 states. (Note: I have already made a map from these data in ggplot, but I specifically want spatial polygons to plot and animate in leaflet):
> states <- urbnmapr::states
> states
# A tibble: 83,933 x 10
long lat order hole piece group state_fips state_abbv state_name fips
<dbl> <dbl> <int> <lgl> <fct> <fct> <chr> <chr> <chr> <chr>
1 -88.5 31.9 1 FALSE 1 01.1 01 AL Alabama 01
2 -88.5 31.9 2 FALSE 1 01.1 01 AL Alabama 01
3 -88.5 31.9 3 FALSE 1 01.1 01 AL Alabama 01
...
2) Once I do this, I will want to join additional data from a separate tibble to the spatial polygons by state name. What would be the best way to do that, given that I have different data for each year? That is, with three years of data for the 50 states, should I create 150 different polygons (one per state per year), or keep 50 state polygons and store all the information in each so that I can make 3 different plots of all states for the different years?
I can propose the following (unchecked, because I don't have access to the urbnmapr package with my R version).
Problem 1
If you specifically want polygons, I think the best approach would be to join a data frame to an object that comes from a shapefile.
If you still want to do it on your own, you need to do two things:
Convert your tibble into a spatial object with a point geometry
Aggregate points by state
The sf package can do both. For the first step (the easy one), use the st_as_sf function.
library(sf)
library(dplyr)

states
states_spat <- states %>% st_as_sf(coords = c("long", "lat"))
For the second step, you will need to aggregate the geometries. I can propose something that will give you a MULTIPOINT geometry, not polygons; to convert to polygons, this thread may help (see also the sketch after the code below).
states_spat <- states_spat %>%
  group_by(state_name) %>%
  dplyr::summarise(x = n())
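For completeness, a rough sketch of that conversion to polygons (untested; it assumes the points in each group are already stored in drawing order, as urbnmapr outlines appear to be, and it assumes plain long/lat coordinates, hence crs = 4326):

states_poly <- states %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>%
  group_by(state_name, group) %>%
  summarise(do_union = FALSE) %>%
  st_cast("POLYGON")

This yields one POLYGON per ring, so states made of several pieces occupy several rows; you can regroup by state_name afterwards if you want one MULTIPOLYGON per state.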
Problem 2
That's a standard join on an attribute shared by your data and the spatial object (e.g. a state code). merge() or the *_join functions from dplyr work with sf objects just as they do with tibbles.
Incidentally, I think you are better off doing that than building your own polygons from a series of points.
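A minimal sketch of such a join (yearly_data is a hypothetical tibble with columns state_name, year and value): keep the 50 state geometries once, join the three years of data in long format, and filter by year at plotting time.

library(dplyr)
states_joined <- states_spat %>% left_join(yearly_data, by = "state_name")
states_y1     <- states_joined %>% filter(year == min(year))   #one year's map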

R Leaflet- Change density to column name of my own

I have been working on leaflet in R.
https://rstudio.github.io/leaflet/choropleths.html
The US map in the example above shades each state by population density, and the data are in GeoJSON format. I want to drop the density variable and instead pass my own column name with its corresponding values. (For example, when you hover over New Mexico you currently see density: 17.16; instead I want it to display mycolumnname: value.)
This is a pretty common need when working with leaflet. There are a few ways to do it, but this is the simplest in my mind:
All of the information you would like to plot is stored in the @data slot of the SpatialPolygonsDataFrame, states@data, which you can inspect by looking at the head of that data frame:
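(For reference, reproducing the relevant columns from the states2 output shown further below, the head looks roughly like this:)

head(states@data)
#   id     name density
# 0 01  Alabama  94.650
# 1 02   Alaska   1.264
# 2 04  Arizona  57.050
# ...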
I made a traditional R data frame using the state names from the original SpatialPolygonsDataFrame (named states in your code above) and created my_var:
a <- data.frame(States = states@data$name)
a$my_var <- round(runif(52, 15, 185), 2)
These are the first few rows of my new data frame, which is like yours but contains data other than density.
head(a)
States my_var
1 Alabama 120.33
2 Alaska 179.41
3 Arizona 67.92
4 Arkansas 30.57
5 California 72.26
6 Colorado 56.33
Now that you have this data frame, you can load the maptools library and do a polygon cbind as follows:
library(maptools)
states2 <- spCbind(states, a$my_var)
Now look at the head of states2 (you could instead overwrite the original states SpatialPolygonsDataFrame; I kept both to compare before and after):
head(states2@data)
id name density data.my_var
0 01 Alabama 94.650 58.01
1 02 Alaska 1.264 99.01
2 04 Arizona 57.050 81.05
3 05 Arkansas 56.430 124.68
4 06 California 241.700 138.19
5 08 Colorado 49.330 103.78
This added the data.my_var variable to the spatial data frame. Now you can use find/replace to go through your code and change the references to density to data.my_var, and the new variable will be used.
Important things to consider
Your data has 50 state names, but the spatial data frame has 52, so you will need to add the missing regions to your data frame before cbind-ing them; the two must be the same length AND in the same order.
If you grab the names like this:
a <- data.frame(States = states@data$name)
from the states object, you can then left-merge your data onto States; this keeps the order of a, and the cells for regions that have no data in your data set simply remain empty.
Use merge to be sure that the data lines up properly (adjust by.x/by.y to whatever your state-name columns are called):
a <- merge(a, your_data, by.x = "States", by.y = "name", all.x = TRUE)
Also, once they are merged and you have checked that states@data$name is in the same order as a$States, you can use any name you want as the new heading in the SpatialPolygonsDataFrame by extracting the data into a vector with the desired name before binding:
my_var <- a$my_var
states2 <- spCbind(states, my_var)
This will leave you with a data frame that looks like this:
id name density my_var
0 01 Alabama 94.650 58.01
1 02 Alaska 1.264 99.01
The shorter column name is easier to refer to from inside leaflet, without long strings such as data.my_var.
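A minimal sketch of the leaflet side (adapting the choropleth tutorial code linked above to the new column; the palette and bin settings are placeholders):

library(leaflet)
pal <- colorBin("YlOrRd", domain = states2$my_var, bins = 7)
labels <- sprintf("<strong>%s</strong><br/>my_var: %g",
                  states2$name, states2$my_var) %>%
  lapply(htmltools::HTML)

leaflet(states2) %>%
  addPolygons(fillColor = ~pal(my_var), weight = 1, color = "white",
              fillOpacity = 0.7, label = labels) %>%
  addLegend(pal = pal, values = ~my_var, title = "my_var")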

R Raster .tif File: selecting raster@data@attributes

I'm using R's raster package to access the USDA National Crop Data Layer.
This is a .tif file. Within the raster object is a set of @data@attributes
that sit in a data frame. These are like bands, but they are not bands.
Here's the Crop Data Layer for Colorado:
library(raster)
CDL <- raster("cdl_30m_r_co_2015_albers.tif")
And here's a snippet of the attributes I want to select from:
> CDL@data@attributes
[[1]]
ID COUNT Class.Names Opacity
1 0 0 Background 0
2 1 5133840 Corn 255
3 2 0 Cotton 255
4 3 0 Rice 255
I'd like to be able to select some of these attributes into a new raster to plot or do calculations on. (I think COUNT is the number of pixels of each class in the raster.) There are 256 attributes.
How might I do this?
So I have the answer, duh.
It is as simple as just doing a comparison on the raster for the value I want.
Corn <- CDL == 1
(A tif file in the raster sense, I think, is just a normal tif image with georeferencing. It is an x-rows-by-y-columns bitmap, and in this case the value of each pixel is an ID from the raster@data@attributes data frame's ID column.)
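A slightly more general sketch of the same idea (assuming the attribute table layout shown above): look the ID up by class name first, then do the comparison.

library(raster)
atts <- CDL@data@attributes[[1]]
corn_id <- atts$ID[atts$Class.Names == "Corn"]   #1 in the table above
Corn <- CDL == corn_id
plot(Corn)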
It helps to know that the file format itself contains the answer to the question.
Thanks

Shading counties using FIPS code in R map

I am looking for a way to shade counties on a US map in R. I have a list of numeric/character county FIPS codes that I can pass in as a parameter. I just need to highlight these counties, so I only need to shade them; there are no values or variations associated with the counties. I tried looking at
library(choroplethr)
library(maps)
and
county_choropleth(df_pop_county)
head(df_pop_county)
region value
1 1001 54590
2 1003 183226
3 1005 27469
4 1007 22769
5 1009 57466
6 1011 10779
But these need (region, value) pairs, e.g. FIPS code and population as above. Is there a way to call the county_choropleth function without supplying values, using just a data frame of FIPS codes? That way I could shade my FIPS codes with a single color. What would be an efficient way to accomplish this in R using choroplethr?
Here's an example using the maps library:
library(maps)
library(dplyr)
data(county.fips)
## Set up fake df_pop_county data frame
df_pop_county <- data.frame(region=county.fips$fips)
df_pop_county$value <- county.fips$fips
y <- df_pop_county$value
df_pop_county$color <- gray(y / max(y))
## merge population data with county.fips to make sure color column is
## ordered correctly.
counties <- county.fips %>% left_join(df_pop_county, by=c('fips'='region'))
map("county", fill=TRUE, col=counties$color)
Here's the resulting map:
Notice that counties with lower FIPS are darker, while counties with higher FIPS are lighter.
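If you only need to highlight a fixed set of FIPS codes in a single colour (no values at all), here is a sketch along the same lines, with the same ordering caveat (my_fips is a hypothetical input vector):

library(maps)
data(county.fips)
my_fips <- c(1001, 1003, 6037)   #the FIPS codes to highlight
cols <- ifelse(county.fips$fips %in% my_fips, "steelblue", "white")
map("county", fill = TRUE, col = cols)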

Merge a spatial point dataset with a spatial grid dataset using R (the master dataset is in SpatialPoints format)

I am working on spatial datasets using R.
Data Description
My master dataset is in SpatialPointsDataFrame format and has surface temperature data (columns "ruralLSTday" and "ruralLSTnight") for every month. A snippet of the data is shown below:
Master Data - (in SpatialPointsDataFrame format)
TOWN_ID ruralLSTday ruralLSTnight year month
2920006.11 2920006 303.6800 289.6400 2001 0
2920019.11 2920019 302.6071 289.0357 2001 0
2920015.11 2920015 303.4167 290.2083 2001 0
3214002.11 3214002 274.9762 293.5325 2001 0
3214003.11 3214003 216.0267 293.8704 2001 0
3207010.11 3207010 232.6923 295.5429 2001 0
Coordinates:
longitude latitude
2802003.11 78.10401 18.66295
2802001.11 77.89019 18.66485
2803003.11 79.14883 18.42483
2809002.11 79.55173 18.00016
2820004.11 78.86179 14.47118
I want to add columns to the above data for rainfall and air temperature. These data are held, for every month, in the SpatialGridDataFrame "secondary_data". A snippet of "secondary_data" is shown below:
Secondary Data - (in SpatialGridDataFrame format)
month meant.69_73 rainfall.69_73
1 1 25.40968 0.6283871
2 2 26.19570 0.4580542
3 3 27.48942 1.0800000
4 4 28.21407 4.9440000
5 5 27.98987 9.3780645
Coordinates:
longitude latitude
[1,] 76.5 8.5
[2,] 76.5 8.5
[3,] 76.5 8.5
[4,] 76.5 8.5
[5,] 76.5 8.5
Question
How do I add the columns from the secondary data to my master data by matching on latitude, longitude, and month? The latitude/longitude values in the two tables above will not match exactly, since the master data is a set of points and the secondary data is a grid.
Is there a way to find the grid square in the secondary data that each lat/long point of my master data falls into, and interpolate?
If your SpatialPointsDataFrame object is called x, and your SpatialGridDataFrame is called y, then
x <- cbind(x, over(x, y))
will add the attributes (grid cell values) of y, matched to the locations of x, to the attributes of x. The match is done point-in-grid-cell.
Interpolation is a different question; a simple approach would be inverse distance weighting with the four nearest neighbours, e.g. by
library(gstat)
x = idw(meant.69_73~1, y, x, nmax = 4)
Whether you want one or the other really depends on what your grid cell values mean: do they refer to (i) the point value at the grid cell centre, (ii) a value that is constant throughout the grid cell, or (iii) an average value over the whole grid cell? In the first case interpolate, in the second use over, and in the third use area-to-point interpolation (not explained here).
The raster package offers similar functionality, but uses different names.
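For reference, a sketch of the raster-package route (assuming y has first been reduced to a single month, so that each grid cell carries one meant.69_73 value):

library(raster)
r <- raster(y["meant.69_73"])    #one attribute of the grid as a RasterLayer
x$meant_simple   <- extract(r, x)                       #point-in-cell lookup
x$meant_bilinear <- extract(r, x, method = "bilinear")  #4-neighbour interpolation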
