Can't replicate R tidysynth results - r

This is an odd sort of question, but I'm using the exact data and code from this github here: https://github.com/edunford/tidysynth
However, when I get to the code plotting trends, it turns out I'm plotting the trends for a synthetic vs. real Alabama, rather than the synthetic vs. real California, which is what is supposed to be happening -- and what the GitHub results show!!
I am running this code in Rstudio on a laptop. How can the results be different? Why would it be showing a synthetic AL rather than synthetic CA?
All my code is below. Can somebody sanity-check me here -- do you get a trend for CA or for AL? I feel like I'm going crazy.
#install.packages("devtools")
#devtools::install_github("edunford/tidysynth")
require(tidysynth)
library(dplyr)
data("smoking")
smoking %>% dplyr::glimpse()
unique(smoking$state)
smoking_out <-
  smoking %>%
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = TRUE
  ) %>%
  generate_predictor(time_window = 1980:1988,
                     ln_income = mean(lnincome, na.rm = TRUE),
                     ret_price = mean(retprice, na.rm = TRUE),
                     youth = mean(age15to24, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1984:1988,
                     beer_sales = mean(beer, na.rm = TRUE)) %>%
  generate_predictor(time_window = 1975,
                     cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988,
                     cigsale_1988 = cigsale) %>%
  generate_weights(optimization_window = 1970:1988,
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6
  ) %>%
  generate_control()
smoking_out %>% plot_trends()
## This is the plot that is CLEARLY labeled as "Difference in synthetic control and observed Alabama"
smoking_out %>% plot_differences()
Oh -- And also, if I change generate_placebos=TRUE to FALSE in the synthetic_control() specifications, it doesn't run. (I was checking to see if it was stalling on another state via the placebo runs.)

Related

Spatial modelling temperature for specific locations (predictions and plots)

I have a dataset of max temperatures from a number of cities in the UK in 2020, to which I want to fit a spatial model. I have fitted a model but am struggling to predict for the locations I need, which in this instance are Morecambe, Coventry and Kinross on September 12th 2020. Does anyone know how I would create and plot predictions for these locations on this day? The data is available from the links, and all the code needed to replicate is below:
maxtemp data; https://drive.google.com/file/d/1s9yBHsgaFRlF38CgiXCf_vum1DyhEbz4/view?usp=sharing
metadata(location) data; https://drive.google.com/file/d/1s9yBHsgaFRlF38CgiXCf_vum1DyhEbz4/view?usp=sharing
library(dplyr)
library(tidyr)
library(geoR)
#converting data to long format and combining both dataframes
MaxTemp %>%
  pivot_longer(., Machrihanish:Lyneham, names_to = "Location") %>%
  full_join(., metadata) -> MaxTemp_df
#renaming value column to temperature
MaxTemp_df = MaxTemp_df %>%
  rename(Temp = 'value')
#converting our data to geodata
geo_MaxTemp_df = as.geodata(MaxTemp_df, coords.col = 4:5, data.col = 3)
full = dup.coords(geo_MaxTemp_df)
geo_MaxTemp_df2 = jitterDupCoords(geo_MaxTemp_df, max = 0.1, min = 0.05)
#converting to a variogram
vario = variog(geo_MaxTemp_df2, option = 'bin')
#fitting a basic parametric model
defult_mod = variofit(vario)
#creating predictions
pred_grid = expand.grid(seq(140, 150, by = 0.5), seq(30.5, 40.1, by = 0.5))
preds = krige.conv(geo_MaxTemp_df2, loc = pred_grid, krige = krige.control(obj.model = defult_mod))
In the final step I have created predictions, but these are for the whole of my data and not what I am looking for. They need to be edited, but I'm not sure how.

Problem programming with dplyr - error saying object not found in one part of pipe but working earlier

I have a working function with an if/else that was largely duplicated code, so I'm trying to get rid of the duplication by using an inline if/else statement. What's strange to me is that the same snippet works in one place in the code but not in the other.
library(dplyr)
library(highcharter)
plot_highchart <- function(.data,
                           group_by_variable = TRUE,
                           x_value = "Year",
                           y_value = "total",
                           .group = service,
                           .stacking = "normal",
                           chart_type = "column") {
  .data %>%
    # this next line works. If you comment out the hchart part it will group by and summarize
    group_by(Year, if (group_by_variable == TRUE) !!rlang::enquo(.group) else NULL) %>%
    summarize(total = sum(Spending)) %>%
    hchart(chart_type, hcaes(x = !!rlang::ensym(x_value),
                             y = !!rlang::ensym(y_value),
                             group = if (group_by_variable == TRUE) !!rlang::ensym(.group) else NULL))
  # same bit as before but I get an error
}
Here's the error I get when I try to run this:
Error: Problem with `mutate()` input `group`.
x object 'group_by_variable' not found
i Input `group` is `if (group_by_variable == TRUE) service else NULL`.
Which I find strange because group_by_variable was found before. Not really sure where to go from here.
Here's the dput on the data:
structure(list(Year = c(2016, 2016, 2016, 2016, 2016, 2016),
service = structure(c(10L, 10L, 10L, 10L, 10L, 10L), .Label = c("Defense Logistics Agency",
"Chemical and Biological Defense Program", "Defense Information Systems Agency",
"United States Special Operations Command", "Office of the Secretary Of Defense",
"Missile Defense Agency", "Defense Advanced Research Projects Agency",
"Navy", "Army", "Air Force"), class = "factor"), Spending = c(0.803,
0.628, 0.2, 23.72, 4.782, 12.152)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L))
hcaes() captures the expression you provide to group and delays its evaluation. However, the expression undergoes a series of changes in the highcharter package. One of these steps assigns the global environment to be the evaluation context, which then causes the R interpreter to look for group_by_variable in the global scope, and not in your function where it is defined.
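The scoping problem can be reproduced in base R without highcharter. The following is only a minimal sketch of the mechanism described above: `capture_and_eval_globally()` and `wrapper()` are hypothetical stand-ins, not highcharter functions, and the real package goes through more steps than this.

```r
# Hypothetical stand-in for a function that captures an expression
# and later evaluates it in the global environment
capture_and_eval_globally <- function(expr) {
  captured <- substitute(expr)          # grab the unevaluated expression
  eval(captured, envir = globalenv())   # evaluate it in the global scope
}

wrapper <- function(use_grouping = TRUE) {
  # use_grouping exists only inside wrapper(), so the global lookup fails
  tryCatch(
    capture_and_eval_globally(if (use_grouping) "grouped" else "ungrouped"),
    error = function(e) conditionMessage(e)
  )
}

result <- wrapper()
result
```

Here `result` holds the error message, which complains that `use_grouping` cannot be found, even though the argument is clearly defined inside `wrapper()`.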
One workaround is to pull the if statement outside hcaes(), so group_by_variable doesn't get captured by the function as part of the expression to evaluate:
plot_highchart <- function(.data,
                           group_by_variable = TRUE,
                           x_value = "Year",
                           y_value = "total",
                           .group = service,
                           .stacking = "normal",
                           chart_type = "column") {
  g <- if (group_by_variable == TRUE) list(group = rlang::ensym(.group)) else NULL
  .data %>%
    group_by(Year, !!g[[1]]) %>%
    summarize(total = sum(Spending)) %>%
    hchart(chart_type, hcaes(x = !!rlang::ensym(x_value),
                             y = !!rlang::ensym(y_value),
                             !!!g))
}
plot_highchart( .data ) # Works
plot_highchart( .data, group_by_variable=FALSE ) # Also works
Here, I am also storing the grouping symbol inside a named list and then using that list with !!!. This is necessary to handle the group_by_variable == FALSE case, because:
hcaes( x = ..., y = ... ) # Works
hcaes( x = ..., y = ..., group = NULL ) # Doesn't
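To see what the !!! splice does in each case, here is a small sketch using rlang only. `capture_aes()` is a hypothetical helper that merely captures its arguments; it stands in for hcaes() and is not part of highcharter.

```r
library(rlang)

# Hypothetical helper standing in for hcaes(): it only captures its arguments
capture_aes <- function(...) enquos(...)

g_on  <- list(group = sym("service"))  # grouping requested
g_off <- NULL                          # grouping switched off

# Splicing the named list adds a `group` argument
args_on <- names(capture_aes(x = Year, y = total, !!!g_on))

# Splicing NULL adds nothing, so no `group = NULL` is ever passed
args_off <- names(capture_aes(x = Year, y = total, !!!g_off))

args_on
args_off
```

With the list, the captured arguments are x, y and group; with NULL, only x and y remain, which matches the call form that works.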

Formatting an ftable in R

I have the following 3 way table I created in R.
with(dataset, ftable(xtabs(count ~ dos + sex + edu)))
The output looks like
                           edu     high     low  medium unknown
dos                 sex
five-to-ten-years   female       247776   44916  127133   23793
                    male         225403   37858  147821   20383
five-years-or-less  female       304851   58018  182152   33649
                    male         253977   55720  193621   28972
more-than-ten-years female       709303  452605  539403  165675
                    male         629162  309193  689299  121336
native-born         female      1988476 1456792 2094297  502153
                    male        1411509 1197395 2790522  395953
unknown             female        57974   75480   73204  593141
                    male          40176   57786   93108  605542
I want to rename the variables and format the table so that I can include it in a report. I know that I can use dnn to rename the variables, but are there any other recommendations to rename the variables? And to format the table (similar to using kable)?
You could convert the output to a text matrix using the following function, after which you can style with kable however you choose:
ftab_to_matrix <- function(ft)
{
  row_vars <- attr(ft, "row.vars")
  for(i in seq_along(row_vars)){
    row_vars[[i]] <- c(names(row_vars[i]), row_vars[[i]])}
  rowvar_widths <- sapply(row_vars, function(x) max(nchar(x))) + 1
  col_vars <- attr(ft, "col.vars")
  rowvar_widths <- c(1, cumsum(c(rowvar_widths, max(nchar(names(col_vars))))))
  ft_text <- capture.output(print(ft))
  row_cols <- sapply(seq_along(rowvar_widths)[-1], function(x)
    substr(ft_text, rowvar_widths[x - 1], rowvar_widths[x]))
  ft_text <- substr(ft_text, rowvar_widths[length(rowvar_widths)] + 2, 100)
  ft_breaks <- c(1, cumsum(lapply(strsplit(ft_text[length(ft_text)], "\\d "),
                                  function(x) nchar(x) + 2)[[1]]))
  col_cols <- sapply(seq_along(ft_breaks)[-1], function(x)
    substr(ft_text, ft_breaks[x - 1], ft_breaks[x]))
  trimws(cbind(row_cols, col_cols))
}
So, for example, using my example data from your last question, you could do something like:
library(knitr)
library(kableExtra)
my_tab <- with(`3waydata`, ftable(xtabs(count ~ duration + sex + education)))
as_image(kable_styling(kable(ftab_to_matrix(my_tab))), file = "kable.png")
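For the renaming part of the question, a possible base R sketch is shown here. It assumes toy data in place of the asker's dataset (the `toy` data frame and its counts are made up), renames the variables by editing the ftable's "row.vars" and "col.vars" attributes, and uses the format() method for ftable objects to get a character matrix that could then be passed to kable.

```r
# Toy data standing in for the asker's `dataset` (hypothetical values)
toy <- expand.grid(dos = c("native-born", "unknown"),
                   sex = c("female", "male"),
                   edu = c("high", "low"),
                   stringsAsFactors = FALSE)
toy$count <- seq_len(nrow(toy))

ft <- ftable(xtabs(count ~ dos + sex + edu, data = toy))

# Rename the row and column variables shown in the printed table
names(attr(ft, "row.vars")) <- c("Duration", "Sex")
names(attr(ft, "col.vars")) <- "Education"

# format() on an ftable returns a character matrix of the printed layout
ft_matrix <- format(ft, quote = FALSE)
ft_matrix
```

The resulting matrix carries the new labels, so it can be styled like any other character matrix.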
Might have been easier had you given the full picture when you asked your first question... You could use gt to make fancy tables for reports. This is an edited version more fully demonstrating some capabilities.
library(dplyr)
library(gt)
way3data <- data %>%
group_by(duration, education, sex) %>%
summarise(count = sum(number)) %>%
ungroup
# Reorder with select and Titlecase with stringr
longer <- tidyr::pivot_wider(way3data,
values_from = count,
names_from = "education") %>%
select(duration, sex, high, medium, low, unknown) %>%
rename_with(stringr::str_to_title)
# Demonstrating some of the features of gt
# obviously could have done some of this
# to the original dataframe
myresults <- longer %>%
group_by(Duration) %>%
gt(rowname_col = "Sex") %>%
row_group_order(
groups = c("native-born",
"more-than-ten-years",
"five-to-ten-years",
"five-years-or-less",
"unknown")
) %>%
tab_spanner(label = "Education",
columns = matches("High|Low|Medium|Unknown")) %>%
tab_stubhead(label = "Duration or something") %>%
tab_style(
style = cell_text(style = "oblique", weight = "bold"),
locations = cells_row_groups()) %>%
tab_style(
style = cell_text(align = "right", style = "italic", weight = "bold"),
locations = cells_column_labels(
columns = vars(High, Low, Medium, Unknown)
)) %>%
tab_style(
style = cell_text(align = "right", weight = "bold"),
locations = cells_stub()) %>%
tab_header(
title = "Fancy table of counts with Duration, Education and Gender") %>%
tab_source_note(md("More information is available at https://stackoverflow.com/questions/62284264."))
# myresults
# Can save in other formats including .rtf
myresults %>%
gtsave(
"tab_1.png", expand = 10
)
You can read about all the formatting choices here
Data compliments of Allan
set.seed(69)
data <- data.frame(education = sample(c("high","low","medium","unknown"), 600, T),
sex = rep(c("Male", "Female"), 300),
duration = sample(c("unknown", "native-born",
"five-years-or-less", "five-to-ten-years",
"more-than-ten-years"), 600, T),
number = rpois(600, 10))

Problem in plotting a map graph using highcharter

I am having trouble writing R code for a choropleth using the highcharter package. I am trying to replicate lines 84-112 of the code at the following link: https://www.kaggle.com/gloriousc/global-terrorism-in-1970-2016/code.
I have been encountering 2 errors:
When running line 95, the error says that there is no object called "countrycode_data". I looked online to find out what countrycode_data is and discovered that it is a dataset that maps country names to country codes. From what I understood, it should have been included in the "countrycode" package, which I had installed, but I couldn't work out how to access this dataset. To get around this problem, I downloaded the dataset from the internet and carried on with the code.
When running the choropleth code starting on line 103, I encountered the following error: "Error: %in%(x = tail(joinBy, 1), table = names(df)) is not TRUE". I have no idea what this error means, so I'm here asking for help.
I managed to work around the first error, although I am not sure I did it the correct way.
I am going to leave the entire code right here:
knitr::opts_chunk$set(echo=TRUE, error=FALSE)
library(dplyr) #manipulate table
library(ggplot2) #visualization
library(highcharter) #making map
library("viridisLite") #Default Color Maps
library(countrycode) #list of country code
library(treemap) #make a treemap chart
library(reshape2) #melt function
library(plotly) #pie chart
library(tm) #text mining
library(SnowballC) #stemming text
library(wordcloud) #make a text chart
library(RColorBrewer) #make a color pallette
library(DT) #make datatable
#input the data
terror <- read.csv("../input/globalterrorismdb_0617dist.csv")
Terrorist Incidents Map
#count terrorism incidents per country as a dataframe
countries <- terror %>%
group_by(country_txt) %>%
summarise(Total = round(n()))
#Making a terrorism map
#Credit to umeshnarayanappa
names(countries) <- c("country.name", "total") #change the column name
countries$iso3 <- countrycode_data[match(countries$country.name, countrycode_data$country.name.en), "iso3c"] #add iso3 column from country_code
data(worldgeojson, package = "highcharter")
dshmstops <- data.frame(q = c(0, exp(1:5)/exp(5)),
c = substring(viridis(5 + 1, option = "D"), 0, 7)) %>% #from viridisLite, make a color
list_parse2() #from highchart package, parse df to list
highchart() %>% #from highchart package
hc_add_series_map(worldgeojson, countries, value = "total", joinBy = "iso3") %>%
hc_colorAxis(stops = dshmstops) %>%
hc_legend(enabled = TRUE) %>%
hc_add_theme(hc_theme_db()) %>%
hc_mapNavigation(enabled = TRUE) %>%
hc_title(text = "Global Terrorism in 1970-2016", style = list(fontSize = "25px")) %>%
hc_add_theme(hc_theme_google()) %>%
hc_credits(enabled = TRUE, text = "Sources: National Consortium for the Study of Terrorism and Responses to Terrorism (START)", style = list(fontSize = "10px"))
I want to specify that, even though I ctrl+c ctrl+v the lines, they are not working for me.
Thank you for reading everything and also, I hope, for your help.
I tried to replicate the example. I hope the following is enough for you to work through it yourself. It seems that countrycode_data is in the psData package. That package requires the rJava package, which is not on my machine at the moment. Since you were looking for a workaround, I found my own way: I scraped country data including iso3. (You can probably use the ISOcodes package too.)
You need to check whether the country names in the two datasets are identical, which is a common challenge; you will usually see some mismatches. I do not have time to correct them all, but I have shown how to revise some country names in recode(). The bottom line is that you want to add iso3 to countries, so you need to make sure the country names match as far as possible. (Obviously, some countries no longer exist. You cannot really do anything about them.) The author used match() in his code, but I used left_join() instead to do the same thing.
After this, I think you are ready to follow the rest of the code. Note that hc_add_series_map() also performs a join. worldgeojson has a property called iso3, so countries must have a column called iso3. Otherwise, you will get the same error message again.
library(tidyverse)
library(data.table)
library(rvest)
library(highcharter)
library(viridisLite)
# I used fread(). This is much faster.
terror <- fread("globalterrorismdb_0919dist.csv")
# I wrote my own code which does the same job.
count(terror, country_txt) %>%
setNames(nm = c("country.name", "total")) -> countries
# Get iso3 data
map_dfc(.x = c("official", "shortname", "iso3"),
.f = function(x) {read_html("http://www.fao.org/countryprofiles/iso3list/en/") %>%
html_nodes(paste("td.", x, sep = "")) %>%
html_text() %>%
gsub(pattern = "\\n(\\s+)?", replacement = "")}) %>%
setNames(nm = c("official", "shortname", "iso3")) -> iso3
# Revise some country names.
mutate(iso3, shortname = trimws(sub(x = shortname, pattern = "\\(.*\\)",
replacement = "")),
shortname = recode(.x = shortname,
`Bosnia and Herzegovina` = "Bosnia-Herzegovina",
`Brunei Darussalam` = "Brunei",
Czechia = "Czech Republic",
Congo = "Republic of the Congo",
`Côte d'Ivoire` = "Ivory Coast",
`Russian Federation` = "Russia",
`United Kingdom of Great Britain and Northern Ireland` = "United Kingdom",
`United States of America`= "United States"
)) -> iso3
# Join the two data sets
left_join(countries, iso3, by = c("country.name" = "shortname")) -> countries
data(worldgeojson, package = "highcharter")
dshmstops <- data.frame(q = c(0, exp(1:5)/exp(5)),
c = substring(viridis(5 + 1, option = "D"), 0, 7)) %>% #from viridisLite, make a color
list_parse2()
highchart() %>% #from highchart package
hc_add_series_map(worldgeojson, df = countries,
value = "total", joinBy = "iso3") %>%
hc_colorAxis(stops = dshmstops) %>%
hc_legend(enabled = TRUE) %>%
hc_add_theme(hc_theme_db()) %>%
hc_mapNavigation(enabled = TRUE) %>%
hc_title(text = "Global Terrorism in 1970-2016", style = list(fontSize = "25px")) %>%
hc_add_theme(hc_theme_google()) %>%
hc_credits(enabled = TRUE,
text = "Sources: National Consortium for the Study of Terrorism and Responses to Terrorism (START)",
style = list(fontSize = "10px"))
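As an alternative to scraping, the iso3 column can probably be built with the countrycode() function itself. This is a sketch under assumptions: the countrycode package is installed, and the small `countries` table here is a made-up stand-in for the real one. As far as I know, recent versions of the package ship the lookup table as `codelist` rather than `countrycode_data`, which would explain the first error. The final check mirrors the condition in the second error message: the joinBy column has to exist in the data frame.

```r
library(countrycode)

# Hypothetical stand-in for the `countries` table built above
countries <- data.frame(
  country.name = c("United States", "Ivory Coast", "Russia"),
  total = c(10, 20, 30)
)

# Convert English country names to ISO 3166-1 alpha-3 codes
countries$iso3 <- countrycode(countries$country.name,
                              origin = "country.name",
                              destination = "iso3c")

# hc_add_series_map() needs the joinBy column to be present in the data frame
join_col <- "iso3"
stopifnot(join_col %in% names(countries))
countries
```

Names that countrycode() cannot match come back as NA with a warning, so the mismatches mentioned above still need to be reviewed by hand.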

How do I map county-level data as a heatmap using FIPS codes (interactively?) in R

I am hoping to create an interactive map that will allow me to create a plot where users can change the year and variable plotted. I've seen the package tmap be used, so I'm imagining something like that, but I'd also take advice for a static map, or another approach to an interactive one. My data is much, much, richer than this, but looks something like:
example <- data.frame(fips = rep(as.numeric(c("37001", "37003", "37005", "37007", "37009", "37011", "37013", "37015", "37017", "37019")), 4),
                      year = c(rep(1990, 10), rep(1991, 10), rep(1992, 10), rep(1993, 10)),
                      life = sample(1:100, 40, replace=TRUE),
                      income = sample(8000:1000000, 40, replace=TRUE),
                      pop = sample(80000:1000000, 40, replace=TRUE))
I'd like my output to be a map of ONLY the counties contained in my dataset (in my case, I have all the counties in North Carolina, so I don't want a map of the whole USA) that would show a heatmap of selected variables of interest (in this sample data, year, life, income, and pop). Ideally I'd like one plot with two dropdown-type menus that allow you to select which year you want to view and which variable you want to see. A static map where I (rather than the user) define the year and variable would be helpful if you don't know how to do the interactive thing.
I've tried the following (taken from here), but it's static, which is not my ideal, and also appears to be trying to map the whole USA, so the part that's actually contained in my data (North Carolina) is very small.
library(maps)
library(ggmap)
library(mapproj)
data(county.fips)
colors = c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77",
"#980043")
example$colorBuckets <- as.numeric(cut(example$life, c(0, 20, 40, 60, 80,
90, 100)))
colorsmatched <- example$colorBuckets[match(county.fips$fips, example$fips)]
map("county", col = colors[colorsmatched], fill = TRUE, resolution = 0,
lty = 0, projection = "polyconic")
Here's almost the whole solution. I had hoped some package would allow mapping to be done by fips code alone, but haven't found one yet. You have to download shapefiles and merge them by fips code. This code does everything I wanted above except allow you to also filter by year. I've asked that question here, so hopefully someone will answer there.
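The merge-by-fips step can be tried on toy data first. This is only a sketch in base R: `shapes` is a made-up stand-in for the shapefile's attribute table (the STATEFP, COUNTYFP and NAME columns match the ones used in the code that follows), and `values` stands in for the data to be mapped.

```r
# Hypothetical stand-in for the attribute table of the county shapefile
shapes <- data.frame(STATEFP  = c("37", "37", "01"),
                     COUNTYFP = c("001", "003", "001"),
                     NAME     = c("Alamance", "Alexander", "Autauga"),
                     stringsAsFactors = FALSE)

# Build a numeric fips key from the state and county codes
shapes$fips <- as.numeric(paste0(shapes$STATEFP, shapes$COUNTYFP))

# Made-up values for two North Carolina counties
values <- data.frame(fips = c(37001, 37003), life = c(70, 75))

# Keeping only rows from `values` restricts the result to those counties
joined <- merge(values, shapes, by = "fips", all.x = TRUE)
joined
```

Because the join starts from the data rather than from the shapefile, counties outside the dataset never reach the map.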
library(sf)
library(dplyr)
library(leaflet)
# get shapefiles (download shapefiles here: http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_county_5m.zip )
usgeo <- st_read("~/cb_2014_us_county_5m/cb_2014_us_county_5m.shp") %>%
  mutate(fips = as.numeric(paste0(STATEFP, COUNTYFP)))
### alternatively, this code *should* allow you download data ###
### directly, but somethings slightly wrong. I'd love to know what. ####
# temp <- tempfile()
# download.file("http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_county_5m.zip",temp)
# data <- st_read(unz(temp, "cb_2014_us_county_5m.shp"))
# unlink(temp)
########################################################
# create fake data
example <- data.frame(fips = rep(as.numeric(c("37001", "37003", "37005", "37007", "37009", "37011", "37013", "37015", "37017", "37019")), 4),
                      year = c(rep(1990, 10), rep(1991, 10), rep(1992, 10), rep(1993, 10)),
                      life = sample(1:100, 40, replace=TRUE),
                      income = sample(8000:1000000, 40, replace=TRUE),
                      pop = sample(80000:1000000, 40, replace=TRUE))
# join fake data with shapefiles
example <- st_as_sf(example %>%
left_join(usgeo))
# drop the Z/M coordinate dimensions (not sure why, but won't work without this)
example$geometry <- st_zm(example$geometry, drop = T, what = "ZM")
# filter for a single year (which I don't want to have to do)
example <- example %>% filter(year == 1993)
# change projection
example <- sf::st_transform(example, "+proj=longlat +datum=WGS84")
# create popups
incomepopup <- paste0("County: ", example$NAME, ", avg income = $", example$income)
poppopup <- paste0("County: ", example$NAME, ", avg pop = ", example$pop)
yearpopup <- paste0("County: ", example$NAME, ", avg year = ", example$year)
lifepopup <- paste0("County: ", example$NAME, ", avg life expectancy = ", example$life)
# create color palettes
yearPalette <- colorNumeric(palette = "Blues", domain=example$year)
lifePalette <- colorNumeric(palette = "Purples", domain=example$life)
incomePalette <- colorNumeric(palette = "Reds", domain=example$income)
popPalette <- colorNumeric(palette = "Oranges", domain=example$pop)
# create map
leaflet(example) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup = poppopup,
color = ~popPalette(example$pop),
group = "pop"
) %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup = yearpopup,
color = ~yearPalette(example$year),
group = "year"
) %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup = lifepopup,
color = ~lifePalette(example$life),
group = "life"
) %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup = incomepopup,
color = ~incomePalette(example$income),
group = "income"
) %>%
addLayersControl(
baseGroups=c("income", "year", "life", "pop"),
position = "bottomleft",
options = layersControlOptions(collapsed = FALSE)
)
I'm still looking for a way to add a "year" filter that would be another interactive radio-button box to filter the data by different years.
