I have a panel dataset that I am working with in the current format.
Country
Date
Value
Austria
1956-01-01
1.5
Sweden
1956-01-01
1.2
UnitedKingdom
1956-01-01
1.3
Austria
1957-01-01
1.6
Sweden
1957-01-01
1.0
UnitedKingdom
1957-01-01
1.8
I am currently writing a function to allow the value of one country to act as a "baseline" for a comparitor country on a ggplot graph
PerCap <- function(Data, Nation, Comparitor) {
Nation <- ensym(Nation)
Comparitor <- ensym(Comparitor)
p <- Data %>%
filter(Country %in% c(as_string(Nation), as_string(Comparitor))) %>%
select(c("Country", "Date", "PerCapHouse")) %>%
pivot_wider(names_from = Country, values_from = PerCapHouse) %>%
mutate(Base = !!Nation) %>%
mutate(across(where(is.numeric), ~ .x * 100 / Base)) %>%
select(-c(as_string(Nation), "Base")) %>%
pivot_longer(-c("Date"), names_to = "Country", values_to = "Relative Stock") %>%
ggplot(aes(x = Date, y = `Relative Stock`, color = Country)) + geom_line(size = 1.2) +
geom_hline(yintercept = 100, size = 1.2) +
ggtitle(paste("International Comparison of", as_label(enquo(Nation)), "Housing Stock")) +
labs(y = "Stock Per Capita")
}
Ideally I would like to use the function on an indefinite number of comparitor countries. I understand that I will need to pass something like
function(Data, Nation, ...)
but in this case how would I change the
filter(Country %in% c(as_string(Nation), as_string(Comparitor)))
part to turn every input of the (...) part of he function into a separate string component?
Please find the reproducible Data Below:
Combined <- structure(list(Country = c("Austria", "Sweden", "UnitedKingdom",
"Austria", "Sweden", "UnitedKingdom"), Date = structure(c(-5114,
-5114, -5114, -4748, -4748, -4748), class = "Date"), Value = c(1.5,
1.2, 1.3, 1.6, 1, 1.8)), row.names = c(NA, -6L), class = "data.frame")
Related
The data is Citi Bikes NYC data from January 2019 to December 2019, which can be viewed here:
https://s3.amazonaws.com/tripdata/index.html
You do not need to download the entire dataset you can download just one months
The following is an example of some of the columns of the data frame
start.station.latitude
start.station.longitude
end.station.latitude
end.station.longitude
usertype
40.77897
-73.97375
40.78822
-73.97042
Subscriber
40.75187
-73.97771
40.74780
-73.97344
Customer
The following is the code:
coordinates_table <- ridedata_clean %>% filter(start.station.latitude != end.station.latitude & start.station.longitude != end.station.longitude ) %>%
group_by(start.station.latitude,start.station.longitude,end.station.latitude,end.station.longitude,usertype) %>%
summarise(total = n(), .groups = "drop") %>% filter(total > 250)
Subscriber <- coordinates_table %>% filter(usertype == "Subscriber")
Customer <- coordinates_table %>% filter(usertype == "Customer")
nyc_bb <- c(left= -74.04, bottom = 40.93, right=-73.78, top =40.78)
nyc_stamen <- get_stamenmap( bbox = nyc_bb, zoom = 12, maptype = "toner")
ggmap(nyc_stamen, darken = c(0.8, "white")) +
geom_curve(Customer,
mapping = aes(x= start.station.longitude, y= start.station.latitude, xend = end.station.longitude,
yend = end.station.latitude, alpha = total, color =usertype), size = 0.5
, curvature =.2, arrow= arrow(length = unit(0.2,"cm"), ends = "first", type = "closed"))+
coord_cartesian()+labs(title = "most popular routes by Customers",
x=NULL,y=NULL,
caption = "Data by Citi Bikes and Map by ggmap ") +
theme(legend.position = "none")
The following is the error:
I am getting the following error while running the above code : Coordinate system already present. Adding new coordinate system, which will replace the existing one. Error in grid.Call.graphics(C_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster
I have a panel dataframe with the values of several countries.
Country
Date
Value
Austria
1956-01-01
1.5
Sweden
1956-01-01
1.2
UnitedKingdom
1956-01-01
1.3
Austria
1957-01-01
1.6
Sweden
1957-01-01
1.0
UnitedKingdom
1957-01-01
1.8
I would like to write a function that filters out a variable country and Britain to be able to make comparisons between them on a graph.
PubHaus <- function(df, Nation) {
df %>%
filter(Country == {{ Nation }} | Country == "Unitedkingdom" ) %>%
ggplot(aes(x = Date, y = Value, color = Country)) + geom_line(size = 1.2) +
geom_hline(yintercept = 0) + geom_vline(xintercept = as.POSIXct(as.Date("1948-01-01"))) +
labs(title = "Gross Public Housebuilding")
}
PubHaus(Combined, Sweden)
However when I pass this code I obtain the error
Error: Problem with `filter()` input `..1`.
i Input `..1` is `Country == Sweden | Country == "Unitedkingdom"`.
x dims [product 532] do not match the length of object [1180]
Run `rlang::last_error()` to see where the error occurred.
I imagine the error is something to do with not being able to define the Nation variable as a string. However for obvious reasons putting in
filter(Country == "Nation"
Would not return a country.
Does anyone know what I am doing wrong and how I fix this?
We could convert to string with as_string after converting to symbol as the input is unquoted. In addition, %in% would be more useful for more than one element as a general case
library(dplyr)
library(ggplot2)
PubHaus <- function(df, Nation) {
df %>%
filter(Country %in%
c(rlang::as_string(rlang::ensym(Nation)), "UnitedKingdom" )) %>%
ggplot(aes(x = Date, y = Value, color = Country)) +
geom_line(size = 1.2) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = as.POSIXct(as.Date("1948-01-01"))) +
labs(title = "Gross Public Housebuilding")
}
-testing
PubHaus(Combined, Sweden)
-output
data
Combined <- structure(list(Country = c("Austria", "Sweden", "UnitedKingdom",
"Austria", "Sweden", "UnitedKingdom"), Date = structure(c(-5114,
-5114, -5114, -4748, -4748, -4748), class = "Date"), Value = c(1.5,
1.2, 1.3, 1.6, 1, 1.8)), row.names = c(NA, -6L), class = "data.frame")
I am trying to reorder the following graph based on the rank of the lowest confidence interval (conf.low). This means that Austria (AU) should be the first country, Bulgaria (BG) the second and Belgium (BE) the third. I know there is a way to do it manually by choosing the order of the country variable but i prefer to find a way to do it automatically since i have 30 countries. Could someone help?
Here is the data and the code:
df= structure(list(cntry = structure(1:3, .Label = c("AU", "BE",
"BG"), class = "factor"), estimate = c(0.0053, 0.01740,
0.0036), conf.low = c(-0.0257, 0.0005,
-0.0006), conf.high = c(0.0365, 0.0343,
0.0079)), row.names = c(NA, -3L), class = "data.frame")
df %>%
arrange(estimate) %>%
mutate(label = replace(round(estimate, 3),cntry==1, '')) %>%
ggplot(aes(estimate, cntry,label=label)) +
geom_point()+
geom_text(vjust= -1) +
geom_linerange(mapping=aes(xmin=conf.low , xmax=conf.high, y=cntry)) +
geom_point(mapping=aes(x=estimate, y=cntry))
Using forcats::fct_reorder() you could do this:
library(dplyr)
library(ggplot2)
library(forcats)
df %>%
arrange(estimate) %>%
mutate(label = replace(round(estimate, 3), cntry==1, '')) %>%
ggplot(aes(estimate, fct_reorder(cntry, conf.low, .desc = TRUE),label=label)) +
geom_point()+
geom_text(vjust= -1) +
geom_linerange(mapping=aes(xmin=conf.low , xmax=conf.high, y=cntry)) +
geom_point(mapping=aes(x=estimate, y=cntry))+
ylab("Country")
Created on 2021-04-22 by the reprex package (v2.0.0)
data
df= structure(list(cntry = structure(1:3, .Label = c("AU", "BE",
"BG"), class = "factor"), estimate = c(0.0053, 0.01740,
0.0036), conf.low = c(-0.0257, 0.0005,
-0.0006), conf.high = c(0.0365, 0.0343,
0.0079)), row.names = c(NA, -3L), class = "data.frame")
I am new in R and creating a function that highlight list of countries from the data set in the plot.
functionality issue If country names are not passed as arguments (which can vary) then it should be able to take from default list of countries.
I understand ... is used for variable arguments and then may be I can use list(...) but I am not able to put this together with default values.
Is there a way I can write: country_highlight_plot(Australia, Singapore, Norway)
and if I don't mention any country then it takes default countries.
Below is the code (using gapminder data to reproduce):
library(tidyverse)
library(gghighlight)
library(scales)
library(gapminder)
country_highlight_plot <- function(df = gapminder, y_var = gdpPercap,
background_line_color = "grey",
countries = default_list
){
# default list of highlight countries
default_list = c("India","Singapore","Malaysia","Norway",
"Denmark","United States","United Kingdom","China")
# quoting y-axis variable
y_var = enquo(y_var)
# Data Prep.
df %>%
mutate(highlight_type = case_when(country %in% countries ~ "Yes",
TRUE ~ "No")) %>%
# Plotting
ggplot() +
geom_line(aes(x = year, y = round(!!y_var,2), col = country), size = 1.1) +
gghighlight(highlight_type == "Yes",
unhighlighted_params = list(size = 1, colour = alpha(background_line_color, 0.4))) +
# facet_wrap(~continent) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90)
) +
labs(title = "GDP/Cap for world countries across time",
subtitle = "created by ViSa",
caption = "Data Source: Gapminder",
y = "Total Tax Revenue % of GDP"
)
}
country_highlight_plot()
# EDITED below line to gapminder df only
# country_highlight_plot(df=gapminder, y_var=gdpPercap, background_line_color= "pink")
I think setting a sane default value and checking for it makes sense to me.
I tend to differentiate between NULL and NA, where NULL means something like "use nothing or everything" and NA means "use a sane default".
Untested:
country_highlight_plot <- function(df = gapminder, y_var = gdpPercap,
background_line_color = "grey",
countries = NA) {
if (is.null(countries)) {
countries <- sort(unique(df[["country"]])) # assuming 'country' is in there
} else if (anyNA(countries)) {
countries <- c("India", "Singapore", "Malaysia", "Norway",
"Denmark", "United States", "United Kingdom", "China")
countries <- intersect(countries, unique(df[["country"]]))
}
# ...
}
This allows one to use country_highlight_plot(..., countries=NULL) for all countries in df, and country_highlight_plot(..., countries=NA) for your default list of countries. The intersect call ensures that if this function is called as part of a filtered-gapminder dataset, it won't look for countries that are not present. (Based on your current %in% usage, this might not be strictly necessary ... but if you use countries for anything else, it might still be useful. Defensive programming.)
Tangentially: if you're writing a function as part of a package, then I suggest you should import (mandate) the gapminder package, and then use a similar technique for using that dataset:
country_highlight_plot <- function(df, y_var = gdpPercap,
background_line_color = "grey",
countries = NA) {
if (missing(df)) {
df <- get("gapminder", envir = asNamespace("gapminder"))
}
if (is.null(countries)) {
countries <- sort(unique(df[["country"]])) # assuming 'country' is in there
} else if (anyNA(countries)) {
countries <- c("India", "Singapore", "Malaysia", "Norway",
"Denmark", "United States", "United Kingdom", "China")
}
# ...
}
With this, you can do
# default data, your default_list
country_highlight_plot()
# default data, all countries
country_highlight_plot(countries = NULL)
# pre-defined data, all default_list countries found
dplyr::filter(gapminder, ...) %>%
country_highlight_plot(.)
# pre-defined data, all countries found
dplyr::filter(gapminder, ...) %>%
country_highlight_plot(., countries = NULL)
# default data, manual countries
country_highlight_plot(countries = c("a","B"))
I have the following 3 way table I created in R.
with(dataset, ftable(xtabs(count ~ dos + sex + edu)))
The output looks like
edu high low medium unknown
dos sex
five-to-ten-years female 247776 44916 127133 23793
male 225403 37858 147821 20383
five-years-or-less female 304851 58018 182152 33649
male 253977 55720 193621 28972
more-than-ten-years female 709303 452605 539403 165675
male 629162 309193 689299 121336
native-born female 1988476 1456792 2094297 502153
male 1411509 1197395 2790522 395953
unknown female 57974 75480 73204 593141
male 40176 57786 93108 605542
I want to rename the variables and format the table so that I can include it in a report. I know that I can use dnn to rename the variables, but are there any other recommendations to rename the variables? And to format the table (similar to using kable)?
You could convert the output to a text matrix using the following function, after which you can style with kable however you choose:
ftab_to_matrix <- function(ft)
{
row_vars <- attr(ft, "row.vars")
for(i in seq_along(row_vars)){
row_vars[[i]] <- c(names(row_vars[i]), row_vars[[i]])}
rowvar_widths <- sapply(row_vars, function(x) max(nchar(x))) + 1
col_vars <- attr(ft, "col.vars")
rowvar_widths <- c(1, cumsum(c(rowvar_widths, max(nchar(names(col_vars))))))
ft_text <- capture.output(print(ft))
row_cols <- sapply(seq_along(rowvar_widths)[-1], function(x)
substr(ft_text, rowvar_widths[x - 1], rowvar_widths[x]))
ft_text <- substr(ft_text, rowvar_widths[length(rowvar_widths)] + 2, 100)
ft_breaks <- c(1, cumsum(lapply(strsplit(ft_text[length(ft_text)], "\\d "),
function(x) nchar(x) + 2)[[1]]))
col_cols <- sapply(seq_along(ft_breaks)[-1], function(x)
substr(ft_text, ft_breaks[x - 1], ft_breaks[x]))
trimws(cbind(row_cols, col_cols))
}
So, for example, using my example data from your last question, you could do something like:
my_tab <- with(`3waydata`, ftable(xtabs(count ~ duration + sex + education)))
as_image(kable_styling(kable(ftab_to_df(my_tab))), file = "kable.png")
Might have been easier had you given the full picture when you asked your first question... You could use gt to make fancy tables for reports. This is an edited version more fully demonstrating some capabilities.
library(dplyr)
library(gt)
way3data <- data %>%
group_by(duration, education, sex) %>%
summarise(count = sum(number)) %>%
ungroup
# Reorder with select and Titlecase with stringr
longer <- tidyr::pivot_wider(way3data,
values_from = count,
names_from = "education") %>%
select(duration, sex, high, medium, low, unknown) %>%
rename_with(stringr::str_to_title)
# Demonstrating some of the features of gt
# obviously could have done some of this
# to the original dataframe
myresults <- longer %>%
group_by(Duration) %>%
gt(rowname_col = "Sex") %>%
row_group_order(
groups = c("native-born",
"more-than-ten-years",
"five-to-ten-years",
"five-years-or-less",
"unknown")
) %>%
tab_spanner(label = "Education",
columns = matches("High|Low|Medium|Unknown")) %>%
tab_stubhead(label = "Duration or something") %>%
tab_style(
style = cell_text(style = "oblique", weight = "bold"),
locations = cells_row_groups()) %>%
tab_style(
style = cell_text(align = "right", style = "italic", weight = "bold"),
locations = cells_column_labels(
columns = vars(High, Low, Medium, Unknown)
)) %>%
tab_style(
style = cell_text(align = "right", weight = "bold"),
locations = cells_stub()) %>%
tab_header(
title = "Fancy table of counts with Duration, Education and Gender") %>%
tab_source_note(md("More information is available at https://stackoverflow.com/questions/62284264."))
# myresults
# Can save in other formats including .rtf
myresults %>%
gtsave(
"tab_1.png", expand = 10
)
You can read about all the formatting choices here
Data compliments of Allan
set.seed(69)
data <- data.frame(education = sample(c("high","low","medium","unknown"), 600, T),
sex = rep(c("Male", "Female"), 300),
duration = sample(c("unknown", "native-born",
"five-years-or-less", "five-to-ten-years",
"more-than-ten-years"), 600, T),
number = rpois(600, 10))