ggplot error using rmarkdown - object ´percent´ not found - r

I have been making a scatterplot with ggplot in R.
Running the code below as a regular R script in RStudio produces the plot I want without any errors.
The problem appears when I use the same code in an R Markdown chunk and knit to PDF.
I get this error: Error in check_breaks_labels(breaks, labels): object 'percent' not found.
Any suggestions? I hope the reproducible example is OK.
library(tidyquant)
library(timetk)
library(ggplot2)
SPY <- tq_get("SPY", from = '2010-01-01',
to = "2020-04-04",
get = "stock.prices")
FXI <- tq_get("FXI", from = '2010-01-01',
to = "2020-04-04",
get = "stock.prices")
QQQ <- tq_get("QQQ", from = '2010-01-01',
to = "2020-04-04",
get = "stock.prices")
SPY_monthly_returns <- SPY %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "monthly",
col_rename = "SPY_ret")
FXI_monthly_returns <- FXI %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "monthly",
col_rename = "FXI_ret")
QQQ_monthly_returns <- QQQ %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "monthly",
col_rename = "QQQ_ret")
SPY_monthly_mean_ret <- SPY_monthly_returns %>%
select(SPY_ret) %>%
.[[1]] %>%
mean(na.rm = TRUE)
FXI_monthly_mean_ret <- FXI_monthly_returns %>%
select(FXI_ret) %>%
.[[1]] %>%
mean(na.rm = TRUE)
QQQ_monthly_mean_ret <- QQQ_monthly_returns %>%
select(QQQ_ret) %>%
.[[1]] %>%
mean(na.rm = TRUE)
SPY_monthly_sd_ret <- SPY_monthly_returns %>%
select(SPY_ret) %>%
.[[1]] %>%
sd()
FXI_monthly_sd_ret <- FXI_monthly_returns %>%
select(FXI_ret) %>%
.[[1]] %>%
sd()
QQQ_monthly_sd_ret <- QQQ_monthly_returns %>%
select(QQQ_ret) %>%
.[[1]] %>%
sd()
d <- data.frame(meanret = c(SPY_monthly_mean_ret,FXI_monthly_mean_ret,QQQ_monthly_mean_ret), sd = c(SPY_monthly_sd_ret,FXI_monthly_sd_ret,QQQ_monthly_sd_ret), names = c("SPY","FXI","QQQ"))
ggplot(d, aes(sd,meanret, color= ticker)) +
geom_point(size=1) + geom_text(aes(label=names)) +
ggtitle("Monthly Risk-Return Plot") + xlab("Volatility") +
ylab("Mean Return") + theme_bw() +
scale_y_continuous(label = percent, limits = c(0, 0.02)) +
scale_x_continuous(label = percent, limits = c(0, 0.08))

The solution was to call require(scales) and then to define "ticker", which I had not done in my code chunk.
Both fixes were suggested very quickly, which is a great help to me as I am learning while writing my bachelor thesis. Thanks a lot!
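A minimal sketch of that fix, under some assumptions: percent is a function exported by the scales package, and knitting runs the document in a fresh R session, so it is only found if scales is attached or the function is namespace-qualified. The numbers in d below are made up to stand in for the table built in the question, and color is mapped to names only because this toy data has no ticker column.

```r
library(ggplot2)

# Hypothetical stand-in for the risk/return table computed in the question
d <- data.frame(
  meanret = c(0.009, 0.003, 0.014),
  sd      = c(0.040, 0.060, 0.050),
  names   = c("SPY", "FXI", "QQQ")
)

# scales::percent is namespace-qualified, so the chunk does not rely on
# scales having been attached earlier in an interactive session
p <- ggplot(d, aes(sd, meanret, color = names)) +
  geom_point(size = 1) +
  geom_text(aes(label = names)) +
  ggtitle("Monthly Risk-Return Plot") +
  xlab("Volatility") +
  ylab("Mean Return") +
  theme_bw() +
  scale_y_continuous(labels = scales::percent, limits = c(0, 0.02)) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 0.08))
```

Calling library(scales) at the top of the chunk and writing labels = percent should behave the same way; the qualified form just makes the dependency visible in the line that needs it.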

You forgot to quote percent.
ggplot(d, aes(sd,meanret, color= ticker)) +
geom_point(size=1) + geom_text(aes(label=names)) +
ggtitle("Monthly Risk-Return Plot") + xlab("Volatility") +
ylab("Mean Return") + theme_bw() +
scale_y_continuous(label = "percent", limits = c(0, 0.02)) +
scale_x_continuous(label = "percent", limits = c(0, 0.08))
When not using quotes, ggplot is looking for an object named percent that should hold a string. Because of this, you could also do
p <- ggplot(...)
mylabel = "This is my label"
p + scale_x_continuous(label = "mylabel", limits = c(0, 0.08))

Related

How to vary line types when using the autoplot() function to help the colour-blind among us?

Running the code below plots actual data (black line) against 4-month forecasts for that data. However, the forecast lines are indistinguishable to me since I don't see colours. How can the lines be distinguished from each other (with the actual (non-forecast) data line, the black line, made thicker than the others), either via dashed lines or the use of markers? In XLS I used dashed lines/markers to distinguish.
I have fooled around with ggplot(...scale_linetype_manual(values = c("TRUE" = "solid", "FALSE" = "dotted"))...) with no luck.
Code:
library(feasts)
library(fable)
library(ggplot2)
library(tsibble)
tmp <- data.frame(
Month = c(1,2,3,4,5,6,7,8,9,10),
StateX = c(1527,1297,933,832,701,488,424,353,302,280)
) %>%
as_tsibble(index = Month)
tmpNext4 <- data.frame(
Month = c(11,12,13,14),
StateX = c(211,182,153,125)
) %>%
as_tsibble(index = Month)
# Fit the models to tmp dataframe:
fit <- tmp %>%
model(
Mean = MEAN(StateX),
`Naïve` = NAIVE(StateX),
Drift = NAIVE(StateX ~ drift())
)
# Produce forecasts for the next 4 months:
fcTmp <- fit %>%
forecast(new_data = tmpNext4)
# Plot the forecasts:
fcTmp %>%
autoplot(tmp, level = NULL) +
autolayer(tmpNext4, StateX, colour = "black") +
labs(y = "Units",
title = "Units reaching dead state X",
subtitle = "(Months 11 - 15)") +
guides(colour = guide_legend(title = "Forecast"))
fcTmp %>%
ggplot(aes(Month, .mean)) +
geom_line(aes(linetype = .model, color = .model)) +
geom_line(aes(y = StateX, linetype = "Next4", color = "Next4"), data = tmpNext4) +
geom_line(aes(y = StateX), data = tmp)
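The ggplot() version above maps linetype to the model name. A self-contained sketch of the same idea, using only ggplot2, is shown below. The actual series is copied from the question, but the forecast values are made up rather than produced by fable, and the linewidth argument assumes ggplot2 3.4.0 or later (older versions use size for line thickness).

```r
library(ggplot2)

# Actual series from the question
actual <- data.frame(
  Month = 1:10,
  value = c(1527, 1297, 933, 832, 701, 488, 424, 353, 302, 280)
)

# Made-up forecasts standing in for the Mean, Naive and Drift models
fc <- data.frame(
  Month = rep(11:14, times = 3),
  value = c(rep(700, 4), rep(280, 4), 280 - 50 * 1:4),
  model = rep(c("Mean", "Naive", "Drift"), each = 4)
)

p <- ggplot(fc, aes(Month, value)) +
  # One linetype and one point shape per model, so colour is not needed
  geom_line(aes(linetype = model)) +
  geom_point(aes(shape = model)) +
  # Actual data drawn as a thicker solid line
  geom_line(data = actual, linewidth = 1.2) +
  scale_linetype_manual(
    values = c(Mean = "dashed", Naive = "dotted", Drift = "dotdash")
  ) +
  labs(y = "Units", linetype = "Forecast", shape = "Forecast")
```

Giving linetype and shape the same legend title lets ggplot2 merge them into a single legend.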

How to convert to stat_summary_2d?

I am using the 'flights' data set from the 'nycflights13' package. I need to rewrite the following original code so that it uses 'stat_summary_2d':
flights %>%
mutate(cancel = 1*(dep_time %>% is.na)) %>%
group_by(carrier, origin) %>%
summarise(cancel = 100*mean(cancel, na.rm = T)) %>%
ggplot() +
geom_tile(aes(origin, carrier, fill = cancel)) +
geom_text(aes(origin, carrier, label = round(cancel,1)),
col = "blue", size = 5) +
scale_fill_distiller("Cancel Ratio", palette = "RdYlGn") +
theme_bw()
In the data set the missing value of the 'dep_time' variable means a cancelled flight, and the variable 'cancel' is created by calculating the proportion of cancelled flights over the scheduled flights.
Below is how I apply 'stat_summary_2d' to convert the original code:
flights %>%
mutate(cancel = 1*(dep_time %>% is.na)) %>%
ggplot() +
stat_summary_2d(aes(carrier, origin, z = cancel)) +
geom_text(data = flights %>% group_by(carrier, origin) %>%
summarize(cancel = 100*mean(cancel, na.rm = T)) %>% ungroup,
aes(factor(carrier), origin, label = round(cancel,1)),
col = "blue", size = 5) +
scale_fill_distiller("Cancel Ratio", palette = "RdYlGn") +
theme_bw()
When I execute my code, the error is
> Error in `summarize()`: ! Problem while computing `cancel = 100 *
> mean(cancel, na.rm = T)`. i The error occurred in group 1: carrier =
> "9E", origin = "EWR".
Could anyone tell me how to fix this problem? Thank you so much!
The original data is not updated with the new column unless we use %<>% (the magrittr assignment pipe) instead of %>%. It may be easier to create two objects:
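library(dplyr)
library(ggplot2)
library(nycflights13) # provides the flights data set
# Save the data with the new cancel column so later steps can use it
flight1 <- flights %>%
mutate(cancel = 1*(dep_time %>% is.na))
flight2 <- flight1 %>%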
group_by(carrier, origin) %>%
summarize(cancel = 100*mean(cancel, na.rm = TRUE), .groups = 'drop')
ggplot(flight1) +
stat_summary_2d(aes(carrier, origin, z = cancel)) +
geom_text(data = flight2, aes(factor(carrier),
origin, label = round(cancel,1)),
col = "blue", size = 5) +
scale_fill_distiller("Cancel Ratio", palette = "RdYlGn") +
theme_bw()
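The point about the unsaved column can be checked on a toy data frame. This is only a sketch: toy and its dep_time values are made up and stand in for flights.

```r
library(dplyr)

# Made-up departure times; NA marks a cancelled flight
toy <- data.frame(dep_time = c(517, NA, 533))

# The result of mutate() is returned but not assigned, so toy is unchanged
toy %>% mutate(cancel = as.integer(is.na(dep_time)))
has_cancel_before <- "cancel" %in% names(toy)

# Assigning the result keeps the new column for later steps
toy1 <- toy %>% mutate(cancel = as.integer(is.na(dep_time)))
has_cancel_after <- "cancel" %in% names(toy1)
```

In the question, the data passed to geom_text() starts again from flights, which never received the cancel column, so summarize() cannot find it.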

Write a function to plot original value, mom and yoy change for time series data in 3 subplots [duplicate]

I have a sample of two monthly time series from this link.
I need to create one plot containing 3 subplots: plot1 for the original values, plot2 for month-over-month changes, and plot3 for year-over-year changes.
I'm able to draw the plot with the code below, but the code is too redundant. How could I achieve the same result in a more concise way? Thanks.
library(xlsx)
library(ggplot2)
library(reshape)
library(dplyr)
library(tidyverse)
library(lubridate)
library(cowplot)
library(patchwork)
df <- read.xlsx('./sample_data.xlsx', 'Sheet1')
colnames(df)
# df
cols <- c('food_index', 'energy_index')
df <- df %>% mutate(date=as.Date(date)) %>%
mutate(across(-contains('date'), as.numeric)) %>%
mutate(date= floor_date(date, 'month')) %>%
group_by(date) %>%
summarise_at(vars(cols), funs(mean(., na.rm=TRUE))) %>%
mutate(across(cols, list(yoy = ~(. - lag(., 12))/lag(., 12)))*100) %>%
mutate(across(cols, list(mom = ~(. - lag(., 1))/lag(., 1)))*100) %>%
filter(date >= '2018-01-01' & date <= '2021-12-31') %>%
as.data.frame()
df1 <- df %>%
select(!grep('mom|yoy', names(df)))
df1_long <- melt(df1, id.vars = 'date')
plot1 <- ggplot(df1_long[!is.na(df1_long$value), ],
aes(x = date,
y = value,
col = variable)) +
geom_line(size=0.6, alpha=0.5) +
geom_point(size=1, alpha=0.8) +
labs(
x='',
y='Unit: $'
)
# MoM changes
df2 <- df %>%
select(grep('date|mom', names(df)))
df2_long <- melt(df2, id.vars = 'date')
plot2 <- ggplot(df2_long[!is.na(df2_long$value), ],
aes(x = date,
y = value,
col = variable)) +
geom_line(size=0.6, alpha=0.5) +
geom_point(size=1, alpha=0.8) +
labs(
x='',
y='Unit: %'
)
# YoY changes
df3 <- df %>%
select(grep('date|yoy', names(df)))
df3_long <- melt(df3, id.vars = 'date')
plot3 <- ggplot(df3_long[!is.na(df3_long$value), ],
aes(x = date,
y = value,
col = variable)) +
geom_line(size=0.6, alpha=0.5) +
geom_point(size=1, alpha=0.8) +
labs(
x='',
y='Unit: %'
)
plot <- plot1 + plot2 + plot3 + plot_layout(ncol=1)
# plot <- plot_grid(plot1, plot2, plot3, labels = c('Value', 'MoM', 'YoY'), label_size = 12)
plot
Out:
The expected result will be similar to the plot below (the upper plot will display the original data, the middle plot will display the mom changes data, and the lower plot will display the yoy changes data):
References:
https://waterdata.usgs.gov/blog/beyond-basic-plotting/
http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
Side-by-side plots with ggplot2
Maybe this is what you are looking for? By reshaping your data to the right shape, using a plot function and e.g. purrr::map2 you could achieve your desired result without duplicating your code like so.
Using some fake random example data to mimic your true data:
library(tidyr)
library(dplyr)
library(ggplot2)
df_long <- df |>
rename(food_index_raw = food_index, energy_index_raw = energy_index) |>
pivot_longer(-date, names_to = c("variable", ".value"), names_pattern = "^(.*?_index)_(.*)$")
plot_fun <- function(x, y, ylab) {
x <- x |>
select(date, variable, value = .data[[y]]) |>
filter(!is.na(value))
ggplot(
x,
aes(
x = date,
y = value,
col = variable
)
) +
geom_line(size = 0.6, alpha = 0.5) +
geom_point(size = 1, alpha = 0.8) +
labs(
x = "",
y = ylab
)
}
yvars <- c("raw", "mom", "yoy")
ylabs <- paste0("Unit: ", c("$", "%", "%"))
plots <- purrr::map2(yvars, ylabs, plot_fun, x = df_long)
library(patchwork)
wrap_plots(plots) + plot_layout(ncol = 1)
DATA
set.seed(123)
date <- seq.POSIXt(as.POSIXct("2017-01-31"), as.POSIXct("2022-12-31"), by = "month")
food_index <- runif(length(date))
energy_index <- runif(length(date))
df <- data.frame(date, food_index, energy_index)
EDIT Adding subtitles to each plot when using patchwork is (as of the moment) a bit tricky. What I would do in this case would be to use a faceting "hack". To this end I slightly adjusted the function to take a subtitle argument and switched to purrr::pmap:
library(tidyr)
library(dplyr)
library(ggplot2)
df_long <- df |>
rename(food_index_raw = food_index, energy_index_raw = energy_index) |>
pivot_longer(-date, names_to = c("variable", ".value"), names_pattern = "^(.*?_index)_(.*)$")
plot_fun <- function(x, y, ylab, subtitle) {
x <- x |>
select(date, variable, value = .data[[y]]) |>
filter(!is.na(value))
ggplot(
x,
aes(
x = date,
y = value,
col = variable
)
) +
geom_line(size = 0.6, alpha = 0.5) +
geom_point(size = 1, alpha = 0.8) +
facet_wrap(~.env$subtitle) +
labs(
x = "",
y = ylab
) +
theme(strip.background = element_blank(), strip.text.x = element_text(hjust = 0))
}
yvars <- c("raw", "mom", "yoy")
ylabs <- paste0("Unit: ", c("$", "%", "%"))
subtitle <- c("Original", "Month-to-Month", "Year-to-Year")
plots <- purrr::pmap(list(y = yvars, ylab = ylabs, subtitle = subtitle), plot_fun, x = df_long)
library(patchwork)
wrap_plots(plots) + plot_layout(ncol = 1)
The target output is done with facets rather than stitching plots together. You could do this too if you like, but it requires reshaping your data in a different way. Which approach you take is really a matter of taste.
library(ggplot2)
library(dplyr)
yoy <- function(x) 100 * (x - lag(x, 12)) / lag(x, 12)
mom <- function(x) 100 * (x - lag(x)) / lag(x)
df %>%
mutate(date = as.Date(date, origin = "1899-12-30"),
`Actual value (Dollars).Food Index` = food_index,
`Month-on-month change (%).Food Index` = mom(food_index),
`Year-on-year change (%).Food Index` = yoy(food_index),
`Actual value (Dollars).Energy Index` = energy_index,
`Month-on-month change (%).Energy Index` = mom(energy_index),
`Year-on-year change (%).Energy Index` = yoy(energy_index)) %>%
select(-food_index, -energy_index) %>%
tidyr::pivot_longer(-1) %>%
filter(date > as.Date("2018-01-01")) %>%
tidyr::separate(name, into = c("series", "index"), sep = "\\.") %>%
ggplot(aes(date, value, color = index)) +
geom_point(na.rm = TRUE) +
geom_line() +
facet_grid(series~., scales = "free_y") +
theme_bw(base_size = 16)
Reproducible data taken from link in question
df <- structure(list(date = c(42766, 42794, 42825, 42855, 42886, 42916,
42947, 42978, 43008, 43039, 43069, 43100, 43131, 43159, 43190,
43220, 43251, 43281, 43312, 43343, 43373, 43404, 43434, 43465,
43496, 43524, 43555, 43585, 43616, 43646, 43677, 43708, 43738,
43769, 43799, 43830, 43861, 43890, 43921, 43951, 43982, 44012,
44043, 44074, 44104, 44135, 44165, 44196, 44227, 44255, 44286,
44316, 44347, 44377, 44408, 44439, 44469, 44500, 44530, 44561
), food_index = c(58.53, 61.23, 55.32, 55.34, 61.73, 56.91, 54.27,
59.08, 60.11, 66.01, 60.11, 63.41, 69.8, 72.45, 81.11, 89.64,
88.64, 88.62, 98.27, 111.11, 129.39, 140.14, 143.44, 169.21,
177.39, 163.88, 135.07, 151.28, 172.81, 143.82, 162.13, 172.22,
176.67, 179.3, 157.27, 169.12, 192.51, 194.2, 179.4, 169.1, 193.17,
174.92, 181.92, 188.41, 192.14, 203.41, 194.19, 174.3, 174.86,
182.33, 182.82, 185.36, 192.41, 195.59, 202.6, 201.51, 225.01,
243.78, 270.67, 304.57), energy_index = c(127.36, 119.87, 120.96,
112.09, 112.19, 109.24, 109.56, 106.89, 109.35, 108.35, 112.39,
117.77, 119.52, 122.24, 120.91, 125.41, 129.72, 135.25, 139.33,
148.6, 169.62, 184.23, 204.38, 198.55, 189.29, 202.47, 220.23,
240.67, 263.12, 249.74, 240.84, 243.42, 261.2, 256.76, 258.69,
277.98, 289.63, 293.46, 310.81, 318.68, 310.04, 302.17, 298.62,
260.92, 269.29, 258.84, 241.68, 224.18, 216.36, 226.57, 235.98,
253.86, 267.37, 261.99, 273.37, 280.91, 291.84, 297.88, 292.78,
289.79)), row.names = c(NA, 60L), class = "data.frame")
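To see what the two helper functions compute, here is a small worked example. It assumes a 12-month lag for the year-on-year change and uses a made-up index rather than the data above.

```r
library(dplyr)

# Percentage change against the previous month and against the same
# month one year earlier (dplyr::lag shifts a vector by n positions)
mom <- function(x) 100 * (x - lag(x)) / lag(x)
yoy <- function(x) 100 * (x - lag(x, 12)) / lag(x, 12)

# Made-up index: flat at 100 for twelve months, then 150 in month 13
x <- c(rep(100, 12), 150)

mom_x <- mom(x) # NA for month 1, 0 for months 2-12, 50 for month 13
yoy_x <- yoy(x) # NA for months 1-12, 50 for month 13
```

The leading NA values are why the plotting code filters on date and passes na.rm = TRUE to geom_point().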

Creating a geographic file for use with tmap and coming up with error when coding shapefile

I am trying to reproduce a map I found here: http://zevross.com/blog/2018/10/02/creating-beautiful-demographic-maps-in-r-with-the-tidycensus-and-tmap-packages/
I am using RStudio and am running the following code:
library(ggplot2) # For plotting
library(tidycensus) # For downloading Census data
library(tmap) # For creating tmap
library(tmaptools) # For reading and processing spatial data related to tmap
library(dplyr) # For data wrangling
library(sf) # For reading, writing and working with spatial objects
census_api_key("enter your API key here", overwrite = TRUE)
dat12 <- get_acs("county", table = "B27001", year = 2012,
output = "tidy", state = NULL, geometry = FALSE) %>%
rename(`2012` = estimate) %>%
select(-NAME, -moe)
dat16 <- get_acs("county", table = "B27001", year = 2016,
output = "tidy", state = NULL, geometry = TRUE, shift_geo = TRUE) %>%
rename(`2016` = estimate) %>%
select(-moe)
dat <- left_join(dat16, dat12, by = c("GEOID", "variable"))
st_geometry(dat) <- NULL # This drops the geometry and leaves a table
head(dat)
dat <- mutate(dat,
cat = case_when(
variable %in% paste0("B27001_0",
c("09","12","37","40")) ~ "pop1834",
variable %in% paste0("B27001_0",
c("11","14","39","42")) ~ "pop1834ni")) %>%
filter(!is.na(cat))
# Create long version
dat <- tidyr::gather(dat, year, estimate, c(`2012`, `2016`))
# Group the data by our new categories and sum
dat <- group_by(dat, GEOID, NAME, year, cat) %>%
summarize(estimate = sum(estimate)) %>%
ungroup() %>%
tidyr::spread(cat, estimate)
dat <- mutate(dat, est = (pop1834ni/pop1834) * 100) %>%
select(-c(pop1834, pop1834ni)) %>%
tidyr::spread(year, est) %>%
mutate(diff = `2016`-`2012`)
head(dat)
datlong <- select(dat, -diff) %>%
tidyr::gather(year, estimate, c(`2012`, `2016`)) %>%
group_by(year) %>%
mutate(med = round(median(estimate, na.rm = TRUE), 1))
ggplot(datlong, aes(estimate)) +
geom_histogram(fill = "firebrick2",
color = "white", bins = 60) +
xlab("Uninsured adults ages 18-34 by county (%)") +
theme(plot.title = element_text(hjust = 0.5)) +
facet_wrap(~year, ncol = 1) +
geom_vline(aes(xintercept = med,
group = year), lty = "dashed") +
geom_text(aes(label = paste("Median = ", med), x = med, y = 55))
d10 <- top_n(dat, 10, diff) %>%
mutate(type = "Insured population decreased",
difftemp = diff)
i10 <- top_n(dat, -10, diff) %>%
mutate(type = "Insured population increased",
difftemp = abs(diff))
id10 <- bind_rows(list(i10, d10)) %>%
arrange(desc(difftemp))
ggplot(id10) +
geom_col(aes(x = forcats::fct_reorder(NAME, difftemp),
y = difftemp, fill = type)) +
coord_flip() +
scale_fill_manual(values = c("firebrick2", "cyan4")) +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "bottom",
legend.title = element_blank()) +
ggtitle("Counties with the greatest change (+/-) in
insured population, ages 18-34, 2012-2016") +
ylab("Difference in % insured (2016 - 2012)") +
xlab("")
shp <- dat16 %>%
filter(variable == "B27001_001") # much faster than using distinct()
select(GEOID, NAME) %>%
left_join(dat, by = c("GEOID", "NAME")) %>%
arrange(GEOID) %>%
rename(uninsured_2012 = `2012`,
uninsured_2016 = `2016`,
uninsured_diff = diff)
Up until the last bit of code, the one that begins with shp, everything runs perfectly. Once
shp <- dat16 %>%
filter(variable == "B27001_001") # much faster than using distinct()
select(GEOID, NAME) %>%
left_join(dat, by = c("GEOID", "NAME")) %>%
arrange(GEOID) %>%
rename(uninsured_2012 = `2012`,
uninsured_2016 = `2016`,
uninsured_diff = diff)
is run, I get the following error:
Error in select(GEOID, NAME) : object 'GEOID' not found
I have checked dat16 and dat; GEOID and NAME are present in both. I am not sure what is wrong with the select() function, as I have not loaded another library that might interfere with it. Any help would be appreciated.
I see now what was missing, a %>% (pipe) following the 'filter':
shp <- dat16 %>%
filter(variable == "B27001_001") %>% # much faster than using distinct()
select(GEOID, NAME) %>%
left_join(dat, by = c("GEOID", "NAME")) %>%
arrange(GEOID) %>%
rename(
uninsured_2012 = `2012`,
uninsured_2016 = `2016`,
uninsured_diff = diff
)
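Why the missing pipe produces that particular message can be reproduced on a toy table. This is a sketch with made-up rows; only the column names come from the question. Without %>% after filter(), the pipeline ends there, and select(GEOID, NAME) runs as a separate statement whose first argument is the bare name GEOID, which R then looks up as an ordinary object.

```r
library(dplyr)

# Made-up rows; column names follow the question
toy <- data.frame(
  GEOID    = c("01001", "01003"),
  NAME     = c("County A", "County B"),
  variable = c("B27001_001", "B27001_002")
)

# With the pipe, select() receives the filtered data as its first argument
ok <- toy %>%
  filter(variable == "B27001_001") %>%
  select(GEOID, NAME)

# Without it, select() is called on its own and GEOID is not a known object
err <- tryCatch(
  select(GEOID, NAME),
  error = function(e) conditionMessage(e)
)
```

Here err holds the error text, which mentions GEOID just as in the question.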

Wrong visual plot representation of a correlation

I built this code to check the correlation between Bitcoin and other altcoins. The table data is working fine; my only issue is that when I try to plot the result to get a visual representation, I get the result below:
library(tidyverse)
library(tidyquant)
library(timetk)
library(tibbletime)
library(corrr)
symbols <- c("BTC-USD","ETH-USD","TRX-USD","EOS-USD","ADA-USD")
prices <- getSymbols(symbols,
src = 'yahoo',
from = "2019-09-01",
to = "2020-03-24",
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
prices_monthly <- to.monthly(prices, indexAt = "last", OHLC = FALSE)
prices_monthly %>% correlate() %>% focus('BTC-USD') %>%
ggplot(aes(x = rowname, y = 'BTC-USD')) +
geom_bar(stat = "identity") +
ylab("Correlation with BTC-USD") +
xlab("Variable")
Plot result
What did I do wrong, please?
You need backticks around the variable name.
prices_monthly %>% correlate() %>% focus('BTC-USD') %>%
ggplot(aes(x = rowname, y=`BTC-USD`)) + # <- Here
geom_bar(stat="identity") +
ylab("Correlation with BTC-USD") +
xlab("Variable")
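The difference between the two spellings can be seen outside ggplot as well. In the sketch below, with() evaluates an expression inside a data frame, which is roughly how aes() mappings are evaluated against the plot data. The correlation values are made up for illustration.

```r
library(ggplot2)

# Made-up correlations; check.names = FALSE keeps the hyphen in the column name
cors <- data.frame(
  rowname   = c("ETH-USD", "TRX-USD", "EOS-USD", "ADA-USD"),
  `BTC-USD` = c(0.95, 0.90, 0.92, 0.88),
  check.names = FALSE
)

# Quotes create a character constant, the same for every row
as_string <- with(cors, 'BTC-USD')

# Backticks refer to the column whose name is not a syntactic R name
as_column <- with(cors, `BTC-USD`)

# The same backticked name inside aes() maps the numeric column to y
p <- ggplot(cors, aes(x = rowname, y = `BTC-USD`)) +
  geom_col() +
  ylab("Correlation with BTC-USD") +
  xlab("Variable")
```

With quotes, every bar is placed at the single constant value "BTC-USD" on a discrete axis, which explains the plot in the question.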
