Is there a way to subset data in ggrepel with data inherited from the pipe? [duplicate] - r

I am trying to subset a layer of a plot where I am passing the data to ggplot through a pipe.
Here is an example:
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(aes(x = Month, y = `Relative Value`))
This gives:
I would like only one of those lines to appear, which I would be able to do if something like this worked in the geom_line layer:
geom_line(subset = .(Indicator == 1), aes(x = Month, y = `Relative Value`))
Edit:
Session info:
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_0.3.0 lubridate_1.3.3 ggplot2_1.0.1 lazyeval_0.1.10 dplyr_0.4.3 RSQLite_1.0.0 readr_0.2.2
[8] RJDBC_0.2-5 DBI_0.3.1 rJava_0.9-7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 lattice_0.20-31
[7] colorspace_1.2-6 R6_2.1.1 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 parallel_3.2.1
[13] grid_3.2.1 gtable_0.1.2 htmltools_0.2.6 yaml_2.1.13 assertthat_0.1 digest_0.6.8
[19] reshape2_1.4.1 memoise_0.2.1 rmarkdown_0.8.1 labeling_0.3 stringi_1.0-1 zoo_1.7-12
[25] proto_0.3-10

tl;dr: Pass the data to that layer as a function that subsets the plot's data according to your criteria.
According to ggplot2's documentation on layers, you have three options when passing data to a new layer:
If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for
which variables will be created.
A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the
layer data.
The first two options are the most common, but the third is perfect for our needs when the data has been modified through pipes.
In your example, adding data = function(x) subset(x, Indicator == 1) to geom_line() does the trick:
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(data = function(x) subset(x,Indicator == 1), aes(x = Month, y = `Relative Value`))
This is the resulting plot
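One way to check what the function form of data actually hands to a layer is ggplot2's layer_data() helper. The sketch below uses a small made-up data frame (toy, not the question's data) and assumes a ggplot2 version that accepts a function for data and exports layer_data().

```r
library(ggplot2)

# Hypothetical toy data: two groups with three points each
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 3, 2, 1),
  g = factor(rep(c(1, 2), each = 3))
)

p <- ggplot(toy, aes(x = x, y = y, group = g)) +
  geom_point() +                                     # layer 1: inherits all plot data
  geom_line(data = function(d) subset(d, g == 1))    # layer 2: only rows where g is 1

# Count the rows each layer ends up with after the plot is built
points_rows <- nrow(layer_data(p, 1))
line_rows <- nrow(layer_data(p, 2))
```

Comparing points_rows and line_rows shows that only the line layer was filtered, while the point layer still sees the full piped data.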

library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
geom_line(aes(x = Month, y = `Relative Value`,linetype=Indicator)) +
scale_linetype_manual(values=c("1"="solid","2"="blank"))
yields:

You might benefit from stat_subset(), a stat I made for my personal use that is available in metR: https://eliocamp.github.io/metR/articles/Visualization-tools.html#stat_subset
It has an aesthetic called subset that takes a logical expression and subsets the data accordingly.
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
as.Date("2015-12-31"), by = "month"), 2),
Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
Indicator = as.factor(rep(c(1, 2), each = 12)))
df_example %>%
group_by(Month) %>%
mutate(`Relative Value` = Value/sum(Value)) %>%
ungroup() %>%
ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) +
geom_bar(position = "fill", stat = "identity") +
theme_bw()+
scale_y_continuous(labels = percent_format()) +
metR::stat_subset(aes(x = Month, y = `Relative Value`, subset = Indicator == 1),
geom = "line")

Related

Manually sort labels in plot_ly [duplicate]

Is it possible to order the legend entries in R?
If I e.g. specify a pie chart like this:
plot_ly(df, labels = Product, values = Patients, type = "pie",
marker = list(colors = Color), textfont=list(color = "white")) %>%
layout(legend = list(x = 1, y = 0.5))
The legend gets sorted by which Product has the highest number of Patients. I would like the legend to be sorted in alphabetical order by Product.
Is this possible?
Yes, it's possible. Chart options are here:
https://plot.ly/r/reference/#pie.
An example:
library(plotly)
library(dplyr)
# Dummy data
df <- data.frame(Product = c('Kramer', 'George', 'Jerry', 'Elaine', 'Newman'),
Patients = c(3, 6, 4, 2, 7))
# Make alphabetical
df <- df %>%
arrange(Product)
# Sorts legend largest to smallest
plot_ly(df,
labels = ~Product,
values = ~Patients,
type = "pie",
textfont = list(color = "white")) %>%
layout(legend = list(x = 1, y = 0.5))
# Set sort argument to FALSE and now orders like the data frame
plot_ly(df,
labels = ~Product,
values = ~Patients,
type = "pie",
sort = FALSE,
textfont = list(color = "white")) %>%
layout(legend = list(x = 1, y = 0.5))
# I prefer clockwise
plot_ly(df,
labels = ~Product,
values = ~Patients,
type = "pie",
sort = FALSE,
direction = "clockwise",
textfont = list(color = "white")) %>%
layout(legend = list(x = 1, y = 0.5))
Session info:
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.5 plotly_4.7.1 ggplot2_2.2.1
EDIT:
Modified to work with plotly 4.x.x (i.e. added ~)

ggplot2: plot time series and multiple point forecasts on a quasi time axis

I have a problem plotting time series data and multiple point forecasts.
I would like to plot historical data and some point forecasts. Historical data should be linked by a line, whereas the point forecasts should be linked by an arrow, since the second forecasted value, say forecast_02, is actually a revised forecast_01.
Libraries used:
library(ggplot2)
library(plyr)
library(dplyr)
library(stringr)
library(grid)
Here is my dummy data:
set.seed(1)
my_df <-
structure(list(values = c(-0.626453810742332, 0.183643324222082,
-0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015,
0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356
), c = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "2014-09-01", "2014-10-01"), type_of_value = c("historical",
"historical", "historical", "historical", "historical", "historical",
"historical", "historical", "forecast_01", "forecast_02"), time_and_forecast = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01",
"2014-07-01", "2014-08-01", "forecast", "forecast")), .Names = c("values",
"c", "time", "type_of_value", "time_and_forecast"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
which looks like this:
Source: local data frame [10 x 5]
values c time type_of_value time_and_forecast
1 -0.6264538 a 2014-01-01 historical 2014-01-01
2 0.1836433 b 2014-02-01 historical 2014-02-01
3 -0.8356286 c 2014-03-01 historical 2014-03-01
4 1.5952808 d 2014-04-01 historical 2014-04-01
5 0.3295078 e 2014-05-01 historical 2014-05-01
6 -0.8204684 f 2014-06-01 historical 2014-06-01
7 0.4874291 g 2014-07-01 historical 2014-07-01
8 0.7383247 h 2014-08-01 historical 2014-08-01
9 0.5757814 i 2014-09-01 forecast_01 forecast
10 -0.3053884 j 2014-10-01 forecast_02 forecast
With the code below I almost managed to produce a plot that I wanted. However, I cannot get my historical data points to be linked by a line.
# my code for almost perfect chart
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = time_and_forecast)) +
geom_point(size = 5) +
geom_line(arrow = arrow()) +
theme_minimal()
Could you help me link the blue points with a line? Thank you.
# sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250 LC_CTYPE=Slovenian_Slovenia.1250 LC_MONETARY=Slovenian_Slovenia.1250
[4] LC_NUMERIC=C LC_TIME=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.0.0 dplyr_0.4.1 plyr_1.8.3 ggplot2_1.0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 assertthat_0.1 digest_0.6.8 MASS_7.3-40 R6_2.0.1 gtable_0.1.2
[7] DBI_0.3.1 magrittr_1.5 scales_0.2.4 stringi_0.4-1 lazyeval_0.1.10 reshape2_1.4.1
[13] labeling_0.3 proto_0.3-10 tools_3.2.0 munsell_0.4.2 parallel_3.2.0 colorspace_1.2-6
I think this will get what you want:
ggplot(data = my_df,
aes(x = time_and_forecast,
y = values,
color = type_of_value,
group = 1)) +
geom_point(size = 5) +
geom_line(data=my_df[my_df$type_of_value=='historical',]) +
geom_line(data=my_df[!my_df$type_of_value=='historical',], arrow=arrow()) +
theme_minimal()
ggplot tries to draw lines within your x categorical groups, but it fails because each group only has 1 value. If you specify that they should all be the same group with group = 1, it will draw the lines across groups. Since you wanted a line for the historical group and an arrow between the other two points, you can make two geom_line() calls on subsets of the dataframe with different arrow parameters. I don't know if there's a way to get ggplot to pick arrows automatically by group (like it does with color, linetype, etc).
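The grouping behaviour described above can be inspected with layer_data(). This is a minimal sketch on a hypothetical three-row data frame (d is not the question's my_df), assuming a ggplot2 version that exports layer_data().

```r
library(ggplot2)

# Hypothetical toy data with a categorical (character) x
d <- data.frame(x = c("a", "b", "c"), y = c(1, 3, 2))

# Default grouping: each discrete x level becomes its own group
p_default <- ggplot(d, aes(x = x, y = y)) + geom_line()

# group = 1 puts every row into a single group, so one line spans all x levels
p_single <- ggplot(d, aes(x = x, y = y, group = 1)) + geom_line()

n_groups_default <- length(unique(layer_data(p_default)$group))
n_groups_single <- length(unique(layer_data(p_single)$group))
```

With one observation per group there is nothing to connect, which is why the default plot shows no line while the group = 1 version does.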
You may want to split up the datasets:
library(ggplot2)
library(grid)
df_hist <- subset(my_df, type_of_value == "historical")
df_forc <- subset(my_df, type_of_value != "historical")
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow())
You could even add a shaded rectangle to further stress the forecasting region:
ggplot() +
geom_line(data = df_hist, aes(x = time, y = values, group = 1, color = type_of_value)) +
geom_point(data = df_forc, aes(x = time, y = values, color = type_of_value), size = 5) +
geom_path(data = df_forc, aes(x = time, y = values, group = 1), arrow = arrow()) +
annotate("rect", xmin = min(df_forc$time), xmax = max(df_forc$time),
ymin = -Inf, ymax = +Inf, alpha = 0.25, fill = "yellow")

Strange interaction between Alpha and legend

While plotting several ecdf curves that overlapped, I tried adjusting the alpha of the curves to improve visibility. While tinkering with the correct placement of alpha, I found the following.
library(ggplot2)
library(dplyr)
x <- data.frame(Var = rep(1:3, 10000)) %>%
mutate(Val = rnorm(10000)*Var,
Var = factor(Var)) %>%
arrange(Var, Val) %>%
group_by(Var) %>%
mutate(ecdf = ecdf(Val)(Val))
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.25, alpha = .9)
This gives the lines the correct alpha, but makes the legend useless. (I'm only using alpha = .9 here to demonstrate that the legend colors completely disappear.) The workaround I've found is to add:
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.35, alpha = .9) +
guides(color = guide_legend(override.aes= list(alpha = 1)))
So while I have a solution for my immediate problem, can someone explain why the first call to ggplot is messed up? Is this a bug? If it makes any difference, I believe this issue also exists when using geom_line (though a slightly different data.frame is needed).
Weird. Here's my sessionInfo(). I've also checked to see if there are any outdated packages.
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932 LC_MONETARY=Japanese_Japan.932
[4] LC_NUMERIC=C LC_TIME=Japanese_Japan.932
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 ggplot2_1.0.1 stringr_1.0.0 tidyr_0.2.0 dplyr_0.4.2
[6] data.table_1.9.4
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 colorspace_1.2-6
[6] R6_2.0.1 plyr_1.8.3 tools_3.2.1 parallel_3.2.1 grid_3.2.1
[11] gtable_0.1.2 DBI_0.3.1 lazyeval_0.1.10 assertthat_0.1 digest_0.6.8
[16] reshape2_1.4.1 labeling_0.3 stringi_0.5-4 scales_0.2.5 chron_2.3-47
[21] proto_0.3-10
How are they different? What am I missing?
library(ggplot2)
library(dplyr)
library(gridExtra)
x <- data.frame(Var = rep(1:3, 10000)) %>%
mutate(Val = rnorm(10000)*Var,
Var = factor(Var)) %>%
arrange(Var, Val) %>%
group_by(Var) %>%
mutate(ecdf = ecdf(Val)(Val))
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.25, alpha = .9) -> gg1
ggplot(x, aes(x=Val)) +
stat_ecdf(aes(color = Var), size = 1.35, alpha = .9) +
guides(color = guide_legend(override.aes= list(alpha = 1))) -> gg2
grid.arrange(gg1, gg2)

R: round() can find object, sprintf() cannot, why?

I have a function that takes a dataframe and plots a number of columns from that data frame using ggplot2. The aes() function in ggplot2 takes a label argument and I want to use sprintf to format that argument - and this is something I have done many times before in other code. When I pass the format string to sprintf (in this case "%1.1f") it says "object not found". If I use the round() function and pass an argument to that function it can find it without problems. Same goes for format(). Apparently only sprintf() is unable to see the object.
At first I thought this was a lazy evaluation issue caused by calling the function rather than using the code inline, but using force() on the format string I pass to sprintf does not resolve the issue. I can work around this, but I would like to know why it happens. Of course, it may be something trivial that I have overlooked.
Q. Why does sprintf() not find the string object?
Code follows (edited and pruned for more minimal example)
require(gdata)
require(ggplot2)
require(scales)
require(gridExtra)
require(lubridate)
require(plyr)
require(reshape)
set.seed(12345)
# Create dummy time series data with year and month
monthsback <- 64
startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback), myvalue5 = runif(monthsback, min = 200, max = 300))
mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))
getchart_highlight_value <- function(
plotdf,
digits_used = 1
)
{
force(digits_used)
#p <- ggplot(data = plotdf, aes(x = month(mydate, label = TRUE), y = year(mydate), fill = myvalue5, label = round(myvalue5, digits_used))) +
# note that the line below using sprintf() does not work, whereas the line above using round() is fine
p <- ggplot(data = plotdf, aes(x = month(mydate, label = TRUE), y = year(mydate), fill = myvalue5, label = sprintf(paste("%1.",digits_used,"f", sep = ""), myvalue5))) +
scale_x_date(labels = date_format("%Y"), breaks = date_breaks("years")) +
scale_y_reverse(breaks = 2007:2012, labels = 2007:2012, expand = c(0,0)) +
geom_tile() + geom_text(size = 4, colour = "black") +
scale_fill_gradient2(low = "blue", high = "red", limits = c(min(plotdf$myvalue5), max(plotdf$myvalue5)), midpoint = median(plotdf$myvalue5)) +
scale_x_discrete(expand = c(0,0)) +
opts(panel.grid.major = theme_blank()) +
opts(panel.background = theme_rect(fill = "transparent", colour = NA))
png(filename = "c:/sprintf_test.png", width = 700, height = 300, units = "px", res = NA)
print(p)
dev.off()
}
getchart_highlight_value (plotdf <- mydf,
digits_used <- 1)
Using Martin's minimal example (that is a minimal example, see also this question), you can make the code work by specifying the environment ggplot() should use. To do that, pass the environment argument to ggplot(), e.g. like this:
require(ggplot2)
getchart_highlight_value <- function(df)
{
fmt <- "%1.1f"
ggplot(df, aes(x, x, label=sprintf(fmt, lbl)),
environment = environment()) +
geom_tile(bg="white") +
geom_text(size = 4, colour = "black")
}
df <- data.frame(x = 1:5, lbl = runif(5))
getchart_highlight_value (df)
The function environment() returns the current (local) environment, which is the environment created by the function getchart_highlight_value(). If you don't specify this, ggplot() will look in the global environment, and there the variable fmt is not defined.
Nothing to do with lazy evaluation, everything to do with selecting the right environment.
The code above produces the following plot:
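The environment point can be illustrated without ggplot2 at all. The following base R sketch is only an analogy for the lookup described in this answer, not ggplot2's internals: it evaluates the same sprintf() call once in the function's own environment and once in the global environment. The function name eval_label_in_envs is made up for illustration, and the sketch assumes no variable called fmt exists in the global environment.

```r
# Hypothetical helper: evaluate one expression in two different environments
eval_label_in_envs <- function() {
  fmt <- "%1.1f"                       # exists only inside this function
  expr <- quote(sprintf(fmt, 3.14159)) # unevaluated call, like an aesthetic

  list(
    # Evaluated where fmt is defined: succeeds
    local = eval(expr, envir = environment()),
    # Evaluated in the global environment: fmt is not visible there
    global = tryCatch(
      eval(expr, envir = globalenv()),
      error = function(e) conditionMessage(e)
    )
  )
}

res <- eval_label_in_envs()
```

Here res$local holds the formatted string, while res$global holds the error message about fmt, mirroring the difference between passing environment = environment() and relying on the global environment.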
Here's a minimal-er example
require(ggplot2)
getchart_highlight_value <- function(df)
{
fmt <- "%1.1f"
ggplot(df, aes(x, x, label=sprintf(fmt, lbl))) + geom_tile()
}
df <- data.frame(x = 1:5, lbl = runif(5))
getchart_highlight_value (df)
It fails with
> getchart_highlight_value (df)
Error in sprintf(fmt, lbl) : object 'fmt' not found
If I create fmt in the global environment then everything is fine; maybe this explains the 'sometimes it works' / 'it works for me' comments above.
> sessionInfo()
R version 2.15.0 Patched (2012-05-01 r59304)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.1
loaded via a namespace (and not attached):
[1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 grid_2.15.0
[5] labeling_0.1 MASS_7.3-18 memoise_0.1 munsell_0.3
[9] plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5 reshape2_1.2.1
[13] scales_0.2.1 stringr_0.6
