How to change the geom_point appearance? - r

Using this data:
Data2 <- structure(list(year = c(2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L, 2018L, 2019L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L,
2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L,
2017L, 2018L, 2019L), variable = c("var1", "var1", "var1", "var1",
"var1", "var1", "var1", "var1", "var1", "var1", "var1", "var1",
"var2", "var2", "var2", "var2", "var2", "var2", "var2", "var2",
"var2", "var2", "var2", "var2", "var3", "var3", "var3", "var3",
"var3", "var3", "var3", "var3", "var3", "var3", "var3", "var3"
), frequency = c(1L, 0L, 0L, 0L, 1L, 1L, 3L, 4L, 3L, 10L, 9L,
0L, 0L, 1L, 0L, 0L, 3L, 2L, 2L, 3L, 8L, 9L, 12L, 3L, 0L, 0L,
0L, 0L, 0L, 2L, 1L, 0L, 1L, 2L, 4L, 0L)), class = "data.frame", row.names = c(NA,
-36L))
I try to produce a plot like this one:
library(ggplot2)
ggplot(Data2, aes(year, variable, size = frequency, color = variable)) +
geom_point()+ theme(text = element_text(size = 18))
However, I have some problems with how this graph looks:
How can I make the x axis show every year, make the points bigger (I tried size, but as a fixed option it makes all circles the same size), and have the frequency legend start at 1 so that no circles are drawn for 0?

An easy way to get all years to show up is to turn them into factors. As for starting your size scale at one: I'm not sure if you want to remove the points representing 0, or just start the scale from 1. The following code should give you an idea of how to do either, or both. To remove 0s from a plot it might be easiest to just turn them into NAs, which won't be plotted (it will throw a warning, but that's okay). You can change your size scale breaks using scale_size_continuous:
library(dplyr)
library(ggplot2)
Data2 %>%
mutate(frequency = ifelse(frequency == 0, NA, frequency),
year = as.factor(year)
) %>%
ggplot(aes(year, size = frequency, variable, color = variable)) +
geom_point() +
theme(text = element_text(size = 18)) +
scale_size_continuous(breaks = c(1, 4, 8, 12))
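The question also asked how to make the points bigger while keeping size mapped to frequency. A minimal sketch, assuming the range argument of scale_size_continuous is what is wanted here: range sets the smallest and largest point size, so the circles still scale with frequency. The demo data frame is a hypothetical stand-in for Data2 with the same column names.

```r
library(ggplot2)

# Hypothetical miniature of Data2 (same column names, made-up values)
demo <- data.frame(
  year = rep(2008:2011, times = 2),
  variable = rep(c("var1", "var2"), each = 4),
  frequency = c(1, 0, 3, 10, 0, 2, 8, 12)
)
demo$frequency[demo$frequency == 0] <- NA  # zeros become NA and are not drawn
demo$year <- factor(demo$year)             # discrete axis: every year gets a label

p <- ggplot(demo, aes(year, variable, size = frequency, color = variable)) +
  geom_point(na.rm = TRUE) +               # na.rm = TRUE silences the NA warning
  # range = c(min, max) point size; larger values give bigger circles overall
  scale_size_continuous(range = c(3, 14), breaks = c(1, 4, 8, 12))
p
```

Raising both ends of range enlarges all circles; widening the gap between them exaggerates the difference between low and high frequencies.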

When the axis labels are not all showing up, or when they are superimposed, a standard trick is to rotate them. In this case I will rotate the x axis labels by 45 degrees.
library(ggplot2)
ggplot(Data2, aes(year, variable,
size = frequency, color = variable)) +
geom_point() +
theme(text = element_text(size = 18),
axis.text.x = element_text(angle = 45, hjust = 1))
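Rotating the labels does not by itself force a tick for every year. If year should stay numeric, one option is to set the breaks explicitly with scale_x_continuous. This is only a sketch on a hypothetical miniature of Data2; the name year_breaks is made up for illustration.

```r
library(ggplot2)

# Hypothetical miniature of Data2 with numeric years
demo <- data.frame(
  year = rep(2008:2013, times = 2),
  variable = rep(c("var1", "var2"), each = 6),
  frequency = c(1, 2, 1, 3, 1, 4, 2, 1, 5, 3, 2, 6)
)

# One break per year between the first and last year in the data
year_breaks <- seq(min(demo$year), max(demo$year), by = 1)

p <- ggplot(demo, aes(year, variable, size = frequency, color = variable)) +
  geom_point() +
  scale_x_continuous(breaks = year_breaks) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```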

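I have a quick-and-dirty solution for dropping the 0 frequencies; note that it replaces every 0 in the whole data frame, not just in frequency:
Data2[Data2==0] <- NA
Alternatively, only for the frequency column: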
library(data.table)
setDT(Data2)
Data2[frequency==0, frequency:= NA]
For having proper years:
Data2$year = as.Date(strptime(Data2$year, "%Y"))
ggplot(Data2, aes(year, variable, size = frequency, color = variable)) +
geom_point()+
theme( axis.text.x = element_text(angle = 60, hjust = 1) )
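Two details may be worth spelling out. A format of "%Y" alone leaves the month and day unspecified, and strptime fills those in from the current date, so the points land wherever today falls within each year. Pinning the dates to 1 January avoids that. The sketch below also limits the zero replacement to the frequency column using base R only. The demo data and the year_date column name are hypothetical.

```r
library(ggplot2)

# Hypothetical miniature of Data2
demo <- data.frame(
  year = c(2008L, 2009L, 2010L, 2008L, 2009L, 2010L),
  variable = rep(c("var1", "var2"), each = 3),
  frequency = c(1L, 0L, 3L, 0L, 2L, 0L)
)

# Replace zeros in the frequency column only; year and variable are untouched
demo$frequency[demo$frequency == 0] <- NA

# Build a proper Date pinned to 1 January of each year
demo$year_date <- as.Date(paste0(demo$year, "-01-01"))

p <- ggplot(demo, aes(year_date, variable, size = frequency, color = variable)) +
  geom_point(na.rm = TRUE) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +  # one tick per year
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
p
```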

Related

R Shiny Plotly, proxy to retain selected traces [duplicate]

I am trying to make a shiny app, which consists of a sidebar panel and a plot. In the panel, I have radio buttons to select which ID should be plotted. I also have multiple variables which user can turn off and on using plotly legend.
I want the plot to be empty when app first opens. For that, I am using visible = "legendonly" in my plotly. But then, I want to keep the traces that user already activated (by clicking on them in the legend) when the ID is changed in the sidebar panel; however, since plotly get regenerated every time, again it uses visible = "legendonly" option and that causes the plot to reset.
Is there a way to keep the traces (only the ones that are already selected) when a different option gets selected in the sidebar panel?
See a reproducible example below; please note that I made this example to run locally. You need to load data and packages separately into your R session. Data can be found at the bottom of the question.
library(shiny)
library(plotly)
library(lubridate)
### Read mdata into your R session
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('vars', 'ID',
c("1", "2")),
helpText('Select an ID.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output) {
#load("Data/mdata.RData") #comment out this part and load data locally
# a large table, reactive to input$show_vars
output$uteTable = renderDataTable({
ute[, input$show_vars, drop = FALSE]
})
output$myPlot = renderPlotly(
{
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
## Add the IDs selected in input$vars
for (item in input$vars) {
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == item) -> foo
p <- add_lines(p, data = foo, x = ~Date, y = ~Value, color = ~Variable, visible = "legendonly",
evaluate = TRUE)
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # "v" stacks the legend entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
}
print(p)
})
}
shinyApp(uix, serverx)
Created on 2020-06-12 by the reprex package (v0.3.0)
Question: is it possible to keep the Var1 trace when changing to ID == 2?
Idea: I think it would be possible if I could change visible = "legendonly" to TRUE right after the app starts, so that it only applies to the first rendering of the plot. I probably need to change evaluate to FALSE as well.
Data:
mdata <- structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), Month = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L,
12L), Variable = c("Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2"), ID = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Value = c(187.797761979167,
6.34656438541666, 202.288468333333, 9.2249309375, 130.620451458333,
4.61060465625, 169.033213020833, 7.5226940625, 290.015582677083,
10.8697671666667, 178.527960520833, 7.6340359375, 234.53493728125,
8.32400878125, 173.827054583333, 7.54521947916667, 164.359205635417,
5.55496292708333, 151.75458625, 6.361610625, 190.124467760417,
6.45046077083333, 191.377006770833, 8.04720916666667, 170.714612604167,
5.98860073958333, 210.827157916667, 9.46311385416667, 145.784868927083,
5.16647911458333, 159.9545675, 6.7466725, 147.442681895833, 5.43921594791667,
153.057018958333, 6.39029208333333, 165.6476956875, 5.63139815625,
197.179256875, 8.73210604166667, 148.1879651875, 5.58784840625,
176.859451354167, 7.65670020833333, 186.215496677083, 7.12404453125,
219.104379791667, 9.39468864583333)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -48L), groups = structure(list(
Year = 2015L, .rows = list(1:48)), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
The following uses plotlyProxy to replace the data for the existing plotly object (and traces) and therefore avoids re-rendering the plot. This approach is faster than re-rendering.
library(shiny)
library(plotly)
library(lubridate)
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('myID', 'ID',
c("1", "2")),
helpText('Select an ID.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output, session) {
output$myPlot = renderPlotly({
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == 1) -> IDData
p <- add_lines(p, data = IDData, x = ~Date, y = ~Value,
color = ~Variable, visible = "legendonly")
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # "v" stacks the legend entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
p
})
myPlotProxy <- plotlyProxy("myPlot", session)
observe({
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == input$myID) -> IDData
req(IDData)
uniqueVars <- unique(IDData$Variable)
for(i in seq_along(uniqueVars)){
IDData %>% filter(Variable == uniqueVars[i]) -> VarData
plotlyProxyInvoke(myPlotProxy, "restyle", list(x = list(VarData$Date),
y = list(VarData$Value)), list(i-1))
}
})
}
shinyApp(uix, serverx)
For further information please also see chapter "17.3.1 Partial plotly updates" in the plotly book, plotly's function reference and this answer.
Edit:
The following is an alternative server function to update the trace data with a single plotlyProxyInvoke call (avoiding the for-loop):
serverx <- function(input, output, session) {
output$myPlot = renderPlotly({
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == 1) -> IDData
p <- add_lines(p, data = IDData, x = ~Date, y = ~Value,
color = ~Variable, visible = "legendonly")
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # "v" stacks the legend entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
p
})
myPlotProxy <- plotlyProxy("myPlot", session)
IDDataList <- split(mdata %>% mutate(Date = make_date(Year, Month, 15)), ~ ID + Variable)
observe({
selectedIDDataList <- setNames(lapply(list("Date", "Value"), function(i){
unname(lapply(IDDataList[paste0(input$myID, ".Var", c(1L, 2L))], function(j){j[[i]]}))
}), c("x", "y"))
# zero-based trace indices, one per trace (length of the x list, not of the x/y pair)
plotlyProxyInvoke(myPlotProxy, "restyle", selectedIDDataList, seq_along(selectedIDDataList$x) - 1)
})
}
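To make the shape of the "restyle" payload easier to see, here is a base R sketch that builds it outside of Shiny. It assumes, as in the answer, one trace per Variable; the helper name build_restyle_payload and the mini data frame are hypothetical. The update is a list with x and y, each holding one vector per trace, and the trace indices are zero-based.

```r
# Hypothetical miniature of mdata (three months, two IDs, two variables)
mini <- data.frame(
  Month = rep(1:3, each = 4),
  Variable = rep(c("Var1", "Var1", "Var2", "Var2"), times = 3),
  ID = rep(c(1, 2), times = 6),
  Value = seq(10, 120, by = 10)
)
mini$Date <- as.Date(sprintf("2015-%02d-15", mini$Month))

# Hypothetical helper: collect x/y vectors per trace for one ID
build_restyle_payload <- function(data, id) {
  sel <- data[data$ID == id, ]
  per_trace <- split(sel, sel$Variable)  # one element per trace, sorted by name
  list(
    update = list(
      x = unname(lapply(per_trace, function(d) d$Date)),
      y = unname(lapply(per_trace, function(d) d$Value))
    ),
    traces = seq_along(per_trace) - 1L   # plotly.js counts traces from 0
  )
}

payload <- build_restyle_payload(mini, 2)
str(payload)
```

Inside the observer, the two pieces would then be passed on as plotlyProxyInvoke(myPlotProxy, "restyle", payload$update, payload$traces). Because only the data of existing traces is swapped, whatever the user toggled in the legend stays as it was.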
What I could think of was adding a checkbox group to select the variables to be plotted, instead of turning them off and on in the legend. With this method, instead of using visible = "legendonly", I leave the checkboxes with no default values selected, so the plot starts out empty. Also, when the user changes the ID, the selected variables stay the same and therefore get plotted for the next ID. See below:
library(shiny)
library(plotly)
library(lubridate)
### Read mdata into your R session
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('vars', 'ID',
c("1", "2")),
checkboxGroupInput('varp', 'Variable',
c("Var1", "Var2")),
helpText('Select an ID and Variables to be plotted.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output) {
#load("Data/mdata.RData") #comment out this part and load data locally
# a large table, reactive to input$show_vars
output$uteTable = renderDataTable({
ute[, input$show_vars, drop = FALSE]
})
output$myPlot = renderPlotly(
{
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
## Add the IDs selected in input$vars
for (item in input$vars) {
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == item,
Variable %in% input$varp)-> foo
p <- add_lines(p, data = foo, x = ~Date, y = ~Value, color = ~Variable, evaluate = TRUE)
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # "v" stacks the legend entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
}
print(p)
})
}
shinyApp(uix, serverx)

How to plot Unequal Interval on x axis of a time series data?

I am using plotrix package to visualize changes in the data using colors. The data is available here.
I am using below code for plotting the data.
library(plotrix)
my_colors1=c("red", "green","blue")
a<-read.csv("DataSt.csv")
x<-a$Year
y<-a$TP
clplot(x, y, main="",lwd=5,labels=y,levels=c(37,964,4377),col=my_colors1, showcuts=T, bty="n",xlab="Year", ylab = "numbers", axes=F)
axis(1, at = a$Year, las=2)
axis(2, at = seq(0, 4400, by = 100), las=2)
I am getting the above chart
I want to reduce the axis space between the year 1975 and 1989. Please help me to get unequal interval at the x axis.
It's a bit dangerous to do this, given that the viewer might not realize the inconsistent spacing among the x-axis values. Nevertheless, the following example shows a possible solution by treating the x-values as factor levels. The problem is that the plotting function only accepts numeric values. I thus set up the plot with factors, but then use the numeric codes of the factor to draw interpolated values in between using segments:
a <- structure(list(Year = c(2021L, 2020L, 2019L, 2018L, 2017L, 2016L,
2015L, 2014L, 2013L, 2012L, 2011L, 2010L, 2009L, 2008L, 2007L,
2006L, 2005L, 2004L, 2003L, 2002L, 2001L, 2000L, 1999L, 1998L,
1997L, 1996L, 1995L, 1994L, 1993L, 1992L, 1991L, 1990L, 1989L,
1975L), TP = c(785L, 848L, 1067L, 1079L, 1263L, 678L, 1204L,
542L, 661L, 387L, 3534L, 4377L, 964L, 244L, 237L, 145L, 86L,
37L, 39L, 23L, 14L, 11L, 7L, 9L, 6L, 3L, 7L, 7L, 6L, 1L, 1L,
1L, 2L, 1L)), class = "data.frame", row.names = c(NA, -34L))
a$Year <- factor(a$Year)
a <- a[order(a$Year),]
head(a)
my_colors1=c("red", "green","blue")
plot(TP ~ Year, a, col = NA, border = NA, las = 2)
for(i in 2:nrow(a)){
b <- as.data.frame(approx(x = as.numeric(a$Year[(i-1):i]), y = a$TP[(i-1):i], n = 100))
b$col <- my_colors1[as.numeric(cut(b$y, breaks = c(-Inf,37,964,4377,Inf)))]
segments(x0 = b$x[-nrow(b)], x1 = b$x[-1], y0 = b$y[-nrow(b)], y1 = b$y[-1], col = b$col[-1])
}
abline(h = c(37,964), lty = 2)
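The core of the trick is that converting the years to a factor replaces them with consecutive integer codes, so the 14-year gap between 1975 and 1989 takes up the same width as a one-year step. A stripped-down base R sketch, with made-up values and hypothetical names (years, tp, pos):

```r
# Hypothetical subset: one large gap followed by consecutive years
years <- c(1975, 1989, 1990, 1991)
tp    <- c(1, 2, 1, 1)

# Factor codes are equally spaced regardless of the gaps between the years
pos <- as.integer(factor(years))

# Plot against the codes, then label the ticks with the real years
plot(pos, tp, type = "b", xaxt = "n", xlab = "Year", ylab = "numbers")
axis(1, at = pos, labels = years, las = 2)
```

Since the spacing no longer reflects time, it may help to say so in the axis title or caption.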

Is there a way in R's ggplot2 to incorporate the stat summary argument into the legend with geom_line?

I have a dataset with SUD treatment rates for each state for each year from 2008 to 2017. Two states received an intervention and the rest did not. I am able to plot the treatment rates for each intervention state as a separate line and the non-intervention states (grouped as one line using the mean) on the same graph.
I would like to include the grouped line in the legend, but am having trouble doing so. So far only the intervention states are labeled. I've pasted a subset of the data (not exactly what the plot shows, but provides insight into the structure) and provided the ggplot code with the resulting figure. Any help would be appreciated. Thank you!
structure(list(statename = c("Alabama", "Alabama", "Alabama",
"Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama",
"Alabama", "Colorado", "Colorado", "Colorado", "Colorado", "Colorado",
"Colorado", "Colorado", "Colorado", "Colorado", "Colorado", "Iowa",
"Iowa", "Iowa", "Iowa", "Iowa", "Iowa", "Iowa", "Iowa", "Iowa",
"Iowa", "Washington", "Washington", "Washington", "Washington",
"Washington", "Washington", "Washington", "Washington", "Washington",
"Washington"), YEAR = c(2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L, 2012L,
2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L), RML_ever_state = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), TotalAdolescent_noprior_Admissionrate = c(4.07959747971533,
17.1095315089946, 28.605586683907, 11.488423601983, 14.9233623164375,
13.0325736612464, 11.5686458431041, 15.2887154594351, 18.9275946907701,
17.9164783752891, 28.9653195613967, 30.542173819128, 33.8611241088185,
29.9657748758525, 25.214146698236, 22.847991066509, 21.975843495247,
21.783383749025, 25.8868468603421, 23.6029880132029, 45.9863203727017,
51.6710909784629, 61.713238062499, 48.3292305542656, 52.4339028740025,
54.9353761700907, 50.2282357945714, 46.8358255273624, 63.8611763171622,
63.7710085311979, 50.5294974023959, 53.0407358054702, 54.7486826331652,
59.981622172154, 57.2193036593259, 54.5478493207391, 49.924220486418,
41.9396870928129, 11.3163228651143, 0.258492875751707)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 31L, 32L, 33L, 34L, 35L,
36L, 37L, 38L, 39L, 40L, 117L, 118L, 119L, 120L, 121L, 122L,
123L, 124L, 125L, 126L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
394L, 395L, 396L), class = "data.frame")
ggplot(vars1, aes(x = YEAR, y = TotalAdolescent_noprior_Admissionrate, group = statename)) +
stat_summary(
data = ~ subset(., RML_ever_state == 0),
aes(group = -1),
fun.data = mean_se,
geom = "line", size = 3, colour = "green"
) +
geom_line(
data = ~ subset(., RML_ever_state == 1),
aes(colour = statename)
) + scale_x_discrete( limits=c(2008,2009,2010,2011,2012,2013,2014,2015,2016,2017))
A nice solution for this can be realized by understanding how ggplot2 creates legends. We will utilize that to our advantage to force a legend out of stat_summary(). Here, we're going to effectively "hijack" another aesthetic, and then squish the legends together to make it look like it's all one legend.
There are two points to go over to help explain why the final answer is working.
Legend creation inside aes()
First, ggplot2 will create a legend for anything that needs to be discriminated when inside aes(). So, if you have an aesthetic like color, fill, or size inside aes(), you're going to start the process of having ggplot2 make a legend. This process is overridden when you also define the same aesthetic outside aes().
So this code generates a legend:
ggplot(df, aes(x,y)) + geom_point(aes(size=my_category))
But this one does not:
ggplot(df, aes(x,y)) + geom_point(aes(size=my_category), size=3)
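So, this means that if you wanted to define the size in the first example above, you would have to take whatever ggplot2 does by default or specify the actual sizes by adding a scale_size_manual() command (values needs one entry per level of my_category, so a single value only works when there is a single level):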
ggplot(df, aes(x,y)) + geom_point(aes(size=my_category)) + scale_size_manual(values=3)
Quoted stuff vs. column names inside aes()
You can also control the names of the items in your legend by what you send to aes(). Take the above example. If I put aes(size=my_category), this assumes that there is a column in df called my_category with values. If that column is a factor, the levels of the factor will be used as the legend item names and the particular "size" values in the final plot. If the column is a character vector, the legend will be created by kind of "forcing" the column into a factor and then items will be listed alphanumerically. If df$my_category is a number vector, you will get a continuous scale of size. The game changes a bit when you supply a character/string. Let's assume df$my_category is a character vector. When I send it this:
ggplot(df, aes(x,y)) + geom_point(aes(size="my_category")...)
I'll get everything the same size, and a legend with one item called "my_category". Interesting. Basically, the quoted version behaves as if you had supplied a column in which every row holds the value "my_category". Make sense? We can put this logic together to form your solution.
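The three cases can be compared side by side on a toy data frame. This is only a sketch: df and its columns are hypothetical, and layer_data() is used just to inspect which sizes ggplot2 ends up assigning.

```r
library(ggplot2)

# Hypothetical toy data with a two-level category
df <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 5, 4, 6),
  my_category = rep(c("a", "b"), times = 3)
)

# Column inside aes(): legend with one key per level, sizes set per level
p_mapped <- ggplot(df, aes(x, y)) +
  geom_point(aes(size = my_category)) +
  scale_size_manual(values = c(a = 2, b = 5))

# Quoted string inside aes(): a single legend key labelled "my_category"
p_const <- ggplot(df, aes(x, y)) +
  geom_point(aes(size = "my_category")) +
  scale_size_manual(values = 3)

# Size outside aes(): fixed size, no legend at all
p_fixed <- ggplot(df, aes(x, y)) +
  geom_point(size = 3)

sort(unique(layer_data(p_mapped)$size))
unique(layer_data(p_const)$size)
```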
A Solution to Your Question
Now, I'm going to use this information to create your plot. To separate out that stat_summary() into a legend item, I need to pick an aesthetic. You could just specify that color= is inside aes() for stat_summary and use a character to label it correctly. That will work; however, if you do that all the legend keys (called "glyphs") will look the same as the line in stat_summary. That's fine if all the lines look the same, but your summary line is "fatter", so you want it to be different than the rest.
Here, I'll just create a legend using size, which will be a different legend from the color one used by geom_line(), and they can have their different formats without any issue.
Since I'm pulling size into aes() for stat_summary, I need to remove size outside of aes() in stat_summary and put that value inside a separate scale_size_manual() command. Second, I use guides() to control the order of the legends and some title stuff. Finally, I use theme elements to control the spacing between the two legends to push them a bit closer. The final result:
ggplot(vars1, aes(x = YEAR, y = TotalAdolescent_noprior_Admissionrate, group = statename)) +
stat_summary(
data = ~ subset(., RML_ever_state == 0),
aes(group = -1, size='Summary'),
fun.data = mean_se,
geom = "line", colour = "green"
) +
geom_line(
data = ~ subset(., RML_ever_state == 1),
aes(colour = statename)
) +
scale_x_discrete( limits=c(2008,2009,2010,2011,2012,2013,2014,2015,2016,2017)) +
scale_size_manual(values=3) +
guides(
color=guide_legend(title='Legend', order=1),
size=guide_legend(title=NULL, order=2)
) +
theme(
legend.margin = margin(t=0,b=0),
legend.spacing.y = unit(0,'pt'),
legend.title = element_text(margin=margin(b=10))
)
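A different route, not the one used above, is to compute the mean of the non-intervention states beforehand and append it to the data as if it were one more "state". A single colour legend then covers everything. The sketch below uses a hypothetical miniature of the data; the column name rate and the label "Non-intervention mean" are made up for illustration.

```r
library(ggplot2)

# Hypothetical miniature: two control states, two intervention states
demo <- data.frame(
  statename = rep(c("Alabama", "Iowa", "Colorado", "Washington"), each = 3),
  YEAR = rep(2008:2010, times = 4),
  RML_ever_state = rep(c(0, 0, 1, 1), each = 3),
  rate = c(4, 17, 29, 46, 52, 62, 29, 31, 34, 51, 53, 55)
)

# Yearly mean over the non-intervention states
ctrl <- aggregate(rate ~ YEAR, data = demo[demo$RML_ever_state == 0, ], FUN = mean)
ctrl$statename <- "Non-intervention mean"

# Stack the intervention states and the summary line into one data frame
treated <- demo[demo$RML_ever_state == 1, c("YEAR", "rate", "statename")]
plot_df <- rbind(treated, ctrl[, c("YEAR", "rate", "statename")])

p <- ggplot(plot_df, aes(YEAR, rate, colour = statename)) +
  geom_line() +
  scale_x_continuous(breaks = 2008:2010)
p
```

The trade-off is that the summary line shares the colour legend's key style; making it thicker would again need a size (or linewidth) mapping as in the answer above.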

Adding Slight curve (or bend) in ggplot geom_path to make path easier to read

This question is a new question from a previously answered question located here: Plot mean of data within same ggplot
As you can see in the .jpg picture below-- the red line geom_path gets squeezed together making the line harder to interpret. Is there a way to "bend" the curve slightly so that there is less overlap / bunching? Some kind of smoothing or bending around the points so the lines don't overlap?
Here is my syntax:
orbit.plot <- ggplot(orbit.data, aes(x=OpM, y=INVT, colour=Subj, label=Year)) +
geom_point(size=7, shape=20) +
geom_path(size=1.5) +
ggtitle("Title Orbits") +
geom_text(data=subset(orbit.data,Year==2006 | Year==2014), aes(label=Year, vjust=1, hjust=1)) +
theme(panel.background = element_rect(fill = 'white', colour = 'red'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
geom_vline(xintercept=0, size=1) +
geom_hline(yintercept=7, size=1) +
scale_y_continuous(limits = c(7, 15), breaks=seq(7,15,1/2))
Here is dput of the data set:
structure(list(Year = c(2006L, 2006L, 2007L, 2007L, 2008L, 2008L,
2009L, 2009L, 2010L, 2010L, 2011L, 2011L, 2012L, 2012L, 2013L,
2013L, 2014L, 2014L), Subj = structure(c(2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("TMC",
"HMC"), class = "factor"), OPM = c(0.088, 0.09, 0.095, 0.078,
0.085, 0.08, -0.023, 0.019, 0.009, 0.043, 0.025, 0.065, 0.0199,
0.029, 0.06, 0.055, 0.088, 0.065), Invt = c(14.5, 10.3, 13.8,
10, 13.3, 9.5, 12.3, 8, 13.5, 8, 14.3, 10, 13.2, 8.5, 13.8, 9.5,
13.8, 9.75)), .Names = c("Year", "Subj", "OpM", "INVT"
), class = "data.frame", row.names = c(NA, -18L))
Thank you kindly.
EDIT: UPDATE: Essentially, the reason for this plot is to show x/y variable "motion" over time. On the X axis-- I'm plotting a ratio (operating margin in this case). On the Y-axis-- I'm showing a cycle measure (inventory turns in this case.) The "bending" of the curve will certainly "bend" the data itself-- but with the X/Y measures I'm using, the data is understood to two (2) decimals-- so "slight" bending of the data would not contaminate the "essence" of what the data is trying to portray.
You could spline it:
library(ggplot2)
orbit.data <- structure(list(Year =
c(2006L, 2006L, 2007L, 2007L, 2008L, 2008L, 2009L, 2009L, 2010L, 2010L,
2011L, 2011L, 2012L, 2012L, 2013L, 2013L, 2014L, 2014L),
Subj = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L),
.Label = c("TMC", "HMC"), class = "factor"),
OPM = c(0.088, 0.09, 0.095, 0.078, 0.085, 0.08, -0.023, 0.019, 0.009,
0.043, 0.025, 0.065, 0.0199, 0.029, 0.06, 0.055, 0.088, 0.065),
Invt = c(14.5, 10.3, 13.8, 10, 13.3, 9.5, 12.3, 8, 13.5, 8, 14.3,
10, 13.2, 8.5, 13.8, 9.5, 13.8, 9.75)),
.Names = c("Year", "Subj", "OpM", "INVT"), class = "data.frame",
row.names = c(NA, -18L))
lsdf <- list()
plot.new()
for (f in unique(orbit.data$Subj)){
psdf <- orbit.data[orbit.data$Subj==f,]
newf <- sprintf("%s - xspline",f)
lsdf[[f]] <- data.frame(xspline(psdf[,c(3:4)], shape=-0.6, draw=F),Subj=newf)
}
sdf <- do.call(rbind,lsdf)
orbit.plot <- ggplot(orbit.data, aes(x=OpM, y=INVT, colour=Subj, label=Year)) +
geom_point(size=5, shape=20) +
geom_point(data=orbit.data,size=7, shape=20,color="black") +
geom_path(size=1) +
geom_path(data=sdf,aes(x=x,y=y,label="",color=Subj),size=1) +
ggtitle("Title Orbits") +
geom_text(data=subset(orbit.data,Year==2006 | Year==2014),
aes(label=Year, vjust=1, hjust=1)) +
theme(panel.background = element_rect(fill = 'white', colour = 'red'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
geom_vline(xintercept=0, size=1) +
geom_hline(yintercept=7, size=1) +
scale_y_continuous(limits = c(7, 15), breaks=seq(7,15,1/2))
print(orbit.plot)
Which gives:
There are lots of ways to do this, and I doubt this is the best. You can play with the shape parameter in the xspline call to get different amounts of curvature.
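To see what xspline returns before wiring it into ggplot, it can be called on a handful of points with draw = FALSE. A sketch with made-up coordinates; setting plot.window to the data range is my own assumption here, not part of the answer above, which relies on the default window from plot.new().

```r
# Hypothetical control points, loosely in the range of the orbit data
pts <- data.frame(
  x = c(0.09, 0.08, -0.02, 0.01, 0.03),
  y = c(14.5, 13.8, 12.3, 13.5, 14.3)
)

# xspline needs an open plot to compute coordinates
plot.new()
plot.window(xlim = range(pts$x), ylim = range(pts$y))  # assumption: match the data range

# draw = FALSE returns the curve as a list with x and y instead of drawing it;
# negative shape values make the curve pass through the control points
curve_xy <- xspline(pts$x, pts$y, shape = -0.6, draw = FALSE)

points(pts$x, pts$y, pch = 19)
lines(curve_xy$x, curve_xy$y, col = "red")
```

Shape values closer to 0 give sharper corners; values closer to -1 give a looser curve.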

Label Range between two points on scatterplot with the percent difference

I have a simple scatterplot showing sales difference between years at different ranges.
So, when the range is ">$400", sales are X in 2013 and X in 2014.
I am trying to add an annotation at certain points showing the percent difference from 2013 to 2014. Is that possible?
Here is the dput:
structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M",
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M",
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M",
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M",
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
And here is the chart I am currently generating:
orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M",
"$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")
myDF = transform(myDF, Range = factor(Range, levels = orderlist))
myChart <- ggplot(myDF, aes(x = Range, y = Avg_TOTALS)) +
geom_point(aes(color = factor(Year))) +
theme_tufte() +
theme(axis.text.x= element_text(angle = 90, hjust = 0)) +
labs(x = "Range", y = "Sales by Range", title = "MyChart")+
scale_y_continuous(breaks = c(50000000, 100000000, 200000000,
300000000,400000000, 500000000),
labels = dollar)
Which gives me:
And leads me to this question:
How would I add the percent difference between each of those points, with 2013 being the base year? Also, there are a few ranges where there were sales in only one of the two years- would it be possible to skip the percent labels on those? A condition in which data must exist in both years to be included?
Thanks for any help!
Here is one way. I think there are better ways, but this is the best my sleepy brain can do right now; I hope you do not mind. Let me briefly explain the code. I started from your plot and then extracted the data that ggplot is using, which I called foo. I created a master data frame to deal with the missing data points and joined it to foo. The dplyr part does the calculation to get the proportion. Using that result in annotate, I assigned the labels you wanted. Hope this helps. zzz...
DATA
mydf <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M",
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M",
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M",
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M",
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M",
"$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")
mydf = transform(mydf, Range = factor(Range, levels = orderlist))
library(ggplot2)
library(scales) # provides dollar() for the axis labels
g <- ggplot(mydf, aes(x = Range, y = Avg_TOTALS)) +
geom_point(aes(color = factor(Year))) +
#theme_tufte() +
theme(axis.text.x= element_text(angle = 90, hjust = 0))+
labs(x="Range", y = "Sales by Range", title = "MyChart")+
scale_y_continuous(breaks = c(50000000, 100000000, 200000000, 300000000,400000000, 500000000), labels = dollar)
library(dplyr)
foo <- ggplot_build(g)$data[[1]] %>%
arrange(group) %>%
mutate(year = c(rep("2013", times = 23), rep("2014", times = 21)))
master <- expand.grid(year = c("2013", "2014"), group = 1:24)
full_join(master, foo, by = c("year", c("group" = "x"))) %>%
group_by(group) %>%
mutate(prop = round(order_by(year, y / first(y)), 2)) %>%
summarise(y = first(y), prop = min(prop, na.rm = FALSE)) -> txt
g + annotate("text", x = txt$group, y = txt$y + 15000000, label = txt$prop)
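The same labels can also be computed straight from the data rather than from ggplot_build output. This is a sketch of a different technique, an inner join of the two years with base R merge, which by construction keeps only the ranges that have sales in both years. The sales data frame and all values in it are hypothetical.

```r
# Hypothetical miniature: "$60M" exists only in 2013, "$70M" only in 2014
sales <- data.frame(
  Year = c(2013, 2013, 2013, 2014, 2014, 2014),
  Range = c("$40M", "$50M", "$60M", "$40M", "$50M", "$70M"),
  Avg_TOTALS = c(100, 200, 300, 90, 250, 400)
)

y13 <- sales[sales$Year == 2013, c("Range", "Avg_TOTALS")]
y14 <- sales[sales$Year == 2014, c("Range", "Avg_TOTALS")]

# merge() defaults to an inner join: ranges missing in either year are dropped
both <- merge(y13, y14, by = "Range", suffixes = c("_2013", "_2014"))

# Percent difference with 2013 as the base year
both$pct_diff <- round(
  100 * (both$Avg_TOTALS_2014 - both$Avg_TOTALS_2013) / both$Avg_TOTALS_2013, 1
)
both$label <- sprintf("%+.1f%%", both$pct_diff)
both
```

The label column could then be placed on the chart with annotate("text", ...) as above, using Range for x and, for example, the larger of the two yearly values plus an offset for y.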
