Numeric year but month as character. How to change months into numeric? - r

So imagine I have a dataset where the column "date" contains years 2011-2017 and months for each year, however months are written out in letters. For example:
date: 11-Jan
I would like to make the months numeric so I get:
date: 11-01
Any suggestions on how I can tackle this problem?
Kind regards!

Make your input proper dates, parse them, then format them.
x <- c("11-Jan", "12-Feb")
Sys.setlocale("LC_TIME", "C") #parsing of months depends on locale
format(
as.Date(paste0(x, "-1"), format = "%y-%b-%d"),
"%y-%m"
)
#[1] "11-01" "12-02"
See help("strptime") for details on format strings.
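If changing the session locale is undesirable, a locale-independent variant is to look the abbreviation up in R's built-in month.abb constant, which always holds the English abbreviations. This is a minimal sketch, assuming every input has the form "yy-Mon"; the helper name yy_mon_to_numeric is made up for illustration.

```r
# Hypothetical helper: convert "yy-Mon" strings to "yy-mm" without parsing dates.
yy_mon_to_numeric <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  yy  <- vapply(parts, `[`, character(1), 1)  # text before the hyphen
  mon <- vapply(parts, `[`, character(1), 2)  # month abbreviation
  # match() returns the position of each abbreviation in month.abb (1..12)
  paste(yy, sprintf("%02d", match(mon, month.abb)), sep = "-")
}

yy_mon_to_numeric(c("11-Jan", "12-Feb"))
#[1] "11-01" "12-02"
```

An abbreviation that is not in month.abb makes match() return NA, so it is worth checking the result for missing values on real data.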

Assuming your data is like:
df1 <- structure(list(day_mon = c("16-Dec", "18-Nov", "12-Oct", "8-Oct",
"15-May", "29-Jun", "22-Feb", "25-May", "23-Jan", "24-Oct", "23-May",
"27-Sep", "9-Apr", "28-Oct", "18-Jan", "8-Apr", "7-Jan", "13-Dec",
"28-Nov", "24-May"), year = c(2012L, 2014L, 2011L, 2015L, 2015L,
2015L, 2011L, 2015L, 2012L, 2015L, 2011L, 2012L, 2014L, 2012L,
2013L, 2011L, 2017L, 2016L, 2014L, 2014L)),
row.names = c(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
You can:
# Format the month and day: mon_day_fmt => character vector
df1$mon_day_fmt <- paste(
sprintf(
"%02d",
match(
gsub(
"\\d+\\-(\\w+)",
"\\1",
with(
df1,
day_mon
)
),
month.abb
)
),
sprintf(
"%02d",
as.integer(
gsub(
"^(\\d+)\\-\\w+$",
"\\1",
with(
df1,
day_mon
)
)
)
),
sep = "-"
)
# Create a date vector: date => Date Vector
df1$date <- as.Date(
paste(
df1$year,
df1$mon_day_fmt,
sep = "-"
)
)


R Shiny Plotly, proxy to retain selected traces [duplicate]

I am trying to make a shiny app, which consists of a sidebar panel and a plot. In the panel, I have radio buttons to select which ID should be plotted. I also have multiple variables which user can turn off and on using plotly legend.
I want the plot to be empty when app first opens. For that, I am using visible = "legendonly" in my plotly. But then, I want to keep the traces that user already activated (by clicking on them in the legend) when the ID is changed in the sidebar panel; however, since plotly get regenerated every time, again it uses visible = "legendonly" option and that causes the plot to reset.
Is there a way to keep the traces (only the ones that are already selected) when a different option gets selected in the sidebar panel?
See a reproducible example below; please note that I made this example to run locally. You need to load data and packages separately into your R session. Data can be found at the bottom of the question.
library(shiny)
library(plotly)
library(lubridate)
### Read mdata into your R session
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('vars', 'ID',
c("1", "2")),
helpText('Select an ID.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output) {
#load("Data/mdata.RData") #comment out this part and load data locally
# a large table, reactive to input$show_vars
output$uteTable = renderDataTable({
ute[, input$show_vars, drop = FALSE]
})
output$myPlot = renderPlotly(
{
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
## Add the IDs selected in input$vars
for (item in input$vars) {
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == item) -> foo
p <- add_lines(p, data = foo, x = ~Date, y = ~Value, color = ~Variable, visible = "legendonly",
evaluate = TRUE)
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # show entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
}
print(p)
})
}
shinyApp(uix, serverx)
Created on 2020-06-12 by the reprex package (v0.3.0)
Question: is it possible to keep the Var1 trace when changing to ID == 2?
Idea: I think it'd be possible if I could change visible = 'legendonly' to TRUE right after app deployment, so it only applies to the first rendering of the plot. Probably, I need to change evaluate to FALSE as well.
Data:
mdata <- structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), Month = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L,
12L), Variable = c("Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2", "Var1", "Var1", "Var2", "Var2", "Var1", "Var1",
"Var2", "Var2"), ID = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Value = c(187.797761979167,
6.34656438541666, 202.288468333333, 9.2249309375, 130.620451458333,
4.61060465625, 169.033213020833, 7.5226940625, 290.015582677083,
10.8697671666667, 178.527960520833, 7.6340359375, 234.53493728125,
8.32400878125, 173.827054583333, 7.54521947916667, 164.359205635417,
5.55496292708333, 151.75458625, 6.361610625, 190.124467760417,
6.45046077083333, 191.377006770833, 8.04720916666667, 170.714612604167,
5.98860073958333, 210.827157916667, 9.46311385416667, 145.784868927083,
5.16647911458333, 159.9545675, 6.7466725, 147.442681895833, 5.43921594791667,
153.057018958333, 6.39029208333333, 165.6476956875, 5.63139815625,
197.179256875, 8.73210604166667, 148.1879651875, 5.58784840625,
176.859451354167, 7.65670020833333, 186.215496677083, 7.12404453125,
219.104379791667, 9.39468864583333)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -48L), groups = structure(list(
Year = 2015L, .rows = list(1:48)), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
The following uses plotlyProxy to replace the data for the existing plotly object (and traces) and therefore avoids re-rendering the plot. This approach is faster than re-rendering.
library(shiny)
library(plotly)
library(lubridate)
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('myID', 'ID',
c("1", "2")),
helpText('Select an ID.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output, session) {
output$myPlot = renderPlotly({
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == 1) -> IDData
p <- add_lines(p, data = IDData, x = ~Date, y = ~Value,
color = ~Variable, visible = "legendonly")
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # show entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
p
})
myPlotProxy <- plotlyProxy("myPlot", session)
observe({
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == input$myID) -> IDData
req(IDData)
uniqueVars <- unique(IDData$Variable)
for(i in seq_along(uniqueVars)){
IDData %>% filter(Variable == uniqueVars[i]) -> VarData
plotlyProxyInvoke(myPlotProxy, "restyle", list(x = list(VarData$Date),
y = list(VarData$Value)), list(i-1))
}
})
}
shinyApp(uix, serverx)
For further information please also see chapter "17.3.1 Partial plotly updates" in the plotly book, plotly's function reference and this answer.
Edit:
The following is an alternative server function to update the trace data with a single plotlyProxyInvoke call (avoiding the for-loop):
serverx <- function(input, output, session) {
output$myPlot = renderPlotly({
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == 1) -> IDData
p <- add_lines(p, data = IDData, x = ~Date, y = ~Value,
color = ~Variable, visible = "legendonly")
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # show entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
p
})
myPlotProxy <- plotlyProxy("myPlot", session)
IDDataList <- split(mdata %>% mutate(Date = make_date(Year, Month, 15)), ~ ID + Variable)
observe({
selectedIDDataList <- setNames(lapply(list("Date", "Value"), function(i){
unname(lapply(IDDataList[paste0(input$myID, ".Var", c(1L, 2L))], function(j){j[[i]]}))
}), c("x", "y"))
plotlyProxyInvoke(myPlotProxy, "restyle", selectedIDDataList, seq_along(selectedIDDataList$x)-1) # zero-based indices, one per trace
})
}
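The payload that "restyle" expects can be built and inspected outside of Shiny. The following is a minimal sketch, assuming one trace per Variable and that the traces were added in alphabetical order of Variable; the helper build_restyle_payload and the toy data are made up for illustration and are not part of plotly.

```r
# Hypothetical helper: collect x and y vectors per trace for one ID.
build_restyle_payload <- function(data, id) {
  sel <- data[data$ID == id, ]
  by_var <- split(sel, sel$Variable)  # one data frame per Variable
  list(
    x = unname(lapply(by_var, function(d) d$Date)),
    y = unname(lapply(by_var, function(d) d$Value))
  )
}

# Toy data with the same columns the app uses after adding Date
toy <- data.frame(
  ID = c(1, 2, 1, 2),
  Variable = c("Var1", "Var1", "Var2", "Var2"),
  Date = as.Date(rep("2015-01-15", 4)),
  Value = c(187.8, 6.3, 202.3, 9.2)
)

payload <- build_restyle_payload(toy, 1)
trace_idx <- seq_along(payload$x) - 1  # plotly.js trace indices are zero-based
```

Inside the observer, payload and trace_idx would then be passed as the last two arguments of plotlyProxyInvoke(myPlotProxy, "restyle", ...). Deriving the indices from the number of traces, rather than hard-coding them, keeps the call valid if more variables are added.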
What I could think of was adding a checkbox group to select the variables to be plotted, instead of turning them on and off in the legend. With this method I drop visible = "legendonly" and leave the checkbox group with no default values selected, so the plot starts empty. When the user changes the ID, the selected variables stay the same and are therefore plotted for the next ID. See below:
library(shiny)
library(plotly)
library(lubridate)
### Read mdata into your R session
# UI
uix <- shinyUI(pageWithSidebar(
headerPanel("Data"),
sidebarPanel(
radioButtons('vars', 'ID',
c("1", "2")),
checkboxGroupInput('varp', 'Variable',
c("Var1", "Var2")),
helpText('Select an ID and Variables to be plotted.')
),
mainPanel(
h4("Plot"),
plotlyOutput("myPlot")
)
)
)
# SERVER
serverx <- function(input, output) {
#load("Data/mdata.RData") #comment out this part and load data locally
# a large table, reactive to input$show_vars
output$uteTable = renderDataTable({
ute[, input$show_vars, drop = FALSE]
})
output$myPlot = renderPlotly(
{
p <- plot_ly() %>%
layout(title = "Title", xaxis = list(tickformat = "%b %Y", title = "Date"),
yaxis = list(title = "Y"))
## Add the IDs selected in input$vars
for (item in input$vars) {
mdata %>%
mutate(Date = make_date(Year, Month, 15)) %>%
filter(ID == item,
Variable %in% input$varp)-> foo
p <- add_lines(p, data = foo, x = ~Date, y = ~Value, color = ~Variable, evaluate = TRUE)
p <- p %>% layout(showlegend = TRUE,
legend = list(orientation = "v", # show entries vertically
xanchor = "center", # use center of legend as anchor
x = 100, y=1))
}
print(p)
})
}
shinyApp(uix, serverx)

Linking Shiny Reactive inputs and input updates

I have a dataset of baby names that are ranked by popularity for each year.
What I currently have: a simple shiny app that filters on year based on a slider and a select button that identifies which column is the rank column to use (which also creates a color highlight). This will be two datasets in actuality, one for gender marked as M or gender marked as F, but I've left it simple for the example here.
What I would like to do: update this to be reactive to the values of a slider, which then updates a select option for which file column to sort and highlight.
The current approach works for simplicity, but the focus selector for the year obviously throws an error if it's a value that no longer exists in the slider selected range.
I've dug around and tried a few approaches, but I just haven't been able to get the reactivity portion to work successfully. I'm sure I'm missing something elementary but hitting a wall. Thank you for any input.
Example:
library(shiny)
library(tidyverse)
library(DT)
#Fake Data
dat <- structure(list(Name = c("Bill", "Sean", "Kirby", "Philbert",
"Bob", "Lucius", "Fry", "Tyron", "Lionel", "Alister", "Newt",
"Craig", "A-Aron", "Bill", "Sean", "Kirby", "Philbert", "Bob",
"Lucius", "Fry", "Tyron", "Lionel", "Alister", "Newt", "Craig",
"A-Aron", "Bill", "Sean", "Kirby", "Philbert", "Bob", "Lucius",
"Fry", "Tyron", "Lionel", "Alister", "Newt", "Craig", "A-Aron"
), rank = c(8L, 1L, 2L, 3L, 4L, 6L, 5L, 9L, 7L, 25L, 10L, 35L,
99L, 4L, 1L, 3L, 2L, 5L, 6L, 7L, 11L, 5L, 12L, 8L, 9L, 10L, 4L,
2L, 3L, 10L, 8L, 11L, 5L, 6L, 12L, 7L, 13L, 9L, 1L), year = c(2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L)), class = "data.frame", row.names = c(NA, -39L))
#Get years
years <- unique(dat$year)
ui <- fluidPage(
titlePanel("Top Ten Male Baby Names"),
sliderInput("range",
label = "Choose year range",
min = min(as.numeric(years)),
max = max(as.numeric(years)),
sep = "",
value = c(max(as.numeric(years))-1,max(as.numeric(years)))
),
selectInput("year",
label = "Choose year for rank",
choices = as.numeric(years),
selected = max(as.numeric(years))
)
,
mainPanel(
dataTableOutput("DataTable")
)
)
server <- function(input, output) {
output$DataTable <- renderDataTable({
dat1 <- dat %>%
filter((year >= input$range[1] & year <= input$range[2]) ) %>%
pivot_wider(id_cols = Name,
values_from = rank,
names_from = year) %>%
filter(.[colnames(.) == as.character(input$year)] <11) %>%
arrange(.[colnames(.)== as.character(input$year)])
datatable(dat1,
options = list(ordering=F,
lengthChange = F,
pageLength = -1)) %>%
formatStyle(input$year,
backgroundColor = "lightgreen"
)
})
}
shinyApp(ui, server)
You could set up an observeEvent to watch for changes to the sliderInput. Then, if the selected year is not within the range of the slider, update the selection.
Note: you need to add the session parameter to the server function.
Also, since output$DataTable is filtered by both the range and the year, I've added a validate statement in case the user manually chooses a year that is not in the current range.
server <- function(input, output, session) {
# Observe for a change to slider input
observeEvent(input$range, {
sel = input$year
# update selection if original selected year is not in range
if(as.numeric(sel) < input$range[1] || as.numeric(sel) > input$range[2]) {
sel = min(input$range)
updateSelectInput(session, "year", selected = sel)
}
})
output$DataTable <- renderDataTable({
validate(need(as.numeric(input$year) >= input$range[1] && as.numeric(input$year) <= input$range[2],"Current selection not in range"))
dat1 <- dat %>%
filter((year >= input$range[1] & year <= input$range[2]) ) %>%
pivot_wider(id_cols = Name,
values_from = rank,
names_from = year) %>%
filter(.[colnames(.) == as.character(input$year)] <11) %>%
arrange(.[colnames(.)== as.character(input$year)])
datatable(dat1,
options = list(ordering=F,
lengthChange = F,
pageLength = -1)) %>%
formatStyle(input$year,
backgroundColor = "lightgreen"
)
})
}
shinyApp(ui, server)
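One detail worth checking: a range slider returns only its two endpoints, so a membership test against input$range matches the endpoints and nothing in between. The following is a minimal sketch of an interval check as a plain function; the helper name year_in_range is made up for illustration.

```r
# Hypothetical helper: TRUE when `year` lies inside the closed slider interval.
year_in_range <- function(year, range) {
  year <- as.numeric(year)  # selectInput() returns its value as a character string
  year >= min(range) & year <= max(range)
}

year_in_range("2009", c(2008, 2010))
#[1] TRUE
"2009" %in% c(2008, 2010)  # membership only sees the two endpoints
#[1] FALSE
```

The same comparison can be used both in the observer that resets the selection and in the validate(need(...)) guard.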

How to plot Unequal Interval on x axis of a time series data?

I am using plotrix package to visualize changes in the data using colors. The data is available here.
I am using below code for plotting the data.
library(plotrix)
my_colors1=c("red", "green","blue")
a<-read.csv("DataSt.csv")
x<-a$Year
y<-a$TP
clplot(x, y, main="",lwd=5,labels=y,levels=c(37,964,4377),col=my_colors1, showcuts=T, bty="n",xlab="Year", ylab = "numbers", axes=F)
axis(1, at = a$Year, las=2)
axis(2, at = seq(0, 4400, by = 100), las=2)
I am getting the above chart
I want to reduce the axis space between the year 1975 and 1989. Please help me to get unequal interval at the x axis.
It's a bit dangerous to do this, given that the viewer might not notice the inconsistent spacing among the x-axis values. Nevertheless, the following example shows a possible solution that treats the x-values as factor levels. The problem is that the plotting function only allows numeric values. I thus plot with factors, but then need to use numeric values to draw interpolated values in between using segments:
a <- structure(list(Year = c(2021L, 2020L, 2019L, 2018L, 2017L, 2016L,
2015L, 2014L, 2013L, 2012L, 2011L, 2010L, 2009L, 2008L, 2007L,
2006L, 2005L, 2004L, 2003L, 2002L, 2001L, 2000L, 1999L, 1998L,
1997L, 1996L, 1995L, 1994L, 1993L, 1992L, 1991L, 1990L, 1989L,
1975L), TP = c(785L, 848L, 1067L, 1079L, 1263L, 678L, 1204L,
542L, 661L, 387L, 3534L, 4377L, 964L, 244L, 237L, 145L, 86L,
37L, 39L, 23L, 14L, 11L, 7L, 9L, 6L, 3L, 7L, 7L, 6L, 1L, 1L,
1L, 2L, 1L)), class = "data.frame", row.names = c(NA, -34L))
a$Year <- factor(a$Year)
a <- a[order(a$Year),]
head(a)
my_colors1=c("red", "green","blue")
plot(TP ~ Year, a, col = NA, border = NA, las = 2)
for(i in 2:nrow(a)){
b <- as.data.frame(approx(x = as.numeric(a$Year[(i-1):i]), y = a$TP[(i-1):i], n = 100))
b$col <- my_colors1[as.numeric(cut(b$y, breaks = c(-Inf,37,964,4377,Inf)))]
segments(x0 = b$x[-nrow(b)], x1 = b$x[-1], y0 = b$y[-nrow(b)], y1 = b$y[-1], col = b$col[-1])
}
abline(h = c(37,964), lty = 2)
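To see why the factor trick evens out the spacing, it helps to look at the numbers involved. In this small sketch the variable names are made up for illustration, and the two TP values are the ones for 1975 and 1989 in the data above:

```r
years <- c(1975, 1989, 1990, 1991)

# Converting to a factor replaces each year by its rank, so the 14-year gap
# between 1975 and 1989 becomes a single step on the x axis.
pos <- as.numeric(factor(years))
pos
#[1] 1 2 3 4

# approx() interpolates linearly between two neighbouring positions; each
# short piece can then be coloured by the level its y value falls into.
seg <- approx(x = pos[1:2], y = c(1, 2), n = 5)
seg$x
#[1] 1.00 1.25 1.50 1.75 2.00
seg$y
#[1] 1.00 1.25 1.50 1.75 2.00
```

Because the positions are ranks rather than years, the slope of each segment no longer reflects change per year, which is the risk mentioned above.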

Creating and populating monthly columns from rows of data

I have a CSV file with multiple rows of data for each person. Each row corresponds to a month. They are described by the variables 'year' (e.g. 2019) and 'month' (which goes from 1-12). On each row are monthly expenses in different categories (e.g. 'clothing').
My goal is to look at expenses in a certain category ('clothing'), over all the years ('year', 2018-2020) months ('month', 1-12) for each person ('aid').
Is it a good idea to try to get everything for each person on one row?
I think I have the start of that here for the id variable 'aid' and then I think I need to create and populate variables for clothing per month (something like 'clothing-2019-oct'):
people.df<-as.data.frame(infile.df$aid)
names(people.df)<-"aid"
people.df<-unique(people.df)
How would I get 'clothing-2019-oct', 'clothing-2019-nov', etc.?
This should hopefully give an idea of what my dataset looks like:
> dput(infile.df)
structure(list(date = c("2019-12-01", "2020-01-01", "2020-02-01",
"2019-11-01", "2019-12-01", "2020-01-01", "2019-12-01", "2020-01-01",
"2020-02-01", "2016-11-01", "2016-12-01", "2017-01-01", "2017-02-01"
), year = c(2019L, 2020L, 2020L, 2019L, 2019L, 2020L, 2019L,
2020L, 2020L, 2016L, 2016L, 2017L, 2017L), month = c(12L, 1L,
2L, 11L, 12L, 1L, 12L, 1L, 2L, 11L, 12L, 1L, 2L), aid = c("1hg6d",
"1hg6d", "1hg6d", "7gjfgg", "7gjfgg", "7gjfgg", "8hdf9", "8hdf9",
"8hdf9", "hf43dy", "hf43dy", "hf43dy", "hf43dy"), clothing = c("98.4911",
"98.4911", "98.4911", "1125.61", "1125.61", "1125.61", "1584.65",
"0", "390.4", "4327.7", "2446.5", "2489", "0"), culture = c("54.1325",
"54.1325", "54.1325", "618.657", "618.657", "618.657", "2494.5",
"28.5", "0", "4859.31", "1115", "0", "1940")), class = "data.frame", row.names = c(NA,
-13L))
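One possible way to get one row per person is sketched below in base R. It assumes that clothing should be converted from character to numeric and that a missing person/month combination should appear as NA. The data frame expenses is a small made-up subset with the same columns as infile.df, and the names col and wide are illustrative.

```r
# Small subset with the same structure as infile.df (values copied from the dput)
expenses <- data.frame(
  aid = c("1hg6d", "1hg6d", "8hdf9", "8hdf9", "8hdf9"),
  year = c(2019L, 2020L, 2019L, 2020L, 2020L),
  month = c(12L, 1L, 12L, 1L, 2L),
  clothing = c("98.4911", "98.4911", "1584.65", "0", "390.4"),
  stringsAsFactors = FALSE
)

# The amounts are stored as text, so convert them before any arithmetic
expenses$clothing <- as.numeric(expenses$clothing)

# Build the target column name, e.g. "clothing-2019-dec"
expenses$col <- sprintf("clothing-%d-%s", expenses$year,
                        tolower(month.abb[expenses$month]))

# One row per person, one column per year-month; empty cells are NA
wide <- tapply(expenses$clothing,
               list(aid = expenses$aid, month = expenses$col), sum)
```

Whether the wide layout is a good idea depends on the next step: for totals or averages per person, the long format with one row per month is usually easier to work with, and the wide table is mainly useful for display.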

Label Range between two points on scatterplot with the percent difference

I have a simple scatterplot showing sales difference between years at different ranges.
So, when the range is ">$400", sales are X in 2013 and X in 2014.
I am trying to add an annotation at certain points showing the percent difference from 2013 to 2014. Is that possible?
Here is the dput:
structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M",
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M",
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M",
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M",
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
And here is the chart I am currently generating:
orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M",
"$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")
myDF = transform(myDF, Range = factor(Range, levels = orderlist))
myChart <- ggplot(myDF, aes(x = Range, y = Avg_TOTALS)) +
geom_point(aes(color = factor(Year))) +
theme_tufte() +
theme(axis.text.x= element_text(angle = 90, hjust = 0)) +
labs(x = "Range", y = "Sales by Range", title = "MyChart")+
scale_y_continuous(breaks = c(50000000, 100000000, 200000000,
300000000,400000000, 500000000),
labels = dollar)
Which gives me:
And leads me to this question:
How would I add the percent difference between each of those points, with 2013 being the base year? Also, there are a few ranges where there were sales in only one of the two years- would it be possible to skip the percent labels on those? A condition in which data must exist in both years to be included?
Thanks for any help!
Here is one way; there may be better ones. I followed your code to build the plot, then pulled out the data that ggplot actually uses (via ggplot_build), which I called foo. I created a master data frame covering every year/group combination to deal with the missing data points and joined it with foo. The dplyr steps then compute, for each group, the proportion relative to the first value. Finally, I passed the result to annotate to place the labels you wanted. Hope this helps.
DATA
mydf <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L,
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L,
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M",
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M",
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M",
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M",
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5,
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5,
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417,
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143,
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333,
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379,
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723,
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286,
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year",
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA,
-44L))
orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M",
"$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M",
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M")
mydf = transform(mydf, Range = factor(Range, levels = orderlist))
library(ggplot2)
library(scales) # provides dollar() for the axis labels
g <- ggplot(mydf, aes(x = Range, y = Avg_TOTALS)) +
geom_point(aes(color = factor(Year))) +
#theme_tufte() +
theme(axis.text.x= element_text(angle = 90, hjust = 0))+
labs(x="Range", y = "Sales by Range", title = "MyChart")+
scale_y_continuous(breaks = c(50000000, 100000000, 200000000, 300000000,400000000, 500000000), labels = dollar)
library(dplyr)
foo <- ggplot_build(g)$data[[1]] %>%
arrange(group) %>%
mutate(year = c(rep("2013", times = 23), rep("2014", times = 21)))
master <- expand.grid(year = c("2013", "2014"), group = 1:24)
full_join(master, foo, by = c("year", c("group" = "x"))) %>%
group_by(group) %>%
mutate(prop = round(order_by(year, y / first(y)), 2)) %>%
summarise(y = first(y), prop = min(prop, na.rm = FALSE)) -> txt
g + annotate("text", x = txt$group, y = txt$y + 15000000, label = txt$prop)
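The percent difference itself can also be computed directly from the data frame, independent of the plot. The following is a minimal sketch on made-up toy numbers (not the real sales figures), assuming 2013 is the base year; the names sales, both and pct_diff are illustrative. An inner merge keeps only the ranges that have data in both years, which covers the "skip ranges with a single year" requirement.

```r
# Toy data: "$181-190M" has sales in 2013 only and should get no label
sales <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2014L, 2014L),
  Range = c("$40M", "$50M", "$181-190M", "$40M", "$50M"),
  Avg_TOTALS = c(100, 200, 300, 90, 250)
)

# merge() performs an inner join by default, dropping unmatched ranges
both <- merge(sales[sales$Year == 2013, ], sales[sales$Year == 2014, ],
              by = "Range", suffixes = c("_2013", "_2014"))

# Percent change from 2013 to 2014, with 2013 as the base
both$pct_diff <- round(
  100 * (both$Avg_TOTALS_2014 - both$Avg_TOTALS_2013) / both$Avg_TOTALS_2013, 1)
```

The resulting both data frame has one row per labelled range, so it could be used as the data for a text layer or for annotate, with the label positioned slightly above the higher of the two points.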
