I want to create a single heatmap with ggplot that shows the following variables:
year-month, mkt_name and mp_price.
str(df)
'data.frame': 2655 obs. of 5 variables:
$ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ yearmonthf: Factor w/ 48 levels "Jan 2012","Feb 2012",..: 1 2 3 4 5 6 7 8 9 10 ...
$ month : int 1 2 3 4 5 6 7 8 9 10 ...
$ mkt_name : Factor w/ 63 levels "Base","Birambo",..: 2 2 2 2 2 2 2 2 2 2 ...
$ mp_price : num 145 136 160 163 181 ...
df <- read.csv('/Users/shashankshekhar/Desktop/R Food Price/Foodprice/datasets/Correlation/Heatmap/Potatoesheat.csv')
df$date <- as.Date(df$date) # format date
df <- df[df$year >= 2012, ] # filter reqd years
# Create year-month (as.yearmon() comes from the zoo package)
library(zoo)
df$yearmonth <- as.yearmon(df$date)
df$yearmonthf <- factor(df$yearmonth)
df <- df[, c("year", "yearmonthf", "month", "mkt_name", "mp_price")]
head(df)
year yearmonthf month mkt_name mp_price
15 2012 Jan 2012 1 Birambo 145.00
16 2012 Feb 2012 2 Birambo 136.25
17 2012 Mar 2012 3 Birambo 160.00
18 2012 Apr 2012 4 Birambo 162.75
19 2012 May 2012 5 Birambo 181.00
20 2012 Jun 2012 6 Birambo 170.00
Heatmap with ggplot:
ggplot(df, aes(mkt_name, year, fill = mp_price)) +
geom_tile(colour = "white") +
facet_grid(year~mkt_name) +
scale_fill_gradient(low="red", high="green") +
labs(x="Week of Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
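I think you mapped your variables to the wrong aesthetics: the year-month belongs on the x-axis and the market name on the y-axis, with no facets. Try this instead: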
ggplot(df, aes(x = yearmonth, y = mkt_name, fill = mp_price))+
geom_tile()+
scale_x_date(date_labels = "%b %Y")+
scale_fill_gradient(low="red", high="green") +
labs(x="Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
Does this answer your question?
Reproducible example (note that yearmonth is a Date here, which scale_x_date() requires):
df <- data.frame(mkt_name = rep("Birambo",6),
year = 2012,
yearmonth = seq(as.Date("2012-01-01", format = "%Y-%m-%d"), as.Date("2012-06-01", format = "%Y-%m-%d"), by = "month"),
month = 1:6,
mp_price = c(145,136.25,160,162.75,181,170))
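If the data only has the factor column yearmonthf, one possible way to get a Date column that scale_x_date() accepts is to build it from the integer year and month columns. This is a minimal sketch in base R; the three-element vectors are hypothetical stand-ins for the asker's columns, and pinning each month to its first day is an assumption made only to get a valid date.

```r
# Hypothetical stand-ins for df$year and df$month from the question
year  <- c(2012L, 2012L, 2012L)
month <- c(1L, 2L, 3L)

# Build an ISO date string for the first day of each month, then parse it.
# This avoids parsing month names, which depends on the locale.
yearmonth <- as.Date(sprintf("%d-%02d-01", year, month))

class(yearmonth)   # "Date"
format(yearmonth)  # "2012-01-01" "2012-02-01" "2012-03-01"
```

With the real data this would be something like df$yearmonth <- as.Date(sprintf("%d-%02d-01", df$year, df$month)), after which the ggplot call above can be used unchanged.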
Hello! Is there a way to index a chart to start and end at specific points
(which may be out of numeric order)?
I have data that begins October 1st and ends September 30th of the following year. The series repeats through multiple past years, and I want to build a daily seasonality chart. The challenge is that the x-axis does not run from low to high; it runs 10-11-12-1-2-3-4-5-6-7-8-9.
Question 1:
Can you order the index by month as 10-11-12-1-2-3-4-5-6-7-8-9,
while staying compatible with %m-%d formatting? The real problem is in
daily format, but for the sake of brevity I am only using months here.
The result should look something like this (sorry, I had to use Excel):
Question 2:
Can we remove the connected chart lines, or will the solution to 1, naturally fix
question 2? examples in the attempts below.
Question 3:
Can the final format of the solution allow taking a moving average, or other
mutations of the initial data? The table in attempt #2 would allow taking the average of each month across years. Since July 2017 is 6 and July 2018 is 12, we would plot a 9 in the chart, and so on for the entire plot.
Question 4:
Is there an xts equivalent that solves this problem?
THANK YOU, THANK YOU, THANK YOU!
library(ggplot2)
library(plotly)
library(tidyr)
library(reshape2)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by="month")
values <- c(2,3,4,3,4,5,6,4,5,6,7,8,9,10,8,9,10,11,12,13,11,12,13,14)
YearEnd <-c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018)
df <- data.frame(Date,values,YearEnd)
## PLOT THE TIMESERIES
plot_ly(df, x = ~Date, y = ~values, type = "scatter", mode = "lines")
## PLOT THE DATA BY MONTH: attempt 1
df$Month <- format(df$Date, format="%m")
df2 <- df %>%
select(values, Month, YearEnd)
plot_ly(df2, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## Plot starts on the 10th month, which is good, but the index is
## in standard order, not 10-11-12-1-2-3-4-5-6-7-8-9
## It also still connects the gaps, bad.
## CREATE A PIVOTTABLE: attempt 2
table <- spread(df2,YearEnd, values)
df3 <- melt(table , id.vars = 'Month', variable.name = 'series')
plot_ly(df3, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## now the data are in the right order, but the index is still wrong
## I also do not understand how plotly is ordering it correctly, as 2
## is not the starting point in January.
You just need to set the desired level order for Month inside factor():
library(magrittr)
library(tidyverse)
library(lubridate)
library(plotly)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)
YearEnd <- c(
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018
)
df <- data.frame(Date, values, YearEnd)
# to fiscal year order
df %<>%
mutate(
Month = month(Date),
YearEnd = factor(YearEnd)) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> Date values YearEnd Month
#> 1 2016-10-01 2 2017 Oct
#> 2 2016-11-01 3 2017 Nov
#> 3 2016-12-01 4 2017 Dec
#> 4 2017-01-01 3 2017 Jan
#> 5 2017-02-01 4 2017 Feb
#> 6 2017-03-01 5 2017 Mar
#> 7 2017-04-01 6 2017 Apr
#> 8 2017-05-01 4 2017 May
#> 9 2017-06-01 5 2017 Jun
#> 10 2017-07-01 6 2017 Jul
#> 11 2017-08-01 7 2017 Aug
#> 12 2017-09-01 8 2017 Sep
...
p1 <- ggplot(df, aes(
x = Month, y = values,
color = YearEnd,
group = YearEnd)) +
geom_line() +
theme_classic(base_size = 12)
ggplotly(p1)
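Regarding Question 3, the same factor trick also keeps summaries in fiscal-year order. The following is only a sketch in base R, not part of the plotting code above: it recreates the question's monthly data and averages each month across the two years with tapply(). The name monthly_mean is introduced here purely for illustration.

```r
# Same monthly data as in the question
Date <- seq(as.Date("2016-10-01"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8,
            9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)

# Month as a factor whose levels follow the fiscal year (Oct ... Sep)
Month <- factor(as.integer(format(Date, "%m")),
                levels = c(10:12, 1:9),
                labels = month.abb[c(10:12, 1:9)])

# Mean per month across all years; the result follows the factor level order
monthly_mean <- tapply(values, Month, mean)

names(monthly_mean)[1]   # "Oct"
monthly_mean[["Jul"]]    # 9, the average of 6 (Jul 2017) and 12 (Jul 2018)
```

Because the result is ordered by the factor levels, it can be plotted directly and the x-axis will start in October.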
Edit: to plot by day of the year, we can use a method similar to the third one from this answer:
# Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-10-01"), to = as.Date("2018-09-30"),
by = "days")
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
df <- dat %>%
as_tibble() %>%   # tbl_df() is deprecated in current dplyr
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b-%d"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > 9 ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> # A tibble: 730 x 7
#> date values jday Month Year myLabel fcyear
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017
#> # ... with 720 more rows
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
#> # A tibble: 730 x 8
#> date values jday Month Year myLabel fcyear order
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct> <int>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017 1
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017 2
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017 3
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017 4
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017 5
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017 6
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017 7
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017 8
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017 9
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017 10
#> # ... with 720 more rows
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
#> [1] "Oct-01" "Nov-01" "Dec-01" "Jan-01" "Feb-01" "Mar-01" "Apr-01"
#> [8] "May-01" "Jun-01" "Jul-01" "Aug-01" "Sep-01"
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
ggplotly(p3)
Created on 2018-09-09 by the reprex package (v0.2.0.9000).
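The row-number approach above assumes that every day of each fiscal year is present in the data. As a possible alternative, sketched here in base R under the assumption that the fiscal year always starts on October 1st, the position within the fiscal year can be computed directly from the date, so gaps in the data do not shift later days. The names fy_start and fy_day are hypothetical and not part of the answer's code.

```r
# Daily dates covering two fiscal years (Oct 1st to Sep 30th)
date <- seq(as.Date("2016-10-01"), as.Date("2018-09-30"), by = "day")

yr <- as.integer(format(date, "%Y"))
mo <- as.integer(format(date, "%m"))

# Fiscal year is named after the calendar year in which it ends
fcyear <- ifelse(mo > 9, yr + 1L, yr)

# First day of the fiscal year each date belongs to (assumed: October 1st)
fy_start <- as.Date(sprintf("%d-10-01", fcyear - 1L))

# Day of the fiscal year: 1 on October 1st, counting up from there
fy_day <- as.integer(date - fy_start) + 1L

range(fy_day)    # 1 365
unique(fcyear)   # 2017 2018
```

fy_day could then take the place of the order column on the x-axis. In a fiscal year that contains February 29th, the days after it are shifted by one relative to other years.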
Consider this an appendix to Tung's excellent answer. Here I've made it explicit how to alter the code for different start and end months of financial (or production) years, which vary by country and industry, through the parameter EndMonth. I've also added an average across years, which seems like an obvious thing to want as well (though it is outside the OP's request).
library(tidyverse)
library(lubridate)
## Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-06-01"), to = as.Date("2016-06-01")+729,
by = "days") # about 2 years, but even number of days
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
EndMonth <- 5 #i.e. if last month of financial year is May, use 5 for 5th month of calendar year
df <- dat %>%
as_tibble() %>%   # tbl_df() is deprecated in current dplyr
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b%e"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > EndMonth ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c((EndMonth+1):12, 1:(EndMonth)),
labels = c(month.abb[(EndMonth+1):12], month.abb[1:EndMonth])))
df
#make 2 (or n) year average
df_mean <- df %>%
group_by(jday) %>%
mutate(values = mean(values, na.rm=TRUE)) %>%
filter(fcyear %in% c("2017")) %>% #note hard code for first fcyear in dataset
mutate(fcyear = "Average")
#Add average to data frame
df <- bind_rows(df, df_mean)
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
p2
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
I tried the code in the answer in this previous thread, ggplot2 and a Stacked Bar Chart with Negative Values.
library(ggplot2)
library(plotly)
dat <- read.table(text = " Division Year OperatingIncome
1 A 2012 11460
2 B 2012 7431
3 C 2012 -8121
4 D 2012 15719
5 E 2012 364
6 A 2011 12211
7 B 2011 6290
8 C 2011 -2657
9 D 2011 14657
10 E 2011 1257
11 A 2010 12895
12 B 2010 5381
13 C 2010 -2408
14 D 2010 11849
15 E 2010 517",header = TRUE,sep = "",row.names = 1)
dat1 <- subset(dat,OperatingIncome >= 0)
dat2 <- subset(dat,OperatingIncome < 0)
plot <- ggplot() +
geom_bar(data = dat1, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
geom_bar(data = dat2, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
scale_fill_brewer(type = "seq", palette = 1)
ggplotly(plot)
Here is what I'm getting:
If I run plot(plot) then it works fine:
How do I fix the issue in Plotly?
For future readers
Nowadays, plotly (I am using 4.8.0) supports stacked bar charts with negative values. In the layout you have to set barmode = 'relative'. You can also keep using the ggplotly approach posted in the question.
plot_ly(dat, y=~OperatingIncome, x=~Year, type='bar', name=~Division, color =~Division,
colors='Blues', marker=list(line=list(width=1, color='lightgray'))) %>%
layout(barmode = 'relative')
Will return:
I have several data frames (e.g. t1, t2 and t3) with the same format but possibly different numbers of rows.
t1
year month avgTemp
2006 1 -0.95
2006 2 1.34
2006 3 3.58
2006 4 9.94
2006 5 14.67
2006 6 18.38
2006 7 23.56
2006 8 16.57
2006 9 18.08
2006 10 13.26
2006 11 8.27
2006 12 4.82
t2
year month avgTemp
2015 1 3.01
2015 2 2.16
2015 3 6.37
2015 4 10.31
2015 5 14.40
2015 6 17.84
2015 7 22.04
2015 8 21.35
2015 9 14.18
2015 10 9.40
2015 11 8.18
2015 12 7.22
and t3
year month avgTemp
2005 7 19.79
2005 8 17.54
2005 9 16.69
2005 10 11.64
2005 11 5.40
2005 12 1.97
Now, when I want to plot those 3 data frames in one diagram, I do this:
ggplot() +
geom_line(data=t1, aes(x = t1$month, y = t1$avgTemp, colour = t1$year)) +
geom_line(data=t2, aes(x = t2$month, y = t2$avgTemp, colour = t2$year)) +
geom_line(data=t3, aes(x = t3$month, y = t3$avgTemp, colour = t3$year))
And the output looks like this:
So far everything is OK, but the plot command is very ugly, since I have to put every data frame into its own geom_line.
Is there a more elegant way to achieve this, for example by merging the data frames?
Thanks in advance.
You can try something like this:
t <- rbind(t1, t2, t3)
t$year <- as.factor(t$year)
ggplot(t, aes(x = month, y = avgTemp, col = year)) + geom_line()
It should give you the desired plot with three lines for three years.
EDIT: Adding this code option based on the comment below about leaving year as a numeric value:
t <- rbind(t1, t2, t3)
ggplot(t, aes(x = month, y = avgTemp, col = year, group = year)) + geom_line()
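When there are more than a handful of data frames, typing every name into rbind() gets tedious. One possible variation, sketched here in base R with shortened versions of the question's tables (three rows each, to keep the example small), is to collect the data frames in a list and bind them in one call. The names frames and combined are introduced only for this illustration.

```r
# Shortened versions of t1, t2 and t3 from the question
t1 <- data.frame(year = 2006, month = 1:3, avgTemp = c(-0.95, 1.34, 3.58))
t2 <- data.frame(year = 2015, month = 1:3, avgTemp = c(3.01, 2.16, 6.37))
t3 <- data.frame(year = 2005, month = 7:9, avgTemp = c(19.79, 17.54, 16.69))

# Collect any number of same-format data frames in a list ...
frames <- list(t1, t2, t3)

# ... and stack them row-wise in a single call
combined <- do.call(rbind, frames)

# A factor year gives one discrete colour per year when plotted
combined$year <- as.factor(combined$year)

nrow(combined)          # 9
levels(combined$year)   # "2005" "2006" "2015"
```

combined can then be passed to the same ggplot() call as above in place of t, which also avoids masking the base function t().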