Grouped barplot in ggplot2 in R

I would like to make a grouped bar plot. An example of my data is as follows:
site code year month gear total value
678490 2012 3 GL 13882
678490 2012 4 GL 50942
678490 2012 5 GL 54973
678490 2012 6 GL 63938
678490 2012 7 GL 23825
678490 2012 8 GL 8195
678490 2012 9 GL 14859
678490 2012 9 RT 3225
678490 2012 10 GL 981
678490 2012 10 RT 19074
678490 2012 11 SD 106384
678490 2012 11 RT 2828
678490 2012 12 GL 107167
678490 2012 12 RT 4514
There are 17 site code options, four year options, twelve month options, and four gear options.
What I would like to produce is a plot per site, per year, showing the 'total value' for each gear, for each month, as a bar.
So far I have managed to produce a plot specific to one site and year, but the total values are displayed as one bar per month rather than as separate bars per gear (I cannot include an image in my first post!).
But for months 9, 10, 11 and 12 there were two gears used so I want there to be two bars for these months.
I am using the following piece of code:
ggplot(subset(cdata, year %in% c("2012") & site code %in% c("678490")),
aes(x = factor(month), y = total value)) +
geom_bar(stat = "identity") +
labs(x = "Month", y = "Total value")
Any help on this would be greatly appreciated.

If you want separate bars for each gear, add fill = gear to the aes() and set position = "dodge" in geom_bar:
ggplot(cdata[cdata$year==2012 & cdata$sitecode==678490,],
aes(x = factor(month), y = totalvalue, fill=gear)) +
geom_bar(stat = "identity", position="dodge") +
labs(x = "Month", y = "Total value")
When you want to make a plot per site, per year, showing the 'total value' for each gear, for each month, as a bar, you can use facet_grid. For example:
ggplot(cdata, aes(x = factor(month), y = totalvalue, fill=gear)) +
geom_bar(stat = "identity", position="dodge") +
labs(x = "Month", y = "Total value") +
facet_grid(sitecode ~ year)
Some additional comments:
It's probably better not to use spaces in your column names (in the code above I removed the spaces)
Add an example to your question that illustrates the problem you are facing. In this case, it is better to give an example dataset that includes several site codes and several years.
I therefore made up some data:
df1 <- read.table(text="sitecode year month gear totalvalue
678490 2012 3 GL 13882
678490 2012 4 GL 50942
678490 2012 5 GL 54973
678490 2012 6 GL 63938
678490 2012 7 GL 23825
678490 2012 8 GL 8195
678490 2012 9 GL 14859
678490 2012 9 RT 3225
678490 2012 10 GL 981
678490 2012 10 RT 19074
678490 2012 11 SD 106384
678490 2012 11 RT 2828
678490 2012 12 GL 107167
678490 2012 12 RT 4514", header= TRUE)
df2 <- df1
df2$sitecode <- 7849
df2$year <- 2013
df3 <- df1
df3$sitecode <- 7849
df4 <- df1
df4$year <- 2013
cdata <- rbind(df1,df2,df3,df4)
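As a compact recap of the approach above, here is a minimal sketch on a hypothetical five-row extract of the question's data. It assumes a reasonably recent ggplot2, where geom_col() is the shorthand for geom_bar(stat = "identity") and position_dodge() accepts preserve = "single". That option is not part of the answer above; it is added here only so that a month with a single gear keeps the same bar width as months with two gears.

```r
library(ggplot2)

# Hypothetical small extract of the question's data (one site, one year)
cdata_small <- data.frame(
  sitecode   = 678490,
  year       = 2012,
  month      = c(8, 9, 9, 10, 10),
  gear       = c("GL", "GL", "RT", "GL", "RT"),
  totalvalue = c(8195, 14859, 3225, 981, 19074)
)

# One bar per gear within each month; preserve = "single" keeps the
# lone bar in month 8 as narrow as the paired bars in months 9 and 10
p <- ggplot(cdata_small, aes(x = factor(month), y = totalvalue, fill = gear)) +
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = "Month", y = "Total value") +
  facet_grid(sitecode ~ year)

p
```

With the full dataset, the same call should give one panel per site and year combination.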

Related

Time Series Heat Map

I wish to create a single heatmap using ggplot including following factors:
year-month, mkt_name, mp_price
str(df)
'data.frame': 2655 obs. of 5 variables:
$ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ yearmonthf: Factor w/ 48 levels "Jan 2012","Feb 2012",..: 1 2 3 4 5 6 7 8 9 10 ...
$ month : int 1 2 3 4 5 6 7 8 9 10 ...
$ mkt_name : Factor w/ 63 levels "Base","Birambo",..: 2 2 2 2 2 2 2 2 2 2 ...
$ mp_price : num 145 136 160 163 181 ...
df <- read.csv('/Users/shashankshekhar/Desktop/R Food Price/Foodprice/datasets/Correlation/Heatmap/Potatoesheat.csv')
df$date <- as.Date(df$date) # format date
df <- df[df$year >= 2012, ] # filter reqd years
# Create Month Week
df$yearmonth <- as.yearmon(df$date)
df$yearmonthf <- factor(df$yearmonth)
df <- df[, c("year", "yearmonthf", "month", "mkt_name", "mp_price")]
head(df)
year yearmonthf month mkt_name mp_price
15 2012 Jan 2012 1 Birambo 145.00
16 2012 Feb 2012 2 Birambo 136.25
17 2012 Mar 2012 3 Birambo 160.00
18 2012 Apr 2012 4 Birambo 162.75
19 2012 May 2012 5 Birambo 181.00
20 2012 Jun 2012 6 Birambo 170.00
Heatmap ggplot
ggplot(df, aes(mkt_name, year, fill = mp_price)) +
geom_tile(colour = "white") +
facet_grid(year~mkt_name) +
scale_fill_gradient(low="red", high="green") +
labs(x="Week of Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
I think you mis-attributed your variables. Instead try:
ggplot(df, aes(x = yearmonth, y = mkt_name, fill = mp_price))+
geom_tile()+
scale_x_date(date_labels = "%b %Y")+
scale_fill_gradient(low="red", high="green") +
labs(x="Week of Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
Does this answer your question?
Reproducible example:
df <- data.frame(mkt_name = rep("Birambo",6),
year = 2012,
yearmonth = seq(as.Date("2012-01-01", format = "%Y-%m-%d"), as.Date("2012-06-01", format = "%Y-%m-%d"), by = "month"),
month = 1:6,
mp_price = c(145,136.25,160,162.75,181,170))
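Note that scale_x_date() expects a Date column on the x axis. If only integer year and month columns are available, as in the question's str() output, one possible way to build such a column is sketched below. The "Base" prices are invented purely for illustration, and the x label is changed to "Month" to match what is plotted.

```r
library(ggplot2)

# Hypothetical data shaped like the question's columns (two markets, six months);
# the "Base" prices are made up for this sketch
df <- data.frame(
  mkt_name = rep(c("Base", "Birambo"), each = 6),
  year     = 2012,
  month    = rep(1:6, times = 2),
  mp_price = c(150, 140, 155, 160, 175, 168,
               145, 136.25, 160, 162.75, 181, 170)
)

# First day of each month as a Date, so that scale_x_date() applies
df$yearmonth <- as.Date(paste(df$year, df$month, 1, sep = "-"))

# One tile per market and month, coloured by price
p <- ggplot(df, aes(x = yearmonth, y = mkt_name, fill = mp_price)) +
  geom_tile() +
  scale_x_date(date_labels = "%b %Y") +
  scale_fill_gradient(low = "red", high = "green") +
  labs(x = "Month", y = "", fill = "Price")

p
```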

How to plot expenditure vs year in R

I have a dataset which has about 100,000 datapoints.
I want to plot two columns.
Y axis - Year
X axis - Sales
Sample Data:
Sales Year
22 2016
10 2016
3.99 2017
8.99 2017
12.99 2017
8.00 2016
12.00 2017
5.00 2016
22 2017
50 2016
53 2017
I'm using the following code:
plot(subset_4$SALES ~ subset_4$YEAR)
But the plot doesn't look great. Is there any nicer way of doing this?
Update: plot(subset_4$SALES ~ subset_4$WEEKS)
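You can try the ggplot2 library:
library(ggplot2)
# Assumes subset_4 has the SALES and YEAR columns used in the question
df <- data.frame(sales = subset_4$SALES, year = subset_4$YEAR)
ggplot(df, aes(x = sales, y = year, color = year)) +
geom_point() +
xlab("Sales") +
ylab("Year")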
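With around 100,000 points and only a couple of distinct years, a scatter plot will overplot heavily. One option, offered here as a sketch rather than the only answer, is to treat year as a category and summarise sales with a boxplot. The data below is the sample from the question.

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  sales = c(22, 10, 3.99, 8.99, 12.99, 8.00, 12.00, 5.00, 22, 50, 53),
  year  = c(2016, 2016, 2017, 2017, 2017, 2016, 2017, 2016, 2017, 2016, 2017)
)

# factor(year) gives one box per year instead of a continuous axis
p <- ggplot(df, aes(x = factor(year), y = sales)) +
  geom_boxplot() +
  labs(x = "Year", y = "Sales")

p
```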

generate seasonal plot, but with fiscal year start/end dates

Hello! Is there a way to index a chart to start and end at specific points
(which may be out of numeric order)?
I have data that begins October 1st and ends September 30th the following year. The series repeats over multiple past years, and I want to build a daily seasonality chart. The challenge is that the X axis does not run from low to high; it runs 10-11-12-1-2-3-4-5-6-7-8-9.
Question 1:
Can you order the index by month 10-11-12-1-2-3-4-5-6-7-8-9 while staying
compatible with %m-%d formatting? The real problem is in daily format,
but for the sake of brevity I am only using months.
the result should look something like this...sorry i had to use excel...
Question 2:
Can we remove the connected chart lines, or will the solution to question 1 naturally fix
question 2? Examples are in the attempts below.
Question 3:
Can the final format of the solution allow taking a moving average, or other
transformations of the initial data? The table in attempt #2 would allow taking the average of each month across years. Since July 17 is 6 and July 18 is 12, we would plot a 9 in the chart, etc. for the entire plot.
Question 4:
Is there an xts equivalent to solve this problem?
THANK YOU, THANK YOU, THANK YOU!
library(ggplot2)
library(plotly)
library(tidyr)
library(reshape2)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by="month")
values <- c(2,3,4,3,4,5,6,4,5,6,7,8,9,10,8,9,10,11,12,13,11,12,13,14)
YearEnd <-c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018)
df <- data.frame(Date,values,YearEnd)
## PLOT THE TIMESERIES
plot_ly(df, x = ~Date, y = ~values, type = "scatter", mode = "lines")
## PLOT THE DATA BY MONTH: attempt 1
df$Month <- format(df$Date, format="%m")
df2 <- df %>%
select(values, Month, YearEnd)
plot_ly(df2, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## Plot starts on the 10th month, which is good, but the index is
## in standard order, not 10-11-12-1-2-3-4-5-6-7-8-9
## It also still connects the gaps, bad.
## CREATE A PIVOTTABLE: attempt 2
table <- spread(df2,YearEnd, values)
df3 <- melt(table , id.vars = 'Month', variable.name = 'series')
plot_ly(df3, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## now the data are in the right order, but the index is still wrong
## I also do not understand how plotly is ordering it correctly, as 2
## is not the starting point in January.
You just need to set the desired levels for the Month inside factor
library(magrittr)
library(tidyverse)
library(lubridate)
library(plotly)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)
YearEnd <- c(
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018
)
df <- data.frame(Date, values, YearEnd)
# to fiscal year order
df %<>%
mutate(
Month = month(Date),
YearEnd = factor(YearEnd)) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> Date values YearEnd Month
#> 1 2016-10-01 2 2017 Oct
#> 2 2016-11-01 3 2017 Nov
#> 3 2016-12-01 4 2017 Dec
#> 4 2017-01-01 3 2017 Jan
#> 5 2017-02-01 4 2017 Feb
#> 6 2017-03-01 5 2017 Mar
#> 7 2017-04-01 6 2017 Apr
#> 8 2017-05-01 4 2017 May
#> 9 2017-06-01 5 2017 Jun
#> 10 2017-07-01 6 2017 Jul
#> 11 2017-08-01 7 2017 Aug
#> 12 2017-09-01 8 2017 Sep
...
p1 <- ggplot(df, aes(
x = Month, y = values,
color = YearEnd,
group = YearEnd)) +
geom_line() +
theme_classic(base_size = 12)
ggplotly(p1)
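For Question 3 (the average of each month across years), a base R sketch using the question's own monthly data could look like the following. It reuses the fiscal-order factor idea from the code above; aggregate() is one of several ways to compute the means, chosen here only to keep the example dependency-free.

```r
# Monthly data from the question
Date <- seq(as.Date("2016-10-01"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8,
            9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)

# Month as a factor whose levels follow the fiscal year (Oct ... Sep)
fiscal_order <- c(10:12, 1:9)
Month <- factor(as.integer(format(Date, "%m")),
                levels = fiscal_order,
                labels = month.abb[fiscal_order])

# Mean of each month across fiscal years; rows come back in level order
monthly_mean <- aggregate(values ~ Month,
                          data = data.frame(Month, values),
                          FUN = mean)
monthly_mean
```

July is 6 in 2017 and 12 in 2018, so its row holds 9, matching the example in the question.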
Edit: to plot by Julian day, we use a similar method to the 3rd one from this answer
# Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-10-01"), to = as.Date("2018-09-30"),
by = "days")
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
df <- dat %>%
as_tibble() %>%
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b-%d"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > 9 ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> # A tibble: 730 x 7
#> date values jday Month Year myLabel fcyear
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017
#> # ... with 720 more rows
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
#> # A tibble: 730 x 8
#> date values jday Month Year myLabel fcyear order
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct> <int>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017 1
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017 2
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017 3
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017 4
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017 5
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017 6
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017 7
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017 8
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017 9
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017 10
#> # ... with 720 more rows
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
#> [1] "Oct-01" "Nov-01" "Dec-01" "Jan-01" "Feb-01" "Mar-01" "Apr-01"
#> [8] "May-01" "Jun-01" "Jul-01" "Aug-01" "Sep-01"
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
ggplotly(p3)
Created on 2018-09-09 by the reprex package (v0.2.0.9000).
Consider this an appendix to Tung's excellent answer. Here I've made it obvious how to alter the code for different start and end months of financial (or production) years, which vary by country (and industry), using the parameter EndMonth. I've also added an annual average, which seems like a pretty obvious thing to want as well (though outside the OP's request).
library(tidyverse)
library(lubridate)
## Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-06-01"), to = as.Date("2016-06-01")+729,
by = "days") # about 2 years, but even number of days
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
EndMonth <- 5 #i.e. if last month of financial year is May, use 5 for 5th month of calendar year
df <- dat %>%
as_tibble() %>%
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b%e"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > EndMonth ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c((EndMonth+1):12, 1:(EndMonth)),
labels = c(month.abb[(EndMonth+1):12], month.abb[1:EndMonth])))
df
#make 2 (or n) year average
df_mean <- df %>%
group_by(jday) %>%
mutate(values = mean(values, na.rm=TRUE)) %>%
filter(fcyear %in% c("2017")) %>% #note hard code for first fcyear in dataset
mutate(fcyear = "Average")
#Add average to data frame
df <- bind_rows(df, df_mean)
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
p2
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
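One caveat with the EndMonth parameter: the expression c((EndMonth+1):12, 1:EndMonth) misbehaves when EndMonth is 12, because 13:12 counts down to c(13, 12) instead of being empty. A small sketch of a wrap-around alternative follows. The helper names fiscal_month_order and fiscal_year are hypothetical and not part of either answer above.

```r
# Hypothetical helper: month numbers in fiscal-year order for any end month,
# using modular arithmetic so that end_month = 12 also works
fiscal_month_order <- function(end_month) {
  ((end_month + 0:11) %% 12) + 1
}

# Hypothetical helper: fiscal year label (the year in which the fiscal year ends)
fiscal_year <- function(date, end_month) {
  month <- as.integer(format(date, "%m"))
  year  <- as.integer(format(date, "%Y"))
  year + (month > end_month)
}

fiscal_month_order(5)              # months June to December, then January to May
month.abb[fiscal_month_order(9)]   # abbreviations starting at "Oct"
fiscal_year(as.Date(c("2016-10-01", "2017-09-30")), end_month = 9)
```

The result of fiscal_month_order(EndMonth) can be passed as the levels argument of factor(), with month.abb indexed by the same vector as labels.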

ggplot2 stacked bar with negative values not working with Plotly

I tried the code in the answer in this previous thread, ggplot2 and a Stacked Bar Chart with Negative Values.
dat <- read.table(text = " Division Year OperatingIncome
1 A 2012 11460
2 B 2012 7431
3 C 2012 -8121
4 D 2012 15719
5 E 2012 364
6 A 2011 12211
7 B 2011 6290
8 C 2011 -2657
9 D 2011 14657
10 E 2011 1257
11 A 2010 12895
12 B 2010 5381
13 C 2010 -2408
14 D 2010 11849
15 E 2010 517",header = TRUE,sep = "",row.names = 1)
dat1 <- subset(dat,OperatingIncome >= 0)
dat2 <- subset(dat,OperatingIncome < 0)
plot <- ggplot() +
geom_bar(data = dat1, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
geom_bar(data = dat2, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
scale_fill_brewer(type = "seq", palette = 1)
ggplotly(plot)
Here is what I'm getting:
If I run plot(plot) then it works fine:
How do I fix the issue in Plotly?
For future readers
Nowadays, plotly (I am using 4.8.0) supports stacked bar charts with negative values. In the layout you have to set barmode = 'relative'. Moreover, you can also use the ggplotly functionality posted in the question.
plot_ly(dat, y=~OperatingIncome, x=~Year, type='bar', name=~Division, color =~Division,
colors='Blues', marker=list(line=list(width=1, color='lightgray'))) %>%
layout(barmode = 'relative')
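On the ggplot2 side, splitting the data into positive and negative subsets is no longer needed either: since ggplot2 2.2.0, stacking places negative values separately below zero. A minimal sketch with the question's data follows. Whether ggplotly() renders the result identically is not checked here.

```r
library(ggplot2)

# Data from the question
dat <- read.table(text = "Division Year OperatingIncome
A 2012 11460
B 2012 7431
C 2012 -8121
D 2012 15719
E 2012 364
A 2011 12211
B 2011 6290
C 2011 -2657
D 2011 14657
E 2011 1257
A 2010 12895
B 2010 5381
C 2010 -2408
D 2010 11849
E 2010 517", header = TRUE)

# A single layer: positive values stack upwards, negative values downwards
p <- ggplot(dat, aes(x = Year, y = OperatingIncome, fill = Division)) +
  geom_col() +
  scale_fill_brewer(type = "seq", palette = 1)

p
```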

How to merge several data frames of equal format for plotting with ggplot in one diagram in R

I have several data frames (e.g. t1, t2 and t3) of the same format but possibly with different numbers of rows.
t1
year month avgTemp
2006 1 -0.95
2006 2 1.34
2006 3 3.58
2006 4 9.94
2006 5 14.67
2006 6 18.38
2006 7 23.56
2006 8 16.57
2006 9 18.08
2006 10 13.26
2006 11 8.27
2006 12 4.82
t2
year month avgTemp
2015 1 3.01
2015 2 2.16
2015 3 6.37
2015 4 10.31
2015 5 14.40
2015 6 17.84
2015 7 22.04
2015 8 21.35
2015 9 14.18
2015 10 9.40
2015 11 8.18
2015 12 7.22
and t3
year month avgTemp
2005 7 19.79
2005 8 17.54
2005 9 16.69
2005 10 11.64
2005 11 5.40
2005 12 1.97
Now, when I want to plot those 3 data frames in one diagram I am doing this:
ggplot() +
geom_line(data=t1, aes(x = t1$month, y = t1$avgTemp, colour = t1$year)) +
geom_line(data=t2, aes(x = t2$month, y = t2$avgTemp, colour = t2$year)) +
geom_line(data=t3, aes(x = t3$month, y = t3$avgTemp, colour = t3$year))
And the output looks like this:
So far everything is OK, but the plot command is very ugly since I have to put every data frame into a new geom_line.
Is there a more elegant way to achieve this by merging the data frames or so?
Thanks in advance.
You can try something like this:
t <- rbind(t1, t2, t3)
t$year <- as.factor(t$year)
ggplot(t, aes(x = month, y = avgTemp, col = year)) + geom_line()
It should give you the desired plot with three lines for three years.
EDIT: Adding this code option based on the comment below about leaving year as a numeric value:
t <- rbind(t1, t2, t3)
ggplot(t, aes(x = month, y = avgTemp, col = year, group = year)) + geom_line()
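If the number of data frames grows, it may be easier to keep them in a list and bind them in one step. The sketch below uses only the first three rows of each data frame from the question to stay short. The names temps_list and all_temps are made up for illustration, and all_temps also avoids masking base R's t() function.

```r
library(ggplot2)

# First three rows of each data frame from the question
t1 <- data.frame(year = 2006, month = 1:3, avgTemp = c(-0.95, 1.34, 3.58))
t2 <- data.frame(year = 2015, month = 1:3, avgTemp = c(3.01, 2.16, 6.37))
t3 <- data.frame(year = 2005, month = 7:9, avgTemp = c(19.79, 17.54, 16.69))

# Bind any number of same-format data frames at once
temps_list <- list(t1, t2, t3)
all_temps <- do.call(rbind, temps_list)

# factor(year) gives one discrete colour, and one line, per year
p <- ggplot(all_temps, aes(x = month, y = avgTemp, colour = factor(year))) +
  geom_line() +
  labs(colour = "year")

p
```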
