ggplot2 stacked bar with negative values not working with Plotly - r

I tried the code in the answer in this previous thread, ggplot2 and a Stacked Bar Chart with Negative Values.
dat <- read.table(text = " Division Year OperatingIncome
1 A 2012 11460
2 B 2012 7431
3 C 2012 -8121
4 D 2012 15719
5 E 2012 364
6 A 2011 12211
7 B 2011 6290
8 C 2011 -2657
9 D 2011 14657
10 E 2011 1257
11 A 2010 12895
12 B 2010 5381
13 C 2010 -2408
14 D 2010 11849
15 E 2010 517",header = TRUE,sep = "",row.names = 1)
dat1 <- subset(dat,OperatingIncome >= 0)
dat2 <- subset(dat,OperatingIncome < 0)
library(ggplot2)
library(plotly)
plot <- ggplot() +
geom_bar(data = dat1, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
geom_bar(data = dat2, aes(x=Year, y=OperatingIncome, fill=Division),stat = "identity") +
scale_fill_brewer(type = "seq", palette = 1)
ggplotly(plot)
Here is what I'm getting:
If I run plot(plot) then it works fine:
How do I fix the issue in Plotly?

For future readers
Nowadays, plotly (I am using 4.8.0) supports stacked bar charts with negative values. In the layout you have to set barmode = 'relative'. You can also still use the ggplotly approach posted in the question.
plot_ly(dat, y=~OperatingIncome, x=~Year, type='bar', name=~Division, color =~Division,
colors='Blues', marker=list(line=list(width=1, color='lightgray'))) %>%
layout(barmode = 'relative')
Will return:

Related

How to get same legend categories on multiple stacked bar graphs when data categories are not identical ggplot2

This is my first time posting on here so please go easy on me. I've been googling this problem for days and have not been able to find a solution so sorry if this has been answered elsewhere.
I am making several stacked bar graphs in ggplot and want the legend categories to be identical on all the graphs (i.e. each category has the same color on each graph) without having to manually set all the colors. The issue is that the categories are not identical between graphs so simply specifying a palette results in the categories being different colors.
I can't use the actual data I'm working with so I've created a similar data frame that mimics the problem.
Here is the example df:
Year Trial Concentration Chemical
2013 1 0.8 Benzene
2013 1 1.5 Toluene
2013 1 0.8 Hexane
2013 2 1.5 Toluene
2013 2 0.8 Carboxylic Acid
2013 2 1.5 Acetone
2013 3 0.8 Ethanol
2013 3 1.9 Carboxylic Acid
2013 3 3.1 Acetone
2014 1 1.8 Benzene
2014 1 2.5 Toluene
2014 1 0.6 Methanol
2014 2 1.3 Toluene
2014 2 1.8 Carboxylic Acid
2014 2 2.5 Butane
2014 3 1.5 Ethanol
2014 3 1.2 Carboxylic Acid
2014 3 3.5 Acetone
... ... ... ...
Here is the code for the graphs:
year_list <- split(df, df$Year)
plot_list <- list()
for (i in seq_along(year_list)) {
df_year <- year_list[[i]]
p <- ggplot(df_year, aes(x = Trial, y = Concentration, width=0.8)) +
geom_bar(stat = "identity", aes(fill = Chemical))
plot_list[[i]] <- p # store each plot instead of overwriting the list
}
And here are the resulting graphs:
So for example, on the 2013 graph the brown-yellow = benzene and on the 2014 graph brown-yellow = butane. What I would like is for the legend to be identical on both graphs (i.e. the 2014 graph will show benzene in the legend, even though it was not measured in that year) and for each chemical to be the same color on each graph. Like this:
I know how to do this by hand with scale_fill_manual, however I have about 30 chemicals so I would prefer not to set them manually. Let me know if you have questions or need any additional information. Thanks in advance for any help!
I would set up a table ahead of time linking the colors and the chemical names
library(data.table)
library(tidyverse)
library(RColorBrewer)
df <-
fread("
Year Trial Concentration Chemical
2013 1 0.8 Benzene
2013 1 1.5 Toluene
2013 1 0.8 Hexane
2013 2 1.5 Toluene
2013 2 0.8 Carboxylic_Acid
2013 2 1.5 Acetone
2013 3 0.8 Ethanol
2013 3 1.9 Carboxylic_Acid
2013 3 3.1 Acetone
2014 1 1.8 Benzene
2014 1 2.5 Toluene
2014 1 0.6 Methanol
2014 2 1.3 Toluene
2014 2 1.8 Carboxylic_Acid
2014 2 2.5 Butane
2014 3 1.5 Ethanol
2014 3 1.2 Carboxylic_Acid
2014 3 3.5 Acetone
")
chem_colors <-
tibble(Chemical = factor(unique(df$Chemical))) %>%
mutate(color = brewer.pal(n = n(), name = "RdBu")[as.integer(Chemical)])
# you can use your loop here instead
plot_trials <- function(year) {
ggplot(filter(df, Year == year), aes(x = Trial, y = Concentration, width=0.8)) +
geom_bar(stat = "identity", aes(fill = Chemical)) +
scale_fill_manual(values = chem_colors$color, labels = chem_colors$Chemical)
}
gridExtra::grid.arrange(
plot_trials(2013),
plot_trials(2014),
nrow = 1
)
Here is the answer I got to work for my large data set. I used yake84's answer above and added the colorRampPalette() function to be able to extract more colors from a palette. I also changed chem_colors into a named vector because as a tibble the colors were not being mapped to the chemicals in my dataframe.
getPalette = colorRampPalette(brewer.pal(9, "Set1")) #create a palette with more than 9 colors
chem_colors <-
tibble(Chemical = factor(unique(df$Chemical))) %>%
mutate(color = getPalette(n())) #one color per chemical (30 in my data)
chem_colors <- setNames(chem_colors$color, as.character(chem_colors$Chemical)) #create named vector
plot_trials <- function(year) {
ggplot(filter(df, Year == year), aes(x = Trial, y = Concentration, width=0.8)) +
geom_bar(stat = "identity", aes(fill = Chemical)) +
scale_fill_manual(values = chem_colors)
}
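The named-vector idea can be sketched end to end as follows. This is a minimal illustration rather than the poster's exact code: it rebuilds a small part of the example data, uses three arbitrary base colors for colorRampPalette() instead of a Brewer palette, and replaces the loop with lapply(). The names year_plots and base_colors are made up for the example.

```r
library(ggplot2)

# Small subset of the example data from the question
df <- data.frame(
  Year = c(2013, 2013, 2013, 2014, 2014, 2014),
  Trial = c(1, 1, 1, 1, 1, 1),
  Concentration = c(0.8, 1.5, 0.8, 1.8, 2.5, 0.6),
  Chemical = c("Benzene", "Toluene", "Hexane", "Benzene", "Toluene", "Methanol")
)

# One color per chemical across ALL years, stored as a named vector
chemicals <- sort(unique(df$Chemical))
base_colors <- c("#E41A1C", "#377EB8", "#4DAF4A") # arbitrary choice for the sketch
chem_colors <- setNames(colorRampPalette(base_colors)(length(chemicals)), chemicals)

# One plot per year; every plot looks colors up by chemical name
year_plots <- lapply(split(df, df$Year), function(d) {
  ggplot(d, aes(x = Trial, y = Concentration, fill = Chemical)) +
    geom_col(width = 0.8) +
    scale_fill_manual(values = chem_colors)
})
```

Because scale_fill_manual() matches a named values vector by name, Benzene gets the same color in both year_plots[["2013"]] and year_plots[["2014"]], regardless of which other chemicals appear in that year.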

Annotate group on stacked bar graph

How could I add "Division" label on top of the bars themselves in this example of a stacked bar chart?
ggplot2 and a Stacked Bar Chart with Negative Values
I only want to show it for values with space (don't want to overcrowd the figure), so maybe this could be implemented by a minimum bar height. How could I do it for only bars with that minimum height?
Thanks!
You can use geom_text() which comes with a check_overlap parameter -- see ?geom_text():
dat <- read.table(text = " Division Year OperatingIncome
1 A 2012 11460
2 B 2012 7431
3 C 2012 -8121
4 D 2012 15719
5 E 2012 364
6 A 2011 12211
7 B 2011 6290
8 C 2011 -2657
9 D 2011 14657
10 E 2011 1257
11 A 2010 12895
12 B 2010 5381
13 C 2010 -2408
14 D 2010 11849
15 E 2010 517",header = TRUE,sep = "",row.names = 1)
ggplot(dat, aes(x = Year, y = OperatingIncome, fill = Division)) +
geom_col() +
geom_text(aes(label = Division),
position = position_stack(vjust = 0.5),
check_overlap = TRUE)
In the example, however, you will see that the labels do not overlap.
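To label only bars above a minimum height, one option is to blank out the label for short bars before plotting. The sketch below assumes a threshold of 2000 (min_height is a made-up name and value; pick whatever suits your scale) and sets group = Division explicitly so the text layer stacks in the same order as the bars.

```r
library(ggplot2)

dat <- data.frame(
  Division = rep(LETTERS[1:5], 3),
  Year = rep(c(2012, 2011, 2010), each = 5),
  OperatingIncome = c(11460, 7431, -8121, 15719, 364,
                      12211, 6290, -2657, 14657, 1257,
                      12895, 5381, -2408, 11849, 517)
)

min_height <- 2000 # hypothetical threshold, in the units of OperatingIncome

# Empty label for bars shorter than the threshold (in absolute value)
dat$label <- ifelse(abs(dat$OperatingIncome) >= min_height, dat$Division, "")

p <- ggplot(dat, aes(x = Year, y = OperatingIncome,
                     fill = Division, group = Division)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5))
```

With this threshold, division E is never labelled, and neither is division A's small neighbour B... only if it falls below 2000; in this data that affects E in all three years.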

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to sort your data frame prior to melting and capture the order. You can then use this to set the limit order on scale_x_discrete, or you can factor Team Name in your aes string.
Using factor:
# Team.Names is the column name read.csv() creates from "Team Names"
ordr <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
ordr <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge") +
scale_x_discrete(limits = ordr)
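For a quick check of the factor approach without the CSV file, here is a small self-contained sketch. It assumes the column names read.csv() would produce (Team.Names), uses only four teams from the question, and builds the long format by hand instead of calling melt(); team_order and long are names made up for the example.

```r
library(ggplot2)

alldata <- data.frame(
  Team.Names = c("Liverpool", "Chelsea", "Arsenal", "Tottenham"),
  PLRankingsReverse = c(20, 19, 16, 18),
  Grreserve = c(20, 19, 17, 16)
)

# Capture the desired order while the data is still one row per team
team_order <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]

# Long format, with the team column stored as a factor in ranking order
long <- data.frame(
  Team.Names = factor(rep(alldata$Team.Names, 2), levels = team_order),
  variable = rep(c("PLRankingsReverse", "Grreserve"), each = nrow(alldata)),
  value = c(alldata$PLRankingsReverse, alldata$Grreserve)
)

p <- ggplot(long, aes(x = Team.Names, y = value, fill = variable)) +
  geom_col(width = .5, position = "dodge") +
  labs(x = "Team Name")
```

ggplot2 draws a discrete x axis in factor-level order, so the bars appear as Liverpool, Chelsea, Tottenham, Arsenal rather than alphabetically.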

Grouped barplot in ggplot2 in R

I would like to make a grouped bar plot. An example of my data is as follows:
site code year month gear total value
678490 2012 3 GL 13882
678490 2012 4 GL 50942
678490 2012 5 GL 54973
678490 2012 6 GL 63938
678490 2012 7 GL 23825
678490 2012 8 GL 8195
678490 2012 9 GL 14859
678490 2012 9 RT 3225
678490 2012 10 GL 981
678490 2012 10 RT 19074
678490 2012 11 SD 106384
678490 2012 11 RT 2828
678490 2012 12 GL 107167
678490 2012 12 RT 4514
There are 17 site code options, four year options, twelve month options, and four gear options.
What I would like to produce is a plot per site, per year, showing the 'total value' for each gear, for each month, as a bar.
So far I have managed to produce a plot, specific to site and year, but with the total values displayed in one bar per month, not separated into separate bars per month (can not include image in first post!)
But for months 9, 10, 11 and 12 there were two gears used so I want there to be two bars for these months.
I am using the following piece of code:
ggplot(subset(cdata, year %in% c("2012") & site code %in% c("678490")),
aes(x = factor(month), y = total value)) +
geom_bar(stat = "identity") +
labs(x = "Month", y = "Total value")
Any help on this would be greatly appreciated.
If you want separate bars for each gear, then you should add fill=gear to aes() and position="dodge" to geom_bar:
ggplot(cdata[cdata$year==2012 & cdata$sitecode==678490,],
aes(x = factor(month), y = totalvalue, fill=gear)) +
geom_bar(stat = "identity", position="dodge") +
labs(x = "Month", y = "Total value")
this gives:
When you want to make a plot per site, per year, showing the 'total value' for each gear, for each month, as a bar, you can use facet_grid. For example:
ggplot(cdata, aes(x = factor(month), y = totalvalue, fill=gear)) +
geom_bar(stat = "identity", position="dodge") +
labs(x = "Month", y = "Total value") +
facet_grid(sitecode ~ year)
this gives:
Some additional comments:
It's probably better not to use spaces in your column names (in the code above I removed the spaces)
Add an example to your question which is illustrative of the problem you are facing. In this case, it's better to give an example dataset that includes several sitecodes and several years.
I therefore made up some data:
df1 <- read.table(text="sitecode year month gear totalvalue
678490 2012 3 GL 13882
678490 2012 4 GL 50942
678490 2012 5 GL 54973
678490 2012 6 GL 63938
678490 2012 7 GL 23825
678490 2012 8 GL 8195
678490 2012 9 GL 14859
678490 2012 9 RT 3225
678490 2012 10 GL 981
678490 2012 10 RT 19074
678490 2012 11 SD 106384
678490 2012 11 RT 2828
678490 2012 12 GL 107167
678490 2012 12 RT 4514", header= TRUE)
df2 <- df1
df2$sitecode <- 7849
df2$year <- 2013
df3 <- df1
df3$sitecode <- 7849
df4 <- df1
df4$year <- 2013
cdata <- rbind(df1,df2,df3,df4)

Time Series in R with ggplot2

I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series plot when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_x_date et al.
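If the starting point really is the wide object with one column per year, ggplot2 still needs one row per observation, so some reshaping has to happen first. The sketch below does it in base R and converts the year column names to numbers so that geom_line() connects the points. The construction of totmidb here is an assumption about how the original object looks.

```r
library(ggplot2)

years <- 1998:2007
values <- c(10, 6, 13, 14, 12, 8, 10, 10, 6, 9)

# Assumed shape of the wide object: one row, one column per year
totmidb <- data.frame(Area = "MIDWEST", matrix(values, nrow = 1))
names(totmidb)[-1] <- as.character(years)

# Reshape to long format by hand; year becomes a numeric column
year_cols <- setdiff(names(totmidb), "Area")
totmidc <- data.frame(
  Area = totmidb$Area,
  year = as.numeric(year_cols),
  value = unlist(totmidb[1, year_cols], use.names = FALSE)
)

p <- ggplot(totmidc, aes(x = year, y = value)) +
  geom_line() +
  xlab("") + ylab("")
```

Keeping year numeric matters: if it stays a factor or character column, each year becomes its own group and geom_line() has nothing to connect unless you also set group = 1.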
I am guessing that with "time series plot" you mean you want to get a bar chart instead of a line chart?
In that case, you have to modify your code only slightly to pass the correct parameters to geom_bar(). The geom_bar default stat is stat_count (stat_bin in older ggplot2 versions), which counts the number of rows at each position on the x-scale. With your data you want to override this behaviour and use stat = "identity".
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
This produces the following bar plot:

Resources