GGPLOT BAR CHART not sorting properly with reorder


I am working on a dataset that requires a bar chart, and I know the correct way to display it is in decreasing order (i.e., the largest values on the left and the smallest on the right). So I used the following code to produce the bar chart:
ggplot(df, aes(x = reorder(province,-gross_profit), y = gross_profit)) +
geom_bar(stat = "identity", fun.y = sum) +
labs(title="Profit by Province")
It does produce a bar chart and in fact changes the order as compared to:
ggplot(df, aes(x = province, y = gross_profit)) +
geom_bar(stat = "identity", fun.y = sum) +
labs(title="Profit by Province")
The reorder code gives me:
The code without reorder gives me:
For clarity:
My Provinces column is a FACTOR type with 10 levels and gross_profit is a NUM type. I have a feeling that you pros will quickly see something I'm obviously missing.
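One possible cause, offered as a guess since the data are not shown: reorder() summarises with FUN = mean by default, whereas bars built from several rows per province stack up to the sum. If df has more than one row per province, the two orderings can disagree. A minimal sketch on made-up data (the province codes and values are hypothetical):

```r
library(ggplot2)

# Hypothetical data: several rows per province, as the asker's df may have.
df <- data.frame(
  province     = factor(c("ON", "ON", "ON", "QC", "BC", "BC")),
  gross_profit = c(10, 10, 10, 25, 12, 14)
)

# reorder() defaults to FUN = mean, so this ranks provinces by mean profit.
by_mean <- levels(reorder(df$province, -df$gross_profit))

# Passing FUN = sum ranks them by total profit, which is what the bars show.
by_sum <- levels(reorder(df$province, -df$gross_profit, FUN = sum))

# geom_col() stacks the rows of each province, so bar height is the total.
p <- ggplot(df, aes(x = reorder(province, -gross_profit, FUN = sum),
                    y = gross_profit)) +
  geom_col() +
  labs(title = "Profit by Province", x = "province")
```

In this toy data by_mean and by_sum differ, which is the kind of mismatch that makes a chart look unsorted under the default. Call print(p) to draw the plot.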

Related

I want to add a total at the top of my bar chart using ggplot2 in R. When I try, I'm getting the value of each row instead of the total

I'm fairly new to R and have created a bar chart using
ggplot(df_mountainorframemodelsales, aes(x = FrameOrMountain, y = salestotal, fill= FrameOrMountain)) +
geom_bar(stat = "identity")+
ggtitle("Frame and Mountain Bikes Total Sales")
Which gives me a standard chart with two columns. I want to add a total at the top of each column, however when I try it is giving me a big cluster of numbers at the bottom of the chart. I think it is each individual total rather than the sum of them all together. How do I get the total for all mountains and all frames?
This is the code I've tried
ggplot(df_mountainorframemodelsales, aes(x = FrameOrMountain, y = salestotal, fill= FrameOrMountain)) +
geom_bar(stat = "identity") +
geom_text(aes(label=salestotal), vjust=0) +
ggtitle("Frame and Mountain Bikes Total Sales")
I've tried searching for answers but all the other problems are with stacked bar charts. Can anyone help please?

You can override the data= used by a specific geom. Here geom_text() receives the per-group totals instead of the raw rows:
ggplot(mtcars, aes(x = cyl, y = disp, fill = cyl)) +
geom_bar(stat = "identity") +
geom_text(aes(label = disp), vjust = 0,
data = ~ aggregate(disp ~ cyl, data = ., FUN = sum)) +
ggtitle("Total Displacement by Number of Cylinders")
This highlights that you may want to expand= the y-axis to leave room for the top labels. Alternatively, you can set vjust=1.2, or some other number over 1, so that the label sits inside the bar, though this causes problems when a bar has a very low total (so I think expand= with vjust=0 is safer).
(I'm not saying this is an awesome plot: showing cyl as a discrete rather than a continuous variable would make a lot of sense, and other changes might improve the comparison too. I left those out to stay as close to your original code as possible.)
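A sketch of the expand= suggestion, assuming ggplot2 3.3.0 or later for expansion(). The 10% headroom is an arbitrary choice, and cyl is converted to a factor here only to make the axis discrete:

```r
library(ggplot2)

# One row per cylinder count, holding the summed displacement.
totals <- aggregate(disp ~ cyl, data = mtcars, FUN = sum)

p <- ggplot(mtcars, aes(x = factor(cyl), y = disp, fill = factor(cyl))) +
  geom_col() +                                   # rows stack up to the total
  geom_text(data = totals, aes(label = disp), vjust = 0) +
  # keep the bars on the baseline and add 10% headroom above for the labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  ggtitle("Total Displacement by Number of Cylinders")
```

Because geom_text() gets its own data frame with one row per group, it draws one label per bar instead of one per raw row.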

ggplot fill property changes scale

I have a simple dataframe and am using ggplot to create a bar graph with this code:
ggplot(data=data_cases,aes(x = k,y = val)) +
stat_summary(fun.y=sum, geom = "bar") +
scale_x_discrete(name="Type",
labels=c('A&R','A&E','C&E'))
This code generates the desired result. However, when I add a fill property to color the portions of the graph, the y scale changes. In the image below, the picture on the left has the correct scale; the one on the right is what is produced when the fill property is set (ggplot(data=data_cases,aes(x = k,y = val, fill=state))).
Data:
"k","state","val"
"A&C","SA ",3
"C&E","SA ",2
"A&C","NSW",29
"A&E","NSW",10
"C&E","NSW",11
"C&E","NT ",1
"A&C","WA ",3
"A&E","WA ",1
"C&E","WA ",4
"A&C","VIC",24
"A&E","VIC",1
"C&E","VIC",15
"A&C","QLD",7
"A&E","QLD",2
"C&E","QLD",17

This happens because stat_summary() defaults to position = "identity". With fill = state mapped, the sum is computed per state and each bar starts from 0, so the bars overlap and the axis only has to reach the tallest one (almost 30, for NSW in type A&R).
If you want it to look like the original, the bars need to be stacked on top of each other: use position = "stack".
ggplot(data=data_cases,aes(x = k,y = val, fill=state)) +
stat_summary(fun.y=sum, geom = "bar", position="stack") + # <---
scale_x_discrete(name="Type",
labels=c('A&R','A&E','C&E'))
ggplot has a bunch of positions like this. ?position_dodge, ?position_fill, ?position_stack, ?position_identity, ...

You can also use geom_col(), which stacks by default:
ggplot(data_cases, aes(k, val, fill = state)) +
geom_col()
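To check the two readings of the scale against the data in the question, the per-type totals and the largest single value can be computed directly. This is a sketch with the data frame rebuilt from the CSV above (trailing spaces in state dropped). It uses fun=, the name that replaced fun.y= in ggplot2 3.3.0:

```r
library(ggplot2)

data_cases <- data.frame(
  k     = c("A&C", "C&E", "A&C", "A&E", "C&E", "C&E", "A&C", "A&E", "C&E",
            "A&C", "A&E", "C&E", "A&C", "A&E", "C&E"),
  state = c("SA", "SA", "NSW", "NSW", "NSW", "NT", "WA", "WA", "WA",
            "VIC", "VIC", "VIC", "QLD", "QLD", "QLD"),
  val   = c(3, 2, 29, 10, 11, 1, 3, 1, 4, 24, 1, 15, 7, 2, 17)
)

# Height of each stacked bar: the total per type.
totals <- tapply(data_cases$val, data_cases$k, sum)

# Tallest bar when the per-state bars overlap from 0.
tallest_single <- max(data_cases$val)

# Overlapping bars: the y axis only needs to reach tallest_single.
p_identity <- ggplot(data_cases, aes(k, val, fill = state)) +
  stat_summary(fun = sum, geom = "bar", position = "identity")

# Stacked bars: the y axis has to reach max(totals).
p_stack <- ggplot(data_cases, aes(k, val, fill = state)) +
  stat_summary(fun = sum, geom = "bar", position = "stack")
```

Comparing max(totals) with tallest_single shows why the axis shrinks once the bars stop stacking.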

Proportional barplot in R

I am trying to create a proportional barplot in R, so far I have managed to do something like this:
library(ggplot2)
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color))
This obviously does not work, but neither does this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color,group=1))
or this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color))
This works:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color),position="fill")
But I would like bars to be side by side within a category.
What I want is a proportional barplot without transforming my data beforehand.

I think you need to aggregate first and then use position="dodge":
diamonds2 <- aggregate(carat ~ cut + color, diamonds, length)
ggplot(data = transform(diamonds2, p = ave(carat, cut, FUN = function(x) x/sum(x))),
aes(x = cut, y = p, fill=color))+
geom_bar(stat = "identity", position = "dodge")

EDIT after OP's comment
If you want conditional, side-by-side bars, use geom_bar(stat = "identity", position = "dodge") when you build the plot with ggplot2 (I use only the first 100 rows of the data for the sake of clarity):
library(ggplot2)
ggplot(data = diamonds[1:100, ], aes(cut, carat, fill = color)) + geom_bar(stat = "identity", position = "dodge")
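The aggregate-then-dodge idea from the first part of this answer can also be written with table() and prop.table(). This is a sketch of an equivalent route, not the answerer's code; the column name p follows the answer, and the object names counts, props and p_plot are made up for the example:

```r
library(ggplot2)

# Counts of each cut/color pair, then proportions within each cut (row-wise).
counts <- table(cut = diamonds$cut, color = diamonds$color)
props  <- as.data.frame(prop.table(counts, margin = 1), responseName = "p")

# Side-by-side bars; within each cut the bar heights sum to 1.
p_plot <- ggplot(props, aes(x = cut, y = p, fill = color)) +
  geom_col(position = "dodge")
```

The margin = 1 argument is what makes the proportions conditional on cut; margin = 2 would condition on color instead.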

Grouped bar plot in ggplot with y values based on combination of 2 categorical variables?

I am trying to create a grouped bar plot in ggplot, in which there should be 4 bars per each x value. Here is a subset of my data (actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.

I think you are looking for the interaction() (i.e. get all unique pairings) between df$Verb_Type and df$speaker to get the column groupings you are after. You can pass this directly to ggplot or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
group = grouper, fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
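To see what interaction() produces, here is a small sketch using the eight rows from the question. read.csv(text = ...) is used only to make the example self-contained, and the named alpha values are an illustrative choice:

```r
library(ggplot2)

df <- read.csv(text = "Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent")

# One level per Verb_Type/speaker pairing, e.g. "mental.Child".
grouper <- interaction(df$Verb_Type, df$speaker)

# Each Frame has exactly one row per pairing, so dodging gives 4 bars per Frame.
rows_per_bar <- table(df$Frame, grouper)

p <- ggplot(df, aes(x = Frame, y = proportion_type,
                    group = interaction(Verb_Type, speaker),
                    fill = Verb_Type, alpha = speaker)) +
  geom_col(position = "dodge") +
  scale_alpha_manual(values = c(Child = 0.5, Parent = 1))
```

Inspecting levels(grouper) and rows_per_bar before plotting is a quick way to confirm that the grouping matches the bars you expect.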

Overlay raw data onto geom_bar

I have a data-frame arranged as follows:
condition,treatment,value
A , one , 2
A , one , 1
A , two , 4
A , two , 2
...
D , two , 3
I have used ggplot2 to make a grouped bar plot that looks like this:
The bars are grouped by "condition" and the colours indicate "treatment." The bar heights are the mean of the values for each condition/treatment pair. I achieved this by creating a new data frame containing the mean and standard error (for the error bars) for all the points that will make up each group.
What I would like to do is superimpose the raw jittered data to produce a bar-chart version of this box plot: http://docs.ggplot2.org/0.9.3.1/geom_boxplot-6.png [I realise that a box plot would probably be better, but my hands are tied because the client is pathologically attached to bar charts]
I have tried adding a geom_point object to my plot and feeding it the raw data (rather than the aggregated means which were used to make the bars). This sort of works, but it plots the raw values at the wrong x axis locations. They appear at the points at which the red and grey bars join, rather than at the centres of the appropriate bar. So my plot looks like this:
I can not figure out how to shift the points by a fixed amount and then jitter them in order to get them centered over the correct bar. Anyone know? Is there, perhaps, a better way of achieving what I'm trying to do?
What follows is a minimal example that shows the problem I have:
#Make some fake data
ex=data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
#Calculate the mean and SD of each condition/treatment pair
agg=aggregate(value~cond*treat, data=ex, FUN="mean") #mean
agg$sd=aggregate(value~cond*treat, data=ex, FUN="sd")$value #add the SD
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=value+sd, ymin=value-sd) #Set up the error bars
p <- ggplot(agg, aes(fill=treat, y=value, x=cond))
#Plot, attempting to overlay the raw data
print(
p + geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(data= ex[ex$treat=='one',], colour="green", size=3) +
geom_point(data= ex[ex$treat=='two',], colour="pink", size=3)
)

I found it is unnecessary to create separate dataframes. The plot can be created by providing ggplot with the raw data.
ex <- data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun.y = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position = position_dodge(width = 1))

You need just one call to geom_point(): use the data frame ex, map x to cond, y to value and color to treat (inside aes()), then add position=dodge so the points are dodged along with the bars. scale_color_manual() with the values= argument sets the colors you need.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(cond,value,color=treat),position=dodge)+
scale_color_manual(values=c("green","pink"))
UPDATE - jittering of points
You can't directly combine position dodge and position jitter. But there are some workarounds. If you save the whole plot as an object, ggplot_build() shows the x positions of the bars; in this case they are 0.775, 1.225, 1.775... Those positions correspond to the combinations of the factors cond and treat. Since data frame ex has 4 values for each combination, add a new column that contains those x positions, each repeated 4 times.
ex$xcord<-rep(c(0.775,1.225,1.775,2.225,2.775,3.225,3.775,4.225),each=4)
Now in geom_point() use this new column as x values and set position to jitter.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(xcord,value,color=treat),position=position_jitter(width =.15))+
scale_color_manual(values=c("green","pink"))
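Rather than typing the bar centres by hand, they can be derived from the factor codes. This sketch assumes every cond has every treat level and the same dodge width of 0.9 as above, so with two treatments each bar sits 0.9 / 4 = 0.225 either side of its cond position. The helper name dodge_x is hypothetical, not part of ggplot2:

```r
# Bar centre under a dodge of the given total width, n bars per x position.
dodge_x <- function(x, group, width = 0.9) {
  xi <- as.integer(factor(x))          # 1, 2, 3, ... for each cond level
  gi <- as.integer(factor(group))      # 1, 2, ... for each treat level
  n  <- nlevels(factor(group))
  xi + (gi - (n + 1) / 2) * width / n  # offset from the centre of the slot
}

set.seed(1)  # only so the fake data are reproducible
ex <- data.frame(cond  = rep(c("a", "b", "c", "d"), each = 8),
                 treat = rep(rep(c("one", "two"), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))

# Same values as the hand-typed vector, but independent of the row order.
ex$xcord <- dodge_x(ex$cond, ex$treat)
```

Computing the positions this way keeps xcord correct even if the rows of ex are later sorted or filtered.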

As illustrated by holmrenser above, referencing a single dataframe and updating the stat instruction to "summary" in the geom_bar function is more efficient than creating additional dataframes and retaining the stat instruction as "identity" in the code.
To both jitter and dodge the data points with the bar charts per the OP's original question, this can also be accomplished by updating the position instruction in the code with position_jitterdodge. This positioning scheme allows widths for jitter and dodge terms to be customized independently, as follows:
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun.y = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position =
position_jitterdodge(jitter.width = 0.5, jitter.height=0.4,
dodge.width=0.9))
