I am making stacked bar plots with ggplot2 in R and need the bars stacked in a specific order along the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but not the one I expected based on an earlier post. For the first bar, I expected the highest Location value nearest the x-axis, the next highest Location stacked on top of it, and so on.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) using order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
  geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
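Notice that I replaced rev with -. Using rev does something very different: it reverses the whole Location column, so each row is ordered by the Location value of whichever row it happens to be paired with in the reversed vector, which could be just about anything. Negating the values keeps each row paired with its own Location and simply inverts the sort order.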
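One caveat for newer installations: the order aesthetic was deprecated in ggplot2 2.0.0, so the call above may have no effect there. Below is a minimal sketch of one alternative, assuming a recent ggplot2 in which stacking follows the group aesthetic. The names p_default and p_reversed are only for illustration.

```r
library(ggplot2)

# Same reproducible data as in the question
d <- read.csv(text = "Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1", header = TRUE)

# Assumption: stacking order is driven by the grouping, so group by Location
p_default <- ggplot(d, aes(x = Day, y = Length, group = Location)) +
  geom_col(aes(fill = Amount))

# position_stack(reverse = TRUE) flips whatever the default stacking order is
p_reversed <- ggplot(d, aes(x = Day, y = Length, group = Location)) +
  geom_col(aes(fill = Amount), position = position_stack(reverse = TRUE))
```

Which of the two puts Location=4 nearest the x-axis can depend on the ggplot2 version, so it is worth rendering both and checking.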
I am having a hard time plotting percentage instead of count when using facet_grid.
I have the following DF (this is an example, my DF is much longer):
Gu <- c("1", "0", "0", "0", "1", "0")
variable <- c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal")
value <- c("0", "1", "0", "1", "0", "0")
df2 <- data.frame(Gu, variable, value)
and I am trying to plot the "1" values for each variable (either THR or Screw removal), with the graph split by "Gu" (facet grid).
I managed to plot the count, but I can't seem to calculate the percentage (I need the percentage within each variable only, not out of the whole DF).
This is my code:
ggplot(data = df2, aes(x = variable, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_grid(~ Gu, labeller = labeller(Gu = c('0' = "Nondisplaced fracture",
                                              '1' = "Displaced fracture"))) +
  scale_fill_discrete(name = "Revision", labels = c("THR", "SCREW"))
and this is what I plotted (image not available).
I searched this website and the web and couldn't find an answer...
any help will do!
thanks
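One possible way to get percentages is to summarise before plotting. This is only a sketch, and it assumes that "percentage" means the share of "1" values within each combination of Gu and variable. The columns is_one and percent and the object pct are names introduced here for illustration.

```r
library(ggplot2)

# Example data from the question
df2 <- data.frame(
  Gu = c("1", "0", "0", "0", "1", "0"),
  variable = c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal"),
  value = c("0", "1", "0", "1", "0", "0")
)

# Flag the rows whose value is "1", then take the mean per Gu x variable group
df2$is_one <- as.numeric(df2$value == "1")
pct <- aggregate(is_one ~ Gu + variable, data = df2,
                 FUN = function(x) 100 * mean(x))
names(pct)[names(pct) == "is_one"] <- "percent"

# Plot the pre-computed percentages, one facet per Gu
p <- ggplot(pct, aes(x = variable, y = percent, fill = variable)) +
  geom_col() +
  facet_grid(~ Gu, labeller = labeller(Gu = c("0" = "Nondisplaced fracture",
                                              "1" = "Displaced fracture"))) +
  ylab("Percent with value 1")
```

If the percentage should instead be taken over each variable across both Gu groups, the grouping in aggregate() would need to change accordingly.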
This question seems fairly simple, but I wasn't able to find another post that answers it (apologies if I've missed it though).
I have a factor variable with three levels (one for each month). The data looks like the below:
id variable value
AZ Feb-20 1085
AZ Mar-20 1
AZ Apr-20 61
CO Feb-20 6
CO Mar-20 192
FO Apr-20 2
I want to stack the data, such that I have a bar for February, and then the marginal increases for March, and April stacked on top.
Right now, the values for each month are stacked on top of each other.
ggplot(df3, aes(x = id, y = value, fill=factor(variable, levels=c("Apr-20","Mar-20", "Feb-20")))) + geom_bar(stat = "identity")
How do I stack the increases from February? Is there a way to modify a stacked bar plot or do I need to try another method?
EDIT
After thinking on this, I believe the best solution is overlapping bars. But higher bars cover smaller bars. Changing the transparency isn't very useful with three factors. Maybe there is a way to reorder so that the smaller bars are in the forefront?
ggplot(df2) +
  geom_bar(aes(x = id, y = `Feb-20`), position = "identity", stat = "identity", fill = 'green') +
  geom_bar(aes(x = id, y = `Mar-20`), position = "identity", stat = "identity", fill = 'navy') +
  geom_bar(aes(x = id, y = `Apr-20`), position = "identity", stat = "identity", fill = 'red')
Second Edit
Apologies, this is my fault for being unclear. Previously, I wanted to avoid the cumulative summing that occurs with stacked barplots, and asked that each additional month be added as a marginal increase.
Now, though, as overlapping barplots, it is not necessary for the bars to display change. The problem with the overlapping bar charts is just that the data is obscured for bars with a shorter height.
This plot still plots marginal change. I'm looking for something like plot 2 in this post, but where all of the data is visible.
EDIT 3
Maybe this is a better way to explain:
Take the example of 'WA' in the first plot. For Feb, the data point was 338, for March, the data point was 318, and for April, the data point was 2270. A stacked bar plot adds these on top of each other, cumulatively.
However, the bar that I want for 'WA' should really show 338 for February, then a drop of 20 for March. And finally, an addition of 1952 for April.
This is why I had used the language of a marginal increase/decrease for a stacked barplot. I had also tried an overlapping barplot, but all of the data is not visible, and longer bars cover shorter bars.
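The marginal increase/decrease described above can be computed before plotting. The following is a rough sketch, assuming the long layout shown earlier (id, variable, value); the change column is a name introduced here, and the first observed month of each id is kept as its starting value.

```r
library(ggplot2)

# Sample rows from the question plus the 'WA' values quoted above
df3 <- data.frame(
  id = c("AZ", "AZ", "AZ", "CO", "CO", "FO", "WA", "WA", "WA"),
  variable = c("Feb-20", "Mar-20", "Apr-20", "Feb-20", "Mar-20", "Apr-20",
               "Feb-20", "Mar-20", "Apr-20"),
  value = c(1085, 1, 61, 6, 192, 2, 338, 318, 2270)
)

# Put the months in calendar order and sort the rows within each id
df3$variable <- factor(df3$variable, levels = c("Feb-20", "Mar-20", "Apr-20"))
df3 <- df3[order(df3$id, df3$variable), ]

# First value per id, then month-over-month differences
df3$change <- ave(df3$value, df3$id, FUN = function(v) c(v[1], diff(v)))

# Dodged bars keep every month visible, including decreases below zero
p <- ggplot(df3, aes(x = id, y = change, fill = variable)) +
  geom_col(position = "dodge")
```

For 'WA' this gives 338, -20 and 1952, matching the numbers worked out by hand above.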
We can arrange the rows by converting the column to the yearmon class (from zoo), then convert it to a factor whose levels are the unique elements in that sorted order (or use match and convert):
library(dplyr)
library(zoo)
library(ggplot2)
df1 %>%
  arrange(id, as.yearmon(variable, '%b-%y')) %>%
  mutate(variable = factor(variable, levels = unique(variable))) %>%
  ggplot(aes(x = id, y = value, fill = variable)) +
  geom_bar(stat = "identity")
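If we wanted to automate the second plot (the overlapping bars), we can reshape to wide format and add one layer per month column in a loop: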
library(tidyr)
df2 <- df1 %>%
  pivot_wider(names_from = variable, values_from = value)
p <- ggplot(df2)
colrs <- c('green', 'navy', 'red')
nm1 <- names(df2)[-1]
for (i in seq_along(nm1)) {
  p <- p +
    geom_bar(aes(x = id, y = !! rlang::sym(nm1[i])),
             position = 'identity', stat = 'identity', fill = colrs[i])
}
I have the following data:
I would like to generate a bar plot that shows the frequency of each value of Var1 for each run. I want the x axis to represent each run and the y axis to represent the frequency of each Var1 value. To do that, I wrote the following R script:
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
  geom_bar(aes(Run, Freq, fill = Var1, colour = Var1), position = "stack", stat = "identity")
The result that I got is:
The issue is that the x axis does not show each run separately (the axis should be 1, 2, .., etc.) and the legend should show each value of Var1 separately and in a different color. Also, the bars are hard to read, since it is difficult to see the frequency of each Var1 value. In other words, the generated plot is not the normal stacked bar like the one shown in this answer.
How can I solve that?
You need to convert both variables to factors. Otherwise, ggplot2 treats them as continuous rather than categorical data, which is why you get a continuous x axis and a gradient legend.
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
  geom_bar(aes(factor(Run), Freq, fill = factor(Var1), colour = factor(Var1)),
           position = "stack", stat = "identity")
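Since the original data.csv is not shown, here is a small self-contained sketch of the same idea. The data frame below is made up purely for illustration (the Run, Var1 and Freq values are assumptions), and it converts the columns up front instead of inside aes().

```r
library(ggplot2)

# Hypothetical stand-in for data.csv: 3 runs, 2 distinct Var1 values
df <- data.frame(
  Run  = rep(1:3, each = 2),
  Var1 = rep(c(10, 20), times = 3),
  Freq = c(5, 3, 2, 6, 4, 4)
)

# Convert once, so every later layer sees categorical variables
df$Run  <- factor(df$Run)
df$Var1 <- factor(df$Var1)

# Discrete x axis (one tick per run) and one legend entry per Var1 value
g <- ggplot(df, aes(x = Run, y = Freq, fill = Var1)) +
  geom_col(position = "stack")
```

Converting in the data frame keeps the axis and legend titles as plain "Run" and "Var1" rather than "factor(Run)" and "factor(Var1)".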
I am trying to create a grouped bar plot in ggplot, in which there should be 4 bars for each x value. Here is a subset of my data (actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.
I think you are looking for interaction() (i.e., all unique pairings of df$Verb_Type and df$speaker) to get the column groupings you are after. You can call it directly inside aes() or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
               group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
               group = grouper, fill = Verb_Type, alpha = speaker)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_alpha_manual(values = c(.5, 1))
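To see what interaction() actually produces, it can help to inspect the combined factor on its own. This is a small sketch using the subset of data from the question, read in with read.csv(text = ...) for convenience.

```r
# Subset of the data shown in the question
df <- read.csv(text = "Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent", stringsAsFactors = TRUE)

# One level per Verb_Type/speaker pairing, i.e. one bar per level within each Frame
grouper <- interaction(df$Verb_Type, df$speaker)
levels(grouper)

# Cross-tabulate to confirm every Frame has exactly one row per pairing
counts <- table(df$Frame, grouper)
```

With two verb types and two speakers there are four levels, which is what yields four dodged bars per Frame.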
Is there a way to overlay a partial graph on top of a full graph using ggplot? I have one line graph with a time span of, say, 100 days on the x axis and need to add a second line, in a different color, that only spans the last 20 days. I don't want to plot the second line with zero values for the first 80 days; it should only be drawn for the last 20 days. What is the best way to do that?
Sure, just use two geoms with different subsets of your data.frame (for simplicity I use the full df and only one subset):
library(ggplot2)
df <- data.frame(Index = 1:1000, Value = cumsum(rnorm(1000)))
ggplot() +
  geom_line(data = df, aes(x = Index, y = Value)) +
  geom_line(data = df[500:700,], aes(x = Index, y = Value), col = "red")
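The same pattern maps directly onto the 100-day / last-20-days case from the question. The sketch below uses simulated values, and the data frame names full and second are made up for illustration. The second series simply has no rows for the first 80 days.

```r
library(ggplot2)

set.seed(1)  # make the simulated values reproducible

# Full series over 100 days
full <- data.frame(Day = 1:100, Value = cumsum(rnorm(100)))

# Hypothetical second series that only exists for days 81 to 100
second <- data.frame(Day = 81:100, Value = cumsum(rnorm(20)))

# Each geom_line gets its own data, so nothing is drawn where rows are absent
p <- ggplot() +
  geom_line(data = full, aes(x = Day, y = Value)) +
  geom_line(data = second, aes(x = Day, y = Value), colour = "red")
```

Because both layers share the Day column, the shorter line is positioned on the same x axis without any padding rows.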