An issue when plotting geom_bar with multiple variables - r

I have the following data:
I would like to generate a bar plot that shows the frequency of each value of Var1 for each run. I want the x axis to represent the runs and the y axis to represent the frequency of each Var1 value. To do that, I wrote the following R script:
library(ggplot2)
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
  geom_bar(aes(Run, Freq, fill = Var1, colour = Var1), position = "stack", stat = "identity")
The result that I got is:
The issue is that the x axis does not show each run separately (the axis should read 1, 2, ..., etc.), and the legend should show each value of Var1 separately and in a different color. Also, the bars are hard to read: it is difficult to see the frequency of each Var1 value. In other words, the generated plot is not a normal stacked bar chart like the one shown in this answer.
How can I solve that?

You need to convert both variables to factors. Otherwise, R sees them as numerical and not categorical data.
library(ggplot2)
df <- read.csv("/home/nasser/Desktop/data.csv")
# factor() makes ggplot2 treat Run and Var1 as categorical instead of continuous
g <- ggplot(df) +
  geom_bar(aes(factor(Run), Freq, fill = factor(Var1), colour = factor(Var1)),
           position = "stack", stat = "identity")
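Since the original data.csv is not shown, here is a minimal sketch of the same fix on a small made-up data frame (the values are assumptions, not the asker's data). Mapping factor(Run) to x should give one tick per run, and mapping factor(Var1) to fill should give a discrete legend with one colour per value.

```r
library(ggplot2)

# Hypothetical stand-in for data.csv; the real file is not shown in the question
df <- data.frame(
  Run  = rep(1:3, each = 2),
  Var1 = rep(c(10, 20), times = 3),
  Freq = c(4, 6, 3, 7, 5, 5)
)

# Converting both variables to factors makes the x axis and the fill scale discrete
g <- ggplot(df, aes(factor(Run), Freq, fill = factor(Var1))) +
  geom_bar(position = "stack", stat = "identity") +
  labs(x = "Run", fill = "Var1")  # tidy up the axis and legend titles
```

Printing g draws the plot; the labs() call is optional and only replaces the default "factor(Run)" and "factor(Var1)" titles.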

Related

Overlaying two bars in ggplot2 using position_dodge() without changing alpha value

What I'm wanting to do is have non-additive, 'stacked' bars, which I understand is achieved using position_dodge, rather than "stack". However, this does not behave as I expected it to.
The closest answer to what I'm after is here, but the code on that post causes exactly the same problem I'm running into.
A very basic example:
library(ggplot2)
example <- data.frame(week_num = c(1, 1),
variable = c("x", "y"),
value = c(5, 10))
ex <- ggplot(example, aes(x = week_num, y = value, fill = variable))
ex <- ex +
geom_bar(stat = "identity", position = position_dodge(0))
print(ex)
What you get is essentially a single blue bar, representing the variable 'y' value of 10, with no sign of the 'x' value of 5:
Chart with lower value hidden
So far, the only way around this I've found is to make the width argument of position_dodge, say, 0.1, to get something like this, but that's not ideal.
Chart with lower value visible
Essentially, I want to 'front' the lower of the two values, so in this case what I'd want is a bar of height 10 (representing variable 'y'), but with the lower half (up to 5) filled in a different colour.
One option is to reorder your data. Observations are drawn in the order in which they appear in the dataset, so sort it so that the lower values come last and are plotted last, on top of the taller bars. You could also use position = "identity" together with geom_col (which is the same as geom_bar(stat = "identity")):
library(ggplot2)
example <- dplyr::arrange(example, week_num, dplyr::desc(value))
ggplot(example, aes(x = week_num, y = value, fill = variable)) +
geom_col(position = "identity")
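If you would rather not depend on dplyr, the same reordering can be sketched with base R's order(). This is only an illustration on the two-row example from the question; the name example_sorted is my own.

```r
library(ggplot2)

example <- data.frame(week_num = c(1, 1),
                      variable = c("x", "y"),
                      value = c(5, 10))

# Within each week, put the tallest bar first so it is drawn first;
# shorter bars come later in the data and are drawn in front of it.
example_sorted <- example[order(example$week_num, -example$value), ]

p <- ggplot(example_sorted, aes(x = week_num, y = value, fill = variable)) +
  geom_col(position = "identity")  # bars overlap instead of stacking or dodging
```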

ggplot: why is the y-scale larger than the actual values for each response?

Likely a dumb question, but I cannot seem to find a solution: I am trying to graph a categorical variable on the x-axis (3 groups) and a continuous variable (a percentage from 0 to 100) on the y-axis. When I do so, I have to specify stat = "identity" for geom_bar or use geom_col.
However, the values still show up at 4000 on the y-axis, even after following the comments from Y-scale issue in ggplot and from Why is the value of y bar larger than the actual range of y in stacked bar plot?.
Here is how the graph keeps coming out:
I also double checked that the x variable is a factor and the y variable is numeric. Why would this still be coming out at 4000 instead of 100, like a percentage?
EDIT:
The y-values are simply responses from participants. I have a large dataset (N = 600), and the y-values are percentages from 0 to 100 given by each participant. So, in each group (N = 200 per group), I have one percentage per participant. I wanted to visually compare the three groups based on the percentages they gave.
This is the code I used to plot the graph.
df$group <- as.factor(df$group)
df$confid <- as.numeric(df$confid)
library(ggplot2)
plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")
Are you perhaps trying to plot the mean percentage in each group? Otherwise, it is not clear how a bar plot could easily represent what you are looking for. You could perhaps add error bars to give an idea of the spread of responses.
Suppose your data looks like this:
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
confid = sample(40, 600, TRUE))
Using your plotting code, we get very similar results to yours:
library(ggplot2)
plot <-ggplot(df, aes(group, confid))+
geom_col()+
ylab("confid %") +
xlab("group")
plot
However, if we use stat_summary, we can instead plot the mean and standard error for each group:
ggplot(df, aes(group, confid)) +
stat_summary(geom = "bar", fun = mean, width = 0.6,
fill = "deepskyblue", color = "gray50") +
geom_errorbar(stat = "summary", width = 0.5) +
geom_point(stat = "summary") +
ylab("confid %") +
xlab("group")
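To see where a value near 4000 comes from: geom_col draws one rectangle per row and stacks them, so each bar's height is the sum of the 200 responses in that group rather than a percentage. A minimal sketch, using the simulated data above (not the asker's real data), compares the per-group totals with the per-group means:

```r
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# What geom_col effectively shows: the sum of all responses in each group
totals <- aggregate(confid ~ group, data = df, FUN = sum)

# What stat_summary(fun = mean) shows: the average response in each group
means <- aggregate(confid ~ group, data = df, FUN = mean)

print(totals)
print(means)
```

With 200 rows per group, each total is simply 200 times the corresponding mean, which is why the y axis runs into the thousands.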

Draw line between points with groups in ggplot

I have a time series in which each point has a time, a value, and a group it belongs to. I am trying to plot it with time on the x axis and value on the y axis, with the line coloured according to the group.
I tried using geom_path and geom_line, but they end up connecting points only to other points in the same group. I found out that when I use a continuous variable for the groups, I get a normal line; however, when I use a factor or a categorical variable, I get the linking problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but 1/ I need a few extra lines to change the legend so that it shows the labels I want, and 2/ out of curiosity, I would like to know whether I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?
As pointed out by Richard Telford and Carles Sans Fuentes, adding group = 1 within the ggplot aesthetic does the job. So the code should be:
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()
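A small sketch of why this works, using the data from the question: with a categorical colour, ggplot2 by default splits the data into one group per category and draws a separate line for each, while group = 1 forces a single group so that all points are joined in time order. The object names p_split and p_joined are mine.

```r
library(ggplot2)

df <- data.frame(time = 1:10,
                 value = c(5, 4, 9, 3, 8, 2, 5, 8, 7, 1),
                 group = c("apple", "pear", "pear", "pear", "apple",
                           "apple", "pear", "pear", "pear", "pear"))

# Default grouping: one line per level of the categorical colour variable
p_split <- ggplot(df, aes(time, value, color = group)) + geom_line()

# group = 1: a single line through all points, coloured segment by segment
p_joined <- ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

# Count the groups ggplot2 actually uses when building each plot
n_split  <- length(unique(layer_data(p_split)$group))
n_joined <- length(unique(layer_data(p_joined)$group))
```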

Grouped bar plot in ggplot with y values based on combination of 2 categorical variables?

I am trying to create a grouped bar plot in ggplot, in which there should be 4 bars per each x value. Here is a subset of my data (actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.
I think you are looking for the interaction() (i.e. get all unique pairings) between df$Verb_Type and df$speaker to get the column groupings you are after. You can pass this directly to ggplot or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
group = grouper, fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
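For completeness, here is a self-contained sketch that reads the subset of data posted in the question and checks what interaction() produces. It assumes the eight rows shown are representative of the full data set.

```r
library(ggplot2)

df <- read.csv(text = "Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent")

# One level per Verb_Type/speaker pairing, i.e. one bar per pairing within each Frame
pairings <- levels(interaction(df$Verb_Type, df$speaker))
print(pairings)

p <- ggplot(df, aes(x = Frame, y = proportion_type,
                    group = interaction(Verb_Type, speaker),
                    fill = Verb_Type, alpha = speaker)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_alpha_manual(values = c(.5, 1))  # lighter bars for Child, solid for Parent
```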

ggplot reorder stacked bar plot based on values in data frame

I am making stacked bar plots with ggplot2 in R and need a specific ordering of the bar segments along the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1',header=T)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but it doesn't stack the data with the highest Location value nearest the x-axis with the data for the next highest Location stacked in the second from the x-axis position for the first bar column, like I would expect it to based on an earlier post.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) by calling the order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
Notice how I swapped rev with -. Using rev does something very different: it stacks by the value for each row you happen to get if you reverse the order of values in the column Location, which could be just about anything.
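One caveat: as far as I know, the order aesthetic was deprecated in ggplot2 2.0.0 and current versions ignore it with a warning, so the code above only applies to old installations. In recent versions, my understanding is that stacking follows the group aesthetic, with the last group placed nearest the x-axis. Under that assumption, a sketch for current ggplot2 would map Location to group instead:

```r
library(ggplot2)

d <- read.csv(text = "Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1", header = TRUE)

# group = Location controls the stacking order in place of the old order aesthetic
p <- ggplot(d, aes(x = Day, y = Length)) +
  geom_bar(aes(fill = Amount, group = Location), stat = "identity")

# Inspect the computed ymin/ymax per segment to confirm which Location sits where
stacked <- layer_data(p)
print(stacked[, c("x", "group", "ymin", "ymax")])
```

If the order comes out reversed on your version, position = position_stack(reverse = TRUE) flips it.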
