How to plot percentage instead of count in a facet_grid graph?

I am having a hard time plotting percentage instead of count when using facet_grid.
I have the following DF (this is an example, my DF is much longer):
Gu <- c("1", "0", "0", "0", "1", "0")
variable <- c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal")
value <- c("0", "1", "0", "1", "0", "0")
df2 <- data.frame(Gu, variable, value)
I am trying to plot the "1" values for each variable (either THR or Screw removal) and split the graph by "Gu" using facet_grid.
I managed to plot the count, but I can't seem to calculate the percentage. I need the percentage within each variable only, not out of the whole DF.
This is my code:
ggplot(data = df2, aes(x = variable, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  facet_grid(~ Gu, labeller = labeller(Gu = c('0' = "Nondisplaced fracture",
                                              '1' = "Displaced fracture"))) +
  scale_fill_discrete(name = "Revision", labels = c("THR", "SCREW"))
This is what I plotted: [image not included]
I searched this website and the web and couldn't find an answer.
Any help will do!
Thanks
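One possible approach, sketched under assumptions: compute the percentage before plotting and pass the summarised table to ggplot. The sketch assumes that "percentage" means the share of rows with value "1" within each Gu and variable combination. The helper column is_one and the table pct are names introduced here for illustration only.

```r
library(ggplot2)

Gu <- c("1", "0", "0", "0", "1", "0")
variable <- c("THR", "Screw removal", "THR", "THR", "THR", "Screw removal")
value <- c("0", "1", "0", "1", "0", "0")
df2 <- data.frame(Gu, variable, value)

# Assumption: the percentage is the share of rows with value == "1"
# inside each Gu x variable combination.
df2$is_one <- as.numeric(df2$value == "1")

# Mean of a 0/1 indicator per group is the proportion of ones in that group
pct <- aggregate(is_one ~ Gu + variable, data = df2, FUN = mean)
pct$percent <- 100 * pct$is_one

# Plot the precomputed percentages; geom_col() uses the y values as given
p <- ggplot(pct, aes(x = variable, y = percent, fill = variable)) +
  geom_col() +
  facet_grid(~ Gu, labeller = labeller(Gu = c("0" = "Nondisplaced fracture",
                                              "1" = "Displaced fracture"))) +
  ylab("Percent with value 1")
p
```

If the denominator should instead be all rows of a variable across both Gu panels, the grouping in the aggregate() formula would need to change accordingly.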

Related

How to change the margins to be different for each graph when plotting multiple graphs in ggplot2?

I am trying to plot one variable against all other variables in a data set and view each graph at the same time. The code I am using to do this is:
library(dplyr)    # %>% and as_tibble()
library(tidyr)    # gather()
library(ggplot2)
theme_set(
  theme_bw() +
    theme(legend.position = "top")
)
healthTrain.gathered <- healthTrain %>%
  as_tibble() %>%
  gather(key = "variable", value = "value",
         -CHD, -Population2010)
ggplot(healthTrain.gathered, aes(x = value, y = CHD)) +
  geom_point() +
  facet_wrap(~variable)
This code works great, except that the variables do not all have the same range of x values, and every panel uses the x-axis limits of the variable with the largest range. Is there a way to make each panel use the axis limits that fit its own data best?
Example of what I am looking for:
plot(health$CHD, health$BPHIGH)
plot(health$CHD, health$COPD)
plot(health$CHD, health$STROKE)
Except I want to be able to see all of the graphs at the same time.
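A sketch of one option: facet_wrap() has a scales argument, and scales = "free_x" lets every panel choose its own x-axis limits while keeping a shared y-axis. The data frame toy below is a made-up stand-in for healthTrain.gathered, since the original data is not shown.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for healthTrain.gathered: three variables whose
# x values live on very different ranges
toy <- data.frame(
  CHD      = runif(60, 2, 10),
  variable = rep(c("BPHIGH", "COPD", "STROKE"), each = 20),
  value    = c(runif(20, 20, 45), runif(20, 3, 12), runif(20, 1, 5))
)

# scales = "free_x" gives each panel its own x-axis range
p <- ggplot(toy, aes(x = value, y = CHD)) +
  geom_point() +
  facet_wrap(~variable, scales = "free_x")
p
```

Using scales = "free" instead would free both axes.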

ggplot: why is the y-scale larger than the actual values for each response?

Likely a dumb question, but I cannot seem to find a solution: I am trying to graph a categorical variable (3 groups) on the x-axis and a continuous variable (a percentage from 0 to 100) on the y-axis. To do so, I have to either specify stat = "identity" for geom_bar or use geom_col.
However, the values still go up to about 4000 on the y-axis, even after following the comments from "Y-scale issue in ggplot" and from "Why is the value of y bar larger than the actual range of y in stacked bar plot?".
Here is how the graph keeps coming out (image not included):
I also double checked that the x variable is a factor and the y variable is numeric. Why would this still be coming out at 4000 instead of 100, like a percentage?
EDIT:
The y-values are simply responses from participants. I have a large dataset (N = 600), and each y-value is a percentage from 0 to 100 given by one participant. So each group (N = 200 per group) contains 200 percentage values. I want to visually compare the three groups based on the percentages they gave.
This is the code I used to plot the graph.
df$group <- as.factor(df$group)
df$confid <- as.numeric(df$confid)
library(ggplot2)
plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")
Are you perhaps trying to plot the mean percentage in each group? Otherwise, it is not clear how a bar plot could easily represent what you are looking for. You could perhaps add error bars to give an idea of the spread of responses.
Suppose your data looks like this:
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))
Using your plotting code, we get very similar results to yours:
library(ggplot2)
plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")
plot
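The height of those bars can be checked without plotting. geom_col() draws one segment per row and stacks the segments, so each bar ends at the sum of the group's responses rather than at a typical response. A minimal base R check on the same simulated data (the names sums and means are introduced here for illustration):

```r
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# Height of each geom_col() bar: the sum of all 200 responses in the group
sums <- tapply(df$confid, df$group, sum)

# What a per-group comparison would more plausibly show: the group mean
means <- tapply(df$confid, df$group, mean)

print(sums)
print(means)
```

Each sum is 200 times the corresponding mean, which is why the y-axis reaches the thousands even though no single response exceeds 40.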
However, if we use stat_summary, we can instead plot the mean and standard error for each group:
ggplot(df, aes(group, confid)) +
  stat_summary(geom = "bar", fun = mean, width = 0.6,
               fill = "deepskyblue", color = "gray50") +
  geom_errorbar(stat = "summary", width = 0.5) +
  geom_point(stat = "summary") +
  ylab("confid %") +
  xlab("group")

Draw line between points with groups in ggplot

I have a time series in which each point has a time, a value, and a group it belongs to. I am trying to plot it with time on the x-axis and value on the y-axis, with the line colored according to the group.
I tried using geom_path and geom_line, but they end up connecting points only to other points within the same group. I found that when I use a continuous variable for the group, I get a single continuous line; however, when I use a factor or another categorical variable, I get the linking problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but 1/ it takes a few extra lines to change the legend to the labels I want, and 2/ out of curiosity, I would like to know if I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?
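As pointed out by Richard Telford and Carles Sans Fuentes, adding group = 1 within the ggplot aesthetic does the job. With a categorical color variable, ggplot2 draws a separate line for each level by default; group = 1 puts all points in a single group, so one line runs through them in time order. So the code should be: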
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

ggplot reorder stacked bar plot based on values in data frame

I am making stacked bar plots with ggplot2 in R and need a specific stacking order of the segments along the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1',header=T)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but not the one I expected based on an earlier post: for the first bar column, it does not put the highest Location value nearest the x-axis, with the next highest Location stacked second from the x-axis, and so on.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) by calling the order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
Notice how I swapped rev with -. Using rev does something very different: it stacks by the value for each row you happen to get if you reverse the order of values in the column Location, which could be just about anything.
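Note that the order aesthetic was deprecated in ggplot2 2.0.0, so the call above may be ignored by current versions. A hedged sketch of an alternative for newer versions: map Location to the group aesthetic, which position_stack() uses to decide the stacking order. Which end of the ordering lands nearest the x-axis is best checked on the plot itself; position_stack(reverse = TRUE) flips it. The object names p and p_rev are introduced here for illustration.

```r
library(ggplot2)

d <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

# Assumption: a recent ggplot2, where stacking order follows the group aesthetic
p <- ggplot(d, aes(x = Day, y = Length)) +
  geom_col(aes(fill = Amount, group = Location))

# Same plot with the stacking order flipped
p_rev <- ggplot(d, aes(x = Day, y = Length)) +
  geom_col(aes(fill = Amount, group = Location),
           position = position_stack(reverse = TRUE))

p
p_rev
```

Comparing the two plots side by side shows which variant puts Location = 4 nearest the x-axis.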

ggplot2 geom_bar fill aesthetic

I have the following code to graph contracts in different countries.
Country <- CCOM$Principal.Place.of.Performance.Country.Name
Val <- CCOM$Action_Absolute_Value
split <- CCOM$Contract.Category
ggplot(CCOM, aes(x = Country, y = Val, fill = levels(split))) +
geom_bar(stat = "identity")
I want a simple stacked bar chart with the bars colored by the contract category which is the variable "split" (ie. CCOM$Contract.Category).
However when I run the code it produces the graph below:
Why won't ggplot separate the spending into three distinct blocks? Why do I get color sections scattered throughout the chart? I have tried using factor(split) and levels(split), but neither seems to work. Maybe I am putting it in the wrong position.
Ah, I just realized what was going on. You seem scared to modify your data frame, don't be! Creating external vectors for ggplot is asking for trouble. Rather than create Country and Val as loose vectors, add them as columns to your data:
CCOM$Country <- CCOM$Principal.Place.of.Performance.Country.Name
CCOM$Val <- CCOM$Action_Absolute_Value
Then your plot is nice and straightforward, you don't have to worry about order or anything else.
ggplot(CCOM, aes(x = Country, y = Val, fill = Contract.Category)) +
geom_bar(stat = "identity")
As you suggest, ordering the data frame provides a solution:
ggplot(CCOM[order(CCOM$Contract.Category), ], aes(x = Country, y = Val, fill = Contract.Category)) +
  geom_bar(stat = "identity")
I have a similar example where I use the equivalent of Contract.Category as fill, and it still requires the reordering.
