Reordering data in ggplot creates mismatch between data and bars

I have the following data based off responses to a survey:
1 ABSENCE OF BULLYING 0.665
2 SENSE OF SAFETY 0.614
3 FAIRNESS OF DISCIPLINE 0.677
4 FEELINGS OF EMOTIONAL SAFETY 0.585
5 PERCEPTION OF EQUITABLE TREATMENT 0.691
6 COMFORT OF PHYSICAL ENVIRONMENT 0.509
I want to create a bar graph with this data and to cut down the excess text in the labels, so I did the following:
library(readxl)   # read_excel()
library(ggplot2)
library(dplyr)    # desc()
# vertical_theme and scale_factor are not defined in this snippet
fig4 = read_excel(data)
order = c("bullying", "safety", "discipline", "emotional", "equity", "environment")
fig4$domain = as.factor(order)
levels(fig4$domain) = order
ggplot(fig4, aes(x = reorder(domain, desc(domain)), y = pct_responses)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5, fill = "#3bbae0") +
  vertical_theme +
  labs(y = "% of Affirmative Responses", x = "") +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 1),
                     breaks = seq(0, 1, 0.2),
                     labels = function(x) paste0(x * 100, "%")) +
  coord_flip() +
  geom_text(aes(label = paste0(round(pct_responses * 100), "%")), position = position_dodge(width = 0.5), size = scale_factor * 6, hjust = -0.25)
However, this created a graph where the bars don't match with the right axis label. Rows 2, 3, and 6 have the wrong numbers.
I fixed the problem by using this code instead:
fig4$domain = c("bullying", "safety", "discipline", "emotional", "equity", "environment")
fig4$domain = factor(fig4$domain, levels = fig4$domain)
This solved the issue. However, I'm not sure why the first way I did it messed up my graph. Can someone please explain what happened?
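A likely explanation, sketched in base R with the same six labels: as.factor() stores the levels in sorted (alphabetical) order, and assigning to levels() afterwards renames those sorted levels position by position instead of reordering them. The integer codes behind each row stay the same, so some rows end up with a different label. The names domain_labels, relabelled and kept are only illustrative.

```r
domain_labels <- c("bullying", "safety", "discipline", "emotional", "equity", "environment")

# as.factor() sorts the levels, so they no longer follow the row order
f <- as.factor(domain_labels)
levels(f)
# in a typical English locale:
# "bullying" "discipline" "emotional" "environment" "equity" "safety"

# Assigning to levels() RENAMES the sorted levels by position;
# it does not reorder them, so the label shown for a row can change
levels(f) <- domain_labels
relabelled <- as.character(f)

# factor(x, levels = ...) only sets the level order; every row keeps its label
g <- factor(domain_labels, levels = domain_labels)
kept <- as.character(g)

data.frame(original = domain_labels, relabelled = relabelled, kept = kept)
```

In this sketch rows 2, 3, 4 and 6 get a different label in the relabelled column, while the kept column matches the original. That would explain why the second version of the code fixes the plot.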

Related

How to get geom_point size to reflect actual values rather than relative values

There is probably a simple solution to this, but I'm too much of a newcomer to know what it is - so I'd greatly appreciate any help.
I'm trying to create a graph that will show the average time to first response vs. the response rate. However, I want each point size to represent the actual size (no. active users in the table below) rather than the relative size.
Data frame table:
ggplot(benchmarksdf, aes(benchmarksdf$`Avg. Time To First Response`,benchmarksdf$`Response Rate`)) +
geom_point(shape=21, aes(fill=benchmarksdf$`Community Name`, size=benchmarksdf$`Active Users`)) +
geom_text(aes(label=benchmarksdf$`Community Name`), check_overlap = T, show.legend = F, size = 3, vjust = 2) +
labs(title = "Benchmarking Top Enterprise Communities",
subtitle = "Comparing top brand communities by response rate and avg. time to first response",
y = "Response Rate %",
x = "Avg. time to first response (days)") + scale_x_reverse () +
theme_classic()+
theme(legend.position = 'none',aspect.ratio = 0.8)
This leads to the result below:
[Figure: ggplot of community by size]
My eyes could be deceiving me, but at the moment the size of each point seems established by relativity to one another rather than the values of the data.
Is there a way to correct this and have it represent the absolute active users number?

Check that Active Users is numeric and not a factor.
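A minimal sketch of that check in base R. The data frame and column names here are made up for illustration; substitute the real benchmarksdf columns.

```r
# Hypothetical stand-in for benchmarksdf, with the size column stored as a factor
benchmarks <- data.frame(
  community    = c("A", "B", "C"),
  active_users = factor(c("1500", "200", "2100"))
)

is.numeric(benchmarks$active_users)   # FALSE: the column is a factor

# Convert via character; as.numeric() on the factor itself
# would return the internal level codes, not the values
benchmarks$active_users <- as.numeric(as.character(benchmarks$active_users))

is.numeric(benchmarks$active_users)   # TRUE
```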
I tried with the dataset midwest from package ggplot2 and the display seems correct:
library(ggplot2)
data(midwest)
gg <- ggplot(midwest[1:10,],aes(x = area, y = poptotal)) +
geom_point(aes(size=popdensity)) +
labs(title = "Area vs Pop", subtitle = "Midwest dataset", y = "Pop", x = "Area") +
geom_text(aes(label=county),size = 3,hjust = 0.5, vjust = -1.5)
gg

You will have to set the size scale explicitly instead of relying on the default one, for example with scale_size_continuous() and fixed limits that start at 0.
Try the code below:
ggplot(benchmarksdf, aes(x = benchmarksdf$Avg.time.to.first.response, y = benchmarksdf$Response.rate)) +
  geom_point(shape = 21, aes(fill = benchmarksdf$Community.name, size = benchmarksdf$Active.users)) +
  scale_size_continuous(limits = c(0, 2100)) +
  geom_text(aes(label = benchmarksdf$Community.name), check_overlap = T, show.legend = F, size = 3, vjust = 2) +
labs(title = "Benchmarking Top Enterprise Communities",
subtitle = "Comparing top brand communities by response rate and avg. time to first response",
y = "Response Rate %",
x = "Avg. time to first response (days)") + scale_x_reverse () +
theme_classic()+
theme(legend.position = 'none',aspect.ratio = 0.8)
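If the aim is for the point area to be proportional to the value, ggplot2 also provides scale_size_area(), which maps a value of 0 to a point of size 0. The following is only a sketch under assumptions: the data values and the simplified column names are invented, and max_size = 15 is an arbitrary choice.

```r
library(ggplot2)

# Invented values for illustration only
benchmarks <- data.frame(
  community     = c("A", "B", "C"),
  response_time = c(1.5, 3.0, 0.8),
  response_rate = c(80, 65, 92),
  active_users  = c(1500, 200, 2100)
)

p <- ggplot(benchmarks, aes(x = response_time, y = response_rate)) +
  # Bare column names inside aes(); no need for benchmarks$...
  geom_point(aes(fill = community, size = active_users), shape = 21) +
  # Point area scales linearly with active_users, and 0 maps to size 0
  scale_size_area(max_size = 15) +
  scale_x_reverse() +
  theme_classic() +
  theme(legend.position = "none")

p
```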

Implementing paired lines into boxplot.ggplot2

I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='vicarious_pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
but I am not able to implement this in the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!

I do not know which package ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot.
The requested output is a bit problematic for ggplot, since you want both the points and the lines connecting them to be jittered by the same amount. One way to do that is to jitter the points before making the plot. Because the x axis is discrete, that needs a workaround:
b <- runif(nrow(df), -0.1, 0.1)
ggplot(df) +
  geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition)) +
  geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
  geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
  scale_x_continuous(breaks = c(1, 2), labels = c("No Treatment", "Treatment")) +
  xlab("condition")
First I made a vector of offsets called b and converted the x axis to numeric so that I could add b to the x coordinates. Later I relabelled the x axis.
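The relabelling relies on as.numeric() returning the position of each value's level in a factor, with levels sorted alphabetically by default. A small base R check, assuming condition is a factor with default levels (the three-value vector and the set.seed() call are additions for illustration):

```r
condition <- factor(c("Treatment", "No Treatment", "Treatment"))

levels(condition)       # "No Treatment" "Treatment"
as.numeric(condition)   # 2 1 2

set.seed(1)  # optional: makes the random offsets reproducible
b <- runif(length(condition), -0.1, 0.1)

# Numeric x position plus a per-row offset, shared by points and lines
x_jittered <- as.numeric(condition) + b
```

Because geom_point and geom_line both add the same b, each line ends exactly on its point.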
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
And the updated plot with jittered points, eipi10 style:
ggplot(df) +
  geom_boxplot(aes(x = as.numeric(condition),
                   y = pain,
                   group = condition),
               width = 0.3,
               size = 1.5,
               fatten = 1.5,
               colour = "grey70") +
  geom_point(aes(x = as.numeric(condition) + b,
                 y = pain),
             colour = "red",
             size = 2,
             alpha = 0.5) +
  geom_line(aes(x = as.numeric(condition) + b,
                y = pain,
                group = ID),
            colour = "red",
            linetype = "11") +
  scale_x_continuous(breaks = c(1, 2),
                     labels = c("No Treatment", "Treatment"),
                     expand = c(0.2, 0.2)) +
  xlab("condition") +
  theme_classic()

Although I like the old-school way of plotting with ggplot shown in @missuse's answer, I wanted to check whether this was also possible with your ggplot2.boxplot-based code.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested yourself:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intenstity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.

I implemented missuse's jitter solution in the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I added the dots with geom_point (and the lines with geom_line) afterwards, so that I could apply the same jitter vector to both.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)

ggplot2 plot two data sets into one picture

This must be a FAQ, but I can't find an exactly similar example in the other answers (feel free to close this if you can point to a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id  Type  pred      SE
1   1     15.11285  0.6966029
2   1     13.68750  0.9756909
3   1     13.87565  0.6140860
4   1     14.61304  0.6187750
5   1     16.33315  0.6140860
6   1     16.19740  0.6140860
1   2     14.88805  0.6966029
2   2     13.46270  0.9756909
3   2     13.65085  0.6140860
4   2     14.38824  0.6187750
5   2     16.10835  0.6140860
6   2     15.97260  0.6140860
and
newdat2
id  pred      SE
1   14.98300  0.6960460
2   13.25893  0.9872502
3   13.67650  0.6150701
4   14.39590  0.6178266
5   16.37662  0.6171588
6   16.08426  0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
[Figure: fig1]
[Figure: fig2]
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
The problem is also, that I need to label and organize the resulting figure so that for newdat the x-axis would include a label for "model1" and for newdat2 a label for "model2", or some similar indicator that they are from different models. And to make things even worse, I need some labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat, aes(x = Type, y = pred, colour = id)) +
  geom_point(position = pd, size = 5) +
  geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE), position = pd, size = 1.5, linetype = 1) +
  theme_bw() +
  scale_colour_grey(start = 0, end = .8, name = "id") +
  coord_cartesian(ylim = c(11, 18)) +
  scale_y_continuous(breaks = seq(10, 20, 1)) +
  scale_x_discrete(name = "Type", limits = c("1", "2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.

How about first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
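To see what the combined data frame looks like, here is a cut-down sketch in base R. It uses only the first two rows of each table from the question. It relies on rbind() matching data frame columns by name, so the differing column order does not matter.

```r
# Cut-down stand-ins for newdat and newdat2 (first two rows of each table)
newdat  <- data.frame(id = c(1, 2), Type = c(1, 1),
                      pred = c(15.11285, 13.68750),
                      SE = c(0.6966029, 0.9756909))
newdat2 <- data.frame(id = c(1, 2),
                      pred = c(14.98300, 13.25893),
                      SE = c(0.6960460, 0.9872502))

newdat$model  <- "model1"
newdat2$model <- "model2"
newdat2$Type  <- 3   # placeholder Type so both frames share the same columns

# Columns are matched by name; the result keeps the column order of newdat
df <- rbind(newdat, newdat2)

# The x-axis groups that interaction(model, factor(Type)) will produce
x_groups <- unique(as.character(interaction(df$model, factor(df$Type))))
x_groups
```

With the full data there would be three such groups: model1.1, model1.2 and model2.3.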
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you can pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to consider facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)

R Side-by-side grouped boxplot

I have temporal data of gas emissions from two species of plant, both of which have been subjected to the same treatments. With some previous help I put this code together:
soilflux = read.csv("soil_fluxes.csv")
library(ggplot2)
soilflux$Treatment <- factor(soilflux$Treatment,levels=c("L-","C","L+"))
soilplot = ggplot(soilflux, aes(factor(Week), Flux, fill=Species, alpha=Treatment)) + stat_boxplot(geom ='errorbar') + geom_boxplot()
soilplot = soilplot + labs(x = "Week", y = "Flux (mg m-2 d-1)") + theme_bw(base_size = 12, base_family = "Helvetica")
soilplot
This produces the plot below, which works well but has its flaws.
Whilst it conveys all the information I need it to, despite Google trawls and looking through here I just couldn't get the 'Treatment' part of the legend to show that L- is lightest and L+ darkest. I've also been told that a monochrome colour scheme is easier to differentiate, hence I'm trying to get something like this, where the legend is clear.
(source: biomedcentral.com)

As a workaround you could create a combined factor from species and treatment and assign the fill colors manually:
library(ggplot2)
library(RColorBrewer)
d <- expand.grid(week = factor(1:4), species = factor(c("Heisteria", "Simarouba")),
trt = factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+")))
d <- d[rep(1:24, each = 30), ]
d$flux <- runif(NROW(d))
# Create a combined factor for coding the color
d$spec.trt <- interaction(d$species, d$trt, lex.order = TRUE, sep = " - ")
ggplot(d, aes(x = week, y = flux, fill = spec.trt)) +
stat_boxplot(geom ='errorbar') + geom_boxplot() +
scale_fill_manual(values = c(brewer.pal(3, "Greens"), brewer.pal(3, "Reds")))
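The manual colours line up with the groups because of the level order that interaction() produces. With lex.order = TRUE the first factor (species) varies slowest, so the three treatments of one species come first, in the L-, C, L+ order set on trt. A base R check of that ordering (the names g and spec_trt are illustrative):

```r
species <- factor(c("Heisteria", "Simarouba"))
trt     <- factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+"))
g       <- expand.grid(species = species, trt = trt)

spec_trt <- interaction(g$species, g$trt, lex.order = TRUE, sep = " - ")
levels(spec_trt)
# "Heisteria - L-" "Heisteria - C" "Heisteria - L+"
# "Simarouba - L-" "Simarouba - C" "Simarouba - L+"
```

Since brewer.pal(3, "Greens") and brewer.pal(3, "Reds") each run from light to dark, L- gets the lightest shade and L+ the darkest within each species.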

ggplot2 multiple sub groups of a bar chart

I am trying to produce a bar graph that has multiple groupings of factors. Here is an example from Excel of what I am attempting to create, subgrouped by Variety and Irrigation treatment:
I know I could produce multiple graphs using facet_wrap(), but I would like to produce multiple graphs for this same type of data for multiple years of similar data. An example of the data I used in this example:
Year       Trt  Variety    geno  yield  SE
2010-2011  Irr  Variety.2  1     6807   647
2010-2011  Irr  Variety.2  2     5901   761
2010-2011  Irr  Variety.1  1     6330   731
2010-2011  Irr  Variety.1  2     5090   421
2010-2011  Dry  Variety.2  1     3953   643
2010-2011  Dry  Variety.2  2     3438   683
2010-2011  Dry  Variety.1  1     3815   605
2010-2011  Dry  Variety.1  2     3326   584
Is there a way to create multiple groupings in ggplot2? I have searched for quite some time and have yet to see an example of something like the example graph above.
Thanks for any help you may have!

This may be a start.
dodge <- position_dodge(width = 0.9)
ggplot(df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2)
Update: labelling of x axis
I have added:
coord_cartesian, to set limits of y axis, mainly the lower limit to avoid the default expansion of the axis.
annotate, to add the desired labels. I have hard-coded the x positions, which I find OK in this fairly simple example.
theme_classic, to remove the gray background and the grid.
theme, increase lower plot margin to have room for the two-row label, remove default labels.
Last set of code: Because the text is added below the x-axis, it 'disappears' outside the plot area, and we need to remove the 'clipping'. That's it!
library(grid)
g1 <- ggplot(data = df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2) +
coord_cartesian(ylim = c(0, 7500)) +
annotate("text", x = 1:4, y = - 400,
label = rep(c("Variety 1", "Variety 2"), 2)) +
annotate("text", c(1.5, 3.5), y = - 800, label = c("Irrigated", "Dry")) +
theme_classic() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank())
# remove clipping of x axis labels
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
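In more recent ggplot2 versions (3.0.0 and later, as far as I know), coord_cartesian() accepts clip = "off", so the gtable step may not be needed. A sketch under that assumption, with the data typed in from the question. Trt is given explicit levels here so that the irrigated bars come first and match the hard-coded annotation positions; with default alphabetical levels, "Dry" would come first.

```r
library(ggplot2)

# Data from the question; explicit Trt levels put "Irr" before "Dry"
df <- data.frame(
  Trt     = factor(rep(c("Irr", "Dry"), each = 4), levels = c("Irr", "Dry")),
  Variety = rep(rep(c("Variety.2", "Variety.1"), each = 2), 2),
  geno    = rep(1:2, 4),
  yield   = c(6807, 5901, 6330, 5090, 3953, 3438, 3815, 3326),
  SE      = c(647, 761, 731, 421, 643, 683, 605, 584)
)
dodge <- position_dodge(width = 0.9)

g <- ggplot(df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
  geom_bar(stat = "identity", position = dodge) +
  geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2) +
  # clip = "off" lets the annotations below y = 0 draw outside the panel
  coord_cartesian(ylim = c(0, 7500), clip = "off") +
  annotate("text", x = 1:4, y = -400, label = rep(c("Variety 1", "Variety 2"), 2)) +
  annotate("text", x = c(1.5, 3.5), y = -800, label = c("Irrigated", "Dry")) +
  theme_classic() +
  theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank())

g
```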
