Size based pie chart code doesn't work - r

I want to turn the attached stacked bar chart into multiple pie charts whose radii are defined by the total weed weight. This is the code I used:
weedweights<-data%>%
select(-ends_with("No"))%>%
gather(key=species, value=speciesmass, DIGSAWt:POLLAWt)%>%
mutate(realmass= (10*speciesmass) / samplearea.m.2.)%>%
group_by(Rot.Herb, species)%>%
summarize(avgrealmass=mean(realmass, na.rm=TRUE))%>%
filter(avgrealmass != "NaN")%>%
ungroup()
ggplot(weedweights, aes(x=Rot.Herb, y=avgrealmass, fill=species, width=Rot.Herb)) +
geom_bar(position = "fill", stat="identity") +
facet_grid(Rot.Herb ~ .) +
coord_polar("y") +
theme(axis.text.x = element_blank())
I wanted 18 pie charts that look roughly like this, but got this blank plot instead. Can you see where the problem is?
Here is the stacked chart
And here you can download the data

There are quite a few problems with this. The main one is that your + is in the wrong place, but even once this is fixed there are some problems you must think about.
Put the + at the end of the preceding line, not at the start of the next one. Otherwise the first line ggplot(...) looks like a complete statement (how is R to know there is a + on the next line?). That is, write ggplot(...) + on the first line and function_call(...) + on subsequent lines, so R knows there is a continuation.
opts is deprecated. Use theme instead.
theme_blank is deprecated. Use element_blank instead.
"At least one layer must contain all variables used for facetting" - you are faceting by avgrealmass1 ~ avgrealmass2, but none of your layers has these variables in it, and neither does your data frame. I am unsure what the purpose of this is.
you have width=kg.ha but weedweights doesn't have a kg.ha column.
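The first point, about where the + goes, can be checked with R's own parser. This is only an illustrative sketch: the strings below are made-up stand-ins for a ggplot call, and the code simply counts how many separate expressions R sees in each layout.

```r
# A leading `+` on the next line starts a new expression (a unary plus),
# because the first line is already complete on its own.
leading <- parse(text = "1\n+ 2")

# A trailing `+` leaves the first line incomplete, so R keeps reading.
trailing <- parse(text = "1 +\n2")

n_leading  <- length(leading)    # number of expressions in the first layout
n_trailing <- length(trailing)   # number of expressions in the second layout
```

Here n_leading is 2 and n_trailing is 1: with the + at the start of a line, the ggplot(...) call is evaluated on its own and the remaining layers are never added to it.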
Fixing (or omitting) these problems yields:
ggplot(weedweights, aes(x=Rot.Herb, y=avgrealmass, fill=species)) +#, width=kg.ha)) +
geom_bar(position = "fill", stat="identity") +
#facet_grid(avgrealmass1~avgrealmass2) +
coord_polar("y") +
theme(axis.text.x = element_blank())
This doesn't look quite what you are after, but at least it fixes a lot of your initial question, and now you can have a proper think about what it is you want to plot/facet so that you can fix the rest.
Edit update in response to OP completely changing original question...
This does one pie chart per Rot.Herb -> change your facet_grid(Rot.Herb ~ .) to facet_wrap( ~ Rot.Herb). Also remove the width=Rot.Herb from the original ggplot call because how can you map width (a number) to Rot.Herb (a string)? Note also the aes(x=1...) --> the x is just a dummy variable, doesn't matter what it is.
ggplot(weedweights, aes(x=1, y=avgrealmass, fill=species)) +
geom_bar(position = "fill", stat="identity") +
facet_wrap(~ Rot.Herb) +
coord_polar("y") +
theme(axis.text.x = element_blank())
Now you say you want to vary the width of each pie chart by its total weed weight (though you do not explain how to obtain this, as weedweights only has one column avgrealmass).
I'll put on my mind-reading hat and assume that "total weed weight" is the sum of "avgrealmass" for each Rot.Herb (i.e. sum across all species for each Rot.Herb). If it is not, you are capable of calculating this column yourself already (you have already shown you can do this when you calculated your weedweight from your original data - well done).
So, you just add this width column to your weedweights:
ww2 <- weedweights %>%
group_by(Rot.Herb) %>%
mutate(totalweedweight=sum(avgrealmass)) %>%
ungroup()
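To see what this grouped mutate produces, here is a minimal sketch on made-up data. The data frame toy and the radius column are hypothetical, and base R's ave() is used as a stand-in for the dplyr pipeline so the snippet runs without extra packages.

```r
# Hypothetical toy data standing in for weedweights
toy <- data.frame(
  Rot.Herb    = c("A", "A", "B", "B", "B"),
  species     = c("s1", "s2", "s1", "s2", "s3"),
  avgrealmass = c(1, 3, 2, 2, 6)
)

# Base-R counterpart of group_by(Rot.Herb) %>% mutate(totalweedweight = sum(avgrealmass)):
# every row receives the total of its own Rot.Herb group
toy$totalweedweight <- ave(toy$avgrealmass, toy$Rot.Herb, FUN = sum)

# Optional assumption: a radius proportional to sqrt(total) makes the pie
# *area*, rather than the radius, scale with the total weight
toy$radius <- sqrt(toy$totalweedweight)
```

Every row of a group carries the same total, which is what lets it be mapped to width. If the pie area rather than the radius should track the weight, the radius column could be mapped to width (and radius/2 to x) instead.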
Then ggplot as before with the following changes:
width=totalweedweight: add the width column to the ggplot call.
x=totalweedweight/2: all this does is ensure each pie chart is "left-aligned", as it were, i.e. the pie is a circle and not a ring (leave it as x=1 and you will see what I mean).
ggplot(ww2, aes(x=totalweedweight/2, y=avgrealmass, fill=species, width=totalweedweight)) +
geom_bar(position = "fill", stat="identity") +
facet_wrap(~ Rot.Herb) +
coord_polar("y") +
theme(axis.text.x = element_blank())

Related

Show mean values in boxplots in R

time_pic <- ggplot(data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot()
print(time_pic)
time_pic+labs(title="", x="", y = "Time (Sec)")
I ran the code above to get the following image, but I don't know how to add the average value for each boxplot to it.
updated.
I tried this.
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
ggplot(data=data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend = FALSE) +
geom_text(data = means, aes(label = TimeTotal, y = TimeTotal + 0.08))
This is what it looks like now. Two dots are on the same line. And two values are overlapping with each other.
As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.
The biggest thing you need here is to control how ggplot2 is separating the separate boxplots for the data_box$Sitting_Position that share the same data_box$Kind. The process of separating and spreading the boxes around that x= axis value is called "dodging". When you supply a fill= or color= (or other) aesthetic in aes() for that geom, ggplot2 knows enough that it will assume you also want to group the data according to that value. So, your initial ggplot() call has in aes() that fill=Sitting_Position, which means that geom_boxplot() "works" - it creates the separate boxes that are colored differently and which are "dodged" properly.
When you create the points and the text, ggplot2 has no idea that you want to "dodge" this data, and even if it did, it would not know on what basis to dodge, since the fill= aesthetic doesn't make sense for a text or point geom. How to fix this? The answer is to:
Supply a group= aesthetic, which can override the grouping of a fill= or color= aesthetic, but which also can serve as a basis for the dodging for geoms that do not have a similar aesthetic.
Specify more clearly how you want to dodge. This will be important for accurate positioning of all things you want to dodge. Otherwise, you will have things dodged, but maybe not the same distance.
Here's how I combined all that:
# the datasets
set.seed(1234)
data_box <- data.frame(
Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +
# specifying dodge here and width to avoid overlapping boxes
geom_boxplot(
aes(fill=Sitting_Position),
position=position_dodge(0.6), width=0.5
) +
# note group aesthetic and same dodge call for next two objects
stat_summary(
aes(group=Sitting_Position),
position=position_dodge(0.6),
fun=mean,
geom='point', color='darkred', shape=18, size=3,
show.legend = FALSE
) +
geom_text(
data=means,
aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
position=position_dodge(0.6)
)
Giving you this:
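One way to make sure every layer dodges by the same distance is to create the position object once and reuse it. This is a sketch on dummy data, not the answer's exact code: the name pd and the simplified random data are assumptions, and it needs a ggplot2 version that accepts fun= in stat_summary (3.3.0 or later).

```r
library(ggplot2)

set.seed(1234)
# Dummy data with the same column names as in the answer
data_box <- data.frame(
  Kind = rep(c("Model-free AR", "Real-world"), each = 100),
  TimeTotal = rnorm(200, mean = 5, sd = 1),
  Sitting_Position = rep(rep(c("face to face", "side by side"), each = 50), 2)
)

# One position object shared by all layers keeps the dodge distance identical
pd <- position_dodge(width = 0.6)

p <- ggplot(data_box, aes(x = Kind, y = TimeTotal)) +
  geom_boxplot(aes(fill = Sitting_Position), position = pd, width = 0.5) +
  stat_summary(aes(group = Sitting_Position), fun = mean, geom = "point",
               position = pd, colour = "darkred", shape = 18, size = 3,
               show.legend = FALSE)

# Horizontal centres of the boxes and of the mean points after dodging
box_x   <- sort(as.numeric(ggplot_build(p)$data[[1]]$x))
point_x <- sort(as.numeric(ggplot_build(p)$data[[2]]$x))
```

Comparing box_x with point_x is a quick check that the points sit over their boxes; the same pd can also be passed to geom_text.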

Overlay points (and error bars) over bar plot with position_dodge

I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
rep=paste0("rep", rep(1:3, 12)),
value=runif(36)*100)
I have attempted to get the plot I want the following way:
library(ggplot2)
library(RColorBrewer) # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlying actual value points to the bars, it is making me wonder what is actually being plotted here... if the values are taken properly for each bar, and why the max value (or so it seems) is plotted by default.
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar.
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
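The question also asked for bars that show the median, with error bars running from the minimum to the maximum replicate. One possible sketch, not taken from the answer above, is to let stat_summary compute those values per cell, scientist and timepoint. It assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments); the seed and the names pd and medians are arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random toy values are repeatable
mydf <- data.frame(
  cell      = paste0("cell", rep(1:3, each = 12)),
  scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
  timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
  rep       = paste0("rep", rep(1:3, 12)),
  value     = runif(36) * 100
)

pd <- position_dodge(width = 0.9)

p <- ggplot(mydf, aes(x = cell, y = value, fill = scientist)) +
  # bar height = median of the three replicates in each group
  stat_summary(fun = median, geom = "col", position = pd) +
  # whiskers from the smallest to the largest replicate
  stat_summary(fun = median, fun.min = min, fun.max = max,
               geom = "errorbar", position = pd, width = 0.3) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")

# Medians computed independently, for comparison with the bar heights
medians <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = median)
```

Because the summary is computed inside the plot, each bar is a single value per group rather than three overplotted replicates, which also settles the question of which value the bar shows.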

Create a ggplot geom_boxplot when third variable=x

I think this is a fairly simple question, I just can't figure it out for the life of me.
I am making a boxplot using the following code;
ggplot(data, aes(gear, length)) + geom_boxplot() + xlab('Gear Type') +
ylab('Size (cm)') + ggtitle("Catch Characterization") +
theme(plot.title = element_text(hjust = 0.5))
This produces an aggregated boxplot for my entire dataset. I would like to be able to produce the same boxplot for two subsets of my dataset. Specifically, I have another column "action" containing either the character "D" or "K". For example, my SQL mind wants to just add a WHERE clause (though I know that is not how it works), such as:
ggplot(data, aes(gear, length, WHEREaction=D)) + geom_boxplot() + xlab('Gear Type') +
ylab('Size (cm)') + ggtitle("Catch Characterization") +
theme(plot.title = element_text(hjust = 0.5))
Edit: I am able to use facet_wrap to split and graph by "action"; however, I am curious how I could make just one graph of (length, gear) where "action = D". I know I can fairly easily restructure my data; I'm just wondering if there is a simpler/quicker way to do so with the aggregated data set.
Edit again, think I figured it out by piecing a few things together, I ended up using this code and it seems to be giving me the appropriate graphs;
ggplot(aggdata[aggdata$action == "D",], aes(gear, length)) +
geom_boxplot() + xlab('Gear') + ylab('Size (cm)') +
ggtitle("Discard Characterization") +
theme(plot.title = element_text(hjust = 0.5))
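For reference, the row filter used above has a close base-R relative, subset(), and the two differ in how they treat missing values. The sketch below uses a made-up data frame (catch is hypothetical, with one NA in action) to show the difference.

```r
# Hypothetical catch data with the same column names as in the question
catch <- data.frame(
  gear   = c("trawl", "trawl", "pot", "pot", "pot"),
  length = c(30, 42, 18, 25, 21),
  action = c("D", "K", "D", NA, "D")
)

# Logical indexing keeps an all-NA row wherever the comparison yields NA
by_index <- catch[catch$action == "D", ]

# subset() drops rows where the condition is NA
by_subset <- subset(catch, action == "D")
```

Either result can be passed as the data argument, e.g. ggplot(by_subset, aes(gear, length)) + geom_boxplot(). If "action" may contain missing values, subset() (or adding !is.na(action) to the index) avoids feeding NA rows to the plot.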

How to stack error bars in a stacked bar plot using geom_errorbar?

I want to stack the error bars in a stacked bar plot using geom_errorbar / ggplot.
In my ggplot statement, I have tried to use both position="stack" and position="identity". Neither of them worked.
Here is my ggplot statement:
ggplot(DF, aes(x=factor(year), y=proportion, fill=response)) +
facet_grid(. ~ sex) +
theme(legend.position="none") +
geom_bar(position="stack", stat="identity") +
geom_errorbar(aes(ymin=ci_l, ymax=ci_u),
width=.2, # Width of the error bars
position="identity")
Here is the result I'm getting, and you may notice that the error bars on the right hand-side do not follow the bar values.
Here is the Data Frame I've used in this example:
DF <- data.frame(sex=c("men","women","men","women","men","women"),
proportion=c(0.33,0.32,0.24,0.29,0.12,0.16),
ci_l=c(0.325,0.322,0.230,0.284,0.114,0.155),
ci_u=c(0.339,0.316,0.252,0.311,0.130,0.176),
year=c(2008,2008,2013,2013,2013,2013),
response=c("Yes","Yes","Yes, entire the journey","Yes, entire the journey","Yes, part of the journey","Yes, part of the journey")
)
What is happening here is that ggplot is not stacking the error bars (they would have to be summed), so you will have to do that by hand (and it seems that Hadley thinks that this is not a good idea and will not add this functionality).
So doing by hand:
DF$ci_l[DF$response == "Yes, part of the journey"] <- with(DF,ci_l[response == "Yes, part of the journey"] +
ci_l[response == "Yes, entire the journey"])
DF$ci_u[DF$response == "Yes, part of the journey"] <- with(DF,ci_u[response == "Yes, part of the journey"] +
ci_u[response == "Yes, entire the journey"])
Now:
ggplot(DF, aes(x=factor(year), y=proportion)) +
facet_grid(. ~ sex) +
geom_bar(stat="identity",aes(fill=response)) +
geom_errorbar(aes(ymin= ci_l,
ymax= ci_u),
width=.2, # Width of the error bars
position="identity")
The issue here is that geom_errorbar is just making nice error bars with the y values that you give it; it doesn't know anything about the geom_bar layer which has a vertical offset for some of the data. So you need to adjust for the fact that for one of your responses, the values plotted have a positive vertical offset determined by the value for another response. For the example provided, this can be accomplished by:
DF$vadj <- c(rep(0,2), rep(c(0,1,0), each=2) * DF$proportion)[1:6]
ggplot(DF, aes(x=factor(year), y=proportion, fill=response)) +
facet_grid(. ~ sex) + geom_bar(stat='identity') +
geom_errorbar( aes(ymin=ci_l+vadj, ymax=ci_u+vadj), width=.2)
The technique for adjustment here is admittedly not especially elegant, and if you need to generalize, be aware that it is very much dependent on the particular structure of the dataframe (i.e. it would have to be changed if the rows were ordered differently). But it should get your error bars where you want them.
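If the adjustment needs to generalize, the offsets can be computed per bar rather than by row position. The following is only a sketch: the column names offset, ymin_stacked and ymax_stacked are made up here, and it assumes each segment sits on top of the segments whose response sorts before it. Current ggplot2 releases stack in the reverse of that order by default, so the bar layer would need position_stack(reverse = TRUE) (or the ordering below flipped) for bars and error bars to agree; check the result against your own plot.

```r
DF <- data.frame(
  sex        = c("men", "women", "men", "women", "men", "women"),
  proportion = c(0.33, 0.32, 0.24, 0.29, 0.12, 0.16),
  ci_l       = c(0.325, 0.322, 0.230, 0.284, 0.114, 0.155),
  ci_u       = c(0.339, 0.316, 0.252, 0.311, 0.130, 0.176),
  year       = c(2008, 2008, 2013, 2013, 2013, 2013),
  response   = c("Yes", "Yes", "Yes, entire the journey", "Yes, entire the journey",
                 "Yes, part of the journey", "Yes, part of the journey")
)

# Sort so that, within each bar (sex x year), segments follow the assumed
# bottom-to-top stacking order
DF <- DF[order(DF$sex, DF$year, DF$response), ]

# Height of everything stacked below each segment within its bar
DF$offset <- ave(DF$proportion, DF$sex, DF$year,
                 FUN = function(p) cumsum(p) - p)

# Error-bar limits shifted up to where the segment is actually drawn
DF$ymin_stacked <- DF$ci_l + DF$offset
DF$ymax_stacked <- DF$ci_u + DF$offset
```

The error bars would then be drawn with geom_errorbar(aes(ymin = ymin_stacked, ymax = ymax_stacked), width = .2). Unlike the index-based version, this does not depend on the original row order of the data frame.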

How to add manual colors for a ggplot2 (geom_smooth/geom_line)

I want to build a plot with ggplot2. I use geom_line to draw lines and geom_smooth to show the min-max range of a specific index.
Two data frames are used; in each, the first column holds the date (e.g. 2013-02-04) and the remaining columns hold measured values (e.g. 2.532283).
First I generate an empty ggplot with all styles:
yrange_EVI2 = is the Range of the Index (Minimum - Maximum)
xrange = is the date range for the x-Axis (earliest - latest date)
EVI2_veg <- ggplot() + geom_blank() +
ylim(yrange_EVI2) + xlim(xrange) +
ggtitle("EVI2 for reference-data in Azraq (Jordan)") + ylab("EVI2") + xlab("month") +
theme_bw(base_size = 12, base_family = "Times New Roman")
Second step is to plot the Ranges (Min-Max-Range) and lines with the mean for specific values:
EVI2_veg <- EVI2_veg +
geom_smooth(aes(x=Date, y=Vegetable_mean, ymin=Vegetable_min, ymax=Vegetable_max), data=Grouped_Croptypes_EVI2, stat="identity") +
geom_line(aes(x=Date, y=Tomato), data=Sample_EVI2_A_SPOT)
In the last step i tried to change the color with scale_fill_manual and scale_color_manual:
EVI2_veg <- EVI2_veg +
scale_fill_manual("Min-Max-Range and Mean \nof specific Croptypes",labels=c("Vegetable","Tomato"),values=c("#008B00","#FFFFFF")) +
scale_color_manual("Min-Max-Range and Mean \nof specific Croptypes",labels=c("Vegetable","Tomato"),values=c("#008B00","#CD4F39"))
I have read a lot of answers and the manuals for the specific packages, but I don't understand when to use the different forms of color="" and fill="":
geom_line(aes(color="",fill=""))
geom_line(aes(),color="", fill="")
scale_color_manual(values=c("")) or scale_fill_manual(values=c(""))
If I don't define the first, no legend appears. But if I define it as in the code above, the colors don't match the plot. This is my first time with ggplot2; I have read a lot about this useful package, but I don't understand how to define the colors, or how the colors in the plot and the legend are matched. It would be nice if somebody could help me.
First, it's always nice to include sample data with any plotting code otherwise we can't run it to see what you see. Please read how to make a great R reproducible example before making other posts. It will make it much easier for people to help you. Anyway, here's some sample data
Sample_EVI2_A_SPOT<-data.frame(
Date=seq(as.Date("2014-01-01"), as.Date("2014-02-01"), by="1 day"),
Tomato = cumsum(rnorm(32))
)
Grouped_Croptypes_EVI2<-data.frame(
Date=seq(as.Date("2014-01-01"), as.Date("2014-02-01"), by="1 day"),
Vegetable_mean=cumsum(rnorm(32))
)
Grouped_Croptypes_EVI2<-transform(Grouped_Croptypes_EVI2,
Vegetable_max=Vegetable_mean+runif(32)*5,
Vegetable_min=Vegetable_mean-runif(32)*5
)
And this should make the plot you want
EVI2_veg <- ggplot() + geom_blank() +
ggtitle("EVI2 for reference-data in Azraq (Jordan)") +
ylab("EVI2") + xlab("month") +
theme_bw(base_size = 12, base_family = "Times New Roman") +
geom_smooth(aes(x=Date, y=Vegetable_mean, ymin=Vegetable_min,
ymax=Vegetable_max, color="Vegetable", fill="Vegetable"),
data=Grouped_Croptypes_EVI2, stat="identity") +
geom_line(aes(x=Date, y=Tomato, color="Tomato"), data=Sample_EVI2_A_SPOT) +
scale_fill_manual(name="Min-Max-Range and Mean \nof specific Croptypes",
values=c(Vegetable="#008B00", Tomato="#FFFFFF")) +
scale_color_manual(name="Min-Max-Range and Mean \nof specific Croptypes",
values=c(Vegetable="#008B00",Tomato="#CD4F39"))
EVI2_veg
Note the addition of color= and fill= in the aes() calls. You really should put stuff you want in legends inside aes(). Here I specify "fake" colors that I then define in the scale_*_manual commands.
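The matching between the "fake" colors and the real ones happens by name, which a small sketch can make visible. The data frame toy, the vector line_cols and the legend title below are made up for illustration; the colors are the ones from the answer, deliberately listed in a different order.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed for the made-up series
dates <- seq(as.Date("2014-01-01"), as.Date("2014-02-01"), by = "1 day")
toy <- data.frame(Date = dates, Tomato = cumsum(rnorm(length(dates))))

# Named vector: the names must equal the strings used inside aes()
line_cols <- c(Tomato = "#CD4F39", Vegetable = "#008B00")

p <- ggplot(toy, aes(x = Date, y = Tomato)) +
  geom_line(aes(color = "Tomato")) +   # "Tomato" is a label here, not a colour
  scale_color_manual(name = "Croptype", values = line_cols)

# Colour that the scale actually assigned to the line layer
drawn <- unique(ggplot_build(p)$data[[1]]$colour)
```

Inspecting drawn shows the hex code stored under the name "Tomato", regardless of the order of line_cols. Setting color= outside aes() would instead fix the line's colour directly and produce no legend entry.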

Resources