Create a ggplot geom_boxplot when third variable=x

I think this is a fairly simple question; I just can't figure it out for the life of me.
I am making a boxplot using the following code:
ggplot(data, aes(gear, length)) + geom_boxplot() + xlab('Gear Type') +
ylab('Size (cm)') + ggtitle("Catch Characterization") +
theme(plot.title = element_text(hjust = 0.5))
This produces an aggregated boxplot for my entire dataset. I would like to produce the same boxplot for two subsets of my dataset. Specifically, I have another column, "action", containing either the character "D" or "K". My MySQL mind wants to just add a WHERE clause (though I know that is not how it works), such as:
ggplot(data, aes(gear, length, WHEREaction=D)) + geom_boxplot() + xlab('Gear Type') +
ylab('Size (cm)') + ggtitle("Catch Characterization") +
theme(plot.title = element_text(hjust = 0.5))
Edit: I am able to use facet_wrap to split the data and graph it by "action"; however, I am curious how I could make just one graph of (gear, length) where action = "D". I know I could fairly easily restructure my data; I'm just wondering whether there is a simpler, quicker way to do this with the aggregated data set.
Edit again: I think I figured it out by piecing a few things together. I ended up using this code, and it seems to give me the appropriate graphs:
ggplot(aggdata[aggdata$action == "D",], aes(gear, length)) +
geom_boxplot() + xlab('Gear') + ylab('Size (cm)') +
ggtitle("Discard Characterization") +
theme(plot.title = element_text(hjust = 0.5))
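For completeness, here is a sketch that builds the same boxplot for each value of action with one helper, so the "D" and "K" plots share their formatting. The data frame catch and its values are invented stand-ins for the OP's aggdata, and plot_action() is a hypothetical helper name; base R's subset() keeps the same rows as aggdata[aggdata$action == "D", ].

```r
library(ggplot2)

# Invented stand-in for the OP's data: gear type, length (cm) and action ("D"/"K")
set.seed(1)
catch <- data.frame(
  gear   = rep(c("Trawl", "Longline", "Gillnet"), each = 20),
  length = round(runif(60, min = 20, max = 80)),
  action = rep(c("D", "K"), times = 30)
)

# Hypothetical helper: keep only rows whose action equals `act`,
# then draw the same boxplot as in the question
plot_action <- function(df, act) {
  ggplot(subset(df, action == act), aes(gear, length)) +
    geom_boxplot() +
    xlab("Gear Type") + ylab("Size (cm)") +
    ggtitle(paste0("Catch Characterization (action = ", act, ")")) +
    theme(plot.title = element_text(hjust = 0.5))
}

p_d <- plot_action(catch, "D")
p_k <- plot_action(catch, "K")
print(p_d)
print(p_k)
```

When both subsets should sit side by side on a shared axis, facet_wrap(~ action) on the full data remains the simpler option.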

Related

How to plot plots using different datasets using ggplot2

I am trying to plot a line and a dot using ggplot2. I looked at an existing answer, but it assumes the same dataset is used. What I tried to do is:
library(ggplot2)
df = data.frame(Credible=c(0.2, 0.3),
len=c(0, 0))
zero=data.frame(x0=0,y0=0)
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "")
ggplot(data=zero, aes(x=x0, y=y0, group=1)) +
geom_point(color="green")+
labs(x = "Credible", y = "")
but it generates just the second plot (the dot).
Thank you
Given how carefully and reproducibly you set up your question, I won't just point you to the old answer, since it may be harder to transfer the subsetting etc. to your case.
You initialize a new ggplot object whenever you run ggplot(...).
If you want to add a layer on top of an existing plot you have to operate on the same object, something like this:
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "") +
geom_point(data=zero, color="green", aes(x=x0, y=y0, group=1))
Note how, in the second geom_point, the data source and aesthetics are specified explicitly so that they are not inherited from the initial object.
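A variant of the same fix, sketched with the question's df and zero: keep the first plot in a variable and add the extra layer to that object. It uses inherit.aes = FALSE, a standard layer argument, which makes the new geom_point() ignore the plot-level aes() entirely instead of overriding it piece by piece. The group = 1 mapping is dropped here because a single numeric series forms one group anyway.

```r
library(ggplot2)

df   <- data.frame(Credible = c(0.2, 0.3), len = c(0, 0))
zero <- data.frame(x0 = 0, y0 = 0)

# Store the first plot so that later layers are added to the same object
p <- ggplot(df, aes(x = Credible, y = len)) +
  geom_line(colour = "red") +
  geom_point() +
  labs(x = "Credible", y = "")

# inherit.aes = FALSE: this layer uses only its own data and mapping
p <- p + geom_point(data = zero, aes(x = x0, y = y0),
                    colour = "green", inherit.aes = FALSE)
print(p)
```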

RStudio graph not plotted (not even with x11())

p <- data %>%
ggplot(aes(x=rating)) +
geom_histogram( binwidth=10, fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Distribution of teams fifa ratings") +
theme_ipsum() + theme(plot.title = element_text(size=15))
# I am trying to plot this histogram, but I don't know why the plot is not shown.
# This code is unlikely to contain mistakes, since I copied it from https://www.r-graph-gallery.com/220-basic-ggplot2-histogram.html#binSize
Your binwidth argument is the likely culprit. You can confirm this by removing it and checking whether the plot appears.
Set a binwidth that is compatible with the range of the rating values. This should solve the issue.
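A minimal sketch of choosing the bin width from the data instead of hard-coding 10. The ratings_df data frame is invented (ratings around 75 on a 0-100 style scale), and theme_ipsum() from hrbrthemes is left out so only ggplot2 is needed. It also prints p explicitly: an assignment such as p <- ... only stores the plot, and it is drawn when p is printed (typing p at the console, or print(p) in a script).

```r
library(ggplot2)

# Invented stand-in for the OP's data: one numeric rating per team
set.seed(42)
ratings_df <- data.frame(rating = round(rnorm(200, mean = 75, sd = 5)))

# Derive the bin width from the observed range: roughly 20 bins across the data
bw <- diff(range(ratings_df$rating)) / 20

p <- ggplot(ratings_df, aes(x = rating)) +
  geom_histogram(binwidth = bw, fill = "#69b3a2", color = "#e9ecef", alpha = 0.9) +
  ggtitle("Distribution of teams fifa ratings") +
  theme(plot.title = element_text(size = 15))

# The assignment above draws nothing; printing the object renders it
print(p)
```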

ggplot2 facet_wrap doesn't find a variable but shape does

I'm running into a bit of a problem plotting some data with ggplot2: I want to use facet_wrap over a variable AdultInputProp, but R doesn't find the variable and instead returns Error in as.quoted(facets) : object 'AdultInputProp' not found. I understand that this simply means R can't find this variable in the dataset used for plotting, but if I instead ask ggplot2 to use the same variable to create a shape scale, it works just fine. Any idea what the problem might be?
Sorry, I'm not too sure how to make a minimal working example with a df generated from scratch, so here's the df I'm using and the code below. I've also tried using facet_grid instead of facet_wrap but ran into the same problem.
The code here with facets returns the above-mentioned error:
df.plot.GBPperAIP <- ggplot(df.sum.GBPperAIP,
aes(x=TestIteration, y=Error,
colour=GoalBabblingProp,
group=interaction(GoalBabblingProp,
AdultInputProp))) +
facet_wrap(AdultInputProp) +
xlab("Step") + ylab("Mean error") + theme_bw(base_size=18) +
scale_colour_discrete(name = "Goal babbling proportion") +
geom_line(position = position_dodge(1000)) +
geom_errorbar(aes(ymin=Error-ci,
ymax=Error+ci),
color="black", width=1000,
position = position_dodge(1000)) +
geom_point(position = position_dodge(1000),
size=1.5, fill="white")
This other code, identical except that the facet_wrap line is deleted and shape is added, works fine:
df.plot.GBPperAIP <- ggplot(df.sum.GBPperAIP,
aes(x=TestIteration, y=Error,
colour=GoalBabblingProp,
shape=AdultInputProp,
group=interaction(GoalBabblingProp,
AdultInputProp))) +
xlab("Step") + ylab("Mean error") + theme_bw(base_size=18) +
scale_colour_discrete(name = "Goal babbling proportion") +
geom_line(position = position_dodge(1000)) +
geom_errorbar(aes(ymin=Error-ci,
ymax=Error+ci),
color="black", width=1000,
position = position_dodge(1000)) +
geom_point(position = position_dodge(1000),
size=1.5, fill="white")
facet_wrap expects a formula (or vars(), or a character vector of column names), not a bare variable name. So you should change it to
...
facet_wrap(~ AdultInputProp) +
...
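Why the bare name fails: facet_wrap(AdultInputProp) makes R look up AdultInputProp as an ordinary object when the call is evaluated, whereas names inside aes() are looked up in the data frame. The sketch below uses a small invented data frame (toy) to show three spellings facet_wrap() accepts: a one-sided formula, vars(), and a character vector of column names.

```r
library(ggplot2)

# Invented data: an error curve for each of three AdultInputProp values
toy <- data.frame(
  TestIteration  = rep(1:4, times = 3),
  Error          = c(5, 4, 3, 2, 6, 5, 3, 2, 7, 5, 4, 3),
  AdultInputProp = factor(rep(c(0.25, 0.5, 0.75), each = 4))
)

base <- ggplot(toy, aes(TestIteration, Error)) + geom_line() + geom_point()

# Three equivalent ways to name the faceting variable
p_formula <- base + facet_wrap(~ AdultInputProp)
p_vars    <- base + facet_wrap(vars(AdultInputProp))
p_string  <- base + facet_wrap("AdultInputProp")

print(p_formula)
```

Inside the formula and vars(), the name is resolved against the plot data, just as it is inside aes().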

Size-based pie chart code doesn't work

I wanted to turn this attached graph into multiple pie charts whose radii are defined by the total weed weight. This was the code I used:
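library(dplyr)
library(tidyr)
weedweights <- data %>%
select(-ends_with("No")) %>%                                   # drop columns whose names end in "No"
gather(key=species, value=speciesmass, DIGSAWt:POLLAWt) %>%    # wide -> long, one row per species
mutate(realmass = (10*speciesmass) / samplearea.m.2.) %>%
group_by(Rot.Herb, species) %>%
summarize(avgrealmass = mean(realmass, na.rm=TRUE)) %>%
filter(!is.nan(avgrealmass)) %>%                               # all-NA groups give NaN means
ungroup()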
ggplot(weedweights, aes(x=Rot.Herb, y=avgrealmass, fill=species, width=Rot.Herb)) +
geom_bar(position = "fill", stat="identity") +
facet_grid(Rot.Herb ~ .) +
coord_polar("y") +
theme(axis.text.x = element_blank())
I wanted 18 pie charts that look roughly like this, but got this blank plot instead. Please help me see where the problem is.
Here is the stacked chart
And here you can download the data
There are quite a few problems with this. The main one is that your + is in the wrong place, but even once this is fixed there are some problems you must think about.
Put the + at the end of the preceding line, not at the start of the next one. Otherwise the first line, ggplot(...), looks like a complete statement (how is R to know there is a + on the next line?). That is, write ggplot(...) + on the first line and function_call(...) + on subsequent lines, so R knows the expression continues.
opts is deprecated. Use theme instead.
theme_blank is deprecated. Use element_blank instead.
"At least one layer must contain all variables used for facetting": you are faceting by avgrealmass1 ~ avgrealmass2, but none of your layers has these variables, and neither does your data frame. I am unsure what the purpose of this is.
You have width=kg.ha, but weedweights doesn't have a kg.ha column.
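The point about where to put + can be checked without drawing anything: base R's parse() reports how many separate expressions a piece of code splits into. A trailing + keeps one expression going, while a leading + lets the first expression end at the line break, which is why the ggplot(...) call and its layers never get combined. The strings below use plain arithmetic in place of ggplot calls.

```r
# '+' at the start of the second line: R parses two separate expressions,
# "1" and the unary expression "+ 2"
leading <- parse(text = "1\n+ 2")

# '+' at the end of the first line: the expression is incomplete,
# so R keeps reading and parses one expression, "1 + 2"
trailing <- parse(text = "1 +\n2")

length(leading)   # 2
length(trailing)  # 1
```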
Fixing (or omitting) these problems yields:
ggplot(weedweights, aes(x=Rot.Herb, y=avgrealmass, fill=species)) + # width=kg.ha removed
geom_bar(position = "fill", stat="identity") +
# facet_grid(avgrealmass1 ~ avgrealmass2) removed: these variables don't exist
coord_polar("y") +
theme(axis.text.x = element_blank())
This isn't quite what you are after, but it fixes many of the problems in your initial question, and now you can think properly about what you want to plot/facet so that you can fix the rest.
Edit: update in response to the OP completely changing the original question.
To get one pie chart per Rot.Herb, change your facet_grid(Rot.Herb ~ .) to facet_wrap(~ Rot.Herb). Also remove width=Rot.Herb from the original ggplot call: how can you map width (a number) to Rot.Herb (a string)? Note also aes(x=1, ...): the x is just a dummy variable, and its value doesn't matter.
ggplot(weedweights, aes(x=1, y=avgrealmass, fill=species)) +
geom_bar(position = "fill", stat="identity") +
facet_wrap(~ Rot.Herb) +
coord_polar("y") +
theme(axis.text.x = element_blank())
Now you say you want to vary the width of each pie chart by its total weed weight (though you do not explain how to obtain this, as weedweights has only one weight column, avgrealmass).
I'll put on my mind-reading hat and assume that "total weed weight" is the sum of avgrealmass for each Rot.Herb (i.e. the sum across all species for each Rot.Herb). If it is not, you can already calculate this column yourself (you showed you can when you computed weedweights from your original data, well done).
So, you just add this width column to your weedweights:
ww2 <- weedweights %>%
group_by(Rot.Herb) %>%
mutate(totalweedweight=sum(avgrealmass)) %>%
ungroup()
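If you would rather not use dplyr for this step, base R's ave() computes the same per-group total. This is only a sketch: the small weedweights data frame below is invented, with just the three columns the answer assumes (Rot.Herb, species, avgrealmass), and the Rot.Herb values "A" and "B" are placeholders.

```r
# Invented stand-in for weedweights: two Rot.Herb groups, two species each
weedweights <- data.frame(
  Rot.Herb    = c("A", "A", "B", "B"),
  species     = c("DIGSAWt", "POLLAWt", "DIGSAWt", "POLLAWt"),
  avgrealmass = c(1, 3, 2, 5)
)

# ave() returns, for every row, the sum of avgrealmass over rows with the same Rot.Herb
ww2 <- weedweights
ww2$totalweedweight <- ave(ww2$avgrealmass, ww2$Rot.Herb, FUN = sum)
ww2
```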
Then ggplot as before with the following changes:
width=totalweedweight: add the width column to the ggplot call.
x=totalweedweight/2: all this does is ensure each pie chart is "left-aligned", as it were, i.e. the pie is a circle and not a ring (leave it as x=1 and you will see what I mean).
ggplot(ww2, aes(x=totalweedweight/2, y=avgrealmass, fill=species, width=totalweedweight)) +
geom_bar(position = "fill", stat="identity") +
facet_wrap(~ Rot.Herb) +
coord_polar("y") +
theme(axis.text.x = element_blank())
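To see why x = totalweedweight/2 paired with width = totalweedweight gives full discs, the sketch below runs the same plot on invented data (two Rot.Herb groups, placeholder names "A" and "B") and inspects the bar extents with layer_data(). Each bar runs from x = 0 to x = totalweedweight, so after coord_polar("y") every pie is a solid disc whose radius is its total, rather than a ring.

```r
library(ggplot2)

# Invented stand-in for ww2: two Rot.Herb groups with different totals
ww2 <- data.frame(
  Rot.Herb    = rep(c("A", "B"), each = 2),
  species     = rep(c("DIGSAWt", "POLLAWt"), times = 2),
  avgrealmass = c(1, 3, 2, 6)
)
ww2$totalweedweight <- ave(ww2$avgrealmass, ww2$Rot.Herb, FUN = sum)

p <- ggplot(ww2, aes(x = totalweedweight / 2, y = avgrealmass,
                     fill = species, width = totalweedweight)) +
  geom_bar(position = "fill", stat = "identity") +
  facet_wrap(~ Rot.Herb) +
  coord_polar("y") +
  theme(axis.text.x = element_blank())
print(p)

# Bar extents before the polar transform: xmin is 0, xmax equals the group total
ld <- layer_data(p)
ld[, c("PANEL", "xmin", "xmax")]
```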

ggplot2: Order in legend alphabetically instead of following the order of appearance in the dataset

I am producing a ggplot2 line plot from AESRD 2013 - SCO Bitumen - 7y.csv in this folder. The file is generated automatically by a website according to my specifications. It contains several time series of production values, each named in the column "Compilation", so I grouped by "Compilation".
See this excerpt of my code in the file plotter.r available in the same folder (see above).
# "dt" is the dataframe derived from the csv file.
# "thinned" is some vector of x-values that tells where to draw the special symbols.
p = ggplot(dt, aes(Date, Value, colour= Compilation, group = Compilation, size = plotParameter), guide=FALSE)
p = p + geom_point(data=dt[thinned,],aes(as.Date(Date), Value, colour= Compilation, shape = Compilation), size = 5)
p = p + scale_shape_manual(values = seq(0,20))
p = p + geom_line(guide = FALSE)
p = p + scale_colour_manual(values=cbPalette) #cbPalette is already defined
p = p + scale_size(range=c(0.5, 2), guide=FALSE)
p = p + scale_y_continuous(labels = comma)
p = p + ylab("Barrels per day") + xlab("")
p = p + theme(legend.text = element_text(size = 8, hjust = 5, vjust= -5))
plot(p)
Here comes the nasty thing: The legend reorders my compilations alphabetically!
I purposely designed my CSV file so that the compilations appear in a certain logical order (the most important series first, followed by the rest in order of some performance parameter). So the right legend order is simply unique(dt$Compilation).
So far I have tried adding an Order column to the CSV file and experimenting with it, and changing my code in all kinds of ways, with no success.
Of course, I have googled and checked most of the available threads on Stack Overflow. I have come across factor levels and reordering, but there is no "logical" order for my compilations other than the order in which they appear in the dataset. *Sigh*
Can anyone tell me what to insert where?
(Bonus point: How do I get rid of those horizontal lines in the symbol legend?)
Apply the same breaks in both scales (scale_color_manual and scale_shape_manual). If you set them in just one, the two scales wouldn't match, and ggplot would split them into two legends rather than merging them.
One such example:
library(ggplot2)
ggplot(mtcars, aes(wt, mpg, shape=factor(cyl))) + geom_point() + theme_bw()
bp <- ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot()
bp
bp + scale_fill_discrete(breaks=c("trt1","ctrl","trt2"))
To reverse the order:
bp + scale_fill_discrete(breaks = rev(levels(PlantGrowth$group)))
Also try (this simply spells out the default level order):
bp + scale_fill_discrete(breaks = unique(levels(PlantGrowth$group)))
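Applied to colour and shape together, the advice looks roughly like the sketch below. The data frame, the series names and the order_wanted vector are invented for illustration. Both scales get the same breaks (and keep the same default title, series), which is what lets ggplot2 merge them into a single legend listed in that order.

```r
library(ggplot2)

# Invented data: colour and shape both map to the same variable
df <- data.frame(
  x      = 1:6,
  y      = c(2, 4, 3, 5, 4, 6),
  series = rep(c("Zeta", "Alpha", "Mid"), each = 2)
)
order_wanted <- c("Zeta", "Alpha", "Mid")  # desired legend order (not alphabetical)

p <- ggplot(df, aes(x, y, colour = series, shape = series)) +
  geom_point(size = 3) +
  # Identical breaks in both scales, so the two legends match and are merged
  scale_colour_manual(values = c(Zeta = "red", Alpha = "blue", Mid = "darkgreen"),
                      breaks = order_wanted) +
  scale_shape_manual(values = c(Zeta = 15, Alpha = 16, Mid = 17),
                     breaks = order_wanted)
print(p)
```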
To address the ordering: you probably want to fix the factor levels of dt$Compilation by calling something like dt <- transform(dt, Compilation=factor(Compilation, levels=unique(Compilation))) before plotting.
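The effect of that transform() call can be checked in base R alone. The dt below is invented (placeholder series names in a non-alphabetical order); discrete scales list factor levels in level order, so fixing the levels to first-appearance order fixes the legend order.

```r
# Invented stand-in for dt: compilations in the order they appear in the CSV
dt <- data.frame(
  Compilation = c("Series C", "Series C", "Series A", "Series B", "Series B"),
  Value       = c(10, 11, 8, 9, 5)
)

# Default: factor() sorts the levels alphabetically
levels(factor(dt$Compilation))   # "Series A" "Series B" "Series C"

# Fix the levels to the order of first appearance in the data
dt <- transform(dt, Compilation = factor(Compilation, levels = unique(Compilation)))
levels(dt$Compilation)           # "Series C" "Series A" "Series B"
```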
And speaking of horizontal lines: do you want p = p + guides(size=FALSE)? (Since ggplot2 3.3.4, FALSE is deprecated there; the current spelling is guides(size = "none").)
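Another possible source of the horizontal lines, offered as an assumption rather than a confirmed diagnosis: the colour legend is merged with the shape legend, and every layer that maps colour, including geom_line(), draws its key there. Setting show.legend = FALSE on the line layer keeps it out of the legend. (Note that geom_line(guide = FALSE) in the question is not a geom_line() parameter, so ggplot2 ignores it with a warning.) The data below is invented.

```r
library(ggplot2)

# Invented time series for two compilations, levels fixed in appearance order
df <- data.frame(
  Date        = rep(1:5, times = 2),
  Value       = c(1, 2, 3, 4, 5, 2, 3, 3, 4, 6),
  Compilation = factor(rep(c("B series", "A series"), each = 5),
                       levels = c("B series", "A series"))
)

p <- ggplot(df, aes(Date, Value, colour = Compilation)) +
  # show.legend = FALSE: this layer contributes no legend key,
  # so no line is drawn through the symbols in the merged legend
  geom_line(show.legend = FALSE) +
  geom_point(aes(shape = Compilation), size = 3)
print(p)
```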
