Trouble using facet_wrap - r

I'm a newbie, so this might be super basic.
When I run this code
Dplot <- qplot(x = diamonds$carat, y = diamonds$price, color = diamonds$color) +
ggtitle("An A+ plot") +
xlab("carat") +
ylab("price") +
geom_point()
Dplot <- Dplot + facet_wrap(vars(diamonds$clarity))
Dplot
I get an error message that says:
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just =
"centre", :
only 'grobs' allowed in "gList"
I've tried googling, but haven't been able to figure out what the issue is.

I would advise against using qplot except in the most basic cases. It teaches bad habits (like using $) that should be avoided in ggplot.
We can make the switch by passing the data frame diamonds to ggplot(), and putting the mappings inside aes() with just column names, not diamonds$. Then the facet_wrap works fine as long as we also omit the diamonds$:
Dplot = ggplot(diamonds, aes(x = carat, y = price, color = color)) +
ggtitle("An A+ plot") +
xlab("carat") +
ylab("price") +
geom_point()
Dplot + facet_wrap(vars(clarity))
Dplot + facet_wrap(~ clarity) # another option
Notice the code is actually shorter because we don't need to type diamonds$ all the time!
The vars(clarity) option works fine, though more traditionally you would see the formula interface, ~ clarity. The vars() option is new-ish, and it plays a little nicer if you are writing a function where the name of the column to facet by is stored in a variable.
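To illustrate that last point, here is a minimal sketch of a helper that facets by a column whose name arrives as a string. The function name plot_by and its argument facet_col are made up for this example, and it assumes a ggplot2 version (3.0.0 or later) where the .data pronoun can be used inside vars().

```r
library(ggplot2)

# Hypothetical helper: facet the diamonds scatterplot by any column,
# given as a string.
plot_by <- function(facet_col) {
  ggplot(diamonds, aes(x = carat, y = price, color = color)) +
    geom_point() +
    # .data[[facet_col]] looks the column up by name inside the plot data
    facet_wrap(vars(.data[[facet_col]]))
}

p_clarity <- plot_by("clarity")
p_cut <- plot_by("cut")
```

With the formula interface the same helper would need to build the formula from the string first, which is why vars() tends to be the more convenient choice here.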

Related

How to plot plots using different datasets using ggplot2

I am trying to plot a line and a dot using ggplot2. I looked at an earlier answer, but it assumes the same dataset is used for both. What I tried to do is
library(ggplot2)
df = data.frame(Credible=c(0.2, 0.3),
len=c(0, 0))
zero=data.frame(x0=0,y0=0)
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "")
ggplot(data=zero, aes(x=x0, y=y0, group=1)) +
geom_point(color="green")+
labs(x = "Credible", y = "")
but it generates just the second plot (the dot).
Thank you
Given the careful and reproducible way you wrote your question, I won't just point you to the old answer, since it may be harder to transfer its subsetting etc. to your case.
You initialize a new ggplot object whenever you run ggplot(...).
If you want to add a layer on top of an existing plot you have to operate on the same object, something like this:
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "") +
geom_point(data=zero, color="green", aes(x=x0, y=y0, group=1))
Note how, in the second geom_point, the data source and aesthetics are specified explicitly so that they are not inherited from the initial ggplot() call.
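If you want to make that independence explicit, one option is inherit.aes = FALSE, as in the sketch below. It reuses df and zero from the question; the object name p is just for illustration.

```r
library(ggplot2)

df <- data.frame(Credible = c(0.2, 0.3), len = c(0, 0))
zero <- data.frame(x0 = 0, y0 = 0)

p <- ggplot(df, aes(x = Credible, y = len, group = 1)) +
  geom_line(color = "red") +
  geom_point() +
  # This layer brings its own data and mappings; inherit.aes = FALSE
  # tells ggplot2 not to merge in the aesthetics from ggplot().
  geom_point(data = zero, aes(x = x0, y = y0), color = "green",
             inherit.aes = FALSE) +
  labs(x = "Credible", y = "")

# All three geoms live on the same plot object, one entry per layer.
length(p$layers)
```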

How to add sample size used in plotting geom_jitter

I want to add how many samples were added to a graph, next to my stat_cor (ggpubr) text.
I'm using the following code to generate the graph:
dataset = mtcars
ggplot(dataset, aes(dataset$wt, dataset$disp)) +
geom_jitter() +
geom_smooth(level=0.95, method = "loess") +
stat_cor(method="spearman") +
theme_classic()
But, if I want to plot multiple graphs in one figure, which uses a real data set where different variables have different missing values, it would be nice to have my sample size used to plot the geom_jitter.
It's a little hacky (and limited in its options), but you can use the label.sep argument to insert the sample size between the correlation coefficient and the p-value. Note that somewhat older versions of ggpubr have a bug with label.sep; if this doesn't work for you, try updating the package.
ggplot(mtcars, aes(wt, disp)) +
geom_jitter() +
geom_smooth(level = 0.95, method = "loess") +
stat_cor(method = "spearman", label.sep = sprintf(", n = %s, ", nrow(mtcars))) +
theme_classic()
If your concern is missing values, you might need to use a different function than nrow, but I'll leave that to you. This also will not work with facets (you'll get the same number in each facet).
For a fully flexible solution, I think you could use a geom_text, or maybe a stat_summary with geom = "text" would be possible?
Or go hardcore like this answer, if nothing else works
Just for completeness on missing values:
ggplot(mtcars, aes(wt, disp)) +
geom_jitter() +
geom_smooth(level = 0.95, method = "loess") +
stat_cor(method = "spearman", label.sep =
sprintf(", n = %s, ",
sum(complete.cases(mtcars[c("wt","disp")]))
)) +
theme_classic()
This reports n as the number of complete cases of wt and disp, i.e. the rows that are actually plotted.
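For the faceted case, the geom_text idea mentioned above could look roughly like this. It is only a sketch: faceting by cyl is an arbitrary choice for illustration, the names complete and n_per_cyl are made up, and stat_cor is left out so that the example needs only ggplot2.

```r
library(ggplot2)

# Keep only rows where both plotted variables are present.
complete <- mtcars[complete.cases(mtcars[c("wt", "disp")]), ]

# One row per facet: the number of points actually drawn in it.
n_per_cyl <- aggregate(wt ~ cyl, data = complete, FUN = length)
names(n_per_cyl)[2] <- "n"
n_per_cyl$label <- paste("n =", n_per_cyl$n)

p <- ggplot(complete, aes(wt, disp)) +
  geom_jitter() +
  # -Inf / Inf pin the label to the top-left corner of each panel.
  geom_text(data = n_per_cyl, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(vars(cyl)) +
  theme_classic()
```

Because the label data frame carries the faceting column, each panel gets its own count rather than the overall total.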

How to plot multiple boxplots with numeric x values properly in ggplot2?

I am trying to get a boxplot with 3 different tools in each dataset size like the one below:
ggplot(data1, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
But I need to transform x-axis to log scale. For that, I need to numericize each dataset to be able to transform them to log scale. Even without transforming them, they look like the one below:
ggplot(data2, aes(x = dataset, y = time, color = tool)) + geom_boxplot() +
labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_y_log10() + theme_bw()
I checked boxplot parameters and grouping parameters of aes, but could not resolve my problem. At first, I thought this problem is caused by scaling to log, but removing those elements did not resolve the problem.
What am I missing exactly? Thanks...
Files are in this link. "data2" is the numericized version of "data1".
Your question was a tough cookie, but I learned something new from it!
Just using group = dataset is not sufficient because you also have the tool variable to look out for. After digging around a bit, I found this post which made use of the interaction() function.
This is the trick that was missing. You want to use group because you are not using a factor for the x values, but you need to include tool in the separation of your data (hence using interaction() which will compute the possible crosses between the 2 variables).
# This is for pretty-printing the axis labels
my_labs <- function(x){
paste0(x/1000, "k")
}
levs <- unique(data2$dataset)
ggplot(data2, aes(x = dataset, y = time, color = tool,
group = interaction(dataset, tool))) +
geom_boxplot() + labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
scale_x_log10(breaks = levs, labels = my_labs) + # define a log scale with your axis ticks
scale_y_log10() + theme_bw()
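To see why interaction() is needed, it can help to look at the groups it creates. The toy data frame below is an invented stand-in for data2 (the real files are not reproduced here), using base R only.

```r
# Toy stand-in for data2 (values invented for illustration).
toy <- data.frame(
  dataset = c(1000, 1000, 5000, 5000),
  tool    = c("a", "b", "a", "b")
)

# Grouping by dataset alone gives one group per x position...
groups_x <- factor(toy$dataset)
# ...while interaction() gives one group per dataset/tool pair.
groups_xt <- interaction(toy$dataset, toy$tool)

nlevels(groups_x)   # 2
nlevels(groups_xt)  # 4
levels(groups_xt)   # "1000.a" "5000.a" "1000.b" "5000.b"
```

Each of those four groups becomes its own box, which is what the plot needs once x is numeric.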
This plots one box per tool at each dataset size, with the x-axis on a log scale.

increase font size of subplot titles in R? [duplicate]

I'm wondering how I can manipulate the size of strip text in facetted plots. My question
is similar to a question on plot titles, but I'm specifically concerned with
manipulating not the plot title but the text that appears in facet titles (strip_h).
As an example, consider the mpg dataset.
library(ggplot2)
qplot(hwy, cty, data = mpg) + facet_grid( . ~ manufacturer)
The resulting output produces some facet titles that don't fit in the strip.
I'm thinking there must be a way to use grid to deal with the strip text. But I'm
still a novice and wasn't sure from the grid appendix in Hadley's book how,
precisely, to do it.
You can modify strip.text.x (or strip.text.y) using theme_text(), for instance
qplot(hwy, cty, data = mpg) +
facet_grid(. ~ manufacturer) +
opts(strip.text.x = theme_text(size = 8, colour = "red", angle = 90))
Update: for ggplot2 version > 0.9.1
qplot(hwy, cty, data = mpg) +
facet_grid(. ~ manufacturer) +
theme(strip.text.x = element_text(size = 8, colour = "red", angle = 90))
Nowadays opts and theme_text are deprecated, and R suggests using theme and element_text instead. A solution can be found here: http://wiki.stdout.org/rcookbook/Graphs/Facets%20%28ggplot2%29/#modifying-facet-label-text
qplot(hwy, cty, data = mpg) +
facet_grid(. ~ manufacturer) +
theme(strip.text.x = element_text(size = 8, colour = "red", angle = 90))
I guess in the mpg example changing the rotation angle and font size is fine, but in many cases you might find yourself with variables that have quite lengthy labels, and it can become a pain in the neck (literally) to try to read lengthy rotated labels.
So in addition (or complement) to changing angles and sizes, I usually reformat the labels of the factors that define the facet_grid whenever they can be split in a way that makes sense.
Typically if I have a dataset$variable with strings that looks like
c("median_something", "aggregated_average_x","error","something_else")
I simply do:
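# Split each label on "_" and rejoin the pieces with `lab` (a line break by default)
reformat <- function(x, lab = "\n") {
sapply(x, function(s) {
paste(unlist(strsplit(as.character(s), split = "_")), collapse = lab)
}, USE.NAMES = FALSE)
}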
[perhaps there are better definitions of reformat but at least this one works fine.]
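# labels must have one entry per level, so reformat the levels rather than every value
dataset$variable <- factor(dataset$variable,
labels = reformat(levels(factor(dataset$variable)), lab = "\n"))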
And upon facetting, all labels will be very readable:
ggplot(data=dataset, aes(x,y)) + geom_point() + facet_grid(. ~ variable)
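As a quick check of what the relabelling produces, here is a base R sketch on the example strings from above. It uses gsub() as a shorter stand-in for reformat(); the variable names are invented for this example.

```r
labels_raw <- c("median_something", "aggregated_average_x",
                "error", "something_else")

# Same idea as reformat(): replace each underscore with a line break.
labels_wrapped <- gsub("_", "\n", labels_raw, fixed = TRUE)

# cat() renders the breaks, showing how the strip text will stack.
cat(labels_wrapped[2], "\n")
```

The second label now spans three short lines (aggregated / average / x), so it fits a narrow strip without rotation.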

Showing percentage in bars in ggplot

I have a dataset with binary variables like the one below.
M4 = matrix(sample(1:2,20*5, replace=TRUE),20,5)
M4 <- as.data.frame(M4)
M4$id <- 1:20
I have produced a stacked bar plot using the code below
library(reshape)
library(ggplot2)
library(scales)
M5 <- melt(M4, id="id")
M5$value <- as.factor(M5$value)
ggplot(M5, aes(x = variable)) + geom_bar(aes(fill = value), position = 'fill') +
scale_y_continuous(labels = percent_format())
Now I want the percentage for each field in each bar to be displayed in the graph, so that each bar reaches 100%. I have tried 1, 2, 3 and several similar questions, but I can't find any example that fits my situation. How can I manage this task?
Try this method:
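test <- ggplot(M5, aes(x = variable, fill = value)) +
geom_bar(position = "fill") + # position is a geom argument, not an aesthetic
scale_y_continuous(labels = percent_format()) +
# stat = "count" is used here because stat_bin() rejects a discrete x in current ggplot2
geom_text(aes(label = after_stat(paste("n =", count))), stat = "count",
position = position_fill(vjust = 0.5))
test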
EDITED: to give percentages and using the scales package:
require(scales)
test <- ggplot(M5, aes(x = variable, fill = value)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent_format()) +
# divide each count by the total of its own bar, so every bar sums to 100%
geom_text(aes(label = after_stat(scales::percent(count / ave(count, x, FUN = sum)))),
stat = "count", position = position_fill(vjust = 0.5))
test
You could use the sjp.stackfrq function from the sjPlot-package (see examples here).
M4 = matrix(sample(1:2,20*5, replace=TRUE),20,5)
M4 <- as.data.frame(M4)
sjp.stackfrq(M4)
# alternative colors: sjp.stackfrq(M4, barColor = c("aquamarine4", "brown3"))
Plot appearance can be customized with various parameters...
I really like the usage of the implicit information that is created by ggplot itself, as described in this post:
using the ggplot_build() function
From my point of view this provides a lot of opportunities to finally control the appearance of a ggplot chart.
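As a rough sketch of what that looks like in practice, the example below builds a filled bar chart from mtcars (chosen only so the example is self-contained) and reads back the values ggplot2 computed for it. The names p and built are invented for this example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill")

# ggplot_build() runs the stats and positions; $data holds one data
# frame per layer with the values that will actually be drawn.
built <- ggplot_build(p)$data[[1]]
built[, c("x", "fill", "count", "ymin", "ymax")]

# After position = "fill", each segment's height is its within-bar share.
built$share <- built$ymax - built$ymin
```

Those shares could then be formatted with scales::percent() and drawn as labels, for example via annotate() or a geom_text() layer with its own data.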
Hope this helps somehow
Tom
