ggpairs() correlation values without gridlines - r

I have this code that generates the plot below.
library(ggplot2)
library(GGally)
data(iris)
ggpairs(data = iris[, 1:4], axisLabels = "none", switch = "both")
I'd like to do three things with this plot: 1) remove the gridlines in the correlation windows; 2) increase the font size of the x-y axis labels; and 3) make the label backgrounds white (instead of gray). The first question was addressed about 4 years ago here and here, and it seems one would need to either rebuild the GGally package or use custom code from GitHub. Both options are pretty heavy for a newbie like me, and I am wondering if someone has figured out an easier method by now. I have not found my 2nd and 3rd questions addressed anywhere.
Thanks.

The first request can be handled by adding this to the ggpairs() call:
+ theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())
The size of the axis labels (which are really in "strips") can be handled with this additional argument to theme:
... , strip.text = element_text(size = 5))
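Putting the pieces together, a minimal sketch of the full call might look like the following. The strip.background line is my own addition for the third request (white label backgrounds) and the font size of 12 is an arbitrary choice. Note that this removes the gridlines from every panel, not only the correlation panels.

```r
library(ggplot2)
library(GGally)

# Same plot as in the question, with theme tweaks added on top
p <- ggpairs(data = iris[, 1:4], axisLabels = "none", switch = "both") +
  theme(
    panel.grid.major = element_blank(),                 # 1) no major gridlines
    panel.grid.minor = element_blank(),                 #    no minor gridlines
    strip.text = element_text(size = 12),               # 2) larger labels (size is arbitrary)
    strip.background = element_rect(fill = "white")     # 3) white label background (assumed fix)
  )
p
```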

Related

ggplot theme argument for axis arrows suddenly not working

I have a custom ggplot theme that I have been using for a while now. This week I started getting an error when making plots with this theme, stating that plot.new has not been called yet.
I've traced the error to the axis.line argument, which I've used before to turn my axis lines into arrows. I tried to reproduce the error below, but I'm confused about why this is happening now. Does anyone know why this could be happening, or can anyone suggest an alternative way of changing axis lines to arrows?
Eg:
library(ggplot2)
dat <- mtcars
ggplot(data = dat) +
geom_point(aes(x = wt, y = mpg)) +
theme(axis.line = element_line(color = "black", arrow = arrow(length = unit(0.3, "lines"), type = "closed")))
EDIT:
Loading the graphics library (library(graphics)) fixed the issue temporarily. However, knitting the markdown file produces the same error as above. It seems my example does not reproduce the error, but the error persists on my computer even after re-installing RStudio and updating all packages.

Why aren't any points showing up in the qqcomp function when using plotstyle="ggplot"?

I want to compare the fit of different distributions to my data in a single plot. The qqcomp function from the fitdistrplus package pretty much does exactly what I want to do. The only problem I have however, is that it's mostly written using base R plot and all my other plots are written in ggplot2. I basically just want to customize the qqcomp plots to look like they have been made in ggplot2.
From the documentation (https://www.rdocumentation.org/packages/fitdistrplus/versions/1.0-14/topics/graphcomp) I get that this is totally possible by setting plotstyle="ggplot". If I do this however, no points are showing up on the plot, even though it worked perfectly without the plotstyle argument. Here is a little example to visualize my problem:
library(fitdistrplus)
library(ggplot2)
set.seed(42)
vec <- rgamma(100, shape=2)
fit.norm <- fitdist(vec, "norm")
fit.gamma <- fitdist(vec, "gamma")
fit.weibull <- fitdist(vec, "weibull")
model.list <- list(fit.norm, fit.gamma, fit.weibull)
qqcomp(model.list)
This gives the following output:
While this:
qqcomp(model.list, plotstyle="ggplot")
gives the following output:
Why are the points not showing up? Am I doing something wrong here or is this a bug?
EDIT:
So I haven't figured out why this doesn't work, but there is a pretty easy workaround. The function call qqcomp(model.list, plotstyle="ggplot") still returns a ggplot object, which includes the data used to make the plot. Using that data, one can easily write one's own plot function that does exactly what is needed. It's not very elegant, but until someone finds out why it's not working as expected I will just use this method.
I was able to reproduce your error and indeed, it's really intriguing. Maybe you should contact the developers of this package to report this bug.
Otherwise, you can reproduce this Q-Q plot with ggplot and stat_qq by passing the corresponding distribution function and its fitted parameters (stored in $estimate):
library(ggplot2)
df <- data.frame(vec)
# One stat_qq layer per fitted distribution, using the estimated parameters
ggplot(df, aes(sample = vec)) +
  stat_qq(distribution = qgamma, dparams = as.list(fit.gamma$estimate), color = "green") +
  stat_qq(distribution = qnorm, dparams = as.list(fit.norm$estimate), color = "red") +
  stat_qq(distribution = qweibull, dparams = as.list(fit.weibull$estimate), color = "blue") +
  geom_abline(slope = 1, color = "black") +
  labs(title = "Q-Q Plots", x = "Theoretical quantiles", y = "Empirical quantiles")
Hope it will help you.
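If you would rather build the plotting data yourself, the same quantile pairs can be computed in base R. This is only a sketch: est uses made-up stand-in values (shape = 2, rate = 1) where you would normally pass as.list(fit.gamma$estimate), and qq is just an illustrative name.

```r
set.seed(42)
vec <- rgamma(100, shape = 2)

# Stand-in for as.list(fit.gamma$estimate); these values are assumed, not fitted
est <- list(shape = 2, rate = 1)

# Plotting positions, then theoretical quantiles of the gamma distribution
probs <- ppoints(length(vec))
qq <- data.frame(
  theoretical = do.call(qgamma, c(list(p = probs), est)),
  sample = sort(vec)  # empirical quantiles are the sorted observations
)
head(qq)
```

The resulting data frame can then be passed to ggplot() with geom_point(aes(x = theoretical, y = sample)), one data frame per fitted distribution.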

R: Draw arrows in ggplot2 based on loop

I have a dataset as follows:
i <- data.scores
i
NMDS1 NMDS2
Plot_1_O -0.1716069847 -1.177471624
Plot_2_O -0.2452065424 -0.978276228
Plot_3_O 0.3885298355 -0.578810975
... ... ...
Plot_64_O 0.7976712787 -0.187241724
Plot_1_N -0.4044221768 -0.239157686
Plot_2_N 0.2539782304 0.197509348
Plot_3_N 0.3163483600 -0.130876763
... ... ...
Plot_64_N 0.6346501475 0.265873211
As you (may or may not) see, it's vegetation plot data from 64 different plots, taken at two points in time (hence the "O"/"N" for "Old" and "New"). I've run an NMDS via vegan's metaMDS() and got a plot showing my results. I've also calculated a fit via envfit() with the relevant environmental data. The finished plot is fine, but I wanted to add arrows between the pairs of old and new survey data. I used a loop to do so:
for (j in 1:64){
k <- j+64
arrows(data.scores$NMDS1[j], data.scores$NMDS2[j],
data.scores$NMDS1[k], data.scores$NMDS2[k], length = 0.1, lwd=2)
}
The resulting plot looks like this: NMDS-Plot
The black symbols show the old plots, the red symbols the new ones respectively.
For better aesthetics I re-drew the plot in ggplot2, and everything worked perfectly fine until I had to draw those arrows again. I cannot use a loop within the ggplot2 command, and I do not know how to draw all of those arrows at once. I tried something like this:
geom_segment(data=i, aes(x=i$NMDS1[1:64], xend=i$NMDS1[65:128],
y=i$NMDS2[1:64], yend=i$NMDS2[65:128]),
arrow = arrow(length = unit(0.5, "cm")), colour="red",
inherit.aes=FALSE, lwd=2)
but it does not draw any arrows at all, not even a single one. Removing the column-specification does not help either, and I doubt I would get all the respective arrows at once. Can anybody help?
I found a solution: Based on my last line of code where I tried to draw the arrows geom_segment(data=i, aes(x=i$NMDS1[1:64], xend=i$NMDS1[65:128], y=i$NMDS2[1:64], yend=i$NMDS2[65:128]), arrow = arrow(length = unit(0.5, "cm")), colour="red", inherit.aes=FALSE, lwd=2) I saved the plot as an object called "Plot_Final" and tried this:
for (j in 1:64){ #j=1
k <- j+64
Plot_Final <- Plot_Final + geom_segment(data=i, x=i$NMDS1[j], xend=i$NMDS1[k], y=i$NMDS2[j], yend=i$NMDS2[k], arrow = arrow(length = unit(0.3, "cm")), colour="black", inherit.aes=FALSE, lwd=0.1)
}
By removing the aes()-argument I finally got my arrows pointing out the plot-pairs within my ggplot2-plot. Thank you anyway!
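For what it's worth, the loop can probably be avoided by building one data frame with a row per old/new pair and mapping its columns inside aes(). The sketch below uses random stand-in data (scores) with the same layout as data.scores, i.e. rows 1 to 64 are the old plots and rows 65 to 128 the new ones; pairs and its column names are made up for illustration.

```r
library(ggplot2)

# Stand-in for data.scores: first n rows "old", last n rows "new"
set.seed(1)
n <- 64
scores <- data.frame(NMDS1 = rnorm(2 * n), NMDS2 = rnorm(2 * n))

old <- scores[seq_len(n), ]
new <- scores[n + seq_len(n), ]

# One row per arrow: start at the old position, end at the new one
pairs <- data.frame(
  x = old$NMDS1, y = old$NMDS2,
  xend = new$NMDS1, yend = new$NMDS2
)

p <- ggplot(scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point() +
  geom_segment(
    data = pairs,
    aes(x = x, y = y, xend = xend, yend = yend),
    arrow = arrow(length = unit(0.3, "cm")),
    inherit.aes = FALSE
  )
p
```

This adds a single layer with 64 segments instead of 64 separate layers, which keeps the plot object small.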

Automatically adjust plot title width using ggplot

I am fairly new to R/ggplot2 and still learning on the go. Hopefully I am not missing something obvious!
I am trying to create several different plots using ggplot2 and lay them out with the function plot_grid from the cowplot package, so that the plots are shown side by side with plot numbering and captions. The problem is that if the generated plots are displayed in a small window, or if I have many plots beside one another, the titles of neighbouring plots sometimes overlap. To solve this problem I tried to automatically insert line breaks in my overly long titles using code I found in another thread, since I wanted the text size of the titles to stay constant.
Using the following code I can easily insert the necessary line breaks to give my title a specific width, but the problem is that I always need to enter a numeric value for the width. Depending on the number of plots I am inserting, this value would of course change. I could go through my code and manually set the width for each set of plots until it is correct, but I was hoping to automate this process so that the title width is adjusted automatically to match the width of the x-axis. Is there any way to implement this in R?
#automatically line break and add titles
myplot_theme1 = function (plot, x.title = NULL, y.title = NULL, plot.title = NULL) {
plot +
labs(title = paste(strwrap(plot.title, width = 50), collapse = "\n"),
x = x.title,
y = y.title)
}
# generate an example plot
data_plot <- data.frame(x = rnorm(1000), y = rnorm (1000))
plot1 <- ggplot(data_plot, aes(x = x, y = y)) + geom_point()
title <- "This is a title that is very long and does not display nicely"
myplot_theme1(plot1, plot.title = title)
My test plot
I have tried searching but I haven't found any solutions that seem to address what I am looking for. The only solution I did find that looked promising was based on the package gridDebug. This package doesn't seem to be supported by my operating system anymore though (macOS Sierra Version 10.12.6), since when I try to install it I get the following error message:
Warning in install.packages: dependencies ‘graph’, ‘Rgraphviz’ are not available
And on the CRAN package documentation it states that the package is not even available for macOS El Capitan which was my previous operating system. If someone knows what is causing this issue so that I could try the solution from the above thread that would of course be great as well.
One idea (but perhaps not an ideal solution) is to adjust the size of text based on the number of characters in the title. You can adjust ggplot properties using theme and in this case you want to adjust plot.title (the theme property, not your variable). plot.title has elements size and horizontal justification hjust, the latter is in range [0,1].
# generate an example plot
data_plot <- data.frame(x = rnorm(1000), y = rnorm (1000))
plot1 <- ggplot(data_plot, aes(x = x, y = y)) + geom_point()
title1 <- "This is a title that is very long and does not display nicely"
title2 <- "I'm an even longer sentence just test me out and see if I display the way you want or you'll be sorry"
myplot_theme1 = function (plot, x.title = NULL, y.title = NULL, plot.title = NULL) {
plot +
labs(title = plot.title,
x = x.title,
y = y.title) +
theme(plot.title = element_text(size=800/nchar(plot.title), hjust=0.5)) # 800 is arbitrarily chosen
}
myplot_theme1(plot1, plot.title = title1)
myplot_theme1(plot1, plot.title = title2)
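If the text size has to stay constant, another rough option is to derive the wrap width from the number of plots placed side by side. This is only a heuristic sketch: wrap_title and chars_full_width are made-up names, and the assumption that a full-width plot fits about 100 characters of title would need tuning for your device and font size.

```r
# Wrap a title to a width that shrinks as more plots share one row.
# chars_full_width is an assumed, hand-tuned number of characters that
# fit across a single full-width plot.
wrap_title <- function(title, n_plots, chars_full_width = 100) {
  width <- max(10, floor(chars_full_width / n_plots))
  paste(strwrap(title, width = width), collapse = "\n")
}

title <- "This is a title that is very long and does not display nicely"
wrapped <- wrap_title(title, n_plots = 4)
cat(wrapped, "\n")
```

The result can be passed to labs(title = ...) in place of the fixed width = 50 call in the question.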

Bad idea? ggplotting an S3 class object

Many R objects have S3 methods to plot associated with them. For instance, every R regression tutorial contains something like this:
dat <- data.frame(x=runif(10))
dat$y <- dat$x+runif(10)
my.lm <- lm( y~x, dat )
plot(my.lm)
Which displays regression diagnostics.
Similarly, I have an S3 object for a package which consists of a list which basically holds a few time series. I have a plot.myobject method for it which reaches into the list, yanks out the time series, and plots them on the same graph. I would like to rewrite this as a ggplot2 function so that it will be prettier and perhaps more extensible as well.
Because this package is intended to get people without much R experience up and running quickly, I'd like this to be a one-liner with one argument, as in plot(myobject), ggplot(myobject), or whatever the appropriate version might be. Then once they get hooked, they can learn more about ggplot2 and customize the graph to their heart's content.
My initial temptation was to simply replace the internals of the plot.myobject method to use ggplot2. This, however, seems like it might lose me major style points.
Is this a bad idea, and if so why and what alternative should I use?
There is an existing idiom in ggplot2 to do exactly what you propose. It is called fortify. It takes an object and produces a version of the object in a form that ggplot can work with, i.e. a data.frame. Section 9.3 in Hadley's ggplot2 book describes how to do this, using the S3 object class lm as an example. To see this in action, type fortify.lm into your console to get the following code:
function (model, data = model$model, ...)
{
infl <- influence(model, do.coef = FALSE)
data$.hat <- infl$hat
data$.sigma <- infl$sigma
data$.cooksd <- cooks.distance(model, infl)
data$.fitted <- predict(model)
data$.resid <- resid(model)
data$.stdresid <- rstandard(model, infl)
data
}
<environment: namespace:ggplot2>
Here is my own example of writing a fortify method for tree, originally published on the ggplot2 mailing list
fortify.tree <- function(model, data, ...){
require(tree)
# Uses tree:::treeco to extract data frame of plot locations
xy <- tree:::treeco(model)
n <- model$frame$n
# Lines copied from tree:::treepl
x <- xy$x
y <- xy$y
node = as.numeric(row.names(model$frame))
parent <- match((node%/%2), node)
sibling <- match(ifelse(node%%2, node - 1L, node + 1L), node)
linev <- data.frame(x=x, y=y, xend=x, yend=y[parent], n=n)
lineh <- data.frame(x=x[parent], y=y[parent], xend=x,
yend=y[parent], n=n)
rbind(linev[-1,], lineh[-1,])
}
# opts() and theme_blank() were removed from ggplot2; theme() and element_blank() replace them
theme_null <- theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  axis.text.x = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks = element_blank(),
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  legend.position = "none"
)
And the plot code. Notice that the data passed to ggplot is not a data.frame but a tree object.
library(ggplot2)
library(tree)
data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)
p <- ggplot(data=cpus.ltr) +
geom_segment(aes(x=x,y=y,xend=xend,yend=yend,size=n),
colour="blue", alpha=0.5) +
scale_size("n", range = c(0, 3)) +
theme_null
print(p)
As per Hadley's suggestion in comments, I have submitted a generic S3 autoplot() to the ggplot2 Github repository. So if it's accepted and checks out, there should be an autoplot available for this use in the future.
Update
autoplot is now available in ggplot2.
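As a rough illustration of how an autoplot method could look for an object like the one in the question (a list holding a few time series), here is a minimal sketch. The constructor new_myobject, the helper as_long_df, and the internal layout of the object are all hypothetical; only autoplot() itself is the ggplot2 generic.

```r
library(ggplot2)

# Hypothetical constructor: a list of named numeric series with class "myobject"
new_myobject <- function(...) {
  structure(list(series = list(...)), class = "myobject")
}

# Reshape the series into one long data frame that ggplot can use
as_long_df <- function(object) {
  s <- object$series
  do.call(rbind, lapply(names(s), function(nm) {
    data.frame(time = seq_along(s[[nm]]), value = s[[nm]], series = nm)
  }))
}

# S3 method for the autoplot() generic exported by ggplot2
autoplot.myobject <- function(object, ...) {
  ggplot(as_long_df(object), aes(x = time, y = value, colour = series)) +
    geom_line()
}

obj <- new_myobject(a = cumsum(rnorm(20)), b = cumsum(rnorm(20)))
p <- autoplot(obj)  # one-liner for the user; returns a ggplot to customize further
p
```

Because the method returns an ordinary ggplot object, users can keep adding layers, scales, and themes to it once they get more comfortable with ggplot2.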
Using plot.myobject is easy to remember and execute. However, if you're talking about myobjects that already have plot.myobject functions, you have to possibly worry about the different versions in the different namespaces. But if it's just for your own myobjects, you don't lose any style points with me. The nlme package, for one, does this extensively, though with lattice graphs instead of ggplot.
Using ggplot.myobject is an alternative; you shouldn't have to worry about other versions, unless other people start doing the same thing. However, as you note, it does break the ggplot usage paradigm.
Another alternative is to use a new name, say, gsk3plot; you never have to worry about other versions, it's not too hard to remember, and you can make alternatives to plot to your heart's content without having to worry about conflicts. This is probably what I'd choose as it makes it clear to the audience that these plots are customizable and this is a function that makes the plot the way that you prefer, and that if they are so inclined, they could dig in and do the same thing.
ggplot and ggplot2 methods generally expect the data to come to them in melt()-ed (long) form. So your methods may need to do a melt (from package reshape2) and then "map" the resulting column names to arguments in the ggplot methods.
