geom_mosaic and aes magic - r

I use the ggmosaic library for mosaic plots. Its documentation provides the following example:
ggplot(data = fly) +
  geom_mosaic(aes(x = product(DoYouRecline, RudeToRecline), fill = DoYouRecline))
So DoYouRecline and RudeToRecline are the x and y variables in the plot, but for some reason they are combined with the product function, which returns a list of names. How does geom_mosaic internally use product(DoYouRecline, RudeToRecline) to figure out that the two parameters need to be used for the x and y axes?
The main problem that I have is that I need to have one of those attribute names as a variable, for instance something like this:
attr_name = 'foo'
ggplot(data = fly) +
  geom_mosaic(aes(x = product(DoYouRecline, attr_name), fill = DoYouRecline))
Thanks!

Related

Modify labels in facet_grid on existing ggplot2 object

Suppose we have the following dataset:
d = data.frame(
  y = rnorm(100),
  x = rnorm(100),
  f1 = sample(c("A", "B"), size = 100, replace = TRUE)
)
And I want to plot the data using facets:
library(ggplot2)
plot = ggplot(d, aes(x, y)) +
  facet_grid(~f1, labeller = labeller(.cols = label_both))
Now let's suppose I want to capitalize all columns. It's trivial to do so with the x/y variables:
plot + labs(x="X", y="Y")
But how do I go about capitalizing the facet labels?
The obvious solutions are:
1) Just change the name of the variable (e.g., d$F1 = d$f1), then rerun the code.
2) Create a custom labeller that capitalizes the variable names.
However, I cannot do either of these in my current application. I cannot change the original ggplot object; I can only layer (e.g., as I do with the x/y axis labels) or I can modify the ggplot object directly.
So, is there a way to change the facet labels by either modifying the ggplot object directly or layering it?
Fortunately, I was able to solve my own problem by creating my MWE. And, rather than keep that knowledge to myself, I figured I'd share it with others (or future me if I forget how to do this).
ggplot objects can be easily dissected using str().
In this case, the ggplot object (plot) can be dissected:
str(plot)
Which lists many objects, including one called facet, which can be further dissected:
str(plot$facet)
After some trial and error, I found an object called plot$facet$params$cols. Now, using the following code:
names(plot$facet$params$cols) = "F1"
I get the desired result.
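Put together, the steps above can be sketched as follows. This reuses the data frame from the question; the object is called p here (an assumption, to avoid shadowing base R's plot function), and a geom_point() layer is added so the plot can be drawn. Note that facet$params$cols is an internal field of the ggplot object, so this may behave differently across ggplot2 versions.

```r
library(ggplot2)

# Same toy data as in the question
d <- data.frame(
  y = rnorm(100),
  x = rnorm(100),
  f1 = sample(c("A", "B"), size = 100, replace = TRUE)
)

# Existing plot object that we are not allowed to rebuild
p <- ggplot(d, aes(x, y)) +
  geom_point() +
  facet_grid(~f1, labeller = labeller(.cols = label_both))

# Inspect the name that label_both currently uses for the column facet
names(p$facet$params$cols)

# Rename it in place; only the label changes, the faceting variable stays f1
names(p$facet$params$cols) <- "F1"

p
```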

ggplot function like points

Is there any way to add points to a ggplot graph, as with the points() function in base graphics? I don't often use ggplot and always prefer base graphics, but this time I have to deal with it. With + geom_point(x = c(1,2,3), y = c(1,2,3)) there is an error:
Error: Aesthetics must be either length 1 or the same as the data (33049): x, y
I'm not quite sure what you're looking for, but you can use the data= argument to geom_point() to override the default behaviour (which is to inherit data from the original ggplot call). As @dc37 points out, x and y need to be specified within a data frame, but you can do this on the fly. You might also need to specify the mapping, if the original x and y variables aren't called x and y ...
+ geom_point(data= data.frame(x = c(1,2,3), y = c(1,2,3)),
mapping = aes(x=x, y=y))
Alternatively (and maybe better):
+ annotate( geom="point", x = 1:3, y = 1:3)
From ?annotate:
This function adds geoms to a plot, but unlike [a typical] geom
function, the properties of the geoms are not mapped from
variables of a data frame, but are instead passed in as vectors.
This is useful for adding small annotations (such as text labels)
or if you have your data in vectors, and for some reason don't
want to put them in a data frame.
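As a self-contained sketch of both approaches, the example below adds three extra points to a scatter plot of mtcars. The base plot, the extra data frame and the point coordinates are made up for illustration and are not from the original question.

```r
library(ggplot2)

# Base plot; its data and mapping are inherited by layers added later
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Approach 1: give the extra layer its own data frame.
# The columns are named wt and mpg so the inherited mapping still applies.
extra <- data.frame(wt = c(2, 3, 4), mpg = c(15, 20, 25))
p1 <- p + geom_point(data = extra, colour = "red", size = 3)

# Approach 2: pass plain vectors through annotate(), no data frame needed
p2 <- p + annotate("point", x = c(2, 3, 4), y = c(15, 20, 25),
                   colour = "red", size = 3)

p1
p2
```

Both plots should look the same; annotate() is usually the closer analogue of base graphics' points() when the coordinates are just loose vectors.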

How do I loop a ggplot2 function to export and save about 40 plots?

I am trying to loop a ggplot2 plot with a linear regression line over it. It works when I type the y column name manually, but the loop method I am trying does not work. It is definitely not a dataset issue.
I've tried many solutions from various websites on how to loop a ggplot and the one I've attempted is the simplest I could find that almost does the job.
The code that works is the following:
plots <- ggplot(Everything.any, mapping = aes(x = stock_VWRETD, y = stock_10065)) +
  geom_point() +
  labs(x = 'Market Returns', y = 'Stock Returns', title = 'Stock vs Market Returns') +
  geom_smooth(method = 'lm', formula = y ~ x)
But I do not want to do this another 40 times (and then 5 times more for other reasons). The code that I've found on-line and have tried to modify it for my means is the following:
plotRegression <- function(z, na.rm = TRUE, ...) {
  nm <- colnames(z)
  for (i in seq_along(nm)) {
    plots <- ggplot(z, mapping = aes(x = stock_VWRETD, y = nm[i])) +
      geom_point() +
      labs(x = 'Market Returns', y = 'Stock Returns', title = 'Stock vs Market Returns') +
      geom_smooth(method = 'lm', formula = y ~ x)
    ggsave(plots, filename = paste("regression1", nm[i], ".png", sep = " "))
  }
}
plotRegression(Everything.any)
I expected a Stock Returns vs Market Returns graph. Instead, the y-axis shows a single value, which is the name of the respective column, and the market returns are plotted as normal along the x-axis but all on one horizontal line at that single y value. Please let me know what I am doing wrong.
Desired Plot:
Actual Plot:
Sample Data is available on Google Drive here:
https://drive.google.com/open?id=1Xa1RQQaDm0pGSf3Y-h5ZR0uTWE-NqHtt
The problem is that when you assign variables to aesthetics in aes, you mix bare names and strings. In this example, both X and Y are supposed to be variables in z:
aes(x = stock_VWRETD, y = nm[i])
You refer to stock_VWRETD using a bare name (as required with aes), however for y=, you provide the name as a character vector produced by colnames. See what happens when we replicate this with the iris dataset:
ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()
Since aes expects variable names to be given as bare names, it doesn't interpret 'Sepal.Length' as a variable in iris but as a separate vector (consisting of a single character value) which holds the y-values for each point.
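To see the difference concretely, here is a small sketch that compares the y values ggplot2 actually computes for a quoted name and for a bare name. It uses layer_data() to pull out the data of the first layer; the object names p_string and p_bare are made up for illustration.

```r
library(ggplot2)

# Quoted name: treated as a constant string, not as a column of iris
p_string <- ggplot(iris, aes(Petal.Length, "Sepal.Length")) + geom_point()

# Bare name: looked up as a column of iris
p_bare <- ggplot(iris, aes(Petal.Length, Sepal.Length)) + geom_point()

# Number of distinct y positions in each plot
length(unique(layer_data(p_string)$y))  # every point shares one y position
length(unique(layer_data(p_bare)$y))    # many distinct y values
```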
What can you do? Here are 2 options that both give the proper plot:
1) Use aes_string and change both variable names to character (note that aes_string is deprecated in recent ggplot2 releases, so option 2 is the safer choice for new code):
ggplot(iris, aes_string('Petal.Length', 'Sepal.Length')) + geom_point()
2) Use the .data pronoun with [[ ]] subsetting to look up the appropriate variable by name:
ggplot(iris, aes(Petal.Length, .data[['Sepal.Length']])) + geom_point()
You need to use aes_string instead of aes, with double quotes around your x variable; then you can use your loop variable i directly for y. You can also simplify the for loop by iterating over the column names themselves. Here is an example using iris.
library(ggplot2)
plotRegression <- function(z, na.rm = TRUE, ...) {
  nm <- colnames(z)
  for (i in nm) {
    plots <- ggplot(z, mapping = aes_string(x = "Sepal.Length", y = i)) +
      geom_point() +
      geom_smooth(method = 'lm', formula = y ~ x)
    ggsave(plots, filename = paste("regression1_", i, ".png", sep = ""))
  }
}
myiris <- iris
plotRegression(myiris)
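For newer ggplot2 versions, where aes_string is deprecated, the same loop can be sketched with the .data pronoun instead. The function name plot_regressions, its x_col and out_dir parameters, and the choice to skip non-numeric columns and to write into a temporary directory are all assumptions made for this illustration, not part of the original answers.

```r
library(ggplot2)

# Save one regression plot per numeric column of z (except x_col).
# Returns the paths of the written files invisibly.
plot_regressions <- function(z, x_col, out_dir = tempdir()) {
  # Candidate y columns: everything except x, restricted to numeric columns
  y_cols <- setdiff(colnames(z), x_col)
  y_cols <- y_cols[vapply(z[y_cols], is.numeric, logical(1))]

  files <- character(0)
  for (y_col in y_cols) {
    # .data[[name]] looks a column up by its name given as a string
    p <- ggplot(z, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(x = x_col, y = y_col)

    file <- file.path(out_dir, paste0("regression_", y_col, ".png"))
    ggsave(filename = file, plot = p, width = 6, height = 4)
    files <- c(files, file)
  }
  invisible(files)
}

saved <- plot_regressions(iris, "Sepal.Length")
saved
```

For the data in the question, the call would presumably be plot_regressions(Everything.any, "stock_VWRETD"), with out_dir set to wherever the PNG files should go.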

Labeling points using qplot in R

I'm having trouble labeling points in R. I've created a qplot that uses four numeric variables, which I'm plotting as the x and y axes, the color of the points and the size of the points. When I try to add the labels by just including label = player (where player is the column name with the labels I want), R says: "Error: object 'Player' not found." Maybe because this is the only text column? This is probably really simple, but this is my first plot, so...
qplot(cars$dist, cars$speed) + geom_text(label = cars$dist)
You can append normal ggplot syntax to qplot() exactly the same way you would when calling ggplot().
You need to specify the source of the data you are feeding: you can do so by passing the data frame to the data argument of a geom and then referencing a specific column (for example Player) by its bare, unquoted name in the aes() call within the same geom:
geom_point(data = data, aes(x = col1, y = col2))
or you can attach() the data, and then specify the column without the data= parameter:
geom_point(aes(x = col1, y = col2))
Thank you to Marius for pointing out that referencing data through the data parameter may be preferable to $ (data$col) in certain situations, such as faceting.
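As a minimal sketch of the data-argument approach applied to labels, the example below builds a small data frame from mtcars with a text column called player and maps it to label inside aes(). The data frame, its column names and the choice of ggplot() rather than qplot() are assumptions made for illustration; the asker's real data is not shown in the question.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: four numeric columns
# plus one text column holding the labels
df <- data.frame(
  player = rownames(mtcars),
  wt     = mtcars$wt,
  mpg    = mtcars$mpg,
  hp     = mtcars$hp,
  qsec   = mtcars$qsec
)

# The label column is referenced by its bare name inside aes(),
# so it is looked up in df rather than in the global environment
p <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = hp, size = qsec)) +
  geom_text(aes(label = player), size = 3, vjust = -1)

p
```

Column names are case-sensitive, so label = Player would fail here with an "object not found" error because the column is called player.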

Automatically detect correct number of colorRampPalette values needed

Related to this question.
If I create a gradient using colorRampPalette, is there a way to have ggplot2 automatically detect the number of colours it will need from this gradient?
In the example below, I have to specify 3 colours will be needed for the 3 cyl values. This requires me knowing ahead of time that I'll need this many. I'd like to not have to specify it and have ggplot detect the number it will need automatically.
myColRamp <- colorRampPalette(c('#a0e2f2', '#27bce1'))
ggplot(mtcars, aes(x = wt, y = mpg, col = as.factor(cyl))) +
geom_point(size = 3) +
scale_colour_manual(values = myColRamp(3)) # How to avoid having to specify 3?
I'm also open to options that don't use colorRampPalette but achieve the same functionality.
I see two options here: one requires a little customisation; the other has more code but requires no customisation.
Option 1 - Determine number of unique factors from your specific variable
Simply use the length and unique functions to work out how many factors are in cyl.
values = myColRamp(length(unique(mtcars$cyl)))
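In context, Option 1 could look like the sketch below, which reuses the plot from the question. The intermediate variable n_cyl is introduced here only for readability.

```r
library(ggplot2)

myColRamp <- colorRampPalette(c("#a0e2f2", "#27bce1"))

# Count the distinct cyl values instead of hard-coding the number
n_cyl <- length(unique(mtcars$cyl))

p <- ggplot(mtcars, aes(x = wt, y = mpg, col = as.factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_manual(values = myColRamp(n_cyl))

p
```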
Option 2 - Build the plot, and see how many colours it used
If you don't want to specify the name of the variable, and want something more general, we can build the plot, and see how many colours ggplot used, then build it again.
To do this, we also have to save our plot as an object, let's call that plot object p.
p <- ggplot(mtcars, aes(x = wt, y = mpg, col = as.factor(cyl))) +
  geom_point(size = 3)
# Notice I haven't set the colour option this time
# This builds the plot and saves the data based on the plot, so x data is
# called 'x', y is called 'y', and importantly in this case, colour is
# called the generic 'colour'.
p_built <- ggplot_build(p)
# Now we can fish out that data and check how many colour levels were used
num_colours <- length(unique(p_built$data[[1]]$colour))
# Now we know how many colours were used, we can add the colour scale to our plot
p <- p + scale_colour_manual(values = myColRamp(num_colours))
Now either just call p or print(p) depending on your use to view it.

Resources