What does the ".." refer to in ggplot's "fill=..density.."? - r

I am working my way through The R Graphics Cookbook and ran into this set of code:
library(gcookbook)
library(ggplot2)
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  stat_density2d(aes(alpha=..density.., fill=..density..), geom="tile", contour=FALSE)
It runs fine, but I don't understand what the .. before and after density is referring to. I can't seem to find it mentioned in the book either.

Variable names beginning with .. are possible in R, and are treated in the same way as any other variable. Try creating one of your own.
..x.. <- 1:5
ggplot2 often appends extra columns to your data frame in order to draw the plot. (In ggplot2 terminology, this is "fortifying the data".) ggplot2 uses the naming convention ..something.. for these fortified columns.
This is partly because using ..something.. is unlikely to clash with existing variables in your dataset. Take that as a hint that you shouldn't name the columns in your dataset using that pattern.
The stat_density* functions use ..density.. to represent the density of the x variable. Other fortified variable names include ..count...
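As a side note, ggplot2 3.3.0 and later provide after_stat() as the documented way to refer to these computed columns, and the ..something.. notation is deprecated in recent releases. The sketch below is not from the book; it rewrites the plot from the question with after_stat() and uses layer_data() to show that density really is a column computed by the stat. It assumes a reasonably recent ggplot2 and that the MASS package (used internally for the 2D density estimate) is installed.

```r
library(ggplot2)

# Same plot as in the question, with after_stat(density) in place of ..density..
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  stat_density_2d(
    aes(alpha = after_stat(density), fill = after_stat(density)),
    geom = "tile", contour = FALSE
  )

# layer_data() returns the data a layer actually draws; layer 2 is the density layer.
# The computed 'density' column is what ..density.. / after_stat(density) refers to.
computed <- layer_data(p, 2)
head(computed[, c("x", "y", "density")])
```

Printing p draws the plot; inspecting computed is a quick way to see which variables a given stat makes available.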

Related

Creating a multi-panel plot of a data set grouped by two grouping variables in R

I'm trying to solve the following exercise:
Make a scatter plot of the relationship between the variables 'K1' and 'K2' with "faceting" based on the parameters 'diam' and 'na' (subdivide the canvas by these two variables). Finally, assign different colors to the points depending on the 'thickness' of the ring (don't forget to convert it to a factor first). The graph should be similar to this one ("grosor" stands for "thickness"):
Now, the last code I tried with is the following one (the dataset is called "qerat"):
ggplot(qerat, aes(K1,K2, fill=factor(grosor))) + geom_point() + facet_wrap(vars(diam,na))
Could somebody give me a hand pointing out where the mistake is? Many thanks in advance!
Maybe you are looking for a facet_grid() approach. Here is the code using data similar to yours:
library(ggplot2)
#Data
data("diamonds")
#Plot
ggplot(diamonds, aes(x=carat, y=price, color=factor(cut))) +
  geom_point() +
  facet_grid(color~clarity)
Output:
In the case of your code, as no data is provided, I would suggest the following changes:
#Code
ggplot(qerat, aes(K1, K2, color=factor(grosor))) +
  geom_point() +
  facet_grid(diam~na)
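Two things change between the question's code and the suggestion above. First, the default point shape is drawn with the colour aesthetic, so mapping fill has no visible effect unless a fillable shape (21 to 25) is used. Second, facet_wrap(vars(diam, na)) lays the combinations out as one wrapped sequence of panels, whereas facet_grid(diam ~ na) puts diam in rows and na in columns. The sketch below illustrates this with a made-up stand-in for the asker's data; qerat_demo and all of its values are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for 'qerat'; the numbers are invented for illustration only
qerat_demo <- data.frame(
  K1     = rnorm(60, mean = 43, sd = 1),
  K2     = rnorm(60, mean = 44, sd = 1),
  diam   = rep(c(5, 6), each = 30),
  na     = rep(c(1, 2, 3), times = 20),
  grosor = sample(c(150, 250), size = 60, replace = TRUE)
)

# colour (not fill) drives the default point shape; diam in rows, na in columns
p_grid <- ggplot(qerat_demo, aes(K1, K2, color = factor(grosor))) +
  geom_point() +
  facet_grid(diam ~ na, labeller = label_both)

p_grid
```

label_both is optional; it prints the variable name next to each facet value so the rows and columns are easier to read.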

Modify labels in facet_grid on existing ggplot2 object

Suppose we have the following dataset:
d = data.frame(
  y = rnorm(100),
  x = rnorm(100),
  f1 = sample(c("A", "B"), size=100, replace=T)
)
And I want to plot the data using facets:
require(ggplot2)
plot = ggplot(d, aes(x,y)) +
  facet_grid(~f1, labeller = labeller(.cols=label_both))
Now let's suppose I want to capitalize all columns. It's trivial to do so with the x/y variables:
plot + labs(x="X", y="Y")
But how do I go about capitalizing the facet labels?
The obvious solutions are:
Just change the name of the variable (e.g., d$F1 = d$f1) then rerun the code.
Create a custom labeller that capitalizes the variable names
However, I cannot do either of these in my current application. I cannot change the original ggplot object; I can only layer (e.g., as I do with the x/y axis labels) or I can modify the ggplot object directly.
So, is there a way to change the facet labels by either modifying the ggplot object directly or layering it?
Fortunately, I was able to solve my own problem by creating my MWE. And, rather than keep that knowledge to myself, I figured I'd share it with others (or future me if I forget how to do this).
ggplot objects can be easily dissected using str.
In this case, the ggplot object (plot) can be dissected:
str(plot)
Which lists many objects, including one called facet, which can be further dissected:
str(plot$facet)
After some trial and error, I found an object called plot$facet$params$cols. Now, using the following code:
names(plot$facet$params$cols) = "F1"
I get the desired result.
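For reference, here is a self-contained sketch of the steps above. It relies on the internal structure of the ggplot object (plot$facet$params$cols), which is not a documented interface and may differ between ggplot2 versions, so treat it as an illustration rather than a stable recipe. The geom_point() layer and the seed are additions for the example only.

```r
library(ggplot2)

set.seed(42)
d <- data.frame(
  y  = rnorm(100),
  x  = rnorm(100),
  f1 = sample(c("A", "B"), size = 100, replace = TRUE)
)

plot <- ggplot(d, aes(x, y)) +
  geom_point() +  # added here only so the panels show something
  facet_grid(~f1, labeller = labeller(.cols = label_both))

# Look at the facet parameters without printing the whole object
str(plot$facet$params, max.level = 1)

# Rename the column-facet variable on the existing object (internal structure)
names(plot$facet$params$cols) <- "F1"

plot + labs(x = "X", y = "Y")
```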

Apply ggplot2 across columns

I am working with a dataframe with many columns and would like to produce certain plots of the data using ggplot2, namely, boxplots, histograms, density plots. I would like to do this by writing a single function that applies across all attributes (columns), producing one boxplot (or histogram etc) and then storing that as a given element of a list into which all the boxplots will be chained, so I could later index it by number (or by column name) in order to return the plot for a given attribute.
The issue I have is that, if I try to apply across columns with something like apply(df,2,boxPlot), I have to define boxPlot as a function that takes just a vector x. And when I do so, the attribute/column name and index are no longer retained. So e.g. in the code for producing a boxplot, like
bp <- ggplot(df, aes(x=Group, y=Attr, fill=Group)) +
  geom_boxplot() +
  labs(title="Plot of length per dose", x="Group", y=paste(Attr)) +
  theme_classic()
the function has no idea how to extract the info necessary for Attr from just vector x (as this is just the column data and doesn't carry the column name or index).
(Note the x-axis is a factor variable called 'Group', which has 6 levels A,B,C,D,E,F, within X.)
Can anyone help with a good way of automating this procedure? (Ideally it should work for all types of ggplots; the problem here seems to simply be how to refer to the attribute name, within the ggplot function, in a way that can be applied / automatically replicated across the columns.) A for-loop would be acceptable, I guess, but if there's a more efficient/better way to do it in R then I'd prefer that!
Edit: something like what would be achieved by the top answer to this question: apply box plots to multiple variables. Except that in that answer, with his code you would still need a for-loop to change the indices on y=y[2] in the ggplot code and get all the boxplots. He has also used expand.grid to include different x possibilities (I have only one, the Group factor), but it would be easy to simplify down if the looping problem could be handled.
I'd also prefer just base R if possible--dplyr if absolutely necessary.
Here's an example of iterating over all columns of a data frame to produce a list of plots, while retaining the column name in the ggplot axis label:
library(tidyverse)
plots <-
  imap(select(mtcars, -cyl), ~ {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  })
plots$mpg
You can also do this without purrr and dplyr:
to_plot <- setdiff(names(mtcars), 'cyl')
plots <-
  Map(function(.x, .y) {
    ggplot(mtcars, aes(x = cyl, y = .x)) +
      geom_point() +
      ylab(.y)
  }, mtcars[to_plot], to_plot)
plots$mpg
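The reason apply(df, 2, f) loses the column name is that f only ever receives the bare column values. Iterating over the names instead, and looking each column up inside aes() with the .data pronoun, keeps both the name and the mapping. The following is a sketch along those lines using only base R and ggplot2; it uses boxplots against factor(cyl) to mirror the grouped boxplot in the question, with mtcars standing in for the asker's data.

```r
library(ggplot2)

# Every column except the grouping variable
to_plot <- setdiff(names(mtcars), "cyl")

# Iterate over column names; .data[[col]] looks the column up by name inside aes()
plots <- lapply(to_plot, function(col) {
  ggplot(mtcars, aes(x = factor(cyl), y = .data[[col]], fill = factor(cyl))) +
    geom_boxplot() +
    labs(x = "cyl", y = col, fill = "cyl") +
    theme_classic()
})
names(plots) <- to_plot

# Index by name or by position
plots$mpg
plots[[2]]
```

Swapping geom_boxplot() for another geom, or passing the plotting code in as a function argument, would let the same loop produce histograms or density plots.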

Labeling points using qplot in R

I'm having trouble labeling points in R. I've created a qplot that uses four numeric variables I'm plotting as the x and y axes, the color of the points and the size of the points. When I try to add the labels by just including label = player (where player is the column name with the labels I want) R says: "Error: object 'Player' not found." Maybe because this is the only text column? This is probably really simple, but my first plot, so...
qplot(cars$dist, cars$speed) + geom_text(label = cars$dist)
You can append normal ggplot syntax to qplot() exactly the same way you would when calling ggplot().
You need to specify the source of the data you are feeding: you can do so by passing the name of the dataframe to the data argument of a geom() and then referencing a specific column (such as Player) in the aes() call within the same geom(). Column names inside aes() are written without quotes; a quoted string is treated as a constant value rather than as a column:
geom_point(data = data, aes(x = col1, y = col2))
or you can attach() the data, and then just specify the column without the data= parameter (note that attach() can mask other objects, so the data argument is usually the safer choice):
geom_point(aes(x = col1, y = col2))
Thank you to Marius for pointing out that referencing data through the data parameter may be preferable to $ (data$col) in certain situations, such as faceting.
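Putting this together, a minimal sketch of labeled points could look like the following. It uses ggplot() directly, since qplot() is deprecated in recent ggplot2 releases, and the built-in cars data; the id column is a hypothetical label column added for illustration, standing in for the asker's player column.

```r
library(ggplot2)

# 'id' is a made-up text column used as the label; the real data would use its own
cars_labeled <- cars
cars_labeled$id <- as.character(seq_len(nrow(cars_labeled)))

# Supply the data frame once; refer to columns by bare name inside aes()
p <- ggplot(cars_labeled, aes(x = dist, y = speed)) +
  geom_point() +
  geom_text(aes(label = id), vjust = -0.5, size = 3)

p
```

Because R is case-sensitive, the name used in label = must match the column name exactly, including capitalization.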

geom_line only connects points on horizontal lines instead of all points

I've written something in R using ggplot2 and don't know why it behaves as it does.
If I plot my data using geom_point and geom_line, it is supposed to draw lines through those points. But instead of connecting all the points, it only connects those that are on a horizontal line. I don't know how to handle this.
This is a simple version of the code:
date<-c("2014-07-01","2014-07-02","2014-07-03",
"2014-07-04","2014-07-05","2014-07-06",
"2014-07-07")
mbR<- c(160,163,169,169,169,169,169)
mbL<- c(166,166,166,166,NA, NA, NA)
mb<-data.frame(mbR,mbL)
mb<-data.frame(t(as.Date(date)),mb)
colnames(mb)<-c("Datum","R","L")
mb$Datum<-date
plot1<-ggplot(mb,aes(x=mb$Datum,y=mb$R))+
geom_point(data=mb,aes(x=mb$Datum,y=mb$R,color="R",size=2),
group=mb$R,position="dodge")+
geom_line(data=mb,aes(y=mb$R,color="R",group=mb$R))+
geom_point(aes(y=mb$L,color="L",size=2),position="dodge")
plot1
I used group, otherwise I wouldn't have been able to draw any lines, still it doesn't do what I intended.
I hope you guys can help me out a little. :) It may be a minor fault.
First, melt your data to long format and then plot it. The column called variable in the melted data is the category (R or L). The column called value stores the data values for each instance of R and L. We group and color the data by variable in the call to ggplot, which gives us separate lines/points for R and L.
Also, you only need to provide the data frame and column mappings in the initial call to ggplot. They will carry through to geom_point and geom_line. Furthermore, when you provide the column names, you don't need to (and shouldn't) include the name of the data frame, because you've already specified the data frame in the data argument to ggplot.
library(reshape2)
mb.l = melt(mb, id.vars="Datum")
ggplot(data=mb.l, aes(x=Datum, y=value, group=variable, color=variable)) +
  geom_point(size=2) +
  geom_line()
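The original code only joined points lying on a horizontal line because group=mb$R puts every distinct value of R in its own group, so geom_line can only connect points that share the same y value. Grouping by series (R or L) avoids that. The sketch below is a self-contained variant of the approach above that needs only ggplot2: it builds the long table by hand in base R instead of using reshape2, and stores the dates as Date values so the x axis is continuous. The object names mb_long and p_lines are chosen for this example.

```r
library(ggplot2)

datum <- as.Date(c("2014-07-01", "2014-07-02", "2014-07-03",
                   "2014-07-04", "2014-07-05", "2014-07-06",
                   "2014-07-07"))
mb <- data.frame(
  Datum = datum,
  R = c(160, 163, 169, 169, 169, 169, 169),
  L = c(166, 166, 166, 166, NA, NA, NA)
)

# Long format: one row per (date, series) pair, same layout that melt() produces
mb_long <- data.frame(
  Datum    = rep(mb$Datum, times = 2),
  variable = rep(c("R", "L"), each = nrow(mb)),
  value    = c(mb$R, mb$L)
)

# Group and colour by series so each series gets one line through all its points
p_lines <- ggplot(mb_long, aes(x = Datum, y = value,
                               group = variable, color = variable)) +
  geom_point(size = 2, na.rm = TRUE) +
  geom_line(na.rm = TRUE)

p_lines
```

na.rm = TRUE only silences the warning about the three missing L values; the rows are dropped from the drawing either way.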

Resources