Contour plot or heatmap from three continuous variables - r

I have a model which has told me there is an interaction between two variables, a and b, which is significantly influencing my response variable, c. All three are continuous numeric variables. For detail, c is the rate of change in my response variable, b is the rate of change in my predictor and a is mean annual rainfall. The unit of analysis is pixels in a raster. So my model is telling me mean annual rainfall modifies how my predictor affects my response.
To visualise this interaction I would like to use a contour plot/heat map/level plot with a and b on the x and y axes and c providing the colour to show me how my response variable changes within the space described by a and b. I can do this with a scatter plot but it's not very pretty or easy to interpret:
qplot(b, a, colour = c) +
scale_colour_gradient(low="green", high="red")
When I try to plot a contour plot/heat map/level plot though all I get is errors, blank plots or ugly plots.
geom_contour gives me an error:
ggplot(data = Mod, aes(x = Rain, y = Bomas, z = Fire)) +
geom_contour()
Warning message:
Not possible to generate contour data
geom_raster initially gives me Error: cannot allocate vector of size 81567.2 Gb but when I round my data it produces:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_raster(aes(fill = c))
Adding interpolate = TRUE to the geom_raster code just makes the lines a little blurry.
geom_tile produces a blank graph but with a scale bar for c:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_tile(aes(color = c))
I've also tried using stat_density2d and setting the fill and/or the colour to c, but just got an error, and I've tried using levelplot in the lattice package as well but that produces this:
levelplot(c ~ a * b, data = df,
aspect = "asp", contour = TRUE,
xlab = "a",
ylab = "b")
I suspect the problems I'm encountering are because the functions are not set up to deal with continuous x and y variables, all the examples seem to use factors. I would have thought I could compensate for that by changing bin widths but that doesn't seem to work either. Is there a function that allows you to make a heat map with 3 continuous variables? Or do I need to treat my a and b variables as factors and manually make a dataframe with bins appropriate for my data?
If you want to experiment for yourself then you get similar problems to what I'm having with:
df<- as.data.frame(rnorm(1:1068))
df[,2] <- rnorm(1:1068)
df[,3] <- rnorm(1:1068)
names(df) <- c("a", "b", "c")

You can get automatic bins, and for example calculate the means by using stat_summary_2d:
ggplot(df, aes(a, b, z = c)) +
stat_summary_2d() +
geom_point(shape = 1, col = 'white') +
viridis::scale_fill_viridis()
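If the default binning is too fine or too coarse, the number of bins and the summary function can be set explicitly. A minimal sketch, assuming the random test data from the question; the choice of 15 bins and the median is only for illustration:

```r
library(ggplot2)

set.seed(1)
# Random stand-in data, same shape as the example in the question
df <- data.frame(a = rnorm(1068), b = rnorm(1068), c = rnorm(1068))

# Summarise c within a 15 x 15 grid of (a, b) bins using the median
p <- ggplot(df, aes(a, b, z = c)) +
  stat_summary_2d(fun = median, bins = 15) +
  scale_fill_gradient(low = "green", high = "red")

# One row per non-empty bin
binned <- layer_data(p)
```

Printing p draws the heat map; binned holds the per-bin summaries in case you want to inspect them.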
Another good option is to slice your data by the third variable, and plot small multiples. This doesn't really show very well for random data though:
library(ggplot2)
ggplot(df, aes(a, b)) +
geom_point() +
facet_wrap(~cut_number(c, 4))
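Since geom_contour and geom_raster expect values on a regular grid, another route (not part of the answer above) is to predict from the fitted model over a grid of a and b and plot the predictions. This is only a sketch: the lm with an interaction term and the simulated effect size are assumptions standing in for whatever model you actually fitted.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data with a built-in a:b interaction (assumption for illustration)
df <- data.frame(a = rnorm(1068), b = rnorm(1068))
df$c <- 0.5 * df$a * df$b + rnorm(1068, sd = 0.1)

# Stand-in for the real model
fit <- lm(c ~ a * b, data = df)

# Regular 50 x 50 grid spanning the observed range of a and b
grid <- expand.grid(
  a = seq(min(df$a), max(df$a), length.out = 50),
  b = seq(min(df$b), max(df$b), length.out = 50)
)
grid$c <- predict(fit, newdata = grid)

# Filled surface of predicted c, with contour lines on top
p <- ggplot(grid, aes(a, b)) +
  geom_raster(aes(fill = c)) +
  geom_contour(aes(z = c), colour = "white")
```

The plot then shows the model's predicted surface rather than the raw observations, which is usually what you want when illustrating an interaction.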

Related

control x axis of a violin plot in ggplot2

I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numerical values. From the examples I've seen, it seems that I must convert these numerical year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis; or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
You are missing a ) at the end of this line: p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset. This should work now:
library(ggplot2)
my_data <- ToothGrowth
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len))+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label") +
theme_bw()
p
PS: this discussion would fit better on Cross Validated, as it's more of a statistics question than a coding question.
I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can divide your data into groups so that one violin is drawn for each group. But if the variable used to divide the groups (the x axis) is continuous, there are infinitely many possible groupings (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end the data can't be divided, and ggplot probably ignores the x variable for grouping and makes one violin for all your data.
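To add to that: ggplot2 builds its default groups from the discrete variables in the plot, so a numeric x on its own creates no groups. You can keep x numeric and supply the grouping yourself through the group aesthetic. A small sketch using ToothGrowth, where dose is left as a number:

```r
library(ggplot2)

my_data <- ToothGrowth  # dose stays numeric: 0.5, 1, 2

# group = dose tells ggplot2 to compute one density per dose value,
# while the x axis remains continuous
p <- ggplot(my_data, aes(x = dose, y = len, group = dose)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  labs(x = "x_label", y = "y_label")

violins <- layer_data(p)
```

With years this keeps the real spacing on the x axis, so a missing year shows up as a gap.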

ggplot2 stat density only for y values

I'm wondering whether I can manipulate stat_density2d to show the density for the x values without considering the y values.
To illustrate:
df <- data.frame(x = c(1:40, rep(1:20, 3), 15:40))
ggplot(df, aes(x=x, y = x)) +
stat_density2d(aes(fill='red',alpha=..level..),geom='polygon', show.legend = F) +
geom_point(alpha = 0.3)
Obviously it doesn't really make sense to plot the same values against each other; however, I'm interested in the density of the points at a certain value. Therefore I would like to keep y constant (e.g. y = 1) but still show the same density, like so:
(In my publication I actually have multiple groups, making this a nice way to plot the group separation even though it is 1D)
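One possible approach, sketched here without having seen the intended figure: compute the 1D density of x yourself with density() and draw it as a coloured strip at y = 1 underneath the points. The strip height and the colour scale are arbitrary choices.

```r
library(ggplot2)

df <- data.frame(x = c(1:40, rep(1:20, 3), 15:40))

# 1D kernel density of x, evaluated on an evenly spaced grid
d <- density(df$x)
dens <- data.frame(x = d$x, density = d$y)

# Draw the density as a strip of tiles at y = 1, then the points on top
p <- ggplot() +
  geom_tile(data = dens, aes(x = x, y = 1, fill = density),
            width = diff(d$x)[1], height = 0.5) +
  geom_point(data = df, aes(x = x, y = 1), alpha = 0.3) +
  scale_fill_gradient(low = "white", high = "red")

strip <- layer_data(p, 1)
```

For several groups you could compute one density per group and give each group its own y value.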

DBSCAN clustering plotting through ggplot2

I am trying to plot the dbscan clustering result through ggplot2. If I understand it correctly the current dbscan plots noise in black colour with base plot function. Some code first,
library(dbscan)
n <- 100
x <- cbind(
x = runif(5, 0, 10) + rnorm(n, sd = 0.2),
y = runif(5, 0, 10) + rnorm(n, sd = 0.2)
)
plot(x)
kNNdistplot(x, k = 5)
abline(h=.25, col = "red", lty=2)
res <- dbscan::dbscan(x, eps = .25, minPts = 4)
plot(res, x, main = "DBSCAN")
x <- data.frame(x)
ggplot(x, aes(x = x, y=y)) + geom_point(color = res$cluster+1, pch = clusym[res$cluster+1]) +
theme_grey() + ggtitle("(c)") + labs(x ="x", y = "y")
I want to do two things differently here. First, I want to plot the clustering output through ggplot(). The difficulty is that if I use res$cluster to colour the points, plot() ignores points labelled 0 (the noise points), whereas ggplot() throws an error because the length of res$cluster is smaller than the data to plot; and if I use res$cluster+1, the noise points get label 1, which I don't want. Second, if possible, I would like to do something similar to what clusym[] in the fpc package does: it plots clusters with labels 1, 2, 3, ... and ignores 0 labels. It would be fine if the labels for noise points stay 0 and the noise points then get a specific symbol, say "*", in a specific colour, say grey. I have seen a Stack Overflow post that tries to do a similar thing for convex hull plotting, but I still couldn't figure out how to do this if I don't want to draw the hull and want a cluster number for each cluster.
One possibility I thought of was to first plot the points without noise and then add the noise points with the desired colour and symbol to the original plot.
But since the length of res$cluster is not equal to that of x, it throws an error.
ggplot(x, aes(x = x, y=y)) + geom_point(color = res$cluster+1, pch = clusym[res$cluster+1]) +
theme_grey() + ggtitle("(c)") + labs(x ="x", y = "y") # + adding noise points
Error: Aesthetics must be either length 1 or the same as the data (100): shape, colour
You should first extract the cluster assignments from the output of DBSCAN (res$cluster), tack them onto your original data as a new column (i.e. as cluster), and convert that column to a factor.
When you make the ggplot, you can map color or shape to cluster. As for ignoring the noise points, I would do it as follows.
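# Attach the cluster labels (0 = noise) to the original data, still in numeric form
data <- data.frame(x, cluster = res$cluster)
# Drop the noise points, then treat the remaining labels as a factor
data2 <- dplyr::filter(data, cluster > 0)
data2$cluster <- as.factor(data2$cluster)
ggplot(data2, aes(x = x, y = y)) +
geom_point(aes(color = cluster))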
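If you would rather keep the noise points visible, as asked in the question, one option is to draw them in a second layer with a fixed colour and symbol. The sketch below uses made-up points and labels in place of the real dbscan result (pts and its cluster column are hypothetical), prints the cluster number at each clustered point, and marks noise with grey asterisks:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the data plus res$cluster (0 = noise)
pts <- data.frame(
  x = runif(100, 0, 10),
  y = runif(100, 0, 10),
  cluster = sample(0:3, 100, replace = TRUE)
)

noise <- subset(pts, cluster == 0)
clustered <- subset(pts, cluster > 0)
clustered$cluster <- factor(clustered$cluster)

p <- ggplot(clustered, aes(x, y)) +
  # Cluster number drawn at each clustered point, coloured by cluster
  geom_text(aes(label = cluster, colour = cluster), show.legend = FALSE) +
  # Noise points as grey asterisks (shape 8), outside the colour scale
  geom_point(data = noise, colour = "grey50", shape = 8)
```

Because the noise layer has its own data and constant aesthetics, the lengths always match and the 0 labels never enter the colour scale.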

how to combine in ggplot line / points with special values?

I'm quite new to ggplot but I like the systematic way you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot so that it then draws the legend automatically. In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plot?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you want to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
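Depending on your ggplot2 version, each legend key may show both a line and a point. If you want the keys to mirror the base legend (a line for the model, different symbols for the points), the glyphs can be set explicitly with override.aes. A sketch along those lines; the shapes are my own choice to match the base plot, and the special value is passed as a one-row data frame:

```r
library(ggplot2)
library(splines)

set.seed(10)
x <- seq(-1, 1, 0.01)
y <- x^2
s <- interpSpline(x, y)
df <- data.frame(x = x,
                 y = y + rnorm(length(y), mean = 0, sd = 0.1),
                 fit = predict(s, x)$y)

p <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, col = "Model values")) +
  geom_point(aes(col = "True values"), shape = 4) +
  geom_point(data = data.frame(x = 0, y = 0),
             aes(col = "Special value"), shape = 1) +
  scale_color_manual(values = c("True values" = "red",
                                "Special value" = "blue",
                                "Model values" = "black")) +
  # Keys are ordered alphabetically: Model values, Special value, True values
  guides(colour = guide_legend(override.aes = list(
    linetype = c(1, 0, 0),
    shape = c(NA, 1, 4)
  )))

g <- ggplotGrob(p)
```

The vectors in override.aes follow the order of the legend keys, so they need adjusting if you rename the labels.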

connecting lines between means of factors in ggplot2

I was trying to create a simple line graph of means and interactions. I have a DV (reading times) on the y-axis, one factor (Length) on the x-axis, and another as a grouping variable (position).
The syntax I used is below. The data plotted as single points on a line for each of the two Length conditions, but did not connect with lines between the two Length conditions. What am I missing in terms of syntax?
I am using R i386 2.15.2, and updated ggplot2 last week.
Here is a reproducible example
SubjectID <- c(101,101,101,101,101,101,101,101,102,102,102,102,102,102,102,102,
201,201,201,201,201,201,201,201,202,202,202,202,202,202,202,202)
Group <- c("PWA","PWA","PWA","PWA","PWA","PWA","PWA","PWA","PWA","PWA","PWA",
"PWA","PWA","PWA","PWA","PWA","Control","Control","Control",
"Control","Control","Control","Control","Control","Control",
"Control","Control","Control","Control","Control","Control",
"Control")
Length <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)
Pos <- c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2)
ReadT <- c(6.7,7.6,6.4,7.9,5.4,6.4,6.3,7.4,6.9,7.2,6.7,7.4,5.7,6.1,6.5,7.8,
6.1,5.7,4.9,6.1,4.7,6.5,6.1,6.2,6.9,5.9,4.8,6.5,4.6,6.3,6.7,6.6)
data <- data.frame (SubjectID, Group,Length,Pos,ReadT)
data$Length <- factor(data$Length, order = TRUE,
levels = c(1,2),
labels = c("Length 1", "Length 2"))
data$Pos <- factor(data$Pos, order = TRUE,
levels = c(1,2),
labels = c("Position 1", "Position 2"))
qplot(Length, data=data, ReadT, geom=c("point", "line"),
stat="summary", fun.y=mean, group=Pos, colour=Pos,
facets = ~Group)
I don't think you have reproduced any inconsistency, but your issues are in part clouded by trying to condense everything into a single qplot call.
Your x variable Length is a factor, therefore ggplot is sensibly considering Length 1 and Length 2 to be independent, and won't connect the lines.
Secondly, you won't be able to use stat_summary to summarize by your x values without forcing these to be a factor (and hence independent).
I find it easiest to presummarize the data and not rely on ggplot, e.g.:
library(plyr)
data.means <- ddply(data, .(Group, Pos, Length), summarize, ReadT = mean(ReadT))
Then construct the plot using ggplot not qplot, to give you the flexibility (and transparency) required.
The trick to get the lines connected is to treat x as numeric within the call to geom_line:
ggplot(data.means, aes(x= Length, y= ReadT, colour = Pos)) +
geom_point() +
geom_line(aes(x=as.numeric(Length))) +
facet_grid(~Group)
If you insisted on using the raw data, and stat_xxxx functions, you could also replicate this using stat_smooth to estimate the means (which would keep x classified as numeric)
ggplot(data, aes(x = Length, y= ReadT, colour = Pos)) +
stat_summary(fun.y = 'mean', geom = 'point')+
stat_smooth(method = 'lm', aes(x=as.numeric(Length)), se = FALSE) +
facet_grid(~Group)
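In more recent ggplot2 versions (3.3.0 and later, where the fun.y argument was renamed to fun), a commonly used alternative is to keep Length as a factor and set the group aesthetic, which tells geom_line which points belong to the same line. A sketch using only the first 16 reading times from the question (the PWA group), so the facetting is left out:

```r
library(ggplot2)

# First 16 observations from the question's data (PWA group only)
d <- data.frame(
  Length = factor(rep(c("Length 1", "Length 2"), times = 8)),
  Pos = factor(rep(rep(c("Position 1", "Position 2"), each = 2), times = 4)),
  ReadT = c(6.7, 7.6, 6.4, 7.9, 5.4, 6.4, 6.3, 7.4,
            6.9, 7.2, 6.7, 7.4, 5.7, 6.1, 6.5, 7.8)
)

# group = Pos connects the means across the levels of Length
p <- ggplot(d, aes(Length, ReadT, colour = Pos, group = Pos)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line")

line_data <- layer_data(p, 2)
```

Adding facet_grid(~Group) to the full data set should give one panel per group, as in the plots above.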
