I've seen some heatmap examples where the fill variable is set to ..level.., such as this one:
library(MASS)
ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point() +
geom_density2d() +
stat_density2d(aes(fill = ..level..), geom = "polygon")
I suspect that the ..level.. means that the fill is set to the relative amount of layers present? Also could someone link me a good example of how to interpret these 2D-density plots, what does each contour represent etc.? I have searched online but couldn't find any suitable guide.
The stat_ functions compute new values and create new data frames. This one creates a data frame with a level variable. You can see it by calling ggplot_build() on the plot object instead of printing it:
library(ggplot2)
library(MASS)
gg <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point() +
geom_density2d() +
stat_density2d(aes(fill = ..level..), geom = "polygon")
gb <- ggplot_build(gg)
head(gb$data[[3]])
## fill level x y piece group PANEL
## 1 #132B43 0.002 3.876502 43.00000 1 1-001 1
## 2 #132B43 0.002 3.864478 43.09492 1 1-001 1
## 3 #132B43 0.002 3.817845 43.50833 1 1-001 1
## 4 #132B43 0.002 3.802885 43.65657 1 1-001 1
## 5 #132B43 0.002 3.771212 43.97583 1 1-001 1
## 6 #132B43 0.002 3.741335 44.31313 1 1-001 1
The ..level.. tells ggplot to reference that column in the newly built data frame.
Under the hood, ggplot is doing something similar to the following (this is not an exact replication, as ggplot uses different plot limits, etc.):
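In ggplot2 3.3.0 and later, the same computed column is usually written as after_stat(level) rather than ..level... The sketch below assumes such a version is installed and uses layer_data() to check that the stat really does produce a level column; the names p and computed are only illustrative.

```r
library(ggplot2)
library(MASS)  # provides the geyser data set

# Same polygon layer as above, using the newer after_stat() notation
p <- ggplot(geyser, aes(x = duration, y = waiting)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon")

# layer_data() returns the data frame computed for one layer (here layer 1)
computed <- layer_data(p, 1)
names(computed)
```

If level appears among the column names, that is the variable the fill aesthetic is mapped to.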
n <- 100
h <- c(bandwidth.nrd(geyser$duration), bandwidth.nrd(geyser$waiting))
dens <- kde2d(geyser$duration, geyser$waiting, n=n, h=h)
df <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
head(df)
## x y z
## 1 0.8333333 43 9.068691e-13
## 2 0.8799663 43 1.287684e-12
## 3 0.9265993 43 1.802768e-12
## 4 0.9732323 43 2.488479e-12
## 5 1.0198653 43 3.386816e-12
## 6 1.0664983 43 4.544811e-12
It then calls contourLines on that grid to get the polygons.
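As a rough sketch of that last step (not ggplot's exact internals), contourLines() from base R can be applied directly to the list returned by kde2d(). The choice of nlevels = 5 and the name contour_list are assumptions made for illustration.

```r
library(MASS)

# 2D kernel density estimate on a 100 x 100 grid, default bandwidths
dens <- kde2d(geyser$duration, geyser$waiting, n = 100)

# Each element is one closed or open contour path with its density level
contour_list <- contourLines(dens, nlevels = 5)

length(contour_list)            # number of contour pieces found
str(contour_list[[1]])          # a list with components level, x and y
sapply(contour_list, `[[`, "level")  # the density value of each piece
```

Every point on a given path has the same estimated density, which is what a single contour line in the plot represents.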
This is a decent introduction to the topic. Also look at ?kde2d in R help.
Expanding on the answer provided by @hrbrmstr: first, the call to geom_density2d() is redundant. That is, you can achieve the same results with:
library(ggplot2)
library(MASS)
gg <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point() +
stat_density2d(aes(fill = ..level..), geom = "polygon")
Let's consider some other ways to visualize this density estimate that may help clarify what is going on:
base_plot <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point()
base_plot +
stat_density2d(aes(color = ..level..))
base_plot +
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE)
base_plot +
stat_density2d(aes(alpha = ..density..), geom = "tile", contour = FALSE)
Notice, however, we can no longer see the points generated from geom_point().
Finally, note that you can control the bandwidth of the density estimate. To do this, we pass x and y bandwidth arguments to h (see ?kde2d):
base_plot +
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE,
h = c(2, 5))
Again, the points from geom_point() are hidden because the stat_density2d() layer is drawn on top of them.
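To pick a sensible value for h, it can help to look at the defaults first. The sketch below assumes the defaults come from bandwidth.nrd(), as in the kde2d() call shown in the earlier answer, and compares them with the hand-picked c(2, 5); the object names are illustrative.

```r
library(MASS)

# Default per-axis bandwidths used by kde2d()
h_default <- c(bandwidth.nrd(geyser$duration), bandwidth.nrd(geyser$waiting))
h_default

# Same grid, two different bandwidth choices
dens_default <- kde2d(geyser$duration, geyser$waiting, n = 50, h = h_default)
dens_custom  <- kde2d(geyser$duration, geyser$waiting, n = 50, h = c(2, 5))

# The range of estimated densities changes with the bandwidth
range(dens_default$z)
range(dens_custom$z)
```

A smaller bandwidth along an axis gives a bumpier estimate in that direction; a larger one smooths it out.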
I have data with around 25,000 rows myData with column attr having values from 0 -> 45,600. I am not sure how to make a simplified or reproducible data...
Anyway, I am plotting the density of attr like below, and I also find the attr value where density is maximum:
library(ggplot2)
max <- which.max(density(myData$attr)$y)
density(myData$attr)$x[max]
ggplot(myData, aes(x=attr))+
geom_density(color="darkblue", fill="lightblue")+
geom_vline(xintercept = density(myData$attr)$x[max])+
xlab("attr")
Here is the plot I have got with the x-intercept at maximum point:
Since the data is skewed, I then attempted to draw x-axis in log scale by adding scale_x_log10() to the ggplot, here is the new graph:
My questions now are:
1. Why does it have 2 maximum points now? Why is my x-intercept no longer at the maximum point(s)?
2. How do I find the intercepts for the 2 new maximum points?
Finally, I attempt to convert the y-axis to count instead:
ggplot(myData, aes(x=attr)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3)+
xlab("attr")+
scale_x_log10()
I got the following plot:
3. How do I find the count of the 2 peaks?
Why the density shapes are different
To put my comments into a fuller context, ggplot is taking the log before doing the density estimation, which is causing the difference in shape because the binning covers different parts of the domain. For example,
(bins <- seq(1, 10, length.out = 10))
#> [1] 1 2 3 4 5 6 7 8 9 10
(bins_log <- 10^seq(log10(1), log10(10), length.out = 10))
#> [1] 1.000000 1.291550 1.668101 2.154435 2.782559 3.593814 4.641589
#> [8] 5.994843 7.742637 10.000000
library(ggplot2)
ggplot(data.frame(x = c(bins, bins_log),
trans = rep(c('identity', 'log10'), each = 10)),
aes(x, y = trans, col = trans)) +
geom_point()
This binning can affect the resulting density shape. For example, compare an untransformed density:
d <- density(mtcars$disp)
plot(d)
to one which is logged beforehand:
d_log <- density(log10(mtcars$disp))
plot(d_log)
Note that the height of the modes flips! I believe what you are asking for is the first one, but with the log transformation applied after the density, i.e.
d_x_log <- d
d_x_log$x <- log10(d_x_log$x)
plot(d_x_log)
Here the modes are similar, just compressed.
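One way to see the difference numerically is to locate the highest peak under each approach and express both on the original scale. This is only a sketch using the same mtcars example; peak_raw and peak_log are illustrative names.

```r
# Density estimated on the raw scale
d <- density(mtcars$disp)
# Density estimated after taking log10 of the data
d_log <- density(log10(mtcars$disp))

# Location of the highest peak in each estimate, on the original scale
peak_raw <- d$x[which.max(d$y)]
peak_log <- 10^d_log$x[which.max(d_log$y)]

c(raw = peak_raw, logged_first = peak_log)
```

If the two values differ noticeably, the log transformation before estimation has changed which mode is tallest.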
Moving to ggplot
When moving to ggplot, to do the density estimation before the log transformation it's easiest to do it outside of ggplot beforehand:
library(ggplot2)
d <- density(mtcars$disp)
ggplot(data.frame(x = d$x, y = d$y), aes(x, y)) +
geom_density(stat = "identity", fill = 'burlywood', alpha = 0.3) +
scale_x_log10()
Finding modes
Finding the mode when there's a single one is relatively easy; it's just d$x[which.max(d$y)]. But when you have multiple modes, that's not good enough, since it will only show you the highest one. A solution is to effectively take the derivative and look for where the slope changes from positive to negative. We can do this numerically with diff, and since we only care about whether the result is positive or negative, call sign on that to turn everything into -1 and 1.* If we call diff on that, everything will be 0 except the maximums and minimums, which will be -2 and 2, respectively. We can then look for which values are less than 0, which we can use to subset. (Because diff returns a vector one element shorter than its input, you'll have to add one to the indices.) Altogether, designed to work on a density object,
d <- density(mtcars$disp)
modes <- function(d){
i <- which(diff(sign(diff(d$y))) < 0) + 1
data.frame(x = d$x[i], y = d$y[i])
}
modes(d)
#> x y
#> 1 128.3295 0.003100294
#> 2 305.3759 0.002204658
d$x[which.max(d$y)] # double-check
#> [1] 128.3295
We can add them to our plot, and they'll get transformed nicely:
ggplot(data.frame(x = d$x, y = d$y), aes(x, y)) +
geom_density(stat = "identity", fill = 'mistyrose', alpha = 0.3) +
geom_vline(xintercept = modes(d)$x) +
scale_x_log10()
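To see why the diff(sign(diff(...))) step inside modes() picks out local maxima, it may help to trace it on a tiny made-up vector. The vector y below is purely illustrative.

```r
y <- c(1, 3, 2, 1, 4, 6, 5)

slope_sign <- sign(diff(y))     #  1 -1 -1  1  1 -1
turns <- diff(slope_sign)       # -2  0  2  0 -2

# Negative values mark a change from rising to falling, i.e. a peak;
# add 1 because each diff() shortens the vector by one element
peaks <- which(turns < 0) + 1   # 2 6
y[peaks]                        # 3 6
```

The two peaks (3 at position 2 and 6 at position 6) are exactly the points where the slope changes from positive to negative.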
Plotting counts instead of density
To turn the y-axis into counts instead of density, multiply y by the number of observations, which is stored in the density object as n:
ggplot(data.frame(x = d$x, y = d$y * d$n), aes(x, y)) +
geom_density(stat = "identity", fill = 'thistle', alpha = 0.3) +
geom_vline(xintercept = modes(d)$x) +
scale_x_log10()
In this case it looks a little silly because there are only 32 observations spread over a wide domain, but with a larger n and smaller domain, it is more interpretable:
d <- density(diamonds$carat, n = 2048)
ggplot(data.frame(x = d$x, y = d$y * d$n), aes(x, y)) +
geom_density(stat = "identity", fill = 'papayawhip', alpha = 0.3) +
geom_point(data = modes(d), aes(y = y * d$n)) +
scale_x_log10()
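As a sanity check on the multiply-by-n scaling, the area under a density curve should be close to 1, so the area under the rescaled curve should be close to the number of observations. The sketch below approximates both areas with a simple Riemann sum; step, area_density and area_counts are illustrative names.

```r
d <- density(mtcars$disp)

# density() evaluates on an equally spaced grid, so one step size suffices
step <- diff(d$x)[1]

area_density <- sum(d$y) * step          # approximately 1
area_counts  <- sum(d$y * d$n) * step    # approximately d$n

c(density = area_density, counts = area_counts, n = d$n)
```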
* Or 0 if the value is exactly 0, but that's unlikely here and will work fine regardless.
I'm working with ggplot2, stacked barplot to 100% with relative values, using the position = "fill" option in geom_bar().
Here my code:
test <- data.frame (x = c('a','a','a','b','b','b','b')
,k = c('k','j','j','j','j','k','k')
,y = c(1,3,4,2,5,9,7))
plot <- ggplot(test, aes(x =x, y = y, fill = k))
plot <- plot + geom_bar(position = "fill",stat = "identity")
plot <- plot + scale_fill_manual(values = c("#99ccff", "#ff6666"))
plot <- plot + geom_hline(yintercept = 0.50)+ggtitle("test")
plot
Here the result:
However, I need to add the labels on the various bars, also on the "sub bars". To do this, I worked with the geom_text():
plot + geom_text(aes(label=y, size=4))
But the result is not good. I tried without luck the hjust and vjust parameters, and also using something like:
plot + geom_text(aes(label=y/sum(y), size=4))
But I did not reach the result needed (I'm not adding all the tests to not overload the question with useless images, if needed, please ask!).
Any idea about to have some nice centered labels?
label specifies what to show, and y specifies where to show. Since you are using proportions for y-axis with position = "fill", you need to calculate the label positions (geom_text(aes(y = ...))) in terms of proportions for each x using cumulative sums. Additionally, to display only the total proportion of a given color, you will need to extract the Nth row for each x, k combination. Here, I am building a separate test_labels dataset for use in geom_text to display the custom labels:
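library(dplyr)
library(ggplot2)
test <- data.frame(x = c('a','a','a','b','b','b','b'),
                   k = c('k','j','j','j','j','k','k'),
                   y = c(1,3,4,2,5,9,7))
test_labels <- test %>%
  arrange(x, desc(k)) %>%
  group_by(x) %>%
  # label position = cumulative proportion within each bar
  mutate(ylabel_pos = cumsum(y)/sum(y),
         ylabel = y/sum(y)) %>%
  # `add` was renamed to `.add` in dplyr 1.0.0
  group_by(k, .add = TRUE) %>%
  mutate(ylabel = sum(ylabel)) %>%
  # keep the last row of each x, k combination
  slice(n())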
ggplot(test, aes(x =x, y = y, fill = k)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual(values = c("#99ccff", "#ff6666")) +
geom_hline(yintercept = 0.50) +
geom_text(data = test_labels,
aes(y = ylabel_pos, label=paste(round(ylabel*100,1),"%")),
vjust=1.6, color="white", size=3.5) +
ggtitle("test")
Result:
> test_labels
# A tibble: 4 x 5
# Groups: x, k [4]
x k y ylabel_pos ylabel
<fctr> <fctr> <dbl> <dbl> <dbl>
1 a j 4 1.0000000 0.8750000
2 a k 1 0.1250000 0.1250000
3 b j 5 1.0000000 0.3043478
4 b k 7 0.6956522 0.6956522
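If a label on every sub-bar is enough, an alternative to computing positions by hand (not the approach used in the answer above) is to give geom_text() the same fill position as the bars and centre each label with position_fill(vjust = 0.5). This is a minimal sketch that labels each segment with its raw y value; label_pos is an illustrative name.

```r
library(ggplot2)

test <- data.frame(x = c("a", "a", "a", "b", "b", "b", "b"),
                   k = c("k", "j", "j", "j", "j", "k", "k"),
                   y = c(1, 3, 4, 2, 5, 9, 7))

p <- ggplot(test, aes(x = x, y = y, fill = k)) +
  geom_col(position = "fill") +
  # Stack the labels the same way as the bars; vjust = 0.5 centres
  # each label vertically inside its segment
  geom_text(aes(label = y), position = position_fill(vjust = 0.5), size = 4) +
  scale_fill_manual(values = c("#99ccff", "#ff6666")) +
  geom_hline(yintercept = 0.50) +
  ggtitle("test")

# Computed label positions, expressed as proportions between 0 and 1
label_pos <- layer_data(p, 2)
label_pos[, c("x", "y", "label")]
```

Print p to draw the plot. Note that size is set outside aes() so that it is treated as a constant rather than mapped to a legend.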
This must be a FAQ, but I can't find an exactly similar example in the other answers (feel free to close this if you can point to a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it quite so easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id Type pred SE
1 1 15.11285 0.6966029
2 1 13.68750 0.9756909
3 1 13.87565 0.6140860
4 1 14.61304 0.6187750
5 1 16.33315 0.6140860
6 1 16.19740 0.6140860
1 2 14.88805 0.6966029
2 2 13.46270 0.9756909
3 2 13.65085 0.6140860
4 2 14.38824 0.6187750
5 2 16.10835 0.6140860
6 2 15.97260 0.6140860
and
newdat2
id pred SE
1 14.98300 0.6960460
2 13.25893 0.9872502
3 13.67650 0.6150701
4 14.39590 0.6178266
5 16.37662 0.6171588
6 16.08426 0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
fig1
fig2
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
The problem is also, that I need to label and organize the resulting figure so that for newdat the x-axis would include a label for "model1" and for newdat2 a label for "model2", or some similar indicator that they are from different models. And to make things even worse, I need some labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat,aes(x=Type,y=newdat$pred,colour=id))+
geom_point(position=pd, size=5) +
geom_linerange(aes(ymin=newdat$pred-1.96*SE,ymax=newdat$pred+1.96*SE), position=pd, size=1.5, linetype=1) +
theme_bw() +
scale_colour_grey(start = 0, end = .8, name="id") +
coord_cartesian(ylim=c(11, 18)) +
scale_y_continuous(breaks=seq(10, 20, 1)) +
scale_x_discrete(name="Type", limits=c("1","2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.
How about first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you can pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to consider facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)
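The rbind() step in this answer only works once both data frames have the same set of column names, which is why model and Type are added first. A small sketch with made-up miniature versions of the two data frames (newdat_small and newdat2_small are hypothetical, with invented values) shows that rbind() then matches columns by name even though their order differs:

```r
# Hypothetical miniature stand-ins for newdat and newdat2
newdat_small <- data.frame(id = c(1, 2, 1, 2),
                           Type = c(1, 1, 2, 2),
                           pred = c(15.1, 13.7, 14.9, 13.5),
                           SE = c(0.70, 0.98, 0.70, 0.98))
newdat2_small <- data.frame(id = c(1, 2),
                            pred = c(15.0, 13.3),
                            SE = c(0.70, 0.99))

# Add the columns that identify the source model and fill in a Type
newdat_small$model <- "model1"
newdat2_small$model <- "model2"
newdat2_small$Type <- 3

# Column order differs between the two, but rbind() matches by name
combined <- rbind(newdat_small, newdat2_small)
combined
```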
I'd like to plot two graphs on top of each other like in this post.
Experimental data: I have a continuous variable giving the wind angle on a given day, stored in expt$iso.xs[,8], and the wind speed corresponding to that angle in expt$iso.xs[,2].
df<-data.frame(expt$iso.xs)
head(expt$iso.xs)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
736105.4 16.62729 2.183740 7.234774 0.9791632 4.01 4.20 238.62
736105.4 18.96705 2.489668 7.036234 0.9640366 3.82 4.00 243.14
736105.5 20.52089 2.687636 10.355394 1.3698454 4.99 5.14 247.02
736105.5 19.94449 2.611556 10.306912 1.3655301 4.85 5.12 249.57
736105.5 19.43309 2.551787 11.098302 1.4646251 4.83 5.12 243.89
736105.5 20.48259 2.689075 11.928011 1.5710530 4.89 5.09 254.23
Which looks like this:
Simulation data: I have a data.frame z that contains predictions for a subset of the above angles (0-90º).
head(z,15)
Tracer angle treatment bigangle
71.101 0 S 150
71.101 0 S 150
71.105 15 S 165
71.105 15 S 165
71.098 30 S 180
71.098 45 S 195
71.114 60 S 210
71.114 80 S 230
71.110 90 S 240
Plotting it using bigangle as factor and Tracer as :
ggplot() +
geom_boxplot(data=z, aes(y = (3600/Tracer/93.241), x = factor(bigangle)),outlier.shape = NA,outlier.colour = NA)+
coord_cartesian(ylim=c(0, 1))+
labs(x = "Angle", y = "Normalised ACh" )+
scale_x_discrete(labels=seq(0,360,10))+
theme_classic()
looks like this:
I'd like to superimpose the boxplot on top of the portion of red points (between 150º and 240º) but the following doesn't work:
ggplot() +
geom_boxplot(data=z, aes(y = (3600/Tracer/93.241), x = factor(bigangle)),outlier.shape = NA,outlier.colour = NA)+
geom_point(data=df, aes(y = X2/45, x = X8),color="red")+
coord_cartesian(ylim=c(0, 1))+
labs(x = "Angle", y = "Normalised ACh" )+
scale_x_discrete(labels=seq(0,360,10))+
theme_classic()
Any thoughts would be much appreciated,
Cheers
I think your only problem is trying to specify a discrete x scale for continuous data. That and you need a group for your boxplot geom.
As an illustrative example:
mt = mtcars
mt$wt_bin = cut(mt$wt, breaks = c(1, 3, 4.5, 6))
ggplot(mt, aes(x = wt, y = mpg)) +
geom_point() +
geom_boxplot(aes(group = wt_bin, x = wt), alpha = 0.4)
As the geom_boxplot help says:
You can also use boxplots with continuous x, as long as you supply
a grouping variable. cut_width is particularly useful
The example in the help shows this code:
ggplot(diamonds, aes(carat, price)) +
geom_boxplot(aes(group = cut_width(carat, 0.25)))
You can, of course, add a geom_point layer (though in the diamonds data there are too many points for that to be a nice plot).
For your scale, don't use a discrete scale unless you have factors on the axis. You probably want scale_x_continuous(breaks = seq(0, 360, 10)).
Different data sets can be used in the usual way, with the data argument. Continuing the previous example but using different data for the geom_point layer:
similar_to_mt = data.frame(wt = runif(100, 1, 6), mpg = rnorm(100, 20, 4))
ggplot(mt, aes(x = wt, y = mpg)) +
geom_point(data = similar_to_mt) +
geom_boxplot(data = mt, aes(group = wt_bin, x = wt), alpha = 0.4)
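Applied to a setup like the one in the question, the same idea might look as follows. The data here are simulated stand-ins (sim for the prediction data, obs for the measured points), so the column names and values are assumptions rather than the asker's real data.

```r
library(ggplot2)

set.seed(1)
# Hypothetical predictions at a few fixed angles
sim <- data.frame(bigangle = rep(c(150, 180, 210), each = 10),
                  value = runif(30))
# Hypothetical observations spread over the full circle
obs <- data.frame(angle = runif(50, 0, 360),
                  value = runif(50))

p <- ggplot() +
  geom_point(data = obs, aes(x = angle, y = value), colour = "red") +
  # x stays numeric; group tells geom_boxplot which rows form one box
  geom_boxplot(data = sim, aes(x = bigangle, y = value, group = bigangle),
               outlier.shape = NA) +
  scale_x_continuous(breaks = seq(0, 360, 30)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(x = "Angle", y = "Normalised ACh") +
  theme_classic()

# One row of box statistics per angle group
box_stats <- layer_data(p, 2)
nrow(box_stats)
```

Print p to draw the plot. Because both layers share a continuous x axis, the boxes land at their true angles among the points.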
I'm using ggplot2 to create a simple dot plot of -1 to +1 correlation values using the following R code:
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y= row.names(dataframe))) +
geom_text(aes(y=exit, label=samplesize))
The y-axis has text labels, and I believe those text labels may be the reason that my geom_text() data point labels are squished down into the bottom of the plot as pictured here:
How can I change my plotting so that the data point labels appear on the dots themselves?
I understand that you would like to have the samplesize appear above each data point in the plot. Here is a sample plot with a sample data frame that does this:
EDIT: Per note by Gregor, changed the geom_text() call to utilize aes() when referencing the data. Thanks for the heads up!
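# Sample data; the first (unnamed) column is read as row names
top10_rank <- read.table(header = TRUE, text = "
   String Number
4       h      0
1       a      1
11      w      1
3       z      3
7       z      3
2       b      4
8       q      5
6       k      6
9       r      9
5       x     10
10      l     11
")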
x<-ggplot(data=top10_rank, aes(x = Number,
y = String)) + geom_point(size=3) + scale_y_discrete(limits=top10_rank$String)
x + geom_text(data=top10_rank, size=5, color = 'blue',
aes(x = Number,label = Number), hjust=0, vjust=0)
Not sure if this is what you wanted though.
Your problem is simply that you switched the y variables:
# your code
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y = row.names(dataframe))) + # here y is the row names
geom_text(aes(y =exit, label = samplesize)) # here y is the exit column
Since you want the same y-values for both you can define this in the initial ggplot() call and not worry about repeating it later
# working version
ggplot(dataframe, aes(x = exit, y = row.names(dataframe))) +
geom_point() +
geom_text(aes(label = samplesize))
Using row names is a little fragile; it's safer and more robust to create a data column with what you want for y values:
# nicer code
dataframe$y = row.names(dataframe)
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(label = samplesize))
Having done this, you probably don't want the labels right on top of the points; a small offset may look better:
# best of all?
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(x = exit + .05, label = samplesize), vjust = 0)
In the last case, you'll have to play with the adjustment to the x aesthetic; what looks right will depend on the dimensions of your final plot.
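An alternative way to get the same offset (not the one used above) is the nudge_x argument of geom_text(), which shifts the labels without changing the x mapping. The data frame below is a made-up stand-in for the asker's data, with invented values.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data frame
dataframe <- data.frame(exit = c(-0.4, 0.1, 0.7),
                        samplesize = c(12, 30, 25),
                        row.names = c("groupA", "groupB", "groupC"))
dataframe$y <- row.names(dataframe)

p <- ggplot(dataframe, aes(x = exit, y = y)) +
  geom_point() +
  # nudge_x moves each label 0.05 units to the right of its point;
  # hjust = 0 left-aligns the text at that position
  geom_text(aes(label = samplesize), nudge_x = 0.05, hjust = 0)

# The label layer's x positions after nudging
label_data <- layer_data(p, 2)
label_data$x
```

Print p to draw the plot. As with the manual offset, the right value for nudge_x depends on the scale of the x axis.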