Using ggplot2: Create faceted scatterplot with scaled and moved density - r

I would like to plot some data as a scatter plot using facet_wrap, while superimposing some information such as a linear regression and the density.
I managed to do all that, but the density values are out of proportion to my points, which is expected since the points lie far above the scaled density's 0 to 1 range. Nevertheless, I'd like to scale and move my density curve so that it is clearly visible; I don't care about its real values, only about its shape.
Here is an exaggerated minimum working example of what I have:
library(ggplot2)
library(reshape2)  # provides melt()
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
  geom_point(colour='red') +
  geom_smooth(method='lm') +
  stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
  facet_wrap(~variable,scales='free_x')
Which results in:
What I would like resembles this ugly hack:
stat_density(aes(x=value,y=..scaled..*100+200),position='identity',geom='line')
Ideally, I would use y=..scaled..* diff(range(value)) + min(value) but when I do this I get an error saying that 'value' was not found. I suspect the problem is related to the faceting, but I would prefer to keep my facets.
How can I scale and move the density curve in this case?

I suggest making two plots and combining them with grid.arrange:
p1 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
facet_wrap(~variable,scale='free_x') +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin = unit(c(1, 1, 0, 0.5), "lines"))
p2 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
facet_wrap(~variable,scale='free_x') +
theme(strip.background=element_blank(),
strip.text=element_blank(),
plot.margin = unit(c(-1, 1, 0.5, 0.35), "lines"))
library(gridExtra)
grid.arrange(p1, p2, heights = c(2,1))

I'm not sure if this completely answers your question, but it was too long to put in a comment, so... In response to your second chunk of code in your question, since you've already defined x=value, you can use x instead of value in your definition of y.
stat_density(aes(x=value,y=..scaled..*diff(range(x)) +
min(x)),position='identity',geom='line')
This seems to fix your error and produces the following plot:
The only problem is, of course, if you have data with low y-values, then you're still going to overlap your density curves with your scatterplot. But, if this isn't the case, I personally think this is a fairly informative figure, as long as you can communicate effectively that the y axis values aren't important in interpreting the density curves--only the shapes of the curves are important.

I appreciate everyone's answers, which led me to a better understanding of ggplot's underlying mechanisms. I also realize how awkward my requirement is; ggplot is not going to solve my problem by itself.
I managed to do what I wanted by calculating the densities directly in a separate data frame instead of using ggplot's stat_density:
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
mydf.densities <- do.call('rbind',lapply(unique(mydf.wide$variable), function(var) {
tmp <- mydf.wide[which(mydf.wide$variable==var),c('var','value')]
dfit <- density(tmp$value,cut=0)
scaledy <-dfit$y/max(dfit$y) * diff(range(tmp$var)) + min(tmp$var)
data.frame(x=dfit$x,y=scaledy,variable=rep(var,length(dfit$x)))
}))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
geom_line(aes(x=x,y=y),data=mydf.densities) +
facet_wrap(~variable,scale='free_x')
(I know that the construction of mydf.densities is a bit obfuscated, but I will work on that later).
I'm giving out the bounty to the most voted solution at the end of the day, for your troubles.
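The construction of mydf.densities above can be written a little more plainly. The following is only a sketch: the helper name scaled_densities and the demo data are made up for illustration, and it assumes a long data frame with the columns variable, value and var used in the question.

```r
# Hypothetical helper (not from the original post): one rescaled density per group.
# Assumes a long data frame with columns `variable`, `value`, and `var`.
scaled_densities <- function(long_df) {
  groups <- split(long_df, long_df$variable, drop = TRUE)
  pieces <- lapply(groups, function(grp) {
    dfit <- density(grp$value, cut = 0)
    # Stretch the density heights over the range of `var`:
    # the peak maps to max(var), a density of zero would map to min(var).
    y <- dfit$y / max(dfit$y) * diff(range(grp$var)) + min(grp$var)
    data.frame(x = dfit$x, y = y, variable = grp$variable[1])
  })
  do.call(rbind, pieces)
}

# Small self-contained demo with made-up data
set.seed(1)
demo <- data.frame(
  variable = rep(c("x1", "x2"), each = 50),
  value = c(rnorm(50, mean = 5), rnorm(50, mean = 10)),
  var = runif(100, min = 100, max = 300)
)
dens <- scaled_densities(demo)
```

The result can be passed to geom_line(aes(x=x,y=y),data=...) exactly as mydf.densities is in the plot above.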

Related

ggplot2: Flip axes and maintain aspect ratio of data

In ggplot2, the coord_fixed() coordinate system ensures that the aspect ratio of the data is maintained at a given value. So, the shape of the panel changes to maintain the shape of the data. Meanwhile coord_flip() swaps the axes of the plot. However, a plot in ggplot2 must have exactly one coordinate system, so these functions cannot be combined.
My question is:
Does there exist a way to combine the behaviours of coord_fixed() and coord_flip(), resulting in a coordinate system with the x and y axes exchanged and a fixed aspect ratio of the data?
This is a popular question, however the common answer is incorrect:
How do I to fix aspect ratio and apply coord_flip in ggplot2?
Flipping and maintaining aspect ratio of a chart in ggplot2
The commonly suggested answer is to use coord_flip() together with theme(aspect.ratio = 1) instead of coord_fixed(). However, as per the ggplot2 documentation, this setting refers to the "aspect ratio of the panel." Thus, the data will change shape to maintain the shape of the panel.
I suspect that this is a feature that does not currently exist in ggplot2. But more importantly I think that a correct solution or at least response to this question should be documented.
Quick minimal example of the issue:
library(ggplot2)
x <- 1:100; data <- data.frame(x = x, y = x * 2)
p <- ggplot(data, aes(x, y)) + geom_point()
p # by default panel and data both fit to device window
p + coord_fixed() # panel changes shape to maintain shape of data
p + theme(aspect.ratio = 1) # data changes shape to maintain shape of panel
p + coord_fixed() + coord_flip() # coord_flip() overwrites coord_fixed()
# popular suggested answer does not maintain aspect ratio of data:
p + coord_flip() + theme(aspect.ratio = 1)
I agree that the theme solution isn't really a proper one. Here is a solution that works programmatically by calculating the aspect ratio from the actual axis ranges stored in the plot object, but it takes a few lines of code:
ranges <- ggplot_build(p)$layout$panel_ranges[[1]][c('x.range', 'y.range')]
sizes <- sapply(ranges, diff)
aspect <- sizes[1] / sizes[2]
p + coord_flip() + theme(aspect.ratio = aspect)
The solution I would probably use in practice is the horizontal geoms in the ggstance package (although this may not always be feasible).
Note: The range-based calculation above only gives the exact answer for two continuous scales with equal multiplicative expand arguments (i.e. the default).
Edit: In many cases I would recommend using coord_equal() combined with the ggstance package instead of this solution.
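The arithmetic behind the range-based answer can be separated from the plot object. This is a minimal sketch: flipped_aspect is a hypothetical helper, not a ggplot2 function, and the ranges are passed in as plain numbers because the field holding them in the built plot has changed name across ggplot2 versions.

```r
# Hypothetical helper (not part of ggplot2): the panel aspect ratio
# (height / width) that keeps one data unit the same length on both
# axes once coord_flip() has swapped them.
flipped_aspect <- function(x_range, y_range) {
  # After flipping, x runs vertically and y runs horizontally,
  # so panel height / width = (x extent) / (y extent).
  diff(x_range) / diff(y_range)
}

# Ranges of the example data in the question: x = 1:100, y = 2 * x
asp <- flipped_aspect(c(1, 100), c(2, 200))
```

The value would then be used as in the answer: p + coord_flip() + theme(aspect.ratio = asp). Because both axes are expanded by the same multiplicative factor by default, the ratio of the expanded ranges equals the ratio of the raw data ranges.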
I ended up just flipping the x and y arguments in the aes specification. So for example instead of:
ggplot(mtcars,aes(x=wt,y=drat))+geom_point()+coord_fixed()
I did:
ggplot(mtcars,aes(x=drat,y=wt))+geom_point()+coord_fixed()

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
However, when I add lines that read the same data source, they are misaligned towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this might be? Both layers read the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you; it looks good, but it is very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want x to be a numeric variable, but you have it as a string. If that is not what you want I can change this, but the following transforms your bucket column into a numeric vector:
bucket.list <- strsplit(DATA2$bucket, "[^0-9]+")
# Keep the second piece of each split bucket label
x <- vapply(bucket.list, function(parts) parts[2], character(1))
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other; I hope that's what you needed.
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
This should give you the ribbons and their outlines aligned.

ggplot2 and geom_density: How to remove baseline?

I'm using ggplot as described in "Smoothed density estimates" and entered the following in the R console:
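library(ggplot2)
library(ggplot2movies)  # the movies dataset moved to this package in ggplot2 2.0.0
m <- ggplot(movies, aes(x = rating))
m + geom_density()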
This works, but is there some way to remove the connection between the x-axis and the density plot (the vertical lines that connect the density curve to the x-axis)?
The most consistent way to do so is (thanks to @baptiste):
m + stat_density(geom="line")
My original proposal was to use geom_line with an appropriate stat:
m + geom_line(stat="density")
but this is no longer recommended, since I have received reports that it does not work in every case with newer versions of ggplot2.
The suggested answers don't give exactly the same result as geom_density. Why not draw a white line over the baseline?
+ geom_hline(yintercept=0, colour="white", size=1)
This worked for me.
Another way would be to calculate the density separately and then draw it. Something like this:
a <- density(movies$rating)
b <- data.frame(a$x, a$y)
ggplot(b, aes(x=a.x, y=a.y)) + geom_line()
It's not exactly the same, but pretty close.

Can I change where the x-axis intersects the y-axis in ggplot2?

I'm plotting some index data as a bar chart. I'd like to emphasise the "above index" and "below index"-ness of the numbers by forcing the x-axis to cross at 100 (such that a value of 80 would appear as a -20 bar.)
This is part of a much longer process, so it's hard to share data usefully. Here, though, is some bodge-y code that illustrates the problem (and the beginnings of my solution):
library(ggplot2)
df <- data.frame(label=c("a","b","c"), index=c(118,80,65))
my.plot <- ggplot(df,aes(label,index))
my.plot + geom_bar(stat="identity")  # bar heights come from the data, not from counts
df$adjusted <- df$index - 100  # distance from the index baseline of 100
my.plot2 <- ggplot(df,aes(label,adjusted))
my.plot2 + geom_bar(stat="identity")
I can, of course, change my index calculation to read: (value.new/value.old)*100-100 then title the chart appropriately (something like "xxx relative to index") but this seems clumsy.
So, too, does the approach I've been testing (to run the simple calculation above, then re-label the y-axis.) Is that really the best solution?
No doubt someone's going to tell me that this sort of axis manipulation is frowned upon. If this is the case, please could they point me in the direction of an explanation? At least then I'll have learned something.
This doesn't directly answer your question, but instead of messing about with the x-axis, why not make a single grid line a bit thicker? For example,
dd = data.frame(x = 1:10, y = runif(10))
g = ggplot(dd, aes(x, y)) + geom_point()
g + geom_hline(yintercept=0.2, colour="white", lwd=3)
Or as Paul suggested, with a black line and some text:
g + geom_hline(yintercept=0.2, colour="black", lwd=3) +
annotate("text", x = 2, y = 0.22, label = "Reference")
The coordinate system of your plot has the x-axis and the y-axis crossing at (0,0). This is just the way the coordinate system is defined. You can of course draw a horizontal line at y = 100, but calling that line the x-axis would be wrong.
What you already proposed is to redefine your coordinate system by transforming the data. Whether or not this transformation is appropriate is easier to answer with a reproducible example from your side.
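The transform-then-relabel approach from the question could be sketched as follows. This is only one possible reading: the baseline of 100 and the data come from the question, while the names baseline, index_labels and index_plot are made up for illustration.

```r
# Assumption: the baseline is 100, as in the question.
baseline <- 100
df <- data.frame(label = c("a", "b", "c"), index = c(118, 80, 65))
df$adjusted <- df$index - baseline  # bars are drawn relative to the baseline

# Labeller: turns axis breaks on the adjusted scale back into index values.
index_labels <- function(breaks) breaks + baseline

# Build the plot only if ggplot2 is installed; the steps above are base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  index_plot <- ggplot(df, aes(label, adjusted)) +
    geom_bar(stat = "identity") +
    scale_y_continuous(name = "index", labels = index_labels) +
    geom_hline(yintercept = 0)  # marks the baseline (index = 100)
}
```

The bars still start at zero on the transformed scale, but the axis reads in index units, so a value of 80 appears as a bar hanging 20 units below the line labelled 100.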
