Overlay density of one variable on trend line


I have some data:
dat <- data.frame(x=rnorm(100,100,100),y=rnorm(100,100,100))
I can plot it with a local trend line:
ggplot(dat, aes(x,y)) + stat_smooth()
But I want to overlay a density curve showing the distribution of x on the same plot. In other words, I want to combine the previous graph with this one (the y-axis is different, but I only care about relative differences in the density curve anyway):
ggplot(dat, aes(x)) + geom_density()
I know that stat_binhex(), stat_sum(), etc. show where the data fall, but my data have only a few distinct y values, so what stat_binhex() and similar functions plot is hard to read.

You can plot histograms with density curves along two sides of the scatterplot (marginal plots). In the example below I also included a 95% confidence ellipse:
library(ggplot2)
library(gridExtra)  # for grid.arrange()
# stat_ellipse() has been part of ggplot2 since version 1.0.0; older versions needed
# devtools::source_url("https://raw.github.com/low-decarie/FAAV/master/r/stat-ellipse.R")
htop <- ggplot(data=dat, aes(x=x)) +
geom_histogram(aes(y=..density..), fill = "white", color = "black", binwidth = 2) +
stat_density(colour = "blue", geom="line", size = 1.5, position="identity", show.legend=FALSE) +
scale_x_continuous("x-var", limits = c(-200,400), breaks = c(-200,0,200,400)) +
scale_y_continuous("Density", breaks=c(0.0,0.01,0.02), labels=c(0.0,0.01,0.02)) +
theme_bw() + theme(axis.title.x = element_blank())
blank <- ggplot() + geom_point(aes(1,1), colour="white") +
theme(axis.ticks=element_blank(), panel.background=element_blank(), panel.grid=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot(data=dat, aes(x=x, y=y)) +
geom_point(size = 0.6) + stat_ellipse(level = 0.95, size = 1, color="green") +
scale_x_continuous("x-var", limits = c(-200,400), breaks = c(-200,0,200,400)) +
scale_y_continuous("y-var", limits = c(-200,400), breaks = c(-200,0,200,400)) +
theme_bw()
hright <- ggplot(data=dat, aes(x=y)) +
geom_histogram(aes(y=..density..), fill = "white", color = "black", binwidth = 1) +
stat_density(colour = "red", geom="line", size = 1, position="identity", show.legend=FALSE) +
scale_x_continuous("y-var", limits = c(-200,400), breaks = c(-200,0,200,400)) +
scale_y_continuous("Density", breaks=c(0.0,0.01,0.02), labels=c(0.0,0.01,0.02)) +
coord_flip() + theme_bw() + theme(axis.title.y = element_blank())
grid.arrange(htop, blank, scatter, hright, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
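The result (image not included): the x histogram on top, the scatterplot with its confidence ellipse at the bottom left, and the y histogram on the right.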
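If you would rather have the density on the same panel as the trend line, as the question asks, one option is to rescale the density so that it spans the y range of the data. This is a sketch under stated assumptions, not part of the answer above: it needs ggplot2 >= 3.3.0 for after_stat(), it uses the `scaled` variable computed by stat_density() (the density divided by its maximum), and the names y_min and y_rng are introduced only for illustration. Only the shape of the curve is meaningful, which matches the question's interest in relative differences.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(100, 100, 100), y = rnorm(100, 100, 100))

# Illustrative helpers: map the [0, 1] scaled density onto the y range of the data
y_min <- min(dat$y)
y_rng <- diff(range(dat$y))

p <- ggplot(dat, aes(x, y)) +
  stat_smooth(method = "loess", formula = y ~ x) +
  # `scaled` is the density divided by its maximum, so it lies in [0, 1]
  stat_density(aes(x = x, y = after_stat(scaled) * y_rng + y_min),
               geom = "line", position = "identity", colour = "red",
               inherit.aes = FALSE)
p
```

The secondary y-axis is deliberately left out: the rescaled curve has no meaningful density units on the y scale, only a relative shape.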

Related

In R, how can I draw a negative (1:1 ratio) line in graph?

Here is some data:
library(ggplot2)
x<- c(10,25,35,45,55)
y<- c(30,50,25,17,17)
dataA<- data.frame (x,y)
windows(width=5.5, height=5)  # open the (Windows-only) graphics window before plotting, not with +
ggplot(data=dataA, aes(x=x, y=y)) +
geom_point (col="Black", size=4) +
scale_x_continuous(breaks= seq(0,70,10), limits = c(0,70)) +
scale_y_continuous(breaks= seq(0,70,10), limits = c(0,70)) +
labs(x="x", y="y") +
theme(axis.line = element_line(size = 0.5, colour = "black"))
I want to draw a negative 1:1 ratio line (not a regression line), so I added geom_abline() with slope=-1:
windows(width=5.5, height=5)
ggplot(data=dataA, aes(x=x, y=y)) +
geom_point (col="Black", size=4) +
scale_x_continuous(breaks= seq(0,70,10), limits = c(0,70)) +
scale_y_continuous(breaks= seq(0,70,10), limits = c(0,70)) +
geom_abline (slope=-1, linetype = "dashed", color="Red") +
labs(x="x", y="y") +
theme(axis.line = element_line(size = 0.5, colour = "black"))
The line does not show up as a full 1:1 line across the plot. If I use geom_abline(slope=1, linetype = "dashed", color="Red") instead, it works well.
How do I draw a negative 1:1 ratio line?
Thanks,

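The slope -1 line you're drawing passes through the origin (0, 0), because geom_abline() uses intercept = 0 unless you set it, so inside your 0 to 70 axes only that corner is visible. I'm guessing you want to move the line so more of it shows: shift it up by 70, i.e. draw y = 70 - x with geom_abline(slope = -1, intercept = 70, ...). For a dynamic plot you could replace 70 with a value computed from the data, such as max(y).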
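A minimal sketch of that suggestion using the question's dataA: with the intercept equal to the upper axis limit, the line y = 70 - x runs from (0, 70) to (70, 0) across the whole square plotting area. The `lims` variable is introduced only for illustration, and this placement relies on both axes starting at 0.

```r
library(ggplot2)

x <- c(10, 25, 35, 45, 55)
y <- c(30, 50, 25, 17, 17)
dataA <- data.frame(x, y)

lims <- c(0, 70)  # illustrative: shared limits for both axes

p <- ggplot(dataA, aes(x = x, y = y)) +
  geom_point(colour = "black", size = 4) +
  scale_x_continuous(breaks = seq(0, 70, 10), limits = lims) +
  scale_y_continuous(breaks = seq(0, 70, 10), limits = lims) +
  # slope -1 through (0, lims[2]): y = lims[2] - x, the anti-diagonal of the plot
  geom_abline(slope = -1, intercept = lims[2], linetype = "dashed", colour = "red")
p
```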

How to add a second legend using different `geom_line`?

I am plotting the relationship between two variables (X and Y) for different individuals (IDs). The relationship is shown both with the raw values (geom_point) and with lines representing each individual's predicted relationship from a linear mixed-effects model (LME). On top of that, the predictions are computed at three levels of a second, quantitative predictor (Z).
So I use geom_point() to show the raw X and Y values, and then three geom_line() layers, one per level of Z, for the LME predictions. Each geom_line() draws six lines (one per ID) for a fixed Z, so with three Z levels and three geom_line() layers I get 18 lines.
I tried this (note: code is simplified):
Plot_legend <- ggplot(df, aes(x=X, y=Y, colour=ID)) +
geom_point(size=1.5,alpha=0.2) +
geom_line(aes(y=predict(model,df.Z_low), group=ID, linetype = c("1")), size=1.5, alpha=0.6, color = line_colors[3]) +
geom_line(aes(y=predict(model,df.Z_medium), group=ID, linetype = c("2")), size=1.5, alpha=0.6, color = line_colors[2]) +
geom_line(aes(y=predict(model,df.Z_high), group=ID, linetype = c("3")), size=1.5, alpha=0.6, color = line_colors[1]) +
geom_abline(aes(slope=1,intercept=0),linetype="dashed",color="grey52",size=1.5) +
theme_bw() +
theme(legend.text=element_text(size=18),
legend.title = element_text(size=19, face = "bold",hjust = 0.5),
legend.key=element_blank(),
legend.background = element_rect(colour = 'black', fill = 'white', size = 1, linetype='solid')) +
guides(color=guide_legend(override.aes=list(fill=NA)))
However, as you can see, the legend for the three geom_line() layers is not what I want. I would like its title to be Z instead of c("10th"). Also, the colours in that legend do not match the actual colours of the geom_line() layers, and some of the legend lines are dashed.
Does anyone know how to solve this?
(Figure: plot produced using Duck's advice.)

Try this approach. As no data was shared I can't test it, but it should point you in the right direction:
library(ggplot2)
#Code
Plot_legend <- ggplot(df, aes(x=X, y=Y, colour=ID)) +
geom_point(size=1.5,alpha=0.2) +
geom_line(aes(y=predict(model,df.Z_low), group=ID, linetype = c("1")),
size=1.5, alpha=0.6, color = line_colors[3]) +
geom_line(aes(y=predict(model,df.Z_medium), group=ID, linetype = c("2")),
size=1.5, alpha=0.6, color = line_colors[2]) +
geom_line(aes(y=predict(model,df.Z_high), group=ID, linetype = c("3")),
size=1.5, alpha=0.6, color = line_colors[1]) +
geom_abline(aes(slope=1,intercept=0),linetype="dashed",color="grey52",size=1.5) +
theme_bw() +
scale_linetype_manual(values=c('solid','solid','solid'))+
# colour is also mapped to ID (in geom_point), so this scale needs one value per ID level
scale_color_manual(values=c(line_colors[3],line_colors[2],line_colors[1]))+
labs(linetype='Z') +
theme(legend.text=element_text(size=18),
legend.title = element_text(size=19, face = "bold",hjust = 0.5),
legend.key=element_blank(),
legend.background = element_rect(colour = 'black', fill = 'white', size = 1, linetype='solid')) +
guides(color=guide_legend(override.aes=list(fill=NA)))

In the end, I used the following code:
Plot_legend <- ggplot(df, aes(x=X, y=Y, colour=ID)) +
geom_point(size=1.5,alpha=0.2) +
geom_abline(aes(slope=1,intercept=0),linetype="dashed",color="grey52",size=1.5) +
theme_bw() +
theme(legend.text=element_text(size=18),
legend.title = element_text(size=19, face = "bold",hjust = 0.5),
legend.key=element_blank(),
legend.background = element_rect(colour = 'black', fill = 'white', size = 1, linetype='solid')) +
guides(color=guide_legend(override.aes=list(fill=NA)))
Plot_legend
Plot_legend_2 <- Plot_legend +
geom_line(aes(y=predict(model,df.Z_low), group=ID, linetype = "m1"), size=1.5, alpha=0.6, color = line_colors[3]) +
geom_line(aes(y=predict(model,df.Z_medium), group=ID, linetype ="m2"), size=1.5, alpha=0.6, color = line_colors[2]) +
geom_line(aes(y=predict(model,df.Z_high), group=ID, linetype ="m3"), size=1.5, alpha=0.6, color = line_colors[1]) +
scale_linetype_manual(values = c(m1 = "solid", m2 = "solid", m3 = "solid"),labels = c(m1 = "1", m2 = "2", m3 = "3")) +
labs(color = "ID", linetype = expression(Z)) +
guides(linetype = guide_legend(override.aes = list(color = line_colors[c(3, 2, 1)])))  # key colours in m1, m2, m3 order, matching the lines
Plot_legend_2
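A runnable sketch of the same pattern with made-up data, since the original model and data are not available. Everything here is hypothetical: the IDs, the palette in line_colors, and the straight-line predictions pred_low, pred_med and pred_high, which only stand in for predict(model, ...) at three levels of Z (no LME is fitted). Each geom_line() maps linetype to a constant label and sets a fixed colour; scale_linetype_manual() makes all keys solid, `breaks` fixes the legend order, and override.aes recolours the keys in that same order.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in data: three IDs with ten points each
df <- data.frame(ID = factor(rep(c("a", "b", "c"), each = 10)), X = rep(1:10, 3))
df$Y <- df$X + rnorm(30)

# Hypothetical stand-ins for predict(model, ...) at three levels of Z
df$pred_low  <- 0.8 * df$X
df$pred_med  <- 1.0 * df$X
df$pred_high <- 1.2 * df$X

line_colors <- c("steelblue", "darkorange", "firebrick")  # hypothetical palette

p <- ggplot(df, aes(x = X, y = Y, colour = ID)) +
  geom_point(alpha = 0.3) +
  # each layer maps linetype to a constant label and sets a fixed colour
  geom_line(aes(y = pred_low,  group = ID, linetype = "low"),    colour = line_colors[1]) +
  geom_line(aes(y = pred_med,  group = ID, linetype = "medium"), colour = line_colors[2]) +
  geom_line(aes(y = pred_high, group = ID, linetype = "high"),   colour = line_colors[3]) +
  # all keys solid; breaks fix the legend order so it matches the colours below
  scale_linetype_manual(values = c(low = "solid", medium = "solid", high = "solid"),
                        breaks = c("low", "medium", "high")) +
  labs(colour = "ID", linetype = "Z") +
  # recolour the linetype legend keys, in the same order as the breaks
  guides(linetype = guide_legend(override.aes = list(colour = line_colors)))
p
```

Because the fixed colour overrides the inherited colour = ID mapping in the line layers, the ID legend only comes from geom_point, and the Z legend only from the three lines.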

Broken confidence interval areas when using ylim in ggplot2

I've been using ggplot2 for a long time but have never run into this issue. I am plotting the confidence intervals of some regressions, and I decided to set the y limits manually with ylim(). I noticed that the confidence bands are broken wherever they exceed the y limits. See this picture:
The red regression on the right has very wide confidence limits. As you can see, there is a gap in its band because the band's highest point is outside the ylim range.
This is the code I used:
ggplot(dataset, aes(x=variable, y=value, fill=Species, colour=Species, linetype = Species)) +
geom_smooth(method="lm", formula= y~poly(x,3), level=0.95, alpha=0.2) +
xlab("A") +
ylab("B") +
ylim(0, 30) +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5, size = 10),
panel.background = element_blank(),
legend.position='bottom',
panel.grid.major = element_line(colour="azure2"),
axis.line = element_line(colour = "black",
size = 0.15, linetype = "solid")) +
scale_x_continuous(breaks=seq(1, 10, 1), limits=c(1, 10)) +
scale_color_manual(values=c("coral4", "coral1", "darkolivegreen3", "darkgoldenrod4", "darkgoldenrod2", "deepskyblue3", "darkorchid3")) +
scale_fill_manual(values=c("coral4", "coral1", "darkolivegreen3", "darkgoldenrod4", "darkgoldenrod2", "deepskyblue3", "darkorchid3")) +
scale_linetype_manual(values=c(1,1,1,3,3,2,2))
I would like to keep these y limits. I used coord_cartesian with no success. Can anybody help me?

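coord_cartesian should work, but you have to remove the ylim(). ylim() sets the scale limits, so every value outside them (the raw data as well as the computed confidence bounds) is replaced by NA: the band breaks wherever it crosses a limit, and points outside the limits are even dropped before the model is fitted. coord_cartesian() only zooms the view and leaves the data untouched.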
Some data
library(ggplot2)
set.seed(1)
df <- tibble::tibble(x = -5:5, y = rnorm(11, x^2, 5))  # data_frame() is deprecated in favour of tibble()
Replicating your problem
ggplot(df, aes(x, y)) +
geom_smooth() +
ylim(-1, NA)
With coord_cartesian
ggplot(df, aes(x, y)) +
geom_smooth() +
coord_cartesian(ylim = c(-1, 40))
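To see exactly what ylim() does to a ribbon, here is a minimal deterministic sketch with hand-made bounds instead of a fitted model (the data frame `rib` and its values are made up for illustration). With ylim(), the single upper bound above the limit becomes NA, which is what opens the gap; with coord_cartesian(), every value is kept and only the visible range changes.

```r
library(ggplot2)

# Hypothetical ribbon bounds; the upper bound at x = 3 exceeds the limit of 8
rib <- data.frame(x = 1:5, lo = c(1, 2, 3, 4, 5), hi = c(3, 5, 9, 6, 6))

p_ylim  <- ggplot(rib, aes(x, ymin = lo, ymax = hi)) + geom_ribbon() + ylim(0, 8)
p_coord <- ggplot(rib, aes(x, ymin = lo, ymax = hi)) + geom_ribbon() +
  coord_cartesian(ylim = c(0, 8))

# Scale limits censor out-of-range values to NA, which breaks the ribbon
layer_data(p_ylim)$ymax
# Coordinate limits leave the data alone and only change the visible range
layer_data(p_coord)$ymax
```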

Displaying a subset of features in a facet of a multi-layer plot

I'm trying to generate a multi-layered plot where the points in one layer get displayed only in some of the facets created from the data of another layer. In the code below, the points in red are either x1 or x2 (just like the row labels of the facets).
library(ggplot2)
set.seed(1000)
#generate first df
df1 = data.frame(x=rep(rep(seq(2,8,2),4),4),
y=rep(rep(seq(2,8,2),each=4),4),
v1=rep(c("x1","x2"),each=32),
v2=rep(rep(c("t1","t2"),each=16),2),
v3=rbinom(64,1,0.5))
# generate second df
df2 = data.frame(x=runif(20)*10,
y=runif(20)*10,
v4=sample(c("x1","x2"),20,TRUE))
# create theme
t1=theme(panel.grid.major = element_blank(), text = element_text(size=18),
panel.grid.minor = element_blank(), strip.background= element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
# plot
ggplot() +
geom_point(data=df1, aes(x=x, y=y, colour = factor(v3)), shape=15, size=5) +
scale_colour_manual(values = c(NA,"black")) + facet_grid(v1~v2) +
geom_point(data=df2, aes(x=x,y=y, shape=v4), colour="red", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10) + t1
EDIT: The black squares are generated by manually setting the colour of df1$v3 = 1 to black and df1$v3 = 0 to NA. /EDIT
But what I actually want is to display only those points from df2 with df2$v4 = x1 in the first row of facets, and df2$v4 = x2 in the second row of facets (corresponding to the values of df1$v1 and the row labels of the facet).
I've done this by generating two separate graphs...
ggplot() +
geom_point(data=df1[df1$v1=="x1",], shape=15, size=5,
aes(x=x, y=y, colour = factor(v3))) +
scale_colour_manual(values = c(NA,"black")) + facet_grid(~v2) +
geom_point(data=df2[df2$v4=="x1",], aes(x=x,y=y), colour="red", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10) + t1
ggplot() +
geom_point(data=df1[df1$v1=="x2",], shape=15, size=5,
aes(x=x, y=y, colour = factor(v3))) +
scale_colour_manual(values = c(NA,"black")) + facet_grid(~v2) +
geom_point(data=df2[df2$v4=="x2",], aes(x=x,y=y), colour="red", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10) + t1
... but I'm curious if a single plot can be generated because with my actual data set I have several x's and it is time consuming to piece the graphs together.

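Does it help if we just rename df2$v4 to v1, or make a new column called df2$v1, for faceting purposes? Use one of these two options:
df2 <- dplyr::rename(df2, v1 = v4)  # option 1: rename v4 to v1
# df2$v1 <- df2$v4                  # option 2: copy v4 into a new column v1
# either works, but not both in sequence: after option 1, df2$v4 no longer exists
Then ggplot will distribute the data points the way you want, because facet_grid(v1~v2) now finds v1 in df2 as well: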
ggplot() +
geom_point(data=df1, aes(x=x, y=y, colour = factor(v3)), shape=15, size=5) +
scale_colour_manual(values = c(NA,"black")) +
facet_grid(v1~v2) +
geom_point(data=df2, aes(x=x,y=y), colour="red", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10) +
t1
I'm not 100% sure I grasp your problem, though.
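A self-contained sketch of the renaming idea on a smaller, made-up version of the question's data (two facet rows x1/x2 and two columns t1/t2; all values are hypothetical). Because df2 now has a v1 column but no v2 column, facet_grid(v1 ~ v2) places each red point only in the row matching its v1 value and repeats it across both columns.

```r
library(ggplot2)

set.seed(1000)
# Smaller made-up version of the question's data: two points per facet panel
df1 <- data.frame(x = rep(c(2, 8), 4), y = rep(c(2, 8), each = 2, times = 2),
                  v1 = rep(c("x1", "x2"), each = 4),
                  v2 = rep(c("t1", "t2"), each = 2, times = 2))
df2 <- data.frame(x = runif(6) * 10, y = runif(6) * 10,
                  v4 = c("x1", "x1", "x1", "x2", "x2", "x2"))

# give df2 the row-facet variable used by df1
df2$v1 <- df2$v4

p <- ggplot() +
  geom_point(data = df1, aes(x = x, y = y), shape = 15, size = 5) +
  geom_point(data = df2, aes(x = x, y = y), colour = "red", size = 4) +
  facet_grid(v1 ~ v2) +
  coord_equal(ratio = 1) + xlim(0, 10) + ylim(0, 10)
p
```

Without the v1 column, df2 would have no facet variables at all and every red point would be repeated in all four panels.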

R ggplot : Can't change y-axis scale range with facetted plot

I have a simple problem, but I can't figure out why it doesn't work: I can't adjust the y-axis range on my faceted bar plot:
# Data #
df<-as.data.frame(c("x","y","z","x","y","z","x","y","z","x","y","z"))
colnames(df)<-"x"
df$y<-c(10,15,20,5,25,45,10,10,20,40,20,5)
df$facet<-c(1,1,1,1,1,1,2,2,2,2,2,2)
df$group<-c("A","A","A","B","B","B","A","A","A","B","B","B")
# Plot #
ggplot(df, aes(x=x, y=y, fill=group)) +
facet_grid( ~ facet) +
scale_fill_manual(values=c("blue", "red")) +
geom_bar(position="dodge", stat="identity") +
theme(strip.text = element_text(face="bold", size=rel(1)),
strip.background = element_rect(fill="white", colour="white", size=1)) +
theme(panel.spacing = unit(1, "lines")) +  # panel.spacing replaces the deprecated panel.margin
scale_x_discrete(expand = c(0, 0)) +
theme(panel.grid.major.x = element_blank()) + theme(axis.ticks.x = element_blank()) +
theme(legend.background=element_blank()) +
scale_y_continuous("%", breaks=seq(0, 50, 10), minor_breaks=seq(0,50,5), expand = c(0, 0))
I would like the y-axis to go up to 50, but strangely, scale_y_continuous does not do it, producing this result:

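You need to add a limits argument to scale_y_continuous():
scale_y_continuous("%", limits=c(0,50), breaks=seq(0, 50, 10), minor_breaks=seq(0,50,5), expand = c(0, 0))
Otherwise you only define where the breaks are, not the range of the axis. Without limits, the range is trained on the data, whose largest value is 45, so the break at 50 falls outside it and is not drawn.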
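A sketch comparing the two calls on the question's data (the names base, p_breaks and p_limits are only for illustration). With breaks alone, the y scale is trained on the bars, from 0 to 45; adding limits = c(0, 50) fixes the range regardless of the data.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(x = rep(c("x", "y", "z"), 4),
                 y = c(10, 15, 20, 5, 25, 45, 10, 10, 20, 40, 20, 5),
                 facet = rep(c(1, 2), each = 6),
                 group = rep(c("A", "A", "A", "B", "B", "B"), 2))

base <- ggplot(df, aes(x = x, y = y, fill = group)) +
  facet_grid(~ facet) +
  geom_bar(position = "dodge", stat = "identity")

# breaks only: the axis range is still trained on the data (0 to 45)
p_breaks <- base + scale_y_continuous("%", breaks = seq(0, 50, 10), expand = c(0, 0))

# limits set explicitly: the axis runs from 0 to 50
p_limits <- base + scale_y_continuous("%", limits = c(0, 50), breaks = seq(0, 50, 10),
                                      expand = c(0, 0))
p_limits
```

Note that with the default expansion instead of expand = c(0, 0), the axis would extend slightly past 45 but still stop short of 50, so limits is needed either way.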
