I would like to replicate the combined density and scatter plot shown in Figure 3 of this paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6222032/
I would like to do this in base R or with any R package. I know how to draw a density plot and a scatter plot separately, but not how to combine the two as in the paper. Can anyone suggest an approach?
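library(ggplot2)
# Histogram of Sepal.Width with the raw observations jittered in a strip below it
ggplot(iris, aes(Sepal.Width)) +
  geom_histogram(binwidth = 0.3, fill = "gray70", color = "white") +
  # points centred at y = -5 and jittered vertically by up to +/-3
  geom_jitter(aes(y = -5), height = 3, alpha = 0.2) +
  theme_classic()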
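Since the question also asks about base R, here is a minimal sketch of the same idea using a kernel density curve instead of a histogram: the curve comes from density(), and the raw observations are scattered in a strip below zero. The strip position (between -25% and -5% of the curve's peak) and the names d and strip_y are arbitrary choices for illustration, not taken from the paper.

```r
# Base R sketch: density curve with the raw observations jittered in a strip below it
x <- iris$Sepal.Width
d <- density(x)                       # Gaussian kernel, default "nrd0" bandwidth

set.seed(1)                           # reproducible vertical jitter
# Hypothetical layout choice: points between -25% and -5% of the curve's peak
strip_y <- runif(length(x), -0.25, -0.05) * max(d$y)

plot(d, ylim = c(min(strip_y), max(d$y)), main = "", xlab = "Sepal.Width")
polygon(d, col = "gray80", border = NA)                 # shaded area under the curve
lines(d)                                                # redraw the outline over the fill
abline(h = 0, col = "gray50")                           # baseline between curve and points
points(x, strip_y, pch = 16, col = rgb(0, 0, 0, 0.3))   # semi-transparent raw data
```

Adjusting the multipliers in runif() moves the strip closer to or further from the curve.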
I want to plot a segmented (stacked) bar plot in ggplot2. Here is part of my data frame. I want to plot the proportion of output (0 and 1) for each value of x1 (0 and 1). But when I use the following code, all I get is solid black bars without any segmentation. What is the problem here?
fig = ggplot(data=df, mapping=aes(x=x1, fill=output)) + geom_bar(stat="count", width=0.5, position='fill')
The output plot is here
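You need factor variables here. Because output is numeric, ggplot2 treats fill as a continuous aesthetic and does not use it to split the counts into groups, so each bar is drawn as a single block. Converting the variables to factors makes them discrete: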
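library(ggplot2)
set.seed(1)  # reproducible example data
df <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                 output = sample(0:1, 100, replace = TRUE))
# as.factor() makes both variables discrete, so fill splits each bar into segments
ggplot(data = df, aes(x = as.factor(x1), fill = as.factor(output))) +
  geom_bar(width = 0.5, position = "fill") +  # "fill" stacks each bar to proportions
  labs(x = "x1", y = "proportion", fill = "output")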
which gives:
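Alternatively, you can convert the columns to factors once in the data frame, after which the call from the question works unchanged. The sketch below also computes the within-x1 proportions with prop.table() as a quick check of what position = "fill" draws. The simulated df is an assumption, since the real data was not posted.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the question's data frame
df <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                 output = sample(0:1, 100, replace = TRUE))

# Convert to factors in the data itself, so no as.factor() calls are needed in aes()
df$x1 <- factor(df$x1)
df$output <- factor(df$output)

# The original call from the question, now producing segmented bars
fig <- ggplot(data = df, mapping = aes(x = x1, fill = output)) +
  geom_bar(stat = "count", width = 0.5, position = "fill")

# Proportion of each output level within each x1 level (each row sums to 1),
# i.e. the segment heights that position = "fill" draws
props <- prop.table(table(df$x1, df$output), margin = 1)
print(props)
```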
How can I shade both tails of the density curve, as in the picture below?
I want to shade one tail from -Inf to -1.995 and the other from 1.995 to Inf.
I tried the solution here https://stackoverflow.com/a/4371473/3133957, but it doesn't seem to work.
Here is my code:
tmpdata <- data.frame(vals = t.stats)
qplot(x = vals, data=tmpdata, geom="density",
adjust = 1.5,
xlab="sampling distribution of t-statistic",
ylab="frequency") +
geom_vline(xintercept = t.statistic(precip, population.precipitation),
linetype = "dashed") +
geom_ribbon(data=subset(tmpdata,vals>-1.995 & vals<1.995),aes(ymax=max(vals),ymin=0,fill="red",alpha=0.5))
You didn't provide a dataset for your question, so I simulated one to use for this answer. First, make your density plot:
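library(ggplot2)
set.seed(1)
tmpdata <- data.frame(vals = rnorm(10000, mean = 0, sd = 1))
# qplot() is deprecated in recent ggplot2 versions; this is the equivalent ggplot() call
plot <- ggplot(tmpdata, aes(x = vals)) +
  geom_density(adjust = 1.5) +
  labs(x = "sampling distribution of t-statistic", y = "frequency")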
Then, extract the x and y coordinates used by ggplot to plot your density curve:
area.data <- ggplot_build(plot)$data[[1]]
You can then add two geom_area layers to shade in the left and right tails of your curve via:
plot +
geom_area(data=area.data[which(area.data$x < -1.995),], aes(x=x, y=y), fill="skyblue") +
geom_area(data=area.data[which(area.data$x > 1.995),], aes(x=x, y=y), fill="skyblue")
This will give you the following plot:
Note that you can add your geom_vline layer after this (I left it out because it required data you did not supply in your question).
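If you would rather not rely on ggplot_build(), a minimal sketch is to compute the curve yourself with stats::density() and shade the tails with geom_area(). density() uses the same Gaussian kernel and "nrd0" bandwidth rule that geom_density() uses by default, so with the same adjust the curve should match closely, although density() extends it a little past the range of the data. The +/-1.995 cutoffs come from the question; curve_df, left and right are illustrative names.

```r
library(ggplot2)

set.seed(1)
tmpdata <- data.frame(vals = rnorm(10000, mean = 0, sd = 1))  # simulated, as in the answer

# Kernel density estimate; adjust = 1.5 mirrors the question's setting
d <- density(tmpdata$vals, adjust = 1.5)
curve_df <- data.frame(x = d$x, y = d$y)

# Tail regions beyond the +/-1.995 cutoffs
left  <- subset(curve_df, x < -1.995)
right <- subset(curve_df, x >  1.995)

p <- ggplot(curve_df, aes(x = x, y = y)) +
  geom_area(data = left,  fill = "skyblue") +
  geom_area(data = right, fill = "skyblue") +
  geom_line() +  # draw the full curve on top of the shaded tails
  labs(x = "sampling distribution of t-statistic", y = "density")
print(p)
```

A geom_vline() layer can be added to p in the same way as in the answer above.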
I have a scatterplot and need to draw a contour that contains all (or almost all) of the points.
I have managed to do this with stat_density_2d() using bins = 2 and geom = 'polygon'. However, with bins = 2 I still get two contours: an inner one in the 'center' of the polygon and the outer one. I only need the outer one.
What I have (a polygon with 2 bins: an inner one and the outer one):
What I need:
The inner contour looks small in this example, but it looks unprofessional in bigger and more complex graphs.
Example:
library(ggplot2)
library(tibble)  # for tibble()
set.seed(20)
x = rnorm(20, 3)
y = rnorm(20, 4)
points = tibble('x'=rnorm(10, 3), 'y'=rnorm(10, 4))
ggplot2::ggplot(data=points, mapping=aes(x=x, y=y, fill='grey', colour='black')) +
geom_point() +
stat_density_2d(aes(colour='black'), bins=2, geom='polygon') +
scale_fill_identity() + scale_colour_identity() +
geom_vline(xintercept=0, colour = 'black', linetype = 'solid') +
geom_hline(yintercept=0, colour = 'black', linetype = 'solid') +
xlim(-8, 8) + ylim(-8, 8)
Similar questions:
(the package this solution is based on is no longer available)
ggplot: How to draw contour line for 2d scatter plot to outline the data points
How to plot a contour line showing where 95% of values fall within, in R and in ggplot2
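If the goal is simply one outline that encloses every point, an alternative (a different technique from a density contour) is the convex hull: grDevices::chull() returns the indices of the points on the hull, and those rows can be drawn with geom_polygon(). This gives exactly one outline by construction, but it follows the outermost points tightly instead of smoothing around them, and unlike a density contour it cannot leave out outliers. The data below is simulated the same way as in the question's example; hull is an illustrative name.

```r
library(ggplot2)
library(tibble)

set.seed(20)
points <- tibble(x = rnorm(10, 3), y = rnorm(10, 4))

# chull() gives the row indices of the convex-hull vertices, in clockwise order
hull <- points[chull(points$x, points$y), ]

p <- ggplot(points, aes(x = x, y = y)) +
  geom_polygon(data = hull, fill = "grey", colour = "black", alpha = 0.5) +
  geom_point() +
  geom_vline(xintercept = 0, colour = "black") +
  geom_hline(yintercept = 0, colour = "black") +
  xlim(-8, 8) + ylim(-8, 8)
print(p)
```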
I'm trying to make a plot that overlays a bunch of simulated density plots that are one color with low alpha and one empirical density plot with high alpha in a new color. This produces a plot that looks about how I want it.
library(ggplot2)
model <- c(1:100)
values <- rnbinom(10000, 1, .4)
df = data.frame(model, values)
empirical_data <- rnbinom(1000, 1, .3)
ggplot() +
geom_density(aes(x=empirical_data), color='orange') +
geom_line(stat='density',
data = df,
aes(x=values,
group = model),
color='blue',
alpha = .05) +
xlab("Value")
However, it doesn't have a legend and I can't figure out how to add a legend to differentiate plots from df and plots from empirical_data.
The other road I started to go down was to put them all in one dataframe but I couldn't figure out how to change the color and alpha for just one of the density plots.
Moving color = ... inside aes() turns it from a fixed setting into a mapping, which is what makes ggplot2 draw a legend for it. The mapped strings ('a' and 'b') are just keys: the actual colors and the legend labels are set in scale_color_manual(), so you can change them there to whatever you want.
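ggplot() +
  geom_density(aes(x = empirical_data, color = 'a')) +
  geom_line(stat = 'density',
            data = df,
            aes(x = values,
                group = model,
                color = 'b'),
            alpha = .05) +
  # explicit breaks fix the order, so each label lines up with the right key
  scale_color_manual(name = 'data source',
                     breaks = c('a', 'b'),
                     values = c('a' = 'orange', 'b' = 'blue'),
                     labels = c('empirical_data', 'df')) +
  # draw the legend keys opaque; otherwise 'b' is nearly invisible at alpha = .05
  guides(color = guide_legend(override.aes = list(alpha = 1))) +
  xlab("Value")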
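For the other approach mentioned in the question, putting everything in one data frame, a sketch is to add a column recording where each value came from and map both color and alpha to it, then set the per-source values with scale_color_manual() and scale_alpha_manual(). The column name source, the level names "simulated" and "empirical", and the dummy model id 0 for the empirical curve are assumptions made for this example.

```r
library(ggplot2)

set.seed(1)
model <- 1:100
values <- rnbinom(10000, 1, .4)
empirical_data <- rnbinom(1000, 1, .3)

# One long data frame; 'source' records where each value came from
sim <- data.frame(source = "simulated", model = model, values = values)  # model recycled over 10000 rows
emp <- data.frame(source = "empirical", model = 0, values = empirical_data)
all_data <- rbind(sim, emp)

p <- ggplot(all_data, aes(x = values,
                          group = interaction(source, model),  # one curve per model
                          color = source, alpha = source)) +
  geom_line(stat = "density") +
  scale_color_manual(name = "data source",
                     values = c(simulated = "blue", empirical = "orange")) +
  scale_alpha_manual(name = "data source",
                     values = c(simulated = 0.05, empirical = 1)) +
  xlab("Value")
print(p)
```

Because the color and alpha scales share a name and the same levels, ggplot2 merges them into a single legend; the simulated key will look faint since it is drawn at alpha 0.05.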