ggplot axis: different y-axis on left and right of plot - r

I have the following data;
https://www.dropbox.com/s/at2f2zni7s1hnzm/results.csv?dl=0
When I plot all three plots using the following;
library(ggplot2)
library(pROC)
library(dplyr)  # provides %>% and filter()
roc <- roc(results$testactual, results$pred)
ggroc(roc) +
geom_density(data = results %>%
filter(testactual == 0), aes(pred), color='green') +
geom_density(data = results %>%
filter(testactual == 1), aes(pred), color='black')
I am able to obtain all three plots on the same graph, but the axes are not as I had hoped.
I am trying to display the y-axis for the density plots on the right side and the y-axis for the ROC curve on the left.
Finally, I want to sort the x-axis so that 1 is on the right and 0 is on the left (I think I can manage this, as I have run into this problem before).
Direct R link to data:
results <- read.csv(url("https://www.dropbox.com/s/at2f2zni7s1hnzm/results.csv?dl=1"))
EDIT: Just plotting the density plots:
Plot of the ROC plot

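Use the sec.axis argument of scale_y_continuous(). You can also use ..scaled.. (after_stat(scaled) in ggplot2 3.3.0 and later) to scale your densities to a maximum of 1, so they share the 0 to 1 range of the ROC curve.
roc <- roc(results$testactual, results$pred)
ggroc(roc) +
  geom_density(data = results %>%
    filter(testactual == 0), aes(x=pred, y=..scaled..), color='green') +
  geom_density(data = results %>%
    filter(testactual == 1), aes(x=pred, y=..scaled..), color='black') +
  # the primary (left) axis belongs to the ROC curve; the secondary (right) axis labels the scaled densities
  scale_y_continuous(name = "Sensitivity", sec.axis = sec_axis(~., name = "Density"))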
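If the densities are not rescaled to a maximum of 1, sec_axis() can also take a transformation so that the right-hand axis reports the original density values. The following is a minimal sketch under stated assumptions: results_sim is simulated stand-in data (not the asker's file), dens_max is a helper introduced here, and the ROC layer is left out so the block runs with ggplot2 alone.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for `results` (hypothetical data, not the asker's file)
results_sim <- data.frame(
  testactual = rep(c(0, 1), each = 500),
  pred = c(rbeta(500, 2, 5), rbeta(500, 5, 2))
)

# Largest density value across both classes; dividing by it maps densities to roughly [0, 1]
dens_max <- max(sapply(split(results_sim$pred, results_sim$testactual),
                       function(v) max(density(v)$y)))

p_sec <- ggplot(results_sim, aes(x = pred, colour = factor(testactual))) +
  # Left axis: densities divided by dens_max, so they fit a 0 to 1 scale
  geom_density(aes(y = after_stat(density) / dens_max)) +
  scale_y_continuous(
    name = "Scaled density",
    # Right axis: multiply back by dens_max to show the original density units
    sec.axis = sec_axis(~ . * dens_max, name = "Density")
  )
p_sec
```

The secondary axis is always a one-to-one transformation of the primary axis, so the data must be rescaled to the primary axis first and the inverse transformation given to sec_axis().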

Related

Something wrong with my segmented bar plot in ggplot2

I want to plot a segmented bar plot in ggplot2. Here is part of my dataframe. I want to plot the proportion of output (0 and 1) for each x1 (0 and 1). But when I use the following code, I get plain black bars without any segmentation. What is the problem here?
fig = ggplot(data=df, mapping=aes(x=x1, fill=output)) + geom_bar(stat="count", width=0.5, position='fill')
The output plot is here
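You need factor variables for your task. With a numeric fill, ggplot2 treats output as continuous and does not split the bars into groups:
library(ggplot2)
df <- data.frame(x1 = sample(0:1, 100, replace = TRUE), output = sample(0:1, 100, replace = TRUE))
ggplot(data = df, aes(x = as.factor(x1), fill = as.factor(output))) +
  # position = "fill" stacks the counts and rescales each bar to proportions
  geom_bar(position = "fill", width = 0.5) +
  labs(x = "x1", y = "proportion", fill = "output")
which gives me: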
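To check what the segments of each bar should represent, the within-group proportions can be computed directly. A minimal sketch, assuming simulated data like the answer's (the seed and the name props are introduced here for illustration):

```r
set.seed(1)
df <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                 output = sample(0:1, 100, replace = TRUE))

# Share of each output value within each x1 group (every row sums to 1)
props <- prop.table(table(x1 = df$x1, output = df$output), margin = 1)
props
```

Each row of props corresponds to one bar, and each column to one colored segment of that bar.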

Density plots using ggplot2 [duplicate]

How can I add shading to both ends of the density curve, like in the picture below?
I want to shade one tail from -Inf to -1.995 and the other from 1.995 to Inf.
I tried the solution here https://stackoverflow.com/a/4371473/3133957 but it doesn't seem to work.
Here is my code:
tmpdata <- data.frame(vals = t.stats)
qplot(x = vals, data=tmpdata, geom="density",
adjust = 1.5,
xlab="sampling distribution of t-statistic",
ylab="frequency") +
geom_vline(xintercept = t.statistic(precip, population.precipitation),
linetype = "dashed") +
geom_ribbon(data=subset(tmpdata,vals>-1.995 & vals<1.995),aes(ymax=max(vals),ymin=0,fill="red",alpha=0.5))
You didn't provide a dataset for your question, so I simulated one to use for this answer. First, make your density plot:
tmpdata <- data.frame(vals = rnorm(10000, mean = 0, sd = 1))
plot <- qplot(x = vals, data=tmpdata, geom="density",
adjust = 1.5,
xlab="sampling distribution of t-statistic",
ylab="frequency")
Then, extract the x and y coordinates used by ggplot to plot your density curve:
area.data <- ggplot_build(plot)$data[[1]]
You can then add two geom_area layers to shade in the left and right tails of your curve via:
plot +
geom_area(data=area.data[which(area.data$x < -1.995),], aes(x=x, y=y), fill="skyblue") +
geom_area(data=area.data[which(area.data$x > 1.995),], aes(x=x, y=y), fill="skyblue")
This will give you the following plot:
Note that you can add your geom_vline layer after this (I left it out because it required data you did not supply in your question).
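An alternative that avoids pulling coordinates out of the built plot is to compute the density yourself with stats::density() and plot the result. This is a sketch under assumptions, not the answerer's method: the data are simulated, and curve_df and cutoff are names introduced here. The curve may differ slightly from the one ggplot2 draws, because the evaluation grid is not the same.

```r
library(ggplot2)

set.seed(123)
vals <- rnorm(10000, mean = 0, sd = 1)  # simulated stand-in for t.stats

# Kernel density estimate with the same bandwidth adjustment as the plot above
d <- density(vals, adjust = 1.5)
curve_df <- data.frame(x = d$x, y = d$y)

cutoff <- 1.995  # tail boundary taken from the question

p_tails <- ggplot(curve_df, aes(x = x, y = y)) +
  geom_line() +
  # Shade only the parts of the curve beyond the cutoffs
  geom_area(data = subset(curve_df, x < -cutoff), fill = "skyblue") +
  geom_area(data = subset(curve_df, x > cutoff), fill = "skyblue") +
  labs(x = "sampling distribution of t-statistic", y = "density")
p_tails
```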

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show interpolated data and a histogram of the raw data for each predictor in the same plot. In other threads, like this one, people explain how to add marginal histograms of the same data that is shown in a scatter plot. In this case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side, counting them towards the lower and upper ends of the plotted range. I also attempted to print the sample size as numbers somewhere on the plot. However, I failed at both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it in the far right panel but it was difficult (/impossible for me) to adjust. I tried to put it on the main graph (which would anyway probably not be the best solution), but it didn’t work either (it removed the histogram and legend, on the right side and it did not plot all geom_texts). I also tried to add a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
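# Function to extract the legend from a ggplot object
# https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend <- function(a.gplot) {
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}
# Shared settings for the marginal plots: blank theme, no fill legend, no margins
# (guides(fill = FALSE) is deprecated in recent ggplot2; "none" does the same thing)
thm = list(theme_void(),
           guides(fill = "none"),
           theme(plot.margin = unit(rep(0, 4), "lines")))
# Axis ranges shared by the main plot and the marginal plots
xrng = c(0.6,4.4)
yrng = c(53,77)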
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
library(gridExtra)  # provides grid.arrange()
pobj = ggarrange(p2, ggplot(), p1, p3,
                 ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
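For the two follow-up tasks in the question's edit, one possible approach is sketched below; it is an illustration under assumptions rather than part of the original answer. Values outside the plotted carat range are clamped to the range limits and flagged so they can be filled in a different color, and the counts are written onto the plot with annotate(). The names carat_lim, n_by_range, and p2_flagged are introduced here, and the bin width and label position are arbitrary choices.

```r
library(ggplot2)  # also provides the diamonds dataset

carat_lim <- c(1, 4)  # carat range covered by the prediction grid in the question

plot_data <- as.data.frame(diamonds)
outside <- plot_data$carat < carat_lim[1] | plot_data$carat > carat_lim[2]
plot_data$carat_range <- factor(ifelse(outside, "outside", "within"))

# Clamp out-of-range values to the nearest limit so they pile up at the edges
plot_data$carat <- pmin(pmax(plot_data$carat, carat_lim[1]), carat_lim[2])

# Sample sizes inside and outside the plotted range
n_by_range <- table(plot_data$carat_range)

p2_flagged <- ggplot(plot_data, aes(carat, fill = carat_range)) +
  geom_histogram(binwidth = 0.1) +
  # x = Inf, y = Inf anchors the label in the top-right corner of the panel
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
           label = paste0("N within = ", n_by_range[["within"]],
                          "\nN outside = ", n_by_range[["outside"]]))
p2_flagged
```

To use this in place of p2 in the layout, add thm and coord_cartesian(xlim=xrng) as before; note that thm hides the fill legend, so the two colors would need to be explained elsewhere.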

ggplot2: How to combine histogram, rug plot, and logistic regression prediction in a single graph

I am trying to plot combined graphs for logistic regressions, as the function logi.hist.plot does, but I would like to do it using ggplot2 (for aesthetic reasons).
The problem is that only one of the histograms should have scale_y_reverse() applied.
Is there any way to specify this in a single plot (see code below), or to overlay the two histograms by using coordinates that can be passed to the previous plot?
ggplot(dat) +
geom_point(aes(x=ind, y=dep)) +
stat_smooth(aes(x=ind, y=dep), method=glm, method.args=list(family="binomial"), se=FALSE) +
geom_histogram(data=dat[dat$dep==0,], aes(x=ind)) +
geom_histogram(data=dat[dat$dep==1,], aes(x=ind)) ## + scale_y_reverse()
This final plot is what I have been trying to achieve:
We use geom_segment to create the "bars" for the histogram and also to create the rug plots. Adjust the size parameter to change the "bar" widths in the histogram. In the example below, the bar heights are equal to the percentage of values within a given x range. If you want to change the absolute heights of the bars, just multiply n/sum(n) by a scaling factor when you create the h data frame of histogram counts.
To generate histogram counts for the plot, we pre-summarize the data to create the histogram values. Note the ifelse statement in the mutate function, which adjusts the values of pct in order to get the upward and downward bars in the plot, depending on whether y is 0 or 1, respectively. You can do this in the plot code itself, but then you need two separate calls to geom_segment.
library(dplyr)
library(ggplot2)
# Fake data
set.seed(1926)
dat = data.frame(y = sample(0:1, 1000, replace=TRUE))
dat$x1 = rnorm(1000, 5, 2) * (dat$y+1)
# Summarise data to create histogram counts
h = dat %>% group_by(y) %>%
mutate(breaks = cut(x1, breaks=seq(-2,20,0.5), labels=seq(-1.75,20,0.5),
include.lowest=TRUE),
breaks = as.numeric(as.character(breaks))) %>%
group_by(y, breaks) %>%
summarise(n = n()) %>%
mutate(pct = ifelse(y==0, n/sum(n), 1 - n/sum(n)))
ggplot() +
geom_segment(data=h, size=4, show.legend=FALSE,
aes(x=breaks, xend=breaks, y=y, yend=pct, colour=factor(y))) +
geom_segment(data=dat[dat$y==0,], aes(x=x1, xend=x1, y=0, yend=-0.02), size=0.2, colour="grey30") +
geom_segment(data=dat[dat$y==1,], aes(x=x1, xend=x1, y=1, yend=1.02), size=0.2, colour="grey30") +
geom_line(data=data.frame(x=seq(-2,20,0.1),
y=predict(glm(y ~ x1, family="binomial", data=dat),
newdata=data.frame(x1=seq(-2,20,0.1)),
type="response")),
aes(x,y), colour="grey50", lwd=1) +
scale_y_continuous(limits=c(-0.02,1.02)) +
scale_x_continuous(limits=c(-1,20)) +
theme_bw(base_size=12)
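The ifelse() step can be checked on a toy example. A minimal sketch, assuming a hypothetical data frame toy with pre-assigned bins (the names toy, h_toy, share, and yend are introduced here): bars for y == 0 grow upward from 0 to share, while bars for y == 1 hang down from 1 to 1 - share.

```r
library(dplyr)

# Hypothetical toy data: three observations with y = 0, four with y = 1
toy <- data.frame(y   = c(0, 0, 0, 1, 1, 1, 1),
                  bin = c(1, 1, 2, 1, 2, 2, 2))

h_toy <- toy %>%
  count(y, bin) %>%                    # observations per (y, bin) cell
  group_by(y) %>%
  mutate(share = n / sum(n),           # share of each bin within its y group
         yend  = ifelse(y == 0, share, 1 - share)) %>%  # end point of each bar
  ungroup()

h_toy$yend
# y == 0 bars end at 2/3 and 1/3; y == 1 bars start at 1 and end at 0.75 and 0.25
```

Because the segments start at y (0 or 1) and end at yend, the same geom_segment call draws both the upward and the downward bars.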
