I've poked around, but been unable to find an answer. I want to do a weighted geom_bar plot overlaid with a vertical line that shows the overall weighted average per facet. I'm unable to make this happen. The vertical line seems to a single value applied to all facets.
require('ggplot2')
require('plyr')
# data vectors
panel <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
instrument <-c("V1","V2","V1","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1")
cost <- c(1,4,1.5,1,4,4,1,2,1.5,1,2,1.5,2,1.5,1,2)
sensitivity <- c(3,5,2,5,5,1,1,2,3,4,3,2,1,3,1,2)
# put an initial data frame together
mydata <- data.frame(panel, instrument, cost, sensitivity)
# add a "contribution to" vector to the data frame: contribution of each instrument
# to the panel's weighted average sensitivity.
myfunc <- function(cost, sensitivity) {
return(cost*sensitivity/sum(cost))
}
mydata <- ddply(mydata, .(panel), transform, contrib=myfunc(cost, sensitivity))
# two views of each panels weighted average; should be the same numbers either way
ddply(mydata, c("panel"), summarize, wavg=weighted.mean(sensitivity, cost))
ddply(mydata, c("panel"), summarize, wavg2=sum(contrib))
# plot where each panel is getting its overall cost-weighted sensitivity from. Also
# put each panel's weighted average on the plot as a simple vertical line.
#
# PROBLEM! I don't know how to get geom_vline to honor the facet breakdown. It
# seems to be computing it overall the data and showing the resulting
# value identically in each facet plot.
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_bar(binwidth=1) +
geom_vline(xintercept=sum(contrib)) +
facet_wrap(~ panel) +
ylab("contrib")
If you pass in the presumarized data, it seems to work:
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_bar(binwidth=1) +
geom_vline(data = ddply(mydata, "panel", summarize, wavg = sum(contrib)), aes(xintercept=wavg)) +
facet_wrap(~ panel) +
ylab("contrib") +
theme_bw()
Example using dplyr and facet_wrap incase anyone wants it.
library(dplyr)
library(ggplot2)
df1 <- mutate(iris, Big.Petal = Petal.Length > 4)
df2 <- df1 %>%
group_by(Species, Big.Petal) %>%
summarise(Mean.SL = mean(Sepal.Length))
ggplot() +
geom_histogram(data = df1, aes(x = Sepal.Length, y = ..density..)) +
geom_vline(data = df2, mapping = aes(xintercept = Mean.SL)) +
facet_wrap(Species ~ Big.Petal)
vlines <- ddply(mydata, .(panel), summarize, sumc = sum(contrib))
ggplot(merge(mydata, vlines), aes(sensitivity, weight = contrib)) +
geom_bar(binwidth = 1) + geom_vline(aes(xintercept = sumc)) +
facet_wrap(~panel) + ylab("contrib")
Related
First, the libraries
library(tidyr)
library(leaps)
library(ggplots2)
library(ggdark)
The value of the model
set.seed(1)
X = rnorm(100)
e = rnorm(100)
Y = 8 + 7*X + 2.5*X^2 - 9*X^3 + e
Fitting
data.all = data.frame(Y,X)
regfit.full = regsubsets(Y~poly(X,10,raw=T), data=data.all, nvmax=10)
(reg.summary = summary(regfit.full))
Then I get the minimum value for each variables
(reg.min.cp = which.min(reg.summary$cp))
(reg.min.bic = which.min(reg.summary$bic))
(reg.min.adjr2 = which.min(reg.summary$adjr2))
Creating the data frame for plot
df = data.frame(reg.summary$cp, reg.summary$bic, reg.summary$adjr2)
df$rownum = 1:nrow(df)
Reshaping the data frame
molten = df %>% gather(variable, value, reg.summary.cp:reg.summary.adjr2 )
Plotting with facets
(lp = molten %>% ggplot(data=.) +
aes(x=rownum, y=value) +
geom_line(col="black") +
geom_point(data=molten, aes(xint=reg.min.adjr2, z="reg.summary.adjr2", col="red")) + # this is where I got the wrong plot
facet_wrap(~variable, scales="free_y")
)
And it shows wrong. What I expect is that the geom_point(data=molten, aes(xint=reg.min.adjr2, z="reg.summary.adjr2", col="red")) will just add the reg.min.adjr2 to the facet reg.summary.adjr2 and only one point.
How to make it in that way?
I got some idea here from these two SO:
How to add different lines for facets
Add a segment only to one facet using ggplot2
What I did is to create first a new data frame for the min values for cp , bic, and adjr2. And then add the points to the main plot.
I make sure that the value for x will be the rownum and the y are the min values. I also added a parameter variable to min_plot to make sure that it will be added to the right facet.
min_plot = data.frame(
rownum=c(reg.min.cp, reg.min.bic, reg.min.adjr2),
y = c(reg.summary$cp[reg.min.cp], reg.summary$bic[reg.min.bic], reg.summary$adjr2[reg.min.adjr2]),
variable=c("reg.summary.cp", "reg.summary.bic", "reg.summary.adjr2"))
(lp = molten %>% ggplot(data=.)
+ aes(x=rownum, y=value)
+ geom_line(col="black")
+ facet_wrap(~variable, scales="free_y")
+ geom_point(data = min_plot, aes(x=rownum, y=y), col="red")
)
I’m totally new to ggplot, relatively fresh with R and want to make a smashing ”before-and-after” scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and X=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using genom_point
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of ”feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then not being able to subset using colors and shapes due to overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using genom_point() as I do in the example below, however I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package
Or maybe it is better to summarize data by x, id, class as mean/median of y, filter out ids producing NAs (e.g. ids 3 and 6), and connect the points by lines? So in case if you don't really need to show variability for some ids (which could be true if the plot only illustrates tendencies) you can do it this way:
library(ggplot)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = T)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
class = factor(class, labels = c("one day", "multiple days")),
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = F) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1) adds random noise to the x-axis so points do not overlap.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)
In ggplot2 I believe the facets (subplots) are all the same size. I think this is very important for making comparisons between the subplots. However, when I convert my plots to plotly objects using ggplotly() the subplots are not always the same size. In particular, the rows seem to be different heights. I am not sure if this is the intended functionality of ggplotly(), a bug, or just user error on my part. I am using ggplot2 version 2.2.1.9000 and plotly 4.7.1.
How can I convert my ggplots to plotly objects while ensuring the subplots are all the same size?
Note: I originally posted this question in the context of using Shiny and plotly. However, the issue seems to be independent of Shiny so I am deleting my original question and reposting.
In the example below, when I use facet_wrap(sex ~ day) the second row is slightly shorter than top and bottom rows. When I use facet_grid(sex ~ day) the rows are equal in height. When I use facet_grid(sex ~ day) the two middle rows are slightly shorter.
library(plotly)
library(reshape2)
# An example plot
myPlot <- ggplot(tips, aes(x = total_bill, y = tip / total_bill)) +
geom_point(shape = 1)
# Facet wrap with sex ~ day
pWrap <- myPlot + facet_wrap(sex ~ day)
qWrap <- ggplotly(pWrap)
q <- qWrap
d <- rbind(q$x$layout$yaxis4$domain, q$x$layout$yaxis3$domain, q$x$layout$yaxis2$domain, q$x$layout$yaxis$domain)
dfWrap <- data.frame(lb = d[,1], ub = d[,2])
dfWrap["range"] <- dfWrap["ub"] - dfWrap["lb"]
dfWrap["facetType"] <- "facet_wrap(sex ~ day)"
# Facet grid with sex ~ day
pGrid1 <- myPlot + facet_grid(sex ~ day)
qGrid1 <- ggplotly(pGrid1)
q <- qGrid1
d <- rbind(q$x$layout$yaxis4$domain, q$x$layout$yaxis3$domain, q$x$layout$yaxis2$domain, q$x$layout$yaxis$domain)
dfGrid1 <- data.frame(lb = d[,1], ub = d[,2])
dfGrid1["range"] <- dfGrid1["ub"] - dfGrid1["lb"]
dfGrid1["facetType"] <- "facet_grid(sex ~ day)"
# Facet grid with day ~ sex
pGrid2 <- myPlot + facet_grid(day ~ sex)
qGrid2 <- ggplotly(pGrid2)
q <- qGrid2
d <- rbind(q$x$layout$yaxis4$domain, q$x$layout$yaxis3$domain, q$x$layout$yaxis2$domain, q$x$layout$yaxis$domain)
dfGrid2 <- data.frame(lb = d[,1], ub = d[,2])
dfGrid2["range"] <- dfGrid2["ub"] - dfGrid2["lb"]
dfGrid2["facetType"] <- "facet_grid(day ~ sex)"
EDIT
After some more digging I found that the plotly uses the get_domains() function to define the layout. Inside this function there is a line for defining the heights of the plots
ys[[i]] <- c(ystart = 1 - (heights[j]) - if (j == 1) 0 else margins[3],
yend = 1 - (heights[j + 1]) + if (j == nrows) 0 else margins[4])
such that the top and bottom rows are higher by the width of the margins. It appears the margins include not only the spacing between the panels (panel.spacing) but also room for axis labels, ticks, panel titles, etc. Therefore, we can find a test case that works reasonably well with ggplot() but totally fails with ggplotly()
p <- ggplot(data = tips %>%
group_by(time, sex, day) %>%
summarize(total_bill = mean(total_bill)),
aes(x = time, y = total_bill)) +
geom_bar(stat = "identity") +
facet_wrap(sex ~ day, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90))
ggplotly(p)
In the following, by selecting free_y, the maximum values of each scale adjust as expected, however, how can I get the minimum values to also adjust? at the moment, they both start at 0, when I really want the upper facet to start at about 99 and go to 100, and the lower facet to start at around 900 and go to 1000.
library(ggplot2)
n = 100
df = rbind(data.frame(x = 1:n,y = runif(n,min=99,max=100),variable="First"),
data.frame(x = 1:n,y = runif(n,min=900,max=1000),variable="Second"))
ggplot(data=df,aes(x,y,fill=variable)) +
geom_bar(stat='identity') +
facet_grid(variable~.,scales='free')
You could use geom_linerange rather than geom_bar. A general way to do this is to first find the min of y for each value of variable and then merge the minimums with the original data. Code would look like:
library(ggplot2)
min_y <- aggregate(y ~ variable, data=df, min)
sp <- ggplot(data=merge(df, min_y, by="variable", suffixes = c("","min")),
aes(x, colour=variable)) +
geom_linerange(aes(ymin=ymin, ymax=y), size=1.3) +
facet_grid(variable ~ .,scales='free')
plot(sp)
Plot looks like:
I'm looking for a way to plot a bar chart containing two different series, hide the bars for one of the series and instead have a line (smooth if possible) go through the top of where bars for the hidden series would have been (similar to how one might overlay a freq polynomial on a histogram). I've tried the example below but appear to be running into two problems.
First, I need to summarize (total) the data by group, and second, I'd like to convert one of the series (df2) to a line.
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,1,2,2,3,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,1,2))
ggplot(df, aes(x=grp, y=val)) +
geom_bar(stat="identity", alpha=0.75) +
geom_bar(data=df2, aes(x=grp, y=val), stat="identity", position="dodge")
You can get group totals in many ways. One of them is
with(df, tapply(val, grp, sum))
For simplicity, you can combine bar and line data into a single dataset.
df_all <- data.frame(grp = factor(levels(df$grp)))
df_all$bar_heights <- with(df, tapply(val, grp, sum))
df_all$line_y <- with(df2, tapply(val, grp, sum))
Bar charts use a categorical x-axis. To overlay a line you will need to convert the axis to be numeric.
ggplot(df_all) +
geom_bar(aes(x = grp, weight = bar_heights)) +
geom_line(aes(x = as.numeric(grp), y = line_y))
Perhaps your sample data aren't representative of the real data you are working with, but there are no lines to be drawn for df2. There is only one value for each x and y value. Here's a modifed version of your df2 with enough data points to construct lines:
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,2,3,1,2,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,0,2))
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(stat="identity", alpha=0.75)
p + geom_line(data=df2, aes(x=grp, y=val), colour="blue")
Alternatively, if your example data above is correct, you can plot this information as a point with geom_point(data = df2, aes(x = grp, y = val), colour = "red", size = 6). You can obviously change the color and size to your liking.
EDIT: In response to comment
I'm not entirely sure what the visual for a freq polynomial over a histogram is supposed to look like. Are the x-values supposed to be connected to one another? Secondly, you keep referring to wanting lines but your code shows geom_bar() which I assume isn't what you want? If you want lines, use geom_lines(). If the two assumptions above are correct, then here's an approach to do that:
#First let's summarise df2 by group
df3 <- ddply(df2, .(grp), summarise, total = sum(val))
> df3
grp total
1 A 5
2 B 8
3 C 3
#Second, let's plot df3 as a line while treating the grp variable as numeric
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(alpha=0.75, stat = "identity")
p + geom_line(data=df3, aes(x=as.numeric(grp), y=total), colour = "red")