I have 4 variables (A, B, C, D) that follow a similar pattern across 3 locations. I would like to plot a box plot (variables as dots on the y-axis, locations on the x-axis). But the variables have values of different orders of magnitude. Is there a way of scaling the y-axis so that all variables are plotted on the box plots? Maybe differentiated by colour.
Location = c("Washington","Washington","Washington","Washington","Washington","Washington", "Maine","Maine","Maine","Maine","Maine", "Florida","Florida","Florida","Florida","Florida","Florida")
A = c(0.000693156, 0.000677354, 0.000727863, 0.000650822, 0.000908343, 0.001126689, 0.001316292, 0.000975274, 0.00109082, 0.001057585, 0.000927826, 0.000552769, 0.000532546, 0.000559781, 0.000771569, 0.000563436, 0.000551136)
B = c(0.001915388, 0.001936627, 0.001476521, 0.001573681, 0.002584282, 0.00738909, 0.008089839, 0.006616564, 0.00495211, 0.004515925, 0.003791596, 0.000653847, 0.000350701, 0.000559781, 0.001920087, 0.000738206, 0.001077627)
C = c(0.000138966, 0.000104745, 0.000145573, 0.000103305, 5.08255E-05, 0.000361988, 0.000264876, 0.000454172, 0.000277471, 0.000117919, 8.9214E-05, 0.000173727, 0.000108241, 8.54628E-05, 2.35593E-05, 3.1302E-05, 1.12019E-05)
D = c(0.000108829, 0.000135005, 0.000120617, 9.29746E-05, 0.000105561, 9.27596E-05, 0.000121317, 0.000131471, 0.000152503, 0.000128974, 0.000196271, 0.000142141, 0.000147208, 0.00013674, 0.000147246, 0.000185204, 0.000103058)
df = data.frame(Location, A, B, C, D)
Here is what I have tried for two variables, as separate graphs:
library(ggplot2)
a <- ggplot(df, aes(x=Location, y=A)) +
geom_boxplot()
a + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="red")
b <- ggplot(df, aes(x=Location, y=B)) +
geom_boxplot()
b + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="blue")
Can I merge all 4 variables in 1 graph with a scaled Y-axis?
Can I add a legend only showing "A" and "D"?
If you reshape your data to "long" format, faceting is one option. Note that you need to set scales = 'free' (or scales = 'free_y') in facet_wrap() so that each panel gets its own y-axis range.
library(tidyverse)
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value')
g <- ggplot(data = df.long, aes(x = Location, y = value)) +
geom_boxplot() +
facet_wrap(facets = ~variable, scales = 'free')
print(g)
If you want everything on one plot, you have to rescale the data per variable. Here I've min-max normalized each data point to between 0 and 1, relative to the range of its own variable.
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value') %>%
group_by(variable) %>%
mutate(value_norm = value - min(value),
value_norm = value_norm / max(value_norm)
)
g.norm <- ggplot(data = df.long, aes(x = Location, y = value_norm, fill = variable)) +
geom_boxplot()
print(g.norm)
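The same per-variable min-max scaling can be sketched in base R with ave(), which applies a function within each group and returns a vector in the original row order. This assumes long data with columns named variable and value (as pivot_longer() produces above); the toy values and the helper name minmax are illustrative only.

```r
# Toy long-format data: two variables on very different scales (illustrative values)
long <- data.frame(
  variable = rep(c("A", "C"), each = 3),
  value    = c(0.0007, 0.0009, 0.0013, 0.00001, 0.0001, 0.0004)
)

# Hypothetical helper: min-max scale a numeric vector to [0, 1]
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# ave() applies minmax() separately within each variable and keeps the row order
long$value_norm <- ave(long$value, long$variable, FUN = minmax)
print(long)
```

scales::rescale() performs the same mapping to [0, 1] by default. Note that a variable whose values are all identical would produce NaN here, because max(x) - min(x) is zero.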
Try this, using scale_y_log10(). It's not the most beautiful plot, but ...
library(ggplot2)
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-Location) %>%
ggplot(aes(x=Location, y=value, color = name)) +
geom_boxplot() +
geom_dotplot(aes(fill = name), color = "black", binaxis='y', dotsize=.5) +
scale_y_log10()
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2020-04-14 by the reprex package (v0.3.0)
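For the second question (a legend that shows only "A" and "D"), one option is the breaks argument of a discrete colour scale: all variables are still drawn, but only the levels listed in breaks get legend keys. This is a sketch assuming ggplot2 and tidyr are installed, using the first two rows per location from the question's data; the object names long and p are illustrative.

```r
library(ggplot2)
library(tidyr)

# Subset of the question's data: first two rows for each location
df <- data.frame(
  Location = rep(c("Washington", "Maine", "Florida"), each = 2),
  A = c(0.000693156, 0.000677354, 0.001316292, 0.000975274, 0.000552769, 0.000532546),
  B = c(0.001915388, 0.001936627, 0.008089839, 0.006616564, 0.000653847, 0.000350701),
  C = c(0.000138966, 0.000104745, 0.000264876, 0.000454172, 0.000173727, 0.000108241),
  D = c(0.000108829, 0.000135005, 0.000121317, 0.000131471, 0.000142141, 0.000147208)
)

# Long format: one row per location, variable and value
long <- pivot_longer(df, -Location, names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = Location, y = value, colour = variable)) +
  geom_point() +
  scale_y_log10() +
  # All four variables are plotted, but only A and D appear in the legend
  scale_colour_discrete(breaks = c("A", "D"))
print(p)
```

Listing all four levels in a different order in breaks would instead reorder the legend keys without hiding any.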
Related
Is there a way to set a constant width for geom_bar() when data are missing, as in the time-series example below? I've tried setting width in aes() with no luck. Compare the bar widths for May '11 and June '11 in the plot produced by the code example.
library(ggplot2)
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
New options for position_dodge() and the new position_dodge2(), both introduced in ggplot2 3.0.0, can help.
You can use preserve = "single" in position_dodge() to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2() changes the way things are centered, centering each set of bars at each x-axis location. It has some padding built in, so use padding = 0 to remove it.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
The easiest way is to supplement your data set so that every combination is present, even if it has NA as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat with a dataframe with all the combinations of a and b:
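# unique() rather than levels(): since R 4.0.0, data.frame() keeps strings as character, so levels() returns NULL
dat.all <- rbind(dat, cbind(expand.grid(a=unique(dat$a), b=unique(dat$b), stringsAsFactors=FALSE), v=NA))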
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
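Instead of rbind()-ing a block of NA rows onto the existing data, the same completion can be sketched in base R with expand.grid() and merge(), which yields exactly one row per combination. It uses unique() because, since R 4.0.0, data.frame() no longer converts strings to factors by default, so levels() on these columns returns NULL. The names all_combos and dat.all are illustrative.

```r
# Same toy data: the combination a = "B", b = "a" is missing
dat <- data.frame(a = rep(LETTERS[1:3], 3),
                  b = rep(letters[1:3], each = 3),
                  v = 1:9)[-2, ]

# Every combination of a and b, built from the values actually present
all_combos <- expand.grid(a = unique(dat$a), b = unique(dat$b),
                          stringsAsFactors = FALSE)

# Left join from the full grid: missing combinations get v = NA
dat.all <- merge(all_combos, dat, by = c("a", "b"), all.x = TRUE)
print(dat.all[is.na(dat.all$v), ])
```

dat.all can then go into the same ggplot() call; ggplot2 will likely warn that it removed the row with a missing v, but the dodge positions are computed with all three groups present at every x.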
I had the same problem but was looking for a solution that works with the pipe (%>%). Using tidyr::spread() and tidyr::gather() from the tidyverse does the trick (in current tidyr these are superseded by pivot_wider() and pivot_longer()). I use the same data as @Brian Diggs, but with uppercase variable names so that the column names don't clash with the values of b when transforming to wide format:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There is actually an even simpler solution to this problem that works with the pipe: tidyr::complete() gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
I am using this code:
library(tidyverse)
library(reshape)
mtcars <- melt(mtcars, id="vs")
mtcars$vs <- as.character(mtcars$vs)
ggplot(mtcars, aes(x=vs, y=variable, fill=value)) +
geom_tile()
How can I display the mean values as text on each tile? I tried + geom_text(mtcars, aes(vs, variable, label=mean)), but that does not work.
Also, how can I reverse the order of 0 and 1 on the x axis?
You can do the processing outside of ggplot: compute the mean per vs and variable first, and set the factor levels of vs to reverse the x-axis order:
library(ggplot2)
library(dplyr)
library(tidyr)
df <-
mtcars %>%
pivot_longer(-vs) %>%
group_by(vs, name) %>%
mutate(vs = factor(vs, levels = c(1, 0), ordered = TRUE),
mean = round(mean(value), 2))
ggplot(df, aes(x=vs, y=name, fill=value)) +
geom_tile() +
geom_text(aes(vs, name, label=mean), colour = "white", check_overlap = TRUE)
Created on 2021-04-06 by the reprex package (v1.0.0)
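Each tile corresponds to many rows (18 cars with vs = 0 and 14 with vs = 1), so it can be cleaner to summarise to one row per tile before labelling. Here is a base R sketch of that preprocessing step, assuming the original mtcars rather than the melted copy; the names vars, long, tile_means and label are illustrative.

```r
# Long format in base R: one row per car and variable, vs kept as the id column
vars <- setdiff(names(mtcars), "vs")
long <- data.frame(
  vs       = rep(mtcars$vs, times = length(vars)),
  variable = rep(vars, each = nrow(mtcars)),
  value    = unlist(mtcars[vars], use.names = FALSE)
)

# One row per tile: the mean of each variable within each vs group
tile_means <- aggregate(value ~ vs + variable, data = long, FUN = mean)
tile_means$label <- sprintf("%.2f", tile_means$value)

# Put vs = 1 before vs = 0 on the x axis
tile_means$vs <- factor(tile_means$vs, levels = c(1, 0))
head(tile_means)
```

tile_means could then be drawn with geom_tile(aes(fill = value)) plus geom_text(aes(label = label)), one label per tile, so check_overlap is not needed.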
You can tweak stat_summary_2d() to display text based on computed variables with after_stat(). The order of the x-axis can be determined by setting the limits argument of the x-scale.
suppressPackageStartupMessages({
library(tidyverse)
library(reshape)
library(scales)
})
mtcars <- melt(mtcars, id="vs")
mtcars$vs <- as.character(mtcars$vs)
ggplot(mtcars, aes(x=vs, y =variable, fill = value)) +
geom_tile() +
stat_summary_2d(
aes(z = value,
label = after_stat(number(value, accuracy = 0.01))),
fun = mean,
geom = "text"
) +
scale_x_discrete(limits = c("1", "0"))
Created on 2021-04-06 by the reprex package (v1.0.0)
Also note that geom_tile() draws one tile per row, so when several rows share the same x and y categories the tiles are stacked on top of each other and only the last one drawn is visible. Unless that is intended, it is something to be aware of.
The following code plots the predicted probabilities of several models against time. Having all the plots on one graph was not readable, so I divided the result into a grid.
I was wondering if it is possible to have only one ggplot with all the models and then somehow specify which goes where with grid.arrange.
Current:
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
Alternatively, you could do this in ggplot alone, with facet_wrap() and by converting the variable column to a factor in the desired order. This has the benefit of shared axes across the panels, which makes comparison easier. You could alter the helper functions in the first approach to set the axes explicitly to achieve the same effect, but it's easier to keep it all in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, selection))  # move the selected model to the first level (first panel)
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")
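What the fct_relevel() call does can be sketched in base R: build the level order with the selected model first and the rest after it, then pass that order to factor(). Because facet_wrap() lays panels out in level order, the selected model ends up in the first panel. The model names below simply mirror the question's columns; setdiff() matches names exactly, whereas grepl() treats "." as a regex wildcard unless fixed = TRUE is set.

```r
# Model names mirroring the columns in the question
models <- c("predp.glm.gen", "predp.glm1", "predp.glm2", "predp.glm3",
            "predp.glm4", "predp.glm5", "predp.glm.step")
selection <- "predp.glm3"

# Selected level first, the remaining levels in their default (sorted) order
lev <- sort(unique(models))
variable <- factor(models, levels = c(selection, setdiff(lev, selection)))
levels(variable)
```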
Given a dataframe with discrete values,
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
I want to make a tile plot with one tile per id and variable.
However, I want a different colour scheme for each row, say red and green for "a" and yellow/blue for "b".
The idea is to reshape your data (define coordinates to draw the rectangles) in order to use geom_rect from ggplot:
library(ggplot2)
library(reshape2)
i = setNames(expand.grid(1:nrow(d),1:ncol(d[-1])),c('x1','y1'))
ggplot(cbind(i,melt(d, id.vars='id')),
aes(xmin=x1, xmax=x1+1, ymin=y1, ymax=y1+1, color=variable, fill=value)) +
geom_rect()
Try geom_tile(). But you need to reshape your data to long format first to get the same figure as the one you presented.
df <- data.frame(id=factor(c(1:6)), a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
library(reshape2)
df <- melt(df, id.vars = "id")
library(ggplot2)
ggplot(aes(x = id, y = variable, fill = value), data = df) + geom_tile()
# Paste each variable name onto its value so every variable gets its own colours
library(dplyr)
library(tidyr)
library(ggplot2)
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
ggplot(d %>% gather(type, value, a, b, c) %>% mutate(value = paste0(type, value)),
aes(x = id, y = type)) +
geom_tile(aes(fill = value), color = "white") +
scale_fill_manual(values = c("forestgreen", "indianred", "lightgoldenrod1",
"royalblue", "plum1", "plum2", "plum3"))
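The seven colours above are paired with the pasted keys in alphabetical order (a0, a1, b0, b1, c10, c20, c30). A named vector makes that pairing explicit, because scale_fill_manual(values = ...) matches named values to the data by name. Here is a base R sketch of building the keys and checking that every key has a colour; the names long and fill_colours are illustrative.

```r
d <- data.frame(id = 1:6, a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1),
                c = c(10,20,30,30,10,20))

# Long format: one row per id and variable
long <- data.frame(
  id    = rep(d$id, times = 3),
  type  = rep(c("a", "b", "c"), each = nrow(d)),
  value = c(d$a, d$b, d$c)
)
long$key <- paste0(long$type, long$value)  # e.g. "a1", "c20"

# Explicit key -> colour mapping (same colours as the answer above)
fill_colours <- c(a0 = "forestgreen", a1 = "indianred",
                  b0 = "lightgoldenrod1", b1 = "royalblue",
                  c10 = "plum1", c20 = "plum2", c30 = "plum3")
long$colour <- unname(fill_colours[long$key])
head(long)
```

With ggplot2, the same vector could be used as ggplot(long, aes(id, type, fill = key)) + geom_tile(colour = "white") + scale_fill_manual(values = fill_colours), and the pairing no longer depends on the sort order of the keys.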
First we use reshape2 to transform the data from wide to long. Then, to get discrete values, we use as.factor(value), and finally we use scale_fill_manual() to assign the 5 different colours we need (one per distinct value: 0, 1, 10, 20 and 30). In geom_tile() we specify the colour of the tile borders.
library(reshape2)
library(ggplot2)
df <- data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
df <- melt(df, id.vars=c("id"))
ggplot(df, aes(id, variable, fill = as.factor(value))) + geom_tile(colour = "white") +
scale_fill_manual(values = c("lightblue", "steelblue2", "steelblue3", "steelblue4", "darkblue"), name = "Values")+
scale_x_discrete(limits = 1:6)
I'm using ggplot2 and am trying to generate a plot which shows the following data.
df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
str(df)
df
Instead of doing a frequency plot of the variables (see below code), I want to generate a plot of the average values for each x value. So I want to plot the average score at each age level. At age 18 on the x axis, we might have a 3 on the y axis for score. At age 23, we might have an average score of 4.5, and so forth (Edit: average values corrected). This would ideally be represented with a barplot.
ggplot(df, aes(x=factor(age), y=factor(score))) + geom_bar()
Error: stat_count() must not be used with a y aesthetic.
I'm just not sure how to do this in R with ggplot2 and can't seem to find anything on such plots. Statistically, I don't know if the plot I want is even the right thing to do, but that's a different story.
Thanks!
You can use summary functions in ggplot. Here are two ways of achieving the same result:
# Option 1
ggplot(df, aes(x = factor(age), y = score)) +
geom_bar(stat = "summary", fun = "mean")
# Option 2
ggplot(df, aes(x = factor(age), y = score)) +
stat_summary(fun = "mean", geom = "bar")
Older versions of ggplot2 (before 3.3.0) use fun.y instead of fun:
ggplot(df, aes(x = factor(age), y = score)) +
stat_summary(fun.y = "mean", geom = "bar")
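stat_summary(fun = "mean") computes one summary value per x position (here, each age) and draws it with the chosen geom. A base R sketch with tapply() reproduces the bar heights, which is a quick way to check the plot; the name mean_by_age is illustrative.

```r
df <- data.frame(score = c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
                 age   = c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))

# Mean score per age: the bar heights that stat_summary(fun = "mean") draws
mean_by_age <- tapply(df$score, df$age, mean)
print(round(mean_by_age, 2))
```

For example, age 18 has scores 4 and 2 (mean 3) and age 23 has scores 3 and 6 (mean 4.5), matching the values described in the question.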
If I understood you right, you could try something like this:
library(plyr)
library(ggplot2)
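# plyr::summarise() computes the mean score for each age; geom_col() uses y as the bar height
ggplot(ddply(df, .(age), summarise, score = mean(score)), aes(x=factor(age), y=score)) + geom_col()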
You can also use aggregate() in base R instead of loading another package.
temp = aggregate(list(score = df$score), list(age = factor(df$age)), mean)
ggplot(temp, aes(x = age, y = score)) + geom_col()
Another option is to group by the x values with dplyr and summarise the mean score ("mean_score") per age, all in one pipe. You can then use geom_col() instead of geom_bar(). Here is a reproducible example:
df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
library(dplyr)
library(ggplot2)
df %>%
group_by(age) %>%
summarise(mean_score = mean(score)) %>%
ggplot(aes(x = factor(age), y = mean_score)) +
geom_col() +
labs(x = "Age", y = "Mean score")
Created on 2022-08-26 with reprex v2.0.2