I am trying to add p-values to each boxplot pair in the graph shown below. I would like the p-values to be placed under each soil horizon label ('O', 'A' and 'B').
My data looks like this:
> head(kiwi_l)
# A tibble: 6 x 6
type horizon root_name length diameter n_child
<chr> <chr> <chr> <dbl> <dbl> <int>
1 Elevated CO2 A R1_A_L_S4G 0.0752 0.0342 0
2 Elevated CO2 A R1_A_L_S4F 0.0987 0.0319 0
3 Elevated CO2 A R1_A_L_S4E 0.105 0.0209 0
4 Elevated CO2 A R1_A_L_S4D 0.0476 0.0127 0
5 Elevated CO2 A R1_A_L_S4C 0.110 0.0282 0
6 Elevated CO2 A R1_A_L_S4B 0.244 0.0168 0
While the code I used to generate the graph is:
l_horizon<-ggplot(kiwi, aes(x=type, y=length, fill=type, palette='jco')) +
geom_boxplot() +
facet_grid(. ~ factor(horizon, level=level_order)) +
theme_pubr() +
scale_y_continuous(name='Primary root length (cm)') +
scale_x_discrete(name='Treatment') +
ggtitle('Soil horizon') + theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position="none") +
theme(plot.title = element_text(size = 10, face = "bold"),
text = element_text(size = 10),
axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 10),
axis.text.y=element_text(size=10),
axis.title.x = element_blank(),
axis.title.y=element_text(size=10))
l_horizon<-l_horizon+scale_fill_locuszoom()
l_horizon
Please help!
Since there is no data to play around with, I'll make up some:
set.seed(0)
df <- data.frame(f1 = rep(c("O","A","B"), each = 30),
f2 = rep(c("M","N"), 45),
y = rnorm(90))
Next we run a t-test on that data and format its output:
library(magrittr) # provides the %>% pipe
tests <- split(df, df$f1) %>% sapply(function(x){
pval <- t.test(x[x$f2 == "M", "y"], x[x$f2 == "N", "y"])$p.value
paste0("p-value = ", format(pval, digits = 2, nsmall = 2))
})
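Now if you want it to be part of the facet strip, you can adjust the levels of df$f1 to include the p-value. f1 must be a factor for this (since R 4.0, data.frame() no longer converts strings to factors by default):
df$f1 <- factor(df$f1)
levels(df$f1) <- paste0(levels(df$f1), "\n", tests[levels(df$f1)])
library(ggplot2)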
ggplot(df, aes(x = f2, y = y)) +
geom_boxplot() +
facet_grid(~ f1)
If you wanted the p-values inside the panel instead of in the strip, you can use the annotate() function to place them in the panel. y = Inf ensures they are placed at the top.
ggplot(df, aes(x = f2, y = y)) +
geom_boxplot() +
facet_grid(~ f1) +
annotate("text", x = 1.5, y = Inf, label = tests, vjust = 1)
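Another common way to get one label per panel is to pass geom_text() a small data frame that carries the facet variable, so each row is drawn only in its own facet. The following is a minimal sketch on the same made-up data; the names p_labels and label are illustrative and not part of the answer above.

```r
library(ggplot2)

set.seed(0)
df <- data.frame(f1 = factor(rep(c("O", "A", "B"), each = 30)),
                 f2 = rep(c("M", "N"), 45),
                 y  = rnorm(90))

# One row per facet: the facet variable plus the text to draw in that panel
p_labels <- do.call(rbind, lapply(split(df, df$f1), function(x) {
  pval <- t.test(y ~ f2, data = x)$p.value
  data.frame(f1 = x$f1[1],
             label = paste0("p = ", format(pval, digits = 2, nsmall = 2)))
}))

p <- ggplot(df, aes(x = f2, y = y)) +
  geom_boxplot() +
  facet_grid(~ f1) +
  # inherit.aes = FALSE so the label layer only uses the aesthetics given here
  geom_text(data = p_labels, aes(x = 1.5, y = Inf, label = label),
            vjust = 1.5, inherit.aes = FALSE)
p
```

Because p_labels contains f1, facet_grid() routes each label to the matching panel. Use y = -Inf with a negative vjust to push the text toward the bottom instead.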
If you know where on the y-axis to put the text, you could annotate like this:
p_values <- c(1.1,2.2,3.3)
ggplot(data = d2,mapping = aes(x=range,y=p_area)) +
geom_boxplot() +
annotate("text", x=c(1,2,3), y=0.5, label= p_values)
Current figure:
Desired effect:
I have a stacked bar chart and want to add the sample size on top of each bar. I tried using geom_text with the following code:
Data %>% count(Month, Age) %>%
group_by(Month) %>%
mutate(percent = n/sum(n)*100) %>%
ggplot(aes(Month, percent, fill = as.factor(Age))) +
geom_col(position = "fill") + ylab("") +
geom_text(aes(label = n_month, y = 1.05)) +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("#009E73", "#E69F00", "#0072B2")) +
theme(axis.text = element_text(size = 17),
legend.text = element_text(size = 18),
axis.title.x = element_text(margin = margin(t = 10), size = 16))
This returns an error. I understand this is because the plotted data has 34 rows, while I only want to display 12 numbers (one per month). For now it only works when the data has exactly 12 rows (hence the "Desired effect" figure). How should I change my code?
Error: Aesthetics must be either length 1 or the same as the data (34): label
n_month
[1] 18 8 20 18 24 34 32 15 22 26 12 13
Sorry for the delay. I tried to reproduce your data, and the issue is the underlying data: the label vector has 12 values, but the data passed to geom_text() has 34 rows. For your approach it would be easier to give each geom its own dataset.
For this example I am using the nycflights13 data, which is probably similar to your data.
Here is my setup:
library(dplyr)
library(ggplot2)
library(nycflights13)
graph_data <- flights %>%
filter(carrier %in% c("UA", "B6", "EV")) %>%
count(carrier, month) %>%
add_count(month, wt = n, name = "n_month") %>%
mutate(percent = n / n_month * 100)
Data looks like:
# A tibble: 36 × 5
carrier month n n_month percent
<chr> <int> <int> <int> <dbl>
1 B6 1 4427 13235 33.4
2 B6 2 4103 12276 33.4
3 B6 3 4772 14469 33.0
Now we supply the geom_col() and geom_text() with different datasets, based on your graph_data.
ggplot() +
geom_col(
data = graph_data,
aes(x = month, y = percent, fill = as.factor(carrier)),
position = "fill") + ylab("") +
geom_text(
data = distinct(graph_data, month, n_month),
aes(x = month, y = 1.05, label = n_month)) +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("#009E73", "#E69F00", "#0072B2")) +
theme(axis.text = element_text(size = 17),
legend.text = element_text(size = 18),
axis.title.x = element_text(margin = margin(t = 10), size = 16))
I tried to leave your code unchanged as much as possible and only added the data = ... argument to each geom.
Output is:
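To see why the second dataset fixes the length mismatch, it can help to look at the row counts directly. This is a small sketch on a hypothetical toy data frame (toy, counts and totals are made-up names, not the asker's data): the counted data has one row per Month/Age combination, while distinct() leaves one row per month, which is what the label layer needs.

```r
library(dplyr)

# Hypothetical raw data: one row per observation
toy <- data.frame(Month = rep(1:3, each = 4),
                  Age   = c(1, 1, 2, 3,  1, 2, 2, 3,  1, 1, 1, 2))

# One row per Month/Age combination, plus the per-month total
counts <- toy %>%
  count(Month, Age) %>%
  add_count(Month, wt = n, name = "n_month")

# One row per month: the data to hand to geom_text()
totals <- distinct(counts, Month, n_month)

nrow(counts)  # number of bar segments
nrow(totals)  # number of labels
```

Here counts has more rows than there are months, so a label vector with one value per month cannot be matched to it, whereas totals lines up with the labels one to one.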
The dataframe df1 summarizes the mean daily depth (meanDepth) of a fish through time, the mean daily water temperature at different depths (T5m, T15m, T25m and T35m) and the overall mean daily temperature (meanT) for the whole water column (without considering different depths). As an example:
df1<- data.frame(Date=c("2016-08-05","2016-08-06","2016-08-07","2016-08-08","2016-08-09","2016-08-10"),
meanDepth=c(15,22,18,25,27,21),
T5m=c(17,18,21,23,21,18),
T15m=c(16,17,18,19,18,17),
T25m=c(16,17,17,18,18,17),
T35m=c(15,16,17,17,17,16),
meanT=c(16,17.2,17.8,18.3,17.8,17.4))
df1$Date<-as.Date(df1$Date)
df1
Date meanDepth T5m T15m T25m T35m meanT
1 2016-08-05 15 17 16 16 15 16.0
2 2016-08-06 22 18 17 17 16 17.2
3 2016-08-07 18 21 18 17 17 17.8
4 2016-08-08 25 23 19 18 17 18.3
5 2016-08-09 27 21 18 18 17 17.8
6 2016-08-10 21 18 17 17 16 17.4
I want to plot in one graph both the depth profile of the fish and the mean daily temperature for the different depths.
What I've managed so far is to plot meanDepth on one y-axis and meanT on the other. But I don't know how to add more lines tied to the right y-axis (temperature) that represent the mean daily temperature at the different depths. Here is the code I've been able to build so far.
p <- ggplot(df1, aes(x = Date))
p <- p + geom_line(aes(y = meanDepth, colour = "Overall daily mean depth"))
p <- p + geom_line(aes(y = meanT/(max(range(df1$meanT,na.rm=TRUE)/max(range(df1$meanDepth,na.rm=TRUE)))), colour = "Mean Water T"))
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*(max(range(df1$meanT,na.rm=TRUE)/max(range(df1$meanDepth,na.rm=TRUE)))), name = "Mean daily Temp"))
p <- p + scale_colour_manual(values = c("blue", "red"))
p <- p + labs(title="Mean daily depth and water temperature through time",
y = "Mean daily depth",
x = "Date",
colour = "Parameter")
p <- p + theme(legend.position = c(0.8, 0.9), plot.title = element_text(hjust=0.5, face="bold",margin = margin(0,0,12,0) ),axis.title.y =element_text(margin = margin(t = 0, r = 12, b = 0, l = 0)),axis.title.x =element_text(margin = margin(t = 12, r = 0, b = 0, l = 0)),axis.text.x=element_text(angle=60, hjust=1))
p <- p + scale_x_date(date_breaks = "1 days", labels = date_format("%Y-%m-%d"))
p
That's the plot I've got:
Does anyone know how to add the lines for the temperatures at 5, 15, 25 and 35 meters?
Secondary axes are almost never a good idea, but you could do something like this:
library(ggplot2)
library(tidyr)
library(dplyr)
df1 %>%
mutate(meanDepthNormalized = meanDepth * max(meanT) / max(meanDepth)) %>% #1
select(-meanDepth) %>%
# could have changed meanDepth before directly, but wanted to be more verbose
gather(type, value, -Date) %>% #2
ggplot(aes(x = Date, y = value, color = type, linetype = type == "meanT")) + #3
geom_line(size = 1.5) +
scale_y_continuous(sec.axis = sec_axis(~ . * max(df1$meanDepth) / max(df1$meanT))) +
scale_color_manual("", values = c("#E41A1C", "#984EA3", "#BDD7E7",
"#6BAED6", "#3182BD", "#08519C")) +
theme_minimal() +
guides(linetype = "none") +
theme(legend.position = "top")
Explanation
1. You first transform your depth to be on the same scale as the temperature measurements.
2. Then you transform your data from wide to long format via gather.
3. Then you can map color to the type variable, which basically holds the former column names.
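tidyr documents gather() as superseded by pivot_longer(), although gather() still works. As a minimal sketch of step 2 with pivot_longer(), assuming tidyr 1.0 or later and using a cut-down version of df1 (only two dates and three value columns):

```r
library(tidyr)

# Cut-down version of df1 for illustration
df_small <- data.frame(Date      = as.Date(c("2016-08-05", "2016-08-06")),
                       meanDepth = c(15, 22),
                       T5m       = c(17, 18),
                       meanT     = c(16, 17.2))

# Wide to long: every column except Date becomes a (type, value) pair
long <- pivot_longer(df_small, cols = -Date,
                     names_to = "type", values_to = "value")
long
```

The resulting type column plays the same role as in the gather() call above, so the ggplot() code that follows it would not need to change.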
If I understood your question correctly, it's sufficient to add a geom_line for each of the temperatures/columns. Here's an example with T5m and T35m.
library(ggplot2)
library(scales) # for date_format()
df1<- data.frame(Date=c("2016-08-05","2016-08-06","2016-08-07",
"2016-08-08","2016-08-09","2016-08-10"),
meanDepth=c(15,22,18,25,27,21),
T5m=c(17,18,21,23,21,18),
T15m=c(16,17,18,19,18,17),
T25m=c(16,17,17,18,18,17),
T35m=c(15,16,17,17,17,16),
meanT=c(16,17.2,17.8,18.3,17.8,17.4))
df1$Date<-as.Date(df1$Date)
norm <- max(df1$meanT,na.rm=TRUE)/max(df1$meanDepth,na.rm=TRUE)
p <- ggplot(df1, aes(x = Date)) +
geom_line(aes(y = meanDepth, colour = "Overall daily mean depth")) +
geom_line(aes(y = meanT/norm, colour = "Mean Water T")) +
geom_line(aes(y = T5m/norm, colour = "T5m")) +
geom_line(aes(y = T35m/norm, colour = "T35m")) +
scale_x_date(date_breaks = "1 days", labels = date_format("%Y-%m-%d")) +
scale_y_continuous(sec.axis = sec_axis(~.*norm, name = "Mean daily Temp")) +
scale_colour_manual(values = c("blue", "red", "orange", "black")) +
labs(title="Mean daily depth and water temperature through time",
y = "Mean daily depth",
x = "Date",
colour = "Parameter") +
theme(legend.position = c(0.8, 0.9),
plot.title = element_text(hjust=0.5, face="bold", margin = margin(0,0,12,0)),
axis.title.y = element_text(margin = margin(t = 0, r = 12, b = 0, l = 0)),
axis.title.x = element_text(margin = margin(t = 12, r = 0, b = 0, l = 0)),
axis.text.x=element_text(angle=60, hjust=1))
p
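The scaling behind the secondary axis can be checked with plain arithmetic. This sketch uses the meanDepth and meanT columns from df1 and the same norm factor; the names scaled and back are only for illustration.

```r
df1 <- data.frame(meanDepth = c(15, 22, 18, 25, 27, 21),
                  meanT     = c(16, 17.2, 17.8, 18.3, 17.8, 17.4))

# Ratio between the largest temperature and the largest depth
norm <- max(df1$meanT, na.rm = TRUE) / max(df1$meanDepth, na.rm = TRUE)

# Temperatures expressed in depth units: this is what gets drawn
scaled <- df1$meanT / norm

# sec_axis(~ . * norm) applies the inverse, so the right axis reads in degrees
back <- scaled * norm

max(scaled)  # equals max(df1$meanDepth)
```

Dividing by norm stretches the temperatures so their maximum coincides with the maximum depth, and the secondary axis multiplies by norm to undo that. Any additional temperature column has to be divided by the same norm to stay consistent with the right-hand axis.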
Maybe with facet_grid(y ~ ., scales = "free")?
Here is an example:
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
facet_grid(cyl ~., scales = "free")
I perform a regression with reg <- lm(...) and get some coefficients I can access with reg$coefficients.
It's of type Named num and contains all the coefficients with their values.
Named num [1:11] 505.085 -0.251 -0.286 -0.22 -0.801 ...
- attr(*, "names")= chr [1:11] "(Intercept)" "year" "monthDez" "monthFeb" ...
I want to show these on my graph created with ggplot. My current approach was to use the subtitle for this:
labs(subtitle=paste(toString(names(reg$coefficients)), "\n",
paste(reg$coefficients, collapse = " ")))
But it's not aligned correctly (each name should sit directly above its value).
Does anyone have an idea?
My current plot looks like this:
base <- ggplot(deliveries, aes(Date)) +
geom_line(aes(y = SalesVolume, colour = "SalesVolume"))+
ggtitle("Sales Volume By Time") +
xlab("Time") +
ylab("Sales Volume") +
labs(subtitle=paste(toString(names(reg$coefficients)), "\n", paste(reg$coefficients, collapse = " ")))
print(base + scale_x_date(labels = date_format("%b %y"), breaks = date_breaks("2 months")))
In this graph a forecast is displayed, so I want to see the regression coefficients there as well.
Would it work to make two separate plots and arrange them onto a grid?
library(ggplot2)
library(broom)
library(dplyr)
library(tidyr)
data_plot <-
ggplot(data = mtcars,
mapping = aes(x = qsec,
y = mpg,
colour = factor(gear))) +
geom_point()
fit <- lm(mpg ~ qsec + wt + factor(gear),
data = mtcars)
# Make a data frame with the contents of the model.
reg_data <-
tidy(fit) %>%
mutate(y = nrow(.):1 - 1) %>%
gather(estimate, value,
estimate:p.value) %>%
mutate(estimate = factor(estimate,
c("term", "estimate", "std.error",
"statistic", "p.value")))
# Make a plot displaying the table.
reg_plot <-
ggplot(data = reg_data,
mapping = aes(x = estimate,
y = y)) +
geom_text(mapping = aes(label = round(value, 2))) +
scale_y_continuous(breaks = unique(reg_data[["y"]]),
labels = unique(reg_data[["term"]])) +
scale_x_discrete(position = "top") +
xlab("") +
ylab("") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_blank())
# Arrange the two plots
gridExtra::grid.arrange(data_plot + theme(plot.margin = grid::unit(c(1,1,0,.5), "lines")),
reg_plot + theme(plot.margin = grid::unit(c(0,0,1,0), "lines")),
clip = FALSE,
nrow = 2,
ncol = 1,
heights = grid::unit(c(.70, .5),
c("null", "null")))
In my limited experience with ggplot2, annotate() can be used to add annotations to a plot created with ggplot(), but I am not sure whether the code below does what you want:
reg <- lm(data = mtcars, mpg ~ wt)
pred <- predict(reg)
newdata <- data.frame(mtcars, pred)
par <- summary(reg)$coefficients[,1] # extract model parameters
par.f <- format(par, digits = 2) # set the decimal digits of parameters
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
geom_line(data = newdata, aes(x = wt, y = pred)) +
annotate("text", x = c(2, 2.5), y = 18, label = names(reg$coefficients)) +
annotate("text", x = c(2, 2.5), y = 16.5, label = par.f) # make them aligned by set x and y in annotate()
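If the subtitle approach from the question is preferred, one possible way to line names up with values is to pad every entry to the same width and draw the subtitle in a monospaced font. This is only a sketch under those assumptions: it uses mtcars instead of the asker's data, the names header and values are illustrative, and the "mono" family must be available on the graphics device.

```r
library(ggplot2)

reg <- lm(mpg ~ wt, data = mtcars)

coef_names <- names(coef(reg))
coef_vals  <- format(unname(coef(reg)), digits = 3)

# Pad names and values to one common width so the columns line up
width  <- max(nchar(c(coef_names, coef_vals)))
header <- paste(formatC(coef_names, width = width), collapse = "  ")
values <- paste(formatC(coef_vals,  width = width), collapse = "  ")

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(subtitle = paste(header, values, sep = "\n")) +
  # Alignment only holds in a fixed-width font
  theme(plot.subtitle = element_text(family = "mono"))
p
```

With many coefficients (the question has 11) the subtitle may become too wide for the plot, in which case a separate table panel as in the grid.arrange() answer above is probably the better fit.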
I am trying to implement the diagram 1 from Excel to Shiny. So far I got this code with the resulting diagram 2.
ggplot(filteredData(), aes(x=interaction(month, year), y=sum)) +
geom_bar(stat="identity") + facet_grid(. ~ X) + theme(legend.position="none")
I want to group month and year like in the Excel example, so that you have only the month counter ("1", "2", ...) in the first row of the legend and the year ("2016", "2017", ...) in the second. The number of months can vary.
The data set looks like:
X year month sum
10 2016 1 450
10 2016 2 670
... ... ... ...
10 2017 1 200
11 2016 1 460
I slightly changed the data set, this is the closest I got to your specs:
df <- read.table(text = "X year month sum
10 2016 1 450
10 2016 2 670
10 2017 1 200
11 2016 1 460
11 2017 2 500", header = TRUE)
# Notice the variable type for month and year
df$month <- as.factor(df$month)
df$year <- as.factor(df$year)
df$X <- as.factor(df$X)
ggplot(df, aes(x = month, y = sum)) + geom_bar(stat = "identity") +
facet_grid(.~X + year,
switch = "x", # Moves the labels from the top to the bottom
labeller = label_both # Adds the labels to the year and X variables
) +
xlab("") # Removes the month label
Result:
Or if you want to drop unused levels:
ggplot(df, aes(x = month, y = sum)) + geom_bar(stat = "identity") +
facet_grid(.~X + year,
switch = "x", # Moves the labels from the top to the bottom
labeller = label_both, # Adds the labels to the year and X variables
scales = "free_x") +
xlab("") # Removes the month legend
You can get a little more complex and use cowplot to merge the plots together. You could automate this using lapply to loop through your unique values, though that is probably overkill for just two groups.
library(ggplot2)
library(cowplot)
library(dplyr)
# Return to default theme, as cowplot sets its own
theme_set(theme_gray())
# Save y limits to get same scale
myYlims <- c(0, ceiling(max(df$sum)/100)*100)
# Generate each plot
x10 <-
ggplot(df %>%
filter(X == 10)
, aes(x = month, y = sum)) + geom_bar(stat = "identity") +
facet_grid(~ year,
switch = "x") +
panel_border() +
coord_cartesian(ylim = myYlims) +
xlab("X = 10")
x11 <-
ggplot(df %>%
filter(X == 11)
, aes(x = month, y = sum)) + geom_bar(stat = "identity") +
facet_grid(~ year,
switch = "x") +
panel_border() +
coord_cartesian(ylim = myYlims) +
xlab("X = 11")
# Put the plots together
plot_grid(x10
, x11 +
theme(axis.title.y = element_blank()
, axis.text.y = element_blank()
, axis.ticks.y = element_blank())
, rel_widths = c(1.1,1)
)
Here is an approach to automate this, including more complex data to justify the automation. Note that you will need to play with the aspect ratio of your output and with the rel_widths option to make it look decent:
df <-
data.frame(
X = rep(1:6, each = 9)
, year = rep(rep(2016:2018, each = 3),3)
, month = rep(1:3, 6)
, sum = rnorm(9*6, 700, 100)
)
# Notice the variable type for month and year
df$month <- as.factor(df$month)
df$year <- as.factor(df$year)
df$X <- as.factor(df$X)
# Save y limits to get same scale
myYlims <- c(0, ceiling(max(df$sum)/100)*100)
# Generate each plot
eachPlot <- lapply(levels(df$X), function(thisX){
ggplot(df %>%
filter(X == thisX)
, aes(x = month, y = sum)) +
geom_bar(stat = "identity") +
facet_grid(~ year,
switch = "x") +
panel_border() +
coord_cartesian(ylim = myYlims) +
xlab(paste("X =", thisX))
})
# Remove axes from all but the first
eachPlot[-1] <- lapply(eachPlot[-1], function(x){
x +
theme(axis.title.y = element_blank()
, axis.text.y = element_blank()
, axis.ticks.y = element_blank()
)
})
# Put the plots together
plot_grid(plotlist = eachPlot
, rel_widths = c(1.4, rep(1, length(eachPlot)-1))
, nrow = 1
)
Apologies for the title, I know it sucks.
I am trying to create a waterfall chart function. So, I am trying to create a basic plot, which people can configure however they wish. I ran into a problem, though, adding a gradient to the plot. For example:
I have this df:
> wfDF
category value sign id end start labels
1 Basic Materials 0.0024 pos 1 0.0024 0.0000 0.0024
2 Communications 0.0492 pos 2 0.0516 0.0024 0.0516
3 Consumer, Cyclical 0.0268 pos 3 0.0784 0.0516 0.0784
4 Consumer, Non-cyclical 0.0245 pos 4 0.1029 0.0784 0.1029
5 Diversified -0.0037 neg 5 0.0992 0.1029 0.1029
6 Energy -0.0040 neg 6 0.0952 0.0992 0.0992
7 Financial 0.0445 pos 7 0.1397 0.0952 0.1397
8 Industrial 0.0006 pos 8 0.1403 0.1397 0.1403
9 Technology -0.0059 neg 9 0.1344 0.1403 0.1403
10 Total 0.1345 pos 10 0.0000 0.1344 0.1344
With this code:
ggplot(wfDF, aes(category, fill = sign, color = sign)) + guides(fill = FALSE, color=FALSE) +
ggtitle("Risk by Industry") +
annotation_custom(g, xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
theme(plot.title = element_text(vjust=1.5, face="bold", size = 20),
axis.title.x = element_blank(), axis.title.y = element_blank()) +
geom_rect(aes(x = category, xmin = id - 0.475, xmax = id + 0.475, ymin = end, ymax = start)) +
scale_fill_manual(values=c("red", "forestgreen")) +
scale_color_manual(values=c("black", "black")) +
scale_y_continuous(labels = percent) +
scale_x_discrete("", breaks = levels(wfDF$category), labels = gsub(" ", "\n", levels(wfDF$category))) +
geom_text(data = wfDF, aes(id, labels, label = paste0(value*100, "%")), vjust = -.5, size = 5, fontface = 4)
Which produces this graph:
Which looks great. I am trying to write a function which will do all this with any set of categories and values, and allows for any colors or customization to be added or used. I have this function:
waterfall <- function(categories, values, has.total = FALSE, offset = .475, labelType = c("decimal", "percent")) {
library(scales)
library(grid)
library(ggplot2)
library(dplyr)
theData <- data.frame("category" = as.character(categories), "value" = as.numeric(values))
labelType <- match.arg(labelType) # resolve the default c("decimal", "percent") to a single value
if (labelType == "percent") theData$value = theData$value/100
if (!has.total) theData <- theData %>% rbind(.,list("Total", sum(.$val)))
theData$sign <- ifelse(theData$val >= 0, "pos","neg")
theData <- data.frame(category = factor(theData$category, levels = unique(theData$category)),
value = round(theData$value,4),
sign = factor(theData$sign, levels = unique(theData$sign)))
theData$id <- seq_along(theData$value)
theData$end <- cumsum(theData$value)
theData$end <- c(head(theData$end, -1), 0)
theData$start <- c(0, head(theData$end, -1))
theData$labels <- paste0(theData$value*100, "%")
theData$labellocs <- pmax(theData$end,theData$start)
theGG <- ggplot(theData, aes(category, fill = sign, color = sign)) +
geom_rect(aes(x = category, xmin = id - offset, xmax = id + offset, ymin = end, ymax = start)) +
scale_x_discrete("", breaks = levels(theData$category), labels = gsub(" ", "\n", levels(theData$category))) +
geom_text(data = theData, aes(id, labellocs, label = labels), vjust = -.5, size = 5, fontface = 4)
return(theGG)
}
waterfall(categories = riskDecomp$ID, values = riskDecomp$val, labelType = "percent")
Which produces a pretty ugly basic thing:
However, if I try to run something like the following:
test <- waterfall(categories = riskDecomp$ID, values = riskDecomp$val, labelType = "percent")
g <- rasterGrob(blues9, width=unit(1,"npc"), height = unit(1,"npc"), interpolate = TRUE)
test + guides(fill = FALSE, color=FALSE) +
ggtitle("Risk Decomposition") +
annotation_custom(g, xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
theme(plot.title = element_text(vjust=1.5, face="bold", size = 20),
axis.title.x = element_blank(), axis.title.y = element_blank()) +
scale_fill_manual(values=c("red", "forestgreen")) +
scale_color_manual(values=c("black", "black")) +
scale_y_continuous(labels = percent)
I get this nonsense:
The rasterGrob thing seems to overlay the entire rest of the plot. The only workaround I can find is to add the gradient to the inside of the function. Which kind of removes the... customization of the function. Is there a way to fix this? To fix the order of the grobs? If that makes sense?
You can change the order of the layers manually:
library(grid)
library(ggplot2)
g <- rasterGrob(matrix(blues9, ,1), interpolate=TRUE,
width=unit(1,"npc"), height=unit(1,"npc"))
p <- qplot(rnorm(10), rnorm(10)) +
annotation_custom(g)
nl <- length(p$layers)
p$layers <- c(p$layers[[nl]], p$layers[-nl])
p
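The same reordering can be wrapped in a small helper so that a background added after the fact is sent behind the existing layers. This is a sketch only: send_last_layer_to_back is a hypothetical name, and it relies on the plot's layers element being an ordinary list, as in the snippet above.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: move the most recently added layer to the bottom
send_last_layer_to_back <- function(p) {
  n <- length(p$layers)
  if (n > 1) {
    # Single brackets keep both pieces as lists, so c() simply concatenates them
    p$layers <- c(p$layers[n], p$layers[-n])
  }
  p
}

g <- rasterGrob(matrix(blues9, ncol = 1), interpolate = TRUE,
                width = unit(1, "npc"), height = unit(1, "npc"))

# The gradient is added last, so it would normally cover the points
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotation_custom(g)

q <- send_last_layer_to_back(p)
q
```

With a helper like this, the waterfall() function could stay free of styling, and the caller would apply it after adding annotation_custom(g) to the returned plot.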