I'm making a ggplot with a secondary axis using the sec_axis() function but am having trouble retaining the correct scale.
Below is a reproducible example
# load package
library(ggplot2)
# produce dummy data
data = data.frame(week = 1:5,
count = c(45, 67, 21, 34, 50),
rate = c(3, 6, 2, 5, 3))
# calculate scale (and save as an object called 'scale')
scale = max(data$count)/10
# produce ggplot
p = ggplot(data, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale)) +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 10, 2)))
# look at ggplot - all looks good
p
# change the value of the scale object
scale = 2
# look at ggplot - you can see the scale has now changed
p
In reality I am producing a series of ggplots within a loop, and the 'scale' object changes on each iteration of the loop.
Question
How do I ensure the scale of my secondary y-axis remains fixed? (even if the value of the 'scale' object changes)
EDIT
I wanted to keep the example as simple as possible (see example above) but on request I'll add an example which includes a loop
# load package
library(ggplot2)
# produce dummy data
data = data.frame(group = c(rep("A", 5), rep("B", 5)),
week = rep(1:5, 2),
count = c(45, 67, 21, 34, 50,
120, 200, 167, 148, 111),
rate = c(3, 6, 2, 5, 3,
15, 17, 20, 11, 12))
# define the groups i want to loop over
group = c("A", "B")
# initialize an empty list (to store figures)
fig_list = list()
for(i in seq_along(group)){
# subset the data
data.sub = data[data$group == group[i], ]
# calculate scale (and save as an object called 'scale')
scale = max(data.sub$count)/20
# produce the plot
p = ggplot(data.sub, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale), size = 4, colour = "dark red") +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 20, 5))) +
ggtitle(paste("Plot", group[i]))
# print the plot
print(p)
# store the plot in a list
fig_list[[group[i]]] = p
}
I get the following figures when printing within the loop (everything looks good)
However... if I call the figure for group A from the list I created you can see the secondary y-axis scale is now incorrect (it has used the scale created for group B)
fig_list[["A"]]
Thanks for your edit, this makes things clearer. Your problem stems from the way R evaluates objects. The plot stored in your fig_list is not an image, but a description of how the plot should be generated. It is only rendered when it is printed (for example by typing fig_list["A"] and hitting enter). Both aes(y = rate*scale) and the formula ~./scale refer to the variable scale by name rather than storing its value, so scale is looked up when the plot is rendered. Since the value of scale changes throughout the loop, a plot that is rendered later will be incorrect, because it uses the value from the last iteration.
An easy solution is to wrap your code for plotting in a function and use lapply:
make_plot <- function(df) {
scale = max(df$count)/20
ggplot(df, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale), size = 4, colour = "dark red") +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 20, 5))) +
ggtitle(paste("Plot", unique(df$group)))
}
grouped_data <- split(data, data$group)
fig_list <- lapply(grouped_data, make_plot)
Now when you call the first plot, it is evaluated correctly.
fig_list["A"]
#> $A
This still works when you happen to have an object called scale with a bogus value in your environment, since R looks up scale in the environment of the function call (each call to make_plot has its own), and not in the global environment.
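The lookup behaviour can be reproduced without ggplot2. The sketch below is only an illustration: to_rate_global() and make_to_rate() are hypothetical names, not part of the code above. A function that refers to a global scale sees whatever value scale has when it is called, while a function created inside another function keeps the value stored in that call's environment.

```r
# Hypothetical helpers, used only to illustrate the lookup rules.
scale <- 6.7

# Refers to the global `scale`; the value is looked up at call time.
to_rate_global <- function(x) x / scale

# Factory: `scale` here is a local argument of this particular call.
make_to_rate <- function(scale) {
  force(scale)            # evaluate the argument now, not later
  function(x) x / scale   # the closure remembers the local `scale`
}
to_rate_fixed <- make_to_rate(scale)

# Change the global object, as happens on the next loop iteration.
scale <- 2

to_rate_global(67)  # now divides by 2
to_rate_fixed(67)   # still divides by 6.7, so about 10
```

This is the same reason the lapply() version works: every call to make_plot() gets its own scale.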
Created on 2018-09-02 by the reprex package (v0.2.0).
Related
I'm plotting a graph in R using ggplot2. It's a line graph with a point for every observation; the points represent p-values. Three of them are not significant, and I want these points to show up differently (any other shape/color, doesn't matter). I'm not sure how to do this.
I've tried scale_shape_manual(values = c(valueA, valueB, valueC)) and
scale_color_manual, but I don't get any results. No error messages either, just nothing happens.
Can anyone help?
ggplot(data = dataframe) +
geom_line(aes(x=Time, y=Treatment), color="#00AFBB")+
geom_point(aes(x=Time, y=Treatment)) +
scale_y_reverse()+
scale_x_continuous( breaks = c(1, 2, 3, 4, 5, 6,7,8,9,10,11,12,13,14,15,16,17,18,19,20))
Thanks!
--
Edit: here a reproducible sample (I hope it works?):
A <- c(1,2,3,4,5)
B <- c(1,2,3,4,5)
df <- data.frame(cbind(A, B))
Here's an example, hopefully helpful. I use scale_color_identity and scale_shape_identity because my data (in this case created through the if_else statements) specifies the literal color/shape I want to use.
library(ggplot2)
library(dplyr)  # provides if_else()
df <- data.frame(Time = 1:5, Treatment = 1:5)
ggplot(data = df) +
geom_line(aes(x=Time, y=Treatment), color = "#00AFBB") +
geom_point(aes(x=Time, y=Treatment,
shape = if_else(Treatment < 5, 18, 1),
color = if_else(Treatment < 5, "#00AFBB", "black")), size = 4) +
scale_y_reverse()+
scale_x_continuous( breaks = 1:20) +
scale_color_identity() +
scale_shape_identity()
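An alternative sketch, in case the manual scales from the question are preferred: scale_shape_manual() and scale_color_manual() only have an effect when shape and colour are mapped to a variable inside aes(), which is most likely why nothing happened in the original attempt. The p_value column and the 0.05 cut-off below are assumptions made for illustration, not values from the question.

```r
library(ggplot2)

# Made-up data; `p_value` and the 0.05 threshold are assumptions.
df <- data.frame(Time = 1:5,
                 Treatment = 1:5,
                 p_value = c(0.01, 0.20, 0.03, 0.40, 0.001))
df$significant <- df$p_value < 0.05

p <- ggplot(df, aes(x = Time, y = Treatment)) +
  geom_line(colour = "#00AFBB") +
  # map the logical flag to shape and colour so the manual scales apply
  geom_point(aes(shape = significant, colour = significant), size = 4) +
  scale_shape_manual(values = c("TRUE" = 16, "FALSE" = 1)) +
  scale_colour_manual(values = c("TRUE" = "#00AFBB", "FALSE" = "black")) +
  scale_y_reverse()
p
```

Compared with the identity scales, this version also produces a legend for the significant / not significant distinction.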
I use an R package, SetMethods, to get the fsQCA results of panel data. The package uses the cluster.plot() function to generate a plot.
However, I am having a hard time getting the x-axis of the graph to show the number of units as tick marks. For example, I want it to show 10, 20, 30, ..., 140 on the x-axis so that I can see how many units have a consistency score lower than a certain point.
Is there any method to add tick marks on a plot that is not generated by the plot() function? Thanks in advance.
Here I use the dataset in the package as an example.
install.packages("SetMethods")
library(SetMethods)
data("PAYF")
PS <- minimize(data = PAYF,
outcome = "HL",
conditions = c("HE","GG","AH","HI","HW"),
incl.cut = 0.9,
n.cut = 2,
include = "?",
details = TRUE,
show.cases = TRUE)
PS
# Perform cluster diagnostics:
CB <- cluster(data = PAYF,
results = PS,
outcome = "HL",
unit_id = "COUNTRY",
cluster_id = "REGION",
necessity=FALSE,
wicons = FALSE)
CB
# Plot pooled, between, and within consistencies:
cluster.plot(cluster.res = CB,
labs = TRUE,
size = 8,
angle = 6,
wicons = TRUE)
Finally, I get a graph as follows.
If you look inside the cluster.plot function definition (in RStudio, press F2 while the cursor is on it) you will see that it uses ggplot2 under the hood. However, it doesn't return the ggplot2 objects but just prints them one after another. Because of this it's not really possible to modify the output afterwards in any convenient manner.
But you can always copy the function code and rewrite it for your own need. The part that prints the final plot in your case is
CTw <- list()
ticklabw = unique(as.character(cluster.res$unit_ids))
xtickw <- seq(1, length(ticklabw), by = 1)
if (class(cluster.res) == "clusterminimize") {
for (i in 1:length(cluster.res$output)) {
CTw[[i]] <- cluster.res$output[[i]]$WICONS
dtw <- data.frame(x = xtickw, y = CTw[[i]])
dtw <- dtw[order(dtw$y), ]
dtw$xr <- reorder(dtw$x, 1 - dtw$y)
pw <- ggplot(dtw, aes(y = dtw[, 2], x = dtw[,
3])) + geom_point() + ylim(0, 1) + theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency") + theme(axis.text.x = element_blank())
suppressWarnings(print(pw))
}
}
You can modify the ggplot2 construction part to something like this (packages ggplot2 and dplyr need to be loaded):
pw <-
dtw %>%
mutate(x_ind = as.numeric(xr)) %>%
ggplot(aes(x_ind, y)) +
geom_point() +
ylim(0, 1) +
theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
scale_x_continuous(breaks = seq(from = 0, to = 140, by = 10)) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency")
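The idea can be tried without the SetMethods objects. The following is a minimal sketch with made-up consistency scores (the 140 random values are an assumption, not package output): units are ranked from highest to lowest consistency, and the rank is used as a numeric x value so that scale_x_continuous() can place ticks every 10 units.

```r
library(ggplot2)

# Made-up within-unit consistency scores for 140 units (assumption).
set.seed(1)
dtw <- data.frame(x = 1:140, y = runif(140))

# Rank units from highest to lowest consistency, as reorder(x, 1 - y) does.
dtw <- dtw[order(-dtw$y), ]
dtw$x_ind <- seq_len(nrow(dtw))

pw <- ggplot(dtw, aes(x_ind, y)) +
  geom_point() +
  ylim(0, 1) +
  theme_classic(base_size = 16) +
  scale_x_continuous(breaks = seq(from = 0, to = 140, by = 10)) +
  labs(x = "Units", y = "Consistency")
pw

# Number of units with a consistency score below a chosen cut-off.
sum(dtw$y < 0.5)
```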
I am trying to show a boxplot and a violin plot in one.
I can fill in the colors of the boxplot and violin plot based on the treatment. But, I don't want them in exactly the same color, I'd prefer the violin plot or the boxplot filling to be lighter.
Also, I am able to get the outer lines of the boxplot in different colors if I add col=TM to the aes of the geom_boxplot. But then I cannot choose these colors, or don't know how to (they are now automatically pink and blue).
BACKGROUND:
I am working with a data set that looks something like this:
TM yax X Zscore
Org zscore zhfa -1.72
Org zscore zfwa -0.12
I am plotting the z-scores based on the X (zhfa etc.) per treatment (TM).
#Colours
ocean = c('#BBDED6' , '#61C0BF' , '#FAE3D9' , '#FFB6B9' )
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(col="white", fill="white") +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(values=ocean, labels=c("Mineral fertilizer", "Organic fertilizer"))
Now only one half of each violin is filled white rather than both (which would already be better). If I call geom_split_violin() without those arguments, it gets exactly the same colors as the boxplot.
Furthermore, the violin half for zhfa should be on the left side, but it gets switched and is displayed on the right side, even though it matches the data of the organic (left) boxplot.
The graph now:
I don't know if it can be solved by adding something related to the scale_fill_manual or if this is an impossible request
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
You can add an additional column to your data that is the same structure as TM but different values, then scale the fill:
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
Begin solution:
library(dplyr)  # provides %>% and mutate()
data <- data %>% mutate(TMm = c(rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5)))
#Colours
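# named so that each level gets its intended colour regardless of break order
ocean = c(min = '#BBDED6', minM = '#FAE3D9', org = '#61C0BF', orgM = '#FFFFFF')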
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(mapping = aes(fill=TMm)) +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(breaks = c("org", "min"), values=ocean, labels=c("Organic fertilizer", "Mineral fertilizer"))
In your data you may have to change breaks = c("org", "min") to whatever the factor levels of the TM variable are called. The labels must be given in the same order as the breaks.
Or if you want the whole violin plot white:
ocean = c(min = '#BBDED6', minM = '#FFFFFF', org = '#61C0BF', orgM = '#FFFFFF')
New Plot:
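Instead of picking the lighter violin colours by hand, they could be derived from the boxplot colours. This is a sketch only: lighten() is a hypothetical helper written here with base grDevices functions (it is not part of ggplot2), and the 0.6 mixing amount is an arbitrary choice. It mixes each colour with white and builds a named vector that can be passed to scale_fill_manual(values = ocean).

```r
# Hypothetical helper: mix each colour with white.
# amount = 0 returns the colour unchanged, amount = 1 returns white.
lighten <- function(cols, amount = 0.5) {
  rgb_mat <- grDevices::col2rgb(cols)             # 3 x n matrix, values 0..255
  mixed <- rgb_mat + (255 - rgb_mat) * amount
  grDevices::rgb(mixed[1, ], mixed[2, ], mixed[3, ], maxColorValue = 255)
}

# Boxplot colours, named after the levels of TM in the sample data.
box_cols <- c(min = "#BBDED6", org = "#61C0BF")

# Lighter versions for the violin halves, named after the levels of TMm.
violin_cols <- setNames(lighten(box_cols, 0.6), c("minM", "orgM"))

ocean <- c(box_cols, violin_cols)
ocean
```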
I am trying to pass an integer ("i") to a function in which "i" is used as a row index for a data frame. However, doing this...
user_definedFUN <- function (i){
...
result <- df[i, "col_name"]
...
}
x <- user_definedFUN(1)
...yields the following error:
Error in `[.data.frame`(df, i, "col_name") :
object 'i' not found
I'm certain this is a simple issue of how I am referencing "i" within the brackets (even if not simple enough for me to find a solution); however, I have provided additional details below if necessary.
The data.frame:
gen_name <- c("Boomers","Gen X","Millenials","Gen Z")
gen_years <- c("1946 to 1964","1965 to 1980","1981 to 1996", "1997 to 2011")
gen_xmin <- c(11, 9, 5, 2)
gen_xmax <- c(15, 11, 8, 5)
GEN_G.labels <- data.frame(gen_name, gen_years, gen_xmin, gen_xmax)
The data.frame contains information for four generations that will be used to plot rectangles as layers on a ggplot bar chart of populations by age.
The rectangles will be created by the following function that will be called from a loop and is provided the row index for specific generation (1 = "Boomers", 2 = "Gen X", etc.)
genlabelsFUN <- function(i){
# return a geom_rect()
rv <- geom_rect(aes(
xmin = GEN_G.labels[i, "gen_xmin"],
xmax = GEN_G.labels[i, "gen_xmax"],
ymin = 1000,
ymax = 1100)
, fill = "red")
return(rv)
}
ggplot(...snip...) +
...snip... +
genlabelsFUN(1)
The function works if a static index value is used. For example, 'GEN_G.labels[1, "gen_xmin"]' instead of 'GEN_G.labels[i, "gen_xmin"]' places a red rectangle between 11 and 15 on the x-axis at 1,000 on the y-axis with a height of 100. However, the function is pointless without the dynamic aspect of "i".
The following image shows the output when using a static index value (Note: I'm using a different y-axis scale in my example above for simplicity). The final code will loop through each row of GEN_G.labels and run genlabelsFUN() to create a similar rectangle for each generation.
Thanks
EDIT:
Full ggplot
scaleFUN <- function(x) formatC( x / 1000, format = "f", big.mark = ",", digits = 0) #format as thousands with comma
ggplot(data = GEN_G.data_frame, aes(x = range, y = persons)) +
geom_bar(stat = "identity") +
theme_classic() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous(
name = "Persons (thousands)",
labels = scaleFUN) +
genlabelsFUN(1)
EDIT 2:
Reproducible example (functioning based on MrFlick comment below)
GEN_G.dataframe <- data.frame(
range = c(1:21),
persons = abs(rnorm(21))*50)
GEN_G.labelsx <- data.frame(
gen_name = c("Group A","Group B","Group C","Group D"),
gen_xmin = c(11, 9, 5, 2),
gen_xmax = c(15, 11, 9, 5))
GEN_G.labelsx$gen_name <- factor(
GEN_G.labelsx$gen_name,
levels = GEN_G.labelsx$gen_name)
ggplot() +
geom_bar(data=GEN_G.dataframe,aes(x=range, y=persons),stat="identity") +
theme_classic() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_rect(aes(
xmin = gen_xmin,
xmax = gen_xmax,
ymin = 175,
ymax = 180,
fill = gen_name),
data = GEN_G.labelsx)
Output from Edit 2 example.
You can't use variables like i inside an aes(). Symbols inside the aes() are not evaluated until the plot is actually drawn. There's no way for R to properly capture the environment where i is defined that way, so the value is no longer available (or has changed) by the time the plot is drawn.
However, I don't really think a loop/function is even necessary. You should just be able to do
geom_rect(aes(xmin=gen_xmin, xmax=gen_xmax), ymin=1000, ymax=1100, data=GEN_G.labels)
to use a different data.frame for that layer. Then all the boxes are drawn at once without a loop.
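If a function that takes a row index is still wanted, one possible sketch is to use annotate() instead of geom_rect(aes(...)). The arguments of annotate() are ordinary function arguments, so GEN_G.labels[i, ...] is evaluated when the function is called rather than when the plot is drawn. The bar data below is made up for illustration, and only two of the four generations are included.

```r
library(ggplot2)

GEN_G.labels <- data.frame(gen_name = c("Boomers", "Gen X"),
                           gen_xmin = c(11, 9),
                           gen_xmax = c(15, 11))

# Returns one rectangle layer for row i; values are looked up immediately.
genlabelsFUN <- function(i) {
  annotate("rect",
           xmin = GEN_G.labels[i, "gen_xmin"],
           xmax = GEN_G.labels[i, "gen_xmax"],
           ymin = 1000, ymax = 1100,
           fill = "red")
}

# Made-up bar data (assumption), just to have something to draw on.
bars <- data.frame(range = 1:21,
                   persons = seq(100, 1100, length.out = 21))

p <- ggplot(bars, aes(x = range, y = persons)) +
  geom_col()

# Add one rectangle per row of GEN_G.labels.
for (i in seq_len(nrow(GEN_G.labels))) {
  p <- p + genlabelsFUN(i)
}
p
```

The single geom_rect() call with its own data argument remains the shorter option when all rectangles share the same styling.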
I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is, I need my lines to go horizontally (ie to show two different colored lines moving across the x-axis). But this code just connects the dots vertically:
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the means at each time, with the ribbon showing the mean plus or minus one standard error of the points at that time. One stat_summary makes the mean line with the colour aesthetic, the other makes the area using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and +/- some number of standard errors. This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts ymin and ymax values to plot a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun = mean,  # this argument was called fun.y before ggplot2 3.3.0
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
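To make the ribbon limits concrete, here is a small sketch comparing mean_se() with the same quantities computed by hand. The vector x is made up for illustration; the standard error is taken as sd(x) / sqrt(n).

```r
library(ggplot2)

# Made-up observations for one group at one time point.
x <- c(2, 4, 4, 6, 7)

# Mean and standard error computed by hand.
se <- sd(x) / sqrt(length(x))
manual <- c(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)

# The same summary as returned by ggplot2 (one row: y, ymin, ymax).
summary_se <- mean_se(x)

manual
summary_se
```

Passing mult = 2 to mean_se() would widen the ribbon to two standard errors on each side.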
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
size = 0.3
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a band of roughly +/- 1 standard error around the linear model fit (exactly one standard error corresponds to a level of about 0.68).
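The relationship between the level and the band width can be checked with a quick calculation. This sketch uses the normal distribution as an approximation, which is an assumption made here for simplicity; for a linear model the band is based on a t quantile, so the numbers differ slightly for small samples.

```r
# Coverage of a band that extends one standard error on each side.
coverage_1se <- pnorm(1) - pnorm(-1)
coverage_1se

# Width, in standard errors, of the band produced by level = 0.65.
z_for_65 <- qnorm(1 - (1 - 0.65) / 2)
z_for_65
```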
I think this code will get you most of the way there
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
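geom_smooth is what you were looking for, I think. Note that with its default settings it does not fit a linear model: for a small data set like this one it fits a loess smoother to the points of each group, so the line follows the group means across time rather than connecting them exactly. The group = Condition aesthetic is what makes the lines run across the x-axis when x is a factor.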
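If the lines should pass exactly through the group means, a possible sketch is to keep time as a factor, set group = Condition, and let stat_summary() draw the line. This assumes ggplot2 3.3.0 or later, where the argument is called fun (older versions use fun.y); the data is regenerated here with an arbitrary seed.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the example repeatable
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time = factor(rep(1:5, times = 6)),
  eat = sample(1:7, size = 30, replace = TRUE)
)

p <- ggplot(df, aes(x = time, y = eat, colour = Condition, group = Condition)) +
  geom_point(size = 3) +
  # one mean per time point and Condition, joined by straight segments
  stat_summary(fun = mean, geom = "line")
p
```

Without the group aesthetic, a factor on the x-axis puts each time point in its own group, which is why the original plot connected the dots vertically.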