Why does ggplot2 repeat (unfacet) my data when adding text annotations?

Alright, this has got me stumped. I have this function:
tf <- function(formula = NULL, data = NULL) {
res <- as.character(formula[[2]])
fac2 <- as.character(formula[[3]][3])
fac1 <- as.character(formula[[3]][2])
# Aesthetic & Data 1
p <- ggplot(aes_string(x = fac1, y = res, color = fac1), data = data) +
facet_grid(paste(".~", fac2)) + geom_point() # OK if we only go this far
facCounts <- count(data, vars = c(fac2, fac1))
facCounts$label <- paste("n = ", facCounts$freq , sep = "")
facCounts$y <- min(data$res) - 0.1*diff(range(data$res))
facCounts <- facCounts[,-3]
names(facCounts) <- c("f2", "f1", "lab", "y") # data frame looks correct
# Aesthetic & Data 2
p <- p + geom_text(aes(x = f1, y = y, label = lab),
color = "black", size = 4.0, data = facCounts) + facet_grid(".~f2")
p
}
Which when run with this data and call:
set.seed(1234)
mydf <- data.frame(
resp = rnorm(40),
cat1 = sample(LETTERS[1:3], 40, replace = TRUE),
cat2 = sample(letters[1:2], 40, replace = TRUE))
p <- tf(formula = resp~cat1*cat2, data = mydf); print(p)
Produces this picture:
If you look carefully, you'll see that the data in the two facets are actually the same. The counts are correct for the data that should be displayed (and are stored in facCounts). If the call to geom_text is commented out, then the plot is correct. A variety of changes to the geom_text call leave me with either what you see above, or the correct data but with overlapping count texts. I can't find the way out of this labyrinth! An attempt with + annotate("text", ...) doesn't work either. What change is needed to keep the data faceted and the counts correct? Thanks. This is ggplot2 0.9.3 btw.

Now that I've convinced myself that this will work:
tf <- function(formula = NULL, data = NULL) {
res <- as.character(formula[[2]])
fac2 <- as.character(formula[[3]][3])
fac1 <- as.character(formula[[3]][2])
# Aesthetic & Data 1
p <- ggplot(aes_string(x = fac1, y = res, color = fac1), data = data) +
facet_grid(paste(".~", fac2)) + geom_point() # OK if we only go this far
facCounts <- count(data, vars = c(fac2, fac1))
facCounts$label <- paste("n = ", facCounts$freq , sep = "")
facCounts$y <- min(data[[res]]) - 0.1*diff(range(data[[res]])) # data$res would be NULL: the column name is stored in res
facCounts <- facCounts[,-3]
names(facCounts) <- c(fac2, "f1", "lab", "y") # keep the facet column's original name (cat2 here)
# Aesthetic & Data 2
p <- p + geom_text(aes(x = f1, y = y, label = lab),
color = "black", size = 4.0, data = facCounts)
p
}
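You were calling facet_grid a second time with a differently named faceting variable (f2). That second call replaced the original faceting by cat2, and because the point data has no f2 column, the points were repeated in every panel. Removing the second call and naming the facet column of facCounts cat2 instead of f2 seems to work.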
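The same idea can be sketched without the wrapper function. This is a minimal example, assuming a current ggplot2 and using base R's table() in place of plyr::count(); the names counts and p are only illustrative. The point it shows is that the label data frame carries the facet column under the same name (cat2) as the main data, and facet_grid() is added only once.

```r
library(ggplot2)

set.seed(1234)
mydf <- data.frame(
  resp = rnorm(40),
  cat1 = sample(LETTERS[1:3], 40, replace = TRUE),
  cat2 = sample(letters[1:2], 40, replace = TRUE)
)

# One row per facet/x combination; the facet column keeps the name cat2
counts <- as.data.frame(table(cat2 = mydf$cat2, cat1 = mydf$cat1),
                        stringsAsFactors = FALSE)
counts$lab <- paste0("n = ", counts$Freq)
counts$y <- min(mydf$resp) - 0.1 * diff(range(mydf$resp))

p <- ggplot(mydf, aes(x = cat1, y = resp, color = cat1)) +
  geom_point() +
  geom_text(aes(x = cat1, y = y, label = lab), data = counts,
            color = "black", size = 4, inherit.aes = FALSE) +
  facet_grid(. ~ cat2)  # a single facet specification shared by both layers
```

Because both data frames contain cat2, each layer is split across the panels instead of being repeated in all of them.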

Related

Facet Labels not being retained when printing ggplot objects stored in a list

Here is the data that I will be using to give context to my question:
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(1)
f1 <- sample(c(letters[1:3],NA),100, prob = c(rep((0.9/3),times = 3),0.1),replace = T)
f2 <- sample(c(letters[1:3],NA),100, prob = c(rep((0.8/3),times = 3),0.2),replace = T)
f3 <- sample(c(letters[1:3],NA),100, prob = c(rep((0.95/3),times = 3),0.01),replace = T)
sample_dat <- tibble(
x1 = factor(f1, level=letters[1:3]),
x2 = factor(f2, level=letters[1:3]),
x3 = factor(f3, level=letters[1:3]),
grpA = factor(sample(c("grp1","grp2"),100, prob=c(0.3, 0.7) ,replace=T),
levels = c("grp1", "grp2"))
)
sample_dat
Here is a function that I created to prepare the data for plotting:
plot_data_prepr <- function(dat, groupvar, mainvar){
groupvar <- sym(groupvar)
mainvar <- sym(mainvar)
plot_data <- dat %>%
group_by(!!groupvar) %>%
count(!!mainvar, .drop = F) %>% drop_na() %>%
mutate(pct = n/sum(n),
pct2 = ifelse(n == 0, 0.005, n/sum(n)),
grp_tot = sum(n),
pct_lab = paste0(format(pct*100, digits = 1),'%'),
pct_pos = pct2 + .02)
return(plot_data)
}
Here is the application of the function to produce the data sets I will use for plotting:
plot_data_prepr(dat = sample_dat, groupvar = "grpA", mainvar = "x1")
plot_data_prepr(dat = sample_dat, groupvar = "grpA", mainvar = "x2")
plot_data_prepr(dat = sample_dat, groupvar = "grpA", mainvar = "x3")
Here I use a for loop to plot the data and dynamically change the labels of the facets. If one runs this in RStudio as an R Markdown file, one can see that the plots are produced and that the facet labels are each distinct, as they should be given the different degrees of missingness and sampling densities for the 'grpA' variable.
plot_list <- vector('list', length = 0)
for (fct in names(sample_dat)[1:3]){
mvar <- fct
smvar <- sym(mvar)
gvar <- "grpA"
sgvar <- sym(gvar)
dd <- plot_data_prepr(dat = sample_dat, groupvar = gvar, mainvar = mvar)
pre_lookup <- dd %>%
select(!!sgvar, grp_tot) %>%
group_by(!!sgvar) %>%
summarise(lookup = mean(grp_tot))
lookup <- pre_lookup$lookup
my_label <- function(x) {
var <- names(x)[1]
list(paste0(x[[var]], " (N = ", lookup, ")"))
}
plot <- ggplot(dd,
mapping = aes(x=!!smvar, y = pct2, fill = !!smvar)) +
geom_bar(stat = 'identity') +
ylim(0,1.3) +
geom_text(aes(x=!!smvar, label=pct_lab, y = pct_pos + .02)) +
facet_grid(as.formula(paste0(".~", gvar)), labeller = my_label) +
ggtitle(paste(gvar,"by",mvar))
plot_list[[fct]] <- plot
print(plot)
}
Here's my problem -- when I print the plots which are stored in the list,
they all seem to retain the facet label from the last plot, instead of retaining
the distinct facet-labels they displayed when they were originally generated.
for (name in names(sample_dat)[1:3]){
print(plot_list[[name]])
}
Basically, I would like to be able to print the plots from the list
when I need them and have them display their distinct facet labels
as they had been displayed when the plots were originally produced.
Perhaps someone in the community could help me?

I would suggest you try to avoid the loop for building the plots. Loops tend to create the kind of issue you have with the labels, or sometimes with the data. Here, I have packaged the body of your loop in a function. You can then use lapply() with the names of your data to create the list of plots directly. Here is the code:
#Function for plot
myplotfun <- function(fct)
{
mvar <- fct
smvar <- sym(mvar)
gvar <- "grpA"
sgvar <- sym(gvar)
dd <- plot_data_prepr(dat = sample_dat, groupvar = gvar, mainvar = mvar)
pre_lookup <- dd %>%
select(!!sgvar, grp_tot) %>%
group_by(!!sgvar) %>%
summarise(lookup = mean(grp_tot))
lookup <- pre_lookup$lookup
my_label <- function(x) {
var <- names(x)[1]
list(paste0(x[[var]], " (N = ", lookup, ")"))
}
plot <- ggplot(dd,
mapping = aes(x=!!smvar, y = pct2, fill = !!smvar)) +
geom_bar(stat = 'identity') +
ylim(0,1.3) +
geom_text(aes(x=!!smvar, label=pct_lab, y = pct_pos + .02)) +
facet_grid(as.formula(paste0(".~", gvar)), labeller = my_label) +
ggtitle(paste(gvar,"by",mvar))
return(plot)
}
Now, we create a list:
#Create a list
plot_list <- lapply(names(sample_dat)[1:3],myplotfun)
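One detail worth noting: lapply() over an unnamed character vector returns an unnamed list, so a lookup by name such as plot_list[["x1"]] from the question would return NULL. A small base R sketch of the difference, where make_plot is a hypothetical stand-in for myplotfun that returns a string instead of a plot:

```r
vars <- c("x1", "x2", "x3")

# Hypothetical stand-in for myplotfun(); returns a string instead of a ggplot
make_plot <- function(v) paste("plot for", v)

unnamed <- lapply(vars, make_plot)                 # no names attached
named <- setNames(lapply(vars, make_plot), vars)   # names restored explicitly

missing_by_name <- is.null(unnamed[["x1"]])        # lookup by name finds nothing
found_by_name <- named[["x2"]]                     # lookup by name works
```

With the named version, the question's final loop over names(sample_dat)[1:3] can stay as it was.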
Finally, the plots as you used in the last loop:
#Loop
for (i in seq_along(plot_list)){
print(plot_list[[i]])
}
Outputs:

The problem is that your my_label function has a free variable, lookup, that is only resolved when the plot is actually drawn. After your for loop has finished, lookup holds only the value from the last iteration. To capture the current loop value, you can place it inside a closure. So you could change the my_label function to
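my_labeler <- function(lookup) {
force(lookup) # evaluate the argument now; otherwise it stays a lazy promise until the plot is drawn
function(x) {
var <- names(x)[1]
list(paste0(x[[var]], " (N = ", lookup, ")"))
}
}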
and then call facet_grid with
facet_grid(as.formula(paste0(".~", gvar)), labeller = my_labeler(lookup))
But I agree with @Duck that avoiding the for-loop in this case would be easier.
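The difference between a free variable and a captured one can be seen in base R without any plotting. This is a minimal sketch; make_labeller and the two lists are hypothetical names, and the labellers take a plain string rather than the data frame that facet_grid() passes.

```r
# Factory that captures the value of lookup at creation time
make_labeller <- function(lookup) {
  force(lookup)  # evaluate the lazy argument immediately
  function(value) paste0(value, " (N = ", lookup, ")")
}

labellers_free <- list()
labellers_closed <- list()
for (n in c(10, 20, 30)) {
  lookup <- n
  k <- length(labellers_free) + 1
  # Free variable: lookup is looked up only when the function is called
  labellers_free[[k]] <- function(value) paste0(value, " (N = ", lookup, ")")
  # Closure: lookup is fixed when make_labeller() returns
  labellers_closed[[k]] <- make_labeller(lookup)
}

free_result <- labellers_free[[1]]("grp1")      # "grp1 (N = 30)"
closed_result <- labellers_closed[[1]]("grp1")  # "grp1 (N = 10)"
```

The first labeller created in the loop reports the last value when it relies on the free variable, which matches the behavior seen with the stored plots.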

Can't print all ggplot charts I need

There is a good discussion about using ggplot in loop and other creative ways at Looping over variables in ggplot. However, the discussion does not quite solve my problem.
I have a vertical dataset that I need to create plots from in a loop. There is no error in the code but my code only prints the last plot. Can't figure out why. Here is a reproducible example:
df <- cbind.data.frame(var = sample(c('a','b'), size = 100, replace = TRUE),
grp = sample(c('x','y'), size = 100, replace = TRUE), value = rnorm(100))
for (i in 2) {
plot.df <- df[which(df$var == c('a','b')[i]),]
print(ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(c('a','b')[i]))
}

As an alternative, you might also consider using lapply, as it makes the code a lot more readable.
If I am not mistaken you want to produce plots for each of the levels of the variable var.
You can firstly define your function, and then apply it to all levels
my_plot <- function(x){
# debug: x <- "a"
plot.df <- df[df$var %in% x,]
ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(x)
}
lapply(unique(df$var), my_plot)

The comment by @EJJ is correct: for (i in 2) runs the loop body only once, with i = 2, so only the plot for 'b' is printed. You need something like
for (i in seq_len(nlevels(factor(df$var))))
library(ggplot2)
library(dplyr)
df <- cbind.data.frame(var = sample(c('a','b'), size = 100, replace = TRUE),
grp = sample(c('x','y'), size = 100, replace = TRUE), value = rnorm(100))
for (i in seq_len(nlevels(factor(df$var)))) {
plot.df <- df[which(df$var == c('a','b')[i]),]
print(ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(c('a','b')[i]))
}
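To see why the original loop printed a single plot, it may help to record which values each loop header visits. A small base R sketch, with hypothetical variable names:

```r
levels_to_plot <- c("a", "b")

# for (i in 2) iterates over the one-element vector 2
visited_single <- character(0)
for (i in 2) {
  visited_single <- c(visited_single, levels_to_plot[i])
}

# seq_along() iterates over 1, 2, ..., length(levels_to_plot)
visited_all <- character(0)
for (i in seq_along(levels_to_plot)) {
  visited_all <- c(visited_all, levels_to_plot[i])
}

print(visited_single)  # "b"
print(visited_all)     # "a" "b"
```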

How to create multiple (6) plots with ggplot and save them to a pdf file?

I have a matrix (pred_matrix, dim = 1e6, 250); the rows are "pixel stacks" of 250 NDVI values of a Landsat scene, on which I ran a fuzzy c-means classification with 6 centers (classes), stored in the list results. I now want to plot a random subset of each class of the 1e6 rows. This is my quick and dirty code so far:
random_index <- floor(runif(10000, 1, 1e6+1))
random_cluster <- results[[6]]$cluster[random_index]
random_pred_matrix <- pred_matrix[random_index, ]
dates_subse_after_pred <- rdn_num[rm_na_pred_df]
random_res <- cbind(random_pred_matrix, random_cluster)
random_res <- t(random_res)
random_res <- cbind(c(dates_subse_after_pred, 1), random_res)
df_1 <- data.frame(random_res[1:250,c(TRUE, random_cluster==1)])
df_2 <- data.frame(random_res[1:250,c(TRUE, random_cluster==2)])
df_3 <- data.frame(random_res[1:250,c(TRUE, random_cluster==3)])
df_4 <- data.frame(random_res[1:250,c(TRUE, random_cluster==4)])
df_5 <- data.frame(random_res[1:250,c(TRUE, random_cluster==5)])
df_6 <- data.frame(random_res[1:250,c(TRUE, random_cluster==6)])
df_1.long <- melt(df_1, id.vars = 1)
df_1.long$X1 <- as.Date(df_1.long$X1)
df_2.long <- melt(df_2, id.vars = 1)
df_2.long$X1 <- as.Date(df_2.long$X1)
df_3.long <- melt(df_3, id.vars = 1)
df_3.long$X1 <- as.Date(df_3.long$X1)
df_4.long <- melt(df_4, id.vars = 1)
df_4.long$X1 <- as.Date(df_4.long$X1)
df_5.long <- melt(df_5, id.vars = 1)
df_5.long$X1 <- as.Date(df_5.long$X1)
df_6.long <- melt(df_6, id.vars = 1)
df_6.long$X1 <- as.Date(df_6.long$X1)
ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightblue")
ggplot(df_2.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "blue")
ggplot(df_3.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightgreen")
ggplot(df_4.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "green")
ggplot(df_5.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "pink")
ggplot(df_6.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "red")
After this I just hit the export button in RStudio six times and inserted everything into a Word document...
Is there a way to do this in a loop? Or even produce a final pdf containing the 6 plots?

Separate file
I think what you are after is having the following six times in your code.
ggsave("filename.png", # or pdf if you like
plot = last_plot(), # or give ggplot object name as in myPlot,
width = 5, height = 5,
units = "in", # other options c("in", "cm", "mm"),
dpi = 300)
For example,
library(ggplot2)
p1 <- ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable),
color = "lightblue")
ggsave("df1.png", plot = p1, dpi = 300)
All in one
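If you want all six plots in one pdf, then first do
pdf("file_name.pdf")
# do your ggplots here; print() is needed inside loops, functions, or sourced scripts
print(p1)
print(p2)
# ... likewise for the remaining plots ...
print(p6)
dev.off()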
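For the loop part of the question, one possible sketch is to keep the six long data frames in a list and build the plots with lapply(). The data here are small simulated stand-ins for df_1.long to df_6.long, and long_list, line_colours, plots and out_file are hypothetical names; only the column names X1, variable and value follow the question.

```r
library(ggplot2)

# Simulated stand-ins for df_1.long ... df_6.long (3 series of 10 dates each)
set.seed(1)
long_list <- lapply(1:6, function(k) {
  data.frame(
    X1 = rep(as.Date("2020-01-01") + 0:9, times = 3),
    variable = rep(paste0("px", 1:3), each = 10),
    value = rnorm(30, mean = k)
  )
})
line_colours <- c("lightblue", "blue", "lightgreen", "green", "pink", "red")

# One plot per cluster, built by a function rather than six copies of the code
plots <- lapply(seq_along(long_list), function(k) {
  ggplot(long_list[[k]]) +
    geom_line(aes(x = X1, y = value, group = variable),
              color = line_colours[k]) +
    ggtitle(paste("Cluster", k))
})

# Write all plots into one multi-page PDF; print() is required inside the loop
out_file <- file.path(tempdir(), "clusters.pdf")
pdf(out_file, width = 7, height = 5)
for (k in seq_along(plots)) {
  print(plots[[k]])
}
dev.off()
```

Each print() call starts a new page, so the resulting file holds one page per cluster.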

If you are using RStudio, I would recommend writing your code in an R Markdown file and then exporting to pdf directly.

Need help on ggplot in R, I am getting an error: 'argument "x" is missing, with no default'

I am trying to create a Lowry plot in R but am having difficulty debugging the errors returned. I am using the following code to create the plot:
library(ggplot2)
library(reshape)
m_xylene_data <- data.frame(
Parameter = c(
"BW", "CRE", "DS", "KM", "MPY", "Pba", "Pfaa",
"Plia", "Prpda", "Pspda", "QCC", "QfaC", "QliC",
"QPC", "QspdC", "Rurine", "Vfac", "VliC", "Vmax"),
"Main Effect" = c(
1.03E-01, 9.91E-02, 9.18E-07, 3.42E-02, 9.27E-3, 2.82E-2, 2.58E-05,
1.37E-05, 5.73E-4, 2.76E-3, 6.77E-3, 8.67E-05, 1.30E-02,
1.19E-01, 4.75E-04, 5.25E-01, 2.07E-04, 1.73E-03, 1.08E-03),
Interaction = c(
1.49E-02, 1.43E-02, 1.25E-04, 6.84E-03, 3.25E-03, 7.67E-03, 8.34E-05,
1.17E-04, 2.04E-04, 7.64E-04, 2.84E-03, 8.72E-05, 2.37E-03,
2.61E-02, 6.68E-04, 4.57E-02, 1.32E-04, 6.96E-04, 6.55E-04
)
)
fortify_lowry_data <- function(data,
param_var = "Parameter",
main_var = "Main.Effect",
inter_var = "Interaction")
{
#Convert wide to long format
mdata <- melt(data, id.vars = param_var)
#Order columns by main effect and reorder parameter levels
o <- order(data[, main_var], decreasing = TRUE)
data <- data[o, ]
data[, param_var] <- factor(
data[, param_var], levels = data[, param_var]
)
#Force main effect, interaction to be numeric
data[, main_var] <- as.numeric(data[, main_var])
data[, inter_var] <- as.numeric(data[, inter_var])
#total effect is main effect + interaction
data$.total.effect <- rowSums(data[, c(main_var, inter_var)])
#Get cumulative totals for the ribbon
data$.cumulative.main.effect <- cumsum(data[, main_var])
data$.cumulative.total.effect <- cumsum(data$.total.effect)
#A quirk of ggplot2 means we need x coords of bars
data$.numeric.param <- as.numeric(data[, param_var])
#The other upper bound
#.maximum = 1 - main effects not included
data$.maximum <- c(1 - rev(cumsum(rev(data[, main_var])))[-1], 1)
data$.valid.ymax <- with(data,
pmin(.maximum, .cumulative.total.effect)
)
mdata[, param_var] <- factor(
mdata[, param_var], levels = data[, param_var]
)
list(data = data, mdata = mdata)
}
lowry_plot <- function(data,
param_var = "Parameter",
main_var = "Main.Effect",
inter_var = "Interaction",
x_lab = "Parameters",
y_lab = "Total Effects (= Main Effects + Interactions)",
ribbon_alpha = 0.5,
x_text_angle = 25)
{
#Fortify data and dump contents into plot function environment
data_list <- fortify_lowry_data(data, param_var, main_var, inter_var)
list2env(data_list, envir = sys.frame(sys.nframe()))
p <- ggplot(data) +
geom_bar(aes_string(x = param_var, y = "value", fill = "variable"),
data = mdata) +
geom_ribbon(
aes(x = .numeric.param, ymin = .cumulative.main.effect, ymax =
.valid.ymax),
data = data,
alpha = ribbon_alpha) +
xlab(x_lab) +
ylab(y_lab) +
scale_y_continuous(labels = "percent") +
theme(axis.text.x = text(angle = x_text_angle, hjust = 1)) +
scale_fill_grey(end = 0.5) +
theme(legend.position = "top",
legend.title =blank(),
legend.direction = "horizontal"
)
p
}
m_xylene_lowry <- lowry_plot(m_xylene_data)
When I run the code, it is giving me the following error:
Error: argument "x" is missing, with no default
It is not specific enough for me to know what the issue is. What is causing the error to be displayed and how can I make error statements more verbose?

It seems that there is more than one faulty element in your code, beyond the one that throws this error. In my experience it always helps to first check whether the code works as expected before putting it into a function. The plotting part below should work:
p <- ggplot(data) + # no need to give data here if you override it in each layer below, but it does not affect the outcome
# geom_bar does the counting but does not take y-value. Use geom_col:
geom_col(aes_string(x = param_var, y = "value", fill = "variable"),
data = mdata,
position = position_stack(reverse = TRUE)) +
geom_ribbon(
aes(x = .numeric.param, ymin = .cumulative.main.effect, ymax =
.valid.ymax),
data = data,
alpha = ribbon_alpha) +
xlab(x_lab) +
ylab(y_lab) +
# use scales::percent_format():
scale_y_continuous(labels = scales::percent_format()) +
# text() is not a theme element; here it most likely resolves to graphics::text(), which would explain the 'argument "x" is missing' error. Use element_text():
theme(axis.text.x = element_text(angle = x_text_angle, hjust = 1)) +
scale_fill_grey(end = 0.5) +
# use element_blank(), not just blank()
theme(legend.position = "top",
legend.title = element_blank(),
legend.direction = "horizontal"
)
This at least plots something, but I'm not sure whether it is what you expect it to do. It would help if you could show the desired output.
Edit:
Added position = position_stack(reverse = TRUE) to order according to sample plot.
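On making the error more informative: one option is to run the suspect call on its own and inspect the condition object. The sketch below uses base R only and assumes that the failing piece is the text(angle = ..., hjust = ...) call from the question's theme() line; the variable names are hypothetical.

```r
# Run the suspected call in isolation and capture the error object
err <- tryCatch(
  text(angle = 25, hjust = 1),   # same call as in the question's theme() line
  error = function(e) e
)

failed <- inherits(err, "error")
message_text <- conditionMessage(err)          # the error message
failing_call <- deparse(conditionCall(err))    # the call that raised it

cat("failed:", failed, "\n")
cat("message:", message_text, "\n")
cat("raised in:", failing_call, "\n")
```

In an interactive session, calling traceback() right after the error, or setting options(error = recover) beforehand, shows the call stack that led to it.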

ggplot2: how to reduce the number of items in a legend

I have the following function:
gg.barplots <- function(inp, order, xlab.strg, ylab.strg) {
require(RColorBrewer)
require(ggplot2)
require(reshape2)
arg <- c(expression(hat(p)[M]), expression(hat(p)[C]))
p <- order
col <- c(colorRampPalette(brewer.pal(9,'Blues')[2:9])(p+2),
colorRampPalette(brewer.pal(9,'Oranges')[2:9])(p+2))
lab <- c(0:p, paste(">",p,sep=""))
freq.mat <- data.frame(labels = lab, inp)
names(freq.mat) <- c("x", "Magnitude-only", "Complex-valued")
freq.mat$x <- factor(freq.mat$x, levels = c(levels(freq.mat$x)[-1],levels(freq.mat$x)[1]))
## force the orders to be as we want them to appear, using the factor function with levels specified.
freq.df <- melt(data = freq.mat, id.vars = 1, measure.vars = 2:3)
fill.vars <- paste(rep(names(freq.mat)[-1], times = p), rep(freq.mat$x, each = 2), sep = ":")
fill.vars <- factor(fill.vars, levels = fill.vars)
freq.df <- data.frame(fill.vars, freq.df[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2), ])
ggplot(data=freq.df, aes(x = x, y = value, fill = fill.vars)) +
geom_bar(stat="identity", position=position_dodge(), colour = "black") +
scale_fill_manual(values = col[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2)]) +
theme_bw() +
xlab(arg) +
ylab(ylab.strg) +
xlab(xlab.strg) +
ylab(ylab.strg)
}
which gives me the following (two dodged barplots) as in the following example:
dput(out.AR2$AR.rate)
structure(c(0.25178, 0.06735, 0.64564, 0.03523, 0.04396, 0.0027,
0.90415, 0.04919), .Dim = c(4L, 2L), .Dimnames = list(c("0",
"1", "2", ">2"), NULL))
and calling the function:
gg.barplots(inp = out.AR2$AR.rate, order = 2, xlab.strg = "AR order", ylab.strg = "Proportions")
which results in the following figure:
Now I feel that (even ignoring the inherent ugliness of the current legend in this plot) the whole legend is not necessary. I think it is enough to have only two colors (say the mid-value of the Oranges scale and the mid-value of the Blues scale) to represent the important parts of the plot. The remainder (the AR orders in the legend) is already there in the figure.
My question is: how do I make a legend which has only these two colors (and the words Complex-valued and Magnitude-only) associated with them? I have tried several things and I am a bit lost, sorry.

Your function is a little messy - you could probably split it into two functions, one to clean and one to plot.
Anyways, the easiest way to get what you want is to use the breaks argument to scale_fill_manual. This allows you to choose only those levels you want in the legend:
gg.barplots <- function(inp, order, xlab.strg, ylab.strg) {
require(RColorBrewer)
require(ggplot2)
require(reshape2)
arg <- c(expression(hat(p)[M]), expression(hat(p)[C]))
p <- order
col <- c(colorRampPalette(brewer.pal(9,'Blues')[2:9])(p+2),
colorRampPalette(brewer.pal(9,'Oranges')[2:9])(p+2))
lab <- c(0:p, paste(">",p,sep=""))
freq.mat <- data.frame(labels = lab, inp)
names(freq.mat) <- c("x", "Magnitude-only", "Complex-valued")
freq.mat$x <- factor(freq.mat$x, levels = lab) # same order as before; levels(freq.mat$x) is NULL when x is a character column (the data.frame() default since R 4.0)
## force the orders to be as we want them to appear, using the factor function with levels specified.
freq.df <- melt(data = freq.mat, id.vars = 1, measure.vars = 2:3)
fill.vars <- paste(rep(names(freq.mat)[-1], times = p), rep(freq.mat$x, each = 2), sep = ":")
fill.vars <- factor(fill.vars, levels = fill.vars)
freq.df <- data.frame(fill.vars, freq.df[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2), ])
ggplot(data=freq.df, aes(x = x, y = value, fill = fill.vars)) +
geom_bar(stat="identity", position=position_dodge(), colour = "black") +
scale_fill_manual(values = col[rep(c(0,p+2), times = p + 2) + rep(1:(p + 2), each = 2)], breaks = c("Magnitude-only:2", "Complex-valued:2")) +
theme_bw() +
xlab(arg) +
ylab(ylab.strg) +
xlab(xlab.strg) +
ylab(ylab.strg)
}
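The effect of breaks is easier to see on a small example. This is a sketch with hypothetical toy data and hand-picked colours, not the function above: every fill level still gets its colour from values, but only the levels listed in breaks appear in the legend, and labels renames those keys.

```r
library(ggplot2)

# Hypothetical toy data: two series, each with three x levels
toy <- data.frame(
  x = rep(c("0", "1", "2"), times = 2),
  series = rep(c("Magnitude-only", "Complex-valued"), each = 3),
  value = c(0.25, 0.07, 0.68, 0.04, 0.01, 0.95)
)
toy$fill_var <- paste(toy$series, toy$x, sep = ":")

# One colour per fill level (blues for one series, oranges for the other)
fill_cols <- c(
  "Magnitude-only:0" = "#C6DBEF", "Magnitude-only:1" = "#6BAED6",
  "Magnitude-only:2" = "#2171B5",
  "Complex-valued:0" = "#FDD0A2", "Complex-valued:1" = "#FD8D3C",
  "Complex-valued:2" = "#D94801"
)
legend_keys <- c("Magnitude-only:1", "Complex-valued:1")

p <- ggplot(toy, aes(x = x, y = value, fill = fill_var)) +
  geom_col(position = position_dodge(), colour = "black") +
  scale_fill_manual(
    values = fill_cols,
    breaks = legend_keys,                              # only these keys are drawn
    labels = c("Magnitude-only", "Complex-valued"),    # text shown for those keys
    name = NULL
  ) +
  theme_bw()
```

Choosing the middle level of each series as the legend key gives one blue and one orange swatch.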
