I have a data frame to plot in which the y-axis variable is a character column. I want to show only the last part of each string, using '_' as the separator.
Here is an example with the iris dataset. As you can see, all the y-axis labels come out the same. How can I do this? Thanks.
library(ggplot2)
iris$trial = paste('hello', 'good_bye', iris$Sepal.Length, sep = '_')
myfun = function(x) {
tail(unlist(strsplit(x, '_')), n = 1)
}
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = function(x) myfun(x)) +
theme_bw()
It seems to me that your function returns a single value for the whole vector of labels: unlist() flattens all the split pieces and tail(..., n = 1) keeps only the very last one. That value is then replicated. Applying the function element by element with sapply returns one label per unique value. However, I don't know if that makes sense in this example without making the values numeric (and sorting them), so you might want to add that as well.
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = function(x) sapply(x, myfun, USE.NAMES = FALSE)) +
theme_bw()
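To see why every label comes out identical, it helps to run the splitting logic outside ggplot2. This is a minimal sketch on a made-up vector; the names `x`, `last_of_all` and `last_of_each` are illustrative and not from the original post.

```r
# Hypothetical input: two strings shaped like the values of iris$trial
x <- c("hello_good_bye_5.1", "hello_good_bye_4.9")

# Same body as myfun: unlist() flattens every piece into one vector,
# so tail(..., n = 1) keeps only the last piece of the last string
last_of_all <- function(x) tail(unlist(strsplit(x, "_")), n = 1)

# Element-wise variant: take the last piece of each string separately
last_of_each <- function(x) {
  vapply(strsplit(x, "_"), function(p) tail(p, n = 1), character(1))
}

print(last_of_all(x))   # one value for the whole vector
print(last_of_each(x))  # one value per element
```

A labels function passed to a scale receives all the breaks at once, so it has to return one label per break, as the element-wise variant does.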
You can make use of regex instead to extract the required value.
library(ggplot2)
#This removes everything until the last underscore
myfun = function(x) sub('.*_', '', x)
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = myfun) +
theme_bw()
If you want to extract numbers from y-axis value, you can also use scale_y_discrete(labels = readr::parse_number).
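Because the extracted labels are still text, the order of the values follows character sorting rather than numeric order. The sketch here uses made-up values (not the iris data) to show the difference and why converting to numeric may be worth considering.

```r
# Hypothetical values in the same shape as iris$trial
x <- c("hello_good_bye_10", "hello_good_bye_9.5", "hello_good_bye_4.9")

# sub() is vectorised: one result per input string
last_part <- sub(".*_", "", x)
print(last_part)

# Sorting as text compares character by character
print(sort(last_part))

# Sorting as numbers gives the order one would expect on an axis
print(sort(as.numeric(last_part)))
```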
I am trying to plot multiple columns of a data frame (36 in total) using the following function:
big5p1 <- function(i) {
ggplot(big5_pos, aes(x= i, y = title)) +
geom_bar(stat="identity", width=0.5) +
xlab(colnames(big5_pos)[i]) + #Issues with NAs
ylab("Position") +
geom_vline(xintercept = mean(i), color="red")
}
lapply(big5_pos[2:3], big5p1)
When I check colnames(big5_pos[2:36]) I do get a correct list of character names for each column. However, when using lapply, only some of the xlabs are printed correctly, and the rest just have NA as the label. Not sure what I am overlooking, but any help or advice would be much appreciated!
Change your function to accept a column name.
library(ggplot2)
big5p1 <- function(i) {
ggplot(big5_pos, aes(x = .data[[i]], y = title)) +
geom_bar(stat="identity", width=0.5) +
xlab(i) +
ylab("Position") +
geom_vline(xintercept = mean(big5_pos[[i]], na.rm = TRUE), color="red")
}
result <- lapply(names(big5_pos)[2:3], big5p1)
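The NA labels in the question can be reproduced without ggplot2. lapply() over a data frame hands the function each column's values, not its position or name, so colnames(big5_pos)[i] ends up indexing the names with data values. A small sketch, assuming a hypothetical data frame `toy` in place of big5_pos:

```r
# Hypothetical stand-in for big5_pos
toy <- data.frame(
  title  = c("a", "b", "c"),
  score1 = c(2, 1, 3),
  score2 = c(5, 9, 7)
)

# Iterating over the data frame: i is the column's values
labs_bad <- lapply(toy[2:3], function(i) colnames(toy)[i])
print(labs_bad)  # score2's values point past the last column, giving NA

# Iterating over the names: nm is the column name itself
labs_good <- lapply(names(toy)[2:3], function(nm) nm)
print(labs_good)
```

Values that happen to fall between 1 and the number of columns return some column name, which is why only part of the labels looked right.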
We could convert the column names to symbols and evaluate with (!!)
library(ggplot2)
big5p1 <- function(nm) {
ggplot(big5_pos, aes(x = !! rlang::sym(nm), y = title)) +
geom_bar(stat = "identity", width = 0.5) +
xlab(nm) +
ylab("Position") +
geom_vline(xintercept = mean(big5_pos[[nm]], na.rm = TRUE), color="red")
}
and then loop over the column names
result <- lapply(names(big5_pos)[2:3], big5p1)
I know how to modify titles in ggplot without altering the original data. Suppose I have the following data frame and I want to change the labels. I would do so in the following way:
df <- data.frame(x = 1:4, y = 1:4, label = c(c("params[1]", "params[2]", "params[3]",
"params[4]")))
params_names <- list(
'params[1]'= "beta[11]",
'params[2]'= "beta[22]",
'params[3]'= "beta[33]",
'params[4]'= "beta[44]"
)
param_labeller <- function(variable, value){
params_names[value]
}
ggplot(df, aes(x=x,y=y)) +
geom_point() +
facet_grid(~label, labeller = param_labeller)
If I wanted to display the subscripts, I would just do this
ggplot(df, aes(x=x,y=y)) +
geom_point() +
facet_grid(~label, labeller = label_parsed)
How do I apply both operations at the same time?
I don't know whether this conflicts with your not wanting to "alter" the original data, but you can add the labelling information to the factor itself:
df$label2 <- factor(df$label,
labels = c("beta[4]", "beta[24]", "beta[42]", "beta[43]"))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
facet_grid( ~ label2, labeller = label_parsed)
This produces the following plot:
[Figure: plot with formatted facet labels]
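If the data frame should stay completely untouched, one possible alternative is to combine a lookup table with label_parsed through as_labeller(). This is a sketch rather than the answerer's method; it assumes the lookup is a named character vector (not a list) and uses the labels from the question.

```r
library(ggplot2)

df <- data.frame(x = 1:4, y = 1:4,
                 label = c("params[1]", "params[2]", "params[3]", "params[4]"))

# Lookup table as a named character vector: old label -> new label
params_names <- c(
  "params[1]" = "beta[11]",
  "params[2]" = "beta[22]",
  "params[3]" = "beta[33]",
  "params[4]" = "beta[44]"
)

# as_labeller() does the renaming; default = label_parsed then
# parses the renamed strings as plotmath so subscripts are drawn
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_grid(~ label,
             labeller = as_labeller(params_names, default = label_parsed))
```

Printing `p` draws the plot; df itself gains no new column.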
I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1$Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid`
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet grid, both versions of the function run well, though with no grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc.), but you're giving it objects (df1$Year, ...).
There are a couple ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
library(dplyr)
library(ggplot2)
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
# Need to "tell" R to treat parameters as variable names; {{ }} does this inside mutate().
df <- df %>% mutate(x_var = {{ x_var }}, y_var = {{ y_var }}, fill_var = {{ fill_var }}, grid_var = {{ grid_var }})
p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
p = p + facet_grid(~grid_var, scales = 'free')
return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)
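A third option, sketched here under the assumption that the column names are passed as strings, is to use the .data pronoun for the aesthetics and build the facet formula from the string. The function name `MyGridPlotStr` is hypothetical; the data are the ones from the question.

```r
library(ggplot2)

# Same data as in the question
df1 <- data.frame(
  Year       = factor(rep(c("01", "02"), each = 4)),
  Group      = factor(rep(c("G1", "G2"), each = 2, times = 2)),
  Gender     = factor(rep(c("Male", "Female"), times = 4)),
  Percentage = c(80L, 20L, 50L, 50L, 45L, 55L, 15L, 85L)
)

# Hypothetical variant: every column is named by a string
MyGridPlotStr <- function(df, x_var, y_var, fill_var, grid_var) {
  ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]],
                 fill = .data[[fill_var]])) +
    geom_col() +
    # Build the one-sided formula, e.g. ~ Group, from the string
    facet_grid(as.formula(paste("~", grid_var)), scales = "free") +
    labs(x = x_var, y = y_var, fill = fill_var)
}

p <- MyGridPlotStr(df1, "Year", "Percentage", "Gender", "Group")
```

This keeps the original columns as they are, so the facet strips and legend carry the real variable names.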
How can I use stat_summary to label a plot with n = x, where x is a variable? Here's an example of the desired output:
I can make that above plot with this rather inefficient code:
nlabels <- sapply(1:length(unique(mtcars$cyl)), function(i) as.vector(t(as.data.frame(table(mtcars$cyl))[,2][[i]])))
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
geom_text(aes(x = 1, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[1]]), label = paste0("n = ",nlabels[[1]]) )) +
geom_text(aes(x = 2, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[2]]), label = paste0("n = ",nlabels[[2]]) )) +
geom_text(aes(x = 3, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[3]]), label = paste0("n = ",nlabels[[3]]) ))
This is a follow up to this question: How to add a number of observations per group and use group mean in ggplot2 boxplot? where I can use stat_summary to calculate and display the number of observations, but I haven't been able to find a way to include n = in the stat_summary output. Seems like stat_summary might be the most efficient way to do this kind of labelling, but other methods are welcome.
You can write your own function to use inside stat_summary(). Here n_fun computes the y position as median() and adds a label that consists of "n = " and the number of observations. It is important to use data.frame() instead of c(): paste0() produces a character value while the y value is numeric, and c() would coerce both to character. Then in stat_summary() use this function with geom = "text". This ensures that, for each x value, the position and label are computed only from that level's data.
n_fun <- function(x){
return(data.frame(y = median(x), label = paste0("n = ",length(x))))
}
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
stat_summary(fun.data = n_fun, geom = "text")
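To check what stat_summary() receives for one group, n_fun can be called by hand on a single level. The sketch uses the four-cylinder cars from mtcars; the names `mpg4`, `res` and `mixed` are illustrative.

```r
n_fun <- function(x) {
  data.frame(y = median(x), label = paste0("n = ", length(x)))
}

# mpg values for one level of cyl, as stat_summary() would pass them
mpg4 <- mtcars$mpg[mtcars$cyl == 4]

res <- n_fun(mpg4)
print(res)
str(res)  # y stays numeric, label is text

# With c(), the numeric median is coerced to character
mixed <- c(y = median(mpg4), label = paste0("n = ", length(mpg4)))
print(class(mixed))
```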
Most things in R are vectorized, so you can leverage that.
nlabels <- table(mtcars$cyl)
# To create the median labels, you can use by
meds <- c(by(mtcars$mpg, mtcars$cyl, median))
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
geom_text(data = data.frame(), aes(x = names(meds) , y = meds,
label = paste("n =", nlabels)))
Regarding the nlabels:
Instead of your sapply statement you can simply use:
nlabels <- table(mtcars$cyl)
Notice that your current code is taking the above, converting it, transposing it, then iterating over each row only to grab the values one by one, then put them back together into a single vector.
If you really want them as an un-dimensioned integer vector, use c()
nlabels <- c(table(mtcars$cyl))
but of course, even this is not needed to accomplish the above.
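As a quick check of the vectorised approach, the counts, medians and labels can be assembled into one small data frame before plotting. This is only a sketch; `label_df` is an illustrative name and not part of the original answer.

```r
# Counts per cylinder group as a plain named integer vector
nlabels <- c(table(mtcars$cyl))

# Median mpg per group, named by the group value
meds <- c(by(mtcars$mpg, mtcars$cyl, median))

# One row per group: x position, y position and label text
label_df <- data.frame(
  cyl   = names(nlabels),
  y     = meds,
  label = paste("n =", nlabels)
)
print(label_df)
```

A data frame like this could be passed to geom_text() through its data argument instead of an empty data.frame().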
When passing missing values to ggplot, it's very kind, and warns us that they are present. This is acceptable in an interactive session, but when writing reports, you do not want the output to get cluttered with warnings, especially if there are many of them. The example below has one label missing, which produces a warning.
library(ggplot2)
library(reshape2)
mydf <- data.frame(
species = sample(c("A", "B"), 100, replace = TRUE),
lvl = factor(sample(1:3, 100, replace = TRUE))
)
labs <- melt(with(mydf, table(species, lvl)))
names(labs) <- c("species", "lvl", "value")
labs[3, "value"] <- NA
ggplot(mydf, aes(x = species)) +
geom_bar() +  # counts rows per species; stat_bin() needs a continuous x in current ggplot2
geom_text(data = labs, aes(x = species, y = value, label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
If we wrap suppressWarnings around the last expression, we get a summary of how many warnings there were. For the sake of argument, let's say that this isn't acceptable (but is indeed very honest and correct). How to (completely) suppress warnings when printing a ggplot2 object?
You need to wrap suppressWarnings() around the print() call, not around the creation of the ggplot() object:
R> suppressWarnings(print(
+ ggplot(mydf, aes(x = species)) +
+ geom_bar() +
+ geom_text(data = labs, aes(x = species, y = value,
+ label = value, vjust = -0.5)) +
+ facet_wrap(~ lvl)))
R>
It might be easier to assign the final plot to an object and then print().
plt <- ggplot(mydf, aes(x = species)) +
geom_bar() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
R> suppressWarnings(print(plt))
R>
The reason for the behaviour is that the warnings are only generated when the plot is actually drawn, not when the object representing the plot is created. R will auto print during interactive usage, so whilst
R> suppressWarnings(plt)
Warning message:
Removed 1 rows containing missing values (geom_text).
doesn't work because, in effect, you are calling print(suppressWarnings(plt)), whereas
R> suppressWarnings(print(plt))
R>
does work because suppressWarnings() can capture the warnings arising from the print() call.
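The same timing effect can be shown in base R without ggplot2. In this sketch `plt` is a plain list standing in for a plot object, `draw()` is a hypothetical stand-in for print() that raises the warning, and `count_warnings()` is a helper written for the illustration.

```r
# Stand-in for a ggplot object: creating it raises no warning
plt <- list(note = "hypothetical plot object")

# Stand-in for print(): the warning appears only when "drawing"
draw <- function(p) {
  warning("Removed 1 rows containing missing values")
  invisible(p)
}

# Helper: count the warnings that escape an expression
count_warnings <- function(expr) {
  n <- 0
  withCallingHandlers(expr, warning = function(w) {
    n <<- n + 1
    invokeRestart("muffleWarning")
  })
  n
}

# Suppression ends before drawing starts, so the warning escapes
escaped_wrong <- count_warnings(draw(suppressWarnings(plt)))

# Suppression covers the drawing step, so nothing escapes
escaped_right <- count_warnings(suppressWarnings(draw(plt)))

print(c(wrong = escaped_wrong, right = escaped_right))
```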
A more targeted plot-by-plot approach would be to add na.rm=TRUE to your plot calls.
E.g.:
ggplot(mydf, aes(x = species)) +
geom_bar() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5), na.rm=TRUE) +
facet_wrap(~ lvl)
In your question, you mention report writing, so it might be better to set the global warning level:
options(warn=-1)
the default is:
options(warn=0)
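Since options() returns the previous settings, the global level can be changed for one stretch of code and then restored, rather than hard-coding the default. A minimal sketch, with `old` and `res` as illustrative names:

```r
# options() returns the old values invisibly, so keep them
old <- options(warn = -1)

# This conversion would normally warn about NAs introduced by coercion
res <- as.numeric("not a number")

# Restore whatever warning level was active before
options(old)
```

This avoids silencing warnings for the rest of the report when only one chunk needs it.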