Variable Created With Mutate Not Found With ggplot - r

New to R.
I created a new variable with dplyr::mutate() and I see the values in the df output when I run the code, but when I try to plot it with ggplot, I receive object not found error. What am I doing wrong? Thx.
Works as expected:
mutate(avg_inv = (inv_total / sr_count))
Error here:
# Plot avg invoice
p <- ggplot(df1, aes(x = Date_Group, y = avg_inv) ) +
geom_bar(stat = "identity", position="dodge")
p
Error message:
Error in eval(expr, envir, enclos) : object 'avg_inv' not found

I think you might not be saving the result of mutate, so even though the results print to your console, it's not available for ggplot2.
Try:
df1 <- df %>% mutate(avg_inv = (inv_total / sr_count))
p <- ggplot(df1, aes(x = Date_Group, y = avg_inv) ) +
geom_bar(stat = "identity", position="dodge")
p
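A minimal sketch of the difference, using a small made-up data frame (the name df and its values are assumptions standing in for the asker's data):

```r
library(dplyr)

# Hypothetical stand-in for the asker's data
df <- data.frame(
  Date_Group = c("Jan", "Feb", "Mar"),
  inv_total  = c(1000, 1500, 900),
  sr_count   = c(10, 15, 9)
)

# Without assignment the result is printed and then discarded; df is unchanged
df %>% mutate(avg_inv = inv_total / sr_count)
has_col_before <- "avg_inv" %in% names(df)

# With assignment the new column is kept in df1 and is available to ggplot
df1 <- df %>% mutate(avg_inv = inv_total / sr_count)
has_col_after <- "avg_inv" %in% names(df1)
```

Here has_col_before is FALSE and has_col_after is TRUE, which is why ggplot(df1, ...) can only find avg_inv once the mutate result has been saved.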

How about this? Here I compute the additional variable inside the call to ggplot(). This saves the hassle of a temporary variable to hold the intermediate result, and it sidesteps the forgotten-assignment error as well.
data("airquality")
library(ggplot2)
library(dplyr)
p<- ggplot(airquality %>%
mutate(somevar=(Month/Day)), aes(x = somevar) ) +
geom_histogram(position = "stack", stat = "bin", binwidth = 5)
print(p)

Related

"Error in eval(expr, envir, enclos) : object 'y' not found" and "Removed 1 rows containing missing values (geom_text)"

It says that the error is in the "library(ggplot2)" line and I don't know how to fix it.
Here's the code I was using:
library(ggplot2)
library('remotes')
remotes::install_github("GuangchuangYu/nCov2019", dependencies = TRUE)
library('nCov2019')
get_nCov2019(lang = 'en')
library(dplyr)
library(magrittr)
d <- y['global']
f <- d %>% dplyr::filter(time == time(y)) %>% top_n(180, cum_confirm) %>% arrange(desc(cum_confirm))
library(ggrepel)
library(dplyr)
require(ggplot2)
require(ggrepel)
ggplot(filter(d, d$time > '2020-02-05' & country %in% f$country), mapping = aes(time, cum_confirm , color = country, label = country)) +
geom_line() +
geom_text(data = f, aes(label = country, colour = country, x = time, y = cum_confirm))+
theme_minimal(base_size = 14)+
theme(legend.position = "none") +
ggtitle('Covid-19 Cases by Country', 'The progression of confirmed cases by countries')+
ylab('Confirmed Cases')
The graph seems about right when I run the chunk, but I also get the following message and I don't know what it means: "Removed 1 rows containing missing values (geom_text)."
Don't worry, that message is only a warning. It appears because the data passed to geom_text contains one (or more) NA values in a mapped column, and ggplot drops those rows before drawing.
This is common in ggplot. Unlike mean(), where you have to pass a second argument such as na.rm = TRUE to have missing values ignored, ggplot removes them on its own and simply tells you that it did.
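To find out which row is being dropped, you can check the mapped columns for NA yourself. This is only a sketch: the data frame below is invented for illustration and is not the real nCov2019 data.

```r
# Hypothetical label data with one missing value (not the real nCov2019 data)
f <- data.frame(
  country     = c("A", "B", "C"),
  time        = as.Date(c("2020-03-01", "2020-03-01", "2020-03-01")),
  cum_confirm = c(10, NA, 30)
)

# mean() propagates the missing value unless told otherwise
mean_default <- mean(f$cum_confirm)
mean_narm    <- mean(f$cum_confirm, na.rm = TRUE)

# Rows with NA in any column mapped by geom_text (x, y, label) are the ones ggplot drops
mapped  <- c("time", "cum_confirm", "country")
dropped <- f[!complete.cases(f[, mapped]), ]
```

In this made-up example, dropped holds the single row for country "B", the counterpart of the "1 rows" in the warning.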

writing R function with ggplot

I have to plot multiple datasets in the same format, and after copy-pasting the code several times, I decided to write a function.
I understand simple function in R, and managed to write the following:
testplot <- function(data, mapping){
output <- ggplot(data) +
geom_bar(mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
this works fine, however, my plot is more complicated and requires the "data" argument to go separately into each component:
output <- ggplot() +
geom_bar(df1, mapping,
stat="identity",
position='stack')+
geom_errorbar(df1, ...)+
geom_bar(df2, mapping,
...+
geom_errorbar(df2, ...)
but when I write the function and try to run it as
testplot <- function(data, mapping){
output <- ggplot() +
geom_bar(data, mapping,
stat="identity",
position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))
it gives me an error:
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval
Did you accidentally pass `aes()` to the `data` argument?
Is there a way around it?
EDIT: when I try to include 2 dataframes like this:
testplot <- function(data, data2, mapping){
output <- ggplot() +
geom_bar(data=data, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)+
geom_bar(data2=data2, mapping=mapping,
stat="identity",
position='stack',
width = barwidth)
}
p <- testplot(data=df, data2=df2, mapping=aes(x=norms_number, y=coeff.BLDRT, fill=type))
it says "Ignoring unknown parameters: data2"
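In most ggplot2 layer functions (geom_bar, geom_point, and so on) the first positional argument is mapping, which expects the result of aes(); data comes second. In ggplot() itself the order is reversed: data first, then mapping.
So inside your function, geom_bar(data, mapping, ...) passes the data frame "data" by position to the mapping argument and your aes() to the data argument, which is exactly what the error message is complaining about.
To get around this, name the arguments explicitly (data = data, mapping = mapping) in your function definition.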
For example:
testplot <- function(data, mapping){
output <- ggplot() +
geom_bar(data = data, mapping = mapping,
stat="identity",
position='stack')
}
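The argument order can be checked directly with formals(). The sketch below also repeats the named-argument fix as a complete function; the data frame df and its values are made up, and only the column names xvar, yvar and type come from the question.

```r
library(ggplot2)

# ggplot() takes data first; layer functions take mapping first
ggplot_args <- names(formals(ggplot))[1:2]
layer_args  <- names(formals(geom_bar))[1:2]

# Naming the arguments makes their position irrelevant
testplot <- function(data, mapping) {
  ggplot() +
    geom_bar(data = data, mapping = mapping,
             stat = "identity", position = "stack")
}

# Hypothetical data frame using the column names from the question
df <- data.frame(
  xvar = c("a", "a", "b"),
  yvar = c(1, 2, 3),
  type = c("u", "v", "u")
)
p <- testplot(df, aes(x = xvar, y = yvar, fill = type))
```

Returning the plot as the last expression (instead of assigning it to output) also means that calling testplot(...) at the console draws the plot without an extra print().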
EDIT:
There are many ways to do this, and it really depends on how complex you want your function to be. If you are going to stick to a global aesthetic mapping, then you can leave the mapping in the main ggplot call, set data = NULL there, and specify in each layer which data frame it should use.
Consider the following reproducible example
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, color = "red")
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
In this example, the mapping argument of each geom_* layer is left unset, so each layer inherits its mapping from the main ggplot call.
Layers usually get their data the same way: it is inherited from the main ggplot call. Whenever you pass a data or mapping argument to a layer, you override the inherited value for that layer. Any required aesthetics missing from a layer's own mapping are looked up in the main call.
library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10), z = c("A","B"))
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_point(data = df1, color = "blue") +
geom_line(data = df2, mapping = aes(color = z)) #inherits x and y mapping from main ggplot call.
}
plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))
But adding extra aes mappings in a layer is risky if you are also passing data to that layer, because your data frame may not always contain the columns those mappings refer to.
plot_custom_ggplot(df1 = data2, df2 = data1, aes(x = v1, y = v2))
#Error in FUN(X[[i]], ...) : object 'z' not found
#
#The column z is not present in the data1 object.
#R then looked for a global object named z and did not find one.
I believe it is best practice to use tidy data when working with ggplot, because things become so much easier. There is usually no reason to use multiple data frames, especially if you plan to use one mapping for all of them. A good exception is a plotting function for a custom R object whose structure you know in advance.
Otherwise, consider and compare how these two functions work in this example:
set.seed(76) # set the seed before the random draws so the example is reproducible
data1 <- data.frame(v1=rnorm(20, 50, 20), v2=rnorm(20,30,5), letters= letters[1:20], id = "df1")
data2 <- data.frame(v1=rnorm(20, 100, 20), v2=rnorm(20,50,10), letters = letters[17:26], id = "df2")
plot_custom_ggplot2 <- function(df, mapping) {
ggplot(data = df, mapping = mapping) +
geom_bar(stat = "identity",
position="stack")
}
plot_custom_ggplot <- function(df1, df2, mapping) {
ggplot(data = NULL, mapping = mapping) +
geom_bar(data = df1, stat = "identity",
position="stack") +
geom_bar(data = df2, stat = "identity",
position="stack")
}
plot_custom_ggplot(data1,data2, aes(x = letters,y = v2, fill = id))
plot_custom_ggplot2(rbind(data1,data2), aes(x = letters, y = v2, fill = id))
In the first plot, the red bars for q, r, s, and t are hidden behind the blue bars. This is because they are added on top of each other as layers. In the second plot, these values actually stack because these values were added together in a single layer rather than two separate ones.
I hope this gives you enough information to write your ggplot function.
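To see the layering-versus-stacking difference from the comparison above in numbers rather than by eye, you can inspect the bar tops that ggplot computes. This is a small sketch with made-up one-row data frames a and b; layer_data() is the ggplot2 helper that returns the computed data for a layer.

```r
library(ggplot2)

# Two hypothetical one-row data frames that share the same x value
a <- data.frame(x = "q", y = 1, id = "df1")
b <- data.frame(x = "q", y = 2, id = "df2")

# Two layers: each bar starts at zero, so one is drawn over the other
layered <- ggplot(mapping = aes(x = x, y = y, fill = id)) +
  geom_col(data = a) +
  geom_col(data = b)

# One layer on the combined data: position = "stack" piles the bars up
stacked <- ggplot(rbind(a, b), aes(x = x, y = y, fill = id)) +
  geom_col()

layered_tops <- c(layer_data(layered, 1)$ymax, layer_data(layered, 2)$ymax)
stacked_top  <- max(layer_data(stacked, 1)$ymax)
```

In the layered plot the bar tops are 1 and 2 (the taller bar hides the shorter one), while in the single-layer plot the stack reaches 3.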
library(tidyverse)
testplot <- function(df1, df2, mapping){
a <- ggplot() +
geom_point(data = df1, mapping = mapping) +
geom_point(data = df2, mapping = mapping)
return(a)
}
mtcars2 <- mtcars / 100 # creating a separate dataframe to provide the function
testplot(mtcars, mtcars2, mapping = aes(x = drat, y = vs))
From your example you have "data2=data2" - geom_bar doesn't have an argument 'data2', only data. I got the above to work, so an adaptation for your purposes should work too!
The reason I split my dataframe was because I wanted a grouped and stacked plot, and used this question:
How to plot a Stacked and grouped bar chart in ggplot?
The mapping has to be different so that they don't end up on top of each other (so it's x=var1, and then x=var1+barwidth).
Anyway, I can make a plot with multiple geom_bar, but it's the subsequent geom_errorbar that doesn't work in a single function. I just added the error bars separately in the end, and maybe I'll look into the other options some other time.
I realise these are already functions so probably not meant to be used this way, and maybe that's why I can't do multiple geom_errorbar in one function. I just wanted my code to be more readable because I had to plot the same thing 12 times, with very minor differences and it was very long. Perhaps there is a more elegant way to do it though.

Using the frame parameter when converting a plot from ggplot to plotly

Here is my data:
data <- data.table(year = rep(1980:1985,each = 5),
Relationship = rep(c(" Acquaintance","Unknown","Wife","Stranger","Girlfriend","Friend"), 5),
N = sample(1:100, 30)
)
I can use the plotly::plot_ly function to draw a bar chart animated over the years like this:
plot_ly(data
,x=~Relationship
,y=~N
,frame=~year
,type = 'bar'
)
but when I use ggplot with the frame parameter, I get an error:
Error in -data$group : invalid argument to unary operator
Here is my ggplot code:
p <- ggplot(data = data,aes(x =Relationship,y = N ))+
geom_bar(stat = "identity",aes(frame = year))
ggplotly(p)
Can you modify my ggplot code to produce the same graph?
This example runs successfully using frame parameter:
data(gapminder, package = "gapminder")
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
geom_point(aes(size = pop, frame = year)) +
scale_x_log10()
ggplotly(gg)
In case others are still looking, this does appear to be a bug related to geom_bar. Per Stéphane Laurent's GitHub report (https://github.com/ropensci/plotly/issues/1544) a workaround is to use geom_col(position = "dodge2") or geom_col(position = "identity") instead of geom_bar(stat='identity')

ERR for Score plot (PCA)

I am doing PCA in R and I got the result. But when I try to plot the first two principal components I get an error:
Warning: Ignoring unknown aesthetics: fill
Error in eval(expr, envir, enclos) : object 'GROUP' not found
Here is my code:
data = read.csv("pca_scores.csv", header = T)
data = data[, c(1:3)]
ggplot(data, aes(PC1, PC2)) +
geom_point(aes(shape = Group)) +
geom_text(aes(label = data$X)) +
stat_ellipse(aes(fill = Group))
I know the problem is "Group": I never defined a Group column in the preceding code. But I really don't know how to fix it.
https://i.stack.imgur.com/rHgrj.png
Agree with @MrFlick, you should always provide sample data; a screenshot of your data.frame is not useful.
That aside, you can try this:
require(tidyverse);
data %>%
mutate(Group = gsub("\\(.+\\)$", "", X)) %>%
ggplot(aes(PC1, PC2)) +
geom_point(aes(shape = Group)) +
geom_text(aes(label = X)) +
stat_ellipse(aes(fill = Group))
A few comments:
You don't need to use data$ inside aes(); just refer to the relevant column directly.
I've added a Group column, which strips the "(PubChem: ...)" part from X.
Keep in mind that stat_ellipse will only draw an ellipse if there are >3 points.
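To check what the gsub() pattern does before plotting, you can run it on a few labels. The values of X below are hypothetical, since the real ones are only visible in the screenshot; trimws() is an extra step, added here as an assumption, in case a space precedes the parenthesis.

```r
# Hypothetical sample labels; the real values of X are not available as text
X <- c("Glucose(PubChem: 5793)", "Glucose(PubChem: 5793)", "Lactate (PubChem: 612)")

# Drop the trailing parenthesised part, then trim any space left before it
Group <- trimws(gsub("\\(.+\\)$", "", X))

# Group sizes, useful for spotting groups too small for stat_ellipse
group_sizes <- table(Group)
```

Without trimws(), "Lactate (PubChem: 612)" would become "Lactate " with a trailing space and end up as a separate group from "Lactate".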

How to suppress warnings when plotting with ggplot

When passing missing values to ggplot, it's very kind and warns us that they are present. This is acceptable in an interactive session, but when writing reports, you do not want the output cluttered with warnings, especially if there are many of them. The example below has one label missing, which produces a warning.
library(ggplot2)
library(reshape2)
mydf <- data.frame(
species = sample(c("A", "B"), 100, replace = TRUE),
lvl = factor(sample(1:3, 100, replace = TRUE))
)
labs <- melt(with(mydf, table(species, lvl)))
names(labs) <- c("species", "lvl", "value")
labs[3, "value"] <- NA
ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value, label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
If we wrap suppressWarnings around the last expression, we get a summary of how many warnings there were. For the sake of argument, let's say that this isn't acceptable (but is indeed very honest and correct). How to (completely) suppress warnings when printing a ggplot2 object?
You need to wrap suppressWarnings() around the print() call, not around the creation of the ggplot() object:
R> suppressWarnings(print(
+ ggplot(mydf, aes(x = species)) +
+ stat_bin() +
+ geom_text(data = labs, aes(x = species, y = value,
+ label = value, vjust = -0.5)) +
+ facet_wrap(~ lvl)))
R>
It might be easier to assign the final plot to an object and then print().
plt <- ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
R> suppressWarnings(print(plt))
R>
The reason for the behaviour is that the warnings are only generated when the plot is actually drawn, not when the object representing the plot is created. R will auto print during interactive usage, so whilst
R> suppressWarnings(plt)
Warning message:
Removed 1 rows containing missing values (geom_text).
doesn't work because, in effect, you are calling print(suppressWarnings(plt)), whereas
R> suppressWarnings(print(plt))
R>
does work because suppressWarnings() can capture the warnings arising from the print() call.
A more targeted plot-by-plot approach would be to add na.rm=TRUE to your plot calls.
E.g.:
ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5), na.rm=TRUE) +
facet_wrap(~ lvl)
In your question, you mention report writing, so it might be better to set the global warning level:
options(warn=-1)
the default is:
options(warn=0)
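The points made in the answers above (warnings arise when the plot is drawn, not when it is created, and na.rm = TRUE removes them at the source) can be checked with a small sketch. The helper count_warnings() and the tiny labs data frame are made up for illustration, and ggplotGrob() is used as a stand-in for print() so that the plot is rendered without being displayed.

```r
library(ggplot2)

# Hypothetical label data with one missing value
labs <- data.frame(species = c("A", "B"), value = c(3, NA))

# Hypothetical helper: evaluate an expression and count the warnings it raises
count_warnings <- function(expr) {
  n <- 0
  withCallingHandlers(expr, warning = function(w) {
    n <<- n + 1
    invokeRestart("muffleWarning")
  })
  n
}

# Creating the plot object raises no warning
n_create <- count_warnings(
  plt <- ggplot(labs, aes(x = species, y = value, label = value)) + geom_text()
)

# Rendering it is what triggers the "Removed ... missing values" warning
n_draw <- count_warnings(ggplotGrob(plt))

# With na.rm = TRUE the layer drops the row silently
n_draw_narm <- count_warnings(ggplotGrob(
  ggplot(labs, aes(x = species, y = value, label = value)) +
    geom_text(na.rm = TRUE)
))
```

Compared with options(warn = -1), which silences every warning in the session, na.rm = TRUE only affects the layer it is passed to.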
