ERR for Score plot (PCA) - r

I am doing PCA in R and I got the result. But when I try to plot the first two principal components I get an error:
Warning: Ignoring unknown aesthetics: fill
Error in eval(expr, envir, enclos) : object 'GROUP' not found
Here is my code:
data = read.csv("pca_scores.csv", header = T)
data = data[, c(1:3)]
ggplot(data, aes(PC1, PC2)) +
geom_point(aes(shape = Group)) +
geom_text(aes(label = data$X)) +
stat_ellipse(aes(fill = Group))
I know the problem is "Group": I never defined a Group column in the code above. But I don't know how to fix it.
https://i.stack.imgur.com/rHgrj.png

I agree with @MrFlick: you should always provide sample data; a screenshot of your data.frame is not useful.
That aside, you can try this:
library(tidyverse)
data %>%
mutate(Group = gsub("\\(.+\\)$", "", X)) %>%
ggplot(aes(PC1, PC2)) +
geom_point(aes(shape = Group)) +
geom_text(aes(label = X)) +
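# geom = "polygon" lets the ellipse use fill; the default path geom ignores it with a warning
stat_ellipse(aes(fill = Group), geom = "polygon", alpha = 0.2)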
A few comments:
You don't need to use data$ inside aes(); just refer to the relevant column directly.
I've added a Group column, which strips the "(PubChem: ...)" part from X.
Keep in mind that stat_ellipse will only draw an ellipse for a group that has more than 3 points.
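To see what the mutate() step produces, here is a minimal base R sketch. The labels are made up, since the real values of X were only shown in a screenshot, and the "name(PubChem: id)" format is an assumption. The sketch also counts the points per group, which shows which groups are large enough for stat_ellipse.

```r
# Made-up labels standing in for the X column (assumed "name(PubChem: id)" format)
X <- c("Control(PubChem: 11)", "Control(PubChem: 12)", "Control(PubChem: 13)",
       "Control(PubChem: 14)", "Treated(PubChem: 21)", "Treated(PubChem: 22)")

# Same pattern as in the answer: drop the trailing "(...)" part
Group <- gsub("\\(.+\\)$", "", X)

# Points per group; only groups with more than 3 points get an ellipse
n_per_group <- table(Group)
has_ellipse <- names(n_per_group)[as.vector(n_per_group) > 3]
print(has_ellipse)
```

If a group you care about is missing from has_ellipse, it has too few points for an ellipse.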

Related

Percentage labels for a stacked ggplot barplot with groups and facets

I am trying to add percentage labels to a stacked AND faceted barplot (position='fill'). I want the percentages displayed to add up for each bar.
I'm using a data set like this:
## recreate dataset
Village<-c(rep('Vil1',10),rep('Vil2',10))
livestock<-c('p','p','p','c','c','s','s','s','g','g',
'p','p','c','c','s','s','s','s','g','g')
dose<-c(3,2,1,2,1,3,2,1,2,1,
2,1,2,1,4,3,2,1,2,1)
Freq<-c(4,5,5,2,3,4,1,1,6,8,
1,3,2,2,1,1,3,2,1,1)
df<-data.frame(Village,livestock,dose,Freq)
I successfully plotted it and added labels that add up to 100% for each X variable (livestock):
## create dose categories (factors)
df$dose<-as.character(df$dose)
df$dose[as.numeric(df$dose)>3]<-'>3'
df$dose<-factor(df$dose,levels=c('1','2','3','>3'))
## percentage barplot
ggplot(data = df, aes(x=livestock, y=Freq, fill=dose)) +
geom_bar(position='fill', stat='identity') +
labs(title="Given doses of different drugs in last 6months (livestock)",
subtitle='n=89',x="Livestock",y="Percentage",
fill = "Nr. of\ndoses") +
theme(axis.text.x = element_text(angle = 45, hjust=1))+
scale_y_continuous(labels=percent)+
facet_wrap(~Village)+
geom_text(aes(label = percent(..y../tapply(..y..,..x..,sum)[..x..])),
stat = "identity",position = position_fill(vjust=0.5))
Does anyone know how I can change the label code within ggplot so the percentages add up to 100% for each bar? Maybe something to do with ..group..?
I tried something similar to this: Label percentage in faceted filled barplot in ggplot2, but I can't make it work for my data.
The easiest way would be to transform your data beforehand so that the fractions can be used directly.
library(tidyverse)
library(scales)
# Assume df is as in example code
df <- df %>% group_by(Village, livestock) %>%
mutate(frac = Freq / sum(Freq))
ggplot(df, aes(livestock, frac, fill = dose)) +
geom_col() +
geom_text(
aes(label = percent(frac)),
position = position_fill(0.5)
) +
facet_wrap(~ Village)
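As a quick sanity check on the pre-transformation, the following base R sketch recomputes the fractions with ave() instead of dplyr and confirms that they sum to 1 within every bar (each Village and livestock combination). It rebuilds the question's data; the name bar_totals is only illustrative.

```r
# Data from the question (dose is not needed for this check)
Village <- c(rep("Vil1", 10), rep("Vil2", 10))
livestock <- c("p", "p", "p", "c", "c", "s", "s", "s", "g", "g",
               "p", "p", "c", "c", "s", "s", "s", "s", "g", "g")
Freq <- c(4, 5, 5, 2, 3, 4, 1, 1, 6, 8,
          1, 3, 2, 2, 1, 1, 3, 2, 1, 1)
df <- data.frame(Village, livestock, Freq)

# Base R counterpart of group_by(Village, livestock) %>% mutate(frac = Freq / sum(Freq))
df$frac <- ave(df$Freq, df$Village, df$livestock, FUN = function(v) v / sum(v))

# Sum of the fractions within each bar (Village x livestock)
bar_totals <- tapply(df$frac, list(df$Village, df$livestock), sum)
print(bar_totals)
```

Every cell of bar_totals should equal 1, which is why the labels add up to 100% per bar.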
If you insist on not pre-transforming the data, you can write yourself a little helper function.
bygroup <- function(x, group, fun = sum, ...) {
splitted <- split(x, group)
funned <- lapply(splitted, fun, ...)
# SIMPLIFY = FALSE keeps a list even when all groups have the same size
funned <- mapply(function(x, y) {
rep(x, length(y))
}, x = funned, y = splitted, SIMPLIFY = FALSE)
unsplit(funned, group)
}
Which you can then use by setting the group to x and the (undocumented) PANEL column.
library(ggplot2)
library(scales)
# Assume df is as in example code
ggplot(df, aes(livestock, Freq, fill = dose)) +
geom_col(position = "fill") +
geom_text(
aes(
label = percent(after_stat(y / bygroup(y, interaction(x, PANEL))))
),
position = position_fill(0.5)
) +
facet_wrap(~ Village)
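To make clear what the helper returns, here is a small sketch on made-up numbers. This copy of bygroup passes SIMPLIFY = FALSE to mapply, which is an adjustment on my part: without it, mapply collapses the result into a matrix when all groups have the same size, and unsplit then assigns the wrong totals. For a plain sum, base R's ave() gives the same result.

```r
# Copy of the helper, with SIMPLIFY = FALSE added (an adjustment, not in the original)
bygroup <- function(x, group, fun = sum, ...) {
  splitted <- split(x, group)
  funned <- lapply(splitted, fun, ...)
  funned <- mapply(function(x, y) rep(x, length(y)),
                   x = funned, y = splitted, SIMPLIFY = FALSE)
  unsplit(funned, group)
}

y <- c(1, 2, 3, 4)             # made-up bar segment heights
bar <- c("a", "a", "b", "b")   # made-up bar identifiers

totals <- bygroup(y, bar)      # each value replaced by the total of its bar
shares <- y / totals           # share of each segment within its bar
print(shares)
```

In the plot, y and interaction(x, PANEL) play the roles of y and bar here.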
Just to add to the solution of @teunbrand:
I calculated the fractions as @teunbrand suggested and it worked perfectly. However, I started to get very weird and persistent warning messages:
Warning messages:
1: Unknown or uninitialised column: `times`.
2: Unknown or uninitialised column: `times`.
3: Unknown or uninitialised column: `times`.
4: Unknown or uninitialised column: `times`.
5: Unknown or uninitialised column: `Var1`.
I read up on this problem, which seems to be a known bug: Persistent "Unknown or uninitialised column" warnings
I could get rid of the warnings by ungrouping and reconverting the tibble into a dataframe.
df <- as.data.frame(df %>% group_by(Village, livestock) %>%
mutate(frac = Freq / sum(Freq)) %>% ungroup())

"Error in eval(expr, envir, enclos) : object 'y' not found" and "Removed 1 rows containing missing values (geom_text)"

It says that the error is in the "library(ggplot2)" line and I don't know how to fix it.
Here's the code I was using:
library(ggplot2)
library('remotes')
remotes::install_github("GuangchuangYu/nCov2019", dependencies = TRUE)
library('nCov2019')
get_nCov2019(lang = 'en')
library(dplyr)
library(magrittr)
d <- y['global']
f <- d %>% dplyr::filter(time == time(y)) %>% top_n(180, cum_confirm) %>% arrange(desc(cum_confirm))
library(ggrepel)
library(dplyr)
require(ggplot2)
require(ggrepel)
ggplot(filter(d, d$time > '2020-02-05' & country %in% f$country), mapping = aes(time, cum_confirm , color = country, label = country)) +
geom_line() +
geom_text(data = f, aes(label = country, colour = country, x = time, y = cum_confirm))+
theme_minimal(base_size = 14)+
theme(legend.position = "none") +
ggtitle('Covid-19 Cases by Country', 'The progression of confirmed cases by countries')+
ylab('Confirmed Cases')
The graph seems about right when I run the chunk, but I also get the following message and I don't know what it means: "Removed 1 rows containing missing values (geom_text)."
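Don't worry, that message is only a warning. One row of the data passed to geom_text has an NA in a required aesthetic such as x, y or label, so ggplot2 drops that row and tells you about it.
This is how ggplot2 normally behaves. Unlike mean(), where you have to pass na.rm = TRUE to skip missing values, ggplot2 removes them by default and warns. The "object 'y' not found" error is a separate problem: the code shown never creates y (the result of get_nCov2019(lang = 'en') is not assigned to anything), so y['global'] fails.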
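If you want to know which row was dropped, you can look for missing values in the mapped columns yourself. The sketch below uses a made-up stand-in for the label data f; only the column names are taken from the question, and the values are invented.

```r
# Made-up stand-in for the label data `f` (values are invented)
f <- data.frame(
  country = c("A", "B", "C"),
  time = as.Date(c("2020-03-01", "2020-03-01", NA)),
  cum_confirm = c(100, 50, 10)
)

# Columns mapped to label, x and y in the geom_text layer
mapped <- c("country", "time", "cum_confirm")

# Rows with an NA in any mapped column are the ones ggplot2 removes
dropped <- f[!complete.cases(f[, mapped]), ]
print(dropped)
```

Once you are happy that the dropped rows do not matter, passing na.rm = TRUE to geom_text removes them silently.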

How do I plot flights data from nycflights13 so that x=airlines, y = dep_delay?

When I try to plot x = airlines, y = dep_delay, I get an error message.
My hypothesis is that delays are caused by the inefficiency of the airlines above and beyond any other factors. I simply want to plot these two variables, but I get an error message.
I tried this code, but it doesn't work.
ggplot(data = flights, mapping = aes(x = airlines, y= dep_delay)) +
geom_point() +
geom_smooth(se = FALSE)
ggplot(data = flights, mapping = aes(x = airlines, y = dep_delay)) +
geom_point() +
geom_smooth(se = FALSE)
Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (336776): x
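If your code has two "+" signs in a row, use a single "+" sign. The error shown here has a different cause, though: airlines is a separate data frame in nycflights13, not a column of flights, so it cannot be mapped to x. The airline code is in the carrier column of flights.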
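In nycflights13, flights stores the airline as a two-letter code in carrier, and the separate airlines table maps carrier to name. A minimal sketch of joining the two, using tiny made-up stand-ins (flights_small and airlines_small are hypothetical names, and the delay values are invented) so that it runs without the package:

```r
# Tiny made-up stand-ins that mimic the relevant columns of flights and airlines
flights_small <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(5, NA, 10, 20, 0)
)
airlines_small <- data.frame(
  carrier = c("AA", "UA"),
  name = c("American Airlines Inc.", "United Air Lines Inc.")
)

# Attach the airline name to every flight via the shared carrier column
merged <- merge(flights_small, airlines_small, by = "carrier")

# Mean departure delay per airline (the formula interface drops NA rows)
avg_delay <- aggregate(dep_delay ~ name, data = merged, FUN = mean)
print(avg_delay)
```

With the real data, mapping aes(x = carrier, y = dep_delay) on flights, or x = name after the join, with geom_boxplot() is one way to compare delays by airline; geom_smooth() is of little use with a discrete x.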

Revising ggplot after a function: non-numeric argument to binary operator error

I am attempting to produce a ggplot from within a function. I can do so using the sample data and code below.
If I produce the plot (p) outside of the function, I can revise it with no problem to add a title, subtitle, axis labels, etc. (e.g., p + labs(title = "Most frequent words, by gender")).
However, if I produce the plot from within the function and then attempt to modify it, I get the following error: non-numeric argument to binary operator.
In both cases, the object "p" shows up under Values.
I would of course like to use a function because I have a number of different group_by variables to test, and I want to eliminate typing mistakes (e.g., forgetting to change "gender" to "income" on a later analysis).
Can someone explain why the error arises only after modifying a ggplot created in a function? And of course I would be grateful for advice about how to eliminate the source of the error.
# sample data of favorite activities
df <- tibble(
word = c("walk","hike","garden","garden","walk","hike", "garden","hike","hike","hike","walk"),
gender = c("Male","Female","Female","Female","Male","Male","Male", "Male","Male","Female","Female")
)
df
# function to figure out the proportions of the activities
sum_text_prop <- function(df, groupbyvar) {
groupbyvar <- enquo(groupbyvar)
df %>%
count(!!groupbyvar, word, sort = TRUE) %>%
group_by(groupbyvar = !!groupbyvar) %>%
mutate(proportion = n / sum(n)) %>%
top_n(proportion, n = 5) %>%
ungroup()
}
# function to plot the most common words
plot_text_prop <- function(df) {
p <- ggplot(data = df, aes(x = word, y = proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
print(p)
}
# deploy the functions
df %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
# add a title to the plot
p + labs(title = "Most frequent words, by gender")
# error: Error in p + labs(title = "Most frequent words, by gender") :
non-numeric argument to binary operator
Update
Thanks to the helpful responses, my revised code is as follows:
plot_text_prop <- function(df) {
ggplot(data = df, aes(reorder_within(word, proportion, groupbyvar),
proportion, fill = groupbyvar)) +
geom_bar(stat = "identity", alpha = 0.8, show.legend = FALSE) +
scale_x_reordered() +
facet_wrap(~ groupbyvar, ncol = 2, scales = "free") +
coord_flip()
}
p <- tidy_infl %>%
sum_text_prop(groupbyvar = gender) %>%
plot_text_prop()
p + labs(title = "Most frequent words, by gender")
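The revised version works for two reasons that are visible in the code: the function now returns the plot (the ggplot() expression is its last value), and the caller assigns that return value to p. A variable named p created inside a function is local to that call, so it is not the object you later try to modify. A stripped-down sketch of the same pattern, with a hypothetical make_plot() helper and made-up counts:

```r
library(ggplot2)

# Hypothetical helper: builds the plot and returns it instead of printing it
make_plot <- function(data) {
  ggplot(data, aes(x = word, y = n)) +
    geom_col() +
    coord_flip()
}

# Small made-up data set for illustration
counts <- data.frame(word = c("walk", "hike", "garden"), n = c(3, 5, 3))

# Assign the returned plot in the calling environment, then modify it
p <- make_plot(counts)
p2 <- p + labs(title = "Most frequent words")
```

Printing p2 (or p) at the console draws the plot, so no print() call is needed inside the function.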

Variable Created With Mutate Not Found With ggplot

New to R.
I created a new variable with dplyr::mutate() and I see the values in the df output when I run the code, but when I try to plot it with ggplot, I receive an "object not found" error. What am I doing wrong? Thx.
Works as expected:
mutate(avg_inv = (inv_total / sr_count))
Error here:
# Plot avg invoice
p <- ggplot(df1, aes(x = Date_Group, y = avg_inv) ) +
geom_bar(stat = "identity", position="dodge")
p
Error message:
Error in eval(expr, envir, enclos) : object 'avg_inv' not found
I think you might not be saving the result of mutate, so even though the results print to your console, it's not available for ggplot2.
Try:
df1 <- df %>% mutate(avg_inv = (inv_total / sr_count))
p <- ggplot(df1, aes(x = Date_Group, y = avg_inv) ) +
geom_bar(stat = "identity", position="dodge")
p
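The point about saving the result can be checked directly: mutate() returns a new data frame and leaves its input untouched. A minimal sketch with made-up values (only the column names follow the question):

```r
library(dplyr)

# Made-up data; column names follow the question
df <- data.frame(Date_Group = c("Jan", "Feb"),
                 inv_total = c(100, 90),
                 sr_count = c(4, 3))

# Without assignment the result is only printed; df itself is unchanged
df %>% mutate(avg_inv = inv_total / sr_count)
has_col_before <- "avg_inv" %in% names(df)

# With assignment the new column is kept in df1
df1 <- df %>% mutate(avg_inv = inv_total / sr_count)
has_col_after <- "avg_inv" %in% names(df1)
```

Here has_col_before is FALSE and has_col_after is TRUE, which is exactly the difference between the failing and the working ggplot call.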
Alternatively, you can compute the additional variable inside the call to ggplot(). This saves you a temporary variable for the intermediate result:
data("airquality")
library(ggplot2)
library(dplyr)
p<- ggplot(airquality %>%
mutate(somevar=(Month/Day)), aes(x = somevar) ) +
geom_histogram(position = "stack", stat = "bin", binwidth = 5)
print(p)
