Issues creating a line chart in R. Group aesthetic error [duplicate]

This question already has an answer here:
ggplot each group consists of only one observation
(1 answer)
Closed 8 months ago.
Below is the sample data. I am trying to draw two lines with different colors. It seems simple, but I am running into the error below. I have two questions. First, how do I get around this error? Second, how would I edit the legend so that it says "Hires" instead of "HI"?
"geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?"
library(ggplot2)
library(dplyr) # provides the %>% pipe used below
measure <- c("HI","HI","HI","HI","HI","JO","JO","JO","JO","JO")
date <- c("2002-01","2002-02","2002-03","2002-04","2002-05","2002-01","2002-02","2002-03","2002-04","2002-05")
value <- c(100,105,95,145,110,25,35,82,75,90)
df <- data.frame(measure,date,value)
graph <- df %>% ggplot (aes(x=date, y= value, color = measure)) + geom_line () + theme (legend.position = "bottom",legend.title = element_blank())
print(graph)

It's asking for a group, so you can give it a group:
ggplot(aes(x=date, y=value, group=measure, color=measure))
It may look surprising that the data is not already grouped by color, but date is a character (discrete) variable here. By default ggplot2 groups by the interaction of all discrete variables in the plot, including x, so every date/measure combination becomes its own group with a single point, and geom_line has nothing to connect. Setting group=measure explicitly produces the result you want.
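For the second part of the question (showing "Hires" instead of "HI" in the legend), a minimal sketch follows. It assumes the legend is relabelled with scale_color_discrete(); the label "Job openings" for "JO" is only a guess for illustration, since the question does not say what "JO" stands for.

```r
library(ggplot2)

measure <- rep(c("HI", "JO"), each = 5)
date <- rep(c("2002-01", "2002-02", "2002-03", "2002-04", "2002-05"), times = 2)
value <- c(100, 105, 95, 145, 110, 25, 35, 82, 75, 90)
df <- data.frame(measure, date, value)

# group = measure gives one line per measure even though x is discrete
graph <- ggplot(df, aes(x = date, y = value, group = measure, color = measure)) +
  geom_line() +
  # A named vector maps data values to legend text.
  # "Job openings" is an assumed label for "JO" (not given in the question).
  scale_color_discrete(labels = c(HI = "Hires", JO = "Job openings")) +
  theme(legend.position = "bottom", legend.title = element_blank())

print(graph)
```

Using a named vector for labels keeps the mapping explicit, so the labels stay correct even if the order of the levels changes.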

Related

How can I organise my x-axis variables by another factor? [duplicate]

This question already has an answer here:
ggplot facet by column
(1 answer)
Closed 11 months ago.
I'm very new to R/ggplot so thanks in advance for your patience. Here's what I have at the moment:
ggplot(data = figure_data_3A,
mapping = aes(x = `Gene`,
y = `Percent growth`)
)+
geom_col()+
theme_classic()+
ylab("% Growth of double relative to single mutants")+
xlab(NULL)+
theme(axis.text.x = element_text(angle = 90))
That's my code. I want to organise the elements on the x-axis by another qualitative factor in my data frame called Function/Process, so that I can label them together in groups (target image not shown).
Without a reproducible example (we cannot see your data), we have to guess.
But generally, when plotting a character (text) variable in ggplot2, it gets converted to a factor, and the order of that factor is simply the sorted elements.
Instead, you can preprocess your data, convert Gene to a factor and specify the order.
I think something akin to this would do:
# order the data by Function/Process
figure_data_3A <- figure_data_3A[order(figure_data_3A$`Function/Process`), ]
# make Gene a factor whose levels follow the sorted row order
figure_data_3A$Gene <- factor(figure_data_3A$Gene, levels = figure_data_3A$Gene)
In the last line, I am assuming that each gene only appears once in your data frame.
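If a gene can appear more than once, factor() will reject the duplicated levels; wrapping the levels in unique() avoids that. A small sketch with made-up data (the gene names, categories and values below are hypothetical, not the asker's):

```r
# Hypothetical stand-in for the asker's data frame
figure_data_3A <- data.frame(
  Gene = c("geneC", "geneA", "geneB", "geneA"),
  `Function/Process` = c("repair", "transport", "repair", "transport"),
  `Percent growth` = c(40, 75, 55, 60),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Sort rows by the grouping column
figure_data_3A <- figure_data_3A[order(figure_data_3A$`Function/Process`), ]

# Levels follow the sorted row order; unique() drops repeated genes
figure_data_3A$Gene <- factor(figure_data_3A$Gene,
                              levels = unique(figure_data_3A$Gene))

levels(figure_data_3A$Gene)
```

After this step, genes that share a Function/Process value sit next to each other on the x-axis.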

How to respect quantitative nature of discrete/group variables in R ggplot2? [duplicate]

This question already has an answer here:
How to plot a boxplot with correctly spaced continuous x-axis values in ggplot2
(1 answer)
Closed 3 years ago.
I'd like to make a plot with R ggplot2 functions to highlight relations between a categorical X and a continuous Y variable. But my categorical variable is quantitative (e.g. integers), and I would like my plots to respect the position suggested by the quantitative value of X.
Imagine the following dataset:
library(tidyverse)
df <- data.frame(Category=sample(c(1, 2, 5), 1000, replace = T)) %>%
mutate(Value=Category+rnorm(1000))
The easiest boxplot would be :
ggplot(df, aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
But what I would like is :
add_row(df, Category=3:4, Value=NA) %>%
ggplot(aes(x=as.factor(Category), y=Value)) +
geom_boxplot() +
labs(x="Category")
Do you know any proper way to achieve that beyond the ugly trick above that is not really scalable? Because we can imagine many boxplots. Or even the case in which my categories are decimal values (with of course a limited number of categories). All in all, my wish is to be able to distribute my boxplots along the x-axis according to the quantitative value of the categories. The same question could apply to barplot instead of boxplots of course...
Thanks a lot!
As mentioned by @camille, keep Category numeric on the x-axis and use it as the group aesthetic:
ggplot(df, aes(x=Category, y=Value, group = Category)) +
geom_boxplot() +
labs(x="Category")
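This works because a numeric x keeps the axis continuous, while group = Category tells geom_boxplot() to draw one box per category value. A sketch of the same idea follows; set.seed() and the scale_x_continuous() call are additions for reproducibility and tidier axis ticks, not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # added for reproducibility; not in the original
df <- data.frame(Category = sample(c(1, 2, 5), 1000, replace = TRUE))
df$Value <- df$Category + rnorm(1000)

p <- ggplot(df, aes(x = Category, y = Value, group = Category)) +
  geom_boxplot() +
  # Show a tick only at the category values that actually occur
  scale_x_continuous(breaks = sort(unique(df$Category))) +
  labs(x = "Category")

print(p)
```

The same pattern should carry over to decimal category values, since the boxes are positioned by the numeric value rather than by factor level.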

ggplot | show bin length on bar | stat_bin() must not be used with a y aesthetic [duplicate]

This question already has answers here:
How to put labels over geom_bar in R with ggplot2
(4 answers)
Closed 5 years ago.
I need to show the bin length on top of each bar, but stat_bin is not doing what I expected. Below is a simple data set that has the same problem I face with my larger data set. Please help me show the bin length on top of each bin and, if possible, explain how it works.
Code without bin length on top of it
library(ggplot2)
data <- data.frame(Brand = c('A', 'B', 'c', 'A'), ct = c(5, 4,3, 4), col = c('X', 'Y', 'X', 'X'))
ggplot(data, aes( Brand,ct, fill=col)) + geom_bar(stat="identity")
Now I want to show 9 on top of the A bar, 4 on top of the B bar and 3 on top of the C bar, but it gives me an error:
ggplot(data, aes( Brand,ct, fill=col)) + geom_bar(stat="identity") + stat_bin()
Error: stat_bin() must not be used with a y aesthetic.
I am using minimal parameters in this example code to describe my issue. I don't want to summarize the data before plotting; I am hoping ggplot has a built-in feature for this, since it already shows the aggregated data in the graph and it is just a matter of displaying the values.
This is not a duplicate, as I want to explore the built-in ggplot mechanism for showing the sum. I need to show multiple layers within a bin (fill=col); if I summarize in advance, I may lose this information.
It is highly recommended you summarize your data first before you plot. However, if for whatever reason you didn't want to do that you could do:
ggplot(data, aes( Brand,ct, fill=col)) + geom_bar(stat="identity") +
geom_text(aes(y =ave(ct, Brand, FUN = sum), label=ave(ct, Brand, FUN = sum)))
I would not recommend this approach, however, since it actually plots the "9" twice (which is why it looks darker than the other numbers). With this method there is no way around that, because each aesthetic must have the same length as the data. This means that you can't simply plot c(9,4,3); you need to plot c(9,9,4,3).
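One possible alternative, which is not from the answer above, is to let stat_summary() compute the per-brand sum so that each total is drawn only once and the data does not have to be summarized beforehand. This sketch assumes ggplot2 3.3.0 or later (for after_stat() and the fun argument).

```r
library(ggplot2)

data <- data.frame(Brand = c("A", "B", "c", "A"),
                   ct = c(5, 4, 3, 4),
                   col = c("X", "Y", "X", "X"))

p <- ggplot(data, aes(Brand, ct, fill = col)) +
  geom_bar(stat = "identity") +
  # Sum ct within each Brand and print the result once above the stack.
  # group = Brand overrides the grouping implied by fill = col.
  stat_summary(aes(label = after_stat(y), group = Brand),
               fun = sum, geom = "text", vjust = -0.5)

print(p)
```

The stacked fill segments are untouched; only the text layer is aggregated.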

Revisit: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?

Using this data frame, I attempted to create a simple line graph using the following code:
crypto_data<-
crypto_data %>% gather(Cryptocurrencies, USD_Exchange, -Date)
ggplot(data = crypto_data) +
geom_line(aes(x = Date, y = USD_Exchange, colour = Cryptocurrencies))
This produced the apparently well-known error:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I am aware that this question has been addressed most notably here:
ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
However, I am graphing more than a single variable. (Even though the data on the Litecoin USD exchange rate only starts at a later date, it takes the value of zero prior to its introduction.) I have also been mindful of stipulating the groups within aes() in ggplot (see the code). The command produces a graph with the correct axis labels and key, but a blank plot region.
Does anyone see any obvious errors or perhaps have a solution? Thank you in advance.
Thanks for the edit suggestion, have found a solution!
crypto_data <-
crypto_data %>% gather(Cryptocurrencies, USD_Exchange, -Date)
ggplot(data=crypto_data) +
geom_line(aes(x=as.numeric(Date), y = USD_Exchange, colour = Cryptocurrencies))
Just had to make the Date variable (that was a factor variable) numeric...
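Note that as.numeric() on a factor returns the integer level codes, so the axis shows 1, 2, 3, ... rather than dates. Two alternatives are sketched below, under the assumption that Date holds text in "YYYY-MM-DD" form; the original data is not shown, so the values here are invented.

```r
library(ggplot2)

# Invented long-format data standing in for the asker's crypto_data
crypto_data <- data.frame(
  Date = factor(rep(c("2017-01-01", "2017-01-02", "2017-01-03"), times = 2)),
  Cryptocurrencies = rep(c("Bitcoin", "Litecoin"), each = 3),
  USD_Exchange = c(1000, 1020, 1010, 4.5, 4.6, 4.4)
)

# Option 1: a real date axis (assumes "YYYY-MM-DD" text in the factor)
crypto_data$Date2 <- as.Date(as.character(crypto_data$Date))
p_date <- ggplot(crypto_data) +
  geom_line(aes(x = Date2, y = USD_Exchange, colour = Cryptocurrencies))

# Option 2: keep the factor on x and state the grouping explicitly
p_group <- ggplot(crypto_data) +
  geom_line(aes(x = Date, y = USD_Exchange, colour = Cryptocurrencies,
                group = Cryptocurrencies))

print(p_date)
print(p_group)
```

Option 1 also spaces the points by actual time, which matters if the dates are not evenly spaced.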

R ggplot stat_summary: how to include a count of NAs in legend?

I am trying to plot one discrete variable on the x-axis against a continuous one on the y. Imagine in mtcars that I am trying to plot cyl vs. disp. What if some of the values of disp were NA? I would like to know how many NA there were for each value of cyl, and to display this in a simple table, possibly right below the legend (or within the legend itself). Is there a simple (or a complicated) way to do this?
Similar and related question I posed: R - looking at means by subgroup and overall on a line graph
Thanks!
This answer does not meet all question requirements, but since the details on how exactly the data should be presented are a little vague, I'm posting anyway.
So here's a way to add NA counts to the legend itself:
library(ggplot2)
library(datasets)
mycars <- mtcars
mycars$disp[c(1,2,3)] <- NA
lvls = levels(as.factor(mycars$cyl))
nacounts <- by(mycars, mycars$cyl, function(x) sum(is.na(x$disp)))
labels = paste(lvls," (NA=",as.integer(nacounts),")",sep="")
ggplot(data=mycars) +
geom_boxplot(aes(x=cyl,y=disp, fill=as.factor(cyl))) +
scale_fill_discrete(name="Cyl", labels=labels)
EDIT
Relating to the stat_summary graph referred-to in the question: labels describing line types can be added using the scale_linetype_* functions.
In case you'd like to have the same legend as in the image above, I think you'll have to add graph elements describing cyl, e.g:
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(mycars,aes(cyl,disp)) +
stat_summary(fun=mean, geom="line", lwd=1.5) +
stat_summary(aes(lty=factor(vs)),fun="mean",geom="line") +
stat_summary(aes(color=factor(cyl)),fun="mean",geom="point",size=5) +
scale_x_continuous(breaks=c(4,6,8),labels=c("four","6","8")) +
scale_color_discrete(labels=labels)
