Reasons that a ggplot2 legend does not appear [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 2 years ago.
I was attempting (unsuccessfully) to show a legend in my R ggplot2 graph, which involves multiple plots. My data frame df and code are as follows:
  Individuals        Mod.2       Mod.1       Mod.3
1           2 -0.013473145 0.010859793 -0.08914021
2           3 -0.011109863 0.009503278 -0.09049672
3           4 -0.006465788 0.011304668 -0.08869533
4           5  0.010536718 0.009110458 -0.09088954
5           6  0.015501212 0.005929766 -0.09407023
6           7  0.014565584 0.005530390 -0.09446961
7           8 -0.009712516 0.012234843 -0.08776516
8           9 -0.011282278 0.006569570 -0.09343043
9          10 -0.011330579 0.003505439 -0.09649456
str(df)
'data.frame': 9 obs. of 4 variables:
$ Individuals : num 2 3 4 5 6 7 8 9 10
$ Mod.2 : num -0.01347 -0.01111 -0.00647 0.01054 0.0155 ...
$ Mod.1 : num 0.01086 0.0095 0.0113 0.00911 0.00593 ...
$ Mod.3 : num -0.0891 -0.0905 -0.0887 -0.0909 -0.0941 ...
ggplot(df, aes(df$Individuals)) +
geom_point(aes(y=df[,2]), colour="red") + geom_line(aes(y=df[,2]), colour="red") +
geom_point(aes(y=df[,3]), colour="lightgreen") + geom_line(aes(y=df[,3]), colour="lightgreen") +
geom_point(aes(y=df[,4]), colour="darkgreen") + geom_line(aes(y=df[,4]), colour="darkgreen") +
labs(title = "Modules", x = "Number of individuals", y = "Mode")
I looked through the following Stack Overflow threads, and searched Google as well:
Merging ggplot2 legend
ggplot2 legend not showing
`ggplot2` legend not showing label for added series
ggplot2 legend for geom_area/geom_ribbon not showing
ggplot and R: Two variables over time
ggplot legend not showing up in lift chart
Why ggplot2 legend not show in the graph
This one was created 4 days ago
This made me realize that making legends appear is a recurring issue, despite the fact that legends usually appear automatically.
My first question is: what causes a legend not to appear when using ggplot? The second is how to fix each of these causes. One cause appears to be related to multiple plots and the use of aes(), but I suspect there are other reasons.

colour = XYZ should be inside aes(), not outside:
geom_point(aes(x, y, colour = XYZ))   #------> colour mapped to a variable: legend
geom_point(aes(x, y), colour = "red") #------> colour set to a constant: no legend
Hope it helps; it took me a long time to figure out.
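Applied to the question's data, the rule above can be sketched as follows (a minimal sketch, assuming ggplot2 is installed). It keeps the wide layout and one layer per column, but maps colour to a constant label inside aes(), so ggplot2 builds a colour scale, and therefore a legend, from the three labels. The legend title "Module" is an illustrative choice; scale_colour_manual() with named values restores the colours used in the question.

```r
library(ggplot2)

# Data from the question, retyped from the printed data frame
df <- data.frame(
  Individuals = 2:10,
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788, 0.010536718, 0.015501212,
            0.014565584, -0.009712516, -0.011282278, -0.011330579),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668, 0.009110458, 0.005929766,
            0.005530390, 0.012234843, 0.006569570, 0.003505439),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533, -0.09088954, -0.09407023,
            -0.09446961, -0.08776516, -0.09343043, -0.09649456)
)

# colour is *mapped* (inside aes) to a constant label in each layer, so
# ggplot2 trains one colour scale on the three labels and draws a legend.
# Column names are used directly instead of df$... or df[, 2].
p <- ggplot(df, aes(x = Individuals)) +
  geom_point(aes(y = Mod.2, colour = "Mod.2")) +
  geom_line(aes(y = Mod.2, colour = "Mod.2")) +
  geom_point(aes(y = Mod.1, colour = "Mod.1")) +
  geom_line(aes(y = Mod.1, colour = "Mod.1")) +
  geom_point(aes(y = Mod.3, colour = "Mod.3")) +
  geom_line(aes(y = Mod.3, colour = "Mod.3")) +
  # Named values tie each label to the colour the question used
  scale_colour_manual(values = c(Mod.2 = "red", Mod.1 = "lightgreen",
                                 Mod.3 = "darkgreen")) +
  labs(title = "Modules", x = "Number of individuals", y = "Mode",
       colour = "Module")

print(p)
```

This works, but the reshaping approach in the next answer scales better once there are more than a handful of series.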

You are going about the setting of colour in completely the wrong way. You have set colour to a constant character value in multiple layers, rather than mapping it to the value of a variable in a single layer.
This is largely because your data is not "tidy". Take some example data in the same wide layout:
df <- data.frame(x=1:10, a=rnorm(10), b=rnorm(10), c=rnorm(10))
head(df)
  x           a          b          c
1 1 -0.71149883  2.0886033  0.3468103
2 2 -0.71122304 -2.0777620 -1.0694651
3 3 -0.27155800  0.7772972  0.6080115
4 4 -0.82038851 -1.9212633 -0.8742432
5 5 -0.71397683  1.5796136 -0.1019847
6 6 -0.02283531 -1.2957267 -0.7817367
Instead, you should reshape your data first:
mdf <- reshape2::melt(df, id.vars = "x")
This produces a more suitable format:
head(mdf)
x variable value
1 1 a -0.71149883
2 2 a -0.71122304
3 3 a -0.27155800
4 4 a -0.82038851
5 5 a -0.71397683
6 6 a -0.02283531
This will make it much easier to use with ggplot2 in the intended way, where colour is mapped to the value of a variable:
ggplot(mdf, aes(x = x, y = value, colour = variable)) +
geom_point() +
geom_line()
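The same reshaping applied to the question's own data frame might look like this, assuming the reshape2 package (used in the answer above) and ggplot2 are installed. id.vars = "Individuals" keeps the x column and stacks the three Mod.* columns into variable/value, so a single colour mapping produces both the coloured lines and the legend.

```r
library(ggplot2)
library(reshape2)

# Data from the question, retyped from the printed data frame
df <- data.frame(
  Individuals = 2:10,
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788, 0.010536718, 0.015501212,
            0.014565584, -0.009712516, -0.011282278, -0.011330579),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668, 0.009110458, 0.005929766,
            0.005530390, 0.012234843, 0.006569570, 0.003505439),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533, -0.09088954, -0.09407023,
            -0.09446961, -0.08776516, -0.09343043, -0.09649456)
)

# Wide (one column per module) -> long (one row per individual and module)
mdf <- melt(df, id.vars = "Individuals")

# One mapping of colour to 'variable' gives points, lines and a legend
p <- ggplot(mdf, aes(x = Individuals, y = value, colour = variable)) +
  geom_point() +
  geom_line() +
  labs(title = "Modules", x = "Number of individuals", y = "Mode",
       colour = "Module")

print(p)
```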

ind = 1:10
my.df <- data.frame(ind, sample(-5:5,10,replace = T) ,
sample(-5:5,10,replace = T) , sample(-5:5,10,replace = T))
df <- data.frame(rep(ind,3) ,c(my.df[,2],my.df[,3],my.df[,4]),
c(rep("mod.1",10),rep("mod.2",10),rep("mod.3",10)))
colnames(df) <- c("ind","value","mod")
Your data frame should look something like this:
ind value mod
1 5 mod.1
2 -5 mod.1
3 3 mod.1
4 2 mod.1
5 -2 mod.1
6 5 mod.1
Then all you have to do is:
ggplot(df, aes(x = ind, y = value, shape = mod, color = mod)) +
geom_line() + geom_point()
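The manual rep()/c() construction above can also be done with base R's stack(), which stacks the selected columns into a values column plus a factor ind naming the source column. A minimal sketch, assuming ggplot2 is installed; the column names mod.1 to mod.3 and the seed are illustrative.

```r
library(ggplot2)

set.seed(1)  # illustrative seed so the random values are reproducible
ind <- 1:10
my.df <- data.frame(ind,
                    mod.1 = sample(-5:5, 10, replace = TRUE),
                    mod.2 = sample(-5:5, 10, replace = TRUE),
                    mod.3 = sample(-5:5, 10, replace = TRUE))

# stack() goes column by column: all of mod.1, then mod.2, then mod.3
long <- stack(my.df[, c("mod.1", "mod.2", "mod.3")])
names(long) <- c("value", "mod")        # stack() names them values/ind
long$ind <- rep(my.df$ind, times = 3)   # re-attach the x variable

p <- ggplot(long, aes(x = ind, y = value, shape = mod, colour = mod)) +
  geom_line() +
  geom_point()

print(p)
```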

I had a similar problem with the title; I found a way to show it by adding a layer with
ggtitle("Name of the title that you want to show")
Example:
ggplot(data=mtcars,
mapping = aes(x=hp,
fill = factor(vs)))+
geom_histogram(bins = 9,
position = 'identity',
alpha = 0.8, show.legend = T)+
labs(title = 'Horse power',
fill = 'Vs Motor',
x = 'HP',
y = 'conteo',
subtitle = 'A',
caption = 'B')+
ggtitle("Horse power")
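The answer above passes show.legend = T explicitly. To tie this back to the original question, here is a sketch of settings that commonly hide a legend even when an aesthetic is mapped inside aes(), using the same mtcars histogram (ggplot2 assumed installed). When a legend is missing, these are worth ruling out alongside the "colour outside aes()" cause discussed earlier.

```r
library(ggplot2)

# fill is mapped inside aes(), so by default a legend is drawn
base <- ggplot(mtcars, aes(x = hp, fill = factor(vs))) +
  geom_histogram(bins = 9, position = "identity", alpha = 0.8) +
  labs(title = "Horse power", fill = "Vs Motor")

# 1. Legend switched off for a single layer
p_layer <- ggplot(mtcars, aes(x = hp, fill = factor(vs))) +
  geom_histogram(bins = 9, position = "identity", alpha = 0.8,
                 show.legend = FALSE)

# 2. Legend removed for the whole plot via the theme
p_theme <- base + theme(legend.position = "none")

# 3. Guide for the fill scale turned off
p_guide <- base + guides(fill = "none")

print(base)
```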

Related

Error in ggplot2 when using both fill and group parameters in geom_bar

There seems to be a problem with R's ggplot2 library when I include both the fill and group parameters in a bar plot (geom_bar()). I've already tried looking for answers for several hours but couldn't find one that would help. This is actually my first post here.
To give a little background, I have a dataframe named smokement (short for smoke and mental health), a categorical variable named smoke100 (smoked in the past 100 days?) with "Yes" and "No", and another categorical variable named misnervs (frequency of feelings of nervousness) with 5 possible values: "All", "Most", "Some", "A little", and "None."
When I run this code, I get this result:
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
However, the result I want is to have all grouped bar plots display their respective proportions. By reading a bit of "R for Data Science" book I found out that I need to include y = ..prop.. and group = 1 in aes() to achieve it:
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., group = 1)) +
facet_wrap(~misnervs, nrow = 1)
Finally, I try to use the fill = smoke100 parameter in aes() to display this categorical variable in color, just like I did on the first code. But when I add this fill parameter, it doesn't work! The code runs, but it shows exactly the same output as the second code, as if the fill parameter this time was somehow ignored!
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., group = 1, fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
Does anyone have an idea of why this happens, and how to solve it? My end goal is to display each value of smoke100 (the "Yes" and "No" bars) with colors and a legend at the right, just like on the first graph, while having each grouping level of "misnervs" display their respective proportions of smoke100 ("Yes", "No") levels, just like on the second graph.
EDIT:
> dim(smokement)
[1] 35471 6
> str(smokement)
'data.frame': 35471 obs. of 6 variables:
$ smoke100: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 1 1 1 1 1 ...
$ misnervs: Factor w/ 5 levels "All","Most","Some",..: 3 4 5 4 1 5 3 3 5 5 ...
$ mishopls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 5 5 5 5 5 5 5 ...
$ misrstls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 3 1 5 3 5 1 5 ...
$ misdeprd: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 4 5 5 5 5 5 ...
$ miswtles: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 5 5 5 5 5 5 ...
> head(smokement)
smoke100 misnervs mishopls misrstls misdeprd miswtles
1 Yes Some Some Some None None
2 No A little None None None None
3 Yes None None None None None
4 No A little None Some None None
5 Yes All None All A little None
6 Yes None None None None None
As for the output without group = 1:
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = smoke100)) +
facet_wrap(~misnervs, nrow = 1)
Besides the solution offered here, the GGally package includes a stat_prop, which introduces a new by aesthetic to specify how the proportions should be calculated:
library(GGally)
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = smoke100, by = misnervs), stat = "prop") +
facet_wrap(~misnervs, nrow = 1)
And just for reference, the same can be achieved without GGally by setting fill = factor(..x..):
ggplot(data = smokement) +
geom_bar(aes(x = smoke100, y = ..prop.., fill = factor(..x..), group = 1)) +
facet_wrap(~misnervs, nrow = 1)
DATA
misnervs <- c("All", "Most", "Some", "A little", "None")
set.seed(123)
smokement <-
data.frame(
smoke100 = sample(c("Yes", "No"), 100, replace = TRUE),
misnervs = factor(sample(misnervs, 100, replace = TRUE), levels = misnervs)
)
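In ggplot2 3.4.0 and later the dot-dot notation (..prop.., ..x..) is deprecated in favour of after_stat(). Below is a sketch of the last geom_bar() call written that way, using the simulated smokement data above (ggplot2 assumed installed). As I understand it, fill = smoke100 was ignored in the question because, with group = 1, fill varies inside the single group, so stat_count() drops it (recent ggplot2 versions warn about this); deriving fill from the computed x sidesteps that. The legend keys then read 1 and 2, so the scale_fill_discrete() line is an illustrative relabelling that relies on the discrete x axis ordering smoke100 the same way factor() does.

```r
library(ggplot2)

misnervs_levels <- c("All", "Most", "Some", "A little", "None")
set.seed(123)
smokement <- data.frame(
  smoke100 = sample(c("Yes", "No"), 100, replace = TRUE),
  misnervs = factor(sample(misnervs_levels, 100, replace = TRUE),
                    levels = misnervs_levels)
)

p <- ggplot(data = smokement) +
  # group = 1: proportions are computed across both bars of each facet;
  # fill is computed from x after the stat, so it is not dropped
  geom_bar(aes(x = smoke100, y = after_stat(prop), group = 1,
               fill = factor(after_stat(x)))) +
  # x positions 1, 2 correspond to the sorted smoke100 values
  scale_fill_discrete(name = "smoke100",
                      labels = levels(factor(smokement$smoke100))) +
  facet_wrap(~misnervs, nrow = 1)

print(p)
```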
I wasn't able to get what you wanted by tweaking your call to geom_bar*, but I think this gives you what you are looking for. As you didn't provide your input dataset (for understandable reasons), I've used the diamonds tibble in my code. The changes you need to make should be obvious.
*: I'm sure it can be done: I just wasn't able to work it out.
The idea behind my solution is to pre-compute the proportions you want to plot before the call to ggplot.
group_modify takes a grouped tibble and applies the specified function to each group in turn, before returning the modified (grouped) tibble.
library(dplyr)
library(ggplot2)
diamonds %>%
group_by(cut) %>%
group_modify(
function(.x, .y)
.x %>%
group_by(color) %>%
summarise(Prop=n()/nrow(.))
) %>%
ggplot() +
geom_col(aes(x=color, y=Prop, fill=color)) +
facet_wrap(~cut)
Note the switch from geom_bar to geom_col: geom_bar uses row counts, geom_col uses values in the data.
As a rough-and-ready QC, here's the equivalent of your code that produces the "all grey" plot:
diamonds %>%
ggplot() +
geom_bar(aes(x=color, y=..prop.., fill=color, group=1)) +
facet_wrap(~cut)
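The pre-computation idea does not depend on dplyr. A base-R sketch, assuming ggplot2 is installed and reusing the simulated smokement data from the previous answer: prop.table(..., margin = 1) divides each misnervs row of the cross-table by its total, so the bars in every facet sum to 1, and fill = smoke100 can then be mapped as usual with geom_col().

```r
library(ggplot2)

misnervs_levels <- c("All", "Most", "Some", "A little", "None")
set.seed(123)
smokement <- data.frame(
  smoke100 = sample(c("Yes", "No"), 100, replace = TRUE),
  misnervs = factor(sample(misnervs_levels, 100, replace = TRUE),
                    levels = misnervs_levels)
)

# Cross-tabulate, then turn counts into within-misnervs proportions
tab <- prop.table(table(misnervs = smokement$misnervs,
                        smoke100 = smokement$smoke100), margin = 1)
props <- as.data.frame(tab, responseName = "Prop")

# geom_col() plots the pre-computed values; fill maps normally
p <- ggplot(props, aes(x = smoke100, y = Prop, fill = smoke100)) +
  geom_col() +
  facet_wrap(~misnervs, nrow = 1)

print(p)
```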

Add Legend with ggplot2 [duplicate]

This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
(5 answers)
Add legend to ggplot2 line plot
(4 answers)
Closed 2 years ago.
I am having trouble adding a legend to my plot. I want the plot to have points and lines, which is why I am using both geom_line() and geom_point(). Here is my code with some made-up numbers. When I move "color" into "aes", somehow I get an error and I cannot plot it.
meanted=rnorm(13)
meantotal=rnorm(13)
meantedneg=rnorm(13)
meantedpos=rnorm(13)
totaldf=data.frame(x=c(0:12),meanted,meantotal,meantedneg,meantedpos)
pic=ggplot()+
geom_point(data=totaldf,aes(x=-x,y=meantedneg), color = "red")+
geom_point(data=totaldf,aes(x=-x,y=meantedpos), color = "blue")+
geom_point(data=totaldf,aes(x=-x,y=meanted), color = "green")+
geom_point(data=totaldf,aes(x=-x,y=meantotal),color = "black")+
geom_line(data=totaldf,aes(x=-x,y=meantedneg), color = "red")+
geom_line(data=totaldf,aes(x=-x,y=meantedpos), color = "blue")+
geom_line(data=totaldf,aes(x=-x,y=meanted), color = "green")+
geom_line(data=totaldf,aes(x=-x,y=meantotal),color = "black")
print(pic)
As markus said, ggplot2 will do this for you if you pivot/reshape the data so that the series your legend should distinguish are identified by a single column.
Pivoting/reshaping means going from a "wide" format to a "long" format. I'll use tidyr::pivot_longer, though it can be done with reshape (not my preference) or data.table::melt:
tidyr::pivot_longer(totaldf, -x)
# # A tibble: 52 x 3
# x name value
# <int> <chr> <dbl>
# 1 0 meanted 1.37
# 2 0 meantotal -0.279
# 3 0 meantedneg -0.257
# 4 0 meantedpos 0.0361
# 5 1 meanted -0.565
# 6 1 meantotal -0.133
# 7 1 meantedneg -1.76
# 8 1 meantedpos 0.206
# 9 2 meanted 0.363
# 10 2 meantotal 0.636
# # ... with 42 more rows
From here,
library(ggplot2)
ggplot(tidyr::pivot_longer(totaldf, -x), aes(x, value, color = name, group = name)) +
geom_path() +
geom_point() +
scale_color_manual(values = c(meantedneg="red", meantedpos="blue", meanted="green", meantotal="black"))
(FYI, I pre-seeded the randomness with set.seed(42) to get this random data.)
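A self-contained version of that answer, assuming ggplot2 and tidyr are installed. It regenerates the made-up data with set.seed(42) as mentioned above, keeps the question's reversed axis (x = -x), and relies on colour = name alone for grouping, since a discrete colour mapping already splits the lines by series.

```r
library(ggplot2)
library(tidyr)

set.seed(42)
# Same draw order as the question: meanted, meantotal, meantedneg, meantedpos
totaldf <- data.frame(x = 0:12,
                      meanted = rnorm(13),
                      meantotal = rnorm(13),
                      meantedneg = rnorm(13),
                      meantedpos = rnorm(13))

# Wide -> long: 'name' identifies the series, 'value' holds y
long <- pivot_longer(totaldf, -x)

p <- ggplot(long, aes(x = -x, y = value, colour = name)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c(meantedneg = "red", meantedpos = "blue",
                                 meanted = "green", meantotal = "black"))

print(p)
```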

2d plot with 3rd variable as color in RStudio

I have a dataset as CSV with three columns:
timestamp (e.g. 2018/12/15)
keyword (e.g. "hello")
count (e.g. 7)
I want one plot where all the lines of the same keyword are connected with each other and timestamp is on the X- and count is on the Y- axis. I would like each keyword to have a different color for its line and the line being labeled with the keyword.
The CSV has only ~30,000 rows and R runs on a dedicated machine, so performance can be ignored.
I tried various approaches with mathplot and ggplot in this forum, but didn't get it to work with my own data.
What is the easiest solution to do this in R?
Thanks!
EDIT:
I tried customizing Roman's code and tried the following:
csvdata <- read.csv("c:/mydataset.csv", header=TRUE, sep=",")
time <- csvdata$timestamp
count <- csvdata$count
keyword <- csvdata$keyword
time <- rep(time)
xy <- data.frame(time, word = c(keyword), count, lambda = 5)
library(ggplot2)
ggplot(xy, aes(x = time, y = count, color = keyword)) +
theme_bw() +
scale_color_brewer(palette = "Set1") + # choose appropriate palette
geom_line()
This creates a correct canvas, but no points/lines in it...
DATA:
head(csvdata)
keyword count timestamp
1 non-distinct-word 3 2018/08/09
2 non-distinct-word 2 2018/08/10
3 non-distinct-word 3 2018/08/11
str(csvdata)
'data.frame': 121 obs. of 3 variables:
$ keyword : Factor w/ 10 levels "non-distinct-word",..: 5 5 5 5 5 5 5 5 5 5 ...
$ count : int 3 2 3 1 6 6 2 3 2 1 ...
$ timestamp: Factor w/ 103 levels "2018/08/09","2018/08/10",..: 1 2 3 4 5 6 7 8 9 10 ...
Something like this?
# Generate some data. This is the part the poster of the question normally provides.
today <- as.Date(Sys.time())
time <- rep(seq.Date(from = today, to = today + 30, by = "day"), each = 2)
xy <- data.frame(time, word = c("hello", "world"), count = rpois(length(time), lambda = 5))
library(ggplot2)
ggplot(xy, aes(x = time, y = count, color = word)) +
theme_bw() +
scale_color_brewer(palette = "Set1") + # choose appropriate palette
geom_line()
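One likely reason the asker's version draws an empty canvas is visible in the str() output: timestamp is a factor, so the x axis is discrete and every timestamp/keyword combination becomes its own one-point group, which geom_line() cannot connect. A sketch of the fix, assuming ggplot2 is installed; the small data frame is hypothetical and only mimics the CSV's layout. Note also that the Set1 Brewer palette has 9 colours while the posted data has 10 keywords, so with the full data one line may come out without a colour unless another palette is used.

```r
library(ggplot2)

# Hypothetical rows in the CSV's layout; read.csv() delivered timestamp
# as a factor, as in the str() output above
csvdata <- data.frame(
  keyword   = rep(c("hello", "world"), each = 3),
  count     = c(3, 2, 3, 5, 1, 4),
  timestamp = factor(rep(c("2018/08/09", "2018/08/10", "2018/08/11"), times = 2))
)

# Parse the dates so x is continuous; each keyword then forms one group/line
csvdata$timestamp <- as.Date(as.character(csvdata$timestamp), format = "%Y/%m/%d")

p <- ggplot(csvdata, aes(x = timestamp, y = count, colour = keyword)) +
  theme_bw() +
  scale_colour_brewer(palette = "Set1") +
  geom_line()

print(p)
```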

Reorder stacks in horizontal stacked barplot (R)

I'm trying to make a horizontal stacked barplot using ggplot. Below are the actual values for three out of 300 sites in my data frame. Here's where I've gotten to so far, using info pulled from these previous questions which I admit I may not have fully understood.
df <- data.frame(id=c("AR001","AR001","AR001","AR001","AR002","AR002","AR002","AR003","AR003","AR003","AR003","AR003"),
landuse=c("agriculture","developed","forest","water","agriculture","developed","forest","agriculture","developed","forest","water","wetlands"),
percent=c(38.77,1.76,59.43,0.03,69.95,0.42,29.63,65.4,3.73,15.92,1.35,13.61))
df
id landuse percent
1 AR001 agriculture 38.77
2 AR001 developed 1.76
3 AR001 forest 59.43
4 AR001 water 0.03
5 AR002 agriculture 69.95
6 AR002 developed 0.42
7 AR002 forest 29.63
8 AR003 agriculture 65.40
9 AR003 developed 3.73
10 AR003 forest 15.92
11 AR003 water 1.35
12 AR003 wetlands 13.61
str(df)
'data.frame': 12 obs. of 3 variables:
$ id : Factor w/ 3 levels "AR001","AR002",..: 1 1 1 1 2 2 2 3 3 3 ...
$ landuse: Factor w/ 5 levels "agriculture",..: 1 2 3 4 1 2 3 1 2 3 ...
$ percent: num 38.77 1.76 59.43 0.03 69.95 ...
df <- transform(df,
landuse.ord = factor(
landuse,
levels=c("agriculture","forest","wetlands","water","developed"),
ordered =TRUE))
cols <- c(agriculture="maroon",forest="forestgreen",
wetlands="gold", water="dodgerblue", developed="darkorchid")
ggplot(df,aes(x = id, y = percent, fill = landuse.ord, order=landuse.ord)) +
geom_bar(position = "stack",stat = "identity", width=1) +
coord_flip() +
scale_fill_manual(values = cols)
which produces this graph.
What I would like to do is to reorder the bars so that they are in descending order by value for the agriculture category - in this example AR002 would be at the top, followed by AR003 then AR001. I tried changing the contents of aes to aes(x = reorder(landuse.ord, percent)), but that eliminated the stacking and seemed to have maybe summed the percentages for each land use category:
I would like to have the stacks in order, from left to right: agriculture, forest, wetlands, water, developed. I tried doing that with the transform part of the code, which put it in the correct order in the legend, but not in the plot itself?
Thanks in advance... I have made a ton of progress based on answers to other peoples' questions, but seem to now be stuck at this point!
Update: here is the finished graph for all 326 sites!
Ok based on your comments, I believe this is your solution. Place these lines after cols<-...:
#filter() comes from dplyr
library(dplyr)
#create df to sort by agriculture's percentage
ag<-filter(df, landuse=="agriculture")
#use the df to sort and order df$id's levels
df$id<-factor(df$id, levels=ag$id[order(ag$percent)], ordered = TRUE)
#sort df, based on ordered ids and ordered landuse
df<-df[order(df$id, df$landuse.ord),]
ggplot(df,aes(x = id, y = percent, fill = landuse.ord, order=landuse.ord)) +
geom_bar(position = "stack",stat = "identity", width=1) +
coord_flip() +
scale_fill_manual(values = cols)
The comments should clarify each line's purpose. This will reorder your original data frame; if that is a problem, I would create a copy and operate on the copy.
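For current ggplot2 versions, a sketch that avoids both dplyr and the order aesthetic, which to my knowledge no longer affects stacking (it was deprecated in ggplot2 2.0.0). Stacking within a bar follows the levels of the fill factor, and position_stack(reverse = TRUE), available since ggplot2 2.2.0, places the first level (agriculture) next to the axis, i.e. on the left after coord_flip(). ggplot2 is assumed installed; the data is the three-site example from the question.

```r
library(ggplot2)

df <- data.frame(
  id = c("AR001", "AR001", "AR001", "AR001", "AR002", "AR002", "AR002",
         "AR003", "AR003", "AR003", "AR003", "AR003"),
  landuse = c("agriculture", "developed", "forest", "water", "agriculture",
              "developed", "forest", "agriculture", "developed", "forest",
              "water", "wetlands"),
  percent = c(38.77, 1.76, 59.43, 0.03, 69.95, 0.42, 29.63,
              65.4, 3.73, 15.92, 1.35, 13.61)
)

# Stack order within a bar follows the factor levels of the fill variable
df$landuse <- factor(df$landuse,
                     levels = c("agriculture", "forest", "wetlands", "water", "developed"))

# Bar order: ids sorted by ascending agriculture share; after coord_flip()
# the last level is drawn at the top, so the largest share ends up first
ag <- df[df$landuse == "agriculture", ]
df$id <- factor(df$id, levels = as.character(ag$id[order(ag$percent)]))

cols <- c(agriculture = "maroon", forest = "forestgreen",
          wetlands = "gold", water = "dodgerblue", developed = "darkorchid")

p <- ggplot(df, aes(x = id, y = percent, fill = landuse)) +
  # reverse = TRUE stacks the first level (agriculture) next to the axis
  geom_col(position = position_stack(reverse = TRUE), width = 1) +
  coord_flip() +
  scale_fill_manual(values = cols)

print(p)
```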

ggplot2: arranging multiple boxplots as a time series

I would like to create a multivariate boxplot time series with ggplot2 and I need to have an x axis that positions the boxplots based on their associated dates.
I found two posts about this question. One is Time series plot with groups using ggplot2, but the x axis there is not a scale_x_axis, so the graph is biased in my case. The other is ggplot2 : multiple factors boxplot with scale_x_date axis in R, but the person uses an interaction function, which I don't use in my case.
Here is an example file and my code:
dtm <- read.table(text="date ruche mortes trmt
03.10.2013 1 8 P+
04.10.2013 1 7 P+
07.10.2013 1 34 P+
03.10.2013 7 16 P+
04.10.2013 7 68 P+
07.10.2013 7 170 P+
03.10.2013 2 7 P-
04.10.2013 2 7 P-
07.10.2013 2 21 P-
03.10.2013 5 8 P-
04.10.2013 5 27 P-
07.10.2013 5 24 P-
03.10.2013 3 15 T
04.10.2013 3 6 T
07.10.2013 3 13 T
03.10.2013 4 6 T
04.10.2013 4 18 T
07.10.2013 4 19 T ", h=T)
require(ggplot2)
require(visreg)
require(MASS)
require(reshape2)
library(scales)
dtm$asDate = as.Date(dtm[,1], "%d.%m.%Y")
## Plot 1: Nearly what I want but is biased by the x-axis format where date should not be a factor##
p2<-ggplot(data = dtm, aes(x = factor(asDate), y = mortes))
p2 + geom_boxplot(aes(fill = factor(dtm$trmt)))
## Plot 2: Doesn't show me what I need, ggplot apparently needs a factor as x##
p<-ggplot(data = dtm, aes(x = asDate, y = mortes))
p + geom_boxplot(aes( group = asDate, fill=trmt) )
Can anyone help me with this issue, please?
Is this what you want?
Code:
p <- ggplot(data = dtm, aes(x = asDate, y = mortes, group=interaction(date, trmt)))
p + geom_boxplot(aes(fill = trmt))
The key is to group by interaction(date, trmt) so that you get all of the boxes, and not cast asDate to a factor, so that ggplot treats it as a date. If you want to add anything more to the x axis, be sure to do it with + scale_x_date().
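A self-contained version of that answer, assuming ggplot2 is installed. The data frame re-enters the question's values with rep(), grouping uses the parsed asDate column, and fill = trmt refers to the column by name rather than dtm$trmt. The date_breaks/date_labels settings are an illustrative use of scale_x_date(); boxes that share a date are placed side by side by geom_boxplot()'s default dodging.

```r
library(ggplot2)

# The question's data: 6 hives (ruche) x 3 dates, two hives per treatment
dtm <- data.frame(
  date   = rep(c("03.10.2013", "04.10.2013", "07.10.2013"), times = 6),
  ruche  = rep(c(1, 7, 2, 5, 3, 4), each = 3),
  mortes = c(8, 7, 34, 16, 68, 170, 7, 7, 21, 8, 27, 24, 15, 6, 13, 6, 18, 19),
  trmt   = rep(c("P+", "P-", "T"), each = 6)
)
dtm$asDate <- as.Date(dtm$date, format = "%d.%m.%Y")

# x stays a Date (real spacing between days); one box per date x treatment
p <- ggplot(dtm, aes(x = asDate, y = mortes,
                     group = interaction(asDate, trmt), fill = trmt)) +
  geom_boxplot() +
  scale_x_date(date_breaks = "1 day", date_labels = "%d.%m")

print(p)
```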
