GGplot scale_fill_discrete not creating a legend - r

I used the code from a prior Stack Overflow post, which previously gave me the graph I wanted, with a legend. Now I am using the exact same code, yet I am not getting a legend on my bar plot.
dput(year.dat2)
structure(list(year = structure(c(1136044800, 1167577200, 1199113200,
1230735600, 1262275200, 1136044800, 1167577200, 1199113200, 1230735600,
1262275200, 1136044800, 1167577200, 1199113200, 1230735600, 1262275200
), class = c("POSIXct", "POSIXt"), tzone = ""), variable = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("SM1",
"SM2", "SM3"), class = "factor"), value = c(1.24863586758821,
1.23185757914453, 1.10997352162401, 1.13683917747257, 0.987520605867152,
2.21498726809749, 1.6378992693761, 1.25635623380691, 1.13585705516765,
1.10169569342842, 7.40955802109858, 5.7940698875978, 6.03438772314438,
6.82271157830123, 7.24402375195127)), row.names = c(NA, -15L), .Names = c("year",
"variable", "value"), class = "data.frame")
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge")+
scale_fill_discrete(name = "variable",
breaks = c(1, 2, 3),
labels = c("SM1", "SM2", "SM3")) +
xlab("year") +
ylab("yearly Sub Mean")
Resulting plot:

Remove breaks from scale_fill_discrete: the values you passed do not correspond to the values of the factored variable used as the fill aesthetic.
This is the code you want:
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "variable",
labels = c("SM1", "SM2", "SM3")) +
xlab("year") +
ylab("yearly Sub Mean")
Note 1: You do not even need the labels parameter, as you are not renaming the variable categories. To change only the legend title, scale_fill_discrete(name = "variable") or labs(fill = "variable") is sufficient.
Note 2: In your original post you linked to this SO question: How to get a barplot with several variables side by side grouped by a factor
Here, the breaks in the sample code scale_fill_discrete(name="Gender", breaks=c(1, 2), labels=c("Male", "Female")) actually reference the values of gender in the original df, while labels renames 1 to Male and 2 to Female in the legend.
  Group.1      tea     coke     beer    water gender
1       1 87.70171 27.24834 24.27099 37.24007      1
2       2 24.73330 25.27344 25.64657 24.34669      2
On the other hand, there are no 1,2,3 values in your variable data that would match the breaks you have set in your original code, and that is why your legend is not plotted.
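A quick way to see the mismatch is to compare the breaks against the values the fill scale actually sees. This is a minimal base-R sketch: the vector below only rebuilds the variable column from the dput output, and the names fill_values, numeric_breaks and level_breaks are made up for illustration.

```r
# Rebuild the 'variable' column from the question's data (5 rows per level)
variable <- factor(rep(c("SM1", "SM2", "SM3"), each = 5))

# These are the values a discrete fill scale works with
fill_values <- levels(factor(variable))

numeric_breaks <- c(1, 2, 3)              # breaks from the original code
level_breaks   <- c("SM1", "SM2", "SM3")  # breaks that name real levels

numeric_breaks_match <- numeric_breaks %in% fill_values
level_breaks_match   <- level_breaks %in% fill_values

print(numeric_breaks_match)  # FALSE FALSE FALSE
print(level_breaks_match)    # TRUE TRUE TRUE
```

Since none of the numeric breaks matches a level, there is nothing left to show in the legend, which is the behaviour described above.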
And for fun, here is an example of how you might use breaks and labels with your data set:
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge")+
scale_fill_discrete(name = "variable",
breaks = c("SM2", "SM3", "SM1"),
labels = c("SM2 new label", "SM3 new label", "SM1 new label")) +
xlab("year") +
ylab("yearly Sub Mean")

Related

How to highlight a specific region of a line changing linetype or size using ggplot

I have the following dataframe:
test = structure(list(Student = c("Ana", "Brenda", "Max", "Ana", "Brenda",
"Max", "Ana", "Brenda", "Max"), Month = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("January", "February", "March"
), class = "factor"), Grade = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L,
10L), Change = c("February", "February", "February", "February",
"February", "February", "February", "February", "February")), row.names = c(NA,
-9L), class = "data.frame")
I plotted the grades of each student throughout the months, and wanted to know if there is a simple way to accentuate a specific part of each plotted line that would correspond to the period of time which showed the greatest shift in their grades (this info is already present in the dataset: column "Change". In this case, for all students it would be from February to March).
Following logic similar to that presented here for changing the color of a specific part of a line, I attempted to change the linetype and/or size of the line instead (since I already use color to group and display each student) in order to highlight that particular portion. However, it doesn't seem to be as straightforward.
My attempts were the following:
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() + geom_line(data = test, aes(linetype = (ifelse(Month == Change, "solid", "dashed"))))
Which yielded the error:
Error: geom_path: If you are using dotted or dashed lines, colour, size and linetype must be constant over the line
And
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() + geom_line(data = test, aes(size = (ifelse(Month == Change, 1, 0.8))))
This kind of does what I'm looking for, but it looks horrible and doesn't really seem to use the line sizes that I'm trying to specify:
How do I fix it? Thanks in advance! n_n
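Don't put the literal size values in aes. Map the condition (Month == Change) to size there, and set the actual line widths with a scale function such as scale_size_manual.
For other suggestions regarding line design, see further below.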
test = structure(list(Student = c("Ana", "Brenda", "Max", "Ana", "Brenda",
"Max", "Ana", "Brenda", "Max"), Month = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("January", "February", "March"
), class = "factor"), Grade = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L,
10L), Change = c("Februrary", "Februrary", "Februrary", "Februrary",
"Februrary", "Februrary", "Februrary", "Februrary", "Februrary"
)), row.names = c(NA, -9L), class = "data.frame")
## typo corrected
test$Change <- "February"
library(ggplot2)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
## map the condition to size in aes, not the literal size values
geom_line(data = test, aes(size = Month == Change)) +
## change the size scale
scale_size_manual(values = c(`TRUE` = 2, `FALSE` = .8))
Another option might be to make use of the ggforce::geom_link functions, which interpolate aesthetics between two points.
library(ggforce)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
geom_link2(data = test, aes(size = Grade), lineend = "round")
This fails when trying to change the line type. In this case, use geom_segment instead - you will need some data transformation though.
library(tidyverse)
test %>%
group_by(Student) %>%
mutate(xend = lead(Month), yend = lead(Grade)) %>%
ggplot(aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
geom_segment(aes(xend = xend, yend = yend, linetype = Month == Change)) +
# need to specify your x
scale_x_discrete(limits = unique(test$Month))
#> Warning: Removed 3 rows containing missing values (geom_segment).
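The three removed rows are the last month of each student, where lead() has no following point to refer to. Below is a base-R sketch of the same per-student "next point" transformation. It assumes the test data from the question; the column names xend and yend follow the answer, ave() is swapped in for group_by() + mutate(), and the helper next_value and the column highlight are hypothetical names.

```r
test <- data.frame(
  Student = rep(c("Ana", "Brenda", "Max"), times = 3),
  Month = factor(rep(c("January", "February", "March"), each = 3),
                 levels = c("January", "February", "March")),
  Grade = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L, 10L),
  Change = "February"
)

# Shift a student's values up by one position; the last one has no successor
next_value <- function(v) c(v[-1], NA)

# Month is stored as its integer position (1, 2, 3) on the discrete x axis
test$xend <- ave(as.integer(test$Month), test$Student, FUN = next_value)
test$yend <- ave(test$Grade, test$Student, FUN = next_value)

# Segments that start in the month named in Change are the ones to highlight
test$highlight <- as.character(test$Month) == test$Change

print(test[order(test$Student, test$Month), ])
```

Each row now describes one segment from (Month, Grade) to (xend, yend). The rows with NA end points are the ones geom_segment drops with the warning shown above.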
Changing the line width along its course is possible (as tjebo points out), but it rarely makes for a nice plot. A clearer way might be simply to add a coloured background over the area of interest:
library(hrbrthemes)
library(ggplot2)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point(alpha = 0) +
geom_rect(data = test[1,],
aes(xmin = 'February', xmax = 'March', ymin = -Inf, ymax = Inf),
color = NA, fill = 'deepskyblue4', alpha = 0.1) +
geom_line(size = 1) +
geom_point(shape = 21, fill = 'white', size = 3) +
theme_minimal() +
scale_color_manual(values = c('pink3', 'orange2', 'red4')) +
theme_tinyhand()

How to make a dot plot based on whether a value is present or missing

I have data like this
df <- structure(list(Shared = structure(c(3L, 2L, 4L, 1L), .Label = c("Door",
"glass", "Water ", "WC"), class = "factor"), Cond1 = structure(c(1L,
4L, 2L, 3L), .Label = c("Hisy", "HIU", "JIS", "NHIS"), class = "factor"),
Cond2 = structure(c(1L, NA, NA, 2L), .Label = c("Hisy", "JIS"
), class = "factor"), Cond3 = structure(c(NA, NA, 1L, 2L), .Label = c("HIU",
"JIS"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
I can convert the letters to logical values like this
mynew<- t('row.names<-'(!is.na(df[-1]), df$Shared))
then I melt it like this
mydf2 <- melt(mynew)
then I plot it like this
ggplot(data = mydf2, aes(x = X1, y = value, fill = X2,colour=X2)) +
geom_point()
But I want it to be like the following example
I want to plot it in a way that the words become my x axis and cond1, cond2 and cond3 in my y axis.
example like this
As @camille said in the comments, you are pretty much there (unless we are missing something).
You just need to map X2 to the x-axis and X1 to the y-axis. The only extra step is to first filter out the rows of your data frame where the column value is FALSE.
I tried to format the plot so it looks as close as possible to your example.
library(reshape2) # melt()
library(dplyr)    # %>% and filter()
library(ggplot2)
# remove rows where value == FALSE
mydf2 <- melt(mynew) %>% filter(value)
# create plot (in my case `melt` named the variables Var1 and Var2 instead of X1 and X2)
ggplot(data = mydf2, aes(x = Var2, y = Var1)) +
geom_point(fill='blue', colour='black', shape=21, size=3) +
scale_y_discrete(position = 'right') +
theme_bw() + theme(panel.grid=element_blank(),
axis.title = element_blank(),
axis.line = element_line(),
panel.border = element_blank())
Hope this helps
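If you would rather avoid melt() and filter(), the same "one row per present value" table can be built in base R. This is only a sketch under assumptions: the data frame is retyped from the dput output as plain character columns (with the trailing space in "Water " dropped), and present, idx and points_df are illustrative names. The plot is built only if ggplot2 is installed.

```r
df <- data.frame(
  Shared = c("Water", "glass", "WC", "Door"),
  Cond1  = c("Hisy", "NHIS", "HIU", "JIS"),
  Cond2  = c("Hisy", NA, NA, "JIS"),
  Cond3  = c(NA, NA, "HIU", "JIS")
)

# Logical matrix: TRUE where a condition has a value for that Shared item
present <- !is.na(df[-1])

# Row/column indices of every TRUE cell
idx <- which(present, arr.ind = TRUE)

# One row per dot to draw
points_df <- data.frame(
  Shared = df$Shared[idx[, "row"]],
  Cond   = colnames(present)[idx[, "col"]]
)
print(points_df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(points_df, aes(x = Shared, y = Cond)) +
    geom_point(size = 3)
  # print(p) draws the plot
}
```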

How can I make a fractional bar plot?

When I do this:
ggplot(d, aes(x=variable, fill=value))+geom_bar(position = "fill")
I get this
But I want something like this: basically, the bars should be colored by fraction. How can I do this? The idea is to see the relative contribution of models, percentiles, and models:percentiles for each of R2, R5, and R10.
dataframe
structure(list(term = c("models", "percentiles", "models:percentiles",
"models", "percentiles", "models:percentiles", "models", "percentiles",
"models:percentiles"), variable = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L), .Label = c("R2", "R5", "R10"), class = "factor"),
value = c(0.435697205847009, 0.533615307749147, 0.0306874864038442,
0.441369621882273, 0.520198994695284, 0.0384313834224421,
0.394491546635206, 0.579421546902868, 0.0260869064619254)), row.names = c(NA,
-9L), class = "data.frame")
geom_col does the job here:
ggplot(d, aes(x = variable, y = value, fill = term)) + geom_col()
or
ggplot(d, aes(x = variable, y = value, fill = term)) + geom_bar(stat = "identity")
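This works without any normalisation because the values already add up to roughly 1 within each variable. The sketch below checks that and shows two ways to get fractions when they do not: precomputing them, or letting position = "fill" rescale each bar. The column name fraction and the object group_totals are made up for illustration, and the plot is built only if ggplot2 is installed.

```r
d <- data.frame(
  term = rep(c("models", "percentiles", "models:percentiles"), times = 3),
  variable = factor(rep(c("R2", "R5", "R10"), each = 3),
                    levels = c("R2", "R5", "R10")),
  value = c(0.435697205847009, 0.533615307749147, 0.0306874864038442,
            0.441369621882273, 0.520198994695284, 0.0384313834224421,
            0.394491546635206, 0.579421546902868, 0.0260869064619254)
)

# Sum of the three terms within each variable
group_totals <- tapply(d$value, d$variable, sum)
print(round(group_totals, 6))

# Explicit fractions, useful if the totals were not already 1
d$fraction <- d$value / ave(d$value, d$variable, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # position = "fill" rescales every stacked bar to a total height of 1
  p <- ggplot(d, aes(x = variable, y = value, fill = term)) +
    geom_col(position = "fill")
  # print(p) draws the plot
}
```

By contrast, geom_bar(position = "fill") without a y aesthetic counts rows, which is why the original attempt did not reflect the values.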

What to do when adding asterisks to bar graph shifts the bars in R?

I've created a bar graph in R and now I tried to add the significant differences to the bar graph.
I've tried using geom_signif from the ggsignif package and stat_compare_means from the ggpubr package (based on these suggestions/examples: Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value) or https://cran.r-project.org/web/packages/ggsignif/vignettes/intro.html)
I was only able to add the significance levels when using geom_signif and choose the parameters as in https://cran.r-project.org/web/packages/ggsignif/vignettes/intro.html.
This is an example of what I would like to get:
And this is what I get:
So when I want to add the asterisks, it shifts the bars from the bar graph. I don't know how to change it...
This is a part of what I wrote:
bargraph = ggplot(dataPlotROI, aes(x = ROI, y=mean, fill = Group))
bargraph +
geom_bar(position = position_dodge(.5), width = 0.5, stat = "identity") +
geom_errorbar(position = position_dodge(width = 0.5), width = .2,
aes(ymin = mean-SEM, ymax = mean+SEM)) +
geom_signif(y_position = c(4.5,10,10), xmin=c(0.85,0.85,4.3), xmax = c(5,4,7.45),
annotation=c("***"), tip_length = 0.03, inherit.aes = TRUE) +
facet_grid(.~ROI, space= "free_x", scales = "free_x", switch = "x")
This is the output from dput(dataPlotROI):
> Dput <- dput(dataPlotROI)
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), ROI = structure(c(1L, 2L, 3L, 1L, 2L,
3L), .Label = c("LOT", "MO", "ROT"), class = "factor"), mean = c(2.56175803333696,
7.50825658538044, 3.34290874605435, 2.41750375190217, 6.90310020776087,
3.03040666678261), SD = c(1.15192431061913, 4.30564383354597,
2.01581544982848, 1.11404900115086, 3.35276625079825, 1.23786817391241
), SEM = c(0.120096411333424, 0.448894400545147, 0.210163288684092,
0.11614763735292, 0.349550045127766, 0.129056678481624)), class = "data.frame", row.names = c(NA,
-6L))
> Dput
  Group ROI     mean       SD       SEM
1     1 LOT 2.561758 1.151924 0.1200964
2     1  MO 7.508257 4.305644 0.4488944
3     1 ROT 3.342909 2.015815 0.2101633
4     2 LOT 2.417504 1.114049 0.1161476
5     2  MO 6.903100 3.352766 0.3495500
6     2 ROT 3.030407 1.237868 0.1290567
Does anyone know what I am doing wrong and how I can fix it?
Thanks!
I don't think geom_signif is meant to span across the facets, but in your case, I don't see any real need for facets anyway. See if the following works for you:
ggplot(dataPlotROI,
aes(x = ROI, y = mean, fill = Group)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col(position = position_dodge(0.5), width = 0.5) +
geom_errorbar(position = position_dodge(0.5), width = 0.2,
aes(ymin = mean - SEM, ymax = mean + SEM)) +
# xmin / xmax positions should match the x-axis labels' positions
geom_signif(y_position = c(4.5, 10, 10),
xmin = c(1, 1, 2.05),
xmax = c(3, 1.95, 3),
annotation = "***",
tip_length = 0.03)
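The xmin / xmax numbers follow from how a discrete x axis is laid out: the factor levels sit at the integer positions 1, 2, 3, in level order. The base-R sketch below rebuilds the vectors used above; x_pos and gap are illustrative names, and the 0.05 gap is simply the offset implied by 1.95 and 2.05, which keeps neighbouring brackets from touching.

```r
ROI <- factor(c("LOT", "MO", "ROT", "LOT", "MO", "ROT"),
              levels = c("LOT", "MO", "ROT"))

# Position of each level on the discrete x axis
x_pos <- setNames(seq_along(levels(ROI)), levels(ROI))
print(x_pos)

gap <- 0.05  # small offset so adjacent brackets do not share an end point

# Brackets: LOT to ROT, LOT to MO, MO to ROT
xmin <- unname(c(x_pos["LOT"], x_pos["LOT"], x_pos["MO"] + gap))
xmax <- unname(c(x_pos["ROT"], x_pos["MO"] - gap, x_pos["ROT"]))

print(xmin)
print(xmax)
```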

second y-axis with other scale [duplicate]

This question already has answers here:
ggplot with 2 y axes on each side and different scales
(18 answers)
Closed 2 years ago.
I created a barchart with ggplot2 geom_bar and want to have two metrics in the bars. I used melt to do so. However, I now need a second y-axis with another scale, because the numbers of the two metrics are too different.
Below are the data frame and the code I used:
df <- data.frame(categories = c("politics", "local", "economy", "cultural events",
"politics", "local", "economy", "cultural events"),
metric = c("page", "page", "page", "page",
"product", "product", "product", "product"),
value = c(100L, 50L, 20L, 19L,
950000L, 470000L, 50000L, 1320L))
This is the code I used to create the plot and the second y-axis:
x <- ggplot(df, aes(x=categories, y=value, fill = metric))
x + geom_bar(stat = "identity", position = "dodge") +
scale_y_continuous(sec.axis=sec_axis(~. *1000), limits=c(1,1000))
However, now no bars appear in the chart anymore... Does anybody know how to solve this problem?
Another approach could be to use highcharter
library(dplyr)
library(tidyr)
library(highcharter)
#convert your data in wide format
df <- df %>% spread(metric, value)
#plot
highchart() %>%
hc_xAxis(categories = df$categories) %>%
hc_yAxis_multiples(
list(lineWidth = 3, title = list(text = "Page")),
list(opposite = TRUE, title = list(text = "Product"))
) %>%
hc_add_series(type = "column", data = df$page) %>%
hc_add_series(type = "line", data = df$product, yAxis=1) # replace "line" with "column" to have it in bar format
Output plot is:
Sample data:
df <- structure(list(categories = structure(c(4L, 3L, 2L, 1L, 4L, 3L,
2L, 1L), .Label = c("cultural events", "economy", "local", "politics"
), class = "factor"), metric = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("page", "product"), class = "factor"),
value = c(100L, 50L, 20L, 19L, 950000L, 470000L, 50000L,
1320L)), .Names = c("categories", "metric", "value"), row.names = c(NA,
-8L), class = "data.frame")
Instead of displaying the different bars on the same plot, you can stack the panels on top of each other (facet them):
As @ConorNeilson already mentioned in the comments, you don't have to (and also shouldn't) specify the variables with df$. ggplot knows to look for them in the specified data.frame df.
The scales = "free_y" argument in the facet_grid call lets each panel have its own y-axis scale.
library(ggplot2)
ggplot(df, aes(x = categories, y = value, fill = metric)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(metric~., scales = "free_y")
You could compare the different values on a log10-scale:
ggplot(df, aes(x = categories, y = value, fill = metric)) +
geom_bar(stat = "identity", position = "dodge") +
scale_y_log10(name = "value (log10 scale)",
breaks = 10^(0:10), labels = paste0(10, "^", 0:10))
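As for the attempt in the question itself: the bars most likely vanish because limits = c(1, 1000) excludes 0, where every bar starts, as well as all product values above 1000, and out-of-range values are dropped by default. If a true second axis is wanted, a common pattern is to rescale one metric onto the range of the other and undo that rescaling in sec_axis. The sketch below assumes a hand-picked scale_factor of 10000 and a helper column plot_value, neither of which comes from the original post. The plot is built only if ggplot2 is installed.

```r
df <- data.frame(
  categories = rep(c("politics", "local", "economy", "cultural events"), 2),
  metric = rep(c("page", "product"), each = 4),
  value = c(100, 50, 20, 19, 950000, 470000, 50000, 1320)
)

# Assumption: chosen by eye so the largest product value (950000)
# lands close to the largest page value (100)
scale_factor <- 10000

# Shrink only the product values onto the page scale
df$plot_value <- ifelse(df$metric == "product",
                        df$value / scale_factor,
                        df$value)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = categories, y = plot_value, fill = metric)) +
    geom_col(position = "dodge") +
    scale_y_continuous(
      name = "page",
      # the secondary axis multiplies back, so its labels show product values
      sec.axis = sec_axis(~ . * scale_factor, name = "product")
    )
  # print(p) draws the plot
}
```

Very small product values, such as the 1320 for cultural events, remain hard to read with a linear rescaling like this, which is one reason the faceted and log-scale versions above may be preferable.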
