second y-axis with other scale [duplicate] - r

I created a barchart with ggplot2 geom_bar and want to have two metrics in the bars. I used melt to do so. However, I now need a second y-axis with another scale, because the numbers of the two metrics are too different.
This is the data frame I used:
df <- data.frame(categories = c("politics", "local", "economy", "cultural events",
"politics", "local", "economy", "cultural events"),
metric = c("page", "page", "page", "page",
"product", "product", "product", "product"),
value = c(100L, 50L, 20L, 19L,
950000L, 470000L, 50000L, 1320L))
And this is the code I used to create the plot and the second y-axis:
x <- ggplot(df, aes(x=categories, y=value, fill = metric))
x + geom_bar(stat = "identity", position = "dodge") +
scale_y_continuous(sec.axis=sec_axis(~. *1000), limits=c(1,1000))
However, now no bars appear in the chart anymore... Does anybody know how to solve this problem?

Another approach could be to use highcharter
library(dplyr)
library(tidyr)
library(highcharter)
#convert your data in wide format
df <- df %>% spread(metric, value)
#plot
highchart() %>%
hc_xAxis(categories = df$categories) %>%
hc_yAxis_multiples(
list(lineWidth = 3, title = list(text = "Page")),
list(opposite = TRUE, title = list(text = "Product"))
) %>%
hc_add_series(type = "column", data = df$page) %>%
hc_add_series(type = "line", data = df$product, yAxis=1) # replace "line" with "column" to have it in bar format
Output plot is:
Sample data:
df <- structure(list(categories = structure(c(4L, 3L, 2L, 1L, 4L, 3L,
2L, 1L), .Label = c("cultural events", "economy", "local", "politics"
), class = "factor"), metric = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("page", "product"), class = "factor"),
value = c(100L, 50L, 20L, 19L, 950000L, 470000L, 50000L,
1320L)), .Names = c("categories", "metric", "value"), row.names = c(NA,
-8L), class = "data.frame")

Instead of displaying the different bars on the same plot, you can stack them on top of each other in separate panels (that is, facet them):
As @ConorNeilson already mentioned in the comments, you don't have to (and also shouldn't) specify the variables with df$. ggplot knows to look for them in the specified data.frame df.
Setting scales = "free_y" in the facet_grid call gives each panel its own y-axis scale.
library(ggplot2)
ggplot(df, aes(x = categories, y = value, fill = metric)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(metric~., scales = "free_y")
You could compare the different values on a log10-scale:
ggplot(df, aes(x = categories, y = value, fill = metric)) +
geom_bar(stat = "identity", position = "dodge") +
scale_y_log10(name = "value (log10 scale)",
breaks = 10^(0:10), labels = paste0(10, "^", 0:10))
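For completeness, here is a minimal sketch of how the sec_axis approach from the question could be made to work. The bars most likely vanish because limits = c(1, 1000) excludes 0 (where every bar starts) as well as every product value, so the out-of-range bars are dropped. A secondary axis in ggplot2 is only a relabelling of the primary axis, so the larger metric has to be rescaled by hand first. The scale_factor of 10000 and the column name plot_value are assumptions chosen for this data, not something from the original post.

```r
library(ggplot2)

df <- data.frame(
  categories = rep(c("politics", "local", "economy", "cultural events"), 2),
  metric     = rep(c("page", "product"), each = 4),
  value      = c(100, 50, 20, 19, 950000, 470000, 50000, 1320)
)

# Assumed factor that brings "product" into the same range as "page"
scale_factor <- 10000

# Shrink only the product values; page values are plotted as they are
df$plot_value <- ifelse(df$metric == "product",
                        df$value / scale_factor,
                        df$value)

p <- ggplot(df, aes(x = categories, y = plot_value, fill = metric)) +
  geom_col(position = "dodge") +
  scale_y_continuous(
    name = "page",
    # the secondary axis undoes the rescaling so its labels show product units
    sec.axis = sec_axis(~ . * scale_factor, name = "product")
  )

p
```

Note that no limits are set here, so the bars still start at 0. Whether a dual axis is a good idea at all is debatable, which is why the faceted and log-scale versions above may be the safer choice.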

Related

How to highlight a specific region of a line changing linetype or size using ggplot

I have the following dataframe:
test = structure(list(Student = c("Ana", "Brenda", "Max", "Ana", "Brenda",
"Max", "Ana", "Brenda", "Max"), Month = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("January", "February", "March"
), class = "factor"), Grade = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L,
10L), Change = c("February", "February", "February", "February",
"February", "February", "February", "February", "February")), row.names = c(NA,
-9L), class = "data.frame")
I plotted the grades of each student throughout the months, and wanted to know if there is a simple way to accentuate a specific part of each plotted line that would correspond to the period of time which showed the greatest shift in their grades (this info is already present in the dataset: column "Change". In this case, for all students it would be from February to March).
Using logic similar to that presented here for changing the color of a specific part of a line, I attempted to change the linetype and/or size of the line (since I already use color to group and display each student) in order to highlight that particular portion. However, it doesn't seem to be as straightforward.
My attempts were the following:
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() + geom_line(data = test, aes(linetype = (ifelse(Month == Change, "solid", "dashed"))))
Which yielded the error:
Error: geom_path: If you are using dotted or dashed lines, colour, size and linetype must be constant over the line
And
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() + geom_line(data = test, aes(size = (ifelse(Month == Change, 1, 0.8))))
Which kinda does what I'm looking for, but looks horrible, and doesn't really seem like it's using the line sizes that I'm trying to specify:
How do I fix it? Thanks in advance! n_n
Don't specify the size values inside aes. Map the condition to size there, and set the actual line widths with a scale function.
For another suggestion regarding line design see below.
test = structure(list(Student = c("Ana", "Brenda", "Max", "Ana", "Brenda",
"Max", "Ana", "Brenda", "Max"), Month = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L), .Label = c("January", "February", "March"
), class = "factor"), Grade = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L,
10L), Change = c("Februrary", "Februrary", "Februrary", "Februrary",
"Februrary", "Februrary", "Februrary", "Februrary", "Februrary"
)), row.names = c(NA, -9L), class = "data.frame")
## typo corrected
test$Change <- "February"
library(ggplot2)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
## map the condition (not the size values) in aes
geom_line(data = test, aes(size = Month == Change)) +
## change the size scale
scale_size_manual(values = c(`TRUE` = 2, `FALSE` = .8))
Another option might be to make use of the ggforce::geom_link functions, which interpolate aesthetics between two points.
library(ggforce)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
geom_link2(data = test, aes(size = Grade), lineend = "round")
This fails when trying to change the line type. In this case, use geom_segment instead - you will need some data transformation though.
library(tidyverse)
test %>%
group_by(Student) %>%
mutate(xend = lead(Month), yend = lead(Grade)) %>%
ggplot(aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point() +
geom_segment(aes(xend = xend, yend = yend, linetype = Month == Change)) +
# need to specify your x
scale_x_discrete(limits = unique(test$Month))
#> Warning: Removed 3 rows containing missing values (geom_segment).
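To see what the data transformation for geom_segment does, and where the warning about three removed rows comes from, the intermediate data frame can be inspected on its own. This is only a sketch: the name segments is made up for illustration, and the data frame is rebuilt here in a shorter form that should be equivalent to the one in the question.

```r
library(dplyr)

test <- data.frame(
  Student = rep(c("Ana", "Brenda", "Max"), times = 3),
  Month   = factor(rep(c("January", "February", "March"), each = 3),
                   levels = c("January", "February", "March")),
  Grade   = c(7L, 6L, 7L, 8L, 6L, 7L, 5L, 10L, 10L),
  Change  = "February"
)

# For every student, each row gets the next month and grade as segment end point
segments <- test %>%
  group_by(Student) %>%
  mutate(xend = lead(Month), yend = lead(Grade)) %>%
  ungroup()

segments

# The last month of each student has no following row, so xend/yend are NA there
sum(is.na(segments$xend))
```

Each student's March row has nothing to connect to, which is why geom_segment drops exactly one row per student.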
Changing the line width along its course is possible (as tjebo points out), but it rarely makes for a nice plot. A clearer way might be simply to add a coloured background over the area of interest:
library(hrbrthemes)
library(ggplot2)
ggplot(test, aes(x = Month, y = Grade, color = Student, group = Student)) +
geom_point(alpha = 0) +
geom_rect(data = test[1,],
aes(xmin = 'February', xmax = 'March', ymin = -Inf, ymax = Inf),
color = NA, fill = 'deepskyblue4', alpha = 0.1) +
geom_line(size = 1) +
geom_point(shape = 21, fill = 'white', size = 3) +
theme_minimal() +
scale_color_manual(values = c('pink3', 'orange2', 'red4')) +
theme_tinyhand()

How to set a tag with the number of observations at the top of a stacked bar

I want to set the number of observations at the top of each bar.
This is some sample data
structure(list(Treatment = structure(c(3L, 3L, 3L, 3L, 3L, 4L,
4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L), .Label = c("", "{\"ImportId\":\"Treatment\"}",
"Altruism", "Altruism - White", "Piece Rate - 0 cents", "Piece Rate - 3 cents",
"Piece Rate - 6 cents", "Piece Rate - 9 cents", "Reciprocity",
"Reciprocity - Black", "Reciprocity - White", "Treatment"), class = "factor"),
Gender = structure(c(5L, 3L, 5L, 5L, 5L, 3L, 3L, 7L, 3L,
3L, 5L, 5L, 3L, 3L, 5L), .Label = c("", "{\"ImportId\":\"QID2\"}",
"Female", "Gender you most closely identify with: - Selected Choice",
"Male", "Other", "Prefer not to answer"), class = "factor")), row.names = c(NA,
15L), class = "data.frame")
My approach was using the following code
totals <- Data1 %>%
group_by(Gender) %>%
summarize(total = n)
Data1 %>%
count(Treatment, Gender) %>%
ggplot(aes(Treatment, n))+
geom_col(aes(fill = Gender), position = "fill")+
ggtitle("Gender")+
ylab("Fraction")+
theme(axis.text.x = element_text(angle = 90, vjust=0.3, hjust=1))+
scale_fill_manual("Gender",
values = c("Female" = "pink", "Male" = "light blue",
"Other"="coral", "Prefer not to answer"="violet"))+
geom_text(aes(label=n, group=Gender),size=3,
position = position_fill(vjust=0.5),data<-totals)
I only want the total number of observations appear at the top of each bar.
My graph thus far looks like this
Now I only want to know how to display the total number of observations for each bar.
I couldn't get your sample data to work, so here is an example of adding totals to each bar.
You will need to create another dataset that holds the total for each group (in your example, that is Treatment). Then add a geom_text layer for the totals.
library(dplyr)
library(ggplot2)
library(scales)
# Sample Data
Data1 <- data.frame(
Gender = factor(c("Female","Female","Male","Male")),
Treatment = factor(c("a","b","a","b"), levels=c("a","b")),
value = c(10, 12, 13, 11)
)
# Totals for each bar
totals <- Data1 %>%
group_by(Treatment) %>%
summarize(value = sum(value))
# Bar chart
ggplot(data=Data1, aes(x=Treatment, y=value)) +
geom_bar(stat="identity", aes(fill=Gender)) +
# comment this out if you don't want to show labels for each stacked bar
geom_text(aes(label = value),position = position_stack(vjust = 0.5))+
# Add totals for each bar
geom_text(data = totals, aes(x = Treatment, y = value, label = value))
EDIT (with sample data provided)
library(dplyr)
library(ggplot2)
library(scales)
totals <- Data1 %>%
count(Treatment)
Data1 %>%
count(Treatment, Gender) %>%
ggplot(aes(x = Treatment, y = n)) + geom_bar(stat = "identity", aes(fill = Gender)) +
ggtitle("Gender") + ylab("Count") +
theme(axis.text.x = element_text(angle = 90, vjust=0.3, hjust=1)) +
scale_fill_manual("Gender",
values = c("Female" = "pink", "Male" = "light blue",
"Other"="coral", "Prefer not to answer"="violet")) +
# Add totals for each bar
geom_text(data = totals, aes(label = n))
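If the bars should stay normalised as in the question (position = "fill"), one option is to place the totals at y = 1, which is the top of every filled bar. The following is a rough sketch with a small made-up data set; the treatment names, the column name total and the vjust value are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Data1: one row per observation
Data1 <- data.frame(
  Treatment = c("Altruism", "Altruism", "Altruism", "Piece Rate", "Piece Rate"),
  Gender    = c("Male", "Female", "Male", "Female", "Female")
)

counts <- count(Data1, Treatment, Gender)          # bar segments
totals <- count(Data1, Treatment, name = "total")  # one total per bar

p <- ggplot(counts, aes(x = Treatment, y = n)) +
  geom_col(aes(fill = Gender), position = "fill") +
  # filled bars always end at 1, so the label goes just above that height
  geom_text(data = totals, aes(y = 1, label = total), vjust = -0.5) +
  ylab("Fraction")

p
```

The y-axis still shows fractions here, while the labels show the raw number of observations per treatment.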

How can I make a stacked area chart?

I am trying to make a stacked area chart:
library(ggplot2)
ggplot(data, aes(x = variable, y = value, fill = term)) +
geom_area()
But this leads to a blank plot.
Expected plot (example)
How can I do this?
data <- structure(list(term = c("models", "percentiles", "models:percentiles",
"models", "percentiles", "models:percentiles",
"models", "percentiles", "models:percentiles"),
variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
.Label = c("R2", "R5", "R10"), class = "factor"),
value = c(0.435697205847009, 0.533615307749147, 0.0306874864038442,
0.441369621882273, 0.520198994695284, 0.0384313834224421,
0.394491546635206, 0.579421546902868, 0.0260869064619254)),
row.names = c(NA, -9L), class = "data.frame")
You are only missing the group aesthetic:
ggplot(data, aes(x = variable, y = value, group = term, fill = term)) +
geom_area(color = "black")

How to automatically adjust the width of each facet for facet_wrap?

I want to plot a boxplot using ggplot2, and I have more than one facet. Each facet has different terms, as follows:
library(ggplot2)
p <- ggplot(
data=Data,
aes(x=trait,y=mean)
)
p <- p+facet_wrap(~SP,scales="free",nrow=1)
p <- p+geom_boxplot(aes(fill = Ref,
lower = mean - sd,
upper = mean + sd,
middle = mean,
ymin = min,
ymax = max,
width=c(rep(0.8/3,3),rep(0.8,9))),
lwd=0.5,
stat="identity")
As shown, the width of the boxes differs between facets. Is there any way to draw all the boxes at the same scale? I tried facet_grid, which can automatically change the width of the facets, but then all facets share the same y axis.
Data
Data <- structure(list(SP = structure(c(3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L), .Label = c("Human", "Cattle", "Horse", "Maize"
), class = "factor"), Ref = structure(c(3L, 2L, 1L, 3L, 3L, 3L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("LMM", "Half", "Adoptive"
), class = "factor"), trait = structure(c(11L, 11L, 11L, 14L,
13L, 12L, 14L, 13L, 12L, 14L, 13L, 12L), .Label = c("cad", "ht",
"t2d", "bd", "cd", "ra", "t1d", "fpro", "mkg", "scs", "coat colour",
"ywk", "ssk", "gdd"), class = "factor"), min = c(0.324122039,
0.336486555, 0.073152049, 0.895455441, 0.849944623, 0.825248005,
0.890413591, 0.852385351, 0.826470308, 0.889139116, 0.838256672,
0.723753592), max = c(0.665536838, 0.678764774, 0.34033228, 0.919794865,
0.955018001, 0.899903826, 0.913350912, 0.957305688, 0.89843716,
0.911257005, 0.955312678, 0.817489555), mean = c(0.4919168555,
0.5360103372, 0.24320509565, 0.907436221, 0.9057516121, 0.8552899502,
0.9035394117, 0.9068819173, 0.8572309823, 0.90125638965, 0.90217769835,
0.7667208778), sd = c(0.0790133656517775, 0.09704320004497, 0.0767552215753863,
0.00611921020505611, 0.0339614482273291, 0.0199389195311925,
0.00598633573504195, 0.0332634006653858, 0.0196465508521771,
0.00592476494699222, 0.0348144156099722, 0.0271827880539459)), .Names = c("SP",
"Ref", "trait", "min", "max", "mean", "sd"), class = "data.frame", row.names = c(10L,
11L, 12L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L))
While u/z-lin's answer works, there is a far simpler solution: switch from facet_wrap(...) to facet_grid(...). With facet_grid, you don't need to specify rows and columns. You can still specify scales= (which lets the axis scales adjust automatically for each facet, if wanted), but you can also specify space=, which does the same for the size of each facet, so that facet widths follow the number of x values they contain. This is what you want. Your function call is now something like this:
ggplot(Data, aes(x = trait, y = mean)) +
geom_boxplot(aes(
fill = Ref, lower = mean-sd, upper = mean+sd, middle = mean,
ymin = min, ymax = max),
lwd = 0.5, stat = "identity") +
facet_grid(. ~ SP, scales = "free", space='free') +
scale_x_discrete(expand = c(0, 0.5)) +
theme_bw()
Some more description of layout of facets can be found here.
As @cdtip mentioned, this does not allow for independent y scales for each facet, which is what the OP asked for initially. Luckily, there is also a simple solution for this, which utilizes facet_row() from the ggforce package:
library(ggforce)
# same as above without facet_grid call..
p <- ggplot(Data, aes(x = trait, y = mean)) +
geom_boxplot(aes(
fill = Ref, lower = mean-sd, upper = mean+sd, middle = mean,
ymin = min, ymax = max),
lwd = 0.5, stat = "identity") +
scale_x_discrete(expand = c(0, 0.5)) +
theme_bw()
p + ggforce::facet_row(vars(SP), scales = 'free', space = 'free')
You can adjust facet widths after converting the ggplot object to a grob:
# create ggplot object (no need to manipulate boxplot width here;
# we'll adjust the facet width directly later)
p <- ggplot(Data,
aes(x = trait, y = mean)) +
geom_boxplot(aes(fill = Ref,
lower = mean - sd,
upper = mean + sd,
middle = mean,
ymin = min,
ymax = max),
lwd = 0.5,
stat = "identity") +
facet_wrap(~ SP, scales = "free", nrow = 1) +
scale_x_discrete(expand = c(0, 0.5)) + # change additive expansion from default 0.6 to 0.5
theme_bw()
# convert ggplot object to grob object
gp <- ggplotGrob(p)
# optional: take a look at the grob object's layout
gtable::gtable_show_layout(gp)
# get gtable columns corresponding to the facets (5 & 9, in this case)
facet.columns <- gp$layout$l[grepl("panel", gp$layout$name)]
# get the number of unique x-axis values per facet (1 & 3, in this case)
x.var <- sapply(ggplot_build(p)$layout$panel_scales_x,
function(l) length(l$range$range))
# change the relative widths of the facet columns based on
# how many unique x-axis values are in each facet
gp$widths[facet.columns] <- gp$widths[facet.columns] * x.var
# plot result
grid::grid.draw(gp)
In general, you can determine the width of a box plot in ggplot like so:
ggplot(data = df, aes(x = `some x`, y = `some y`)) + geom_boxplot(width = `some width`)
In your case, you might consider setting the width of all the box plots to the range of x divided by the maximum number of elements (in the leftmost figure).
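As a rough sketch of that idea, the per-row widths can be computed from the data instead of being typed in by hand. This assumes facet_wrap with free x scales and equally wide panels, and it ignores the axis expansion, so the result is only approximately uniform. The column names n_traits and box_width, the value of base_width and the reduced data frame are made up for illustration.

```r
library(dplyr)

# Hypothetical, reduced stand-in for Data: one row per trait in each facet
Data <- data.frame(
  SP    = c("Horse", "Maize", "Maize", "Maize"),
  trait = c("coat colour", "gdd", "ssk", "ywk")
)

base_width <- 0.8  # assumed width of a box in the facet with the most traits

Data <- Data %>%
  group_by(SP) %>%
  mutate(n_traits = n_distinct(trait)) %>%  # x values in this facet
  ungroup() %>%
  # a facet with fewer x values stretches each unit, so shrink the box to match
  mutate(box_width = base_width * n_traits / max(n_traits))

Data
```

This reproduces the values hard-coded in the question (0.8/3 for the single-trait facet and 0.8 for the three-trait facet). The column could then be mapped with width = box_width inside aes() of geom_boxplot, as the question does with its manual vector.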

GGplot scale_fill_discrete not creating a legend

I used the code from a prior Stack Overflow post, which previously gave me the graph I wanted, with a legend. However, I am now using the exact same code and I am not getting a legend on my barplot.
dput(year.dat2)
structure(list(year = structure(c(1136044800, 1167577200, 1199113200,
1230735600, 1262275200, 1136044800, 1167577200, 1199113200, 1230735600,
1262275200, 1136044800, 1167577200, 1199113200, 1230735600, 1262275200
), class = c("POSIXct", "POSIXt"), tzone = ""), variable = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("SM1",
"SM2", "SM3"), class = "factor"), value = c(1.24863586758821,
1.23185757914453, 1.10997352162401, 1.13683917747257, 0.987520605867152,
2.21498726809749, 1.6378992693761, 1.25635623380691, 1.13585705516765,
1.10169569342842, 7.40955802109858, 5.7940698875978, 6.03438772314438,
6.82271157830123, 7.24402375195127)), row.names = c(NA, -15L), .Names =
c("year",
"variable", "value"), class = "data.frame")
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge")+
scale_fill_discrete(name = "variable",
breaks = c(1, 2, 3),
labels = c("SM1", "SM2", "SM3")) +
xlab("year") +
ylab("yearly Sub Mean")
Resulting plot:
Remove breaks from scale_fill_discrete. They do not correspond to the values of the factored variable column that is used as the fill aesthetic.
This is the code you want:
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "variable",
labels = c("SM1", "SM2", "SM3")) +
xlab("year") +
ylab("yearly Sub Mean")
Note 1: You do not even need the labels parameter, as you are not renaming the variable categories. To change only the legend title, scale_fill_discrete(name = "variable") or labs(fill = "variable") is sufficient.
Note 2: In your original post you linked to this SO question: How to get a barplot with several variables side by side grouped by a factor
Here, the breaks in the sample code scale_fill_discrete(name="Gender", breaks=c(1, 2), labels=c("Male", "Female")) actually references the values of gender in the original df. Meanwhile, labels was used to rename 1 to Male and 2 to Female in the legend.
  Group.1      tea     coke     beer    water gender
1       1 87.70171 27.24834 24.27099 37.24007      1
2       2 24.73330 25.27344 25.64657 24.34669      2
On the other hand, there are no 1,2,3 values in your variable data that would match the breaks you have set in your original code, and that is why your legend is not plotted.
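A quick way to check this before plotting is to compare the intended breaks with the levels of the fill variable. The snippet below is only an illustrative sketch; the vector names are made up, and variable stands in for year.dat2$variable.

```r
# Stand-in for year.dat2$variable
variable <- factor(rep(c("SM1", "SM2", "SM3"), each = 5))

breaks_numeric <- c(1, 2, 3)               # breaks from the original code
breaks_levels  <- c("SM1", "SM2", "SM3")   # breaks that match the data

# None of the numeric breaks is a level of the factor ...
breaks_numeric %in% levels(variable)

# ... whereas the level names all match
breaks_levels %in% levels(variable)
```

If the first check returns only FALSE, none of the breaks matches a fill value, so no legend entries are drawn.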
And for fun, here is an example of how you might use breaks and labels with your data set:
ggplot(year.dat2, aes(x = year, y = value, fill = factor(variable))) +
geom_bar(stat = "identity", position = "dodge")+
scale_fill_discrete(name = "variable",
breaks = c("SM2", "SM3", "SM1"),
labels = c("SM2 new label", "SM3 new label", "SM1 new label")) +
xlab("year") +
ylab("yearly Sub Mean")
