Change colour in stacked plots, ggplot2 - r

I would like to use palette colours for my stacked plot:
p <- ggplot() + theme_bw() +
geom_bar(aes(fill = a, y = b, x= c), data = df, width = 0.7,
position="stack", stat="identity") + theme(legend.position="bottom")
I tried the following, but it didn't work:
p + scale_color_brewer(palette = "PuOr")
Furthermore, I would like to draw a line showing the mean over the bar plot. Does anybody have an idea how to do that?

Some thoughts:
1) It is better to use geom_col() than geom_bar() when the bar heights should represent values in the data; see the documentation.
2) I used factor(...) to make the continuous variables discrete.
3) Your code will be easier to read if you follow the order of arguments set out in the documentation, although the order does not matter when the arguments are named.
4) Updated to reflect the request for a mean for each x value.
library(ggplot2)
library(dplyr)
df <- data.frame(a = c(2001, 2001, 2001, 2002, 2002, 2003),
x = c(6, 7, 8, 6, 7, 6),
y = c(1, 258, 1, 3, 9, 11))
#data frame for means
df_y_mean <-
df %>%
group_by(x) %>%
summarise(y_mean = mean(y))
ggplot() +
geom_col(data = df, aes(x = factor(x), y = y, fill = factor(a)), width = 0.7) +
geom_line(data = df_y_mean, aes(factor(x), y_mean, colour = "red"), group = 1, size = 1) +
scale_fill_brewer(palette = "PuOr", name = "Year") +
guides(colour = guide_legend(title = "Mean", label = FALSE)) +
theme_bw() +
theme(legend.position = "bottom")
Created on 2020-05-20 by the reprex package (v0.3.0)

You are defining fill but using scale_colour_brewer(). Use scale_fill_brewer() to modify fill.
To draw a horizontal line add geom_hline() to your plot call.
p <- ggplot() + theme_bw() +
geom_bar(aes(fill = a, y = b, x= c), data = df, width = 0.7,
position="stack", stat="identity") +
theme(legend.position="bottom")
my.mean <- mean(df$b) ## can be any value, change as needed
p + scale_fill_brewer(palette = "PuOr") + geom_hline(yintercept = my.mean)
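A self-contained sketch of this answer, assuming a toy data frame (the values below are made up for illustration and are not from the question). It also assumes that `a` holds numeric years; in that case it needs wrapping in factor(), because scale_fill_brewer() is a discrete scale and would reject a continuous fill.

```r
library(ggplot2)

# Toy data, made up for illustration; column names follow the question
df <- data.frame(a = c(2001, 2001, 2002, 2002),
                 b = c(3, 5, 4, 6),
                 c = c("x", "y", "x", "y"))

my.mean <- mean(df$b)  # mean of the individual bar segments

p <- ggplot(df, aes(x = c, y = b, fill = factor(a))) +
  geom_col(width = 0.7) +                                  # stacks by default
  scale_fill_brewer(palette = "PuOr", name = "a") +        # fill scale, not colour
  geom_hline(yintercept = my.mean, linetype = "dashed") +  # horizontal mean line
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Note that the line here marks the mean of the segments, not the mean of the stacked totals; change my.mean as needed.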

Related

Two axis plot with ggplot2

I need to draw a plot with two y axes in R with ggplot2 (the example I want to replicate was attached as an image). The first y axis goes from -10 to 10, and the second from 0 to 10. Please let me know if there is a way to do this with ggplot2.
I used this code, but in the result the first axis runs from -5 to 10 and the second from 5 to 10. I want the axes to cover the ranges I described above.
df %>% filter(Country == "Chile" & year >= 1973) %>% ggplot(aes(x = year)) +
geom_line(aes(y = polity2, colour = "Polity 2")) + geom_line(aes(y = gee_totGDP,colour = "gee_totGDP")) + scale_y_continuous(sec.axis = sec_axis(~.*-1,name = "gee_totGDP")) + scale_colour_manual(values = c("blue", "red"))
I generated some fake data with four rows based on your example image.
To make the plot, I set the limits for the first axis using the limits argument of scale_y_continuous(). Then I set up the second axis using a transformation formula, as you attempted. The transformation should be axis2 = (axis1 + 10)/2.
library(tidyverse)
df <- tibble(year = seq(1985, 2000, 5),
ed = c(6, 6, 8, 5),
polity = c(-10, -10, -8, -8))
df %>%
ggplot(aes(x = year)) +
geom_line(aes(y = polity)) +
geom_line(aes(y = ed)) +
scale_y_continuous(limits = c(-10, 10),
sec.axis = sec_axis(~(. + 10)/2))
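One caveat with the plot above: sec_axis() only relabels the axis; it does not rescale the data, so `ed` is still drawn in primary-axis units. A minimal sketch of the usual workaround, assuming the same fake data: draw `ed` after applying the inverse of the axis formula (primary = 2 * secondary - 10), so each value lines up with the right-hand axis. The helper names to_primary and to_secondary are made up for illustration.

```r
library(ggplot2)

# Same fake data as above
df <- data.frame(year = seq(1985, 2000, 5),
                 ed = c(6, 6, 8, 5),
                 polity = c(-10, -10, -8, -8))

# Hypothetical helpers that map between the two axes
to_secondary <- function(y) (y + 10) / 2  # primary (-10..10) -> secondary (0..10)
to_primary   <- function(y) y * 2 - 10    # inverse, applied to the data

p <- ggplot(df, aes(x = year)) +
  geom_line(aes(y = polity, colour = "polity")) +
  geom_line(aes(y = to_primary(ed), colour = "ed")) +  # rescaled onto the primary axis
  scale_y_continuous(limits = c(-10, 10), name = "polity",
                     sec.axis = sec_axis(~ (. + 10) / 2, name = "ed"))
p
```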
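Another suggestion is to call scale_y_continuous() twice, as follows. Note that ggplot2 keeps only one scale per aesthetic, so the second call replaces the first (with a message) and only its limits and secondary axis take effect: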
ggplot(data = df, aes(x = year)) +
geom_line(aes(y = polity2, color = "Polity 2")) +
geom_line(aes(y = gee_totGDP, color = "gee_totGDP")) +
scale_y_continuous(limits = c(-10, 10), name = "Polity 2") +
scale_y_continuous(limits = c(0, 10), sec.axis = sec_axis(~., name =
"gee_totGDP")) +
scale_color_manual(values = c("blue", "red"))

create a legend with different datasets in ggplot2

I am trying to create a legend in ggplot. If I use different variables from the same file, I add colour = "xx" in aes() and it works. But what if it is the same variable from different datasets?
In the example below, I plot Value ~ Year from two different datasets. How can I create a legend that says df1 with a red line, and df2 with a blue line?
A <- c(2001, 2002, 2003, 2004, 2005)
B <- c(3, 5, 2, 7, 5)
C <- c(2, 7, 4, 3, 5)
df1 <- data.frame(A, B)
df2 <- data.frame(A, C)
colnames(df1) <- c("Year","Value")
colnames(df2) <- c("Year","Value")
(test <- ggplot(df1, aes(Value, Year)) + geom_path(size = 1, colour='red') +
geom_path(data=df2, colour='blue') + ylab("Year")+ scale_x_continuous(position = "top") + scale_y_reverse(expand = c(0, 0)))
We could create a single dataset with bind_rows and specify .id to create a grouping column, which can be passed in aes as 'colour'.
library(ggplot2)
library(dplyr)
bind_rows(lst(df1, df2), .id = 'grp') %>%
ggplot(aes(Value, Year, colour = grp)) +
geom_path(size = 1) +
ylab("Year")+
scale_x_continuous(position = "top") +
scale_y_reverse(expand = c(0, 0))
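To get the exact colours asked for in the question (red for df1, blue for df2), here is a sketch along the same lines, assuming the df1 and df2 from the question: name the list elements explicitly and add scale_colour_manual() with a named vector.

```r
library(ggplot2)
library(dplyr)

# Data from the question
df1 <- data.frame(Year = 2001:2005, Value = c(3, 5, 2, 7, 5))
df2 <- data.frame(Year = 2001:2005, Value = c(2, 7, 4, 3, 5))

# The list names become the values of the 'grp' column
combined <- bind_rows(list(df1 = df1, df2 = df2), .id = "grp")

p <- ggplot(combined, aes(Value, Year, colour = grp)) +
  geom_path() +
  scale_colour_manual(values = c(df1 = "red", df2 = "blue"), name = NULL) +
  scale_x_continuous(position = "top") +
  scale_y_reverse(expand = c(0, 0))
p
```

Because the colours are matched by name, adding a third data frame only needs one more list element and one more entry in values.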
Here is a simple solution, though not a great one if you have more data frames.
Libraries
library(tidyverse)
Code
ggplot(df1, aes(Value, Year)) +
geom_path(size = 1,aes(colour='df1')) +
geom_path(data = df2,size = 1,aes(colour='df2')) +
ylab("Year")+
scale_x_continuous(position = "top") +
scale_y_reverse(expand = c(0, 0))+
scale_colour_manual(values = c("df1" = "red", "df2" = "blue"))

Problem with multiple variable in color aesthetic in ggplot

I am trying to do a graph with 2 different set of data, like this example:
library(ggplot2)
t <- data.frame(x=c(3,5, 7), y=c(25, 25, 27), text=c("t1", "t1", "t2"))
test <- data.frame(x=c(1:10), y=c(20:29), factor=c(rep("a", 5),rep("b", 5)))
ggplot(test, aes(x, y)) +
geom_line(aes(color=factor)) +
geom_text(inherit.aes=F, data=t, aes(x, y, label=text, color=text))
In the resulting figure, the problem is that the legend mixes the data from both data frames. I can't manage to remove the geom_text values from it. Does anyone have an idea, please?
Two scales for the same aesthetic is generally something you may want to avoid. Try not to overload your graph with too many dimensions visualised through the same aesthetic. Colours in particular have only a limited range over which we can tell them apart.
Well, these were my two-cents worth. What you want has become easy with ggnewscale.
See below.
library(ggplot2)
library(ggnewscale)
textframe <- data.frame(x = c(3, 5, 7), y = c(25, 25, 27), text = c("t1", "t1", "t2"))
lineframe <- data.frame(x = c(1:10), y = c(20:29), factor = c(rep("a", 5), rep("b", 5)))
ggplot() +
geom_line(data = lineframe, aes(x, y, color = factor)) +
scale_color_brewer(palette = "Set2") +
ggnewscale::new_scale_color() +
geom_text(data = textframe, aes(x, y, label = text, color = text)) +
scale_color_brewer(palette = "Dark2")
A different aesthetic would be better. Here, for example, linetype would be very appropriate.
ggplot() +
geom_line(data = lineframe, aes(x, y, linetype = factor)) +
geom_text(data = textframe, aes(x, y, label = text, color = text)) +
scale_color_brewer(palette = 'Paired')
Created on 2020-04-03 by the reprex package (v0.3.0)
You can just remove the text part of the legend by adding show.legend = FALSE in your geom_text. You also don't necessarily need to specify inherit.aes=F.
library(ggplot2)
library(dplyr)  # provides the %>% pipe used below
t <- data.frame(x=c(3,5, 7), y=c(25, 25, 27), text=c("t1", "t1", "t2"))
test <- data.frame(x=c(1:10), y=c(20:29), factor=c(rep("a", 5),rep("b", 5)))
test %>%
ggplot(aes(x, y)) +
geom_line(aes(color=factor)) +
geom_text(inherit.aes=F, data=t, aes(x, y, label=text, color=text), show.legend = FALSE)

Overlaying histogram with different y-scales

I'm struggling with the following issue:
I want to plot two histograms, but since one of the two classes has far fewer observations than the other, I need to add a second y-axis to allow a direct comparison of the values.
I report below the code I used at the moment and the result.
Thank you in advance!
ggplot(data,aes(x= x ,group=class,fill=class)) + geom_histogram(position="identity",
alpha=0.5, bins = 20)+ theme_bw()
Consider the following situation where you have 800 versus 200 observations:
library(ggplot2)
df <- data.frame(
x = rnorm(1000, rep(c(1, 2), c(800, 200))),
class = rep(c("A", "B"), c(800, 200))
)
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
# Note that y = stat(count) is the default behaviour
mapping = aes(y = stat(count)))
You could scale the counts for each group to a maximum of 1 by using y = stat(ncount):
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(ncount)))
Alternatively, you can set y = stat(density) to have the total area integrate to 1.
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(density)))
Note that from ggplot2 3.3.0 onward, after_stat() is the replacement for stat(), for example aes(y = after_stat(density)); stat() is deprecated in later releases.
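For reference, a sketch of the density version written with after_stat(), assuming ggplot2 3.3.0 or later and the same simulated data as above (set.seed() is added here only to make the draw reproducible):

```r
library(ggplot2)

set.seed(1)  # added for reproducibility; not in the original answer
df <- data.frame(
  x = rnorm(1000, rep(c(1, 2), c(800, 200))),
  class = rep(c("A", "B"), c(800, 200))
)

p <- ggplot(df, aes(x, fill = class)) +
  geom_histogram(aes(y = after_stat(density)),  # same as stat(density) in older code
                 bins = 20, position = "identity", alpha = 0.5)
p
```

The same substitution works for the other variants: after_stat(count) and after_stat(ncount).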
How about comparing them side by side with facets?
ggplot(data,aes(x= x ,group=class,fill=class)) +
geom_histogram(position="identity",
alpha=0.5,
bins = 20) +
theme_bw() +
facet_wrap(~class, scales = "free_y")

R bubble plot using ggplot manually selecting the colour and axis names

I am using ggplot to create a bubble plot with this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It is working perfectly apart from 2 things:
1) For each name (fill) I would like to manually specify the colour used (via a data frame that maps name to colour); this is to provide consistency across multiple figures.
2) I would like to substitute the numbers on the y axis with text labels (for several reasons I cannot use the text labels from the outset due to ordering issues).
I have tried several methods using scale_color_manual() and scale_y_continuous respectively and I am getting nowhere! Any help would be very gratefully received!
Thanks
Since you have not specified an example df, I created one of my own.
To manually specify the color, you have to use scale_fill_manual with a named vector as the argument of values.
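A minimal sketch of building that named vector, assuming the name-to-colour mapping lives in a two-column data frame (called name_color here, as in the code further down). setNames() is just a compact alternative to assigning names() in a second step.

```r
# Mapping from name to colour, stored as a data frame
name_color <- data.frame(name = c("a", "b"),
                         color = c("blue", "red"),
                         stringsAsFactors = FALSE)

# Named character vector: names are the fill levels, values are the colours
gcolors <- setNames(name_color$color, name_color$name)
gcolors
```

Passing gcolors to scale_fill_manual(values = gcolors) then matches colours to levels by name rather than by position, which is what keeps them consistent across figures.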
Edit 2
This appears to do what you want. We use scale_y_continuous. The breaks argument specifies the vector of positions, while the labels argument specifies the labels which should appear at those positions. Since we already created the vectors when creating the data frame, we simply pass those vectors as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option would be to use geom_text. Code below. You may need to experiment with values of nudge_y to get the position correct.
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())
Thanks for all your help, this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)
