I am having difficulty understanding why this code works in some cases and not in others. I want a plot by group and I want to specify the number of columns in my legend. Basically, the only way I can get this to work is to specify both a fill and a colour variable in the aesthetic. It seems like fill allows me to change the columns and colour changes the colors of the lines, but this feels a bit kludgy. Does anyone have a good understanding of the logic here, or a better way to accomplish this goal? I'll accept a base plot answer!
My example code is below:
# ~ Library ~ #
require(ggplot2)
# Generate example data
x=1:10
y=10:1
data = data.frame(x_data=rep(1:10,2),
y_data=c(x,y),
group=c(rep('A',length(x)),rep('B',length(y))))
# ~ Plot data ~ #
# Only this works
plot = ggplot(data,aes(x=x_data,y=y_data,fill=group,colour=group)) + geom_line()
plot = plot + guides(fill = guide_legend(ncol = 2))
# This doesn't work
plot = ggplot(data,aes(x=x_data,y=y_data,fill=group)) + geom_line()
plot = plot + guides(fill = guide_legend(ncol = 2))
plot
# Neither does this
plot = ggplot(data,aes(x=x_data,y=y_data,colour=group)) + geom_line()
plot = plot + guides(fill = guide_legend(ncol = 2))
plot
As suggested by h-1, this works
# This works: geom_line() draws with colour, so map colour and style the colour guide
plot = ggplot(data,aes(x=x_data,y=y_data,colour=group)) + geom_line()
plot = plot + guides(colour = guide_legend(ncol = 2))
plot
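A possible explanation, sketched under the assumption of a reasonably recent ggplot2: geom_line() draws with the colour aesthetic and has no fill aesthetic, so the legend to restyle is the colour guide. The snippet below checks this with GeomLine$aesthetics() and builds the plot with a two-column colour legend. The names line_aes, p and line_colours are only illustrative, and layer_data() is used purely to inspect the result.

```r
library(ggplot2)

# Same example data as in the question
x <- 1:10
y <- 10:1
data <- data.frame(x_data = rep(1:10, 2),
                   y_data = c(x, y),
                   group = rep(c("A", "B"), each = 10))

# Aesthetics that geom_line() understands: colour is listed, fill is not
line_aes <- GeomLine$aesthetics()
"colour" %in% line_aes
"fill" %in% line_aes

# Map colour and set the number of legend columns on the colour guide
p <- ggplot(data, aes(x = x_data, y = y_data, colour = group)) +
  geom_line() +
  guides(colour = guide_legend(ncol = 2))

# One distinct line colour per group
line_colours <- unique(layer_data(p, 1)$colour)
```

Printing p should show two coloured lines with the legend keys laid out side by side.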
I have six plots obtained with ggplot2 for normality analysis: 2 histograms, 2 qqplots and 2 boxplots.
I want to display them together ordered by type of plot: so the histograms in the first row, the qqplots in the second row and the boxplots in the third row. For this I use the grid.arrange function from gridExtra package as follows:
grid.arrange(grobs= list(plot1, plot2, qqplot1, qqplot2, boxplot1, boxplot2),
ncol=2, nrow=3,
top = ("Histograms + Quantile Graphics + Boxplots"))
But this error message pops up:
Error: stat_bin() requires an x or y aesthetic.
Any idea how to solve this?
As people said in the comments, the error was in the aes() of one of the plots. The confusion arose because R lets you create a plot object even when it is not operational, I guess because it can be modified later. This is the code for the plot:
ggplot(data = mtcars, aes(sample=mtcars$mpg)) +
geom_histogram(aes(y = ..density.., fill = ..count..), binwidth = 1) +
geom_density(alpha=.2) +
scale_fill_gradient(low = "#6ACE78", high = "#0D851D") +
stat_function(fun = dnorm, colour = "firebrick",
args = list(mean = mean(mtcars$mpg),
sd = sd(mtcars$mpg))) +
labs(x = "Tiempo de seguimiento", y = "")+
theme_bw()
As you can see, the mistake is the first aes() argument, as I wrote sample= instead of x=. Already solved.
Thanks
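To illustrate the point about R happily creating a plot object that cannot be drawn, here is a small sketch. It assumes that the error only surfaces when the plot is built (which is what printing or grid.arrange triggers) and uses ggplot_build() inside tryCatch() to capture it. The object names broken, fixed, build_result and fixed_counts are made up for the example.

```r
library(ggplot2)

# Creating the object succeeds even though the histogram has no x aesthetic
broken <- ggplot(mtcars, aes(sample = mpg)) +
  geom_histogram(binwidth = 1)

# The problem only appears when the plot is built
build_result <- tryCatch(ggplot_build(broken), error = function(e) e)
inherits(build_result, "error")

# Mapping x instead of sample lets the same layer build without error
fixed <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 1)
fixed_counts <- layer_data(fixed, 1)$count
```

Building each plot on its own like this, before passing the list to grid.arrange, is one way to find out which of several plots is the faulty one.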
I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get each color in the right position, you need to specify fill = red_color or fill = green_color (as well as alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), such as:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your data frames together, reshape them into a longer format (much more appropriate for ggplot) and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does this answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), ggplot essentially creates a new column of data filled with the green_color value, i.e., "#008000", "#008000", "#008000", ..., for that layer's data frame. Ditto for the red color value in the other layer. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by adding the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data. Note that with the identity scale the colors are taken literally, so each variable is given its own color again rather than the swapped one:
ggplot() +
geom_density(aes(x = red_variable, fill = red_color, alpha = 0.5), data = red_dataframe) +
geom_density(aes(x = green_variable, fill = green_color, alpha = 0.5), data = green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) keep constants (constant color, constant alpha, etc.) outside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
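As a rough sketch of points (b) and (c) that still keeps the two separate data frames: map fill to a descriptive key instead of a color value, keep alpha outside aes(), and let scale_fill_manual() translate keys to colors by name so that layer order and alphabetical order no longer matter. The keys "red_variable" and "green_variable" and the helper objects sorted_keys, red_layer_fill and green_layer_fill are choices made for this example only.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))
red_color <- "#FF0000"
green_color <- "#008000"

# A discrete scale orders its values by sorting them, so the green hex
# code comes first when the hex codes themselves are used as data
sorted_keys <- sort(c(red_color, green_color))

p <- ggplot() +
  geom_density(aes(x = red_variable, fill = "red_variable"),
               alpha = 0.5, data = red_dataframe) +
  geom_density(aes(x = green_variable, fill = "green_variable"),
               alpha = 0.5, data = green_dataframe) +
  # Named values tie each key to its color regardless of ordering
  scale_fill_manual(name = "Legend",
                    values = c(red_variable = red_color,
                               green_variable = green_color),
                    breaks = c("red_variable", "green_variable"),
                    labels = c("Density of red_variable",
                               "Density of green_variable")) +
  labs(x = "X value", y = "Density")

# Fill colors actually assigned to each layer
red_layer_fill <- unique(layer_data(p, 1)$fill)
green_layer_fill <- unique(layer_data(p, 2)$fill)
```

With the named values vector, swapping the order of the two geom_density() calls should leave the colors unchanged.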
I want to create a graph that looks something like this:
However, I would like to incorporate density based on the connected lines (and not individual plot points, as the graph above using geom_density_2d does). The data, in reality, looks something like this:
Where I am showing gene expression over a 4-point time series (y = gene expression value, x = time) In both examples, the centre line was created using LOESS curve fitting.
How can I create a density or contour plot based on the actual individual connecting lines that span from time=1 to time=4?
This is what I have done so far:
# make a dataset
test <- data.frame(gene=rep(c((1:500)), each=4),
time=rep(c(1:4), 125),
value=rep(c(1,2,3,1), 125))
# add random noise to dataset
test$value <- jitter(test$value, factor=1,amount=2)
# first graph created as follows:
ggplot(data=test, aes(x=time, y=value)) +
geom_density_2d(colour="grey") +
scale_x_continuous(limits = c(0,5),
breaks = seq(1,4),
minor_breaks = seq(1)) +
scale_y_continuous(limits = c(-3,8)) +
guides(fill=FALSE) +
theme_classic()
# second plot created as follows
ggplot(test, aes(time, value)) +
geom_line(aes(group = gene),
size = 0.5,
alpha = 0.3,
color = "snow3") +
geom_point() +
scale_y_continuous(limits = c(-3, 8)) +
scale_x_continuous(breaks = seq(1,4), minor_breaks = seq(1)) +
theme_classic()
Thanks in advance for your help!
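One possible approach, offered only as a sketch and not as an established method: interpolate every gene's line at many intermediate time points with approx(), then feed the densified points to geom_density_2d(), so that the density reflects where the lines pass and not only the four measured time points. The grid of 25 time values and the names grid_time and dense are arbitrary choices for this illustration.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(gene = rep(1:500, each = 4),
                   time = rep(1:4, times = 500),
                   value = rep(c(1, 2, 3, 1), times = 500))
test$value <- jitter(test$value, factor = 1, amount = 2)

# Time points at which every line is sampled (step of 0.125)
grid_time <- seq(1, 4, length.out = 25)

# Linearly interpolate each gene's line on the common grid
dense <- do.call(rbind, lapply(split(test, test$gene), function(d) {
  data.frame(gene = d$gene[1],
             time = grid_time,
             value = approx(d$time, d$value, xout = grid_time, rule = 2)$y)
}))

# Contours computed from the points along the lines
p <- ggplot(dense, aes(x = time, y = value)) +
  geom_density_2d(colour = "grey") +
  scale_x_continuous(breaks = 1:4) +
  theme_classic()
```

A finer grid gives smoother contours at the cost of more points; whether linear interpolation between time points is appropriate for the expression data is a judgment call.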
I have a dataset with binary variables like the one below.
M4 = matrix(sample(1:2,20*5, replace=TRUE),20,5)
M4 <- as.data.frame(M4)
M4$id <- 1:20
I have produced a stacked bar plot using the code below
library(reshape)
library(ggplot2)
library(scales)
M5 <- melt(M4, id="id")
M5$value <- as.factor(M5$value)
ggplot(M5, aes(x = variable)) + geom_bar(aes(fill = value), position = 'fill') +
scale_y_continuous(labels = percent_format())
Now I want the percentage for each field in each bar to be displayed in the graph, so that each bar reaches 100%. I have tried 1, 2, 3 and several similar questions, but I can't find any example that fits my situation. How can I manage this task?
Try this method:
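test <- ggplot(M5, aes(x = variable, fill = value)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent_format()) +
# position belongs in the geom, not in aes(); stat = "count" counts rows for a discrete x
geom_text(aes(label = paste("n =", after_stat(count))), stat = "count", position = position_fill(vjust = 0.5))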
test
EDITED: to give percentages and using the scales package:
require(scales)
test <- ggplot(M5, aes(x = variable, fill = value)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = percent_format()) +
# divide each count by the total of its own bar so that every bar adds up to 100%
geom_text(aes(label = after_stat(scales::percent(count / ave(count, x, FUN = sum)))), stat = "count", position = position_fill(vjust = 0.5))
test
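An alternative sketch that avoids computed statistics altogether is to tabulate the shares first and plot the summary. It uses only base R for the reshaping; the names counts, share and label are assumptions of this example and not part of the original code.

```r
library(ggplot2)

set.seed(42)
M4 <- as.data.frame(matrix(sample(1:2, 20 * 5, replace = TRUE), 20, 5))

# Count each value within each column (variable)
counts <- as.data.frame(table(
  variable = rep(names(M4), each = nrow(M4)),
  value = unlist(M4, use.names = FALSE)
))

# Share of each value within its own bar, plus a text label
counts$share <- counts$Freq / ave(counts$Freq, counts$variable, FUN = sum)
counts$label <- sprintf("%.0f%%", 100 * counts$share)

# Stacked bars that reach 100%, labelled in the middle of each segment
p <- ggplot(counts, aes(x = variable, y = share, fill = value)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::label_percent())
```

Because the shares are ordinary columns, they can be checked or reformatted before plotting.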
You could use the sjp.stackfrq function from the sjPlot-package (see examples here).
M4 = matrix(sample(1:2,20*5, replace=TRUE),20,5)
M4 <- as.data.frame(M4)
sjp.stackfrq(M4)
# alternative colors: sjp.stackfrq(M4, barColor = c("aquamarine4", "brown3"))
Plot appearance can be customized with various parameters...
I really like the usage of the implicit information that is created by ggplot itself, as described in this post:
using the ggplot_build() function
From my point of view this provides a lot of opportunities to finally control the appearance of a ggplot chart.
Hope this helps somehow
Tom
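For readers who have not used it, a minimal sketch of what ggplot_build() exposes: the data frame that ggplot computes for each layer, including the counts behind a bar chart. The example uses mtcars and the convenience function layer_data(), which returns the built data for one layer; the names p and computed are illustrative only.

```r
library(ggplot2)

# A simple bar chart of the number of cars per cylinder count
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Data computed by ggplot for the first layer
computed <- layer_data(p, 1)
computed[, c("x", "count")]
```

The computed columns (here count) are the same values that can be referred to inside aes() when adding labels to the plot.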
I'm trying to achieve an output where the fill gradient is independent on each histogram. I know I could make individual plots and then combine them using grid.arrange, but I want this to work on a data set with any number of columns.
Any help is appreciated.
P.S. I would include an image but I don't have the reputation points.
library(ggplot2)
library(reshape2) # provides melt()
var_his <- function(this_data){
this_data <- melt(this_data)
ggplot(this_data, aes(x = value)) +
geom_histogram(aes(x = value, y = ..density.., fill = ..count..), position="identity") +
facet_wrap(~variable, scales = "free") +
scale_fill_gradient('count', low='lightblue', high='steelblue')
}
data(Seatbelts)
data <- data.frame(Seatbelts)
var_his(data)
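One way this might be approached, as a sketch and assuming ggplot2 3.3.0 or later for after_stat(): map fill to the computed variable ncount (the count scaled to a maximum of 1) in place of count. Bins are computed separately for every facet, so each panel then runs through the full gradient on its own. The reshaping below uses base R, and the names wide, long and built are made up for the example.

```r
library(ggplot2)

# Long format: one row per observation and variable
wide <- data.frame(Seatbelts)
long <- data.frame(
  variable = rep(names(wide), each = nrow(wide)),
  value = unlist(lapply(wide, as.numeric), use.names = FALSE)
)

p <- ggplot(long, aes(x = value)) +
  # ncount is rescaled within each panel, unlike the raw count
  geom_histogram(aes(y = after_stat(density), fill = after_stat(ncount)),
                 bins = 30) +
  facet_wrap(~variable, scales = "free") +
  scale_fill_gradient("relative count", low = "lightblue", high = "steelblue")

# Inspect the computed bins: the tallest bin in every panel has ncount 1
built <- layer_data(p, 1)
```

The trade-off is that the legend then shows relative rather than absolute counts, since a single legend cannot describe a different count range for every panel.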