I'm making a plot in which I have a 3x3 grid obtained from facet_wrap. Eight out of nine plots use geom_violin while the remaining plot is made using geom_bar. After finding some helpful answers here on the site, I got this all working. The problem that I have is that when I use fill = "white", color = "black" for my bar chart, it draws these lines inside the bars.
Here is some example code and figures.
library(tidyverse)
n <- 100
tib <- tibble(value = c(rnorm(n, mean = 100, sd = 10), rbinom(n, size = 1, prob = (1:4)/4)),
variable = rep(c("IQ", "Sex"), each = n),
year = factor(rep(2012:2015, n/2)))
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_bar(data = filter(tib, variable == "Sex"), stat = "identity",
color = "black", fill = "white")
Now to my question: how do I get rid of these lines inside the bars? I just want it to be white with black borders. I've been experimenting a lot with various configurations, and I can manage to get rid of the lines but at the expense of screwing the facet up. I'm fairly certain it's got to do with the stat, but I'm at a loss trying to fix it. Any suggestions?
I would suggest summarizing the data within the barplot:
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_bar(data = tib %>%
group_by(year,variable) %>%
summarise(value=sum(value)) %>%
filter(variable == "Sex"),
stat = "identity",
color = "black",
fill = "white")
I'm not sure this is a good way to represent the data, with the y-axes of the different panels representing very different things, but I accept that your example might not match your actual use case. Making separate plots and then combining them with gridExtra::grid.arrange or cowplot::plot_grid is probably a better solution.
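A minimal sketch of the separate-plots approach, assuming gridExtra is installed. The data frames iq and sex and the counts in them are made up for illustration; they are not the asker's data.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
n <- 100
# Hypothetical stand-ins for the two panels
iq  <- data.frame(year = factor(rep(2012:2015, n / 4)),
                  value = rnorm(n, mean = 100, sd = 10))
sex <- data.frame(year = factor(2012:2015),
                  value = c(6, 12, 19, 25))  # made-up yearly counts

p_iq  <- ggplot(iq, aes(year, value)) + geom_violin() + ggtitle("IQ")
p_sex <- ggplot(sex, aes(year, value)) +
  geom_col(fill = "white", colour = "black") + ggtitle("Sex")

# arrangeGrob() builds the side-by-side layout without drawing it;
# grid.arrange(p_iq, p_sex, ncol = 2) would build and draw in one step
combined <- arrangeGrob(p_iq, p_sex, ncol = 2)
# grid::grid.draw(combined)
```

Each plot keeps its own axes and title, so the two y-axes no longer have to share a facet layout.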
But if you want to keep the single faceted plot:
ggplot(tib, aes(x = year, y = value)) +
facet_wrap(~variable, scales = "free_y") +
geom_violin(data = filter(tib, variable == "IQ")) +
geom_col(data = filter(tib, variable == "Sex") %>%
group_by(year, variable) %>%
summarise(value = sum(value)),
fill = "white", colour = "black")
I'm using geom_col rather than geom_bar so I don't need to specify stat = "identity".
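As for why the lines show up in the first place: with stat = "identity" every row of the data becomes its own rectangle, and the rectangles for one year are stacked on top of each other, each with its own black outline. Here is a small sketch that counts the rectangles before and after summarising; the toy data frame is hypothetical.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data: two 0/1 observations per year
toy <- data.frame(
  year  = factor(rep(2012:2015, each = 2)),
  value = c(1, 0, 1, 1, 0, 1, 1, 1)
)

# Raw rows: one rectangle per row, so outlines appear inside each bar
p_raw <- ggplot(toy, aes(year, value)) +
  geom_col(colour = "black", fill = "white")

# Summarised rows: exactly one rectangle per year
toy_sum <- toy %>%
  group_by(year) %>%
  summarise(value = sum(value))
p_sum <- ggplot(toy_sum, aes(year, value)) +
  geom_col(colour = "black", fill = "white")

n_raw <- nrow(layer_data(p_raw))  # rectangles drawn from the raw rows
n_sum <- nrow(layer_data(p_sum))  # rectangles drawn after summarising
```

Here n_raw equals the number of rows in toy, while n_sum equals the number of years, which is why summarising first removes the inner lines.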
I am plotting the distributions of two variables on a single histogram. I am interested in highlighting each distribution's mean value on that graph with a dotted line or something similar (but hopefully something that matches the color already set in the aes section of the code).
How would I do that?
This is my code so far.
hist_plot <- ggplot(data, aes(x= value, fill= type, color = type)) +
geom_histogram(position="identity", alpha=0.2) +
labs( x = "Value", y = "Count", fill = "Type", title = "Title") +
guides(color = FALSE)
Also, is there any way to show the count of n for each type on this graph?
I've made some reproducible code that might help you with your problem.
library(tidyverse)
# Generate some random data
df <- data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)),
type = c(rep("type1", 50), rep("type2", 50)))
# Calculate means from df
stats <- df %>% group_by(type) %>% summarise(mean = mean(value),
n = n())
# Make the ggplot
ggplot(df, aes(x= value, fill= type, color = type)) +
geom_histogram(position="identity", alpha=0.2) +
labs(x = "Value", y = "Count", fill = "Type", title = "Title") +
guides(color = FALSE) +
geom_vline(data = stats, aes(xintercept = mean, color = type), size = 2) +
geom_text(data = stats, aes(x = mean, y = max(df$value), label = n),
size = 10,
color = "black")
If things go as intended, you'll end up with something akin to the following plot.
(Figure: histogram with means)
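If you want the lines dotted, as asked, and the counts pinned to the top of the panel, a variation along these lines might work. This is only a sketch: the seed, the bins value and the "n = " label format are my own choices, not part of the code above.

```r
library(ggplot2)
library(dplyr)

set.seed(42)
df <- data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)),
                 type  = rep(c("type1", "type2"), each = 50))

# One row per type with its mean and its number of observations
stats <- df %>%
  group_by(type) %>%
  summarise(mean = mean(value), n = n())

p <- ggplot(df, aes(x = value, fill = type, color = type)) +
  geom_histogram(position = "identity", alpha = 0.2, bins = 30) +
  # dotted mean line, coloured by type like the histogram
  geom_vline(data = stats, aes(xintercept = mean, color = type),
             linetype = "dotted") +
  # y = Inf pins the label to the top of the panel; vjust pulls it back inside
  geom_text(data = stats, aes(x = mean, y = Inf, label = paste0("n = ", n)),
            vjust = 1.5, color = "black", inherit.aes = FALSE)
```

Using y = Inf avoids having to guess a sensible height on the count axis.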
I'm currently trying to plot mean values of a variable pt for each combination of species/treatments in my experiments. This is the code I'm using:
ggplot(data = data, aes(x=treat, y=pt, fill=species)) +
geom_bar(position = "dodge", stat="identity") +
labs(x = "Treatment",
y = "Proportion of Beetles on Treated Side",
colour = "Species") +
theme(legend.position = "right")
As you can see, the plot seems to assume the means of my 5N and 95E treatments are 1.00, which isn't correct. I have no idea where the problem could be here.
Took a stab at what you are asking using tidyverse and ggplot2 which is in tidyverse.
dat %>%
group_by(treat, species) %>%
summarise(mean_pt = mean(pt)) %>%
ungroup() %>%
ggplot(aes(x = treat, y = mean_pt, fill = species, group = species)) +
geom_bar(position = "dodge", stat = "identity")+
labs(x = "Treatment",
y = "Proportion of Beetles on Treated Side",
colour = "Species") +
theme(legend.position = "right") +
geom_text(aes(label = round(mean_pt, 3)), size = 3, hjust = 0.5, vjust = 3, position = position_dodge(width = 1))
Here dat is the actual dataset, and I calculated mean_pt because that is what you are trying to plot. I also added a geom_text layer so you can see the resulting values and compare them with what you expected.
From my understanding, this won't plot the means of your y variable by default. Have you calculated the means for each treatment? If not, I'd recommend adding a column to your dataframe that contains the mean. I'm sure there's an easier way to do this, but try:
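data$means <- rep(NA, nrow(data))
for (x in seq_len(nrow(data))) {
# assuming the columns are named "treat", "species" and "pt" as in your plot code
# rows that share this row's treatment and species
same_group <- data$treat == data$treat[x] & data$species == data$species[x]
# mean of pt within that treatment/species combination
data$means[x] <- mean(data$pt[same_group])
}
Then map y = means instead of y = pt in aes() and try replacing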
geom_bar(position = "dodge", stat="identity")
with
geom_col(position = "dodge")
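If your y variable already contains means, this should work. Note that geom_col() is shorthand for geom_bar(stat = "identity"), so neither computes a mean for you. When several rows fall into the same bar, their values are stacked (summed) by default, or drawn on top of one another with position = "dodge", in which case only the largest value in each group is visible. That is probably why some of your bars reach 1.00.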
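Another option, not used in the answers above, is to let ggplot2 compute the means itself with stat = "summary". Here is a minimal sketch; the beetles data frame is a hypothetical stand-in for your data.

```r
library(ggplot2)

# Hypothetical stand-in for the beetle data: two observations per treat/species
beetles <- data.frame(
  treat   = rep(c("5N", "95E"), each = 4),
  species = rep(c("A", "B"), times = 4),
  pt      = c(0.2, 0.4, 1.0, 0.6, 0.5, 0.7, 1.0, 0.9)
)

# stat = "summary" with fun = "mean" draws one bar per treat/species mean
p <- ggplot(beetles, aes(x = treat, y = pt, fill = species)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge")

# Heights of the bars that will actually be drawn
bar_heights <- layer_data(p)$y
```

This keeps the raw data frame untouched, at the cost of hiding the summary step inside the plotting call.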
I want to create a customized legend that distinguishes two plotted geoms using appropriate shape and color. I see that guide_legend() should be involved, but my legend is presented with both shapes overlayed one on the other for both components of the legend. What is the right way to build these individual legend components using distinct shapes and colors? Thank you.
library(dplyr)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")
Here it helps to transform to "long" format which is more in line with how ggplot is designed to be used when separating factor levels within a single time series.
This allows us to map shape and color directly, rather than having to manually assign different values to multiple plotted series, like you do in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
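To make the reshaping step concrete, here is a small sketch of what pivot_longer produces for a shortened, three-year version of the data (the values are copied from the question, the truncation is mine):

```r
library(tidyr)

wide <- data.frame(year = 2010:2012,
                   annualNitrogen  = c(100, 110, 120),
                   annualPotassium = c(500, 510, 520))

# Every non-year column becomes a pair of (element, value) rows
long <- pivot_longer(wide, -year, names_to = "element")
```

The result has one row per year and element, with the measurements in a single value column, which is what lets shape and color be mapped to element.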
Put your df into the long format that ggplot likes with tidyr::gather. You should only use one geom_point for this; you don't need separate geoms for separate variables. You can then specify the shape and color in one call to geom_point.
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualPotassium" = "blue",
"annualNitrogen" = "green"),
# labels are matched to the levels in alphabetical order: annualNitrogen, annualPotassium
labels = c("nitrogen", "potassium")) +
guides(shape = FALSE)
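If you would rather keep both shape and color in a single legend, one approach is to give the shape, color and fill scales the same name and labels, since ggplot2 merges legends that match on both. A sketch under that assumption, reusing the shapes 24 and 21 from the question; the helper names long, legend_title and legend_labels are mine.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(year = 2010:2020,
                 annualNitrogen  = seq(100, 200, 10),
                 annualPotassium = seq(500, 600, 10))
long <- pivot_longer(df, -year, names_to = "element")

legend_title  <- "cumulative\nmaterial"
legend_labels <- c("nitrogen", "potassium")  # alphabetical order of the levels

# Identical name and labels on all three scales give one combined legend
p <- ggplot(long, aes(year, value, shape = element,
                      colour = element, fill = element)) +
  geom_point(size = 4) +
  scale_shape_manual(name = legend_title, labels = legend_labels,
                     values = c(annualNitrogen = 24, annualPotassium = 21)) +
  scale_colour_manual(name = legend_title, labels = legend_labels,
                      values = c(annualNitrogen = "green", annualPotassium = "blue")) +
  scale_fill_manual(name = legend_title, labels = legend_labels,
                    values = c(annualNitrogen = "green", annualPotassium = "blue")) +
  theme_bw() +
  theme(legend.position = "bottom")

shapes_used <- sort(unique(layer_data(p)$shape))
```

With this, no override.aes is needed because each legend key is drawn with the shape and color of its own level.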
One of the values in my dataset is zero, and I think that is why I am not able to position the labels correctly in my pie chart.
#Providing you all a sample dataset
Averages <- data.frame(Parameters = c("Cars","Motorbike","Bicycle","Airplane","Ships"), Values = c(15.00,2.81,50.84,51.86,0.00))
mycols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF","#FF9999")
duty_cycle_pie <- Averages %>% ggplot(aes(x = "", y = Values, fill = Parameters)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0)+
geom_text(aes(y = cumsum(Values) - 0.7*Values,label = round(Values*100/sum(Values),2)), color = "white")+
scale_fill_manual(values = mycols)
The labels are not placed correctly. Please also tell me how I can get a 3D pie chart.
Welcome to stackoverflow. I am happy to help, however, I must note that piecharts are highly debatable and 3D piecharts are considered bad practice.
https://www.darkhorseanalytics.com/blog/salvaging-the-pie
https://en.wikipedia.org/wiki/Misleading_graph#3D_Pie_chart_slice_perspective
Additionally, if the names of your variables reflect your actual dataset (Averages), a piechart would not be appropriate as the pieces do not seem to be describing parts of a whole. Ex: avg value of Bicycle is 50.84 and avg value of Airplane is 51.86. Having these result in 42% and 43% is confusing; a barchart would be easier to follow.
Nonetheless, the answer to your question about placement can be solved with position_stack().
library(tidyverse)
Averages <-
data.frame(
Parameters = c("Cars","Motorbike","Bicycle","Airplane","Ships"),
Values = c(15.00,2.81,50.84,51.86,0.00)
) %>%
mutate(
# this will ensure the slices go biggest to smallest (a best practice)
Parameters = fct_reorder(Parameters, Values),
label = round(Values/sum(Values) * 100, 2)
)
mycols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF","#FF9999")
Averages %>%
ggplot(aes(x = "", y = Values, fill = Parameters)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0) +
geom_text(
aes(y = Values, label = label),
color = "black",
position = position_stack(vjust = 0.5)
) +
scale_fill_manual(values = mycols)
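To see what position_stack(vjust = 0.5) does to the labels, here is a small check on made-up values (the slices data frame is hypothetical, and coord_polar is left out because it does not change the stacked positions):

```r
library(ggplot2)

slices <- data.frame(part = c("a", "b", "c"),
                     size = c(10, 30, 60))  # hypothetical values

p <- ggplot(slices, aes(x = "", y = size, fill = part)) +
  geom_col(width = 1, colour = "white") +
  geom_text(aes(label = size), position = position_stack(vjust = 0.5))

bars   <- layer_data(p, 1)  # stacked bar segments with ymin and ymax
labels <- layer_data(p, 2)  # label positions after stacking

# Midpoint of each stacked segment
bar_midpoints <- (bars$ymin + bars$ymax) / 2
```

The label positions in labels$y coincide with bar_midpoints, so each label sits in the middle of its slice without any manual cumsum arithmetic.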
To move the labels towards the outside of the pie, you can look into ggrepel:
https://stackoverflow.com/a/44438500/4650934
For my earlier point, I might try something like this instead of a piechart:
ggplot(Averages, aes(Parameters, Values)) +
geom_col(aes(y = 100), fill = "grey70") +
geom_col(fill = "navyblue") +
coord_flip()
I am trying to change the facet labels without revising the data frame, and add a vertical line to the plot to enhance understandability.
library(ggplot2)
df <- data.frame(weeks = rep(-3:3, each = 2),
is_manual = rep(c(TRUE, FALSE), times = 7),
value = c(rnorm(7, 10), rnorm(7, 20)))
# Plotting
ggplot(df, aes(x = weeks, y = value)) + geom_line() +
facet_grid(is_manual ~ .) +
geom_vline(xintercept = 0, color = "blue", linetype = 2)
gives me this, which works fine:
Now I'd like to change the facet labels so that everyone knows what TRUE and FALSE are.
ggplot(df, aes(x = weeks, y = value)) + geom_line() +
facet_grid(ifelse(is_manual, "Manual", "Uploaded") ~ .) +
geom_vline(xintercept = 0, color = "blue", linetype = 2)
, but it returns an error:
Error in ifelse(is_manual, "Manual", "Uploaded"): object 'is_manual' not found
However, once I remove the geom_vline part, it works as normal, which means 'is_manual' should be able to be found.
ggplot(df, aes(x = weeks, y = value)) + geom_line() +
facet_grid(ifelse(is_manual, "Manual", "Uploaded") ~ .)
I can work around by doing
df$is_manual <- ifelse(df$is_manual, "Manual", "Uploaded")
ggplot(df, aes(x = weeks, y = value)) + geom_line() +
facet_grid(is_manual ~ .) +
geom_vline(xintercept = 0, color = "blue", linetype = 2)
, but it changes my underlying data.
Is there a way that I can change the facet labels and add the vertical line at the same time while not changing the data frame contents? Or is this a bug which needs reporting?
Just guessing here, but maybe geom_vline is getting the facet label names from the original data frame df passed to ggplot, rather than from the updated formula used in facet_grid, causing an error when geom_vline can't figure out where the facets are.
In any case, instead of changing your underlying data, you can update it on the fly using the dplyr pipe (%>%) and avoid the error:
library(tidyverse)
ggplot(df %>% mutate(is_manual = ifelse(is_manual, "Manual", "Uploaded")),
aes(x = weeks, y = value)) + geom_line() +
facet_grid(is_manual ~ .) +
geom_vline(xintercept = 0, color = "blue", linetype = 2)
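Another route that leaves the data alone is the labeller argument of facet_grid, which only changes how the facet values are displayed. A minimal sketch, assuming a lookup vector that I have called facet_names:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(weeks = rep(-3:3, each = 2),
                 is_manual = rep(c(TRUE, FALSE), times = 7),
                 value = c(rnorm(7, 10), rnorm(7, 20)))

# Lookup from the facet values (as text) to the labels to display
facet_names <- c("TRUE" = "Manual", "FALSE" = "Uploaded")

p <- ggplot(df, aes(x = weeks, y = value)) +
  geom_line() +
  facet_grid(is_manual ~ ., labeller = labeller(is_manual = facet_names)) +
  geom_vline(xintercept = 0, color = "blue", linetype = 2)

# Render to a grob (without drawing) so the labeller is actually applied
g <- ggplotGrob(p)
```

Because the faceting variable is still the plain is_manual column, geom_vline has no trouble finding the panels, and df itself stays logical.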