I know how to nicely split density plots by a binary variable (e.g. sex), but I want to overlay and compare the density plots of rows that have NA values in a specified column with those of rows that don't.
I have my data and then create subsets:
data_NA <- data[is.na(data$x4), ]
data_notNA <- data[!is.na(data$x4), ]
I then want to create histograms and density plots of the other variables to see how they are distributed differently in each subset.
What would I add to compare these histograms easily side-by-side for the different subsets?
sex_hist <- ggplot(data = data) + geom_histogram(mapping = aes(x=factor(sex)), stat="count") + scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) + xlab("Sex")
I could just make two and use grid.arrange(), but I was hoping there might be a neater way.
And how would I overlay age density plots for the different data subsets? For example, this is how I overlay them by sex:
density_DE_age <- ggplot(data = data, aes(x=age, fill = sex)) + geom_density(alpha = 0.5, position = 'identity')
(I want the same thing, but grouped by subset instead of by sex.)
Create a variable indicating whether x4 is missing, then facet by it.
data$x4_missing <- is.na(data$x4)
sex_hist <- ggplot(data = data) +
geom_histogram(mapping = aes(x=factor(sex)), stat="count") +
scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
xlab("Sex") +
facet_wrap(vars(x4_missing))
density_DE_age <- ggplot(data = data, aes(x=age, fill = factor(sex))) +
geom_density(alpha = 0.5, position = 'identity') +
facet_wrap(vars(x4_missing))
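Faceting puts the subsets in separate panels. To put them in the same panel instead (dodged bars for sex, overlaid densities for age), the missingness flag itself can be mapped to fill. A minimal sketch, assuming made-up stand-in data with the question's column names (sex coded 1/2, age, x4); the x4_status column and its labels are illustrative, not from the original:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: sex coded 1/2, numeric age,
# and x4 with roughly 30% of its values missing
set.seed(1)
n <- 200
data <- data.frame(
  sex = sample(1:2, n, replace = TRUE),
  age = round(rnorm(n, mean = 45, sd = 12)),
  x4  = ifelse(runif(n) < 0.3, NA, rnorm(n))
)

# Readable subset labels instead of TRUE/FALSE
data$x4_status <- factor(ifelse(is.na(data$x4), "x4 missing", "x4 present"))

# Counts of each sex, with the bars for the two subsets side by side
sex_bar <- ggplot(data, aes(x = factor(sex), fill = x4_status)) +
  geom_bar(position = "dodge") +
  scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
  labs(x = "Sex", fill = NULL)

# Age densities of the two subsets overlaid in one panel
age_density <- ggplot(data, aes(x = age, fill = x4_status)) +
  geom_density(alpha = 0.5) +
  labs(fill = NULL)
```

Because the two subsets usually differ in size, the dodged bars compare raw counts, whereas geom_density() scales each group to an area of 1, so the shapes are directly comparable.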
I am trying to plot a graph in ggplot2 where the x-axis represents month-day combinations and the dots represent y-values for two different groups.
When graphing my original data set using this code,
ggplot(graphing.df, aes(MONTHDAY, y.var, color = GROUP)) +
geom_point() +
ylab(paste0(""))+
scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 15)])+
theme(legend.text = element_blank(),
legend.title = element_blank()) +
geom_vline(xintercept = which(graphing.df$MONTHDAY == "12-27")[1], col='red', lwd=2)
I get this graph where the vertical line is not showing.
When I tried to create a reproducible example using the following code...
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
#geom_vline(xintercept = which(df$MONTHDAY == verticle_line)[1], col='red', lwd=2)+
geom_vline(xintercept = which(df$MONTHDAY == verticle_line), col='blue', lwd=2)
The vertical line is showing, but now it's showing in the wrong place.
In my original data set I have two values for each month-day combination (representing each of the two groups). The month-day combination column is a character vector, it is not a factor and does not have levels.
Here is a way. It subsets the data keeping only the rows of interest and plots the vertical line defined by MONTHDAY.
library(ggplot2)
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(data = subset(df, MONTHDAY == verticle_line),
mapping = aes(xintercept = MONTHDAY), color = 'blue', linewidth = 2) # linewidth needs ggplot2 >= 3.4.0; use size = 2 on older versions
Data
I will repost the data creation code, this time setting the RNG seed in order to make the example reproducible.
set.seed(2020)
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
Your line is not showing up where you expect because you are setting xintercept= from the output of the which() function. which() returns the indices where the condition is TRUE, not the matching values. So in the case of your reproducible example, you get the following:
> which(df$MONTHDAY == verticle_line)
[1] 3 4
It returns the positions in df$MONTHDAY (indexes 3 and 4) where the comparison is TRUE. So your code below:
geom_vline(xintercept = which(df$MONTHDAY == verticle_line)...
Reduces down to this:
geom_vline(xintercept = c(3,4)...
Your MONTHDAY axis is not formatted as a date, but treated as a discrete axis of character vectors. In this case xintercept=c(3,4) applied to a discrete axis draws two vertical lines at x intercepts equivalent to the 3rd and 4th discrete position on that axis: in other words, "01-03" and... some unknown 4th position that is not observable within the axis limits.
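To make the index/position mismatch concrete, the base-R sketch below contrasts which() (row numbers in the data) with the slot a value occupies on the discrete axis. It assumes the default ordering for a character x with no factor levels set, which is alphabetical, i.e. the same as levels(factor(x)); axis_pos is a name introduced here for illustration.

```r
# The example's MONTHDAY column (two rows per day, one per TYPE)
MONTHDAY <- c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03")
verticle_line <- "01-02"

# which() returns ROW numbers in the data, not positions on the axis
row_idx <- which(MONTHDAY == verticle_line)
print(row_idx)   # 3 4

# A character x is drawn on a discrete axis with one slot per distinct value,
# in alphabetical order, i.e. the same order as levels(factor(x))
axis_levels <- levels(factor(MONTHDAY))
axis_pos <- match(verticle_line, axis_levels)
print(axis_pos)  # 2
```

A numeric xintercept = axis_pos (here 2) would land on "01-02"; passing the value itself, as in the fix that follows, avoids this bookkeeping entirely.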
How do you fix this? Just take out which():
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = verticle_line, col='blue', lwd=2)
We can also get the matching value of 'MONTHDAY' by subsetting. Wrapping it in unique() draws the line only once, since the value appears in one row per TYPE:
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = unique(df$MONTHDAY[df$MONTHDAY == verticle_line]),
col='blue', lwd=2)
I'm attempting to add a legend to a time series chart and so far haven't been able to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get the format/overall aesthetic that I'd like. I should also add that the chart is graphing the year-over-year change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph, showing each series' color and line type.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
library(scales) # provides percent(), used in scale_y_continuous() below
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND,4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG,4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
theme_tufte() +
scale_y_continuous(labels = percent, limits = c(-0.01,.09)) +
xlim(as.Date(c('1/1/2010', '6/30/2019'), format="%d/%m/%Y")) +
labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(name = 'Legend',
guide = 'legend',
values = c('Nondurable' = '#5b9bd5',
'Durable' = '#00b050',
'Services' = '#ed7d31'),
labels = c('Nondurable',
'Durable',
'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving these warnings:
The bulk of the rows are being removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values outside these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
That smaller number is because you have NA values in the first 4 rows of column chng. This is true in all 3 datasets.
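The NA rows follow from the 4-quarter lag: the first four quarters have nothing a year earlier to compare against. A base-R sketch with made-up levels, assuming (as I understand TTR's default, type = "continuous") that ROC(x, 4) is the log difference against the value 4 periods earlier:

```r
# Hypothetical quarterly levels for one series (8 quarters)
level <- c(100, 101, 102, 103, 105, 106, 108, 109)

# Year-over-year change as a log difference against 4 quarters earlier;
# the first 4 quarters have no earlier value, so they are padded with NA
chng <- c(rep(NA, 4), diff(log(level), lag = 4))
print(sum(is.na(chng)))  # 4

# Keeping only the non-missing rows before plotting avoids the remaining warning
chng_complete <- chng[!is.na(chng)]
```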
For the legend to show, you need to map something that distinguishes the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
theme_tufte() +
labs(
y = "Percent Change",
x = "",
caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
) +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
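scale_colour_manual(
name = "Legend",
# named values tie each colour to its series; unnamed values would be
# matched to the series in alphabetical order (Durable, Nondurable, Services)
values = c("Nondurable" = "#5b9bd5", "Durable" = "#00b050", "Services" = "#ed7d31"),
breaks = c("Nondurable", "Durable", "Services")
) +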
scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
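An alternative that avoids three separate layers is to stack the series into one long data frame with a column naming each series, then map both colour and linetype to it. When both scales share the same name and values, ggplot2 merges them into a single legend showing colour and line type together. A sketch with made-up numbers standing in for the FRED data (the spending data frame and series column are hypothetical); linewidth requires ggplot2 3.4.0 or later, so use size on older versions:

```r
library(ggplot2)

# Made-up quarterly y/y changes standing in for the three FRED series
quarters <- seq(as.Date("2010-01-01"), as.Date("2019-04-01"), by = "quarter")
set.seed(42)
spending <- data.frame(
  Index  = rep(quarters, times = 3),
  series = rep(c("Nondurable", "Durable", "Services"), each = length(quarters)),
  chng   = c(rnorm(length(quarters), 0.03, 0.010),
             rnorm(length(quarters), 0.05, 0.015),
             rnorm(length(quarters), 0.04, 0.005))
)
# Fix the legend order instead of the default alphabetical order
spending$series <- factor(spending$series,
                          levels = c("Nondurable", "Durable", "Services"))

p <- ggplot(spending, aes(x = Index, y = chng, colour = series, linetype = series)) +
  geom_line(linewidth = 1) +
  # Same name and same series on both scales -> one combined legend
  scale_colour_manual(name = "Legend",
                      values = c(Nondurable = "#5b9bd5", Durable = "#00b050",
                                 Services = "#ed7d31")) +
  scale_linetype_manual(name = "Legend",
                        values = c(Nondurable = "solid", Durable = "longdash",
                                   Services = "twodash")) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent Change", x = NULL)
```

With the real xts objects, one way to build the long format is data.frame(Index = zoo::index(Nondurable), chng = as.numeric(Nondurable$chng), series = "Nondurable") for each series, combined with rbind(); I haven't run that against the FRED downloads.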
I am trying to create a boxplot using ggplot2, and need to have two axes from the same data frame representing two different scales. Essentially I am plotting surface area to volume ratios for two different species for three appendages, and one of the appendages has a very high SA:V ratio in comparison to the other two, which makes it difficult to have them all on the same graph.
I've recreated my data and code for the boxplot to demonstrate what I am talking about. If possible I would like the dorsal fins to be displayed on the same graph, but on a different y axis scale (that will also be shown on the graph) just so the boxes of the boxplot are all visible.
SAV <- c(seq(.35, .7, .01), seq(.09, .125, .001), seq(.09, .125, .001))
Type <- c(rep("Pectoral Fin", 36), rep("Dorsal fin", 36), rep("Fluke", 36))
Species <- c(rep(c(rep("Sp1", 18), rep("Sp2", 18)), 3))
appendage <- data.frame(SAV, Type, Species)
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type, levels = c("Dorsal fin", "Fluke")),
fill = appendage$Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
labs(y = expression("SA:V("*cm^-1*")"), x="") +
scale_x_discrete(labels = c("PF", "DF", "F")) +
scale_fill_manual(values = c("black", "gray"))
If any one could help me with this that would be great!
One possibility is to use facet_wrap.
library(dplyr)
library(ggplot2)
appendage %>%
mutate(
Type = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF"))) %>%
ggplot(aes(Type, SAV, fill = Species)) +
geom_boxplot(outlier.shape=NA) +
labs(y=expression("SA:V("*cm^-1*")"),x="") +
scale_fill_manual(values=c("black","gray")) +
facet_wrap(~Type, scales="free") +
theme(axis.ticks.x = element_blank(),
strip.background = element_blank(),
strip.text.x = element_blank())
First off, as others have commented, I do not recommend this type of plot. Dual axes tend to make comparisons harder, & visually confuse the audience even when they are aware of it.
That said, it is possible to achieve this using ggplot2, & I'll show one approach below, once we get past several other issues in the original code:
Issue 1: You are passing a data frame to ggplot(). The dollar sign $ has no place in aes() in such cases.
Instead of:
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type), # ignore the levels for now; see next issue
fill = appendage$Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type),
fill = Species),
data = appendage) +
...
Issue 2: Which appendage has the extraordinarily high SA:V?
From the code used to generate the sample dataset, it should be "Pectoral Fin", but the final result shows "DF". I assume the mapping between full terms & axis labels to be:
"Pectoral Fin" -> "PF"
"Dorsal fin" -> "DF"
"Fluke" -> "F"
... so this looks like a slip-up between passing Type as a factor to the x parameter in aes() and setting the axis labels in scale_x_discrete().
Since you're using factor(), it would be neater to set the labels there as well. Keeping it in the same place makes such things easier to spot.
Instead of:
ggplot(aes(y = SAV,
x = factor(Type, levels = c("Dorsal fin", "Fluke")),
fill = Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
...
I switched the order of the factor levels, as I feel it makes (marginally) more sense visually for the x-axis category that uses the secondary y-axis (typically on the right) to sit to the right of the other x-axis categories. You can change that if this isn't what you want; just make sure both levels = ... and labels = ... are changed together.
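A quick base-R check of how levels = and labels = pair up by position, and of what the question's original two-element levels do to "Pectoral Fin" (it becomes NA, because it is not among the listed levels). The three-element Type vector is just a sample:

```r
Type <- c("Pectoral Fin", "Dorsal fin", "Fluke")

# levels and labels are paired by position: 1st level -> 1st label, and so on
x_axis <- factor(Type,
                 levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
                 labels = c("DF", "F", "PF"))
print(data.frame(Type, x_axis))

# The question's original levels leave out "Pectoral Fin", so it becomes NA
x_orig <- factor(Type, levels = c("Dorsal fin", "Fluke"))
print(x_orig)
```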
Solution for secondary y-axis
Manually re-scale the values of the offending appendage (whichever fin that turns out to be) until its range is somewhat similar to that of other appendages. (In the example below, I used a simple division of y / 5, but more complicated functions can be used too.)
Specify the sec.axis() option for the y-axis, using the inverse of the re-scaling function as the transformation. (In this case y * 5.)
Label the original y-axis (left) and the secondary y-axis (right) accordingly to make it clear which appendage(s) each axis's scale applies to.
Final code + result:
k <- 5 # rescale factor
ggplot(aes(y = ifelse(Type == "Pectoral Fin",
SAV / k, SAV),
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
scale_y_continuous(sec.axis = sec_axis(~ . * k, # inverse of the SAV / k rescaling
name = expression("SA:V ("*cm^-1*") PF"))) +
labs(y = expression("SA:V ("*cm^-1*") DF / F"), x = "") +
scale_fill_manual(values = c("black", "gray"))
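Rather than hard-coding k = 5, one option is to derive it from the data so that the largest PF value lines up with the largest DF/F value. A sketch reusing the question's sample data; the SAV_plot column is introduced here for illustration, and the formula is passed to sec_axis() positionally because, as far as I know, its first argument was renamed from trans to transform in recent ggplot2 versions:

```r
library(ggplot2)

# The question's sample data
SAV <- c(seq(.35, .7, .01), seq(.09, .125, .001), seq(.09, .125, .001))
Type <- c(rep("Pectoral Fin", 36), rep("Dorsal fin", 36), rep("Fluke", 36))
Species <- c(rep(c(rep("Sp1", 18), rep("Sp2", 18)), 3))
appendage <- data.frame(SAV, Type, Species)

# Scale factor chosen so the largest PF value lands on the largest DF/F value
is_pf <- appendage$Type == "Pectoral Fin"
k <- max(appendage$SAV[is_pf]) / max(appendage$SAV[!is_pf])

# Values actually drawn: PF divided by k, the other appendages unchanged
appendage$SAV_plot <- ifelse(is_pf, appendage$SAV / k, appendage$SAV)

p <- ggplot(appendage,
            aes(x = factor(Type,
                           levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
                           labels = c("DF", "F", "PF")),
                y = SAV_plot, fill = Species)) +
  geom_boxplot(outlier.shape = NA) +
  # The secondary axis labels apply the inverse transformation (multiply by k)
  scale_y_continuous(sec.axis = sec_axis(~ . * k,
                                         name = expression("SA:V ("*cm^-1*") PF"))) +
  labs(y = expression("SA:V ("*cm^-1*") DF / F"), x = "") +
  scale_fill_manual(values = c("black", "gray"))
```

With this sample data k comes out at 0.7 / 0.125 = 5.6, close to the hand-picked 5.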
I have temporal data of gas emissions from two species of plant, both of which have been subjected to the same treatments. With some previous help, I put this code together:
soilflux = read.csv("soil_fluxes.csv")
library(ggplot2)
soilflux$Treatment <- factor(soilflux$Treatment,levels=c("L-","C","L+"))
soilplot = ggplot(soilflux, aes(factor(Week), Flux, fill=Species, alpha=Treatment)) + stat_boxplot(geom ='errorbar') + geom_boxplot()
soilplot = soilplot + labs(x = "Week", y = "Flux (mg m-2 d-1)") + theme_bw(base_size = 12, base_family = "Helvetica")
soilplot
This produces a plot which works well but has its flaws.
Whilst it conveys all the information I need it to, despite trawling Google and searching here, I couldn't get the 'Treatment' part of the legend to show that L- is lightest and L+ darkest. I've also been told that a monochrome colour scheme is easier to differentiate, hence I'm trying to get something like this, where the legend is clear:
(source: biomedcentral.com)
As a workaround you could create a combined factor from species and treatment and assign the fill colors manually:
library(ggplot2)
library(RColorBrewer)
d <- expand.grid(week = factor(1:4), species = factor(c("Heisteria", "Simarouba")),
trt = factor(c("C", "L-", "L+"), levels = c("L-", "C", "L+")))
d <- d[rep(1:24, each = 30), ]
d$flux <- runif(NROW(d))
# Create a combined factor for coding the color
d$spec.trt <- interaction(d$species, d$trt, lex.order = TRUE, sep = " - ")
ggplot(d, aes(x = week, y = flux, fill = spec.trt)) +
stat_boxplot(geom ='errorbar') + geom_boxplot() +
scale_fill_manual(values = c(brewer.pal(3, "Greens"), brewer.pal(3, "Reds")))
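If a single monochrome scheme is preferred, another option is to map only Treatment to fill with a grey scale running from light (L-) to dark (L+), and to separate the species into facets. A sketch with made-up data in the question's shape; faceting by species is my substitution for the combined species/treatment colour code above, not part of the original answer:

```r
library(ggplot2)

# Made-up data shaped like the question's (Week, Species, Treatment, Flux)
set.seed(1)
soil <- expand.grid(Week = factor(1:4),
                    Species = c("Heisteria", "Simarouba"),
                    Treatment = factor(c("L-", "C", "L+"), levels = c("L-", "C", "L+")),
                    rep = 1:30)
soil$Flux <- runif(nrow(soil))

p <- ggplot(soil, aes(Week, Flux, fill = Treatment)) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot() +
  # start = shade for the first level (L-), end = shade for the last (L+);
  # 0 is black and 1 is white
  scale_fill_grey(start = 0.9, end = 0.3) +
  facet_wrap(~ Species) +
  labs(x = "Week", y = "Flux (mg m-2 d-1)") +
  theme_bw()
```

The legend then reads from lightest (L-) to darkest (L+), because the grey shades follow the factor level order set in levels =.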