I know how to nicely split density plots by a binary variable (e.g. sex), but I want to overlay and compare density plots for the rows that contain NA values (in a specified column) and the rows that don't.
I have my data and then create subsets:
data_NA <- data[is.na(data$x4), ]
data_notNA <- data[!is.na(data$x4), ]
I then want to create histograms and density plots of the other variables to see how they are distributed differently in each subset.
What would I add to compare these histograms easily side-by-side for the different subsets?
sex_hist <- ggplot(data = data) + geom_histogram(mapping = aes(x=factor(sex)), stat="count") + scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) + xlab("Sex")
I could just make two and use grid.arrange(), but I was hoping there might be a neater way.
And how would I overlay age density plots for the different data subsets for example:
density_DE_age <- ggplot(data = data, aes(x=age, fill = sex)) + geom_density(alpha = 0.5, position = 'identity')
(Instead of based on sex)
Create a variable indicating whether x4 is missing, then facet by it.
data$x4_missing <- is.na(data$x4)
sex_hist <- ggplot(data = data) +
  geom_histogram(mapping = aes(x=factor(sex)), stat="count") +
  scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
  xlab("Sex") +
  facet_wrap(vars(x4_missing))
density_DE_age <- ggplot(data = data, aes(x=age, fill = sex)) +
  geom_density(alpha = 0.5, position = 'identity') +
  facet_wrap(vars(x4_missing))
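If the goal is to overlay the two subsets rather than facet them, the same indicator can be mapped to fill instead of sex. The following is only a sketch: the data frame is simulated for illustration (only the column names age, sex and x4 come from the question), and the labels "x4 observed" / "x4 missing" are my own choice.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (all values are made up)
set.seed(1)
data <- data.frame(
  age = round(runif(200, 18, 90)),
  sex = sample(1:2, 200, replace = TRUE),
  x4  = ifelse(runif(200) < 0.3, NA, rnorm(200))
)

# Label each row by whether x4 is missing
data$x4_missing <- factor(is.na(data$x4),
                          levels = c(FALSE, TRUE),
                          labels = c("x4 observed", "x4 missing"))

# Overlaid age densities: one semi-transparent curve per subset
density_age <- ggplot(data, aes(x = age, fill = x4_missing)) +
  geom_density(alpha = 0.5, position = "identity") +
  labs(fill = NULL)

# Side-by-side bar charts of sex: one panel per subset
sex_bars <- ggplot(data, aes(x = factor(sex))) +
  geom_bar() +
  scale_x_discrete(labels = c("1" = "Female", "2" = "Male")) +
  xlab("Sex") +
  facet_wrap(vars(x4_missing))
```

Printing density_age or sex_bars draws the plot. Because the indicator is a column of the full data set, there is no need to create data_NA and data_notNA separately.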
Using the following website (http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html), I made the graph below:
mtcars$`car name` <- rownames(mtcars) # create new column for car names
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2) # compute normalized mpg
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above") # above / below avg flag
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort
mtcars$`car name` <- factor(mtcars$`car name`, levels = mtcars$`car name`) # convert to factor to retain sorted order in plot.
library(ggplot2)
theme_set(theme_bw())
# Plot
ggplot(mtcars, aes(x=`car name`, y=mpg_z, label=mpg_z)) +
geom_point(stat='identity', aes(col=mpg_type), size=6) +
scale_color_manual(name="Mileage",
labels = c("Above Average", "Below Average"),
values = c("above"="#00ba38", "below"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Diverging Dot Plot",
subtitle="Normalized mileage from 'mtcars': Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
My Question: I want to modify the above graph so that there are "2 dots" (green and red) on each horizontal line, representing the values of two different variables.
I created a data set for this example:
my_data = data.frame(var_1_col = "red", var_2_col = "green", var_1 = rnorm(8,10,10), var_2 = rnorm(8,5,1), name = c("A", "B", "C", "D", "E", "F", "G", "H"))
var_1_col var_2_col var_1 var_2 name
1 red green 14.726642 4.676161 A
2 red green 11.011187 4.937376 B
3 red green 12.418489 5.869617 C
4 red green 21.935154 5.641106 D
5 red green 20.209498 6.193123 E
6 red green -5.339944 5.187093 F
7 red green 20.540806 3.895683 G
8 red green 21.619631 4.097438 H
Then, I tried to create the graph - but it comes out as empty:
# Plot
ggplot(my_data, aes(x=name, y=var_1, label=name)) +
geom_point(stat='identity', aes(col=var_1_col), size=6) +
scale_color_manual(name="Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("Var 1"="#00ba38", "Var 2"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Plot",
subtitle="Plot: Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
Ideally, I would like the graph to look something like this:
Can someone please show me how to do this?
Thanks!
Note: var_1 could be some variable like "average fuel price" and var_2 could be "median fuel price"
I recommend putting the data into long format, which is the preferred shape for plotting with ggplot2. I would drop the two color columns, since the colors can be set in scale_color_manual instead. Then, in aes for geom_point, we map color to the variable name so that the two variables are colored differently (i.e., each as its own group). All of the labels, names, and colors can still be set in scale_color_manual.
library(tidyverse)
my_data %>%
select(-c(var_1_col, var_2_col)) %>%
pivot_longer(-name, names_to = "variable", values_to = "value") %>%
ggplot(., aes(x = name, y = value, label = name)) +
geom_point(stat = 'identity', aes(color = variable), size = 6) +
scale_color_manual(
name = "Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("#00ba38", "#f8766d")
) +
labs(title = "Plot",
subtitle = "Plot: Dotplot") +
coord_flip() +
theme_bw()
Output
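To see what the reshaping step does, here is a small sketch with made-up numbers (the data frame wide is hypothetical and only mimics the columns of my_data). After pivot_longer, each name has one row per variable, which is what lets a single geom_point layer draw two dots on each line.

```r
library(tidyr)

# Hypothetical two-row version of my_data without the color columns
wide <- data.frame(
  name  = c("A", "B"),
  var_1 = c(14.7, 11.0),
  var_2 = c(4.7, 4.9)
)

# One row per (name, variable) pair
long <- pivot_longer(wide, -name, names_to = "variable", values_to = "value")
# long has 4 rows: (A, var_1), (A, var_2), (B, var_1), (B, var_2)
```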
I want to modify [...], representing the values of two different variables.
If you're looking to plot two different variables on the same graph (and they share a common axis, like the names in this case), you can add two separate geom_point layers.
ggplot(my_data) +
geom_point(aes(x=name, y=var_1, col=var_1_col)) +
geom_point(aes(x=name, y=var_2, col=var_2_col)) +
coord_flip()
You don't always have to define the axes/colors/labels in the initial ggplot call. If you specify only the dataset there, you stay flexible about which variables each of the following layers uses. That's how you can combine multiple layers in one plot :)
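One caveat when mapping var_1_col and var_2_col inside aes(): ggplot2 treats the strings "red" and "green" as category labels and assigns colors from its default palette, so the dots are not necessarily drawn in red and green. A sketch of one way around this, assuming you want the strings used as literal colors, is scale_color_identity(); the legend title and labels below are my own choice.

```r
library(ggplot2)

# Same structure as the example data in the question (values are random)
set.seed(42)
my_data <- data.frame(var_1_col = "red", var_2_col = "green",
                      var_1 = rnorm(8, 10, 10), var_2 = rnorm(8, 5, 1),
                      name = LETTERS[1:8])

p <- ggplot(my_data) +
  geom_point(aes(x = name, y = var_1, col = var_1_col), size = 6) +
  geom_point(aes(x = name, y = var_2, col = var_2_col), size = 6) +
  # Use the column values themselves as colors, and still show a legend
  scale_color_identity(name = "Var 1 or Var 2",
                       guide = "legend",
                       breaks = c("red", "green"),
                       labels = c("Var 1", "Var 2")) +
  labs(y = "value") +
  coord_flip()
```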
I need to make 5 plots of bacteria species. Each plot has a different number of species present in a range of 30-90. I want each bacteria to always have the same color in all plots, therefore I need to set an assigned color to each name.
I tried to use scale_colour_manual to create a color set, but the environment created has only 16 colors. How can I increase the number of colors present in the environment created?
The code I am using can be replicated as follows:
colour_genus <- stringi::stri_rand_strings(90, 5) #to be random names
nb.cols = nrow(colour_genus) #to set the length of my string
MyPalette = colorRampPalette(brewer.pal(12,"Set1"))(nb.cols) # the palette of choice
colGenus <- scale_color_manual(name = colour_genus, values = MyPalette)
The output formed contains only 16 values, so when I try to apply it to a figure with 90 factors, it complains I have only 16 values
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
p <- ggplot(my_data, aes(x = colour_genus, y= abundance)) +
geom_bar(aes(color = colour_genus, fill = colour_genus), stat = "identity", position = "stack") +
labs(x = "", y = "Relative Abundance\n") +
theme(panel.background = element_blank())
p + theme(legend.text= element_text(size=7, face="bold"), axis.text.x = element_text(angle = 90)) + guides(fill=guide_legend(ncol=2)) + scale_fill_manual(values=colGenus)
The following error shows:
Error: Insufficient values in manual scale. 90 needed but only 16 provided.
Thank you very much for your help.
If you know all 90 bacteria names before plotting, you can build one named color vector and reuse it in every plot:
set.seed(123)
colour_genus <- sort(stringi::stri_rand_strings(90, 5))#to be random names. I sorted the vector to illustrate the output better (optional).
MyPalette <- sample(colors(), length(colour_genus))
# named vector for scale_fill
names(MyPalette) <- colour_genus
# data
abundance <- runif(90, min = 10, max = 100)
my_data <- data.frame(colour_genus, abundance)
# two sets to show results
set1 <- my_data[20:30,]
set2 <- my_data[25:35,]
ggplot(set1, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
ggplot(set2, aes(x = colour_genus, y= abundance)) +
geom_col(aes(fill = colour_genus)) +
scale_fill_manual(values = MyPalette)
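If you would rather interpolate a Brewer palette, as in the question, the same named-vector idea applies. This is a sketch with two assumptions of mine: the number of colors is taken with length(), because nrow() of a plain character vector returns NULL, and "Set1" is requested with 9 colors, which is the most that palette provides. The genus names are placeholders.

```r
library(RColorBrewer)

# Placeholder genus names (stand-ins for the real 90 names)
colour_genus <- paste0("genus_", sprintf("%02d", 1:90))

# length(), not nrow(): colour_genus is a vector, not a data frame
nb.cols <- length(colour_genus)

# Stretch the 9 Set1 colors to one color per genus
MyPalette <- colorRampPalette(brewer.pal(9, "Set1"))(nb.cols)

# Naming the vector ties each color to one genus across all plots
names(MyPalette) <- colour_genus
```

Pass the result to scale_fill_manual(values = MyPalette) directly; a named values vector is matched to the data by name, so a plot that shows only some of the genera still picks the same color for each. Neighbouring interpolated colors can be hard to tell apart with 90 categories.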
I'm attempting to add a legend to a time series chart and I've so far been unable to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get in a format/overall aesthetic that I'd like. I should also add that the chart is graphing the y/y change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph with the color and line chart.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND,4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG,4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
theme_tufte() +
scale_y_continuous(labels = percent, limits = c(-0.01,.09)) +
xlim(as.Date(c('1/1/2010', '6/30/2019'), format="%d/%m/%Y")) +
labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(name = 'Legend',
guide = 'legend',
values = c('Nondurable' = '#5b9bd5',
'Durable' = '#00b050',
'Services' = '#ed7d31'),
labels = c('Nondurable',
'Durable',
'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving this warning:
1. Most of the rows are being removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values beyond these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
2. That remaining message appears because you have NA values in the first 4 rows of column chng. This is true in all 3 datasets.
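As a side note, the difference between limiting a scale and zooming the view can be sketched with toy data (the data frame toy is made up). Setting limits on the scale discards out-of-range values with a warning, whereas coord_cartesian() only changes the visible window. Which one is appropriate depends on whether you want the out-of-range points to take part in the drawing at all.

```r
library(ggplot2)

toy <- data.frame(x = 1:10, y = 1:10)

# Scale limits: values outside 3..8 are dropped (with a warning) when drawn
p_scale <- ggplot(toy, aes(x, y)) +
  geom_line() +
  scale_y_continuous(limits = c(3, 8))

# Coordinate limits: the view is zoomed to 3..8, but no data are removed
p_zoom <- ggplot(toy, aes(x, y)) +
  geom_line() +
  coord_cartesian(ylim = c(3, 8))
```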
For the legend to show, you need to put something that differentiates the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
theme_tufte() +
labs(
y = "Percent Change",
x = "",
caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
) +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
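scale_colour_manual(
name = "Legend",
# named values, so each series keeps its intended color regardless of legend order
values = c("Nondurable" = "#5b9bd5", "Durable" = "#00b050", "Services" = "#ed7d31"),
breaks = c("Nondurable", "Durable", "Services")
) +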
scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
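The pattern can be tried without downloading anything. The sketch below uses two simulated quarterly series (the data frames a and b and their values are made up). The strings placed inside aes() become the categories of the color scale, and ggplot2 orders them alphabetically by default, so I use a named values vector to tie each color to its series and breaks to set the legend order.

```r
library(ggplot2)

# Simulated stand-ins for two of the FRED series
set.seed(1)
dates <- seq(as.Date("2010-01-01"), by = "quarter", length.out = 20)
a <- data.frame(Index = dates, chng = rnorm(20, 0.03, 0.01))
b <- data.frame(Index = dates, chng = rnorm(20, 0.05, 0.02))

# Colors keyed by the same strings that are used inside aes()
series_cols <- c("Nondurable" = "#5b9bd5", "Durable" = "#00b050")

p <- ggplot() +
  geom_line(data = a, aes(x = Index, y = chng, color = "Nondurable"), linewidth = 1) +
  geom_line(data = b, aes(x = Index, y = chng, color = "Durable"), linewidth = 1) +
  scale_colour_manual(name = "Legend",
                      values = series_cols,
                      breaks = names(series_cols))
```

The linewidth argument assumes ggplot2 3.4.0 or later; on older versions, size = 1 does the same job for lines.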
I'm struggling with ggplot (I always do). There are a number of very similar questions about forcing ggplot to include zero value categories in legends - here and here (for example). BUT I (think I) have a slightly different requirement to which all my mucking about with scale_x_discrete and scale_fill_manual has not helped.
Requirement: As you can see; the right-hand plot has no data in the TM=5 category - so is missing. What I need is for that right plot to have category 5 shown on the axis but obviously with no points or box.
Current Plot Script:
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
#plot
p <- ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
geom_point() +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
Attempted Solutions:
Adding drop=FALSE to the scales (suggested by @Jarretinha here) totally borks the margins and x-axis labels
> plot + scale_x_discrete(drop=FALSE) + scale_fill_manual(drop=FALSE)
Following logic from here and manually setting the labels in scale_fill_manual does nothing and results in the same right-hand plot from example above.
> p + scale_fill_manual(values = c("red", "blue", "green", "purple", "pink"),
labels = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
drop=FALSE)
Playing with this logic and trying something with scale_x_discrete results in a change to category names on x-axis but the fifth is still missing AND the margins (as attempt 1) are borked again. BUT apparent that scale_x_discrete is important and NOT the whole answer
> p + scale_x_discrete(limits = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"), drop=FALSE)
ANSWER for the above example, courtesy of input from @Bouncyball & @aosmith
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
plotData$TM <- factor(plotData$TM, levels=1:5) # declare all desired levels, including the empty category 5
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
myPalette <- c('#5c9bd4','#a5a5a4','#4770b6','#275f92','#646464','#002060')
#plot
ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
scale_colour_manual(values = myPalette, drop=F) + # new line added here
scale_x_discrete(drop=F) + # new line added here
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
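As a side note, aes_string() is soft-deprecated as of ggplot2 3.0.0. A sketch of the same idea written with aes() and column names follows; the shortened data set is made up for brevity, and the palette is copied from the answer above.

```r
library(ggplot2)

# Shortened, made-up data: no observations in category 5
plotData <- data.frame(
  TM    = c(1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  Score = c(4, 4, 5, 3, 5, 4, 5, 3, 5, 4, 5)
)

# Declare all five levels so the empty one is known to ggplot
plotData$TM <- factor(plotData$TM, levels = 1:5)

myPalette <- c('#5c9bd4', '#a5a5a4', '#4770b6', '#275f92', '#646464', '#002060')

p <- ggplot(plotData, aes(x = TM, y = Score, color = TM)) +
  geom_jitter(alpha = 0.8, width = 0.2, height = 0.2) +
  geom_boxplot(width = 0.75, alpha = 0.5) +
  scale_colour_manual(values = myPalette, drop = FALSE) +
  scale_x_discrete(drop = FALSE) +   # keep category 5 on the axis
  theme_bw() +
  labs(x = bquote("T"["M"]), y = NULL) +
  theme(legend.position = "none")
```

The key steps are unchanged: the factor carries the unused level, and drop = FALSE on the discrete scales keeps it on the axis.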
Here's a workaround you could use:
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y))+
geom_boxplot(aes(fill = lets))+
geom_point(data = NULL, aes(x = 'e', y = 0), pch = NA)+
scale_fill_brewer(drop = F, palette = 'Set1')+
theme_bw()
Basically, we plot an "empty" point (i.e. pch = NA) so that the category shows up on the x-axis, but has no visible geom associated with it. We also define our discrete variable, lets as a factor with five levels when only four are present in the data.frame. The missing category is the letter e.
NB: You'll have to adjust the positioning of this "empty" point so that it doesn't skew your y axis.
Otherwise, you could use the result from this answer to avoid having to plot an "empty" point.
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y)) +
geom_boxplot(aes(fill = lets)) +
scale_x_discrete(drop = F) +
scale_fill_brewer(drop = F, palette = 'Set1') +
theme_bw()