Handling NA's and Legends in ggplot2 - r

I've got a dataframe that looks like:
df<-data.frame(Date=as.Date(c("06-08-10","06-09-10","06-10-10","06-11-10","06-13-10")),closed_this_year_cum_gv=c(3,5,6,7,NA),opened_this_year_cum_gv=c(2,5,6,8,10),closed_last_year_cum_gv=c(5,6,7,8,10),opened_last_year_cum_gv=c(5,6,8,10,NA))
and have this framework for a plot using ggplot2:
ggplot(df, aes(x=Date))+
geom_line(aes(y=closed_this_year_cum_gv, color="blue"),linetype="dashed")+
geom_line(aes(y=opened_this_year_cum_gv, color="blue"))+
geom_line(aes(y=closed_last_year_cum_gv, color="red"),linetype="dashed")+
geom_line(aes(y=opened_last_year_cum_gv, color="red"))+
xlab("Date")+
ylab("Millions of Dollars")+
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012")
I tried this with the sample data but for some reason the lines aren't showing up (they're showing up with my real data). I want the NAs to not be graphed, which is why they aren't 0.
In my real data, it graphs, but the legend title is "blue" and its contents are "blue" and "red" as labels. I want them to be labeled by year and opened/closed. I've tried various methods but nothing seems to override the legend.
How do I control the legend title and labels?
Edit: changed to class "Date"

ggplot is generally happier to be fed data in 'long' format, as opposed to wide. Among other things, it is then easier to map different aesthetics to variables in the data set.
# some data massage before the plot
# reshape data from wide to long format
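library(reshape2)
library(ggplot2)
# keep 'Date' as the id column; the other columns are stacked into 'variable' and 'value'
df2 <- melt(df, id.vars = "Date")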
# convert variable 'Date' to class 'Date'
df2$Date <- as.Date(df2$Date, format = "%m-%d-%y")
# create two variables
# var1: opened vs closed
df2$var1 <- ifelse(grepl(x = df2$variable, pattern = "opened"), "Opened", "Closed")
# set factor levels so that 'opened' comes before 'closed'
df2$var1 <- factor(df2$var1, levels = c("Opened", "Closed"))
# var2: this vs last year
df2$var2 <- ifelse(grepl(x = df2$variable, pattern = "this"), "This year", "Last year")
# plot
# use default colours, slightly pale 'red' and 'blue'
p <- ggplot(df2, aes(x = Date, y = value, linetype = var1, colour = var2, group = interaction(var1, var2))) +
  geom_line()
p
# if you want to set colours to red and blue, add this
p + scale_colour_manual(values = c("red", "blue"))
Update following comment
If you only want one legend, one possibility is to let both linetype and colour depend on 'variable'.
# set factor levels so that 'opened' comes before 'closed', and 'last' before 'this'
df2$variable <- factor(df2$variable,
levels = c("opened_last_year_cum_gv",
"closed_last_year_cum_gv",
"opened_this_year_cum_gv",
"closed_this_year_cum_gv")
)
ggplot(df2, aes(x = Date, y = value, linetype = variable, colour = variable, group = variable)) +
geom_line() +
scale_colour_manual(values = rep(c("red", "blue"), each = 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year")) +
scale_linetype_manual(values = rep(c("solid", "dashed"), 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year"))

You need to specify appropriate mappings in aes(). Try this:
ggplot(df, aes(x=Date)) +
geom_line(aes(y=closed_this_year_cum_gv, color="this", linetype="closed")) +
geom_line(aes(y=opened_this_year_cum_gv, color="this", linetype="opened")) +
geom_line(aes(y=closed_last_year_cum_gv, color="last", linetype="closed")) +
geom_line(aes(y=opened_last_year_cum_gv, color="last", linetype="opened")) +
xlab("Date") +
ylab("Millions of Dollars") +
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012") +
scale_colour_manual(name="year", values=c("this"="blue", "last"="red")) +
scale_linetype_manual(name="type", values=c(2, 1))
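On the sample data not drawing any lines: a likely cause is the date parsing, since the strings are in month-day-year order and as.Date() is called without a format. A minimal sketch, assuming that is the issue, which parses the dates explicitly (the names dates_chr and sample_df are only illustrative):

```r
# Date strings from the question, in month-day-two-digit-year order
dates_chr <- c("06-08-10", "06-09-10", "06-10-10", "06-11-10", "06-13-10")

# Tell as.Date() the layout explicitly instead of relying on its defaults
dates <- as.Date(dates_chr, format = "%m-%d-%y")

sample_df <- data.frame(
  Date = dates,
  closed_this_year_cum_gv = c(3, 5, 6, 7, NA),
  opened_this_year_cum_gv = c(2, 5, 6, 8, 10)
)

# All five dates parse; the missing value stays NA rather than becoming 0
print(range(sample_df$Date))
print(sum(is.na(sample_df$closed_this_year_cum_gv)))
```

Keeping the missing values as NA rather than 0 is what lets geom_line() leave them out of the line; it typically warns about removed rows instead of drawing them.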

Related

Changing the variable displayed on the x-axis in ggplot

I have a dataframe which contains a variable for week-since-2017. So, it counts up from 1 to 313 in that column. I mutated another variable into the dataframe to indicate the year. So, in my scatterplot, I have each week as a point, but the x-axis is horrid, counting up from 1 to 313. Is there a way I can change the scale at the bottom to instead display the variable year, possibly even adding vertical lines in between to show when the year changes?
Currently, I have this:
ggplot(HS, aes(as.integer(Obs), Total)) + geom_point(aes(color=YEAR)) + geom_smooth() + labs(title="Weekly Sales since 2017",x="Week",y="Written Sales") + theme(axis.line = element_line(colour = "orange", size = 1, linetype = "solid"))
You can convert the number of weeks to a number of days using 7 * Obs and add this value on to the start date (as.Date('2017-01-01')). This gives you a date-based x axis which you can format as you please.
Here, we set the breaks at the turn of each year so the grid fits to them:
ggplot(HS, aes(as.Date('2017-01-01') + 7 * Obs, Total)) +
geom_point(aes(color = YEAR)) +
geom_smooth() +
labs(title = "Weekly Sales since 2017", x = "Week", y = "Written Sales") +
theme(axis.line = element_line(colour = "orange", size = 1)) +
scale_x_date('Year', date_breaks = 'year', date_labels = '%Y')
Data used
Obviously, we don't have your data, so I had to create a reproducible set with the same names and similar values to yours for the above example:
set.seed(1)
HS <- data.frame(Obs = 1:312,
Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
YEAR = rep(2017:2022, each = 52))
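The question also asked about vertical lines where the year changes. A minimal sketch of one way to do that, assuming the same made-up HS data as above; the helper names week_start and year_starts are hypothetical, and the line positions are mapped through aes() from a small data frame so that the date scale handles them:

```r
library(ggplot2)

# Same reproducible stand-in data as in the answer
set.seed(1)
HS <- data.frame(Obs = 1:312,
                 Total = rnorm(312, seq(1200, 1500, length.out = 312), 200)^2,
                 YEAR = rep(2017:2022, each = 52))

# Convert the week counter to a date, as in the answer
HS$week_start <- as.Date("2017-01-01") + 7 * HS$Obs

# One row per year boundary (hypothetical helper data frame)
year_starts <- data.frame(x = as.Date(paste0(2018:2022, "-01-01")))

p <- ggplot(HS, aes(week_start, Total)) +
  geom_point(aes(color = factor(YEAR))) +
  geom_vline(data = year_starts, aes(xintercept = x),
             linetype = "dashed", colour = "grey40") +
  scale_x_date("Year", date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Weekly Sales since 2017", y = "Written Sales", color = "Year")
print(p)
```

Treating YEAR as a factor here is a design choice for a discrete legend; leave it numeric if a continuous colour bar is preferred.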

R: Adding Two Series to a Graph

Using the following website (http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html), I made the graph below:
mtcars$`car name` <- rownames(mtcars) # create new column for car names
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2) # compute normalized mpg
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above") # above / below avg flag
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort
mtcars$`car name` <- factor(mtcars$`car name`, levels = mtcars$`car name`) # convert to factor to retain sorted order in plot.
library(ggplot2)
theme_set(theme_bw())
# Plot
ggplot(mtcars, aes(x=`car name`, y=mpg_z, label=mpg_z)) +
geom_point(stat='identity', aes(col=mpg_type), size=6) +
scale_color_manual(name="Mileage",
labels = c("Above Average", "Below Average"),
values = c("above"="#00ba38", "below"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Diverging Dot Plot",
subtitle="Normalized mileage from 'mtcars': Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
My Question: I want to modify the above graph so that there are "2 dots" (green and red) on each horizontal line, representing the values of two different variables.
I created a data set for this example:
my_data = data.frame(var_1_col = "red", var_2_col = "green", var_1 = rnorm(8,10,10), var_2 = rnorm(8,5,1), name = c("A", "B", "C", "D", "E", "F", "G", "H"))
  var_1_col var_2_col     var_1    var_2 name
1       red     green 14.726642 4.676161    A
2       red     green 11.011187 4.937376    B
3       red     green 12.418489 5.869617    C
4       red     green 21.935154 5.641106    D
5       red     green 20.209498 6.193123    E
6       red     green -5.339944 5.187093    F
7       red     green 20.540806 3.895683    G
8       red     green 21.619631 4.097438    H
Then, I tried to create the graph - but it comes out as empty:
# Plot
ggplot(my_data, aes(x=name, y=var_1, label=name)) +
geom_point(stat='identity', aes(col=var_1_col), size=6) +
scale_color_manual(name="Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("Var 1"="#00ba38", "Var 2"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Plot",
subtitle="Plot: Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
Ideally, I would like the graph to look something like this:
Can someone please show me how to do this?
Thanks!
Note: var_1 could be some variable like "average fuel price" and var_2 could be "median fuel price"
I recommend putting the data into long format, which is what ggplot2 prefers when plotting. I would drop the two color columns, since the colors can be set in scale_color_manual instead. Then, in aes for geom_point, we map color to the variable so that the two variables are colored differently (i.e., each as its own group). All of the labels, names, and colors can still be set in scale_color_manual.
library(tidyverse)
my_data %>%
select(-c(var_1_col, var_2_col)) %>%
pivot_longer(-name, names_to = "variable", values_to = "value") %>%
ggplot(., aes(x = name, y = value, label = name)) +
geom_point(stat = 'identity', aes(color = variable), size = 6) +
scale_color_manual(
name = "Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("#00ba38", "#f8766d")
) +
labs(title = "Plot",
subtitle = "Plot: Dotplot") +
coord_flip() +
theme_bw()
I want to modify [...], representing the values of two different variables.
If you're looking to plot two different variables on the same graph (and they share a common axis, like the names in this case), you can add two separate geom_point layers.
ggplot(my_data) +
geom_point(aes(x=name, y=var_1, col=var_1_col)) +
geom_point(aes(x=name, y=var_2, col=var_2_col)) +
# use the colour names stored in the *_col columns as the actual colours
scale_color_identity() +
coord_flip()
You don't always have to define the axes/colors/labels in the initial ggplot call. If you specify only the dataset there, you stay flexible about which variables each later layer uses. That's how you can draw several series on one plot :)

How to change ggplot2 lineplot color based on y value

I'm using ggplot2 in R to create a lineplot. The y value of the line is rates and the x value is dates. I want the color of the line to change depending on what the rates value is, so I wrote a for loop to assign the variable color based on what the rates is (i.e. >90 = Blue, <70 = Red).
The dataset looks like this:
dates    rates   color
1/1/21   91      Blue
1/2/21   42      Red
1/3/21   NA      NA
etc.
The code looks like this:
ggplot(data, aes(x=dates,y=rates)) +
geom_line(aes(color = color)) +
scale_x_date(date_labels = "%b %Y",date_breaks = "1 week") +
labs(title = "Title", x = "Date", y = "Rates (%)")
For some reason, it keeps plotting like this:
I want it to look like this, but with colors:
Does anyone have any ideas on how to fix it? Thanks.
You can try converting dates to Date class, specifying group = 1 in aes, and using scale_color_identity so that the line takes the colors named in the color column of the data frame.
library(tidyverse)
df %>%
mutate(dates = lubridate::mdy(dates)) %>%
ggplot(aes(x=dates,y=rates, color = color, group = 1)) +
geom_line() +
scale_color_identity() +
scale_x_date(date_labels = "%b %Y",date_breaks = "1 week") +
labs(title = "Title", x = "Date", y = "Rates (%)")
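The color column itself does not need a for loop. A minimal base R sketch, assuming the thresholds from the question (above 90 is blue, below 70 is red) and a hypothetical grey for anything in between, since the question does not say what that should be:

```r
# Small stand-in for the data in the question (the fourth row is made up)
df <- data.frame(
  dates = c("1/1/21", "1/2/21", "1/3/21", "1/4/21"),
  rates = c(91, 42, NA, 75)
)

# Vectorised assignment; rows with an NA rate keep an NA color
df$color <- ifelse(df$rates > 90, "blue",
                   ifelse(df$rates < 70, "red", "grey50"))

# Parse the month/day/year strings into Date class, as scale_x_date() expects
df$dates <- as.Date(df$dates, format = "%m/%d/%y")

print(df)
```

The resulting data frame can be passed to the ggplot call above without the lubridate::mdy step, because dates is already of class Date.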

Remove redundant grey vertical lines from plot

When I plot using ggplot I get grey vertical lines on my plot before data chart. Any ideas on how to remove it would be highly appreciated.
ggplot(fitbit_data, aes(x = Date, y = Steps)) +
geom_bar(stat = "identity", fill = "green") +
labs(title = "My Steps", subtitle = " June - Dec 2019",
x = " Date", y = "Steps") +
scale_x_date(
date_labels = "%b\n%Y",
date_breaks = "1 month",
limits = c(as.Date("2019-06-01"), as.Date("2019-12-31"))
)
Most likely the y variable was read in as a factor (or character), so ggplot draws a categorical y-axis, and its many overlapping labels look like those grey columns.
When reading the data make sure to use
df= read.table(...,
# assign appropriate data types by using
colClasses = c(...),
... ,
# it can also be advisable to use
stringsAsFactors = FALSE)
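As an illustration of the colClasses advice, here is a minimal sketch that reads a few made-up rows from an inline string instead of a file. The column layout and values are assumptions, not the asker's actual data:

```r
# Made-up rows standing in for the real file
raw <- "Date Steps
2019-06-01 8450
2019-06-02 10231
2019-06-03 7312"

# Declare the column types up front so nothing is silently read as factor
fitbit_data <- read.table(text = raw, header = TRUE,
                          colClasses = c("Date", "numeric"),
                          stringsAsFactors = FALSE)

# Check what was actually read before plotting
str(fitbit_data)
```

Running str() on the real data first is a quick way to confirm whether Steps or Date arrived as a factor or character.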

variable hline in ggplot with facet: discrete value supplied to continuous scale

I am trying to include a hline similar to question (Variable hline in ggplot with facet) but am having a different issue which I cannot find a working solution to.
p1 <- ggplot(excoretum, aes(x = Indmain...Dis.MutGroup, y = AgeOfOnset)) +
geom_boxplot() +
geom_point() +
geom_hline(data = excoretum, aes(yintercept = "seer_median age of diagnosis")) +
scale_y_continuous(name = "Age of onset of disease",
breaks = seq(0,100, 5),
limits = c(0, 100)) +
scale_x_discrete(name = "Mutation Type",
labels = c("Null" = "N", "Missense" = "M", "Unknown" = "U")) +
ggtitle("Age of onset of the BAP1 extended core tumour spectrum in carriers between mutation types") +
facet_grid(.~ICD_Grouping) +
theme_bw()
That is my code. I have tried using a separate data frame, and I have left_joined the data from that separate data frame onto excoretum as a column. I have also converted the data from integer to numeric, etc., but regardless of what I try, the plot still fails.
It returns:
Error: Discrete value supplied to continuous scale
I can put a singular hline at a value but it won't search out the hline for the facet.
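One likely cause of this error is the quoted name inside aes(): a quoted string is treated as a constant character value, which is discrete, rather than as a column. A minimal sketch with made-up stand-in data, assuming the median column has a space in its name (all data frame and column names here are hypothetical); backticks refer to the column, and a small data frame with one row per facet gives each panel its own line:

```r
library(ggplot2)

# Hypothetical stand-in for the real data
set.seed(42)
onset <- data.frame(
  mutation = rep(c("Null", "Missense", "Unknown"), times = 20),
  tumour   = rep(c("A", "B"), each = 30),
  age      = round(runif(60, 20, 80))
)

# One reference value per facet; the column name contains a space
medians <- data.frame(tumour = c("A", "B"),
                      `seer median` = c(55, 62),
                      check.names = FALSE)

p <- ggplot(onset, aes(x = mutation, y = age)) +
  geom_boxplot() +
  geom_point() +
  # backticks, not quotes, so the column is used instead of a string
  geom_hline(data = medians, aes(yintercept = `seer median`)) +
  facet_grid(. ~ tumour) +
  theme_bw()
print(p)
```

Because medians carries the faceting variable, each panel only draws the line from its own row.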
