Remove redundant grey vertical lines from plot - r

When I plot using ggplot, I get grey vertical lines on my plot before the data. Any ideas on how to remove them would be highly appreciated.
ggplot(fitbit_data, aes(x = Date, y = Steps)) +
geom_bar(stat = "identity", fill = "green") +
labs(title = "My Steps", subtitle = " June - Dec 2019",
x = " Date", y = "Steps") +
scale_x_date(
date_labels = "%b\n%Y",
date_breaks = "1 month",
limits = c(as.Date("2019-06-01"), as.Date("2019-12-31"))
)

The data has likely been converted to a factor, so ggplot draws a categorical y-axis whose overlapping labels look like those grey columns.
When reading the data, make sure to use
df <- read.table(...,
  # assign appropriate data types by using
  colClasses = c(...),
  ...,
  # it can also be advisable to use
  stringsAsFactors = FALSE)
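A quick way to check this diagnosis is to inspect the column classes and convert any factor columns before plotting. The sketch below assumes a small made-up stand-in for fitbit_data (the values are hypothetical) in which Steps arrived as a factor:

```r
# Hypothetical stand-in for fitbit_data: Steps was read in as a factor.
fitbit_data <- data.frame(
  Date  = c("2019-06-01", "2019-06-02", "2019-06-03"),
  Steps = factor(c("8042", "10311", "7560")),
  stringsAsFactors = FALSE
)

# Inspect the column classes first.
str(fitbit_data)

# Go through character: as.numeric() on a factor returns the level codes,
# not the original numbers.
fitbit_data$Steps <- as.numeric(as.character(fitbit_data$Steps))

# scale_x_date() needs a Date column.
fitbit_data$Date <- as.Date(fitbit_data$Date)
```

After these conversions, Steps is numeric and Date is of class Date, which is what the plotting code in the question expects.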

Related

How to change ggplot2 lineplot color based on y value

I'm using ggplot2 in R to create a lineplot. The y value of the line is rates and the x value is dates. I want the color of the line to change depending on what the rates value is, so I wrote a for loop to assign the variable color based on what the rates is (i.e. >90 = Blue, <70 = Red).
The dataset looks like this:
dates    rates  color
1/1/21   91     Blue
1/2/21   42     Red
1/3/21   NA     NA
etc.
The code looks like this:
ggplot(data, aes(x=dates,y=rates)) +
geom_line(aes(color = color)) +
scale_x_date(date_labels = "%b %Y",date_breaks = "1 week") +
labs(title = "Title", x = "Date", y = "Rates (%)")
For some reason, it keeps plotting like this:
I want it to look like this, but with colors:
Does anyone have any ideas on how to fix it? Thanks.
You can try changing dates to Date class, specifying group = 1 in aes(), and using scale_color_identity() to get the same color as the name in the color column of the data frame.
library(tidyverse)
df %>%
mutate(dates = lubridate::mdy(dates)) %>%
ggplot(aes(x=dates,y=rates, color = color, group = 1)) +
geom_line() +
scale_color_identity() +
scale_x_date(date_labels = "%b %Y",date_breaks = "1 week") +
labs(title = "Title", x = "Date", y = "Rates (%)")
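The color column itself does not need a for loop. As a minimal sketch, the thresholds from the question can be applied with a vectorised ifelse(). The colour for rates between 70 and 90 is not given in the question, so "grey50" below is an assumption:

```r
# Sample rows from the question.
data <- data.frame(
  dates = c("1/1/21", "1/2/21", "1/3/21"),
  rates = c(91, 42, NA),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date class (base R).
data$dates <- as.Date(data$dates, format = "%m/%d/%y")

# Vectorised thresholds; NA rates stay NA.
# "grey50" for the 70-90 band is an assumed placeholder.
data$color <- ifelse(data$rates > 90, "blue",
              ifelse(data$rates < 70, "red", "grey50"))
```

The resulting data frame can be piped into the ggplot() call above in place of the mutate() step.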

Axis position in plotly plot created from ggplot

ggplot lets me control the position of my x-axis labels and breaks in x-axis but when I pass the ggplot object to ggplotly function, the resulting plotly object loses the formatting.
library(plotly)
df <- data.frame(
Date = seq(as.Date("2017-01-01"), as.Date("2020-01-01"), by = 30),
Value = rnorm(37)
)
p1 = ggplot(df) + geom_point(aes(x=Date, y = Value)) +
scale_x_date(position = "top", date_breaks = "1 year", date_minor_breaks =
"3 months")
ggplotly(p1)
With the code above, the x-axis values are still plotted at the bottom in the ggplotly plot, and the break lines every 3 months are not shown.
You could try this:
f <- list(
side = "top"
)
ggplotly(p1) %>% layout(xaxis = f)

How to graph two sets of data with lines and two *different* point symbols with *distinguishable* data point symbols in legend?

I have been trying to plot a graph of two sets of data with different point symbols and connecting lines with different colors using the R package ggplot2, but for the life of me, I have not been able to get the legend to correctly distinguish between the two curves by showing the associated data point symbol for each curve.
I can get the legend to show different line colors. But I have not been able to make the legend show different data point symbols for each set of data.
The following code:
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
library("ggplot2")
print(
ggplot(data = df, aes(df$thrd_cnt, y=df$runtime, color=)) +
geom_line(aes(y=df$runtime4, color = "4 cores")) +
geom_point(aes(y=df$runtime4, color = "4 cores"), fill = "white",
size = 3, shape = 21) +
geom_line(aes(y=df$runtime8, color = "8 cores")) +
geom_point(aes(y=df$runtime8, color = "8 cores"), fill = "white",
size = 3, shape = 23) +
xlab("Number of Threads") +
ylab(substitute(paste("Execution Time, ", italic(milisec)))) +
scale_x_continuous(breaks=c(1,2,4,8,16)) +
theme(legend.position = c(0.3, 0.8)) +
labs(color="# cores")
)
## save a pdf and a png
ggsave("runtime.pdf", width=5, height=3.5)
ggsave("runtime.png", width=5, height=3.5)
outputs this graph:
But the data point symbols in the legend are not distinguishable. The legend shows the same symbol for both graphs (which is formed of both data point symbols on top of each other).
One possible solution is to define the number of threads as a factor, then I might be able to get the data point symbols on the legend right, but still I don't know how to do that.
Any help would be appreciated.
As noted, you need to gather the data into a long format so you can map the cores variable to colour and shape. To keep the same choices of shape and fill as in your original plot, use scale_shape_manual to set the shape corresponding to each level of cores. Note that you need to set the name for both the colour and shape legends in labs() to ensure they coincide and don't produce two legends. I also used mutate so that the levels of cores don't confusingly include the word runtime.
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
library(tidyverse)
ggplot(
data = df %>%
gather(cores, runtime, runtime4, runtime8) %>%
mutate(cores = str_c(str_extract(cores, "\\d"), " cores")),
mapping = aes(x = thrd_cnt, y = runtime, colour = cores)
) +
geom_line() +
geom_point(aes(shape = cores), size = 3, fill = "white") +
scale_x_continuous(breaks = c(1, 2, 4, 8, 16)) +
scale_shape_manual(values = c("4 cores" = 21, "8 cores" = 23)) +
theme(legend.position = c(0.3, 0.8)) +
labs(
x = "Number of Threads",
y = "Execution Time (millisec)",
colour = "# cores",
shape = "# cores"
)
Created on 2018-04-10 by the reprex package (v0.2.0).
Mapping shape directly is fine too, and if you're doing more with df, it might make sense to convert it and keep it in long, 'tidy' format.
library("ggplot2")
library("tidyr")  # provides gather() and re-exports %>%
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
df <- df %>% gather("runtime", "millisec", 2:3)
ggplot(data = df, aes(x = thrd_cnt, y = millisec, color = runtime, shape = runtime)) +
geom_line() +
geom_point()
After gathering into a "long" formatted data frame, you pass colour and shape (pch) as aesthetics:
library(tidyverse)
df <- data.frame( thrd_cnt=c(1,2,4,8,16),
runtime4=c(53,38,31,41,54),
runtime8=c(54,35,31,35,44))
df %>% gather(key=run, value=time, -thrd_cnt) %>%
ggplot(aes(thrd_cnt, time, pch=run, colour=run)) + geom_line() + geom_point()
(Notice how brief the code is, compared to the original post)
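In current tidyr, gather() is marked as superseded, and the same wide-to-long step can be sketched with pivot_longer(). The column names cores and runtime follow the first answer; this is one possible equivalent, not the answerers' own code:

```r
library(tidyr)

df <- data.frame(
  thrd_cnt = c(1, 2, 4, 8, 16),
  runtime4 = c(53, 38, 31, 41, 54),
  runtime8 = c(54, 35, 31, 35, 44)
)

# One row per (thrd_cnt, cores) pair; names_prefix strips "runtime"
# so that cores holds "4" or "8" before the suffix is added.
df_long <- pivot_longer(
  df,
  cols         = c(runtime4, runtime8),
  names_to     = "cores",
  names_prefix = "runtime",
  values_to    = "runtime"
)
df_long$cores <- paste(df_long$cores, "cores")
```

df_long can then be passed to ggplot() with colour = cores and shape = cores, as in the answers above.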

date_minor_breaks in ggplot2

I am a beginner in ggplot2. I am unable to use date_minor_breaks to show quarterly "ticks" on x-axis.
Here's my code:
library(ggplot2)
library(zoo)       # as.yearqtr()
library(ggthemes)  # theme_tufte()
x<-c(seq(1:12))
time<-c("2010Q1","2010Q2","2010Q3","2010Q4","2011Q1","2011Q2", "2011Q3","2011Q4","2012Q1","2012Q2","2012Q3","2012Q4")
z<-data.frame(type = x,time = time)
z$time = as.yearqtr(z$time)
z$time = as.Date(z$time)
ggplot(data = z, aes(x=time,y=type)) +
geom_point() +
scale_x_date(date_labels = "%Y",date_minor_breaks = "3 months",name = "Year") +
theme_tufte() +
theme(legend.position = "none")
I researched this topic on SO ("Formatting dates with scale_x_date in ggplot2") and at https://github.com/hadley/ggplot2/issues/542, and found that there were some issues reported on this topic. However, I didn't quite follow the conversation about changes to ggplot2 because it's been only 6 days since I started using ggplot2.
Here's the graph I got (it doesn't have any ticks)...
Here's a sample graph with "tick marks" generated from Excel. Please ignore values because my point of creating this Excel chart is to demonstrate what I am looking for--i.e. "quarterly ticks". I'd appreciate your help.
You may have to make major breaks every three months and then pad your labels with blanks to give the illusion of major (labeled) and minor (unlabeled) ticks. See this answer for another example.
First manually make the breaks for the tick marks at every quarter.
breaks_qtr = seq(from = min(z$time), to = max(z$time), by = "3 months")
Then make the year labels and pad these labels with three blanks after each number.
labels_year = format(seq(from = min(z$time), to = max(z$time), by = "1 year"), "%Y")
labs = c(sapply(labels_year, function(x) {
c(x, rep("", 3))
}))
Now use the breaks and the labels with the labels and breaks arguments in scale_x_date. Notice that I'm not using date_labels and date_breaks for this.
ggplot(data = z, aes(x=time,y=type)) +
geom_point() +
scale_x_date(labels = labs, breaks = breaks_qtr, name = "Year") +
theme_tufte() +
theme(legend.position = "none")
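The padding step assumes the data covers whole years starting in Q1, so that every fourth break is a year boundary and the label count matches the break count. As a hedged alternative sketch, the labels can be derived from the breaks themselves; labels_qtr is a name introduced here for illustration:

```r
# Quarterly dates matching the question's z$time (2010 Q1 to 2012 Q4).
time <- seq(as.Date("2010-01-01"), as.Date("2012-10-01"), by = "3 months")

breaks_qtr <- seq(from = min(time), to = max(time), by = "3 months")

# Label only the breaks that fall in January; leave the rest blank.
labels_qtr <- ifelse(format(breaks_qtr, "%m") == "01",
                     format(breaks_qtr, "%Y"),
                     "")
```

labels_qtr always has the same length as breaks_qtr, so it can be passed as scale_x_date(labels = labels_qtr, breaks = breaks_qtr, name = "Year").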
You should also define your (major) date breaks:
ggplot(data = z, aes(x=time, y=type)) +
geom_point() +
scale_x_date(date_breaks = "1 year", name = "Year", date_minor_breaks="3 months",
limits = c(as.Date(as.yearqtr("2009Q4")),
as.Date(as.yearqtr("2013Q2"))),
expand=c(0,0), date_labels = "%Y") +
theme(legend.position = "none")
This also includes some other "fancy" stuff to align the minor ticks with the major ticks (I guess there are better ways to do this, but this works).

Handling NA's and Legends in ggplot2

I've got a dataframe that looks like:
df <- data.frame(
  Date = as.Date(c("06-08-10", "06-09-10", "06-10-10", "06-11-10", "06-13-10"),
                 format = "%m-%d-%y"),  # without a format these parse as year 0006
  closed_this_year_cum_gv = c(3, 5, 6, 7, NA),
  opened_this_year_cum_gv = c(2, 5, 6, 8, 10),
  closed_last_year_cum_gv = c(5, 6, 7, 8, 10),
  opened_last_year_cum_gv = c(5, 6, 8, 10, NA)
)
and have this framework for a plot using ggplot2:
ggplot(df, aes(x=Date))+
geom_line(aes(y=closed_this_year_cum_gv, color="blue"),linetype="dashed")+
geom_line(aes(y=opened_this_year_cum_gv, color="blue"))+
geom_line(aes(y=closed_last_year_cum_gv, color="red"),linetype="dashed")+
geom_line(aes(y=opened_last_year_cum_gv, color="red"))+
xlab("Date")+
ylab("Millions of Dollars")+
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012")
I tried this with the sample data but for some reason the lines aren't showing up (they're showing up with my real data). I want the NAs to not be graphed, which is why they aren't 0.
In my real data, it graphs, but the legend title is "blue" and its contents are "blue" and "red" as labels. I want them to be labeled by year and opened/closed. I've tried various methods but nothing seems to override the legend.
How do I control the legend title and labels?
Edit: changed to class "Date"
ggplot is generally happier to be fed data in 'long' format, as opposed to wide. Then it is, among other things, easier to map different aesthetics to variables in the data set.
# some data massage before the plot
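# reshape data from wide to long format
library(reshape2)
# name the id column explicitly so this works whether 'Date' is character or Date class
df2 <- melt(df, id.vars = "Date")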
# convert variable 'Date' to class 'Date'
df2$Date <- as.Date(df2$Date, format = "%m-%d-%y")
# create two variables
# var1: opened vs closed
df2$var1 <- ifelse(grepl(x = df2$variable, pattern = "opened"), "Opened", "Closed")
# set factor levels so that 'opened' comes before 'closed'
df2$var1 <- factor(df2$var1, levels = c("Opened", "Closed"))
# var2: this vs last year
df2$var2 <- ifelse(grepl(x = df2$variable, pattern = "this"), "This year", "Last year")
# plot
# use default colours, slightly pale 'red' and 'blue'
ggplot(df2, aes(x = Date, y = value, linetype = var1, colour = var2, group = interaction(var1, var2))) +
geom_line()
# if you want to set colours to red and blue, append this to the call above:
#   + scale_colour_manual(values = c("red", "blue"))
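On the NA part of the question: geom_line() drops rows with missing values itself, with a warning. As a sketch, the NA rows can also be removed explicitly from the long data before plotting. The small df2 below is a hypothetical stand-in with the same columns as the melted data; note that for an NA in the middle of a series, removing the row connects the neighbouring points instead of leaving a gap.

```r
# Hypothetical long-format stand-in for df2 (same column names as above).
df2 <- data.frame(
  Date     = rep(as.Date(c("2010-06-08", "2010-06-09", "2010-06-13")), times = 2),
  variable = rep(c("closed_this_year_cum_gv", "opened_this_year_cum_gv"), each = 3),
  value    = c(3, 5, NA, 2, 5, 10),
  stringsAsFactors = FALSE
)

# Keep only the rows that have a value to draw.
df2_complete <- df2[!is.na(df2$value), ]
```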
Update following comment
If you only want one legend, one possibility is to let linetype and colour depend on 'variable'.
# set factor levels so that 'opened' comes before 'closed', and 'last' before 'this'
df2$variable <- factor(df2$variable,
levels = c("opened_last_year_cum_gv",
"closed_last_year_cum_gv",
"opened_this_year_cum_gv",
"closed_this_year_cum_gv")
)
ggplot(df2, aes(x = Date, y = value, linetype = variable, colour = variable, group = variable)) +
geom_line() +
scale_colour_manual(values = rep(c("red", "blue"), each = 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year")) +
scale_linetype_manual(values = rep(c("solid", "dashed"), 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year"))
You need to specify appropriate mappings in aes(). Try this:
ggplot(df, aes(x=Date)) +
geom_line(aes(y=closed_this_year_cum_gv, color="this", linetype="closed")) +
geom_line(aes(y=opened_this_year_cum_gv, color="this", linetype="opened")) +
geom_line(aes(y=closed_last_year_cum_gv, color="last", linetype="closed")) +
geom_line(aes(y=opened_last_year_cum_gv, color="last", linetype="opened")) +
xlab("Date") +
ylab("Millions of Dollars") +
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012") +
scale_colour_manual(name="year", values=c("this"="blue", "last"="red")) +
scale_linetype_manual(name="type", values=c(2, 1))
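With the unnamed values = c(2, 1), the linetypes are assigned in the alphabetical order of the labels ("closed", then "opened"). A minimal variant, using a cut-down version of the question's data, names the values so the pairing is explicit:

```r
library(ggplot2)

# Cut-down version of the question's data (first three rows, two series).
df <- data.frame(
  Date = as.Date(c("06-08-10", "06-09-10", "06-10-10"), format = "%m-%d-%y"),
  closed_this_year_cum_gv = c(3, 5, 6),
  opened_this_year_cum_gv = c(2, 5, 6)
)

p <- ggplot(df, aes(x = Date)) +
  geom_line(aes(y = closed_this_year_cum_gv, linetype = "closed")) +
  geom_line(aes(y = opened_this_year_cum_gv, linetype = "opened")) +
  # Named values: no reliance on the order of the labels.
  scale_linetype_manual(name = "type",
                        values = c(closed = "dashed", opened = "solid"))
```

Printing p draws the plot; the same named-vector style is already used for scale_colour_manual() in the answer above.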

Resources