I'm putting together some functions to help summarize temporal data by fiscal quarter. The function I have takes a date, e.g. 2017-01-01, and returns the corresponding quarter as a factor, e.g. "1Q2017". I'm using the data to create graphs in ggplot2, but because the quarters are factors, I can't use geoms like geom_line() to connect my data points the way I could with dates.
Can I create a data type for quarters that displays as quarters but behaves like dates? How would I do this?
The "yearqtr" class in zoo represents year/quarters but behaves somewhat like dates: internally such objects are stored numerically as year + frac, where frac is 0, 1/4, 2/4 or 3/4. You can perform arithmetic on them, they format as meaningful year/quarter strings, and they work with lines in ggplot2 (as well as classic graphics and lattice graphics). See ?yearqtr and ?scale_x_yearqtr.
library(ggplot2)
library(zoo)
# test data
dates <- c("2017-01-01", "2017-04-01")
values <- 1:2
z <- zoo(values, as.yearqtr(dates)) # test zoo object
# 1. classic graphics
plot(z, xaxt = "n") # suppress the default x-axis so a custom one can be drawn
axis(1, at = time(z), labels = format(time(z), "%YQ%q"))
# 2. ggplot2 graphics
autoplot(z) + scale_x_yearqtr()
# 3. ggplot2 graphics using data frame with yearqtr
DF <- fortify.zoo(z) # test data frame
sapply(DF, class)
## Index z
## "yearqtr" "integer"
ggplot(DF, aes(Index, z)) + geom_line() + scale_x_yearqtr()
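To make the "year + frac" representation described above concrete, here is a minimal sketch. It assumes only that zoo is installed; the variable name q is illustrative, and the "%qQ%Y" format is chosen to mimic the "1Q2017" labels from the question.

```r
library(zoo)

# Convert a Date to a yearqtr; internally this is the number 2017 + 0/4
q <- as.yearqtr(as.Date("2017-01-01"))

as.numeric(q)          # underlying numeric value: the year plus a quarter fraction
as.numeric(q + 1/4)    # adding 1/4 moves forward by one quarter

# Format in the quarter-first style used in the question
format(q, "%qQ%Y")

# Convert back to a Date (first day of the quarter by default)
as.Date(q + 1/4)
```

Because the values are numeric underneath, they sort and interpolate like dates, which is what lets geom_line() connect them.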
Taking the comment from @Jaap and incorporating it into an example graph:
library(ggplot2)
library(zoo)
df <- data.frame(date1 = c("2017-01-01", "2016-10-01", "2016-07-01"),
v1 = c(2, 4, 3))
df$date1 <- as.Date(df$date1)
ggplot(df, aes(x = date1, y = v1)) +
geom_line() +
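scale_x_date(name = "quarters",
# labels accepts a function; date_labels only accepts a strftime format string
labels = function(d) format(as.yearqtr(d), "%qQ%Y"))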
You just need to specify group=1 in aes.
library(tidyverse) # install.packages('tidyverse') if needed
dat = tibble(date = seq.Date(as.Date('2017-01-01'), # tibble() replaces the deprecated data_frame()
as.Date('2017-12-31'),
length.out=365),
x = rnorm(365))
dat = mutate(dat, qtr = paste0(lubridate::quarter(date), 'Q', lubridate::year(date)))
dat$qtr = as.factor(dat$qtr) # for similarity to your situation
dat %>%
group_by(qtr) %>%
summarise(n = sum(x)) %>%
ggplot(aes(x=qtr, y=n, group=1)) +
geom_line()
Hello everyone!
How can I arrange weekdays, starting on Sunday, in R? I got the weekdays using the weekdays() function, but the days appear in a seemingly random order (image attached) and I can't find a way to sort them. I tried the arrange function, but I guess it only works with numeric values. A bar chart looks very odd starting on Friday. This is what the code looks like:
my_dataset <- my_dataset %>%
mutate(weekDay = weekdays(Date))
my_dataset %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = weekDay, y = mean_steps))+
geom_bar(stat = "identity")
Thanks!
I tried the arrange function, but I guess it only works with numeric values.
Your weekDay vector is probably of class character, which ggplot arranges in alphabetical order. The solution is to convert the character vector into a factor.
There are several ways to get the x-axis in the order you want. All of them involve converting weekDay into a factor with explicitly ordered levels.
To come close to your example, I first created a data frame with weekdays and some data. Because both are generated randomly, a seed is set to make the code reproducible.
One method is to create the data frame of summaries and then define weekDay in it as a factor with explicit levels.
This can also be done within the ggplot call when creating the aesthetics.
library(tidyverse)
set.seed(111)
myData <- data.frame(
weekDay = sample(weekdays(Sys.Date() + 0:6), 100, replace = TRUE),
TotalSteps = sample(1000:8000, 100)
)
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) -> DF # new data.frame
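# the following defines weekDay as a factor and also sets
# the sequence of factor levels. This sequence is then used
# by ggplot to construct the x-axis. The level names must match
# what weekdays() returns in your locale (German here; in an
# English locale use "Sunday", "Monday", ...).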
DF$weekDay <- factor(DF$weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
))
ggplot(DF, aes(x = weekDay, y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
# the factor can also be defined within the ggplot-call
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = factor(weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
)), y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
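Since the hard-coded level names above only work in the locale they were written for, one possible variation is to derive the levels from a known Sunday. This is a sketch in base R, not part of the original answer; the names sunday, week_levels, days and days_f are illustrative.

```r
# 2024-01-07 is a Sunday, so the seven days starting there give
# the weekday names in Sunday-first order for the current locale
sunday <- as.Date("2024-01-07")
week_levels <- weekdays(sunday + 0:6)

# Some example weekday names, as weekdays() would produce them
days <- weekdays(as.Date("2024-03-01") + 0:29)

# Factor whose level order (Sunday first) is what ggplot uses on the axis
days_f <- factor(days, levels = week_levels)
levels(days_f)
```

The same week_levels vector could be passed as the levels argument in either of the two approaches shown above.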
I want my X axis text to look like:
J
a
n
not be rotated with the letters turned.
I want to keep it as a date axis. I know I could make it discrete with values of "J\na\nn\n" for instance. Perhaps I can map a vector of values like that over the axis.text.x values? It seems like there should be an easier way though.
The code below demonstrates the issue. I've rotated the labels 90 degrees, but as shown above this is not what I want.
library(tidyverse)
library(scales)
y<- c(52014,51598,61920,58135,71242,76254,63882,64768,53526,55290,45490,35602)
months<-seq(as.Date("2018-01-01"),as.Date("2018-12-01"),"month")
dat<-as_tibble(cbind(y,months)) %>% # as_tibble() replaces the deprecated as.tibble(); cbind() turns the dates into numbers
mutate(month=as.Date(months,origin="1970-01-01"))
ggplot(dat) +
geom_line(aes(x=month,y=y)) +
scale_x_date(breaks=date_breaks("month"),labels=date_format("%b")) +
theme(axis.text.x=element_text(angle=90))
Example data :
date <- seq(from = as.Date("2000-01-01"), to = as.Date("2000-12-01"), by = "month")
df <- data.frame(Month = date, Value = rnorm(12))
First, build the custom set of labels you want. Here I use strsplit() and lapply() to do so (month.name and month.abb are built-in character vectors in R).
mon.split <- strsplit(month.name, "")
mon <- unlist(lapply(mon.split, paste0, "\n", collapse = ""))
mon
[1] "J\na\nn\nu\na\nr\ny\n" "F\ne\nb\nr\nu\na\nr\ny\n"
[3] "M\na\nr\nc\nh\n" "A\np\nr\ni\nl\n"
[5] "M\na\ny\n" "J\nu\nn\ne\n"
[7] "J\nu\nl\ny\n" "A\nu\ng\nu\ns\nt\n"
[9] "S\ne\np\nt\ne\nm\nb\ne\nr\n" "O\nc\nt\no\nb\ne\nr\n"
[11] "N\no\nv\ne\nm\nb\ne\nr\n" "D\ne\nc\ne\nm\nb\ne\nr\n"
I assume your date variable is of class 'Date', so I use scale_x_date. If it's numeric or character, use scale_x_continuous or scale_x_discrete instead.
ggplot(df, aes(x = Month, y = Value)) +
geom_line() +
scale_x_date(breaks = date, labels = mon)
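As a possible variation on the approach above, the stacked labels can be produced by a small labelling function instead of a precomputed vector, so they follow whatever breaks the scale picks. This is a sketch under assumptions: vertical_month is a hypothetical helper name, and it uses the three-letter month.abb names (as in the question) rather than month.name.

```r
# Hypothetical helper: turn Dates into month abbreviations with one letter per line
vertical_month <- function(d) {
  # month.abb is locale-independent; $mon is zero-based, hence the + 1
  abbr <- month.abb[as.POSIXlt(d)$mon + 1]
  # split each abbreviation into letters and rejoin them with newlines
  vapply(strsplit(abbr, ""), paste, character(1), collapse = "\n")
}

labels <- vertical_month(as.Date(c("2018-01-01", "2018-02-01", "2018-12-01")))
cat(labels[1])
```

Because the labels argument of scale_x_date() accepts a function, passing labels = vertical_month there should give the same stacked text without listing the labels by hand.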
I'm not looking for help with coding, just a pointer on which direction to take, i.e. which functions to use. I wonder if it is possible to use ggplot to plot something like this:
1) One easy way is to place the year under each January (or use June, say, if you want the year centered, i.e. replace 1 with 6 in the code below). Another approach is to replace January with the year by using %Y in place of %b\n%Y.
First introduce some test data -- this should normally be done in the question but I have provided some this time. The x column of the test data frame d is of class "yearmon" and y is numeric. I have assumed that the numbers on the X axis in the question were intended to represent months but if they were intended to be quarters then start off with "yearqtr" class and use scale_x_yearqtr instead adjusting appropriately.
library(ggplot2)
library(zoo)
d <- data.frame(x = as.yearmon("2000-1") + 0:23/12, y = 1:24) # test data
x <- d$x # yearmon index, reused below for both breaks and labels
ggplot(d, aes(x, y)) +
geom_line() +
geom_point() +
scale_x_yearmon(breaks = x, labels = format(x, ifelse(cycle(x) == 1, "%b\n%Y", "%b")))
2) or if you are starting off with a zoo object z:
z <- read.zoo(d)
x <- time(z)
autoplot(z, geom = c("line", "point")) +
scale_x_yearmon(breaks = x, labels = format(x, ifelse(cycle(x) == 1, "%b\n%Y", "%b"))) +
xlab("")
I am fairly new to R and am attempting to plot two time series lines simultaneously (using different colors, of course) making use of ggplot2.
I have 2 data frames. The first one has 'Percent change for X' and 'Date' columns. The second one has 'Percent change for Y' and 'Date' columns. Both have a 'Date' column with the same values, whereas the 'Percent Change' columns have different values.
I would like to plot the 'Percent Change' columns against 'Date' (common to both) using ggplot2 on a single plot.
The examples that I found online made use of the same data frame with different variables to achieve this, I have not been able to find anything that makes use of 2 data frames to get to the plot. I do not want to bind the two data frames together, I want to keep them separate. Here is the code that I am using:
ggplot(jobsAFAM, aes(x=jobsAFAM$data_date, y=jobsAFAM$Percent.Change)) + geom_line() +
xlab("") + ylab("")
But this code produces only one line and I would like to add another line on top of it.
Any help would be much appreciated.
TIA.
ggplot allows you to have multiple layers, and that is what you should take advantage of here.
In the plot created below, you can see that there are two geom_line statements hitting each of your datasets and plotting them together on one plot. You can extend that logic if you wish to add any other dataset, plot, or even features of the chart such as the axis labels.
library(ggplot2)
jobsAFAM1 <- data.frame(
data_date = runif(5,1,100),
Percent.Change = runif(5,1,100)
)
jobsAFAM2 <- data.frame(
data_date = runif(5,1,100),
Percent.Change = runif(5,1,100)
)
ggplot() +
geom_line(data = jobsAFAM1, aes(x = data_date, y = Percent.Change), color = "red") +
geom_line(data = jobsAFAM2, aes(x = data_date, y = Percent.Change), color = "blue") +
xlab('data_date') +
ylab('percent.change')
If both data frames have the same column names, pass one data frame to the ggplot() call and name the x and y variables inside its aes(). Then add a first geom_line() for the first line and a second geom_line() with data=df2 (where df2 is your second data frame). If you need the lines in different colors, add color= with a name for each line inside the aes() of each geom_line().
df1<-data.frame(x=1:10,y=rnorm(10))
df2<-data.frame(x=1:10,y=rnorm(10))
ggplot(df1,aes(x,y))+geom_line(aes(color="First line"))+
geom_line(data=df2,aes(color="Second line"))+
labs(color="Legend text")
I prefer using the ggfortify library. It is a ggplot2 wrapper that recognizes the type of object inside the autoplot function and chooses the best ggplot methods to plot. At least I don't have to remember the syntax of ggplot2.
library(ggfortify)
ts1 <- 1:100
ts2 <- 1:100*0.8
autoplot(ts( cbind(ts1, ts2) , start = c(2010,5), frequency = 12 ),
facets = FALSE)
I know this is old but it is still relevant. You can take advantage of reshape2::melt to change the dataframe into a more friendly structure for ggplot2.
Advantages:
allows you plot any number of lines
each line with a different color
adds a legend for each line
with only one call to ggplot/geom_line
Disadvantage:
an extra package(reshape2) required
melting is not so intuitive at first
For example:
jobsAFAM1 <- data.frame(
data_date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 100),
Percent.Change = runif(100,1,100) # one value per date instead of 5 recycled values
)
jobsAFAM2 <- data.frame(
data_date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 100),
Percent.Change = runif(100,1,100) # one value per date instead of 5 recycled values
)
jobsAFAM <- merge(jobsAFAM1, jobsAFAM2, by="data_date")
jobsAFAMMelted <- reshape2::melt(jobsAFAM, id.var='data_date')
ggplot(jobsAFAMMelted, aes(x=data_date, y=value, col=variable)) + geom_line()
This is old, but here is an updated tidyverse workflow not mentioned above.
library(tidyverse)
jobsAFAM1 <- tibble(
date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 5),
Percent.Change = runif(5, 0,1)
) %>%
mutate(serial='jobsAFAM1')
jobsAFAM2 <- tibble(
date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 5),
Percent.Change = runif(5, 0,1)
) %>%
mutate(serial='jobsAFAM2')
jobsAFAM <- bind_rows(jobsAFAM1, jobsAFAM2)
ggplot(jobsAFAM, aes(x=date, y=Percent.Change, col=serial)) + geom_line()
@Chris Njuguna
tidyr::gather() is the tidyverse function for turning a wide data frame into a long, tidy layout, so that ggplot can plot multiple series.
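In current tidyr, gather() is superseded by pivot_longer(), so a wide-to-long reshape might look like the sketch below. The data frame wide and the column names X, Y and series are made up for illustration; only the tidyr package is assumed.

```r
library(tidyr)

# Illustrative wide data: one date column and one column per series
wide <- data.frame(
  date = as.Date("2017-01-01") + 0:4,
  X = c(1, 2, 3, 4, 5),
  Y = c(5, 4, 3, 2, 1)
)

# Stack X and Y into a single value column, with a key column naming the series
long <- pivot_longer(wide, cols = c(X, Y),
                     names_to = "series", values_to = "Percent.Change")
long
```

The long result can then be plotted with a single geom_line() by mapping the series column to color.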
An alternative is to bind the dataframes, and assign them the type of variable they represent. This will let you use the full dataset in a tidier way
library(ggplot2)
library(dplyr)
df1 <- data.frame(dates = 1:10,Variable = rnorm(mean = 0.5,10))
df2 <- data.frame(dates = 1:10,Variable = rnorm(mean = -0.5,10))
df3 <- df1 %>%
mutate(Type = 'a') %>%
bind_rows(df2 %>%
mutate(Type = 'b'))
ggplot(df3,aes(y = Variable,x = dates,color = Type)) +
geom_line()
I'm trying to plot time series data by week and month; ideally, I think, I'd like to use boxplots to visualise daily data binned by week. While I can change the labels and gridlines on the x-axis using scale_x_date, that won't affect the points in the plot.
Here's a demonstration of the problem and my current (clumsy) solution.
library(zoo)
library(ggplot2)
d = seq(as.Date("2007-06-01"), as.Date("2008-05-31"), by = "day") # 366 daily dates
x = runif(366, min = 0, max = 100)
df = data.frame(d,x)
# PROBLEM #
p = ggplot(df, aes(d, x))
p + geom_point()
p + geom_boxplot() # more or less useless
# CURRENT FIX #
df$Year.Month <- format(df$d, "%Y-%m")
p = ggplot(df, aes(Year.Month, x))
p + geom_point(alpha = 0.75)
p + geom_boxplot() # where I'm trying to get to...
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
@shadow's answer below is much neater. But is there a way to do this using binning? Using stats in some form, perhaps?
You can treat Dates as dates in R, and use scale_x_date() in ggplot to get the x-labels you want.
Also, I find it easier to just create a new grouping variable called "Month" to group the boxplots by month. In this case I used lubridate to accomplish the task.
If you do not want to go through the trouble of creating a new variable "Month", your boxplot will be plotted on the 15th of the month, making the plot a bit harder to read.
library(magrittr)
library(lubridate)
library(dplyr)
df %>%
mutate(Date2 = as.Date(paste0("2000-", month(d), "-", "01"))) %>%
mutate(Month = lubridate::month(d)) %>%
ggplot(aes(Date2, x, group=Month)) +
geom_boxplot() +
scale_x_date(date_breaks="1 month", date_labels = "%b")
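On the question's follow-up about binning, one option in base R is cut() on the Date vector, which assigns each day to the start of its month or week while keeping the result convertible back to Date. This is a sketch rather than part of the original answer; month_start and week_start are illustrative names.

```r
# Same date range as in the question
d <- seq(as.Date("2007-06-01"), as.Date("2008-05-31"), by = "day")

# cut() returns a factor labelled by the first day of each bin;
# as.Date() turns those labels back into real dates
month_start <- as.Date(cut(d, breaks = "month"))
week_start  <- as.Date(cut(d, breaks = "week"))  # weeks start on Monday by default

length(unique(month_start))  # number of monthly bins
head(unique(week_start))
```

Mapping week_start (or month_start) to both x and group in aes() should then give one box per bin while the axis remains a date axis.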
If you do not create the variable "Month", boxplots won't align nicely with the x tick marks: