I have some time series data with quarterly frequency, as below.
I'm using geom_tile to create a heatmap of these time series, but the x-axis labels default to years even though the data is quarterly.
I expected labels such as 2014 Q1 or 2020 Q4, as in the dataset.
library(tidyverse)  # provides tibble(), %>%, mutate(), group_by(), summarise() and ggplot()
set.seed(1990)
ID <- rep(c('A','B','C'),each = 84)
n <- rep(round(runif(84,1,4)), 3)
datetime <- rep(seq(as.POSIXct("2014-01-01"), as.POSIXct("2020-12-01"), by="month"), 3)
df <- tibble(ID,n, datetime)
df <- df %>%
#mutate(yearweek = tsibble::yearweek(datetime)) %>%
mutate(yearquarter = zoo::as.yearqtr(datetime)) %>%
#group_by(ID, yearweek) %>%
group_by(ID, yearquarter) %>%
summarise(n = sum(n))
df
ggplot(df, aes(y = ID, x = yearquarter, fill = n)) +
geom_tile(color = 'gray')
Normally I can control the axis of a monthly dataset easily with scale_x_date, as below, but using it with quarterly data throws Error: Invalid input: date_trans works with objects of class Date only.
I'm using tsibble::yearweek for weekly aggregation and zoo::as.yearqtr for quarterly aggregation.
The problem is that ggplot may not support these classes when it comes to plotting. Is there a more consistent approach to dealing with time series data of multiple frequencies in R/ggplot?
scale_x_date(expand = c(0,0),breaks = seq(as.Date("2014-07-01"), as.Date("2020-12-01"), by = "1 month"), date_labels = "%Y %b", name = 'Monthly')
Since yearquarter was created with zoo's as.yearqtr, use zoo's scale_x_yearqtr to format the x-axis.
library(ggplot2)
ggplot(df,aes(y=ID,x= yearquarter,fill=n))+
geom_tile(color = 'gray') +
zoo::scale_x_yearqtr(format = '%Y Q%q')
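For the broader question about handling several frequencies consistently, one option (a sketch, not the only approach) is to store every period as a plain Date marking the start of the period and do the formatting in the axis labels. The example below assumes the question's simulated data; the names monthly, quarterly, quarter_start and label_quarter are made up for illustration. Because quarters differ slightly in length, the tiles may show thin gaps.

```r
library(dplyr)
library(ggplot2)
library(zoo)

set.seed(1990)
monthly <- data.frame(
  ID = rep(c("A", "B", "C"), each = 84),
  n = rep(round(runif(84, 1, 4)), 3),
  datetime = rep(seq(as.Date("2014-01-01"), as.Date("2020-12-01"), by = "month"), 3)
)

# Aggregate to quarters, but keep the key as a Date (first day of the quarter)
quarterly <- monthly %>%
  mutate(quarter_start = as.Date(as.yearqtr(datetime))) %>%
  group_by(ID, quarter_start) %>%
  summarise(n = sum(n), .groups = "drop")

# Hypothetical helper: turn a Date back into a "2014 Q1" style label
label_quarter <- function(x) format(as.yearqtr(x), "%Y Q%q")

p <- ggplot(quarterly, aes(x = quarter_start, y = ID, fill = n)) +
  geom_tile(color = "gray") +
  scale_x_date(date_breaks = "1 year", labels = label_quarter, expand = c(0, 0))
```

Printing p draws the heatmap. Since the x variable is an ordinary Date, the usual scale_x_date arguments stay available.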
Related
I am trying to plot the number of unique detections per day throughout the year.
I had data that looked like this.
I summarized the number of unique detections per day using this code
unique_day <- data %>% group_by(Day,tag) %>% filter(Date==min(Date)) %>% slice(1) %>% ungroup()
sum <- unique_day %>% group_by(Date) %>% summarise(Detections=n())
I then ended up with a dataframe like this
I then plot with this code
sum[year(sum$Date)==2016,] %>%
ggplot(aes(x=Date,y=Detections))+
theme_bw(base_size = 16,base_family = 'serif')+
theme(panel.grid.major = element_blank(),panel.grid.minor=element_blank())+
geom_line(size=1)+
scale_x_datetime(date_breaks = '1 month',date_labels = "%b",
limits = c(as.POSIXct('2016-01-01'),as.POSIXct("2016-12-01")))+
ggtitle('DIS 2016')+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Date")+
scale_y_continuous(expand = c(0,0),limits = c(0,5))
And get a plot like this
I can't seem to get the plot to start at 0. I figure it always starts at 1 because there are no 0 values for detections: my data frame only summarizes the days with detections, not the days without any. I have tried using ylim, scale_y_continuous and coord_cartesian. Any ideas?
Here is a simple way to solve your problem:
df_null <- data.frame(Date = seq(as.Date("2015/01/01"), by = "day", length.out = 365),
Detections = 0)
For the year 2015 we create a data.frame containing all days with value 0. Suppose your data.frame looks like this:
df <- data.frame(Date = c(as.Date("2015/06/07"), as.Date("2015/06/08"), as.Date("2015/12/12")),
Detections = 1:3)
Using dplyr we combine those two data.frames and summarize the values:
library(dplyr)
df %>%
bind_rows(df_null) %>%
group_by(Date) %>%
summarise(Detections = sum(Detections))
Finally you can get your plot using your ggplot2-code.
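An alternative to binding a separate all-zero data frame, swapped in here as a sketch, is tidyr::complete(), which adds the missing dates and fills them with 0 in one step. The detections data frame below is made-up example data for 2016, the year plotted in the question.

```r
library(dplyr)
library(tidyr)

# Made-up example: only days with at least one detection are present
detections <- data.frame(
  Date = as.Date(c("2016-06-07", "2016-06-08", "2016-12-12")),
  Detections = c(1, 2, 3)
)

# Add a row for every day of 2016; days that were missing get Detections = 0
filled <- detections %>%
  complete(
    Date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
    fill = list(Detections = 0)
  )
```

The filled data frame can then go into the ggplot2 code from the question. If Date is stored as POSIXct rather than Date, the sequence passed to complete() would need to be POSIXct as well.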
I want to perform an analysis of property prices across segments on a quarterly basis (x-axis) from 2016 Jan to 2019 Jan.
However, the column I would like to use is in a month format, c("19-Jan", "19-Feb", "19-Mar", "19-Apr",..."19-Dec"), that is, a "yy-mmm" character format.
I want the whole column converted from values like "19-Jan" to a quarter format such as c("Qtr 1 - 2016", "Qtr 2 - 2017", ... "Qtr 3 - 2018").
How can I convert a column of character values like "19-Jan" to a quarter format?
I have attached my raw date format data in a google link since it has over 56,000 rows:
https://docs.google.com/spreadsheets/d/1cynVkZv0aJRjwFgvVzlSRG7G-6t96cAgXOZMTJdPkC8/edit#gid=80649901
Here is my previous graph with yearly analysis (which I want to convert to quarterly):
This is my code:
library(dplyr)
library(ggplot2)
URA_data <- read.csv('URAdata.csv')
options(scipen=999)
Plotly<-URA_data %>%
mutate(Year = 2000 + as.integer(substring(Date.of.Sale, 1, 2))) %>%
filter(Type.of.Sale %in% "Resale" & Type %in% "Condominium")%>%
group_by(Year,Market.Segment ) %>%
summarise(Price = mean(Price....))%>%
ggplot(aes(Year, Price, color = Market.Segment)) + geom_line()+ geom_point()+
labs(color="Segments")+
ggtitle("Median Property Prices by Market Segments ")+
xlab("Year")+ylab("Price (Median)")+
theme(
plot.title=element_text(color="red",size=14,face="bold.italic",hjust=0.5),
axis.title.x=element_text(color="blue",size=14,face="bold"),
axis.title.y=element_text(color="green",size=14,face="bold")
)
library(plotly)
Graph<-ggplotly(Plotly)
Graph
The zoo package has an as.yearqtr function that you can use to convert your months to quarters. Its format = argument lets you describe the format of your month data. You can then use zoo::scale_x_yearqtr to improve the x-axis formatting.
library(dplyr)
library(ggplot2)
library(zoo)
URA_data %>%
mutate(Quarter = as.yearqtr(Date.of.Sale, format = "%y-%b")) %>%
filter(Type.of.Sale %in% "Resale" & Type %in% "Condominium")%>%
group_by(Quarter,Market.Segment ) %>%
summarise(Price = mean(Price....))%>%
ggplot(aes(Quarter, Price, color = Market.Segment)) + geom_line()+ geom_point()+
scale_x_yearqtr(breaks = seq(from = as.yearqtr("2016-1"), to = as.yearqtr("2018-3"), by = 0.25),
limits = as.yearqtr(c("2016-1","2018-3"))) +
labs(color="Segments") + ggtitle("Median Property Prices by Market Segments ")+
xlab("Quarter")+ylab("Price (Median)")+
theme(plot.title=element_text(color="red",size=14,face="bold.italic",hjust=0.5),
axis.title.x=element_text(color="blue",size=14,face="bold"),
axis.text.x=element_text(angle = 45, vjust = 1, hjust = 1),
axis.title.y=element_text(color="green",size=14,face="bold"))
Download and fix data:
library(gsheet)
URA_data <- gsheet::gsheet2tbl("https://docs.google.com/spreadsheets/d/1cynVkZv0aJRjwFgvVzlSRG7G-6t96cAgXOZMTJdPkC8/edit#gid=80649901")
URA_data <- URA_data %>%
mutate(Type.of.Sale = `Type of Sale`, Date.of.Sale = `Date of Sale`,
Market.Segment = `Market Segment`, Price.... = `Price ($)`)
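To check the conversion on a few values before running it on all 56,000 rows, a small sketch like the following may help. The sale_month vector is made up. The example goes through zoo::as.yearmon first, then builds labels in the "Qtr 1 - 2016" style asked for in the question. Note that "%b" depends on the locale, so this assumes English month abbreviations.

```r
library(zoo)

# Made-up sample of the "yy-mmm" strings from the Date.of.Sale column
sale_month <- c("16-Jan", "17-May", "18-Sep", "19-Jan")

# Parse as year-month, then convert to year-quarter
quarter <- as.yearqtr(as.yearmon(sale_month, format = "%y-%b"))

# %q is the quarter number and %Y the four-digit year; other text is kept as is
quarter_label <- format(quarter, "Qtr %q - %Y")
print(quarter_label)
```

For plotting it is probably better to keep the yearqtr column itself on the axis and use the labels only for display, since character labels would sort alphabetically.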
I'm sure this is a relatively simple fix, but I can't figure it out for the life of me. I'm trying to plot a scatter plot for date and time information. Here is some sample code:
library(tidyverse)
library(lubridate)
library(hms)
time <- c("19:36:00", "18:20:00", "17:59:00", "17:22:00", "17:23:00")
date <- c("10-05-2019", "25-01-2019", "13-04-2019", "22-07-2019", "05-12-2019")
data <- data.frame(time = as_hms(as_datetime(time, format = "%H:%M:%S", tz = "America/Los_Angeles")), date = parse_date_time(date, "dmy", tz = "America/Los_Angeles"))
data %>%
mutate(time = as.POSIXct(time)) %>%
ggplot() +
geom_point(aes(x = date, y = time)) +
scale_y_datetime(
breaks = scales::date_breaks("1 hour"),
date_labels = "%l %p"
)
The result of this plot is a y-axis that corresponds to time in AM/PM format. The default here is about 4:30 PM to 8:30 PM. But, what if I wanted to change the limits of the y-axis to 4 PM to 10 PM? I've been combing through forums but I can't find anything that explicitly details this situation and the documentation only provides examples for doing this with date information.
Any help would be much appreciated!
You can set limits in scale_y_datetime:
library(dplyr)
library(ggplot2)
data %>%
mutate(time = as.POSIXct(time, format = "%T"),
date = as.Date(date, "%d-%m-%Y")) %>%
ggplot() +
geom_point(aes(x = date, y = time)) +
scale_y_datetime(
breaks = scales::date_breaks("1 hour"),
date_labels = "%l %p",
limits = c(as.POSIXct("16:00:00", format = "%T"),
as.POSIXct("22:00:00", format = "%T")))
Data:
time <- c("19:36:00", "18:20:00", "17:59:00", "17:22:00", "17:23:00")
date <- c("10-05-2019", "25-01-2019", "13-04-2019", "22-07-2019", "05-12-2019")
data <- data.frame(time, date)
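One detail worth knowing: as.POSIXct("16:00:00", format = "%T") attaches the current date in the local time zone, so the limits only line up because the data were converted the same way. A variation that does not depend on the day or time zone in which the code runs is sketched below. The helper as_clock and the dummy day 2000-01-01 are assumptions made for illustration, and "%I %p" is used instead of "%l %p" because "%l" is not supported on every platform.

```r
library(ggplot2)

times <- c("19:36:00", "18:20:00", "17:59:00", "17:22:00", "17:23:00")
dates <- as.Date(c("10-05-2019", "25-01-2019", "13-04-2019", "22-07-2019", "05-12-2019"),
                 format = "%d-%m-%Y")

# Hypothetical helper: place a clock time on one fixed dummy day in UTC
as_clock <- function(x, day = "2000-01-01") {
  as.POSIXct(paste(day, x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

plot_data <- data.frame(date = dates, time = as_clock(times))
y_limits <- as_clock(c("16:00:00", "22:00:00"))  # 4 PM to 10 PM

p <- ggplot(plot_data, aes(x = date, y = time)) +
  geom_point() +
  scale_y_datetime(date_breaks = "1 hour", date_labels = "%I %p",
                   limits = y_limits, timezone = "UTC")
```

Printing p shows the points on a y-axis running from 4 PM to 10 PM.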
I have data that looks as follows:
Date Time_finished
4/3/2020 16:30:21
4/6/2020 16:43:29
4/7/2020 16:28:47
4/8/2020 16:30:38
4/9/2020 16:50:01
I would like to plot a line chart with the date on the x axis and the time finished on the y axis, as a time series graph. For some reason this does not work. The Date column is stored as Date, but the time is stored as a factor. Does the time also need to be converted to a date?
I have tried normal plot but having no luck.
Thanks
Like this?
df <- tibble::tribble(
~Date, ~Time_finished,
"4/3/2020", "16:30:21",
"4/6/2020", "16:43:29",
"4/7/2020", "16:28:47",
"4/8/2020", "16:30:38",
"4/9/2020", "16:50:01"
)
library(tidyverse)
library(scales)  # date_breaks() and date_format() come from scales
df %>%
# %Y matches the four-digit year in "4/3/2020"
mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y"),
Time_finished = as.POSIXct(Time_finished, format = "%H:%M:%S")) %>%
ggplot(aes(x = Date, y = Time_finished, group = 1)) +
geom_line() + scale_y_datetime(breaks = date_breaks("10 min"),
minor_breaks = date_breaks("2 min"),
labels = date_format("%Hh %Mm %Ss"))
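Since the question mentions that the time column is stored as a factor, here is a hedged alternative that keeps the time of day as a pure time value rather than a date-time: convert the factor to character, parse it with hms::as_hms(), and let ggplot2's scale_y_time() handle the axis. The data frame name runs is made up for this sketch.

```r
library(ggplot2)
library(hms)

runs <- data.frame(
  Date = as.Date(c("4/3/2020", "4/6/2020", "4/7/2020", "4/8/2020", "4/9/2020"),
                 format = "%m/%d/%Y"),
  # Stored as a factor, as described in the question
  Time_finished = factor(c("16:30:21", "16:43:29", "16:28:47", "16:30:38", "16:50:01"))
)

# Convert factor -> character -> hms (time of day without a date)
runs$Time_finished <- as_hms(as.character(runs$Time_finished))

p <- ggplot(runs, aes(x = Date, y = Time_finished)) +
  geom_line() +
  scale_y_time()
```

This avoids attaching today's date to every time value, at the cost of an extra dependency on hms.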
I'm trying to generate a stacked line/area graph using ggplot and geom_area. As far as I can tell, my data is loaded into R correctly. Every time I generate the plot, the graph is empty (the axes look correct, except that the months are sorted alphabetically).
I've tried utilizing the data.frame function to define my variables but was unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one seems to have the issue of no errors but still an empty plot.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I have zero error messages, simply just no data being plotted on my graph.
Your dates are being interpreted as a factor. You must transform them.
library(tidyverse)
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
lubridate::ymd('2018-12-01'), by = '1 month'),
Unknown = sample(17, replace = TRUE, size = 12),
V1 = floor(runif(12, min = 35, max = 127)),
V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to be able to stack the areas in an easier way.
Now, assuming your year of analysis is 2018, you need to transform the month column of your data frame into something that R interprets as continuous, such as a Date.
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))
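The same idea can be sketched without parsing month names through the locale, by matching them against base R's month.abb, and with tidyr::pivot_longer() in place of gather(). The ohv data frame and its values below are made up to mirror the columns named in the question (Month, A, B, Unknown), and the year 2018 is an assumption.

```r
library(tidyr)
library(ggplot2)

# Made-up stand-in for the OHV data: Month stored as a factor of abbreviations
ohv <- data.frame(
  Month = factor(month.abb),
  A = 1:12,
  B = 12:1,
  Unknown = rep(2, 12)
)

# Map "Jan".."Dec" to month numbers, then build real dates in the assumed year
month_number <- match(as.character(ohv$Month), month.abb)
ohv$Month <- as.Date(sprintf("2018-%02d-01", month_number))

# Long format: one row per month and series, so geom_area() can stack by fill
long <- pivot_longer(ohv, cols = -Month, names_to = "Variable", values_to = "Value")

p <- ggplot(long, aes(x = Month, y = Value, fill = Variable)) +
  geom_area(position = "stack") +
  scale_x_date(date_labels = "%b")
```

With Month as a Date the x-axis is continuous, so the areas are drawn and the months appear in calendar order.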