I have a data frame that looks like the following:
df <- data.frame(date.time = c("Fri 00:00", "Fri 23:30", "Mon 00:00", "Mon 23:30",
"Sat 00:00", "Sat 23:30", "Sun 00:00", "Sun 23:30",
"Thu 00:00", "Thu 23:30", "Tue 00:00", "Tue 23:30",
"Wed 00:00", "Wed 23:30"),
Price = c(36.15368, 41.61206, 30.80412, 37.47360, 38.04516, 35.72798,
33.05613, 32.65447, 35.50335, 41.81241, 35.14006, 37.56432,
35.04553, 38.00721))
The date.time values are of class character and the Price values are of class numeric. I would like to plot the data using ggplot. The problem is that the data is in the wrong order. I would like the order to be Sun, Mon, ..., Sat.
I have attempted to do this using the following code:
my.order <- c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line()
but I end up getting a strange order that begins at the 'Tue' row of the original data frame. What am I doing wrong?
I would also like to label the x axis, so I have tried the following code:
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
But the labels end up in the order of the original data set, while the plot is ordered beginning on 'Tue' as above. How can I get both the data and labels to appear in the order I would like?
Edit: I think it might have something to do with the levels. Running the following code
df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)]
results in the following output
[1] Sun 00:00 Sun 23:30 Mon 00:00 Mon 23:30 Tue 00:00 Tue 23:30 Wed 00:00 Wed 23:30
[9] Thu 00:00 Thu 23:30 Fri 00:00 Fri 23:30 Sat 00:00 Sat 23:30
14 Levels: Tue 00:00 Tue 23:30 Mon 00:00 Mon 23:30 Wed 00:00 Wed 23:30 ... Sun 23:30
Not sure why.
Your code actually does what you ask of it in the first part of your problem: following the order of the rows in df, you assigned positions 1 and 2 to the two Tue values, which is why ggplot2 plots them first.
You can see the number associated with each element by running the following:
my.order <- c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)
reorder(df$date.time, my.order)
You can use this vector for my.order instead:
my.order <- c(11,12,3,4,13,14,1,2,9,10,5,6,7,8)
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line()
The difference with the method df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)] is that in your first reorder method, you associate a position to each element of your vector (i.e. 1st element has position 7, 2nd element has position 8, etc.) whereas, in the square bracket method you define the order in which elements in your vector come up (i.e. 7th element comes 1st, 8th element comes 2nd, etc.).
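To make the distinction concrete, here is a small sketch in base R (the names x, idx, pos and f are introduced here for illustration and are not part of the question). order() converts the "which element comes first" indices used with square brackets into the "which position does each element get" vector that reorder() expects, because the two are inverse permutations.

```r
# Labels in the row order of the question's data frame.
x <- c("Fri 00:00", "Fri 23:30", "Mon 00:00", "Mon 23:30",
       "Sat 00:00", "Sat 23:30", "Sun 00:00", "Sun 23:30",
       "Thu 00:00", "Thu 23:30", "Tue 00:00", "Tue 23:30",
       "Wed 00:00", "Wed 23:30")

# Square-bracket indices: element 7 comes 1st, element 8 comes 2nd, ...
idx <- c(7, 8, 3, 4, 11, 12, 13, 14, 9, 10, 1, 2, 5, 6)
x[idx]

# Positions for reorder(): the inverse permutation of idx.
pos <- order(idx)
pos

# Using pos as the ordering score puts the levels in Sun..Sat order.
f <- reorder(factor(x), pos)
levels(f)
```

Here pos comes out as the corrected my.order vector given above, so the same result is reached without working out the positions by hand.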
You will find that using the square bracket method in your ggplot call won't help, because the order of the rows in your dataframe does not matter to ggplot2: a character column is plotted in alphabetical order, and a factor column is plotted in the order of its levels.
However, if you use factors (the default for strings in data.frame() before R 4.0.0; in later versions, pass stringsAsFactors = TRUE or convert the column yourself), you can order their levels:
df$date.time <- ordered(df$date.time,
levels = df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)])
# see the new ordered levels
levels(df$date.time)
# visualise as is, ggplot2 uses ordered levels
df %>%
ggplot(aes(x = date.time, y = Price, group = 1)) +
geom_line()
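If typing the index vector by hand feels error-prone, the levels can also be derived from the labels themselves. The following is a minimal sketch in base R, assuming every label has the form "Ddd HH:MM"; the helper name week_levels is hypothetical.

```r
week_levels <- function(labels,
                        days = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")) {
  labels <- as.character(labels)   # also accept a factor column
  day  <- substr(labels, 1, 3)     # "Sun", "Mon", ...
  time <- substr(labels, 5, 9)     # "00:00", "23:30", ...
  # Sort by the position of the day in `days`, then by time within the day.
  unique(labels[order(match(day, days), time)])
}

date.time <- c("Fri 00:00", "Fri 23:30", "Mon 00:00", "Mon 23:30",
               "Sat 00:00", "Sat 23:30", "Sun 00:00", "Sun 23:30",
               "Thu 00:00", "Thu 23:30", "Tue 00:00", "Tue 23:30",
               "Wed 00:00", "Wed 23:30")

f <- factor(date.time, levels = week_levels(date.time))
levels(f)
```

The resulting factor can be used as the x aesthetic in the same way as the ordered() column above.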
For your labels, as the ordering of levels has not changed the order of your data in your dataframe, you still have to refer to their original position. But if you want your original code to work, you can add a step to reorganise your whole dataframe according to the ordered levels:
library(dplyr)
df <- df %>%
arrange(date.time)
The dplyr::arrange() function will take the ordered levels into account, and your rows are now ordered as expected.
Your original labelling method should then work fine:
df %>%
ggplot(aes(x = date.time, y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
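The breaks can also be taken from the factor levels rather than from row positions, which avoids having to sort the dataframe first. A sketch in base R, assuming the levels are already in Sun..Sat order (the names lev, day_breaks and day_labels are illustrative):

```r
# Stand-in for levels(df$date.time) after the levels have been ordered.
lev <- c("Sun 00:00", "Sun 23:30", "Mon 00:00", "Mon 23:30",
         "Tue 00:00", "Tue 23:30", "Wed 00:00", "Wed 23:30",
         "Thu 00:00", "Thu 23:30", "Fri 00:00", "Fri 23:30",
         "Sat 00:00", "Sat 23:30")

# One break per day: the first label that appears for each day.
day_breaks <- lev[!duplicated(substr(lev, 1, 3))]

# Label each break with its three-letter day abbreviation.
day_labels <- substr(day_breaks, 1, 3)

day_breaks
day_labels
```

These two vectors could then be passed as breaks = day_breaks and labels = day_labels to scale_x_discrete().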
To get Sunday to appear first do this:
df$date.time <- reorder(df$date.time, my.order)
df %>%
ggplot(aes(x = as.character(date.time), y = Price, group = 1)) +
geom_line()
No idea why, but making it a character sorts out the re-order issue.
EDIT: with as.character() it looks like the labels work as well?
df %>%
ggplot(aes(x = as.character(date.time), y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
I am trying to plot multiple time-periods on the same time-series graph by month. This is my data: https://pastebin.com/458t2YLg. I was trying to avoid dput() example but I think it would have caused confusion to reduce the sample and still keep the structure of the original data. Here is basically a glimpse of how it looks like:
date fl_all_cumsum
671 2015-11-02 0.785000
672 2015-11-03 1.046667
673 2015-11-04 1.046667
674 2015-11-05 1.099000
675 2015-11-06 1.099000
676 2015-11-07 1.099000
677 2015-11-08 1.151333
Basically, it is daily data that spans several years. My goal is to compare the cumulative snow gliding (fl_all_cumsum) of several winter seasons (
It is very similar to this: ggplot: Multiple years on same plot by month however, there are some differences, such as: 1) the time periods are not years but winter seasons (1.10.xxxx - 6.30.xxxx+1); 2) Because I care only about the winter periods I would like the x-axis to go only from October to end of June the following year; 3) the data is not consistent (there are a lot of NA gaps during the months).
I managed to produce this:
library(zoo)
library(lubridate)
library(ggplot2)
library(scales)
library(patchwork)
library(dplyr)
library(data.table)
startTime <- as.Date("2016-10-01")
endTime <- as.Date("2017-06-30")
start_end <- c(startTime,endTime)
ggplot(data = master_dataset, aes(x = date, y = fl_all_cumsum))+
geom_line(size = 1, na.rm=TRUE)+
ggtitle("Cumulative Seasonal Gliding Distance")+
labs(color = "")+
xlab("Month")+
ylab("Accumulated Distance [mm]")+
scale_x_date(limits=start_end,breaks=date_breaks("1 month"),labels=date_format("%d %b"))+
theme(axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
axis.text.y = element_text(size = 10, vjust = 0.5),
panel.background = element_rect(fill = "gray100"),
plot.background = element_rect(fill = "gray100"),
panel.grid.major = element_line(colour = "lightblue"),
plot.margin = unit(c(1, 1, 1, 1), "cm"),
plot.title = element_text(hjust = 0.5, size = 22))
This actually works well visually, as the x axis goes from October to June as desired; however, I did it by setting limits,
startTime <- as.Date("2016-10-01")
endTime <- as.Date("2017-06-30")
start_end <- c(startTime,endTime)
and then setting breaks of 1 month.
scale_x_date(limits=start_end,breaks=date_breaks("1 month"),labels=date_format("%d %b"))+
It is needless to say that this technique will not work if I would like to include other winter seasons and a legend.
I also tried to assign a season to certain time periods and then use them as a factor:
master_dataset <- master_dataset %>%
mutate(season = case_when(date>=as.Date('2015-11-02')&date<=as.Date('2016-06-30')~"season 2015-16",
date>=as.Date('2016-11-02')&date<=as.Date('2017-06-30')~"season 2016-17",
date>=as.Date('2017-10-13')&date<=as.Date('2018-06-30')~"season 2017-18",
date>=as.Date('2018-10-18')&date<=as.Date('2019-06-30')~"season 2018-19"))
ggplot(master_dataset, aes(month(date, label=TRUE, abbr=TRUE), fl_all_cumsum, group=factor(season),colour=factor(season)))+
geom_line()+
labs(x="Month", colour="Season")+
theme_classic()
As you can see, I managed to include the other seasons in the graph but there are several issues now:
1) Grouped by month, it aggregates the daily values and I lose the daily dynamic in the graph (look how it is based on monthly steps).
2) The x-axis goes in chronological order, which messes up my visualization (remember I care about the winter season development, so I need the x-axis to go from October to the end of June; see the first graph I produced).
3) Not a big issue, but because the data has NA gaps, the legend also shows a factor "NA".
I am not a programmer so I can't wrap my mind around on how to code for such an issue. In a perfect world, I would like to have something like the first graph I produced but with all winter seasons included and a legend. Does someone have a solution for this? Thanks in advance.
Zorin
This is indeed kind of a pain and rather fiddly. I create "fake dates" that are the same as your date column, but the year is set to 2015/2016 (using 2016 for the dates that will fall in February so leap days are not lost). Then we plot all the data, telling ggplot that it's all 2015-2016 so it gets plotted on the same axis, but we don't label the year. (The season labels are used and are not "fake".)
## Configure some constants:
start_month = 10 # first month on x-axis
end_month = 6 # last month on x-axis
fake_year_start = 2015 # year we'll use for start_month-December
fake_year_end = fake_year_start + 1 # year we'll use for January-end_month
fake_limits = c( # x-axis limits for plot
ymd(paste(fake_year_start, start_month, "01", sep = "-")),
ceiling_date(ymd(paste(fake_year_end, end_month, "01", sep = "-")), unit = "month")
)
df = df %>%
mutate(
## add (real) year and month columns
year = year(date),
month = month(date),
## add the year for the season start and end
season_start = ifelse(month >= start_month, year, year - 1),
season_end = season_start + 1,
## create season label
season = paste(season_start, substr(season_end, 3, 4), sep = "-"),
## add the appropriate fake year
fake_year = ifelse(month >= start_month, fake_year_start, fake_year_end),
## make a fake_date that is the same as the real date
## except set all the years to the fake_year
fake_date = date,
fake_date = "year<-"(fake_date, fake_year)
) %>%
filter(
## drop irrelevant data
month >= start_month | month <= end_month,
!is.na(fl_all_cumsum)
)
ggplot(df, aes(x = fake_date, y = fl_all_cumsum, group = season,colour= season))+
geom_line()+
labs(x="Month", colour = "Season")+
scale_x_date(
limits = fake_limits,
breaks = scales::date_breaks("1 month"),
labels = scales::date_format("%d %b")
) +
theme_classic()
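The fake-date mapping described in this answer can also be checked in isolation, without the plot. Below is a minimal base R sketch of the same idea; the helper names to_fake_date and season_label are hypothetical, and the fake years 2015/2016 are the same arbitrary choice as above.

```r
# Map a real date onto a shared "fake" season so all winters share one axis.
# Assumption: a season starts in `start_month`. 2016 is a leap year, so a
# 29 February survives the mapping.
to_fake_date <- function(date, start_month = 10, fake_year_start = 2015) {
  lt <- as.POSIXlt(date)
  month <- lt$mon + 1                        # POSIXlt months are 0-based
  fake_year <- ifelse(month >= start_month, fake_year_start, fake_year_start + 1)
  as.Date(sprintf("%d-%02d-%02d", fake_year, month, lt$mday))
}

# Real season label, e.g. "2016-17" for any date in winter 2016/2017.
season_label <- function(date, start_month = 10) {
  lt <- as.POSIXlt(date)
  year <- lt$year + 1900                     # POSIXlt years count from 1900
  season_start <- ifelse(lt$mon + 1 >= start_month, year, year - 1)
  sprintf("%d-%02d", season_start, (season_start + 1) %% 100)
}

d <- as.Date(c("2015-11-02", "2017-01-15", "2020-02-29"))
to_fake_date(d)
season_label(d)
```

Dates from different winters land on the same 2015/2016 axis, while the season label keeps the real years for the legend.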
I have some time series data with quarterly frequency, as below.
I'm using geom_tile to create a heatmap of this time series data, but the issue I have now is that the labels on the x axis default to years even though the data is quarterly.
I expected labels such as 2014 Q1, 2020 Q4, as in the dataset.
set.seed(1990)
ID <- rep(c('A','B','C'),each = 84)
n <- rep(round(runif(84,1,4)), 3)
datetime <- rep(seq(as.POSIXct("2014-01-01"), as.POSIXct("2020-12-01"), by="month"), 3)
df <- tibble(ID,n, datetime)
df <- df %>%
#mutate(yearweek = tsibble::yearweek(datetime)) %>%
mutate(yearquarter = zoo::as.yearqtr(datetime)) %>%
#group_by(ID, yearweek) %>%
group_by(ID, yearquarter) %>%
summarise(n = sum(n))
df
ggplot(df, aes(y = ID, x = yearquarter, fill = n)) +
geom_tile(color = 'gray')
Normally I can easily control a monthly dataset with scale_x_date as below, but using it with quarterly data throws Error: Invalid input: date_trans works with objects of class Date only.
scale_x_date(expand = c(0,0),breaks = seq(as.Date("2014-07-01"), as.Date("2020-12-01"), by = "1 month"), date_labels = "%Y %b", name = 'Monthly')
I'm using tsibble::yearweek to get weekly aggregation and zoo::as.yearqtr for quarterly aggregation.
But the issue is that, when it comes to plotting, ggplot may not support them. So is there a more consistent approach to dealing with time series data with multiple frequencies in R/ggplot?
Since your variable is of zoo's yearqtr class (created with as.yearqtr), use zoo's scale_x_yearqtr to format the x-axis.
library(ggplot2)
ggplot(df,aes(y=ID,x= yearquarter,fill=n))+
geom_tile(color = 'gray') +
zoo::scale_x_yearqtr(format = '%Y Q%q')
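On the broader question of a consistent approach across frequencies, one option (a sketch, not the only way) is to keep every period as a plain Date at the start of the period and to supply a labelling function to scale_x_date. The helpers quarter_start and quarter_label below are hypothetical names, written in base R, and they assume the input is a Date vector.

```r
# Round a Date down to the first day of its quarter.
quarter_start <- function(d) {
  lt <- as.POSIXlt(d)
  first_month <- 3 * (lt$mon %/% 3) + 1      # 1, 4, 7 or 10
  as.Date(sprintf("%d-%02d-01", lt$year + 1900, first_month))
}

# Build labels such as "2014 Q1" from a Date.
quarter_label <- function(d) {
  paste(format(d, "%Y"), quarters(d))
}

d <- as.Date(c("2014-01-01", "2014-05-17", "2020-12-01"))
q <- quarter_start(d)
q
quarter_label(q)
```

With a Date column built this way, scale_x_date(labels = quarter_label) would label the axis by quarter, and the same pattern could be reused for weekly data with a different rounding helper.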
I have data that looks as follows:
Date Time_finished
4/3/2020 16:30:21
4/6/2020 16:43:29
4/7/2020 16:28:47
4/8/2020 16:30:38
4/9/2020 16:50:01
I would like to plot a line chart with the date on the x axis and the time finished on the y axis, to show a time series graph. For some reason this does not seem to be working. The Date column is saved as Date but the time is a factor; does this also need to be a date?
I have tried normal plot but having no luck.
Thanks
Like this?
df <- tibble::tribble(
~Date, ~Time_finished,
"4/3/2020", "16:30:21",
"4/6/2020", "16:43:29",
"4/7/2020", "16:28:47",
"4/8/2020", "16:30:38",
"4/9/2020", "16:50:01"
)
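library(tidyverse)
library(scales) # date_breaks() and date_format() come from scales
df %>%
  # %Y reads the four-digit year ("2020"); %y would only read two digits
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y"),
         # a time without a date is parsed onto the current date
         Time_finished = as.POSIXct(Time_finished, format = "%H:%M:%S")) %>%
  ggplot(aes(x = Date, y = Time_finished, group = 1)) +
  geom_line() +
  scale_y_datetime(breaks = date_breaks("10 min"),
                   minor_breaks = date_breaks("2 min"),
                   labels = date_format("%Hh %Mm %Ss"))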
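One caveat with parsing a time on its own is that as.POSIXct() fills in the current date, so the y values depend on the day the script runs. A minimal sketch of an alternative in base R, assuming it is acceptable to anchor every time on a fixed dummy date (1970-01-01 and UTC are arbitrary choices made here, not part of the question):

```r
dates <- c("4/3/2020", "4/6/2020", "4/7/2020", "4/8/2020", "4/9/2020")
times <- c("16:30:21", "16:43:29", "16:28:47", "16:30:38", "16:50:01")

# Four-digit years need %Y.
date <- as.Date(dates, format = "%m/%d/%Y")

# Anchor each time on the same dummy day so the values are reproducible.
time_finished <- as.POSIXct(paste("1970-01-01", times),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Seconds since midnight, usable as a plain numeric y value.
secs <- as.numeric(time_finished)

data.frame(date, time_finished, secs)
```

Either time_finished (with scale_y_datetime) or secs (with a numeric scale) could then go on the y axis.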
I am looking at data from Nov to April and would like to have a plot starting from Nov to April. Below is my sample code to screen out the months of interest.
library(tidyverse)
library(lubridate) # year(), month(), day(); attached by tidyverse only from version 2.0.0
mydata = data.frame(seq(as.Date("2010-01-01"), to=as.Date("2011-12-31"),by="days"), A = runif(730,10,50))
colnames(mydata) = c("Date", "A")
DF = mydata %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(Month == 11 | Month == 12 | Month == 01 | Month == 02 | Month == 03 | Month == 04)
I tried to re-order the data starting at month 11 followed by month 12 and then month 01,02,03,and,04. I used the code factor(Month, levels = c(11,12,01,02,03,04)) along with the code above but it didn't work.
I wanted a plot that starts at month Nov and ends on April. The following code gave me attached plot
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity")+ facet_wrap(~Year, ncol = 2)
Right now, the plot starts at January and runs all the way to December, which I don't want. I want the plot to start at November and run through April. I tried to label the plot using scale_x_date(labels = date_format("%b", date_breaks = "month", name = "Month") which didn't work. Any help would
I converted Month to character before applying factor() and it worked.
DF = mydata %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(Month %in% c(11, 12, 1, 2, 3, 4)) %>%
mutate(Month = sprintf("%02d", Month)) %>%
mutate(Month = factor(Month, levels = c("11","12","01","02","03","04")))
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity")+ facet_wrap(~Year, ncol = 2)
user2332849's answer is close but introduces an error: the bars are not in the correct order. For example, for 2010 the plot shows November's and December's data before the data from the beginning of that year. To plot in the proper order, the year needs adjusting so that each panel starts at month 11 and runs to month 4.
#Convert month to Factor and set desired order
DF$Month<- factor(DF$Month, levels=c(11, 12, 1, 2, 3, 4))
#Adjust the year to match the year of the beginning of series
#For example assign Jan, Feb, Mar and April to prior year
DF$Year<-ifelse(as.integer(as.character(DF$Month)) <6, DF$Year-1, DF$Year)
#plot
ggplot(data = DF, aes(Month,A))+
geom_bar(stat = "identity") +
facet_wrap(~Year, ncol = 3)
In the plot below, the first 4 months of 2010 are shifted to become the last 4 periods of the prior year, and the last 2 months of 2011 are ready to be followed by the first 4 months of 2012.
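The same two steps can be tried on a handful of values without plotting. This sketch uses base R only; season_months, month_f and season_year are illustrative names, and month.abb is R's built-in vector of month abbreviations, used here so that the axis would show Nov, Dec, Jan, ... instead of numbers.

```r
month <- c(1, 4, 11, 12, 2)
year  <- c(2010, 2010, 2010, 2010, 2011)

# Desired axis order: November first, April last.
season_months <- c(11, 12, 1, 2, 3, 4)
month_f <- factor(month, levels = season_months,
                  labels = month.abb[season_months])

# January to April belong to the season that began in the previous year.
season_year <- ifelse(month < 6, year - 1, year)

data.frame(year, month, month_f, season_year)
```

Faceting on season_year and plotting month_f on the x axis then gives one panel per Nov to Apr season.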
I am analyzing day-to-day data to see when the value is lower. I set each day as a categorical variable so I can differentiate the days. But I want each day plotted on top of the others instead of one continuous graph as shown below.
Data set:
Value Day
2013-01-03 01:55:00 0.35435715 1
2013-01-03 02:00:00 0.33018654 1
2013-01-03 02:05:00 0.38976118 1
2013-01-04 02:10:00 0.45583868 2
2013-01-04 02:15:00 0.29290860 2
My current ggplot code is as follows:
g <- ggplot(data = Data, aes(x = Index, color = Dates)) +
geom_line(y = Data$Value) +
scale_x_datetime(date_breaks = TimeIntervalForGraph, date_labels = "%H") +
xlab("Time") +
ylab("Random value")
I would really appreciate it if anyone could guide me on how to turn my x-axis into a 24-hour time series, so that I can plot each day on the same graph and see when the value is lower during the 24 hours. Thanks in advance.
Method tried:
I tried creating a third column with the time only, but for some reason the following code didn't work:
time <- format(index(x), format = "%H:%M"))
data <- cbind(data, time)
You need a way of summarising the data for each hour of the day. Here are some approaches you're probably looking for:
library(xts)
library(data.table)
library(ggplot2)
tm <- seq(as.POSIXct("2017-08-08 17:30:00"), by = "5 mins", length.out = 10000)
z <- xts(runif(10000), tm, dimnames = list(NULL, "vals"))
DT <- data.table(time = index(z), coredata(z))
# note the data.table syntax is different:
DT[, hr := hour(time)]
# Plot the average value by hour:
datByHour <- DT[, list(avgval = mean(vals)), by = c("hr")]
# Use line plot if you have one point per hour:
g <- ggplot(data = datByHour, aes(x = hr, y = avgval, colour = avgval)) +
geom_line()
# visualise the distribution by hour:
g2 <- ggplot(data = DT, aes(x = hr, y = vals, group = hr)) +
geom_boxplot()
Please try the following and let me know if it works (here I am taking tm time column as given):
Data$tm = strftime(Data$tm, format="%H:%M:%S")
library(ggplot2)
ggplot(Data, aes(x = tm, y = Value, group = Day, colour = Day)) +
geom_line() +
theme_classic()
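Because strftime() returns character strings, the x axis in the code above is discrete. If a continuous 24-hour axis is preferred, the time of day can be expressed as a number instead. A sketch in base R, assuming the timestamps are POSIXct (the sample values are taken from the question; the names ts, hour_of_day and day are made up here, and UTC is an arbitrary choice):

```r
ts <- as.POSIXct(c("2013-01-03 01:55:00", "2013-01-03 02:00:00",
                   "2013-01-04 02:10:00", "2013-01-04 02:15:00"),
                 tz = "UTC")

# Break the timestamp into components and express it as decimal hours.
lt <- as.POSIXlt(ts, tz = "UTC")
hour_of_day <- lt$hour + lt$min / 60 + lt$sec / 3600

# Calendar day as a string, to use as the grouping variable.
day <- format(ts, "%Y-%m-%d")

data.frame(ts, day, hour_of_day)
```

Plotting hour_of_day on x with group = day and colour = day would then overlay each day on a shared 0 to 24 axis.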