Filling missing date months in a time series by Item - r

I am fairly new to R and I am trying to do the following task:
I have the following dataset:
df1 <- data.frame(ITEM = c("A","A","A","A","A","B","B","B","B","B"),
Date = c("Jan-2020","Feb-2020","May-2020","Jun-2020","Jul-2020","Jan-2020","Apr-2020","Jun-2020","Jul-2020","Aug-2020"))
I have used the zoo library to convert the Date column to yearmon, and I am trying to create rows for the missing yearmon dates within each ITEM (for example, Mar-2020 and Apr-2020 for item A).
Does anyone have an idea how I can do this?
Thank you

You can create a sequence of yearmon objects for each ITEM and use it in complete.
library(dplyr)
library(zoo)
library(tidyr)
df1 %>%
mutate(Date = as.yearmon(Date, '%b-%Y')) %>%
group_by(ITEM) %>%
complete(Date = seq(min(Date), max(Date), 1/12)) %>%
ungroup()
# ITEM Date
# <chr> <yearmon>
# 1 A Jan 2020
# 2 A Feb 2020
# 3 A Mar 2020
# 4 A Apr 2020
# 5 A May 2020
# 6 A Jun 2020
# 7 A Jul 2020
# 8 B Jan 2020
# 9 B Feb 2020
#10 B Mar 2020
#11 B Apr 2020
#12 B May 2020
#13 B Jun 2020
#14 B Jul 2020
#15 B Aug 2020
If you want a sequence of Date objects instead, you can use:
df1 %>%
mutate(Date = as.Date(as.yearmon(Date, '%b-%Y'))) %>%
group_by(ITEM) %>%
complete(Date = seq(min(Date), max(Date), 'month')) %>%
ungroup()
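For comparison, the same gap-filling can be sketched in base R, without zoo or tidyr. This is only an illustration under stated assumptions: the month abbreviations are assumed to be English (they are matched against month.abb rather than parsed with a locale-dependent format), and the name filled is made up for the example.

```r
df1 <- data.frame(
  ITEM = c("A","A","A","A","A","B","B","B","B","B"),
  Date = c("Jan-2020","Feb-2020","May-2020","Jun-2020","Jul-2020",
           "Jan-2020","Apr-2020","Jun-2020","Jul-2020","Aug-2020"),
  stringsAsFactors = FALSE
)

# Parse "Jan-2020" without relying on the locale: look the month up in month.abb
parts <- strsplit(df1$Date, "-", fixed = TRUE)
mon   <- match(vapply(parts, `[`, character(1), 1), month.abb)
yr    <- as.integer(vapply(parts, `[`, character(1), 2))
df1$Date <- as.Date(sprintf("%d-%02d-01", yr, mon))

# For each ITEM, build the full monthly sequence between its first and last date
# ("filled" is a hypothetical name used only in this sketch)
filled <- do.call(rbind, lapply(split(df1, df1$ITEM), function(g) {
  data.frame(ITEM = g$ITEM[1],
             Date = seq(min(g$Date), max(g$Date), by = "month"),
             stringsAsFactors = FALSE)
}))
rownames(filled) <- NULL
filled
```

Any other columns would have to be merged back in separately, which is what complete() does for you in the tidyr version.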


Extracting a month from date using lubridate - renaming problem

I have this dataframe
df=data.frame("temp"=c(60.80,46.04,26.96,24.98),"humid"=c(93.79,53.33,50.34,54.65),"wind_speed"=c(40.27,39.12,14.96, 13.81), "date" =c("2013-01-31","2013-01-31","2013-02-02", "2013-02-02"))
df$date <- as.Date(df$date, "%Y-%m-%d")
View(df)
# after the code above, I need to do something else with my data, then I transform the dataframe.
df_mod<- cbind(df[4], stack(df[1:3]))
df_mod$date<- as.data.frame(df_mod$date) %>%
mutate(Month = lubridate::month(df_mod$date, label = TRUE))
View(df_mod)
I want to create a column named 'Month'. The code above does what I want; however, it renames my 'date' column to 'date$df_mod$date', and my new month column is named 'date$Month'.
Is there a better way to do it?
Instead of as.data.frame(df_mod$date), use select(date).
library(dplyr)
library(lubridate)
df_mod %>%
select(date) %>%
mutate(Month = month(date, label = TRUE))
Or combine the select and mutate steps using transmute():
df_mod %>%
transmute(
date,
Month = month(date, label = TRUE)
)
Or use mutate() with .keep = "used":
df_mod %>%
mutate(
Month = month(date, label = TRUE),
.keep = "used"
)
Output from all three approaches:
date Month
1 2013-01-31 Jan
2 2013-01-31 Jan
3 2013-02-02 Feb
4 2013-02-02 Feb
5 2013-01-31 Jan
6 2013-01-31 Jan
7 2013-02-02 Feb
8 2013-02-02 Feb
9 2013-01-31 Jan
10 2013-01-31 Jan
11 2013-02-02 Feb
12 2013-02-02 Feb
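If you would rather not depend on lubridate, a rough base R equivalent can be sketched as follows. It assumes English month labels are acceptable, since it uses the built-in month.abb constant, whereas month(label = TRUE) takes its labels from the locale. The names dates, month_num and Month are only for illustration.

```r
dates <- as.Date(c("2013-01-31", "2013-01-31", "2013-02-02", "2013-02-02"))

# as.POSIXlt()$mon counts months from 0 (January), hence the + 1
month_num <- as.POSIXlt(dates)$mon + 1

# An ordered factor with all twelve levels, so months sort in calendar order
Month <- factor(month.abb[month_num], levels = month.abb, ordered = TRUE)

data.frame(date = dates, Month = Month)
```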

can't convert month number to month date in R [duplicate]

I would like to create a ggplot with different tree types in Spain.
I used this code:
library(dplyr)
library(reshape)
set.seed(123)
library(ggplot2)
library(tidyr)
df_long <- pivot_longer(df7,
cols = c(Birch, Palm, Oak),
values_to = "m3",
names_to = "Trees")
# Plot
ggplot(df_long,
aes(
x = Month,
y = Integral,
color = Trees
)) +
geom_line() +
ggtitle("trees in Spain") +
xlab("Month") + scale_x_continuous(breaks = seq(1, 12, by = 1), limits = c(1,12)) +
ylab(" m3")
Unfortunately, the month names are not shown, only the numbers, but I would like to have the month names.
If your months are integers, you can use the built-in constants month.abb and month.name:
library(dplyr)
df <- data.frame(month_nums = 1:12)
df |>
mutate(
month_abb = month.abb[month_nums],
month_full = month.name[month_nums]
)
# month_nums month_abb month_full
# 1 1 Jan January
# 2 2 Feb February
# 3 3 Mar March
# 4 4 Apr April
# 5 5 May May
# 6 6 Jun June
# 7 7 Jul July
# 8 8 Aug August
# 9 9 Sep September
# 10 10 Oct October
# 11 11 Nov November
# 12 12 Dec December
If they are dates you can use format():
df <- data.frame(
month = seq(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "month")
)
df |>
mutate(
month_abb = format(month, "%b"),
month_full = format(month, "%B")
)
# month month_abb month_full
# 1 2020-01-01 Jan January
# 2 2020-02-01 Feb February
# 3 2020-03-01 Mar March
# 4 2020-04-01 Apr April
# 5 2020-05-01 May May
# 6 2020-06-01 Jun June
# 7 2020-07-01 Jul July
# 8 2020-08-01 Aug August
# 9 2020-09-01 Sep September
# 10 2020-10-01 Oct October
# 11 2020-11-01 Nov November
# 12 2020-12-01 Dec December
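To get the names onto the plot axis itself, one option is to keep Month numeric and pass month.abb as the labels of the continuous x scale. The sketch below uses a small made-up df_long (the original df7 is not shown, so the values here are invented) and only builds the plot if ggplot2 is installed.

```r
# Hypothetical stand-in for the question's data: one row per month and tree type
df_long <- data.frame(
  Month = rep(1:12, times = 2),
  Trees = rep(c("Birch", "Oak"), each = 12),
  m3    = c(seq(10, 120, by = 10), seq(5, 60, by = 5))
)

month_breaks <- 1:12
month_labels <- month.abb[month_breaks]  # "Jan", "Feb", ..., "Dec"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_long, aes(x = Month, y = m3, color = Trees)) +
    geom_line() +
    # Same breaks as before, but labelled with month names instead of numbers
    scale_x_continuous(breaks = month_breaks, labels = month_labels,
                       limits = c(1, 12)) +
    labs(title = "trees in Spain", x = "Month", y = "m3")
  # print(p) to draw the plot when running this as a script
}
```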

How to obtain an ordered object for a date variable that has been formatted?

I would like to format my date variable to %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and sorting the variable from the earliest date to the latest date would not be possible (e.g. 05 May 2020 is sorted before 26 Apr 2020).
Data:
df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,
1588291200, 1588377600, 1588118400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29
Here is what sorting a formatted date variable looks like:
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(Date)
# Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020
So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.
df %>%
mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>%
arrange(Date)
# Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
Edit:
Actually, the reason for wanting to format and arrange it is so that I have direct access to more readable date formats when building my dashboard for my users.
With ggplot(), even after you arrange and then mutate with format, the faceted plots are always shown in sorted character order. Example below:
df %>%
arrange(Date) %>%
mutate(n = 1:n(),
Date = format(Date, "%d %b %Y")) %>%
ggplot() +
geom_bar(aes(x = n)) +
facet_wrap(~Date)
If you want to use formatted dates in plots, the main idea is to set the factor levels in the order in which you want the data shown. Arrange the dates first, then assign factor levels in order of appearance.
library(dplyr)
library(ggplot2)
df %>%
arrange(Date) %>%
mutate(n = row_number(),
Date = format(Date, "%d %b %Y"),
Date = factor(Date, levels = unique(Date))) %>%
ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
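The factor-level idea can be checked in base R without drawing anything. This is a minimal sketch using the dates from the question as plain Date values. The names labels and date_f are made up for the example, and the printed month abbreviations depend on your locale.

```r
dates <- as.Date(c("2020-04-30", "2020-04-27", "2020-05-03", "2020-04-28",
                   "2020-05-01", "2020-05-02", "2020-04-29"))

labels <- format(dates, "%d %b %Y")

# Levels follow the chronological order of the underlying dates, not the alphabet
date_f <- factor(labels, levels = unique(labels[order(dates)]))
levels(date_f)

# Sorting by the factor now gives the same order as sorting by the real dates
identical(order(date_f), order(dates))
```

Because facet_wrap() lays out panels in level order, a factor built this way keeps the panels chronological.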
My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:
df %>%
arrange(Date) %>%
mutate(Date = format(Date, "%d %b %Y"))
Giving:
Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020
Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(as.Date(Date, format = "%d %b %Y"))
Personally, I prefer the tidyverse solution for dates - lubridate. Here:
library(lubridate)
df %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,
ymd_hms("20150102 12:23:01")
As the example shows, we do not have to worry about the separator. If you have access, there is a nice paper on the package; otherwise, there are many lubridate tutorials available.

R - approximate missing month values using zoo package

Consider code:
library('zoo')
data <- c(1, 2, 4, 6)
dates <- c("2016-11-01", "2016-12-01", "2017-02-01", "2017-04-01");
z1 <- zoo(data, as.yearmon(dates))
z2 <- na.approx(z1)
Variable z2 looks like this:
nov 2016 dec 2016 feb 2017 apr 2017
1 2 4 6
But I need z2 to be similar to this:
nov 2016 dec 2016 jan 2017 feb 2017 mar 2017 apr 2017
1 2 3 4 5 6
I just need to approximate values for months where value is missing. Thanks for any hints.
With the new as.zoo argument, calendar, in zoo 1.8 (which defaults to TRUE so we don't have to specify it) we can just convert the input to "ts" and then back to "zoo" again applying na.approx after that:
na.approx(as.zoo(as.ts(z2)))
## Nov 2016 Dec 2016 Jan 2017 Feb 2017 Mar 2017 Apr 2017
## 1 2 3 4 5 6
With prior versions of zoo we can do the same but manually convert the index back to "yearmon":
na.approx(aggregate(as.zoo(as.ts(z2)), as.yearmon, c))
magrittr
Using zoo with magrittr these can be expressed as the following pipelines, respectively:
library(magrittr)
z2 %>% as.ts %>% as.zoo %>% na.approx
z2 %>% as.ts %>% as.zoo %>% aggregate(as.yearmon, c) %>% na.approx
One way using just na.approx and base R:
#add your data and dates together
df <- data.frame(data, dates = as.Date(dates))
#create all dates using seq
new_dates <- data.frame(dates = seq(as.Date(dates[1]), as.Date(dates[4]), by = 'month'))
#merge the two and then na.approx
new_df <- merge(new_dates, df, by = 'dates', all.x = TRUE)
na.approx(new_df$data)
Out:
[1] 1 2 3 4 5 6
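The interpolation can also be sketched without zoo, using approx() from base R. The assumption here is that months should count as equally spaced, so each date is first turned into a running month count. The names values, month_index, all_months and filled are made up for this example.

```r
values <- c(1, 2, 4, 6)
dates  <- as.Date(c("2016-11-01", "2016-12-01", "2017-02-01", "2017-04-01"))

# Express each date as a running month count so that months are equally spaced
lt <- as.POSIXlt(dates)
month_index <- (lt$year + 1900) * 12 + lt$mon

# Every month between the first and last observation
all_months <- seq(min(month_index), max(month_index))

# Linear interpolation at the missing months
filled <- approx(x = month_index, y = values, xout = all_months)$y
filled
```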

Aggregate data to weekly level with every week starting from Monday

I have a data frame like,
2015-01-30 1 Fri
2015-01-31 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun
I want to aggregate it to weekly level such that every week starts on Monday and ends on Sunday. So, in the aggregated data for the above, the first week should end on 2015-02-01.
The output for the above should look something like this:
firstweek 6
secondweek 7
I tried this,
data <- as.xts(data$value,order.by=as.Date(data$interval))
weekly <- apply.weekly(data,sum)
But here in the final result, every week is starting from Sunday.
This should work. I've called the data frame m and named the columns, possibly differently from yours.
library(dplyr) # install.packages("dplyr")
colnames(m) <- c("Date", "count", "Day")
# A Monday on or before the earliest date in the data
start <- as.Date("2015-01-26")
# Whole weeks elapsed since that Monday, numbered from 1
m$Week <- as.numeric(floor(unclass(as.Date(m$Date) - start) / 7) + 1)
m %>% group_by(Week) %>% summarise(count = sum(count))
The dplyr library is great for data manipulation; the week-number calculation itself is just a rough hack.
Convert to Date and use the %W format, which numbers the weeks of the year with Monday as the first day of the week:
df <- read.csv(textConnection("2015-01-30, 1, Fri
2015-01-31, 2, Sat
2015-02-01, 3, Sun
2015-02-02, 1, Mon
2015-02-03, 1, Tue
2015-02-04, 1, Wed
2015-02-05, 1, Thu
2015-02-06, 1, Fri
2015-02-07, 1, Sat
2015-02-08, 1, Sun"), header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
names(df) <- c("date", "something", "day")
df$date <- as.Date(df$date, format = "%Y-%m-%d")
# %W is the week of the year, so it restarts each January
df$week <- format(df$date, "%W")
aggregate(df$something, list(df$week), sum)
With dplyr and lubridate this is really easy, thanks to the isoweek() function (ISO 8601 weeks start on Monday):
library(dplyr)
library(lubridate)
my.df <- read.table(header=FALSE, text=
'2015-01-30 1 Fri
2015-01-31 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun')
my.df %>% mutate(week = isoweek(as.Date(V1))) %>% group_by(week) %>% summarise(sum(V2))
or a bit shorter:
my.df %>% group_by(isoweek(as.Date(V1))) %>% summarise(sum(V2))
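A base R sketch that labels each week by its Monday, rather than by a week number, is shown below. It assumes the second row of the sample data was meant to be 2015-01-31 (the Saturday), and the names d, days_since_monday, week_start and weekly are made up for this example.

```r
d <- data.frame(
  date  = seq(as.Date("2015-01-30"), by = "day", length.out = 10),
  value = c(1, 2, 3, 1, 1, 1, 1, 1, 1, 1)
)

# POSIXlt wday is 0 for Sunday ... 6 for Saturday; shift it so Monday is 0
days_since_monday <- (as.POSIXlt(d$date)$wday + 6) %% 7

# Step back to the Monday that starts each date's week
d$week_start <- d$date - days_since_monday

# Sum the values per Monday-to-Sunday week
weekly <- aggregate(value ~ week_start, data = d, FUN = sum)
weekly
```

Keying on the Monday date avoids the week counter restarting at a year boundary.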
