Add Month and Year column from complete date column - r

I have a column with date formatted as MM-DD-YYYY, in the Date format.
I want to add two columns: one containing only YYYY and the other containing only MM.
How do I do this?

Once again base R gives you all you need, and you should not do this with sub-strings.
Here we first create a data.frame with a proper Date column. If your date is in text format, parse it first with as.Date() or my anytime::anydate() (which does not need formats).
Then given the date creating year and month is simple:
R> df <- data.frame(date=Sys.Date()+seq(1,by=30,len=10))
R> df[, "year"] <- format(df[,"date"], "%Y")
R> df[, "month"] <- format(df[,"date"], "%m")
R> df
date year month
1 2017-12-29 2017 12
2 2018-01-28 2018 01
3 2018-02-27 2018 02
4 2018-03-29 2018 03
5 2018-04-28 2018 04
6 2018-05-28 2018 05
7 2018-06-27 2018 06
8 2018-07-27 2018 07
9 2018-08-26 2018 08
10 2018-09-25 2018 09
R>
If you want year or month as integers, you can wrap as.integer() around the format() call.
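As a minimal sketch of both points above (parsing text first, then extracting integers), assuming the dates arrive as MM-DD-YYYY strings; the two sample values are hypothetical:

```r
# Hypothetical example data: dates stored as MM-DD-YYYY text
df <- data.frame(date = c("05-21-2017", "06-25-2015"),
                 stringsAsFactors = FALSE)

# Parse the text into a proper Date using an explicit format
df$date <- as.Date(df$date, format = "%m-%d-%Y")

# format() returns character; as.integer() turns "05" into 5L
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
```

Keeping the original Date column alongside the new ones means later sorting or date arithmetic still works.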

A base R option would be to replace the day part with a comma using sub and then read the result with read.table
df1[c('month', 'year')] <- read.table(text=sub("-\\d{2}-", ",", df1$date), sep=",")
Or using tidyverse
library(tidyverse)
separate(df1, date, into = c('month', 'day', 'year')) %>%
select(-day)
Note: it may be better to convert to a date class instead of using string formatting (mdy, month and year come from lubridate).
library(lubridate)
df1 %>%
mutate(date = mdy(date), month = month(date), year = year(date))
data
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))

Related

How to convert long text date format to simple? [duplicate]

I just want to convert date from "dbY hms" to "y-m-d".
This is my data:
Time X10cm X20cm X30cm X40cm X50cm X60cm X70cm X80cm X90cm
1 05 Apr 2019 09:46:13 20.70675 26.20419 23.66370 18.04151 3.507654 5.644918 3.947458 0.926415 1.304021
2 11 Apr 2019 08:36:32 18.45716 25.76273 23.82202 18.59679 3.829793 6.639636 4.313009 1.002555 1.440603
3 19 Apr 2019 09:24:16 17.22486 24.03394 21.70397 16.95699 3.507654 6.827912 4.417910 1.046471 1.574125
4 26 Apr 2019 12:14:05 16.60325 22.16044 19.54150 15.73587 3.237127 6.169124 3.987147 1.002555 1.397690
5 10 May 2019 07:40:20 19.68528 22.46739 19.20813 14.97607 3.184556 5.620616 3.888364 0.959796 1.484311
6 17 May 2019 12:07:31 16.82389 23.13976 20.70675 16.86820 3.470846 5.500014 3.714201 0.985313 1.429803
I have tried many things, for example creating a character to transform it to date:
data2 = data1 %>%
mutate(Time1 = format(Time, format = "%Y-%m-%d %H:%M:%S")) %>%
mutate_at("Time1",as.Date,format="%Y-%m-%d")
but it didn't work.
You may use -
x <- '05 Apr 2019 09:46:13'
as.Date(x, '%d %b %Y')
#[1] "2019-04-05"
For the entire column of the dataframe.
data1$Time <- as.Date(data1$Time, '%d %b %Y')
If you want to use dplyr and lubridate.
library(dplyr)
library(lubridate)
data1 <- data1 %>% mutate(Time = as.Date(dmy_hms(Time)))
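If the time of day matters, a base R sketch along these lines parses the full string first and derives the date from it. The two sample strings are taken from the question; tz = "UTC" is an assumption, and %b matches month abbreviations in the current locale (English assumed here).

```r
x <- c("05 Apr 2019 09:46:13", "11 Apr 2019 08:36:32")

# Parse date and time together; tz = "UTC" is an assumption,
# use the zone your data was actually recorded in
stamp <- as.POSIXct(x, format = "%d %b %Y %H:%M:%S", tz = "UTC")

# Date only, as a Date object (still sortable)
day <- as.Date(stamp)

# Date only, as "y-m-d" text (for display)
day_text <- format(stamp, "%Y-%m-%d")
```

Parsing with only '%d %b %Y', as in the answer above, silently ignores the trailing time, which is fine when only the date is needed.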

How to obtain an ordered object for a date variable that has been formatted?

I would like to format my date variable to %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and sorting the variable from the earliest date to the latest date would not be possible (e.g. 05 May 2020 is sorted before 26 Apr 2020).
Data:
df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,
1588291200, 1588377600, 1588118400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29
Here is what sorting a formatted date variable looks like:
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(Date)
# Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020
So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.
df %>%
mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>%
arrange(Date)
# Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
Edit:
Actually the reason behind wanting to format it and arranging it, is so that I can have direct access to more readable date formats when building my dashboard for my users.
When it comes to ggplot(), even after you arrange and then mutate with format, the faceted plots are always shown in sorted character order. Example below:
df %>%
arrange(Date) %>%
mutate(n = 1:n(),
Date = format(Date, "%d %b %Y")) %>%
ggplot() +
geom_bar(aes(x = n)) +
facet_wrap(~Date)
If you want to use dates in plots, the main idea is to set the factor levels in the order in which you want the data shown. Arrange the dates first, then assign factor levels in order of first occurrence.
library(dplyr)
library(ggplot2)
df %>%
arrange(Date) %>%
mutate(n = row_number(),
Date = format(Date, "%d %b %Y"),
Date = factor(Date, levels = unique(Date))) %>%
ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
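The same idea can be sketched in base R without sorting the data frame first: take the levels from the sorted dates, so the level order is chronological regardless of row order. The sample dates are a subset of the question's data; the names d and lab are only for illustration, and an English locale is assumed for %b.

```r
d <- as.Date(c("2020-04-30", "2020-04-27", "2020-05-03", "2020-04-28"))

# Labels come from the dates as they are; levels come from the sorted dates
lab <- factor(format(d, "%d %b %Y"),
              levels = format(sort(unique(d)), "%d %b %Y"))

levels(lab)  # chronological, not alphabetical
sort(lab)    # sorts by level order
```

Because facet_wrap() and axis scales follow factor level order, a factor built this way should keep panels in date order.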
My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:
df %>%
arrange(Date) %>%
mutate(Date = format(Date, "%d %b %Y"))
Giving:
Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020
Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(as.Date(Date, format = "%d %b %Y"))
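Another option, sketched here in base R, is to keep the real date column for sorting and add the formatted text as a separate display column. The column name Label is hypothetical, and the three dates are a subset of the question's data.

```r
df <- data.frame(Date = as.Date(c("2020-05-01", "2020-04-27", "2020-04-30")))

# Display-only text; the Date column is left untouched
df$Label <- format(df$Date, "%d %b %Y")

# Sort on the real dates, not on the text
df <- df[order(df$Date), ]
```

This avoids re-parsing the formatted strings later, at the cost of carrying one extra column.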
Personally, I prefer the tidyverse solution for dates - lubridate. Here:
library(lubridate)
df %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,
ymd_hms("20150102 12:23:01")
As the example shows, we do not have to bother about the separator. If you have access this is a nice paper on that package. Otherwise, there are many tutorials out there on lubridate.

How to convert a data frame of Dates and Frequency to a quarterly basis? R

I have data from 2016-2019
Here is a sample of my data
print(myData)
Date Freq
2016-08-08 14
2016-08-09 20
2016-08-10 34
2016-08-11 32
2016-08-12 19
2016-08-15 35
2016-08-16 32
I want to create a line plot, but I would like to aggregate the data into something like this so that I can see the trend on a quarterly basis.
Date Freq
2016 Q1 300
2016 Q2 313
2016 Q3 313
2016 Q4 432
2017 Q1 313
2017 Q2 131
How can I do this in R?
We can use as.yearqtr from zoo
library(dplyr)
library(zoo)
myData %>%
mutate(Date = as.yearqtr(Date))
If the column is not already a Date class, convert to Date class first
myData <- myData %>%
mutate(Date = as.yearqtr(as.Date(Date)))
lubridate has a function quarter:
quarter(myData$Date)
so for your data you can simply write:
myData <- myData %>% mutate(quarter = quarter(Date))
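Neither answer shows the summing step the question asks for, and quarter() by default returns only the quarter number without the year. A base R sketch of the full aggregation, using quarters() and aggregate(), might look like this; the four sample rows and the column name Quarter are hypothetical:

```r
myData <- data.frame(
  Date = as.Date(c("2016-08-08", "2016-08-09", "2016-11-01", "2017-01-15")),
  Freq = c(14, 20, 34, 32)
)

# Label each row as "YYYY Qn"; quarters() is part of base R
myData$Quarter <- paste(format(myData$Date, "%Y"), quarters(myData$Date))

# Sum Freq within each quarter
quarterly <- aggregate(Freq ~ Quarter, data = myData, FUN = sum)
quarterly
```

Labels of the form "YYYY Qn" sort correctly as text, so the aggregated rows come out in chronological order.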

Converting 3 columns (day, month, year) into a single date column R

I have the data-frame called data which looks like this:
Day Month Year
2 April 2015
5 May 2014
23 December 2017
The code is:
data <- data.frame(Day = c(2,5,23),
Month = c("April", "May", "December"),
Year = c(2015, 2014, 2017))
I want to create a new column that looks like this:
Day Month Year Date
2 April 2015 2/4/2015
5 May 2014 5/5/2014
23 December 2017 23/12/2017
To do this, I tried:
data <- data %>%
mutate(Date = as.Date(paste(Day, Month, Year, sep = "/"))) %>%
dmy()
But I got an error which says:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there an obvious error that I'm not seeing?
Thank you so much.
We need to use appropriate format in as.Date. Using base R, we can do
transform(data, Date = as.Date(paste(Day, Month, Year, sep = "/"), "%d/%B/%Y"))
# Day Month Year Date
#1 2 April 2015 2015-04-02
#2 5 May 2014 2014-05-05
#3 23 December 2017 2017-12-23
Or with dplyr and lubridate
library(dplyr)
library(lubridate)
data %>% mutate(Date = dmy(paste(Day, Month, Year, sep = "/")))
You can add format(Date, "%d/%m/%Y") if you need to change the display format.
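Both answers rely on the locale to recognise month names (%B in as.Date, and dmy in lubridate). A locale-independent sketch, assuming the names are spelled exactly as in R's built-in month.name vector, looks the month number up directly; DateText is a hypothetical column name for the display version:

```r
data <- data.frame(Day = c(2, 5, 23),
                   Month = c("April", "May", "December"),
                   Year = c(2015, 2014, 2017))

# month.name is a built-in vector of English month names
month_num <- match(data$Month, month.name)

# ISOdate() builds a date-time from numeric parts; as.Date() drops the time
data$Date <- as.Date(ISOdate(data$Year, month_num, data$Day))

# Character version in d/m/Y form, for display only
data$DateText <- format(data$Date, "%d/%m/%Y")
```

Note that format() zero-pads the day and month, so the text reads 02/04/2015 rather than 2/4/2015.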

R: assign months to day of the year

Here's my data, which has ten years in one column and the day of the year (1 to 365) in a second column
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years i.e. they have 365 days.
I want to create another column month which is basically month of the year the day belongs to.
library(dplyr)
dat %>%
mutate(month = as.integer(ceiling(doy/31)))
However, this solution is wrong since it assigns wrong months to days. I am looking for a dplyr
solution possibly.
We can convert it to a datetime class by using the appropriate format (i.e. %Y %j) and then extract the month with format
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month and add 1
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
This should give you an integer value for the months (month() is from lubridate):
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
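One caveat with the date-based answers: 1980, 1984 and 1988 are leap years in the real calendar, so parsing with %Y %j maps day 60 of those years to 29 February rather than 1 March. If every year should be treated as a 365-day year, as the question assumes, a lookup on fixed month lengths is one possible sketch; days_in_month and month_start are hypothetical helper names:

```r
dat <- data.frame(year = rep(1980:1989, each = 365),
                  doy  = rep(1:365, times = 10))

# Month lengths for a non-leap year
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# First day-of-year of each month: 1, 32, 60, 91, ...
month_start <- cumsum(c(1, days_in_month[-12]))

# findInterval() returns the index of the last month_start <= doy
dat$month <- findInterval(dat$doy, month_start)
```

month.name[dat$month] gives the month names if those are needed as well.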
