Converting 3 columns (day, month, year) into a single date column in R

I have a data frame called data which looks like this:
Day Month Year
2 April 2015
5 May 2014
23 December 2017
The code is:
data <- data.frame(Day = c(2,5,23),
Month = c("April", "May", "December"),
Year = c(2015, 2014, 2017))
I want to create a new column that looks like this:
Day Month Year Date
2 April 2015 2/4/2015
5 May 2014 5/5/2014
23 December 2017 23/12/2017
To do this, I tried:
data <- data %>%
mutate(Date = as.Date(paste(Day, Month, Year, sep = "/"))) %>%
dmy()
But I got an error which says:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there an obvious error that I'm not seeing?
Thank you so much.

as.Date needs to be told the format of the string it is parsing; here %B matches the full month name (in the current locale). Using base R, we can do
transform(data, Date = as.Date(paste(Day, Month, Year, sep = "/"), "%d/%B/%Y"))
# Day Month Year Date
#1 2 April 2015 2015-04-02
#2 5 May 2014 2014-05-05
#3 23 December 2017 2017-12-23
Or with dplyr and lubridate
library(dplyr)
library(lubridate)
data %>% mutate(Date = dmy(paste(Day, Month, Year, sep = "/")))
You can apply format(Date, "%d/%m/%Y") if you need a different display format; note that this returns a character vector, not a Date.
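As a locale-independent variation on the answer above, the month names can be mapped to numbers with the built-in month.name constant instead of %B. This is only a sketch: the Display column name is made up for illustration, and it assumes the Month values are full English month names.

```r
# Rebuild the question's data frame (named `data`, as in the answer).
data <- data.frame(Day = c(2, 5, 23),
                   Month = c("April", "May", "December"),
                   Year = c(2015, 2014, 2017))

# Month name -> month number via the built-in month.name constant.
month_num <- match(data$Month, month.name)

# A real Date column (prints as yyyy-mm-dd).
data$Date <- as.Date(ISOdate(data$Year, month_num, data$Day))

# A character column in d/m/yyyy form without leading zeros,
# as in the desired output of the question (hypothetical column name).
data$Display <- paste(data$Day, month_num, data$Year, sep = "/")

print(data)
```

Keeping Date as a real Date column and building the display string separately means sorting and date arithmetic still work on Date.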

Related

How do I replace a value in my dataframe with text?

I have a dataframe with dates from April 2020 to today. Right now they are labelled 1 to 492, with 1 being the first date I have data on. I also have a list of dates in the format I want. How can I tell R that date 1 is April 12, 2020, date 2 is April 13, 2020, and so on for each date? I'm OK with either replacing the values in the column or creating a new column called real_date next to it.
Update:
Sorry I didn't describe this very well. I ended up making a look-up table with the date number and real date, and I used the inner_join function to add the real date to my dataframe.
library(tidyverse)
library(lubridate)
#Creating a sample data.frame
df <-
tibble(
dates = seq.Date(dmy("01/04/20"),today(),by = "1 day")
)
df %>%
#Format date, where: %B = full month name, %d = numeric day and %Y = four-digit year
mutate(
new_date = format(dates,"%B %d %Y")
)
Note: month names are printed in the language of the session's locale (for example, "Abril" is April in Portuguese).
If I have understood the question correctly, you have a dataframe which has numbers from 1 to 492, now you want to change them to dates where number 1 is 12th April 2020, number 2 is 13th April 2020 and so on.
You can use as.Date to convert these numbers to dates, passing 11th April 2020 as the origin so that number 1 maps to 12th April 2020.
df <- data.frame(date = 1:492)
df$real_date <- as.Date(df$date, origin = '2020-04-11')
head(df)
# date real_date
#1 1 2020-04-12
#2 2 2020-04-13
#3 3 2020-04-14
#4 4 2020-04-15
#5 5 2020-04-16
#6 6 2020-04-17
Just create a sequence of dates
data.frame(date = seq(as.Date('2020-04-12'), length.out = 492,
by = '1 day'), code = 1:492)
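The look-up table approach mentioned in the question's update can be sketched as follows. The question used inner_join; this sketch swaps in base R's merge(), which performs an inner join by default. The value column and the sample rows are hypothetical.

```r
# Hypothetical observations keyed by a running day number.
df <- data.frame(date = c(3, 1, 2, 1), value = c(10, 20, 30, 40))

# Look-up table: day number -> calendar date, with day 1 being 2020-04-12.
lookup <- data.frame(
  date = 1:492,
  real_date = seq(as.Date("2020-04-12"), by = "1 day", length.out = 492)
)

# Inner join on the shared `date` column; rows come back sorted by the key.
joined <- merge(df, lookup, by = "date")
print(joined)
```

For a plain running day count the as.Date(..., origin = ) answer is shorter; a look-up table is mainly useful when the numbering has gaps or irregularities.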

Convert decimal month and year into Date

I have a 'decimal month' and a year variable:
df <- data.frame(decimal_month = c(4.75, 5, 5.25), year = c(2011, 2011, 2011))
How can I convert these variables to a Date? ("2011-04-22" "2011-05-01" "2011-05-08"). Or at least to day of the year.
You may use some nice functions from the zoo package:
as.yearmon to convert year and floor of the decimal month to class yearmon.
Then use as.Date.yearmon and its frac argument to coerce the year-month to class Date.
library(zoo)
df$date = as.Date(as.yearmon(paste(df$year, floor(df$decimal_month), sep = "-")),
frac = df$decimal_month - floor(df$decimal_month))
# decimal_month year date
# 1 4.75 2011 2011-04-22
# 2 5.00 2011 2011-05-01
# 3 5.25 2011 2011-05-08
If desired, day of year is simply format(df$date, "%j")
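If zoo is not available, the same idea can be approximated in base R. This is a sketch under the assumption that the fractional part of the month is read as the fraction of the way from the first to the last day of that month; the helper variable names are made up for illustration.

```r
df <- data.frame(decimal_month = c(4.75, 5, 5.25), year = c(2011, 2011, 2011))

# Split the decimal month into a whole month and a fraction.
month_int <- floor(df$decimal_month)
frac <- df$decimal_month - month_int

# First and last day of each month.
first_day <- as.Date(ISOdate(df$year, month_int, 1))
last_day <- first_day + 32 - as.integer(format(first_day + 32, "%d"))

# Move the given fraction of the way from the first to the last day.
df$date <- first_day + floor(frac * as.numeric(last_day - first_day))

# Day of the year as an integer.
df$doy <- as.integer(format(df$date, "%j"))
print(df)
```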

How to obtain an ordered object for a date variable that has been formatted?

I would like to format my date variable to %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and sorting the variable from the earliest date to the latest date would not be possible (e.g. 05 May 2020 is sorted before 26 Apr 2020).
Data:
df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,
1588291200, 1588377600, 1588118400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29
Here is what it looks like when sorting a formatted date variable:
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(Date)
# Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020
So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.
df %>%
mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>%
arrange(Date)
# Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
Edit:
Actually the reason behind wanting to format it and arranging it, is so that I can have direct access to more readable date formats when building my dashboard for my users.
When it comes to ggplot(), even after you arrange and then mutate with format, the facetted plots will always appear in sorted character order. Example below:
df %>%
arrange(Date) %>%
mutate(n = 1:n(),
Date = format(Date, "%d %b %Y")) %>%
ggplot() +
geom_bar(aes(x = n)) +
facet_wrap(~Date)
If you want to use dates in plots, the main idea is to set the factor levels in the order in which you want to show the data. Arrange the dates first, then attach factor levels based on the order in which the dates occur.
library(dplyr)
library(ggplot2)
df %>%
arrange(Date) %>%
mutate(n = row_number(),
Date = format(Date, "%d %b %Y"),
Date = factor(Date, levels = unique(Date))) %>%
ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:
df %>%
arrange(Date) %>%
mutate(Date = format(Date, "%d %b %Y"))
Giving:
Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020
Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(as.Date(Date, format = "%d %b %Y"))
Personally, I prefer the tidyverse solution for dates - lubridate. Here:
library(lubridate)
df %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,
ymd_hms("20150102 12:23:01")
As the example shows, we do not have to bother about the separator. If you have access, this is a nice paper on that package. Otherwise, there are many tutorials out there on lubridate.
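The factor idea from the answers above also works without dplyr. A minimal base R sketch, using a hypothetical vector of four dates rather than the question's data frame:

```r
# Hypothetical unsorted dates.
dates <- as.Date(c("2020-04-30", "2020-04-27", "2020-05-03", "2020-04-28"))

# Readable labels (the month abbreviation depends on the session locale).
labels <- format(dates, "%d %b %Y")

# Set the factor levels in chronological order, not alphabetical order.
f <- factor(labels, levels = unique(labels[order(dates)]))

# Sorting the factor now follows the calendar.
print(sort(f))
```

Because the levels carry the chronological order, anything that orders by factor level (sort(), facet_wrap(), axis ordering) shows the labels in date order.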

R: assign months to day of the year

Here's my data, which has 10 years in one column and the day of the year (1 to 365) in the second column:
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years i.e. they have 365 days.
I want to create another column month which is basically month of the year the day belongs to.
library(dplyr)
dat %>%
mutate(month = as.integer(ceiling(doy/31)))
However, this solution is wrong since it assigns the wrong months to some days. I am looking for a dplyr solution, if possible.
We can convert it to a datetime class by using the appropriate format (i.e. %Y %j) and then extract the month with format
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month and add 1
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
This should give you an integer value for the months (month() is not a base R function; it is assumed here to come from lubridate):
library(lubridate)
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
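One caveat with the date-based answers above: 1980, 1984 and 1988 are leap years, so parsing with %Y %j treats day 60 of those years as 29 February, which conflicts with the question's assumption that every year has 365 days. A sketch that follows the 365-day assumption literally, using a simple day-to-month look-up vector (the name month_of_doy is made up for illustration):

```r
dat <- data.frame(year = rep(1980:1989, each = 365), doy = rep(1:365, times = 10))

# Days per month in a 365-day (non-leap) year.
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Look-up vector of length 365: element i is the month that day i falls in.
month_of_doy <- rep(1:12, times = days_in_month)

# Index the look-up vector by day of year.
dat$month <- month_of_doy[dat$doy]

# Month names, if wanted.
dat$month_name <- month.name[dat$month]
print(dat[c(31, 32, 59, 60), ])
```

The indexing step also works inside dplyr as mutate(month = month_of_doy[doy]).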

Add Month and Year column from complete date column

I have a column with date formatted as MM-DD-YYYY, in the Date format.
I want to add 2 columns one which only contains YYYY and the other only contains MM.
How do I do this?
Once again base R gives you all you need, and you should not do this with sub-strings.
Here we first create a data.frame with a proper Date column. If your date is in text format, parse it first with as.Date() or my anytime::anydate() (which does not need formats).
Then given the date creating year and month is simple:
R> df <- data.frame(date=Sys.Date()+seq(1,by=30,len=10))
R> df[, "year"] <- format(df[,"date"], "%Y")
R> df[, "month"] <- format(df[,"date"], "%m")
R> df
date year month
1 2017-12-29 2017 12
2 2018-01-28 2018 01
3 2018-02-27 2018 02
4 2018-03-29 2018 03
5 2018-04-28 2018 04
6 2018-05-28 2018 05
7 2018-06-27 2018 06
8 2018-07-27 2018 07
9 2018-08-26 2018 08
10 2018-09-25 2018 09
R>
If you want year or month as integers, you can wrap as.integer() around the format() call.
A base R option would be to remove the substring with sub and then read with read.table
df1[c('month', 'year')] <- read.table(text=sub("-\\d{2}-", ",", df1$date), sep=",")
Or using tidyverse
library(tidyverse)
separate(df1, date, into = c('month', 'day', 'year')) %>%
select(-day)
Note: it may be better to convert to a Date class instead of relying on string manipulation (mdy(), month() and year() are from lubridate).
library(lubridate)
df1 %>%
mutate(date = mdy(date), month = month(date), year = year(date))
data
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))
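For text dates such as these, the base R route described in the first answer (parse with as.Date(), then format()) can be sketched like this, with as.integer() applied so the new columns are numeric. The variable name parsed is made up for illustration.

```r
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))

# Parse the MM-DD-YYYY strings into real Date values.
parsed <- as.Date(df1$date, format = "%m-%d-%Y")

# Extract month and year as integers.
df1$month <- as.integer(format(parsed, "%m"))
df1$year <- as.integer(format(parsed, "%Y"))
print(df1)
```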
