I have a df with dates formatted in the following way.
Date Year
<chr> <dbl>
Sunday, Jul 27 2008
Tuesday, Jul 29 2008
Wednesday, July 31 (1) 2008
Wednesday, July 31 (2) 2008
Is there a simple way to achieve the following format of columns and values? I'd also like to remove the (1) and (2) notations on the two July 31 dates.
Date Year Month Day Day_of_Week
2008-07-27 2008 07 27 Sunday
With base R, you can do:
dat <- data.frame(
Date = c("Sunday, Jul 27" ,"Tuesday, Jul 29", "Wednesday, July 31", "Wednesday, July 31"),
Year = rep(2008, 4),
stringsAsFactors = FALSE
)
dts <- as.POSIXlt(paste(dat$Year, dat$Date), format = "%Y %A, %B %d")
POSIXlt stores the date/time as a list of components (year, month, day of month, and so on). To see them, try unclass(dts[1]).
From here it can be rather academic:
dat$Month = 1 + dts$mon # months are 0-based in POSIXlt
dat$Day = dts$mday
dat$Day_of_Week = weekdays(dts)
dat
# Date Year Month Day Day_of_Week
# 1 Sunday, Jul 27 2008 7 27 Sunday
# 2 Tuesday, Jul 29 2008 7 29 Tuesday
# 3 Wednesday, July 31 2008 7 31 Thursday
# 4 Wednesday, July 31 2008 7 31 Thursday
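The sample data above already omits the (1) and (2) markers from the question. A minimal base R sketch of the full clean-up, assuming an English locale for month names; the helper names (cleaned, no_wday, parsed, out) are illustrative and not part of the original answer. It strips the markers, drops the weekday text (which can be wrong, as in the July 31 rows), and derives zero-padded Month and Day columns:

```r
dat <- data.frame(
  Date = c("Sunday, Jul 27", "Tuesday, Jul 29",
           "Wednesday, July 31 (1)", "Wednesday, July 31 (2)"),
  Year = rep(2008, 4),
  stringsAsFactors = FALSE
)

# Remove a trailing "(n)" marker together with the whitespace before it
cleaned <- sub("\\s*\\([0-9]+\\)$", "", dat$Date)

# Drop the leading weekday name and comma; the weekday is recomputed below
no_wday <- sub("^[A-Za-z]+,\\s*", "", cleaned)

# On input, %B accepts both full and abbreviated month names
parsed <- as.Date(paste(dat$Year, no_wday), format = "%Y %B %d")

out <- data.frame(
  Date = parsed,
  Year = dat$Year,
  Month = format(parsed, "%m"),      # zero-padded, kept as character
  Day = format(parsed, "%d"),
  Day_of_Week = weekdays(parsed),
  stringsAsFactors = FALSE
)
print(out)
```

Here Date is a real Date column, while Month and Day are character strings so that the leading zero is preserved.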
library(dplyr)
library(lubridate)
dat = tibble(date = c('Sunday, Jul 27','Tuesday, Jul 29', 'Wednesday, July 31 (1)',
'Wednesday, July 31 (2)'), year=rep(2008,4))
dat %>%
mutate(date = gsub("\\s*\\([^\\)]+\\)","",as.character(date)),
date = parse_date_time(date,'A, b! d ')) -> dat1
year(dat1$date) <- dat1$year
# A tibble: 4 × 2
date year
<dttm> <dbl>
1 2008-07-27 2008
2 2008-07-29 2008
3 2008-07-31 2008
4 2008-07-31 2008
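To see what the gsub step contributes on its own, here is a small sketch in base R. The vector x is made-up sample input, and the pattern is a slightly simplified variant of the one above (it assumes the markers never contain a nested closing bracket):

```r
x <- c("Sunday, Jul 27", "Wednesday, July 31 (1)", "Wednesday, July 31 (2)")

# \\s*      optional whitespace before the bracket
# \\(       a literal opening bracket
# [^)]*     anything that is not a closing bracket
# \\)       a literal closing bracket
cleaned <- gsub("\\s*\\([^)]*\\)", "", x)
print(cleaned)
```

Strings without a bracketed marker pass through unchanged, so the same call can be applied to the whole column.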
Related
I have monthly data and want to convert the period column to Date in R.
In addition, the rows of the data frame are not ordered by time.
df <- data.frame (period = c("March 2019", "February 2019", "January 2019", "May 2019","April 2019","August 2019","June 2019","July 2019","November 2019","September 2019","October 2019","December 2019"),sales = rnorm(12))
period sales
1 March 2019 1.841711557
2 February 2019 0.403043685
3 January 2019 0.524417978
4 May 2019 0.236378511
5 April 2019 -0.099441313
6 August 2019 0.001731664
7 June 2019 0.792067260
8 July 2019 -0.352379347
9 November 2019 1.174681909
10 September 2019 0.075480279
11 October 2019 -0.258695621
12 December 2019 -1.775315927
Paste a day of 1 onto period, parse the result using as.Date with an appropriate format, then order by it.
transform(dat, period=as.Date(paste(1, period), '%d %b %Y')) |>
{\(.) .[order(.$period), ]}()
# period sales
# 1 2019-01-01 0.25542882
# 5 2019-02-01 0.11748736
# 10 2019-03-01 0.98889173
# 6 2019-04-01 0.47499708
# 2 2019-05-01 0.46229282
# 8 2019-06-01 0.90403139
# 12 2019-07-01 0.08243756
# 7 2019-08-01 0.56033275
# 4 2019-09-01 0.97822643
# 9 2019-10-01 0.13871017
# 11 2019-11-01 0.94666823
# 3 2019-12-01 0.94001452
Data:
set.seed(42)
dat <- data.frame(period=sample(paste(month.name, 2019)),
sales=runif(12))
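Parsing month names with %b or %B depends on the session locale. A locale-independent sketch, assuming the period strings always look like "<English month name> <year>", looks the month up in the built-in month.name vector instead. The small data frame here is made-up sample input:

```r
df <- data.frame(
  period = c("March 2019", "February 2019", "January 2019"),
  sales = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Split "March 2019" into its month-name and year parts
parts <- strsplit(df$period, " ", fixed = TRUE)
month_num <- match(vapply(parts, function(p) p[1], character(1)), month.name)
year_num <- as.integer(vapply(parts, function(p) p[2], character(1)))

# Build first-of-month dates in ISO format, then sort the rows by them
df$period <- as.Date(sprintf("%d-%02d-01", year_num, month_num))
df <- df[order(df$period), ]
print(df)
```

Any period whose month name is not found in month.name ends up as NA, which makes malformed rows easy to spot.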
Let's say I have this example df dataset
Order.Date
2011-10-20
2011-12-25
2012-04-15
2012-08-23
2013-09-25
I want to extract the month and the year, so that the result looks like this:
Order.Date Month Year
2011-10-20 October 2011
2011-12-25 December 2011
2012-04-15 April 2012
2012-08-23 August 2012
2013-09-25 September 2013
Is there a solution? Anything is fine, lubridate or otherwise.
lubridate's month and year functions will work.
as.data.frame(Order.Date) %>%
mutate(Month = lubridate::month(Order.Date, label = FALSE),
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 10 2011
2 2011-12-25 12 2011
3 2012-04-15 4 2012
4 2012-08-23 8 2012
5 2013-09-25 9 2013
If you want the month formatted as Jan, use month.abb; for January, use month.name.
as.data.frame(Order.Date) %>%
mutate(Month = month.abb[lubridate::month(Order.Date)], # index by month number
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 Oct 2011
2 2011-12-25 Dec 2011
3 2012-04-15 Apr 2012
4 2012-08-23 Aug 2012
5 2013-09-25 Sep 2013
as.data.frame(Order.Date) %>%
mutate(Month = month.name[lubridate::month(Order.Date)], # index by month number
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 October 2011
2 2011-12-25 December 2011
3 2012-04-15 April 2012
4 2012-08-23 August 2012
5 2013-09-25 September 2013
You can use format with %B for the month and %Y for the year, or use months for the month.
format(as.Date("2011-10-20"), "%B")
#months(as.Date("2011-10-20")) #Alternative
#[1] "October"
format(as.Date("2011-10-20"), "%Y")
#[1] "2011"
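Since format is vectorized, the same idea extends to a whole column. A minimal sketch on made-up sample dates; it looks the month up in month.name rather than using %B, so the result does not depend on the session locale:

```r
df <- data.frame(
  Order.Date = as.Date(c("2011-10-20", "2011-12-25", "2012-04-15"))
)

# "%m" gives the zero-padded month number, which indexes into month.name
df$Month <- month.name[as.integer(format(df$Order.Date, "%m"))]
df$Year <- as.integer(format(df$Order.Date, "%Y"))
print(df)
```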
I would like to use some sort of regex function for the rest of the data (~2 million rows) to extract relevant date information (ideally have Day, Month, and time in date format since Year is only 2009 for ~2 million rows).
I have a column that looks like so:
ID | created_at
1 Mon Apr 06 22:19:45 PDT 2009
2 Mon Apr 06 22:19:49 PDT 2009
I applied these two functions to extract the day and delete 'PDT 2009' from the end, but now I would like the remaining values to be in date format for plotting purposes.
df$Day <- sub("([A-Za-z]+).*", "\\1", df$created_at) ## Extract first word
df$delete <- gsub("\\s*PDT.*", "", df$created_at) ## Delete everything from PDT onwards
Desired outcome:
ID | created_at | Month | Day | Time
1 Mon Apr 06 22:19:45 PDT 2009 Apr Mon 22:19:45
2 Mon Apr 06 22:19:49 PDT 2009 Apr Mon 22:19:49
Here is an approach using str_split in combination with map_chr:
library(tidyverse)
df %>%
mutate(elements = str_split(created_at, fixed(" "), n=6)) %>%
mutate(Month = map_chr(elements, 2),
Day = map_chr(elements, 1),
Time = map_chr(elements, 4), .keep="unused"
)
output:
ID created_at Month Day Time
1 1 Mon Apr 06 22:19:45 PDT 2009 Apr Mon 22:19:45
2 2 Mon Apr 06 22:19:49 PDT 2009 Apr Mon 22:19:49
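The same split-by-position idea can be sketched in base R without extra packages. This assumes every string has exactly six space-separated fields, as in the sample; the names parts and out are illustrative:

```r
created_at <- c("Mon Apr 06 22:19:45 PDT 2009",
                "Mon Apr 06 22:19:49 PDT 2009")

# One row per string, one column per space-separated field
parts <- do.call(rbind, strsplit(created_at, " ", fixed = TRUE))

out <- data.frame(
  created_at = created_at,
  Month = parts[, 2],
  Day = parts[, 1],
  Time = parts[, 4],
  stringsAsFactors = FALSE
)
print(out)
```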
You do not need any regexes; regular date formatting is sufficient. You can find an overview of the format codes in ?strptime. You just have to adjust for the separators. This should be easier and more efficient than using regexes, splitting, etc.
Once you have this in one of R's native date-time classes, POSIXlt or POSIXct, you can easily extract all date-related information.
strptime(x = "Mon Apr 06 22:19:45 PDT 2009",
format = "%a %b %d %H:%M:%S PDT %Y")
#> [1] "2009-04-06 22:19:45 CEST"
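Note that the literal PDT in the format string is only skipped, so the result is stamped with the session's own time zone (CEST above). A sketch of applying this to a whole column and pulling out the pieces with format; the tz value "America/Los_Angeles" is an assumption about what PDT means for this data, and an English locale is assumed for %a and %b:

```r
created_at <- c("Mon Apr 06 22:19:45 PDT 2009",
                "Mon Apr 06 22:19:49 PDT 2009")

# Parse once, with an explicit time zone instead of the session default
parsed <- as.POSIXct(strptime(created_at,
                              format = "%a %b %d %H:%M:%S PDT %Y",
                              tz = "America/Los_Angeles"))

out <- data.frame(
  created_at = created_at,
  Month = format(parsed, "%b"),
  Day = format(parsed, "%a"),
  Time = format(parsed, "%H:%M:%S"),
  stringsAsFactors = FALSE
)
print(out)
```

Keeping parsed as a POSIXct column is usually the most convenient form for plotting; the character columns are only for display.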
You can use the following solution too:
library(dplyr)
df %>%
mutate(ID = row_number(),
Month = gsub("(?:[A-Za-z]+)\\s([A-Za-z]+).*", "\\1", created_at, perl = TRUE),
Day = gsub("([A-Za-z]+).*", "\\1", created_at, perl = TRUE),
Time = gsub(".*(\\d{2}:\\d{2}:\\d{2}).*", "\\1", created_at, perl = TRUE)) %>%
relocate(ID)
# A tibble: 2 x 5
ID created_at Month Day Time
<int> <chr> <chr> <chr> <chr>
1 1 Mon Apr 06 22:19:45 PDT 2009 Apr Mon 22:19:45
2 2 Mon Apr 06 22:19:49 PDT 2009 Apr Mon 22:19:49
If you're just after Month, Day, and Time, why not use extract from tidyr (part of the tidyverse):
library(tidyr)
df %>%
extract(col = created_at,
into = c('Day', 'Month', 'Time'),
regex = "([A-Za-z]+)\\s([A-Za-z]+)\\s\\d{2}\\s([\\d:]+)")
Day Month Time
1 Mon Apr 22:19:45
2 Mon Apr 22:19:49
Here, we define three capturing groups using the round-bracket syntax (...) to identify the substrings we want to extract into the three columns. The first group captures the weekday and the second the month, so the names in into must follow that order.
If you also need created_at in its original form, just store the results as, say, df1 and use cbind:
cbind(df, df1)
created_at Day Month Time
1 Mon Apr 06 22:19:45 PDT 2009 Mon Apr 22:19:45
2 Mon Apr 06 22:19:49 PDT 2009 Mon Apr 22:19:49
Data:
df <-
data.frame(
created_at = c("Mon Apr 06 22:19:45 PDT 2009","Mon Apr 06 22:19:49 PDT 2009")
)
I think this might help you
Libraries
library(tidyverse)
library(lubridate)
Data
df <-
tibble(
created_at = c("Mon Apr 06 22:19:45 PDT 2009","Mon Apr 06 22:19:49 PDT 2009")
)
Code
df %>%
separate(
col = created_at,
into = c("wday","month","day","time","type","year"),
sep = " ",
remove = FALSE
) %>%
mutate(
day = as.numeric(day),
year = as.numeric(year),
month_num = match(month, month.abb), # vectorized lookup, one month number per row
time = hms(time),
date = lubridate::make_date(year = year,month = month_num,day = day)
)
Results
# A tibble: 2 x 9
created_at wday month day time type year month_num date
<chr> <chr> <chr> <dbl> <Period> <chr> <dbl> <int> <date>
1 Mon Apr 06 22:19:45 PDT 2009 Mon Apr 6 22H 19M 45S PDT 2009 4 2009-04-06
2 Mon Apr 06 22:19:49 PDT 2009 Mon Apr 6 22H 19M 49S PDT 2009 4 2009-04-06
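A note on the month lookup: when a column holds several different month abbreviations, a per-row lookup can be sketched with match, which returns one index per element. The vector months below is made-up sample input:

```r
months <- c("Apr", "Jun", "Dec")

# match() returns, for each element, its position in month.abb
month_num <- match(months, month.abb)
print(month_num)

# By contrast, == recycles the shorter vector against month.abb,
# so which() does not give one result per input element
recycled <- which(month.abb == c("Apr", "Jun"))
print(recycled)  # a single index (6), not c(4, 6)
```

Unknown abbreviations come back from match as NA rather than silently disappearing.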
I have a dataframe about World Cup matches that includes date, location, match_name, etc.
In this dataframe I want to convert the date column to a date in a format like "2018-05-06".
Here is my file:
date match_name price
1 Thu Jun 14 Russia v Saudi Arabia €453.92
2 Fri Jun 15 Egypt v Uruguay €90.00
3 Tue Jun 19 Russia v Egypt €297.45
4 Wed Jun 20 Uruguay v Saudi Arabia €95.00
and here is my expectation:
date match_name price
1 2018-06-14 Russia v Saudi Arabia €453.92
2 2018-06-15 Egypt v Uruguay €90.00
3 2018-06-19 Russia v Egypt €297.45
4 2018-06-20 Uruguay v Saudi Arabia €95.00
This is certainly not the easiest way to do it, but I just wanted you to have a quick answer.
library(stringr)
library(dplyr)
Data=data.frame(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20"),match_name=c("a","b","c","d"),price=c(1,2,3,4),stringsAsFactors=FALSE)
# Pattern matching any English month abbreviation: "Jan|Feb|...|Dec"
month_regex <- paste(month.abb, collapse="|")
Data=mutate(Data,
# day of month: the first run of digits in the string
datenum=str_extract(date, "[[:digit:]]+"),
# month number, zero-padded; match() is vectorized, unlike an if/else chain
monthnum=sprintf("%02d", match(str_extract(date, month_regex), month.abb)),
monthname=str_extract(date, month_regex),
Final_Date=paste0("2018-",monthnum,"-",datenum))
Data
Resulting in
date match_name price datenum monthnum monthname Final_Date
1 Thu Jun 14 a 1 14 06 Jun 2018-06-14
2 Fri Jun 15 b 2 15 06 Jun 2018-06-15
3 Tue Jun 19 c 3 19 06 Jun 2018-06-19
4 Wed Jun 20 d 4 20 06 Jun 2018-06-20
OK, let's say you have this data.frame:
myDF <-as.data.frame(x=list(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20")))
Which constructs the following data.frame:
date
1 Thu Jun 14
2 Fri Jun 15
3 Tue Jun 19
4 Wed Jun 20
Assuming that each game is in 2018:
#for handling month abbreviations in English:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
myDF$date <- as.Date(paste0(substr(myDF$date,5,10),", 2018"),format="%b %d, %Y")
The resulting myDF:
date
1 2018-06-14
2 2018-06-15
3 2018-06-19
4 2018-06-20
You can change 2018 to any year you like where necessary.
To convert the variable "date" to a format like '2018-06-14', you can use the following function:
conv_date <- function(var, year){
var <- as.Date(paste0(var, " ", year), '%a %b %d %Y')
return(var)
}
where:
var - variable in your data table (i.e 'date')
year - the year you need
Example:
yours_df$date <- conv_date(yours_df$date, 2018)
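A self-contained sketch of that call on the sample dates from the question. The function is repeated from above so the block runs on its own; it assumes an English locale for the day and month abbreviations, and the check for NA is an addition that flags strings that failed to parse:

```r
conv_date <- function(var, year) {
  as.Date(paste0(var, " ", year), format = "%a %b %d %Y")
}

dates <- c("Thu Jun 14", "Fri Jun 15", "Tue Jun 19", "Wed Jun 20")
converted <- conv_date(dates, 2018)

# as.Date returns NA for anything it could not parse, so check explicitly
if (anyNA(converted)) {
  warning("some dates could not be parsed")
}
print(converted)
```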
I have a dataset with dates in following format:
Initial:
Jan-2015 Apr-2013 Jun-2014 Jan-2015 Jan-2016 Jan-2015 Jan-2016 Jan-2015 Apr-2012 Nov-2012 Jun-2013 Sep-2013
Final:
Feb-2014 Jan-2013 Sep-2014 Apr-2013 Sep-2014 Mar-2013 Aug-2012 Apr-2012 Oct-2012 Oct-2013 Jun-2014 Oct-2013
I would like to perform these steps:
create dummy variables for Month and Year
Subtract one set of dates from the other to find the duration (final - initial) in months
How can I do this in R?
You could use as.yearmon from the zoo package for this.
library(zoo)
12 * (as.yearmon("Jan-2015", "%b-%Y") - as.yearmon("Feb-2014", "%b-%Y"))
# result
# [1] 11
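If adding a package is not an option, the month arithmetic can be sketched in base R. The function months_between and its helper to_index are hypothetical names, not part of zoo; the sketch assumes strings of the form "Mon-YYYY" with English abbreviations. It returns final minus initial, as the question asks, which is the opposite sign of the expression above:

```r
months_between <- function(initial, final) {
  # Convert "Jan-2015" to a running month count: year * 12 + month number
  to_index <- function(x) {
    parts <- strsplit(x, "-", fixed = TRUE)
    mon <- match(vapply(parts, function(p) p[1], character(1)), month.abb)
    yr <- as.integer(vapply(parts, function(p) p[2], character(1)))
    yr * 12L + mon
  }
  to_index(final) - to_index(initial)
}

diffs <- months_between(initial = c("Jan-2015", "Apr-2013"),
                        final = c("Feb-2014", "Jan-2013"))
print(diffs)
```

Because the result is integer arithmetic, it avoids any floating-point rounding in the month counts.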
To expand on @neilfws's answer, you can use the month and year functions from the lubridate package to create your dummy variables with the month and year in your data frame.
Here is the code:
library(lubridate)
library(zoo)
df <- data.frame(Initial = c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015",
"Jan-2016", "Jan-2015", "Apr-2012", "Nov-2012", "Jun-2013", "Sep-2013"),
Final = c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013",
"Aug-2012", "Apr-2012", "Oct-2012", "Oct-2013", "Jun-2014", "Oct-2013"))
df$Initial <- as.character(df$Initial)
df$Final <- as.character(df$Final)
df$Initial <- as.yearmon(df$Initial, "%b-%Y")
df$Final <- as.yearmon(df$Final, "%b-%Y")
df$month_initial <- month(df$Initial)
df$year_intial <- year(df$Initial)
df$month_final <- month(df$Final)
df$year_final <- year(df$Final)
df$Difference <- 12*(df$Initial-df$Final)
And here is the final data.frame:
> head(df)
Initial Final month_initial year_intial month_final year_final Difference
1 Jan 2015 Feb 2014 1 2015 2 2014 11
2 Apr 2013 Jan 2013 4 2013 1 2013 3
3 Jun 2014 Sep 2014 6 2014 9 2014 -3
4 Jan 2015 Apr 2013 1 2015 4 2013 21
5 Jan 2016 Sep 2014 1 2016 9 2014 16
6 Jan 2015 Mar 2013 1 2015 3 2013 22
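If "dummy variables" is meant in the modelling sense (one 0/1 indicator column per month), a possible sketch in base R uses model.matrix on a factor. This interpretation is an assumption, and the month numbers below are made-up sample values like those in month_initial:

```r
month_initial <- factor(c(1, 4, 6, 1), levels = 1:12, labels = month.abb)

# "- 1" drops the intercept so that every month gets its own indicator column
dummies <- model.matrix(~ month_initial - 1)
print(dummies[, c("month_initialJan", "month_initialApr", "month_initialJun")])
```

Each row has exactly one 1, in the column for its month; the same approach works for the year columns.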
Hope this helps!