R extract Date and Time Info - r

I have a data.frame that looks like this:
> df1
Date Name Surname Amount
2015-07-24 John Smith 200
I want to extract all the information from the Date into new columns, so I can get to this:
> df2
Date Year Month Day Day_w Name Surname Amount
2015-07-24 2015 7 24 Friday John Smith 200
So now I'd like to have Year, Month, Day and Day of the Week. How can I do that? When I first try to convert the variable to a date using as.Date, the data.frame gets messed up: every Date becomes NA and no new columns are created. Thanks for your help!

Here's a simple and efficient solution using the devel version of data.table and its new tstrsplit function which will perform the splitting operation only once and also update your data set in place.
library(data.table)
setDT(df1)[, c("Year", "Month", "Day", "Day_w") :=
c(tstrsplit(Date, "-", type.convert = TRUE), list(wday(Date)))] # list() keeps wday() as a single column when there are several rows
df1
# Date Name Surname Amount Year Month Day Day_w
# 1: 2015-07-24 John Smith 200 2015 7 24 6
Note that I've used a numeric representation of the weekdays because the data.table package has an efficient built-in wday function for that, but if you really need the day names you can use format(as.Date(Date), format = "%A") instead.
In order to install the devel version use the following
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)

Maybe this helps:
df2 <- df1
dates <- strptime(as.character(df1$Date),format="%Y-%m-%d")
df2$Year <- format(dates, "%Y")
df2$Month <- format(dates, "%m")
df2$Day <- format(dates, "%d")
df2$Day_w <- format(dates, "%A") # "%A" gives the full name (Friday); "%a" gives the abbreviation (Fri)
Afterwards you can rearrange the order of the columns in df2 as you desire.
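As a rough sketch of the same base R idea, the block below builds df2 with the columns already in the requested order and with Year, Month and Day as integers rather than strings. It assumes the Date column holds text in YYYY-MM-DD form; the sample row is copied from the question, and the weekday name returned by weekdays() depends on the session locale.

```r
# Sample data copied from the question (one row only)
df1 <- data.frame(Date = "2015-07-24", Name = "John", Surname = "Smith",
                  Amount = 200, stringsAsFactors = FALSE)

# as.character() guards against the column being a factor
d <- as.Date(as.character(df1$Date), format = "%Y-%m-%d")

df2 <- data.frame(
  Date  = d,
  Year  = as.integer(format(d, "%Y")),
  Month = as.integer(format(d, "%m")),
  Day   = as.integer(format(d, "%d")),
  Day_w = weekdays(d),                  # name is locale-dependent
  df1[c("Name", "Surname", "Amount")],  # remaining columns, unchanged
  stringsAsFactors = FALSE
)
df2
```

If as.Date returns NA here, the format string most likely does not match how the dates are actually written in the column.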

Related

Creating a day of the week variable in R from 'YYYYMMDD'

I have my dates formatted as 'YYYYMMDD' like '20150531' but now I want to separate my data into categories for the 7 days of the week by creating another variable called Day. How could I do this in R?
You can try the weekdays() function from base R:
#Data
df <- data.frame(Date='20150531',stringsAsFactors = FALSE)
df$Weekday <- weekdays(as.Date(df$Date,'%Y%m%d'))
Output:
df
Date Weekday
1 20150531 Sunday
We can convert to Date class and then use format to get the weekday name in base R
df1$Weekday <- format(as.Date(df1$date , '%Y%m%d'), '%a')
-output
df1
# date Weekday
#1 20150531 Sun
data
df1 <- data.frame(date = '20150531')
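Since the goal is to split the data into seven weekday categories, it may help to store the result as a factor with a fixed level order. The following is only a sketch: the three sample dates and the English day names are assumptions made for illustration, and the day index comes from as.POSIXlt()$wday (0 = Sunday), so the labels do not depend on the locale.

```r
# Hypothetical sample dates in YYYYMMDD form
df <- data.frame(Date = c("20150531", "20150601", "20150606"),
                 stringsAsFactors = FALSE)
d <- as.Date(df$Date, format = "%Y%m%d")

# Fixed English labels, indexed by the locale-independent wday value (0 = Sunday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
df$Day <- factor(day_names[as.POSIXlt(d)$wday + 1L], levels = day_names)

# Counts per weekday; days that do not occur still appear with a count of 0
table(df$Day)
```

With the levels fixed like this, functions such as split(df, df$Day) or table() return the groups in calendar order rather than alphabetical order.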

R: How to remove the day from a date? [duplicate]

I have a bunch of dates in a df column in the following format: dd.mm.yyyy
I want it to look like this: 01/2020 (mm.yyyy)
How can I remove the day from all of the dates?
Use format to specify the date format you'd like
date <- as.Date("13/01/2020", format = "%d/%m/%Y")
format(date, "%m/%Y")
[1] "01/2020"
Edit - applying to dataframe column
dates <- c("13/01/2020", "17/02/2015", "13/03/2013")
df <- data.frame(dates, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%d/%m/%Y")
df$dates_format <- format(df$dates, "%m/%Y")
df
dates dates_format
1 2020-01-13 01/2020
2 2015-02-17 02/2015
3 2013-03-13 03/2013
Besides format by @Greg, another option is using sub like below
> sub(".*?/","","13/01/2020")
[1] "01/2020"
Here is a solution using lubridate.
library(lubridate)
#Set the desired format (mm-yyyy) as my_stamp
my_stamp<-stamp( "02-2019",
orders = "my")
#A df with a column full of dates
df <- data.frame(dates = c("30/04/2020","29/03/2020","28/02/2020"))
#Change the column from string to date format
df$dates<-dmy(df$dates)
#Apply the format you desire to the dates (i.e., only month and year)
df$dates<-my_stamp(df$dates)
# dates
#1 04-2020
#2 03-2020
#3 02-2020
There are explicit date formatting options in R (see answer by Greg). Another option would be to separate the date into 3 columns, and then recombine the month and year, putting a / in between. Note this leaves the new date column in character format, which you may want to change depending on your needs.
library(tidyr)
df <- data.frame(date = "13/01/2020")
df <- separate(df, date, into = c("day","month","year"), sep = "/")
df$newdate <- paste(df$month, df$year, sep = "/")
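The question describes dates written with dots (dd.mm.yyyy), while the answers above use slashes. A minimal sketch of the same format() approach for dot-separated input, using made-up sample values, could look like this:

```r
# Hypothetical sample values in dd.mm.yyyy form
dates <- c("13.01.2020", "17.02.2015")

# Parse with a format string that matches the dots, then print month/year only
parsed <- as.Date(dates, format = "%d.%m.%Y")
month_year <- format(parsed, "%m/%Y")
month_year
```

The result is a character vector, so it sorts alphabetically rather than chronologically; keep the parsed Date column alongside it if ordering matters.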

Replace characters with dates in dataframe in r

I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be due to the fact that the POSIXlt object created by strptime() is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally, I would use dplyr to mutate the values of the original column. Combined with lubridate it works for me (at least I think this is what you wanted):
library(dplyr)
library(lubridate)
df <- df %>% mutate(date = ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
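A base R variant worth considering is to store the column as POSIXct instead of POSIXlt. POSIXct is a plain numeric vector under the hood, so it can be assigned with either df[, 1] or df$date and sorts without surprises. The sketch below reuses the sample data from the question; the tz = "UTC" argument is an assumption added to keep the result independent of the local time zone.

```r
# Sample data from the question
x <- c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c("1234", "1238", "1250")
df <- data.frame(date = x, id = y, stringsAsFactors = FALSE)

# as.POSIXct() accepts the same format string as strptime()
# but returns an atomic vector rather than a list
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Order chronologically
df <- df[order(df$date), ]
df
```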

Filtering on a date using dplyr without changing the variable format

I want to use a list of years to filter a data frame by date
years<-c("2014")
yearsdata <- data.frame(animal=c("cow","pig"),
mydate=c(as.Date("2015-01-01"),
as.Date("2014-01-01")))
yearsdata %>%
mutate(mydate =format(mydate, "%Y") %>%
as.character()) %>%
filter(is.null(years) | mydate %in% years)
The above code works and lets me filter my dataset but it also formats the date column. Is there a way to get my filter results without having the format of the date column change in the finished subset dataframe?
If you're up for using the lubridate package you could do:
library("lubridate")
yearsdata %>%
filter(is.null(years) | year(mydate) %in% years)
Which gives:
# animal mydate
# 1 pig 2014-01-01
All these pipes give me a headache; I would just do
library(data.table)
setDT(yearsdata)[is.null(years) | year(mydate) %in% years]
# animal mydate
# 1: pig 2014-01-01
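If no extra package is wanted, the same filter can be sketched in base R. The year is computed only inside the row index, so the mydate column itself stays a Date. The data below is the example from the question.

```r
years <- c("2014")
yearsdata <- data.frame(animal = c("cow", "pig"),
                        mydate = as.Date(c("2015-01-01", "2014-01-01")))

# Build a logical index from the year; the column is never overwritten
keep <- is.null(years) | format(yearsdata$mydate, "%Y") %in% years
result <- yearsdata[keep, ]
result
```

When years is NULL, the is.null() test makes every element of keep TRUE, so all rows are returned, which mirrors the behaviour of the dplyr version.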

How can I convert timestamps into year?

I have a dataframe with this kind of column data.
1906-02-20
1906-02-21
I want to create separate columns with years, months and days.
Intended output:
1906 02 20
1906 02 21
I have used things like strptime and lubridate before. But not able to figure this out.
format(as.Date('2014-12-31'),'%Y %m %d')
(obviously you want to replace as.Date('2014-12-31') with your date vector).
format() converts dates to strings according to the format string provided. For the individual year, month and day values, you want:
myData$year <- format(as.Date('2014-12-31'),'%Y')
#> "2014"
myData$month <- format(as.Date('2014-12-31'),'%m')
#> "12"
myData$day <- format(as.Date('2014-12-31'),'%d')
#> "31"
I often refer to this page when I need to look up the meaning of the format strings.
Look at this:
X <- do.call(rbind, strsplit(x = c("1906-02-20", "1906-02-21"), split = "-")) # one row per date
colnames(X) <- c("year", "month", "day")
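If numeric columns are preferred over strings, one possible base R sketch is to read the components from a POSIXlt object. The two dates are taken from the question. Note that POSIXlt stores the year as an offset from 1900 and the month as 0 to 11, hence the adjustments below.

```r
x <- c("1906-02-20", "1906-02-21")
lt <- as.POSIXlt(as.Date(x))

parts <- data.frame(
  year  = lt$year + 1900L,  # stored as years since 1900
  month = lt$mon + 1L,      # stored as 0-11
  day   = lt$mday
)
parts
```

Because the values are numbers, the month prints as 2 rather than 02; use format() with "%m" instead if the leading zero is needed.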
Try the separate function in the tidyr package
library(tidyr)
separate(df, "V1", into = c("year", "month", "day"), sep = "-")
# year month day
# 1 1906 02 20
# 2 1906 02 21
Data
df <- read.table(text = "
1906-02-20
1906-02-21
")
