Extract date and time from datetime field in R - r

I have a dataset that looks like this; readingdate is in POSIXct format. I want to extract the date into one field and the time into another field in R. I'm trying to avoid base R as much as possible, so a lubridate solution would be great. I want the newly extracted fields to be in the right format because my ultimate goal is to plot the time (x) against total items sold (y) in order to determine at what time of day the most sales are made. Thanks for your help.

If I understood correctly, R reads your dates and times correctly when you import your data (because they are in POSIXct format), but you cannot extract the date and the time in the right format from your date-time column.
Suppose you have a data.frame in R like this:
date_time Sold
1 2020-01-01 03:16:01 2
2 2020-01-02 02:15:12 2
3 2020-01-03 08:26:11 3
4 2020-01-04 09:29:14 2
5 2020-01-05 12:06:06 1
6 2020-01-06 08:08:11 3
lubridate does not offer a single function that extracts the time-of-day component, so you have to extract it piece by piece with the hour(), minute() and second() functions and then concatenate the pieces with paste(). For the date, you can use as.Date() (or lubridate's date()) to extract it and then format() to display it the way you want.
library(lubridate)
library(dplyr)
library(magrittr)
tab <- tab %>%
mutate(
date = as.Date(date_time),
hour = hour(date_time),
minute = minute(date_time),
second = second(date_time)
) %>%
mutate(
format_date = format(date, "%m/%d/%Y"),
format_hour = paste(hour, minute, second, sep = ":")
)
This results in:
tab %>% select(format_date, format_hour) %>% head()
format_date format_hour
1 01/01/2020 12:4:23
2 01/02/2020 3:19:13
3 01/03/2020 8:6:24
4 01/04/2020 6:28:2
5 01/05/2020 2:16:20
6 01/06/2020 12:8:28
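Note that paste() drops leading zeros (for example "8:6:24"). As a hedged sketch, not part of the original answer, format() can produce a zero-padded time string, and a numeric hour is usually easier to put on a plot axis. The data below is illustrative, modelled on the sample table, and UTC is an assumption.

```r
library(lubridate)
library(dplyr)

# Illustrative data modelled on the sample table above (UTC is an assumption)
tab <- data.frame(
  date_time = as.POSIXct(
    c("2020-01-01 03:16:01", "2020-01-02 02:15:12", "2020-01-03 08:26:11"),
    tz = "UTC"
  ),
  Sold = c(2, 2, 3)
)

tab <- tab %>%
  mutate(
    date     = as_date(date_time),                        # Date class
    time_chr = format(date_time, "%H:%M:%S"),             # zero-padded character time
    hour_num = hour(date_time) + minute(date_time) / 60   # numeric, usable on a plot axis
  )

# Total items sold per hour of day
sales_by_hour <- tab %>%
  group_by(hour = hour(date_time)) %>%
  summarise(total_sold = sum(Sold))
```

sales_by_hour can then be plotted with hour on the x axis and total_sold on the y axis to see which hour has the highest sales.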

Related

How to convert time to standard format and calculate time difference

newdf=data.frame(date=as.Date(c("2021-01-04","2021-01-05","2021-01-06","2021-01-07")),
time=c("10:32:29","11:25","12:18:42","09:58"))
This is my data frame. I want to calculate the time difference between consecutive days in hours. Note that some time values do not contain seconds, so they first have to be converted to a standard form. Could you please suggest a method that handles both problems? This is entirely in R.
Paste the date and time together into one column, use parse_date_time to convert the values to a standard format (POSIXct), and use difftime to calculate the difference between consecutive times in hours.
library(dplyr)
library(tidyr)
library(lubridate)
newdf %>%
unite(datetime, date, time, sep = ' ') %>%
mutate(datetime = parse_date_time(datetime, c('Ymd HMS', 'Ymd HM')),
difference_in_hours = round(as.numeric(difftime(datetime,
lag(datetime), units = 'hours')), 2))
# datetime difference_in_hours
#1 2021-01-04 10:32:29 NA
#2 2021-01-05 11:25:00 24.88
#3 2021-01-06 12:18:42 24.90
#4 2021-01-07 09:58:00 21.66
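One detail worth spelling out: the third positional argument of difftime() is tz, so the unit is safest passed by name. A minimal base R sketch using the first two timestamps from the output above (UTC is an assumption):

```r
t1 <- as.POSIXct("2021-01-04 10:32:29", tz = "UTC")
t2 <- as.POSIXct("2021-01-05 11:25:00", tz = "UTC")

# Name the units argument explicitly; the third positional argument is tz
gap <- difftime(t2, t1, units = "hours")
gap_hours <- round(as.numeric(gap), 2)  # 24.88, matching row 2 of the table above
```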

R generate one random date per month between defined interval

I'd like to generate a list of random dates between a defined interval using R such that there is only one date for each month present in the interval.
I've tried using a variation of the code from another solution, but I can't seem to limit it to one date per month. I get multiple dates for a given month.
Here's my attempt
df = data.frame(Date=c(sample(seq(as.Date('2020/01/01'), as.Date('2020/09/01'), by="day"), 9)))
But I seem to get more than one date for a given month. Any inputs would be highly appreciated.
First I create a table containing all the possible dates that you want to sample from. In a column of this table I store the month number of each date, using the month() function from the lubridate package.
library(lubridate)
dates <- data.frame(
days = seq(as.Date('2020/01/01'), as.Date('2020/09/01'), by="day")
)
dates$month <- month(dates$days)
Then the idea is to loop over the months with lapply(). In each iteration, I select from the dates table only the dates of that month and pass them to the sample() function.
results <- lapply(1:9, function(x){
sample_dates <- dates$days[dates$month == x]
return(sample(sample_dates, size = 1))
})
df <- data.frame(
dates = as.Date(unlist(results), origin = "1970-01-01")
)
This results in:
dates
1 2020-01-19
2 2020-02-06
3 2020-03-26
4 2020-04-13
5 2020-05-16
6 2020-06-29
7 2020-07-06
8 2020-08-21
9 2020-09-01
In other words, the idea of this approach is to give sample() only the dates of one specific month on each iteration, so it picks exactly one date for that month.
How about this:
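First, create a function that returns a random day from the month given by 'month'.
Then lapply it over all the months you need, 1 to 9.
x <- function(month){
# first and last day of the month (month + 1 only works for months 1 to 11)
first_day <- as.Date(paste0('2020/', month, '/01'))
last_day <- as.Date(paste0('2020/', month + 1, '/01')) - 1
sample(seq(first_day, last_day, by = "day"), 1)
}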
df <- data.frame(
dates = as.Date(unlist(lapply(1:9,x)), origin = "1970-01-01")
)
If you also want the results in random order (not January, February, March...), you only need to wrap the list in sample():
df <- data.frame(
dates = as.Date(unlist(sample(lapply(1:9,x))), origin = "1970-01-01")
)
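For comparison, here is a hedged sketch of the same idea with dplyr instead of lapply(): group the candidate dates by month and draw one row per group. It assumes dplyr 1.0.0 or later for slice_sample(); set.seed() is there only to make the draw reproducible.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # only to make the draw reproducible

df <- data.frame(
  dates = seq(as.Date("2020-01-01"), as.Date("2020-09-01"), by = "day")
) %>%
  group_by(year = year(dates), month = month(dates)) %>%  # year guards multi-year intervals
  slice_sample(n = 1) %>%                                 # one random date per group
  ungroup()
```

Because the interval ends on 2020-09-01, September contains a single candidate date, so that date is always the one picked for month 9.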

Using lubridate with multiple date formats

I have a column of dates that was stored in the format 8/7/2001, 10/21/1990, etc. Two values are just four-digit years. I converted the entire column to class Date using the following code.
lubridate::parse_date_time(eventDate, orders = c('mdy', 'Y'))
It works great, except the values that were just years are converted to yyyy-01-01 and I want them to just be yyyy. Is there a way to keep lubridate from adding on any information that wasn't already there?
Edit: Code to create data frame
id = (1:5)
eventDate = c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011")
df <- data.frame(id, eventDate)
I do not think it is possible to convert your values to dates and keep the "yyyy" values intact. By transforming your "yyyy" values into "yyyy-01-01", lubridate is doing the right thing: dates have an order, and if other values in your column have days and months defined, all the values need to have these components too.
For example, take the data.frame below. If I ask R to sort the table by the dates column, does the value in the first row ("2020") come before or after the value in the second row ("2020-02-28")? The value "2020" could mean any day in that year, so how should R treat it? By filling in the first day of the year, lubridate defines the missing components and removes the ambiguity.
dates <- c("2020", "2020-02-28", "2020-02-20", "2020-01-10", "2020-05-12")
id <- 1:5
df <- data.frame(
id,
dates
)
id dates
1 1 2020
2 2 2020-02-28
3 3 2020-02-20
4 4 2020-01-10
5 5 2020-05-12
So if you want to keep the "yyyy" values intact, they probably should not sit in your eventDate column alongside values that have a different structure ("mm/dd/yyyy"). If you really need to keep them intact, I think it is best to keep the eventDate column as character and store the parsed dates in another column, like this:
df$as_dates <- lubridate::parse_date_time(df$eventDate, orders = c('mdy', 'Y'))
id eventDate as_dates
1 1 10/7/2001 2001-10-07
2 2 1989 1989-01-01
3 3 <NA> <NA>
4 4 5/5/2016 2016-05-05
5 5 9/18/2011 2011-09-18
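If you later need to know which parsed dates were originally year-only, one option is an extra flag column. This is a sketch under the assumption that year-only entries are exactly four digits; the column name year_only is hypothetical.

```r
library(lubridate)

id <- 1:5
eventDate <- c("10/7/2001", "1989", NA, "5/5/2016", "9/18/2011")
df <- data.frame(id, eventDate)

# Parsed dates live in their own column; the original strings stay untouched
df$as_dates  <- as_date(parse_date_time(df$eventDate, orders = c("mdy", "Y")))
# TRUE where only a four-digit year was recorded (NA input gives FALSE)
df$year_only <- grepl("^[0-9]{4}$", df$eventDate)
```

With the flag in place, the "yyyy-01-01" placeholder can be displayed as a bare year, for example with format(df$as_dates, "%Y") on the flagged rows.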

How to group by timestamp in UTC by day in R

So I have this sample of UTC timestamps and a bunch of other data. I would like to group my data by date. This means I do not need hours/mins/secs and would like to have a new df which shows the number of actions grouped together.
I tried using lubridate to pull out the date, but I can't get the origin right.
DATA
hw0 <- read.table(text =
'ID timestamp action
4f.. 20160305195246 visitPage
75.. 20160305195302 visitPage
77.. 20160305195312 checkin
42.. 20160305195322 checkin
8f.. 20160305195332 searchResultPage
29.. 20160305195342 checkin', header = T)
Here's what I tried
library(dplyr)
library(lubridate) #this will allow us to extract the date
daily <- hw0 %>%
mutate(date=date(as.POSIXct(timestamp),origin='1970-01-01'))
daily <- daily %>%
group_by(date)
I am unsure what to use as an origin and my error says this value is incorrect. Ultimately, I expect the code to return a new df which features a variable (date) with a list of unique dates as well as how many of the different actions there are in each day.
Assuming the numbers at the end are 24 hour time based, you can use:
daily = hw0 %>%
mutate(date = as.POSIXct(as.character(timestamp), format = '%Y%m%d%H%M%S'))
You can use as.Date instead if you want to drop the time of day. You only need to supply an origin when you pass a numeric argument, which as.Date interprets as the number of days since the origin (as.POSIXct interprets it as seconds). In your case you should just give it a character vector and supply the format.
lubridate also has the ymd_hms() function, which can parse the timestamp, and the floor_date() function, which rounds it down to the day.
library(tidyverse)
library(lubridate) # attached by library(tidyverse) only from tidyverse 2.0.0 onwards
daily <- hw0 %>%
mutate(time = ymd_hms(timestamp, tz = 'UTC'),
date = floor_date(time, unit = 'day'))
lubridate also has parse_date_time, which seems to be a nice mix of the above two solutions.
library(tidyverse)
library(lubridate)
hw0 %>%
mutate(timestamp = parse_date_time(timestamp, orders = "%Y%m%d%H%M%S"))
ID timestamp action
1 4f.. 2016-03-05 19:52:46 visitPage
2 75.. 2016-03-05 19:53:02 visitPage
3 77.. 2016-03-05 19:53:12 checkin
4 42.. 2016-03-05 19:53:22 checkin
5 8f.. 2016-03-05 19:53:32 searchResultPage
6 29.. 2016-03-05 19:53:42 checkin
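The answers above stop at parsing the timestamp. To get the per-day counts the question asks for, a sketch along these lines could follow; reading the timestamp as character and the column name n_actions are choices made for this illustration.

```r
library(dplyr)
library(lubridate)

# Read every column as character so the 14-digit timestamp is kept verbatim
hw0 <- read.table(text =
'ID timestamp action
4f.. 20160305195246 visitPage
75.. 20160305195302 visitPage
77.. 20160305195312 checkin
42.. 20160305195322 checkin
8f.. 20160305195332 searchResultPage
29.. 20160305195342 checkin', header = TRUE, colClasses = "character")

daily <- hw0 %>%
  mutate(date = as_date(ymd_hms(timestamp, tz = "UTC"))) %>%  # drop the time of day
  count(date, action, name = "n_actions")                     # rows per day and action
```

Here daily has one row per combination of date and action, which for the sample data means three rows for 2016-03-05.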

How do I manipulate a datetime variable imported from Excel into R

I am importing multiple Excel sheets to R using readxl. Each of these sheets contains observations of transactions which include DateOfEvent and TimeOfEvent fields.
When I import the time field, R converts it to a POSIXct object based on the date being from Excel Day 0 - i.e. 1899-12-31 0:0:0
e.g. dat <- data.frame(date=Sys.Date()+0:1, time=as.POSIXct(c(10,11), origin="1899-12-31"))
With the data in a data frame, using a dplyr step to clean my data, how would I -
Use lubridate to recode the date part of the variable using the DateOfEvent value?
Keep the times but make them independent of date so that I can compare events occurring in time buckets across different days (i.e. drop the 1899 date but format the date so that I can perform cross day comparisons)?
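Use update() to replace the year, month and day of time with those from date.
Use hms::as.hms() if you want to extract just the time of day from time (this will convert to UTC; recent hms releases deprecate as.hms() in favor of hms::as_hms()):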
library(tidyverse)
dat %>%
mutate(time = update(time,
year = year(date),
month = month(date),
day = day(date)),
hms = hms::as.hms(time))
date time hms
1 2018-06-02 2018-06-02 16:00:10 23:00:10
2 2018-06-03 2018-06-03 16:00:11 23:00:11
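For the time-bucket comparison across days, one possible sketch derives date-independent values directly from the time column. The sample times, bucket boundaries and labels below are all hypothetical, chosen only for illustration.

```r
library(lubridate)
library(dplyr)

# Hypothetical data: times carry the Excel placeholder date 1899-12-31
dat <- data.frame(
  date = as.Date(c("2018-06-02", "2018-06-03")),
  time = as.POSIXct(c("1899-12-31 09:15:00", "1899-12-31 16:40:00"), tz = "UTC")
)

dat <- dat %>%
  mutate(
    # seconds since midnight: comparable across days, independent of the date
    secs_since_midnight = hour(time) * 3600 + minute(time) * 60 + second(time),
    # six-hour buckets; intervals are closed on the left, e.g. [6, 12)
    bucket = cut(hour(time), breaks = c(0, 6, 12, 18, 24), right = FALSE,
                 labels = c("night", "morning", "afternoon", "evening"))
  )
```

Grouping by bucket then compares events in the same part of the day regardless of which date they occurred on.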
