I have a data frame with a date-time column. I want to split the column into multiple columns: year, month, day, time_12, time_24, and timezone.
The time_12 and time_24 need to be character vectors using the 12-hour convention and 24-hour convention, respectively. How could I accomplish this?
library(tidyverse)
library(lubridate)
# data frame
myDates <- ymd_hm(c('2018-October-31 8:00 PM',
'2018Oct31T20:00'))
df <- data.frame(datetime = myDates)
# split datetime into parts
df$year <- year(df$datetime)
df$month <- month(df$datetime)
df$day <- day(df$datetime)
df$time_12 <- '8:00 PM' ### need help
df$time_24 <- '20:00' ### need help
df$tz <- tz(df$datetime)
df
# datetime year month day time_12 time_24 tz
# 1 2018-10-31 20:00:00 2018 10 31 8:00 PM 20:00 UTC
# 2 2018-10-31 20:00:00 2018 10 31 8:00 PM 20:00 UTC
sapply(df, class)
# $datetime
# [1] "POSIXct" "POSIXt"
#
# $year
# [1] "numeric"
#
# $month
# [1] "numeric"
#
# $day
# [1] "integer"
#
# $time_12
# [1] "character"
#
# $time_24
# [1] "character"
#
# $tz
# [1] "character"
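We can use format() with strptime-style conversion codes to build both time strings: %I is the hour on the 12-hour clock, %H the hour on the 24-hour clock, and %p the AM/PM indicator. Note that %I is zero-padded, so the result is "08:00 PM" rather than "8:00 PM".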
library(dplyr)
df %>%
mutate(year = year(datetime),
month = month(datetime),
day = day(datetime),
time_12 = format(datetime, "%I:%M %p"),
time_24 = format(datetime, '%H:%M'),
tz = tz(datetime))
# datetime year month day time_12 time_24 tz
#1 2018-10-31 20:00:00 2018 10 31 08:00 PM 20:00 UTC
#2 2018-10-31 20:00:00 2018 10 31 08:00 PM 20:00 UTC
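If the leading zero matters (the question asked for '8:00 PM', not '08:00 PM'), one option is to strip it afterwards. This is only a minimal sketch in base R, assuming a locale where %p prints AM/PM; the variable names are just for illustration.

```r
# Illustrative timestamp matching the question's data (UTC assumed)
x <- as.POSIXct("2018-10-31 20:00:00", tz = "UTC")

# %I is zero-padded, so drop a single leading zero if there is one
time_12 <- sub("^0", "", format(x, "%I:%M %p"))

# The 24-hour string needs no post-processing
time_24 <- format(x, "%H:%M")
```

Inside the mutate() call above, the same sub() wrapper could be put around the format(datetime, "%I:%M %p") expression.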
I currently have a dataset with several different time formats (AM/PM, numeric, 24-hour) and I'm trying to convert them all to 24-hour format. Is there a way to standardize mixed-format columns?
Current sample data
time
12:30 PM
03:00 PM
0.961469907
0.913622685
0.911423611
09:10 AM
18:00
Desired output
new_time
12:30:00
15:00:00
23:04:31
21:55:37
21:52:27
09:10:00
18:00:00
I know how to do each one individually (an example is below), but is there a way to do it all in one go? I have a large amount of data and can't go line by line.
#for numeric time
> library(chron)
> x <- c(0.961469907, 0.913622685, 0.911423611)
> times(x)
[1] 23:04:31 21:55:37 21:52:27
The decimal times are a pain, but we can convert them first with chron::times(), feed the result back in as character, and then use lubridate's parse_date_time to parse all three formats at once.
library(tidyverse)
library(chron)
# Create reproducible dataframe
df <-
tibble::tibble(
time = c(
"12:30 PM",
"03:00 PM",
0.961469907,
0.913622685,
0.911423611,
"09:10 AM",
"18:00")
)
# Parse times
df <-
df %>%
dplyr::mutate(
time_chron = chron::times(as.numeric(time)),
time_chron = if_else(
is.na(time_chron),
time,
as.character(time_chron)),
time_clean = lubridate::parse_date_time(
x = time_chron,
orders = c(
"%I:%M %p", # HH:MM AM/PM 12 hour format
"%H:%M:%S", # HH:MM:SS 24 hour format
"%H:%M")), # HH:MM 24 hour format
time_clean = hms::as_hms(time_clean)) %>%
select(-time_chron)
Which gives us
> df
# A tibble: 7 × 2
time time_clean
<chr> <time>
1 12:30 PM 12:30:00
2 03:00 PM 15:00:00
3 0.961469907 23:04:31
4 0.913622685 21:55:37
5 0.911423611 21:52:27
6 09:10 AM 09:10:00
7 18:00 18:00:00
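For context, the decimal values are fractions of a day, which is why chron::times() can read them directly. Here is a rough base-R sketch of the same conversion, assuming the fractions should be rounded to whole seconds (the names frac, secs, and clock are made up for this example).

```r
# Fractions of a day, taken from the question's sample data
frac <- c(0.961469907, 0.913622685, 0.911423611)

# One day has 86400 seconds; round to the nearest whole second
secs <- round(frac * 86400)

# Treat the seconds as an offset from midnight UTC and print the clock time
clock <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
clock
# [1] "23:04:31" "21:55:37" "21:52:27"
```

This only covers the numeric rows; the AM/PM and 24-hour strings still need parse_date_time (or similar) as in the answer above.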
Let's say that I have a date in R and it's formatted as follows.
date
2012-02-01
2012-02-01
2012-02-02
Is there any way in R to add another column with the day of the week associated with the date? The dataset is really large, so it would not make sense to go through manually and make the changes.
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
So after adding the days, it would end up looking like:
date day
2012-02-01 Wednesday
2012-02-01 Wednesday
2012-02-02 Thursday
Is this possible? Can anyone point me to a package that will allow me to do this?
Just trying to automatically generate the day by the date.
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
df$day <- weekdays(as.Date(df$date))
df
## date day
## 1 2012-02-01 Wednesday
## 2 2012-02-01 Wednesday
## 3 2012-02-02 Thursday
Edit: Just to show another way...
The wday component of a POSIXlt object is the numeric weekday (0-6 starting on Sunday).
as.POSIXlt(df$date)$wday
## [1] 3 3 4
which you could use to subset a character vector of weekday names
c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday")[as.POSIXlt(df$date)$wday + 1]
## [1] "Wednesday" "Wednesday" "Thursday"
Use the lubridate package and function wday:
library(lubridate)
df$date <- as.Date(df$date)
wday(df$date, label=TRUE)
[1] Wed Wed Thurs
Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
Look up ?strftime:
%A Full weekday name in the current locale
df$day = strftime(df$date,'%A')
Let's say you additionally want the week to begin on Monday (instead of the default, Sunday). Then the following is helpful:
require(lubridate)
df$day = ifelse(wday(df$date)==1,6,wday(df$date)-2)
The result is the day number in the range [0, ..., 6], with Monday as 0.
If you want the range to be [1, ..., 7], use the following:
df$day = ifelse(wday(df$date)==1,7,wday(df$date)-1)
... or, alternatively, add 1 to the result of the first version:
df$day = df$day + 1
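The same Monday-first renumbering can also be written with modular arithmetic on the POSIXlt wday field (0-6, Sunday = 0) mentioned in an earlier answer. A small base-R sketch; the dates and the names sun0, mon0, and mon1 are only illustrative.

```r
# Three sample dates: a Wednesday, a Sunday, and a Monday
d <- as.Date(c("2012-02-01", "2012-02-05", "2012-02-06"))

sun0 <- as.POSIXlt(d)$wday   # 0-6 with Sunday = 0  -> 3 0 1
mon0 <- (sun0 + 6) %% 7      # 0-6 with Monday = 0  -> 2 6 0
mon1 <- mon0 + 1             # 1-7 with Monday = 1  -> 3 7 1
```

lubridate's wday() also has a week_start argument, so wday(d, week_start = 1) should give the 1-7 numbering directly if your installed version supports it.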
This should do the trick
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
dow <- function(x) format(as.Date(x), "%A")
df$day <- dow(df$date)
df
#Returns:
date day
1 2012-02-01 Wednesday
2 2012-02-01 Wednesday
3 2012-02-02 Thursday
start = as.POSIXct("2017-09-01")
end = as.POSIXct("2017-09-06")
dat = data.frame(Date = seq.POSIXt(from = start,
to = end,
by = "DSTday"))
# see ?strptime for details of formats you can extract
# day of the week as numeric (Monday is 1)
dat$weekday1 = as.numeric(format(dat$Date, format = "%u"))
# abbreviated weekday name
dat$weekday2 = format(dat$Date, format = "%a")
# full weekday name
dat$weekday3 = format(dat$Date, format = "%A")
dat
# returns
Date weekday1 weekday2 weekday3
1 2017-09-01 5 Fri Friday
2 2017-09-02 6 Sat Saturday
3 2017-09-03 7 Sun Sunday
4 2017-09-04 1 Mon Monday
5 2017-09-05 2 Tue Tuesday
6 2017-09-06 3 Wed Wednesday
From JStrahl's comment: format(as.Date(df$date),"%w") gives the day of the week as a number (0-6, with Sunday as 0):
as.numeric(format(as.Date("2016-05-09"),"%w"))
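Since both %w and %u come up in this thread, here is a short side-by-side sketch of how they number the days. The two dates are just an example (a Sunday and the Monday used above).

```r
# 2016-05-08 is a Sunday, 2016-05-09 is a Monday
d <- as.Date(c("2016-05-08", "2016-05-09"))

w <- as.numeric(format(d, "%w"))  # Sunday = 0 ... Saturday = 6  -> 0 1
u <- as.numeric(format(d, "%u"))  # Monday = 1 ... Sunday = 7    -> 7 1
```

The two codes agree for Monday through Saturday and differ only on Sunday.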
How can I convert the string "Saturday, 5 Oct 2013 20:31:59" to the datetime format "2013-10-05 Saturday 20:31:59"? Alternatively, how can I get the year, month, date, day of the week, hour, minute, and second values from the string? Thanks.
You need to use the relevant format specification when you create the time object from the string, eg:
(x <- as.POSIXct("Saturday, 5 Oct 2013 20:31:59", format="%A, %d %b %Y %H:%M:%S"))
[1] "2013-10-05 20:31:59 BST"
Look at ?strftime to see the format specifications, and how to extract specific parts of a datetime.
#your desired format
format(x, "%Y-%m-%d %A %H:%M:%S")
[1] "2013-10-05 Saturday 20:31:59"
#only the year
format(x,"%Y")
[1] "2013"
> now <- Sys.time()
> now
[1] "2014-01-16 16:58:23 IST"
> as.POSIXlt(as.character(now),tz="GMT")
[1] "2014-01-16 17:05:24 GMT"
> str(as.POSIXlt(now))
POSIXlt[1:1], format: "2014-01-16 16:58:23"
> unclass(as.POSIXlt(now))
$sec
[1] 23.1636
$min
[1] 58
$hour
[1] 16
$mday
[1] 16
$mon
[1] 0
$year
[1] 114
$wday
[1] 4
$yday
[1] 15
$isdst
[1] 0
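Note that the POSIXlt fields are offsets rather than calendar values: year counts from 1900 and mon from 0, which is why the output above shows 114 and 0 for January 2014. A small sketch of turning them into ordinary values, using the timestamp from the question (tz = "UTC" and the list name parts are assumptions to keep the example reproducible).

```r
# Timestamp from the question, fixed to UTC so the fields don't shift
x  <- as.POSIXct("2013-10-05 20:31:59", tz = "UTC")
lt <- as.POSIXlt(x)

parts <- list(
  year    = lt$year + 1900,  # years since 1900 -> calendar year
  month   = lt$mon + 1,      # 0-11 -> 1-12
  day     = lt$mday,         # day of month, already 1-based
  weekday = lt$wday,         # 0-6 with Sunday = 0 (6 is Saturday)
  hour    = lt$hour,
  minute  = lt$min,
  second  = lt$sec
)
```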
Using the lubridate package.
library(lubridate)
x <- "Saturday, 5 Oct 2013 20:31:59"
dmy_hms(x)
## [1] "2013-10-05 20:31:59 UTC"
library(lubridate)
R> date <- now()
R> year(date)
Use the accessors below for the other components:
Date component    Accessor
Year              year()
Month             month()
Week              week()
Day of year       yday()
Day of month      mday()
Day of week       wday()
Hour              hour()
Minute            minute()
Second            second()
Time zone         tz()
I know this has been asked several times and I looked at the questions and followed the suggestions. However, I couldn't solve this one.
The datetime.csv can be found on https://www.dropbox.com/s/6bvhk4kei4pg8zq/datetime.csv
My code looks like:
jd1 <- read.csv("datetime.csv")
head(jd1)
Date Time
1 20100101 0:00
2 20100101 1:00
3 20100101 2:00
4 20100101 3:00
5 20100101 4:00
6 20100101 5:00
sapply(jd1,class)
Date Time
"integer" "factor"
jd1 <- transform(jd1, timestamp=format(as.POSIXct(paste(Date, Time)), "%Y%m%d %H:%M:%S"))
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I tried the solution suggested by rcs on Converting two columns of date and time data to one but this seems to give an error.
Any help is highly appreciated.
Thanks.
The format string you're passing to format includes %S, but your data has no seconds. Fixing that alone won't remove the error, though, since it's coming from as.POSIXct. You need to pass the format string there instead and remove the call to the format function.
foo <- transform(jd1, timestamp=as.POSIXct(paste(Date, Time), format="%Y%m%d %H:%M"))
str(foo)
Compare this to:
bar <- transform(jd1, timestamp=as.POSIXct(paste(Date, Time), format="%Y%m%d %H:%M:%S"))
str(bar)
And the result of calling format:
baz <- transform(jd1, timestamp=format(as.POSIXct(paste(Date, Time), format="%Y%m%d %H:%M"), format='%Y%m%d %H:%M:%S'))
str(baz)
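To see the difference without the file, here is a small sketch on a single made-up value in the same layout (tz = "UTC" is my assumption, only to make it reproducible).

```r
# One value shaped like paste(Date, Time) from the question
s <- "20100101 5:00"

# Format matches the input: parses fine
ok <- as.POSIXct(s, format = "%Y%m%d %H:%M", tz = "UTC")

# Format expects seconds that the input doesn't have: result is NA
bad <- as.POSIXct(s, format = "%Y%m%d %H:%M:%S", tz = "UTC")
is.na(bad)
# [1] TRUE

# Seconds can still be shown on output once the value has been parsed
out <- format(ok, "%Y%m%d %H:%M:%S")
out
# [1] "20100101 05:00:00"
```

So %S belongs in the output format only, not in the format used for parsing.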
If it's just this file, you don't even need to read it as CSV. The following will do:
# if you are reading just timestamps, you may want to read it as just one column
jd1 <- read.table("datetime.csv", header = TRUE, colClasses = c("character"))
jd1$timestamp <- as.POSIXct(jd1$Date.Time, format = "%Y%m%d,%H:%M")
head(jd1)
## Date.Time timestamp
## 1 20100101,0:00 2010-01-01 00:00:00
## 2 20100101,1:00 2010-01-01 01:00:00
## 3 20100101,2:00 2010-01-01 02:00:00
## 4 20100101,3:00 2010-01-01 03:00:00
## 5 20100101,4:00 2010-01-01 04:00:00
## 6 20100101,5:00 2010-01-01 05:00:00
# if you must read it as separate columns as you may have other columns in your file
jd2 <- read.csv("datetime.csv", header = TRUE, colClasses = c("character", "character"))
jd2$timestamp <- as.POSIXct(paste(jd2$Date, jd2$Time, sep = " "), format = "%Y%m%d %H:%M")
head(jd2)
## Date Time timestamp
## 1 20100101 0:00 2010-01-01 00:00:00
## 2 20100101 1:00 2010-01-01 01:00:00
## 3 20100101 2:00 2010-01-01 02:00:00
## 4 20100101 3:00 2010-01-01 03:00:00
## 5 20100101 4:00 2010-01-01 04:00:00
## 6 20100101 5:00 2010-01-01 05:00:00
Arun's comment prompted me to do some benchmarking..
jd2 <- read.csv("datetime.csv", header = TRUE, colClasses = c("character", "character"))
library(microbenchmark)
microbenchmark(
  as.POSIXct(paste(jd2$Date, jd2$Time, sep = " "), format = "%Y%m%d %H:%M"),
  as.POSIXct(do.call(paste, c(jd2[c("Date", "Time")])), format = "%Y%m%d %H:%M"),
  transform(jd2, timestamp = as.POSIXct(paste(Date, Time), format = "%Y%m%d %H:%M")),
  times = 100)
## Unit: milliseconds
## expr min lq median uq max neval
## as.POSIXct(paste(jd2$Date, jd2$Time, sep = " "), format = "%Y%m%d %H:%M") 18.84720 18.87736 18.89542 18.93307 20.99021 100
## as.POSIXct(do.call(paste, c(jd2[c("Date", "Time")])), format = "%Y%m%d %H:%M") 18.94440 18.97917 18.99492 19.02220 21.07320 100
## transform(jd2, timestamp = as.POSIXct(paste(Date, Time), format = "%Y%m%d %H:%M")) 19.05581 19.10230 19.12612 19.16877 21.27490 100