I currently have a dataset with multiple time formats (AM/PM, numeric, 24-hour) and I'm trying to convert them all to 24-hour format. Is there a way to standardize mixed-format columns?
Current sample data
time
12:30 PM
03:00 PM
0.961469907
0.913622685
0.911423611
09:10 AM
18:00
Desired output
new_time
12:30:00
15:00:00
23:04:31
21:55:37
21:52:27
09:10:00
18:00:00
I know how to handle each format individually (an example is below), but is there a way to do it all in one go? I have a large amount of data and can't go line by line.
#for numeric time
> library(chron)
> x <- c(0.961469907, 0.913622685, 0.911423611)
> times(x)
[1] 23:04:31 21:55:37 21:52:27
The decimal times are the awkward part: they are fractions of a day. We can convert them with chron first, turn the result back into character, and then let lubridate's parse_date_time handle all three formats in one pass:
library(tidyverse)
library(chron)
# Create reproducible dataframe
df <-
tibble::tibble(
time = c(
"12:30 PM",
"03:00 PM",
0.961469907,
0.913622685,
0.911423611,
"09:10 AM",
"18:00")
)
# Parse times
df <-
df %>%
dplyr::mutate(
time_chron = chron::times(as.numeric(time)),
time_chron = if_else(
is.na(time_chron),
time,
as.character(time_chron)),
time_clean = lubridate::parse_date_time(
x = time_chron,
orders = c(
"%I:%M %p", # HH:MM AM/PM 12 hour format
"%H:%M:%S", # HH:MM:SS 24 hour format
"%H:%M")), # HH:MM 24 hour format
time_clean = hms::as_hms(time_clean)) %>%
select(-time_chron)
Which gives us
> df
# A tibble: 7 × 2
time time_clean
<chr> <time>
1 12:30 PM 12:30:00
2 03:00 PM 15:00:00
3 0.961469907 23:04:31
4 0.913622685 21:55:37
5 0.911423611 21:52:27
6 09:10 AM 09:10:00
7 18:00 18:00:00
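For reference, the decimal values are fractions of a 24-hour day, so the chron step can also be reproduced with plain arithmetic. A minimal base-R sketch, assuming the numeric values are always day fractions between 0 and 1; frac_to_hms is a hypothetical helper name, not part of chron or lubridate:

```r
# Hypothetical helper: convert a fraction of a day to an "HH:MM:SS" string.
frac_to_hms <- function(x) {
  # 86400 seconds per day; round to the nearest second and wrap at midnight
  secs <- as.integer(round(x * 86400)) %% 86400L
  sprintf("%02d:%02d:%02d",
          secs %/% 3600L,            # whole hours
          (secs %% 3600L) %/% 60L,   # remaining whole minutes
          secs %% 60L)               # remaining seconds
}

frac_to_hms(c(0.961469907, 0.913622685, 0.911423611))
# [1] "23:04:31" "21:55:37" "21:52:27"
```

For example, 0.961469907 * 86400 is about 83071 seconds, which is 23 hours, 4 minutes and 31 seconds.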
I have a dataframe (vlinder) like the following, whereby the date and the timestamp (in UTC) are in separate columns:
date time.utc variable
1/04/2020 0:00:00 12
1/04/2020 0:05:00 54
In a first step, I combined the date and time variables into one column called dateandtime using the following code:
vlinder$dateandtime <- paste(vlinder$date, vlinder$time.utc)
which resulted in an extra column in dataframe vlinder:
date time.utc variable dateandtime
1/04/2020 0:00:00 12 1/04/2020 0:00:00
1/04/2020 0:05:00 54 1/04/2020 0:05:00
I want to convert the time from UTC to local time (which is CEST, so a difference of 2 hours).
I tried using the following code, but I get something totally different.
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC")
vlinder$dateandtime.cest <- format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
which results in:
date time.utc variable dateandtime dateandtime.cest
1/04/2020 0:00:00 12 0001-04-20 0001-04-20 00:17:30 LMT
1/04/2020 0:05:00 54 0001-04-20 0001-04-20 00:17:30 LMT
How can I solve this?
Many thanks!
Here's a lubridate and tidyverse answer. Some data tidying, data type changes, and then bam. Check lubridate::OlsonNames() for valid time zones (tz). (I'm not positive I chose the correct tz.)
library(tidyverse)
library(lubridate)
df <- read.table(header = TRUE,
text = "date time.utc variable
1/04/2020 00:00:00 12
1/04/2020 00:05:00 54")
df <- df %>%
  # the dates are day/month/year (1/04/2020 is 1 April 2020), so parse with dmy()
  mutate(date = dmy(date),
         datetime_utc = as_datetime(paste(date, time.utc)),
         datetime_cest = as_datetime(datetime_utc, tz = 'Europe/Brussels'))
date time.utc variable datetime_utc datetime_cest
1 2020-04-01 00:00:00 12 2020-04-01 00:00:00 2020-04-01 02:00:00
2 2020-04-01 00:05:00 54 2020-04-01 00:05:00 2020-04-01 02:05:00
By default, as.POSIXct expects a date ordered Year-Month-Day. The string 1/04/2020 is therefore read as 20 April of year 1.
You just need to pass your format string to as.POSIXct:
vlinder$dateandtime <- as.POSIXct(vlinder$dateandtime, tz = "UTC", format = "%d/%m/%Y %H:%M:%S")
format(vlinder$dateandtime, tz = "Europe/Brussels", usetz = TRUE)
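Note that format() returns character strings. If the local time should stay a date-time column, one option is to change only the time zone attribute. A minimal sketch with the two sample values from the question; the variable names utc and cest are made up for illustration:

```r
x <- c("1/04/2020 0:00:00", "1/04/2020 0:05:00")

# Parse as day/month/year, interpreted as UTC
utc <- as.POSIXct(x, tz = "UTC", format = "%d/%m/%Y %H:%M:%S")

# Copy the vector and change only the display time zone;
# the underlying instants stay the same
cest <- utc
attr(cest, "tzone") <- "Europe/Brussels"

format(utc,  "%Y-%m-%d %H:%M")   # "2020-04-01 00:00" "2020-04-01 00:05"
format(cest, "%Y-%m-%d %H:%M")   # "2020-04-01 02:00" "2020-04-01 02:05"
all(as.numeric(cest) == as.numeric(utc))   # TRUE
```

Both vectors are POSIXct, so they can still be sorted, subtracted, or plotted as times.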
I have a column of dates in an R data frame, that look like this,
Date
2020-08-05
2020-08-05
2020-08-05
2020-08-07
2020-08-08
2020-08-08
So the dates are formatted as 'yyyy-mm-dd'.
I am writing this data frame to a CSV that needs to be formatted in a very specific manner. I need to convert these dates to the format 'mm/dd/yyyy hh:mm:ss', so this is what I want the columns to look like:
Date
8/5/2020 12:00:00 AM
8/5/2020 12:00:00 AM
8/5/2020 12:00:00 AM
8/7/2020 12:00:00 AM
8/8/2020 12:00:00 AM
8/8/2020 12:00:00 AM
The dates do not have a timestamp attached to begin with, so all dates will need a midnight timestamp in the format shown above.
I spent quite some time trying to coerce this format yesterday and was unable to. I can easily change 2020-08-05 to 8/5/2020 using as.Date(), but the issue arises when I attempt to add the midnight timestamp.
How can I add a midnight timestamp to these reformatted dates?
Thanks so much for any help!
You can use format, with the midnight timestamp written literally into the format string:
df <- data.frame(Date = as.Date(c("2020-08-05", "2020-08-07")))
format(df$Date, "%m/%d/%Y 12:00:00 AM")
[1] "08/05/2020 12:00:00 AM" "08/07/2020 12:00:00 AM"
dat <- data.frame(
Date = as.Date("2020-08-05") + c(0, 0, 0, 2, 3, 3)
)
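# A Date has no time part, so %I:%M:%S %p formats as midnight
dat[["Date"]] <- format(dat[["Date"]], "%m/%d/%Y %I:%M:%S %p")
# Some locales print "am"/"pm" in lower case; force upper case
dat[["Date"]] <- sub("([ap]m)$", "\\U\\1", dat[["Date"]], perl = TRUE)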
dat
## Date
## 1 08/05/2020 12:00:00 AM
## 2 08/05/2020 12:00:00 AM
## 3 08/05/2020 12:00:00 AM
## 4 08/07/2020 12:00:00 AM
## 5 08/08/2020 12:00:00 AM
## 6 08/08/2020 12:00:00 AM
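The question asks for 8/5/2020 without leading zeros, while %m and %d always pad to two digits. A minimal sketch of one way around that, assuming the midnight stamp can be a fixed literal; the variable names are made up for illustration:

```r
dates <- as.Date(c("2020-08-05", "2020-08-07", "2020-08-08"))

# as.integer() drops the leading zero from the month and the day
out <- paste0(
  as.integer(format(dates, "%m")), "/",
  as.integer(format(dates, "%d")), "/",
  format(dates, "%Y"),
  " 12:00:00 AM"   # fixed midnight stamp, independent of the locale's %p
)
out
# [1] "8/5/2020 12:00:00 AM" "8/7/2020 12:00:00 AM" "8/8/2020 12:00:00 AM"
```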
Try this:
format(as.POSIXct("2022-11-08", tz = "Australia/Sydney"), "%Y-%m-%d %H:%M:%S")
I have a data frame with a date-time column. I want to split the column into multiple columns: year, month, day, time_12, time_24, and timezone.
The time_12 and time_24 need to be character vectors using the 12-hour convention and 24-hour convention, respectively. How could I accomplish this?
library(tidyverse)
library(lubridate)
# data frame
myDates <- ymd_hm(c('2018-October-31 8:00 PM',
'2018Oct31T20:00'))
df <- data.frame(datetime = myDates)
# split datetime into parts
df$year <- year(df$datetime)
df$month <- month(df$datetime)
df$day <- day(df$datetime)
df$time_12 <- '8:00 PM' ### need help
df$time_24 <- '20:00' ### need help
df$tz <- tz(df$datetime)
df
# datetime year month day time_12 time_24 tz
# 1 2018-10-31 20:00:00 2018 10 31 8:00 PM 20:00 UTC
# 2 2018-10-31 20:00:00 2018 10 31 8:00 PM 20:00 UTC
sapply(df, class)
# $datetime
# [1] "POSIXct" "POSIXt"
#
# $year
# [1] "numeric"
#
# $month
# [1] "numeric"
#
# $day
# [1] "integer"
#
# $time_12
# [1] "character"
#
# $time_24
# [1] "character"
#
# $tz
# [1] "character"
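We can use format() with the appropriate conversion specifications (%I and %p for the 12-hour clock, %H for the 24-hour clock) to build both time strings: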
library(dplyr)
df %>%
mutate(year = year(datetime),
month = month(datetime),
day = day(datetime),
time_12 = format(datetime, "%I:%M %p"),
time_24 = format(datetime, '%H:%M'),
tz = tz(datetime))
# datetime year month day time_12 time_24 tz
#1 2018-10-31 20:00:00 2018 10 31 08:00 PM 20:00 UTC
#2 2018-10-31 20:00:00 2018 10 31 08:00 PM 20:00 UTC
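%I pads the hour to two digits, so the result is 08:00 PM rather than the 8:00 PM shown in the question. If the unpadded form matters, a small sketch in base R (the variable names are for illustration only):

```r
datetime <- as.POSIXct("2018-10-31 20:00:00", tz = "UTC")

time_24 <- format(datetime, "%H:%M")
# sub() removes a single leading zero left by %I
time_12 <- sub("^0", "", format(datetime, "%I:%M %p"))

time_24  # "20:00"
time_12  # "8:00 PM" in an English or C locale (%p is locale-dependent)
```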
Hi, I have 2 columns in a data frame. Column 1 has dates like 2017-01-01 and column 2 has timestamps like 1:00 PM.
I need to create another column that combines these two pieces of information and gives me 2017-01-01 13:00:00.
Use as.POSIXct to convert from character to date format.
df$date.time <- as.POSIXct(paste(df$date, df$time), format = "%Y-%m-%d %I:%M %p")
EDIT:
To provide some further context: you paste the date and the time columns together to get the string 2017-01-01 1:00 PM.
You then describe the layout of that string to as.POSIXct through the format = argument. The conversion symbols and their meanings are listed in ?strptime.
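As a small worked example of how the format string lines up with the pasted text (tz = "UTC" is an assumption added here so the result does not depend on the local time zone):

```r
date <- "2017-01-01"
time <- "1:00 PM"

# %Y-%m-%d = year-month-day, %I = hour on the 12-hour clock,
# %M = minute, %p = AM/PM indicator (locale-dependent)
dt <- as.POSIXct(paste(date, time),
                 format = "%Y-%m-%d %I:%M %p", tz = "UTC")

format(dt, "%Y-%m-%d %H:%M:%S")
# "2017-01-01 13:00:00" in an English or C locale

# A string that does not match the format gives NA rather than an error
bad <- as.POSIXct("2017-01-01 13:00",
                  format = "%Y-%m-%d %I:%M %p", tz = "UTC")
is.na(bad)
```

Checking for NA after parsing is a cheap way to catch rows whose time is written differently.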
Reproducible example
library(lubridate)
A <- data.frame(X1 = ymd("2017-01-01"),
X2 = "1:00 PM", stringsAsFactors=F)
# X1 X2
# 1 2017-01-01 1:00 PM
solution
library(dplyr)
library(lubridate)
temp <- A %>%
mutate(X3 = ymd_hm(paste(X1, X2)))
output
X1 X2 X3
<date> <chr> <dttm>
1 2017-01-01 1:00 PM 2017-01-01 13:00:00
multi-row input
B <- data.frame(X1 = ymd("2017-01-01", "2016-01-01"),
X2 = c("1:00 PM", "2:00 AM"), stringsAsFactors=F)
temp <- B %>%
mutate(X3 = ymd_hm(paste(X1, X2)))
# X1 X2 X3
# <date> <chr> <dttm>
# 1 2017-01-01 1:00 PM 2017-01-01 13:00:00
# 2 2016-01-01 2:00 AM 2016-01-01 02:00:00
I have read in and formatted my data set as shown below.
library(xts)
#Read data from file
x <- read.csv("data.dat", header=F)
x[is.na(x)] <- c(0) #If empty fill in zero
#Construct data frames
rawdata.h <- data.frame(x[,2],x[,3],x[,4],x[,5],x[,6],x[,7],x[,8]) #Hourly data
rawdata.15min <- data.frame(x[,10]) #15 min data
#Convert time index to proper format
index.h <- as.POSIXct(strptime(x[,1], "%d.%m.%Y %H:%M"))
index.15min <- as.POSIXct(strptime(x[,9], "%d.%m.%Y %H:%M"))
#Set column names
names(rawdata.h) <- c("spot","RKup", "RKdown","RKcon","anm", "pp.stat","prod.h")
names(rawdata.15min) <- c("prod.15min")
#Convert data frames to time series objects
data.htemp <- xts(rawdata.h,order.by=index.h)
data.15mintemp <- xts(rawdata.15min,order.by=index.15min)
#Select desired subset period
data.h <- data.htemp["2013"]
data.15min <- data.15mintemp["2013"]
I want to combine each hourly value in data.h$prod.h with the 15-minute values in data.15min$prod.15min that fall within the same hour.
An example would be to take the average of the hourly value at time 2013-12-01 00:00-01:00 with the last 15 minute value in that same hour, i.e. the 15 minute value from time 2013-12-01 00:45-01:00. I'm looking for a flexible way to do this with an arbitrary hour.
Any suggestions?
Edit: Just to clarify further: I want to do something like this:
N <- NROW(data.h$prod.h)
for (i in 1:N){
prod.average[i] <- mean(data.h$prod.h[i] + #INSERT CODE THAT FINDS LAST 15 MIN IN HOUR i )
}
I found a solution to my problem by reducing the 15-minute data to one value per hour, using the very useful .index* functions from the xts package as shown below.
prod.new <- data.15min$prod.15min[.indexmin(data.15min$prod.15min) %in% c(45:59)]
This creates a new time series containing only the values that occur in minutes 45-59 of each hour.
For those curious my data looked like this:
Original hourly series:
> data.h$prod.h[1:4]
2013-01-01 00:00:00 19.744
2013-01-01 01:00:00 27.866
2013-01-01 02:00:00 26.227
2013-01-01 03:00:00 16.013
Original 15 minute series:
> data.15min$prod.15min[1:8]
2013-09-30 00:00:00 16.4251
2013-09-30 00:15:00 18.4495
2013-09-30 00:30:00 7.2125
2013-09-30 00:45:00 12.1913
2013-09-30 01:00:00 12.4606
2013-09-30 01:15:00 12.7299
2013-09-30 01:30:00 12.9992
2013-09-30 01:45:00 26.7522
New series with only the last 15 minutes in each hour:
> prod.new[1:4]
2013-09-30 00:45:00 12.1913
2013-09-30 01:45:00 26.7522
2013-09-30 02:45:00 5.0332
2013-09-30 03:45:00 2.6974
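The remaining step, averaging each hourly value with the last 15-minute value of the same hour, could look roughly like this. This is a base-R sketch on made-up toy numbers (not the series above), using plain data frames instead of xts objects; all object names are hypothetical:

```r
# Toy data: two hourly values and eight 15-minute values (illustrative only)
hourly <- data.frame(
  time   = as.POSIXct(c("2013-12-01 00:00:00", "2013-12-01 01:00:00"), tz = "UTC"),
  prod.h = c(10, 20)
)
quarter <- data.frame(
  time       = seq(as.POSIXct("2013-12-01 00:00:00", tz = "UTC"),
                   by = "15 min", length.out = 8),
  prod.15min = 1:8
)

# Keep only the observations stamped at minute 45 of each hour
last_q <- quarter[as.integer(format(quarter$time, "%M")) == 45, ]

# Build a common "date hour" key and join the two tables on it
hourly$hour <- format(hourly$time, "%Y-%m-%d %H")
last_q$hour <- format(last_q$time, "%Y-%m-%d %H")
combined <- merge(hourly[c("hour", "prod.h")],
                  last_q[c("hour", "prod.15min")], by = "hour")

# Average of the hourly value and the last 15-minute value
combined$prod.average <- rowMeans(combined[c("prod.h", "prod.15min")])
combined$prod.average
# [1]  7 14
```

Joining on an hour key avoids the explicit loop and works for any hour present in both series.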
Short answer
library(dplyr)
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
Long answer
Since you want to aggregate the 15-minute time series to a coarser resolution (30 minutes), you can use dplyr or any other package that supports "group by" operations.
For instance:
s = seq(as.POSIXct("2017-01-01"), as.POSIXct("2017-01-02"), "15 min")
df = data.frame(time = s, value=1:97)
df is a data frame holding a time series, with 97 rows and two columns.
head(df)
time value
1 2017-01-01 00:00:00 1
2 2017-01-01 00:15:00 2
3 2017-01-01 00:30:00 3
4 2017-01-01 00:45:00 4
5 2017-01-01 01:00:00 5
6 2017-01-01 01:15:00 6
The cut.POSIXt, group_by and summarise functions do the work:
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
t v
1 2017-01-01 00:00:00 1.5
2 2017-01-01 00:30:00 3.5
3 2017-01-01 01:00:00 5.5
4 2017-01-01 01:30:00 7.5
5 2017-01-01 02:00:00 9.5
6 2017-01-01 02:30:00 11.5
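The same grouping can be done without dplyr. A minimal base-R sketch on the same example data, with tz = "UTC" added as an assumption so the bins do not depend on the local time zone; half_hour_mean is a made-up name:

```r
s  <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
          as.POSIXct("2017-01-02", tz = "UTC"), by = "15 min")
df <- data.frame(time = s, value = 1:97)

# cut() labels each row with the start of its 30-minute bin;
# tapply() then averages the values within each bin
half_hour_mean <- tapply(df$value, cut(df$time, "30 min"), mean)

head(half_hour_mean, 3)
# the first three bin means are 1.5, 3.5 and 5.5
```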
A more robust approach is to convert the 15-minute values into hourly values by averaging them, and then perform whatever operation you want on the two hourly series.
### 15 Minutes Data
min15 <- structure(list(V1 = structure(1:8, .Label = c("2013-01-01 00:00:00",
"2013-01-01 00:15:00", "2013-01-01 00:30:00", "2013-01-01 00:45:00",
"2013-01-01 01:00:00", "2013-01-01 01:15:00", "2013-01-01 01:30:00",
"2013-01-01 01:45:00"), class = "factor"), V2 = c(16.4251, 18.4495,
7.2125, 12.1913, 12.4606, 12.7299, 12.9992, 26.7522)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -8L))
min15
### Hourly Data
hourly <- structure(list(V1 = structure(1:4, .Label = c("2013-01-01 00:00:00",
"2013-01-01 01:00:00", "2013-01-01 02:00:00", "2013-01-01 03:00:00"
), class = "factor"), V2 = c(19.744, 27.866, 26.227, 16.013)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -4L))
hourly
### Convert 15min data into hourly data by taking average of 4 values
min15$V1 <- as.POSIXct(min15$V1,origin="1970-01-01 0:0:0")
min15 <- aggregate(. ~ cut(min15$V1,"60 min"),min15[setdiff(names(min15), "V1")],mean)
min15
names(min15) <- c("time","min15")
names(hourly) <- c("time","hourly")
### merge the corresponding values
combined <- merge(hourly,min15)
### average of hourly and 15min values
rowMeans(combined[,2:3])