I do not understand why a time derived from the function parse_date_time is not usable by another function in lubridate.
This produces a df that has the dates with am/pm parsed correctly.
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = strftime(date_time,"%H:%M:%S", tz = "UTC"))
When I try to extract the hour from the time column I get errors:
dt2 <- dt2 %>% mutate(hour_from_hour = hour(time))
Error: Problem with mutate() column hour_from_hour.
i hour_from_hour = hour(time).
x character string is not in a standard unambiguous format
But when I use the original variable "date_time" it works fine.
dt2 <- dt2 %>% mutate(hour_from_date_time = hour(date_time))
My data sets have variable headers (some are in date time, some are already parsed). It would be nice if I could use hour() on the time column.
R doesn't have a native way to handle times that aren't associated with a day. But you can use a package like hms. For example:
library(tidyverse)
library(lubridate)
library(hms)
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = as_hms(date_time),
hour = hour(time))
But to be honest, it's probably better to keep the date_time column and use hour directly on it.
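The error in the question comes from the type of the time column: strftime() returns character strings, and hour() has to turn its input into a date-time before it can read the hour. A minimal base R sketch of that, assuming UTC throughout (the variable names are only for illustration):

```r
# A date-time value like the ones produced by parse_date_time()
date_time <- as.POSIXct("2019-11-24 18:00:00", tz = "UTC")

# strftime() formats the value as text, so the result is no longer a date-time
time_chr <- strftime(date_time, "%H:%M:%S", tz = "UTC")
class(time_chr)

# Converting a bare "HH:MM:SS" string without a format fails, because it
# matches none of the default date-time formats
res <- try(as.POSIXlt(time_chr), silent = TRUE)
inherits(res, "try-error")

# With an explicit format the string can be parsed and the hour read off
hour_from_time <- as.POSIXlt(time_chr, format = "%H:%M:%S", tz = "UTC")$hour
hour_from_time
```

The last step attaches the current date behind the scenes, which is harmless here because only the hour is used.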
If I understood your question correctly this code answers it. It first extracts the two digits for the hour as a character string and then converts them to an integer. The code assumes leading zeros and no leading spaces. The regular expression needs to be edited if cases with different formatting are to be handled. The solution is rather simple once one finds which functions to use, but it is not trivial, I think.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = strftime(date_time,"%H:%M:%S", tz = "UTC"))
# time is of mode character, assuming that TZ is always UTC
dt2 <- dt2 %>% mutate(hour_from_hour = as.integer(str_extract(time, "^[0-2][0-9]")),
hour_from_date_time = hour(date_time))
identical(dt2$hour_from_hour, dt2$hour_from_date_time)
#> [1] TRUE
dt2
#> date_time date time hour_from_hour hour_from_date_time
#> 1 2019-11-24 18:00:00 2019-11-24 18:00:00 18 18
#> 2 2019-11-25 00:00:00 2019-11-25 00:00:00 0 0
#> 3 2019-11-25 06:00:00 2019-11-25 06:00:00 6 6
#> 4 2019-11-25 12:00:00 2019-11-25 12:00:00 12 12
#> 5 2019-11-25 18:00:00 2019-11-25 18:00:00 18 18
#> 6 2019-11-26 00:00:00 2019-11-26 00:00:00 0 0
Created on 2021-12-21 by the reprex package (v2.0.1)
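If the time strings may lack a leading zero or carry leading spaces, the regular expression has to be loosened. A possible base R variant, sketched on made-up example strings (not the data above), assuming the hour is always followed by a colon:

```r
# Hypothetical inputs: zero-padded, space-padded, and unpadded hours
times <- c("18:00:00", " 6:00:00", "0:05:00")

# Skip optional leading whitespace, capture one or two digits before the first colon
hours <- as.integer(sub("^[[:space:]]*([0-9]{1,2}):.*$", "\\1", times))
hours
```

Strings that do not match the pattern are returned unchanged by sub() and then become NA (with a warning) in as.integer(), so malformed values are easy to spot.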
I'm new to R and searched old posts for an answer, but failed to come across anything that resolved my issue.
I pulled in a csv with the time a trip started in the mdy h:mm:ss format, but it is currently recognized as a character. I've tried to use mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00"))
as well as
as.Date(parse_date_time(dc_biketrips$started_at, c(mdy_hms))) to no avail.
Does anyone have any suggestions for how I could fix this?
UPDATE: I also tried to use date <- mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")) followed by str(date), but this also did not work.
[Image: attempt to use date <- mdy_hms(C("11/1/2020 0:05:00" etc.]
[Image: the csv]
The first of your two options works:
library(lubridate)
date <-mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00"))
str(date)
# POSIXct[1:3], format: "2020-11-01 00:05:00" "2020-11-01 07:29:00" "2020-11-01 14:04:00"
How were your data "pulled in"?
One option would be to use as.POSIXct:
started_at <- c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")
as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS")
#> [1] "2020-11-01 00:05:00 CET" "2020-11-01 07:29:00 CET"
#> [3] "2020-11-01 14:04:00 CET"
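The CET in the output above is the local time zone of the machine that ran the code. A small sketch, assuming the times should be read as UTC, passes tz explicitly so that the result does not depend on local settings:

```r
started_at <- c("11/1/2020 0:05:00", "11/1/2020 7:29:00", "11/1/2020 14:04:00")

# tz = "UTC" is an assumption; use the zone the trips were recorded in
parsed <- as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

attr(parsed, "tzone")
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Note that wrapping the result in as.Date(), as in the question, would keep only the calendar date and drop the time of day.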
EDIT
library(lubridate)
library(dplyr)
started_at <- c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")
Both as.POSIXct and lubridate::mdy_hms return an object of class "POSIXct" "POSIXt"
class(as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS"))
#> [1] "POSIXct" "POSIXt"
class(mdy_hms(started_at))
#> [1] "POSIXct" "POSIXt"
Not sure what you expect. When I run your code everything works fine except that we end up with 0 obs after filtering for week < 15 as all the dates in the example data are from week 44:
dc_biketrips <- data.frame(
started_at
)
dc_biketrips <- dc_biketrips %>%
mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS"),
interval60 = floor_date(started_at, unit = "hour"),
interval15 = floor_date(started_at, unit = "15 mins"),
week = week(interval60),
dotw = wday(interval60, label=TRUE))
dc_biketrips
#> started_at interval60 interval15 week dotw
#> 1 2020-11-01 00:05:00 2020-11-01 00:00:00 2020-11-01 00:00:00 44 So
#> 2 2020-11-01 07:29:00 2020-11-01 07:00:00 2020-11-01 07:15:00 44 So
#> 3 2020-11-01 14:04:00 2020-11-01 14:00:00 2020-11-01 14:00:00 44 So
dc_biketrips %>%
filter(week < 15)
#> [1] started_at interval60 interval15 week dotw
#> <0 rows> (or 0-length row.names)
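The week number 44 can be checked by hand. As far as I understand lubridate's documentation, week() counts the complete seven-day periods since January 1st, plus one. The base R sketch below reproduces that rule under this assumption (it is not lubridate's own code):

```r
started_at <- as.POSIXct(c("2020-11-01 00:05:00", "2020-11-01 07:29:00"), tz = "UTC")

# POSIXlt stores the day of the year as a 0-based value in $yday
day_of_year0 <- as.POSIXlt(started_at)$yday

# Complete 7-day periods since January 1st, plus one
week_by_hand <- day_of_year0 %/% 7 + 1
week_by_hand

# None of the example rows can pass a filter on week < 15
week_by_hand < 15
```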
Hi friends. Both of these columns (starttime/stoptime) are of class character. How can I convert them to POSIXct so that I can compute the time taken? Thank you.
starttime
12/31/2015 23:48
12/31/2015 23:47
12/31/2015 23:37
12/31/2015 23:37
stoptime
12/31/2015 23:51
12/31/2015 23:53
12/31/2015 23:43
1/1/2016 0:02
The function parse_date_time from the lubridate package is a clean way to deal with time. Here is how to do it:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
tribble(
~starttime, ~stoptime,
"12/31/2015 23:48", "12/31/2015 23:51",
"12/31/2015 23:47", "12/31/2015 23:53"
) %>%
mutate(
starttime = starttime %>% parse_date_time("%m/%d/%y %H:%M"),
stoptime = stoptime %>% parse_date_time("%m/%d/%y %H:%M"),
duration = stoptime - starttime
)
#> # A tibble: 2 x 3
#> starttime stoptime duration
#> <dttm> <dttm> <drtn>
#> 1 2015-12-31 23:48:00 2015-12-31 23:51:00 3 mins
#> 2 2015-12-31 23:47:00 2015-12-31 23:53:00 6 mins
Created on 2021-09-09 by the reprex package (v2.0.0)
Alternatively, you can use the mdy_hm function from lubridate:
library(dplyr)
library(lubridate)
df <- df %>% mutate(across(c(starttime, stoptime), mdy_hm))
In base R, you can use as.POSIXct:
df[1:2] <- lapply(df[1:2], as.POSIXct, format = "%m/%d/%Y %H:%M")
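To get from the converted columns to the time taken, the two columns can be subtracted with difftime(). A self-contained sketch using two rows from the question, assuming UTC and minutes as the unit (the column name duration_mins is made up here):

```r
df <- data.frame(
  starttime = c("12/31/2015 23:48", "12/31/2015 23:37"),
  stoptime  = c("12/31/2015 23:51", "1/1/2016 0:02"),
  stringsAsFactors = FALSE
)

# Convert both character columns to POSIXct in one step
df[1:2] <- lapply(df[1:2], as.POSIXct, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Fix the unit explicitly so the numbers do not switch between secs/mins/hours
df$duration_mins <- as.numeric(difftime(df$stoptime, df$starttime, units = "mins"))
df
```

The second row crosses midnight into the new year, which is handled correctly because both values carry the full date.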
We have the code:
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
which results in
> mydat
times
1 2:30 PM
2 10:00 AM
3 10:00 AM
I want to convert these times, which are characters, into POSIX format. So I do
mydat$ntimes <- as.POSIXct(NA,"")
mydat$ntimes <- sapply(mydat$times, function(x) parse_date_time(x, '%I:%M %p'))
Then we get
> mydat
times ntimes
1 2:30 PM -62167167000
2 10:00 AM -62167183200
3 10:00 AM -62167183200
I have no idea why these are negative. Furthermore, if instead of sapply we did a loop:
for (i in 1:length(mydat$times)){
mydat$ntimes[i] <- parse_date_time(mydat$times[i], '%I:%M %p')
}
we get the format right, but everything is off by 7 minutes and 2 seconds, why is that?
> mydat
times ntimes
1 2:30 PM 0000-01-01 06:37:02
2 10:00 AM 0000-01-01 02:07:02
3 10:00 AM 0000-01-01 02:07:02
You don't need a loop for this:
as.POSIXct(mydat$times, format = '%I:%M %p', tz = 'UTC')
#[1] "2021-03-14 14:30:00 UTC" "2021-03-14 10:00:00 UTC" "2021-03-14 10:00:00 UTC"
Or
lubridate::parse_date_time(mydat$times, '%I:%M %p')
#[1] "0000-01-01 14:30:00 UTC" "0000-01-01 10:00:00 UTC" "0000-01-01 10:00:00 UTC"
The difference between the two options is that, when the date is absent, as.POSIXct fills in today's date whereas parse_date_time fills in 0000-01-01.
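As for the negative numbers in the question: a POSIXct value is stored as the number of seconds since 1970-01-01 UTC, so a time in year 0 is a large negative number, and sapply() drops the class when it simplifies its result to a plain vector. A short sketch of that effect, using 2021 dates rather than the question's data:

```r
x <- as.POSIXct(c("2021-03-14 14:30:00", "2021-03-14 10:00:00"), tz = "UTC")

# sapply() simplifies to a bare numeric vector, so the POSIXct class is lost
stripped <- sapply(x, function(v) v)
class(stripped)

# The numbers are still valid: restore the class with an explicit origin
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
restored

# The two values printed in the question differ by exactly 4.5 hours,
# the gap between 2:30 PM and 10:00 AM
gap_hours <- (-62167167000 - -62167183200) / 3600
gap_hours
```

The odd offset in the loop version is likely a display issue: the values are printed in the local time zone, and for dates that far back many zones fall back to local mean time, which is not a whole number of hours from UTC.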
Base R Solution
You can use the strptime function to convert the times variable of character type to POSIXlt. Without a date provided, this function also returns today's date.
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
# FORMAT SPECIFICATIONS:
# %I = Hours as decimal number (01–12).
# %M = Minute as decimal number (00–59).
# %p = AM/PM indicator in the locale.
strptime(mydat$times, format='%I:%M %p', tz = 'UTC')
#> [1] "2021-03-13 14:30:00 UTC" "2021-03-13 10:00:00 UTC"
#> [3] "2021-03-13 10:00:00 UTC"
Created on 2021-03-13 by the reprex package (v0.3.0)
Add it to the data frame as a new variable
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
mydat$new_times <- strptime(mydat$times, format='%I:%M %p')
mydat
#> times new_times
#> 1 2:30 PM 2021-03-13 14:30:00
#> 2 10:00 AM 2021-03-13 10:00:00
#> 3 10:00 AM 2021-03-13 10:00:00
Created on 2021-03-13 by the reprex package (v0.3.0)
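Because the filled-in date changes from day to day, results built this way are not reproducible. One option, sketched here under the assumption that any fixed reference day will do, is to paste a date in front of the times before parsing (fixed_day is a made-up name, and %p assumes an English-style AM/PM locale):

```r
times <- c("2:30 PM", "10:00 AM", "10:00 AM")

# Arbitrary reference day chosen for illustration
fixed_day <- "2021-03-13"

new_times <- as.POSIXct(paste(fixed_day, times),
                        format = "%Y-%m-%d %I:%M %p", tz = "UTC")
format(new_times, "%Y-%m-%d %H:%M")
```

as.POSIXct() is used instead of strptime() here because POSIXct vectors are the more common choice for data frame columns.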
I'm trying to convert characters to dates in R.
My data has the following format:
df <- data.frame(Date = c("23Jul2019 11:51:09 AM","23Jul2019 11:53:09 AM","19Jul2019 2:30:06 PM","01Aug2019 3:00:17 PM"))
Based on the solution found here:
Convert character to Date in R
I could use
> as.Date(df$Date, "%d/%b/%Y %I:%M:%S %p")
[1] NA NA NA NA
%I is for decimal hour (12h) and %p Locale-specific AM/PM (https://www.stat.berkeley.edu/~s133/dates.html) but for some reason, it returns NAs.
My goal is to sort the rows of a dataframe by date and time once the dates in character format are converted to Dates in R.
What is the issue with the code I'm using?
This should solve it
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
df <- data.frame(Date = c("23Jul2019 11:51:09 AM","23Jul2019 11:53:09 AM","19Jul2019 2:30:06 PM","01Aug2019 3:00:17 PM"))
df %>%
mutate(r_date_time = Date %>% dmy_hms)
#> Date r_date_time
#> 1 23Jul2019 11:51:09 AM 2019-07-23 11:51:09
#> 2 23Jul2019 11:53:09 AM 2019-07-23 11:53:09
#> 3 19Jul2019 2:30:06 PM 2019-07-19 14:30:06
#> 4 01Aug2019 3:00:17 PM 2019-08-01 15:00:17
dmy_hms(df$Date)
#> [1] "2019-07-23 11:51:09 UTC" "2019-07-23 11:53:09 UTC"
#> [3] "2019-07-19 14:30:06 UTC" "2019-08-01 15:00:17 UTC"
Created on 2020-01-14 by the reprex package (v0.3.0)
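As for why the original as.Date() call returned NA: its format string contains slashes ("%d/%b/%Y") that do not occur in the data ("23Jul2019"), so nothing matches. as.Date() would also discard the time of day, which is needed for sorting. A base R sketch with a matching format, assuming an English locale for %b and %p (the column name parsed is made up):

```r
df <- data.frame(
  Date = c("23Jul2019 11:51:09 AM", "23Jul2019 11:53:09 AM",
           "19Jul2019 2:30:06 PM", "01Aug2019 3:00:17 PM"),
  stringsAsFactors = FALSE
)

# No separators between day, month name and year, exactly as in the data
df$parsed <- as.POSIXct(df$Date, format = "%d%b%Y %I:%M:%S %p", tz = "UTC")

# Sort the rows by the parsed date-time
df_sorted <- df[order(df$parsed), ]
df_sorted
```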
Below is what my data looks like. I need to find the max and min temp for each day as well as the corresponding time.
Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00
This is the code I tried for the daily min and max temp.
library(xts)
read.csv("hourly1.csv", header = T) -> hourly1
xts(hourly1$Temp, as.Date(hourly1$date)) -> temp_date1
apply.daily(temp_date1, min) -> mintemp1_date
apply.daily(temp_date1, max) -> maxtemp1_date
I need help regarding how to find the time of day for the min and max temp.
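One thing to watch in the code above: as.Date() is called without a format, while the dates look like day-month-year with a two-digit year. A small sketch of parsing them explicitly, assuming that day-month-year reading is correct:

```r
dates <- c("01-01-79", "02-01-79")

# %y maps two-digit years 69 to 99 onto 1969 to 1999
parsed <- as.Date(dates, format = "%d-%m-%y")
format(parsed)
```

If the values were actually month-day-year, the format would need to be "%m-%d-%y" instead.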
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dataset <- read.table(text = 'Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00',
header = TRUE,
stringsAsFactors = FALSE)
dataset %>%
group_by(date) %>%
summarise(min_temp = min(Temp),
min_temp_time = time[which.min(x = Temp)],
max_temp = max(Temp),
max_temp_time = time[which.max(x = Temp)])
#> # A tibble: 2 x 5
#> date min_temp min_temp_time max_temp max_temp_time
#> <chr> <dbl> <chr> <dbl> <chr>
#> 1 01-01-79 280. 21:00:00 295. 09:00:00
#> 2 02-01-79 279. 00:00:00 296. 09:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
Hope this helps.
Try the dplyr package.
df <- structure(list(Temp = c(280.9876771, 291.9695498, 294.9583426,
290.2357847, 286.2944531, 282.9282138, 280.326689, 279.2551605,
281.3981824, 293.076125, 295.8072204),
date = c("01-01-79", "01-01-79",
"01-01-79", "01-01-79", "01-01-79", "01-01-79", "01-01-79", "02-01-79",
"02-01-79", "02-01-79", "02-01-79"),
time = c("03:00:00", "06:00:00", "09:00:00", "12:00:00", "15:00:00", "18:00:00", "21:00:00", "00:00:00",
"03:00:00", "06:00:00", "09:00:00")),
row.names = c(NA, -11L),
class = c("data.frame"))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df %>%
group_by(date)%>%
slice(which.max(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 295. 01-01-79 09:00:00
#> 2 296. 02-01-79 09:00:00
df %>%
group_by(date)%>%
slice(which.min(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 280. 01-01-79 21:00:00
#> 2 279. 02-01-79 00:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
A data.table + lubridate solution:
# load libraries
library(data.table)
library(lubridate)
# load data
dt <- fread(" Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00")
# Convert date - time values to real dates:
dt[, date2 := dmy_hms(paste(date, time, sep = " "))]
# find the date - time for max temp:
dt[, date2[which(Temp == max(Temp))], by = floor_date(date2, "days")]
# find the date - time for min temp:
dt[, date2[which(Temp == min(Temp))], by = floor_date(date2, "days")]
Thank You Guys for the help. But I have 116881 entries.
So I tried the index command in R. This fetched me the corresponding id.
index(hourly1)[hourly1$Temp %in% maxtemp1_date] -> max_id
index(hourly1)[hourly1$Temp %in% mintemp1_date] -> min_id
Then I used the vlookup command in Excel to get the desired solution.
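The lookup step can also stay in R. Matching on temperature values with %in% can pick up rows from other days that happen to share a value, so a per-day row index is safer. A base R sketch on the sample data from the question (the names hourly, max_rows and min_rows are made up for illustration):

```r
hourly <- data.frame(
  Temp = c(280.9876771, 291.9695498, 294.9583426, 290.2357847, 286.2944531,
           282.9282138, 280.326689, 279.2551605, 281.3981824, 293.076125,
           295.8072204),
  date = rep(c("01-01-79", "02-01-79"), times = c(7, 4)),
  time = c("03:00:00", "06:00:00", "09:00:00", "12:00:00", "15:00:00",
           "18:00:00", "21:00:00", "00:00:00", "03:00:00", "06:00:00",
           "09:00:00"),
  stringsAsFactors = FALSE
)

rows <- seq_len(nrow(hourly))

# For each day, find the row number of the warmest and the coldest reading
max_rows <- as.vector(tapply(rows, hourly$date, function(i) i[which.max(hourly$Temp[i])]))
min_rows <- as.vector(tapply(rows, hourly$date, function(i) i[which.min(hourly$Temp[i])]))

daily_max <- hourly[max_rows, ]
daily_min <- hourly[min_rows, ]
daily_max
daily_min
```

tapply() orders the groups by the text of the date column, so with more than a few days it is worth converting date to a real Date first to get the rows in chronological order.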
In data.table
dt[, x.value.min := frollapply(x = x, n = 2, min, fill = NA, align = "right", na.rm =TRUE), by = ID]