I do not understand why a time derived from the function parse_date_time() is not usable by another function in lubridate.
This produces a df that has the dates with am/pm parsed correctly.
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = strftime(date_time,"%H:%M:%S", tz = "UTC"))
When I try to extract the hour from the time column I get an error:
dt2 <- dt2 %>% mutate(hour_from_hour = hour(time))
Error: Problem with mutate() column hour_from_hour.
i hour_from_hour = hour(time).
x character string is not in a standard unambiguous format
But when I use the original variable "date_time" it works fine.
dt2 <- dt2 %>% mutate(hour_from_date_time = hour(date_time))
My data sets have variable headers (some are in date time, some are already parsed). It would be nice if I could use hour() on the time column.
R doesn't have a native way to handle times that aren't associated with a day, but you can use a package like hms. For example:
library(tidyverse)
library(lubridate)
library(hms)
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = as_hms(date_time),
hour = hour(time))
But to be honest, it's probably better to keep the date_time column and use hour directly on it.
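If only the character time column is available, one option is to parse it before calling hour(). The error above arises because hour() cannot interpret a bare string such as "18:00:00" as a date-time. A minimal sketch, assuming the strings are always in HH:MM:SS form (the vector time_chr is made up for illustration):

```r
library(lubridate)

# Character times, as produced by strftime(date_time, "%H:%M:%S", tz = "UTC")
time_chr <- c("18:00:00", "00:00:00", "06:00:00")

# lubridate::hms() parses "HH:MM:SS" strings into Period objects,
# and hour() has a method for Periods
time_period <- lubridate::hms(time_chr)
hour_from_time <- hour(time_period)
hour_from_time
```

The explicit lubridate:: prefix avoids confusion with the hms package, which exports a function of the same name.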
If I understood your question correctly this code answers it. It first extracts the two digits for the hour as a character string and then converts them to an integer. The code assumes leading zeros and no leading spaces. The regular expression needs to be edited if cases with different formatting are to be handled. The solution is rather simple once one finds which functions to use, but it is not trivial, I think.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM",
"11/25/19 12:00:00 AM",
"11/25/19 06:00:00 AM",
"11/25/19 12:00:00 PM",
"11/25/19 06:00:00 PM",
"11/26/19 12:00:00 AM"),
'date' = c(1:6), 'time' = c(1:6)) %>%
mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
date = date(date_time),
time = strftime(date_time,"%H:%M:%S", tz = "UTC"))
# time is of mode character, assuming that TZ is always UTC
dt2 <- dt2 %>% mutate(hour_from_hour = as.integer(str_extract(time, "^[0-2][0-9]")),
hour_from_date_time = hour(date_time))
identical(dt2$hour_from_hour, dt2$hour_from_date_time)
#> [1] TRUE
dt2
#> date_time date time hour_from_hour hour_from_date_time
#> 1 2019-11-24 18:00:00 2019-11-24 18:00:00 18 18
#> 2 2019-11-25 00:00:00 2019-11-25 00:00:00 0 0
#> 3 2019-11-25 06:00:00 2019-11-25 06:00:00 6 6
#> 4 2019-11-25 12:00:00 2019-11-25 12:00:00 12 12
#> 5 2019-11-25 18:00:00 2019-11-25 18:00:00 18 18
#> 6 2019-11-26 00:00:00 2019-11-26 00:00:00 0 0
Created on 2021-12-21 by the reprex package (v2.0.1)
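The same extraction can probably be done without a regular expression. A small base R sketch, under the same assumption that the time strings always start with a two-digit, zero-padded hour (time_chr is an illustrative vector, not part of the original data):

```r
# Character times in "HH:MM:SS" form
time_chr <- c("18:00:00", "00:00:00", "06:00:00")

# Take the first two characters and convert them to integer
hour_from_substr <- as.integer(substr(time_chr, 1, 2))
hour_from_substr
```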
I'm new to R. I searched old posts for an answer but failed to come across anything that resolved my issue.
I pulled in a csv with the time a trip started in the mdy h:mm:ss format, but it is currently recognized as a character. I've tried to use mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00"))
as well as
as.Date(parse_date_time(dc_biketrips$started_at, c(mdy_hms))) to no avail.
Does anyone have any suggestions for how I could fix this?
UPDATE: I also tried date <- mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")) followed by str(date), but this also did not work.
The first of your two options works:
library(lubridate)
date <-mdy_hms(c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00"))
str(date)
# POSIXct[1:3], format: "2020-11-01 00:05:00" "2020-11-01 07:29:00" "2020-11-01 14:04:00"
How were your data "pulled in"?
One option would be to use as.POSIXct:
started_at <- c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")
as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS")
#> [1] "2020-11-01 00:05:00 CET" "2020-11-01 07:29:00 CET"
#> [3] "2020-11-01 14:04:00 CET"
EDIT
library(lubridate)
library(dplyr)
started_at <- c("11/1/2020 0:05:00","11/1/2020 7:29:00","11/1/2020 14:04:00")
Both as.POSIXct and lubridate::mdy_hms return an object of class "POSIXct" "POSIXt"
class(as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS"))
#> [1] "POSIXct" "POSIXt"
class(mdy_hms(started_at))
#> [1] "POSIXct" "POSIXt"
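One difference worth keeping in mind is the time zone: as.POSIXct() uses the local time zone unless tz is given (CET in the output above), while mdy_hms() defaults to UTC. A short sketch, assuming UTC is the intended time zone for the trips:

```r
library(lubridate)

started_at <- c("11/1/2020 0:05:00", "11/1/2020 7:29:00", "11/1/2020 14:04:00")

# Base R: pass tz explicitly so the result does not depend on the local machine
ct_base <- as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS", tz = "UTC")

# lubridate: tz defaults to "UTC"
ct_lub <- mdy_hms(started_at)

# Both describe the same instants
all(ct_base == ct_lub)
```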
Not sure what you expect. When I run your code everything works fine except that we end up with 0 obs after filtering for week < 15 as all the dates in the example data are from week 44:
dc_biketrips <- data.frame(
started_at
)
dc_biketrips <- dc_biketrips %>%
mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M:%OS"),
interval60 = floor_date(started_at, unit = "hour"),
interval15 = floor_date(started_at, unit = "15 mins"),
week = week(interval60),
dotw = wday(interval60, label=TRUE))
dc_biketrips
#> started_at interval60 interval15 week dotw
#> 1 2020-11-01 00:05:00 2020-11-01 00:00:00 2020-11-01 00:00:00 44 So
#> 2 2020-11-01 07:29:00 2020-11-01 07:00:00 2020-11-01 07:15:00 44 So
#> 3 2020-11-01 14:04:00 2020-11-01 14:00:00 2020-11-01 14:00:00 44 So
dc_biketrips %>%
filter(week < 15)
#> [1] started_at interval60 interval15 week dotw
#> <0 rows> (or 0-length row.names)
Hi friends. Both of these columns (starttime/stoptime) are of class character. How can I convert them to POSIXct so that I can compute the elapsed time? Thank you.
starttime
12/31/2015 23:48
12/31/2015 23:47
12/31/2015 23:37
12/31/2015 23:37
stoptime
12/31/2015 23:51
12/31/2015 23:53
12/31/2015 23:43
1/1/2016 0:02
The function parse_date_time from the lubridate package is a clean way to deal with time. Here is how to do it:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
tribble(
~starttime, ~stoptime,
"12/31/2015 23:48", "12/31/2015 23:51",
"12/31/2015 23:47", "12/31/2015 23:53"
) %>%
mutate(
starttime = starttime %>% parse_date_time("%m/%d/%y %H:%M"),
stoptime = stoptime %>% parse_date_time("%m/%d/%y %H:%M"),
duration = stoptime - starttime
)
#> # A tibble: 2 x 3
#> starttime stoptime duration
#> <dttm> <dttm> <drtn>
#> 1 2015-12-31 23:48:00 2015-12-31 23:51:00 3 mins
#> 2 2015-12-31 23:47:00 2015-12-31 23:53:00 6 mins
Created on 2021-09-09 by the reprex package (v2.0.0)
Alternatively, you can use the mdy_hm function from lubridate:
library(dplyr)
library(lubridate)
df <- df %>% mutate(across(c(starttime, stoptime), mdy_hm))
In base R, you can use as.POSIXct
df[1:2] <- lapply(df[1:2], as.POSIXct, format = "%m/%d/%Y %H:%M")
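To get the elapsed time as a plain number, difftime() with an explicit unit can be used. A base R sketch on the four rows from the question, assuming the timestamps are in UTC (the data frame name trips and the column duration_min are made up for illustration):

```r
trips <- data.frame(
  starttime = c("12/31/2015 23:48", "12/31/2015 23:47",
                "12/31/2015 23:37", "12/31/2015 23:37"),
  stoptime  = c("12/31/2015 23:51", "12/31/2015 23:53",
                "12/31/2015 23:43", "1/1/2016 0:02"),
  stringsAsFactors = FALSE
)

# %Y is the four-digit year; tz = "UTC" is an assumption
fmt <- "%m/%d/%Y %H:%M"
trips$starttime <- as.POSIXct(trips$starttime, format = fmt, tz = "UTC")
trips$stoptime  <- as.POSIXct(trips$stoptime,  format = fmt, tz = "UTC")

# Fix the unit so every row is measured in minutes, then drop the difftime class
trips$duration_min <- as.numeric(
  difftime(trips$stoptime, trips$starttime, units = "mins")
)
trips$duration_min
```

Setting units explicitly matters because a plain subtraction lets R choose the unit (secs, mins, hours, ...) based on the data.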
In my raw data file (4600 records) date, year, hour and minutes are merged together in a large integer, example:
1205981254 (May 12, 1998, at 12:54)
The problem is that records for dates between day 10 and 31 of each month have 10 digits, while there are only 9 digits for dates between day 1 and 9:
905981254 (May 9, 1998, at 12:54)
I created this raw data file many years ago as a student and followed no particular format. How can I extract day, month, year, and time of day from these integers? I have read through all the previous Qs and As without finding a solution to my particular problem.
You can convert data back to POSIXct/POSIXlt format :
x <- c(1205981254, 905981254)
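# "%010.0f" pads to 10 digits; "%d" cannot format values above .Machine$integer.max (e.g. days 22 to 31)
x1 <- as.POSIXct(sprintf("%010.0f", x), format = "%d%m%y%H%M", tz = 'UTC')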
x1
#[1] "1998-05-12 12:54:00 UTC" "1998-05-09 12:54:00 UTC"
You can then extract whichever information you want from this.
#Day of month
as.integer(format(x1, "%d"))
#[1] 12 9
#Hour
as.integer(format(x1, "%H"))
#[1] 12 12
#Month ("%m" is the month; use "%M" for the minute)
as.integer(format(x1, "%m"))
#[1] 5 5
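The individual components can also be pulled out with lubridate's accessor functions instead of format(). A sketch under the assumptions that all years are 19XX as written with two digits and that UTC is acceptable; the third value (31 December) is hypothetical and only added to show a record that exceeds the integer range:

```r
library(lubridate)

# First two values are from the question; the third is a made-up example
x <- c(1205981254, 905981254, 3112981254)

# Zero-pad to 10 digits; "%.0f" works for doubles beyond the integer range
padded <- sprintf("%010.0f", x)

parsed <- as.POSIXct(padded, format = "%d%m%y%H%M", tz = "UTC")

parts <- data.frame(
  day    = day(parsed),
  month  = month(parsed),
  year   = year(parsed),
  hour   = hour(parsed),
  minute = minute(parsed)
)
parts
```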
dates <- c( 1205981254, 905981254 )
#convert to character
dates <- as.character( dates )
#convert to posix, based on length.. add a 0 as prefix in case of 9 character-length
dplyr::if_else( nchar(dates) == 10,
as.POSIXct( dates, format = "%d%m%y%H%M"),
as.POSIXct( paste0(0,dates), format = "%d%m%y%H%M") )
[1] "1998-05-12 12:54:00 CEST" "1998-05-09 12:54:00 CEST"
Maybe this works for you if the century of the year stays the same.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- 905981254
y <- 1205981254
df <- data.frame(records = as.character(rep(c(x, y), 100))) %>%
mutate(records = ifelse(nchar(records) == 9, paste("0", records, sep = ""), records)) %>%
mutate(records = as.POSIXct(records, format = "%d%m%y%H%M"))
head(df)
#> records
#> 1 1998-05-09 12:54:00
#> 2 1998-05-12 12:54:00
#> 3 1998-05-09 12:54:00
#> 4 1998-05-12 12:54:00
#> 5 1998-05-09 12:54:00
#> 6 1998-05-12 12:54:00
Created on 2020-07-07 by the reprex package (v0.3.0)
You can try this:
v1 <- '1205981254'
v2 <- '905981254'
#Extract dates first
nv1 <- as.Date(v1,'%d%m%y%H%M')
nv2 <- as.Date(paste0(0,v2),'%d%m%y%H%M')
#Extract hours
nh1 <- paste0(substr(v1,nchar(v1)-3,nchar(v1)-2),':',substr(v1,nchar(v1)-1,nchar(v1)),':00')
nh2 <- paste0(substr(v2,nchar(v2)-3,nchar(v2)-2),':',substr(v2,nchar(v2)-1,nchar(v2)),':00')
#Concatenate
ndate1 <- paste0(nv1,' ',nh1)
ndate2 <- paste0(nv2,' ',nh2)
#Define as dates
as.POSIXlt(ndate1,tz = 'GMT')
as.POSIXlt(ndate2,tz = 'GMT')
[1] "1998-05-12 12:54:00 GMT"
[1] "1998-05-09 12:54:00 GMT"
If all your years are 19XX and not 20XX you can use
dates <- c(1205981254,905981254)
as.POSIXct(sub("(..)(..)(..)(..)$","-\\1-19\\2 \\3:\\4", dates),format="%d-%m-%Y %H:%M")
"1998-05-12 12:54:00 AST" "1998-05-09 12:54:00 AST"
Below is what my data looks like. I need to find the max and min temp for each day as well as the corresponding time.
Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00
This is the code I tried for the daily min and max temp.
library(xts)
read.csv("hourly1.csv", header = T) -> hourly1
xts(hourly1$Temp, as.Date(hourly1$date)) -> temp_date1
apply.daily(temp_date1, min) -> mintemp1_date
apply.daily(temp_date1, max) -> maxtemp1_date
I need help regarding how to find the time of day for min and max temp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dataset <- read.table(text = 'Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00',
header = TRUE,
stringsAsFactors = FALSE)
dataset %>%
group_by(date) %>%
summarise(min_temp = min(Temp),
min_temp_time = time[which.min(x = Temp)],
max_temp = max(Temp),
max_temp_time = time[which.max(x = Temp)])
#> # A tibble: 2 x 5
#> date min_temp min_temp_time max_temp max_temp_time
#> <chr> <dbl> <chr> <dbl> <chr>
#> 1 01-01-79 280. 21:00:00 295. 09:00:00
#> 2 02-01-79 279. 00:00:00 296. 09:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
Hope this helps.
Try the dplyr package.
df <- structure(list(Temp = c(280.9876771, 291.9695498, 294.9583426,
290.2357847, 286.2944531, 282.9282138, 280.326689, 279.2551605,
281.3981824, 293.076125, 295.8072204),
date = c("01-01-79", "01-01-79",
"01-01-79", "01-01-79", "01-01-79", "01-01-79", "01-01-79", "02-01-79",
"02-01-79", "02-01-79", "02-01-79"),
time = c("03:00:00", "06:00:00", "09:00:00", "12:00:00", "15:00:00", "18:00:00", "21:00:00", "00:00:00",
"03:00:00", "06:00:00", "09:00:00")),
row.names = c(NA, -11L),
class = c("data.frame"))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df %>%
group_by(date)%>%
slice(which.max(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 295. 01-01-79 09:00:00
#> 2 296. 02-01-79 09:00:00
df %>%
group_by(date)%>%
slice(which.min(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 280. 01-01-79 21:00:00
#> 2 279. 02-01-79 00:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
A data.table + lubridate solution:
# load libraries
library(data.table)
library(lubridate)
# load data
dt <- fread(" Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00")
# Convert date - time values to real dates:
dt[, date2 := dmy_hms(paste(date, time, sep = " "))]
# find the date - time for max temp:
dt[, date2[which(Temp == max(Temp))], by = floor_date(date2, "days")]
# find the date - time for min temp:
dt[, date2[which(Temp == min(Temp))], by = floor_date(date2, "days")]
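Note that which(Temp == max(Temp)) returns every tied row. If a single row per day with both extremes is preferred, the summary can be written in one grouped call. A sketch on a subset of the sample data, grouping by the character date column as it appears in the file (the result name daily is made up):

```r
library(data.table)

# Subset of the sample data from the question
dt <- data.table(
  Temp = c(280.9876771, 291.9695498, 294.9583426,
           279.2551605, 281.3981824, 295.8072204),
  date = c("01-01-79", "01-01-79", "01-01-79",
           "02-01-79", "02-01-79", "02-01-79"),
  time = c("03:00:00", "06:00:00", "09:00:00",
           "00:00:00", "03:00:00", "09:00:00")
)

# which.min()/which.max() return the first matching position within each group
daily <- dt[, .(min_temp      = min(Temp),
                min_temp_time = time[which.min(Temp)],
                max_temp      = max(Temp),
                max_temp_time = time[which.max(Temp)]),
            by = date]
daily
```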
Thank you all for the help. But I have 116881 entries.
So I tried the index command in R. This fetched me the corresponding id.
index(hourly1)[hourly1$Temp %in% maxtemp1_date] -> max_id
index(hourly1)[hourly1$Temp %in% mintemp1_date] -> min_id
Then I used the vlookup command in Excel to get the desired solution.
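One caveat with matching by %in% is that a temperature equal to some other day's maximum would also be selected. Comparing each value against its own day's extreme avoids that and needs no lookup outside R. A base R sketch on a subset of the sample data, assuming hourly1 has the columns Temp, date and time shown in the question:

```r
# Subset of the sample data; in practice hourly1 would come from read.csv()
hourly1 <- data.frame(
  Temp = c(280.9876771, 291.9695498, 294.9583426,
           279.2551605, 281.3981824, 295.8072204),
  date = c("01-01-79", "01-01-79", "01-01-79",
           "02-01-79", "02-01-79", "02-01-79"),
  time = c("03:00:00", "06:00:00", "09:00:00",
           "00:00:00", "03:00:00", "09:00:00"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the max (or min) of that row's own day
is_max <- hourly1$Temp == ave(hourly1$Temp, hourly1$date, FUN = max)
is_min <- hourly1$Temp == ave(hourly1$Temp, hourly1$date, FUN = min)

max_rows <- hourly1[is_max, ]
min_rows <- hourly1[is_min, ]
max_rows
min_rows
```

Because ave() is vectorised over the whole column, this should scale to the full data set without an intermediate index.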
In data.table
dt[, x.value.min := frollapply(x = x, n = 2, min, fill = NA, align = "right", na.rm =TRUE), by = ID]