Convert from military time to UTC in R

I have a dataset, df, and I would like to convert all the values from the 24-hour clock to the 12-hour (AM/PM) format.
Date Name
1/2/2020 16:46 A
1/2/2020 16:51 B
I would like:
Date Name
1/2/2020 4:46:47 PM A
1/2/2020 4:51:44 PM B
I have tried:
df$Date<- format(df$Date, "%m/%d/%Y %I:%M:%S %p")
dput:
structure(list(Date = structure(1:2, .Label = c("1/2/2020 16:46",
"1/2/2020 16:51"), class = "factor"), Name = structure(1:2, .Label = c("A",
"B"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))

You can first convert the data to POSIXct format and then use format to get data in the required format.
df$Date <- format(as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M"),
"%m/%d/%Y %I:%M:%S %p")
#Can also use mdy_hm from lubridate
#df$Date <- format(lubridate::mdy_hm(df$Date), "%m/%d/%Y %I:%M:%S %p")
df
# Date Name
#1 01/02/2020 04:46:00 PM A
#2 01/02/2020 04:51:00 PM B
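For reference, here is a minimal base R sketch of the same parse-then-format idea. Two things in it are my own assumptions rather than part of the answer above: an explicit tz = "UTC", so the result does not depend on the local time zone, and a gsub() step that strips leading zeros to get closer to the requested layout. The source data has no seconds, so they come out as :00, and %p depends on the locale (an English or C locale is assumed).

```r
# Sample data as in the question (stored as character here)
df <- data.frame(Date = c("1/2/2020 16:46", "1/2/2020 16:51"),
                 Name = c("A", "B"), stringsAsFactors = FALSE)

# Parse the 24-hour strings; tz is fixed so no local-time shift occurs
parsed <- as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Format on the 12-hour clock (month, day and hour are zero-padded)
padded <- format(parsed, "%m/%d/%Y %I:%M:%S %p")

# Drop a zero that starts the string or directly follows "/" or a space
df$Date <- gsub("(^|/| )0", "\\1", padded)
df
```

If the zero-padded form is acceptable, the gsub() step can simply be left out.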

Assuming you want to actually convert a string in one format to a string in another format rather than having it as a (more useful) actual date/time, you can use a little arithmetic and string chopping along with mapply:
splits <- strsplit(as.character(df$Date), " |:")
Hours <- as.numeric(sapply(splits, `[`, 2))
AMPM <- c(" AM", " PM")[Hours %/% 12 + 1]
Hours <- (Hours - 1) %% 12 + 1  # 0 -> 12, 1..12 unchanged, 13..23 -> 1..11
df$Date <- mapply(function(x, y, z) paste0(x[1], " ", y, ":", x[3], z), splits, Hours, AMPM)
df
#> Date Name
#> 1 1/2/2020 4:46 PM A
#> 2 1/2/2020 4:51 PM B
Created on 2020-02-26 by the reprex package (v0.3.0)
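Hour arithmetic like this is easy to get wrong at midnight and noon, so it may be worth checking it against base R's own %I conversion for all 24 hours. The sketch below is one possible check; the variable names are illustrative, and the modulo formula is just one common way to write the mapping.

```r
hours <- 0:23

# 12-hour value: 0 -> 12, 1..12 unchanged, 13..23 -> 1..11
hour12 <- (hours - 1) %% 12 + 1
# AM for hours 0..11, PM for hours 12..23
ampm <- ifelse(hours < 12, "AM", "PM")

# Reference values from base R's %I conversion
ref <- as.POSIXct(sprintf("2020-01-02 %02d:00", hours),
                  format = "%Y-%m-%d %H:%M", tz = "UTC")
stopifnot(all(hour12 == as.integer(format(ref, "%I"))))

# Edge cases: midnight, noon, 13:00 and 23:00
data.frame(hours, hour12, ampm)[c(1, 13, 14, 24), ]
```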

Under the same assumptions as Allan's answer above, here is another way of converting from the 24-hour to the 12-hour clock.
library(tidyverse)
library(lubridate)
df <- tibble(
  date = c(ymd_hms("2020/01/02 16:46:00", "2020/01/02 16:51:00", tz = "UTC")),
  name = c("A", "B")
)
# Assign the result back to df so the new column is kept
df <- df %>%
  mutate(date_hour = hour(date),
         am_pm = if_else(date_hour >= 12, "PM", "AM"),  # noon counts as PM
         date_hour = (date_hour - 1) %% 12 + 1,         # 0 -> 12, 13 -> 1
         newdatetime = paste0(date(date), " ", date_hour, ":",
                              sprintf("%02d", minute(date)), " ", am_pm)) %>%
  select(-c(date_hour, am_pm))
df
# A tibble: 2 x 3
date name newdatetime
<dttm> <chr> <chr>
1 2020-01-02 16:46:00 A 2020-01-02 4:46 PM
2 2020-01-02 16:51:00 B 2020-01-02 4:51 PM
Hope this helps!

Related

Groupby a column and find its sum and count

Background:
I have a dataset, df,
Date Duration
1/2/2020 5:00:00 PM 20
1/2/2020 5:30:01 PM 30
1/2/2020 6:00:00 PM 10
1/5/2020 7:00:01 AM 5
1/6/2020 8:00:00 AM 2
1/6/2020 9:00:00 AM 8
Desired Output:
Date Total_Duration Count
1/2/2020 60 3
1/5/2020 5 1
1/6/2020 10 2
Dput:
structure(list(Date = structure(1:6, .Label = c("1/2/2020 5:00:00 PM",
"1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM", "1/5/2020 7:00:01 AM",
"1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"), class = "factor"),
Duration = c(20L, 30L, 10L, 5L, 2L, 8L)), class = "data.frame", row.names = c(NA,
-6L))
What I have tried:
library(dplyr)
df %>% group_by(Date) %>% add_tally() %>%
summarize(Duration)
Any guidance will be helpful.
We can extract the date part of 'Date' after converting it to a date-time with dmy_hms (assuming the format is DD/MM/YYYY HH:MM:SS; use mdy_hms instead if the month comes first), use that as the grouping variable, and get the sum of 'Duration' and the 'Count' as n().
library(dplyr)
library(lubridate)
df %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
# A tibble: 3 x 3
# Date Total_Duration Count
# <date> <int> <int>
#1 2020-02-01 60 3
#2 2020-05-01 5 1
#3 2020-06-01 10 2
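The desired output in the question (1/2/2020, 1/5/2020, 1/6/2020) suggests the dates may be month-first. As a hedged alternative, here is a base R sketch of the same group-and-summarise step under that assumption. It uses as.Date with a month-first format and tapply, neither of which is what the answer above does.

```r
df <- data.frame(
  Date = c("1/2/2020 5:00:00 PM", "1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM",
           "1/5/2020 7:00:01 AM", "1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"),
  Duration = c(20L, 30L, 10L, 5L, 2L, 8L),
  stringsAsFactors = FALSE
)

# Assumption: month comes first; the time part after the date is ignored
day <- as.Date(df$Date, format = "%m/%d/%Y")

# Sum and count of Duration per calendar day
total <- tapply(df$Duration, day, sum)
count <- tapply(df$Duration, day, length)

res <- data.frame(Date = as.Date(names(total)),
                  Total_Duration = as.vector(total),
                  Count = as.vector(count))
res
```

With day-first data, the format string would be "%d/%m/%Y" instead.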

Filter within a column by date in R

I have a dataset, df. The Date column consists of dates from December and January. I would like to filter it and make a new dataset with dates only from January onward.
Date ID
12/20/2019 1:00:01 AM A
12/30/2019 2:00:02 AM B
01/01/2020 1:00:00 AM C
02/05/2020 2:00:05 AM D
I would like this:
Date ID
01/01/2020 1:00:00 AM C
02/05/2020 2:00:05 AM D
Can I do this with dplyr, or with base R?
library(lubridate)
library(tidyverse)
filter(Date) >= 01-01-2020 ?
dput is
structure(list(Date = structure(c(2L, 3L, 1L, 4L), .Label = c("1/1/2020 1:00:00 AM",
"12/20/2019 1:00:01 AM", "12/30/2019 2:00:02 AM", "2/5/2020 2:00:05 AM"
), class = "factor"), ID = structure(1:4, .Label = c("A", "B",
"C", "D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
Maybe just filter on the year and select dates from 2020 onward?
library(dplyr)
library(lubridate)
df %>% mutate(Date = mdy_hms(Date)) %>% filter(year(Date) >= 2020)
# Date ID
#1 2020-01-01 01:00:00 C
#2 2020-02-05 02:00:05 D
Or using base R :
subset(transform(df, Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")),
as.integer(format(Date, "%Y")) >= 2020)
We can use subset with strptime in base R:
subset(df, strptime(Date, "%m/%d/%Y %I:%M:%S %p")$year + 1900 >= 2020)
# Date ID
#3 1/1/2020 1:00:00 AM C
#4 2/5/2020 2:00:05 AM D
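Filtering on the year works here because the cutoff happens to be 1 January. For an arbitrary cutoff, one option is to compare the parsed date-times against a cutoff value directly. The sketch below assumes month-first strings, a UTC time zone, and an English or C locale for %p. The names dt, cutoff and jan_onward are illustrative.

```r
df <- data.frame(
  Date = c("12/20/2019 1:00:01 AM", "12/30/2019 2:00:02 AM",
           "1/1/2020 1:00:00 AM", "2/5/2020 2:00:05 AM"),
  ID = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour strings into date-times
dt <- as.POSIXct(df$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Keep rows at or after the cutoff; the Date column stays in its original form
cutoff <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
jan_onward <- df[dt >= cutoff, ]
jan_onward
```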

Converting Date/Time in R

I am struggling hard with date time formatting in R. I am sure this is an easy fix... can someone write me a line of code that will convert all values from Year, M, D, Time into a new column "datetime"?
What data looks like:
x year m d time
A 2019 2 23 11:12 PM
B 2019 1 31 2:04 PM
C 2018 12 31 12:01 AM
D 2017 2 1 10:14 AM
What I want:
x datetime
A 2/23/19 11:12 PM
B 1/31/19 2:04 PM
C 12/31/18 12:01 AM
D 2/1/17 10:14 AM
Since it's a datetime value we can convert it into a standard format by pasting the values together.
df$datetime <- with(df, as.POSIXct(paste(year, m, d, time),
format = "%Y %m %d %I:%M %p", tz = "UTC"))
df
# x year m d time datetime
#1 A 2019 2 23 11:12 PM 2019-02-23 23:12:00
#2 B 2019 1 31 2:04 PM 2019-01-31 14:04:00
#3 C 2018 12 31 12:01 AM 2018-12-31 00:01:00
#4 D 2017 2 1 10:14 AM 2017-02-01 10:14:00
Or using lubridate
library(dplyr)
library(lubridate)
df %>% mutate(datetime = ymd_hm(paste(year, m, d, time)))
data
df <- structure(list(x = structure(1:4, .Label = c("A", "B", "C", "D"
), class = "factor"), year = c(2019L, 2019L, 2018L, 2017L), m = c(2L,
1L, 12L, 2L), d = c(23L, 31L, 31L, 1L), time = c("11:12 PM",
"2:04 PM", "12:01 AM", "10:14 AM")), row.names = c(NA, -4L), class = "data.frame")
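I think the following should work for your goal. Note that apply() converts each row to character and pads the numeric columns to a common width, so trimws() is applied to m and d:
df <- data.frame(x = df$x, datetime = apply(df, 1, function(v) sprintf("%s/%s/%s %s", trimws(v["m"]), trimws(v["d"]), v["year"], v["time"])))
If you want to append the new column to the existing data.frame df, then use:
df$datetime <- apply(df, 1, function(v) sprintf("%s/%s/%s %s", trimws(v["m"]), trimws(v["d"]), v["year"], v["time"]))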
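If a plain string in the m/d/yy layout from the question is all that is needed, a vectorised sprintf avoids the row-wise apply altogether. This is only a sketch under that assumption. The two-digit year via year %% 100L is my reading of the "What I want" table, and out is an illustrative name.

```r
df <- data.frame(
  x = c("A", "B", "C", "D"),
  year = c(2019L, 2019L, 2018L, 2017L),
  m = c(2L, 1L, 12L, 2L),
  d = c(23L, 31L, 31L, 1L),
  time = c("11:12 PM", "2:04 PM", "12:01 AM", "10:14 AM"),
  stringsAsFactors = FALSE
)

# Build "m/d/yy time" for all rows at once; sprintf is vectorised
out <- data.frame(
  x = df$x,
  datetime = with(df, sprintf("%d/%d/%02d %s", m, d, year %% 100L, time)),
  stringsAsFactors = FALSE
)
out
```

The result is character, not a date-time, so it is suitable for display but not for sorting or arithmetic.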

Find the smallest date for each ID

I have one data table with the following schema
id|smalltime
1 2199-08-02 20:00:00
2 2150-11-13 15:00:00
...
And I have another data table with the following schema
id|time
1 2199-08-02 20:10:00
1 2199-08-02 19:00:00
2 2150-11-13 15:10:00
...
I want to find for each id in data table two the smallest date after the smalltime of each id in data table one.
With the previous example, I am looking for the following new data table:
id|time
1 2199-08-02 20:10:00
2 2150-11-13 15:10:00
Did you mean to have something like below?
library(lubridate)
library(dplyr)
df1$smalltime <- ymd_hms(df1$smalltime)
df2$time <- ymd_hms(df2$time)
df2 %>%
inner_join(df1, by="id") %>%
mutate(time_diff = time - smalltime) %>%
filter(time_diff > 0) %>%
group_by(id) %>%
summarise(time = time[which.min(time_diff)])
Output is:
id time
1 1 2199-08-02 20:10:00
2 2 2150-11-13 15:10:00
Sample data:
df1 <- structure(list(id = 1:2, smalltime = c("2199-08-02 20:00:00",
"2150-11-13 15:00:00")), .Names = c("id", "smalltime"), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(id = c(1L, 1L, 2L), time = c("2199-08-02 20:10:00",
"2199-08-02 19:00:00", "2150-11-13 15:10:00")), .Names = c("id",
"time"), class = "data.frame", row.names = c(NA, -3L))
You can try this way:
library(data.table)
library(lubridate)
library(purrr)
# df1 and df2 must be data.tables for := to work
setDT(df1)
setDT(df2)
# convert to date time format
df1[, smalltime := ymd_hms(smalltime)]
df2[, time := ymd_hms(time)]
# merge df2 in df1 while grouping by df2 on id
df1[df2[, list(list(time)), .(id)], on = 'id', z := i.V1]
# check if the time is greater than df1 time
df1[, ans := map2(z, smalltime, function(x, y) lapply(x, function(j) as.character(j[j > y])))]
# extract the time (answer)
df1[, ans1 := map_chr(ans, 1)]
print(df1[,.(id, ans1)])
id ans1
1: 1 2199-08-02 20:10:00
2: 2 2150-11-13 15:10:00
> A=strptime(df1$smalltime,"%F %T")
> B=strptime(df2$time,"%F %T")
> d=findInterval(B,sort(A))
> unname(by(B,list(d,df2$id),function(x)format(min(x),"%F %T"))[unique(d)])
[1] "2199-08-02 20:10:00" "2150-11-13 15:10:00"
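For comparison, the same idea (join on id, keep times after smalltime, take the earliest per id) can be sketched in base R without extra packages. This is an illustrative variant, not one of the answers above. It assumes UTC date-times and picks the earliest row by sorting and dropping duplicates.

```r
df1 <- data.frame(
  id = 1:2,
  smalltime = as.POSIXct(c("2199-08-02 20:00:00", "2150-11-13 15:00:00"),
                         tz = "UTC")
)
df2 <- data.frame(
  id = c(1L, 1L, 2L),
  time = as.POSIXct(c("2199-08-02 20:10:00", "2199-08-02 19:00:00",
                      "2150-11-13 15:10:00"), tz = "UTC")
)

# Attach each id's smalltime to its candidate times
m <- merge(df2, df1, by = "id")

# Keep only times strictly after smalltime
m <- m[m$time > m$smalltime, ]

# Sort by id and time, then keep the first (earliest) row per id
m <- m[order(m$id, m$time), ]
res <- m[!duplicated(m$id), c("id", "time")]
rownames(res) <- NULL
res
```

An id with no time after its smalltime simply drops out of the result.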

R - subset dataframe by Time only

I have been looking around but I still couldn't find a way to subset my dataframe by time, here is the sample data:
Duration End Date Start Date
228 2013-01-03 09:10:00 2013-01-03 09:06:00
1675 2013-01-04 17:34:00 2013-01-04 17:06:00
393 2013-01-04 17:54:00 2013-01-04 17:48:00
426 2013-01-04 11:10:00 2013-01-04 11:03:00
827 2013-01-01 16:13:00 2013-01-01 15:59:00
780 2013-01-01 16:13:00 2013-01-01 16:00:00
The End Date and Start Date are in POSIXct format, and here is what I have tried to keep only the times between 8:00 and 9:30.
tm1 <- as.POSIXct("08:00", format = "%H:%M")
tm2 <- as.POSIXct("09:30", format = "%H:%M")
df.time <- with(df, df[format('Start Date', '%H:%M')>= tm1 & format('End Date', '%H:%M')< tm2, ])
but this returns an error. I have also tried this, but it didn't work either.
df.time <- subset(df, format('Start Date', '%H:%M') >= '8:00' & format('End Date', '%H:%M') < '9:30'))
Can anybody tell me what I am doing wrong? Thanks
Assuming that the start and end dates are always the same and only the times differ, and that you want the rows whose time starts at or after 8:00 and ends before 9:30, convert the date/time values to character strings of the form HH:MM and compare:
subset(DF, format(`Start Date`, "%H:%M") >= "08:00" &
format(`End Date`, "%H:%M") < "09:30")
giving:
Duration End Date Start Date
1 228 2013-01-03 09:10:00 2013-01-03 09:06:00
Note: We used the following for DF. (Next time please use dput to provide your data in reproducible form.)
DF <- structure(list(Duration = c(228L, 1675L, 393L, 426L, 827L, 780L
), `End Date` = structure(c(1357222200, 1357338840, 1357340040,
1357315800, 1357074780, 1357074780), class = c("POSIXct", "POSIXt"
), tzone = ""), `Start Date` = structure(c(1357221960, 1357337160,
1357339680, 1357315380, 1357073940, 1357074000), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Duration", "End Date", "Start Date"
), row.names = c(NA, -6L), class = "data.frame")
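The string comparison relies on both sides being zero-padded ("08:00", not "8:00"). A numeric alternative that sidesteps this is to compare minutes since midnight. The sketch below uses a reduced, hypothetical version of the data with syntactic column names (Start, End) and a fixed UTC time zone. The mins helper is illustrative.

```r
DF <- data.frame(
  Start = as.POSIXct(c("2013-01-03 09:06:00", "2013-01-04 17:06:00",
                       "2013-01-04 11:03:00"), tz = "UTC"),
  End = as.POSIXct(c("2013-01-03 09:10:00", "2013-01-04 17:34:00",
                     "2013-01-04 11:10:00"), tz = "UTC")
)

# Minutes since midnight for each date-time
mins <- function(x) {
  as.integer(format(x, "%H")) * 60L + as.integer(format(x, "%M"))
}

# Start at or after 08:00 and end before 09:30
keep <- mins(DF$Start) >= 8 * 60 & mins(DF$End) < 9 * 60 + 30
DF[keep, ]

# Why the padding matters for string comparison
"09:06" >= "08:00"  # TRUE
"09:06" >= "8:00"   # FALSE, because "0" sorts before "8"
```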
