I am struggling with datetime formatting in R. I am sure this is an easy fix... can someone write me a line of code that will combine the year, m, d, and time columns into a new column, "datetime"?
What the data looks like:
x year m d time
A 2019 2 23 11:12 PM
B 2019 1 31 2:04 PM
C 2018 12 31 12:01 AM
D 2017 2 1 10:14 AM
What I want:
x datetime
A 2/23/19 11:12 PM
B 1/31/19 2:04 PM
C 12/31/18 12:01 AM
D 2/1/17 10:14 AM
Since this is a datetime value, we can paste the components together and parse the result into a standard POSIXct datetime:
df$datetime <- with(df, as.POSIXct(paste(year, m, d, time),
format = "%Y %m %d %I:%M %p", tz = "UTC"))
df
# x year m d time datetime
#1 A 2019 2 23 11:12 PM 2019-02-23 23:12:00
#2 B 2019 1 31 2:04 PM 2019-01-31 14:04:00
#3 C 2018 12 31 12:01 AM 2018-12-31 00:01:00
#4 D 2017 2 1 10:14 AM 2017-02-01 10:14:00
Or using lubridate
library(dplyr)
library(lubridate)
df %>% mutate(datetime = ymd_hm(paste(year, m, d, time)))
Data:
df <- structure(list(x = structure(1:4, .Label = c("A", "B", "C", "D"
), class = "factor"), year = c(2019L, 2019L, 2018L, 2017L), m = c(2L,
1L, 12L, 2L), d = c(23L, 31L, 31L, 1L), time = c("11:12 PM",
"2:04 PM", "12:01 AM", "10:14 AM")), row.names = c(NA, -4L), class = "data.frame")
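The question's desired output displays the datetime as month/day/two-digit year on a 12-hour clock. A minimal sketch, assuming the POSIXct column is kept for calculations and a separate character column is only needed for display: parse first, then `format()` the result. The `display` column name is illustrative. `%m`, `%d`, and `%I` zero-pad (so you get `02/23/19`, not `2/23/19`), and `%p` parsing and printing depend on the locale (an English or C locale gives AM/PM).

```r
# Sample data shaped like the question: separate year, m, d, and time columns
df <- data.frame(
  x = c("A", "B", "C", "D"),
  year = c(2019L, 2019L, 2018L, 2017L),
  m = c(2L, 1L, 12L, 2L),
  d = c(23L, 31L, 31L, 1L),
  time = c("11:12 PM", "2:04 PM", "12:01 AM", "10:14 AM"),
  stringsAsFactors = FALSE
)

# Parse into a real POSIXct value; tz = "UTC" avoids daylight-saving gaps
df$datetime <- with(df, as.POSIXct(paste(year, m, d, time),
                                   format = "%Y %m %d %I:%M %p", tz = "UTC"))

# Render it in the question's m/d/yy 12-hour style (format() zero-pads)
df$display <- format(df$datetime, "%m/%d/%y %I:%M %p")
print(df[, c("x", "display")])
```

Keeping `datetime` as POSIXct means sorting, differences, and filtering still work; `display` is only for presentation.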
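If you only need a character string in the question's m/d/yy format, you can build it with sprintf(). sprintf() is vectorized, so apply() is not needed (apply() also converts the data frame to a character matrix, which pads the numbers with spaces):
df <- data.frame(x = df$x, datetime = with(df, sprintf("%d/%d/%02d %s", m, d, year %% 100L, time)))
If you want to append the new column to the existing data.frame df, then use:
df$datetime <- with(df, sprintf("%d/%d/%02d %s", m, d, year %% 100L, time))
Note that this creates a character column, not a POSIXct datetime.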
Background:
I have a dataset, df:
Date Duration
1/2/2020 5:00:00 PM 20
1/2/2020 5:30:01 PM 30
1/2/2020 6:00:00 PM 10
1/5/2020 7:00:01 AM 5
1/6/2020 8:00:00 AM 2
1/6/2020 9:00:00 AM 8
Desired Output:
Date Total_Duration Count
1/2/2020 60 3
1/5/2020 5 1
1/6/2020 10 2
Dput:
structure(list(Date = structure(1:6, .Label = c("1/2/2020 5:00:00 PM",
"1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM", "1/5/2020 7:00:01 AM",
"1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"), class = "factor"),
Duration = c(20L, 30L, 10L, 5L, 2L, 8L)), class = "data.frame", row.names = c(NA,
-6L))
What I have tried:
library(dplyr)
df %>% group_by(Date) %>% add_tally() %>%
summarize(Duration)
Any guidance will be helpful.
We can extract the date part of 'Date' after converting it to a datetime with dmy_hms (assuming the format is DD/MM/YYYY HH:MM:SS AM/PM; if it is month/day/year, use mdy_hms instead), use that as the grouping variable, and get the sum of 'Duration' and the 'Count' with n():
library(dplyr)
library(lubridate)
df %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
# A tibble: 3 x 3
# Date Total_Duration Count
# <date> <int> <int>
#1 2020-02-01 60 3
#2 2020-05-01 5 1
#3 2020-06-01 10 2
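For comparison, a base R sketch without dplyr or lubridate, assuming the dates are month/day/year (the opposite of the assumption above, so the dates print differently). `as.Date()` ignores the trailing time, which keeps only the calendar day; the `Day`, `total`, `count`, and `res` names are illustrative.

```r
df <- data.frame(
  Date = c("1/2/2020 5:00:00 PM", "1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM",
           "1/5/2020 7:00:01 AM", "1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"),
  Duration = c(20L, 30L, 10L, 5L, 2L, 8L),
  stringsAsFactors = FALSE
)

# Keep only the calendar day (assumes month/day/year); the time part is ignored
df$Day <- as.Date(df$Date, format = "%m/%d/%Y")

# Sum and count per day; both aggregate() calls sort the groups the same way
total <- aggregate(Duration ~ Day, data = df, FUN = sum)
count <- aggregate(Duration ~ Day, data = df, FUN = length)

res <- data.frame(Date = total$Day,
                  Total_Duration = total$Duration,
                  Count = count$Duration)
print(res)
```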
Objective:
I have a dataset, df, for which I wish to first tally the number of occurrences of each date and then multiply that count by a certain number.
Sent Duration Length
1/7/2020 8:11:00 PM 34 216
1/22/2020 7:51:05 AM 432 111
1/7/2020 1:35:08 AM 57 90
1/22/2020 3:43:26 AM 22 212
1/22/2020 4:00:00 AM 55 500
Desired Outcome:
Date Count Aggregation(80)
1/7/2020 2 160
1/22/2020 3 240
I wish to count the number of times each date occurs and then multiply that count by 80. The date 1/7/2020 occurs twice and 1/22/2020 occurs three times, so the aggregations are 2 * 80 = 160 and 3 * 80 = 240.
The dput is:
structure(list(Sent = structure(c(5L, 3L, 4L, 1L, 2L), .Label = c("1/22/2020 3:43:26 AM",
"1/22/2020 4:00:00 AM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
"1/7/2020 8:11:00 PM"), class = "factor"), Duration = c(34L,
432L, 57L, 22L, 55L), length = c(216L, 111L, 90L, 212L, 500L)), class = "data.frame", row.names = c(NA,
-5L))
This is what I have tried:
df1<- aggregate(df$Sent, by=list(Category= df$dSent),
FUN=length)
However, I need to output how often each date occurs along with the aggregation (the count multiplied by 80).
Any suggestions are welcome.
We can convert Sent to POSIXct, extract the date, count the number of rows for each date, and multiply that count by 80. Using dplyr:
library(dplyr)
df %>%
group_by(Date = as.Date(lubridate::mdy_hms(Sent))) %>%
summarise(Count = n(), `Aggregation(80)` = Count * 80)
# Date Count `Aggregation(80)`
# <date> <int> <dbl>
#1 2020-01-07 2 160
#2 2020-01-22 3 240
Or, in base R, using table():
as.data.frame(cbind(Count=(r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))),
Agg=r*80))
# Count Agg
# 2020-01-07 2 160
# 2020-01-22 3 240
or
`rownames<-`(as.data.frame(cbind(Count=(r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))),
Agg=r*80, Date=names(r)))[c(3, 1:2)], NULL)
# Date Count Agg
# 1 2020-01-07 2 160
# 2 2020-01-22 3 240
Here is the data.table way:
code
library(data.table)
#convert df to a data.table (by reference)
setDT(df)
#parse the timestamps as POSIXct; %p only takes effect together with %I (12-hour clock)
df[, Sent := as.POSIXct(Sent, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")]
#summarise
df[, .(Count = .N, Aggregation = .N * 80), by = .(Date = as.Date(Sent))]
output
# Date Count Aggregation
# 1: 2020-01-07 2 160
# 2: 2020-01-22 3 240
I have a dataset, df. The Date column contains dates from December 2019 through February 2020. I would like to filter it into a new dataset that contains only dates from January 2020 onward.
Date ID
12/20/2019 1:00:01 AM A
12/30/2019 2:00:02 AM B
01/01/2020 1:00:00 AM C
02/05/2020 2:00:05 AM D
I would like this:
Date ID
01/01/2020 1:00:00 AM C
02/05/2020 2:00:05 AM D
Can I do this with dplyr, or with base R?
library(lubridate)
library(tidyverse)
filter(Date) >= 01-01-2020 ?
dput is
structure(list(Date = structure(c(2L, 3L, 1L, 4L), .Label = c("1/1/2020 1:00:00 AM",
"12/20/2019 1:00:01 AM", "12/30/2019 2:00:02 AM", "2/5/2020 2:00:05 AM"
), class = "factor"), ID = structure(1:4, .Label = c("A", "B",
"C", "D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
Maybe just filter on the year and keep dates from 2020 onward?
library(dplyr)
library(lubridate)
df %>% mutate(Date = mdy_hms(Date)) %>% filter(year(Date) >= 2020)
# Date ID
#1 2020-01-01 01:00:00 C
#2 2020-02-05 02:00:05 D
Or using base R :
subset(transform(df, Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")),
as.integer(format(Date, "%Y")) >= 2020)
We can also use subset with strptime in base R:
subset(df, strptime(Date, "%m/%d/%Y %I:%M:%S %p")$year + 1900 >= 2020)
# Date ID
#3 1/1/2020 1:00:00 AM C
#4 2/5/2020 2:00:05 AM D
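Filtering on the year works here because all the December dates fall in 2019. If the cutoff were not a year boundary (say, everything from 15 January onward), a more general sketch compares the parsed datetimes against an explicit cutoff. The `parsed`, `cutoff`, and `jan_onward` names are illustrative, and UTC is assumed on both sides so the comparison is consistent.

```r
df <- data.frame(
  Date = c("12/20/2019 1:00:01 AM", "12/30/2019 2:00:02 AM",
           "1/1/2020 1:00:00 AM", "2/5/2020 2:00:05 AM"),
  ID = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Parse month/day/year with a 12-hour clock; %p needs an English or C locale
parsed <- as.POSIXct(df$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Keep rows at or after the cutoff; change the cutoff to any datetime you need
cutoff <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
jan_onward <- df[parsed >= cutoff, ]
print(jan_onward)
```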
I have a dataset, df. I would like to convert all the values from the 24-hour clock to the 12-hour (AM/PM) clock.
Date Name
1/2/2020 16:46 A
1/2/2020 16:51 B
I would like:
Date Name
1/2/2020 4:46:00 PM A
1/2/2020 4:51:00 PM B
I have tried:
df$Date<- format(df$Date, "%m/%d/%Y %I:%M:%S %p")
dput:
structure(list(Date = structure(1:2, .Label = c("1/2/2020 16:46",
"1/2/2020 16:51"), class = "factor"), Name = structure(1:2, .Label = c("A",
"B"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
You can first convert the data to POSIXct format and then use format to get data in the required format.
df$Date <- format(as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M"),
"%m/%d/%Y %I:%M:%S %p")
#Can also use mdy_hm from lubridate
#df$Date <- format(lubridate::mdy_hm(df$Date), "%m/%d/%Y %I:%M:%S %p")
df
# Date Name
#1 01/02/2020 04:46:00 PM A
#2 01/02/2020 04:51:00 PM B
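The `format()` approach zero-pads the month, day, and hour (`01/02/2020 04:46:00 PM`), while the question shows `1/2/2020 4:46:00 PM`. A minimal sketch, assuming that unpadded style is what's wanted: format as above, then drop a `0` that sits at the start of the string or right after a `/` or a space. The `parsed` and `formatted` names are illustrative, and UTC is assumed to avoid daylight-saving gaps.

```r
df <- data.frame(Date = c("1/2/2020 16:46", "1/2/2020 16:51"),
                 Name = c("A", "B"), stringsAsFactors = FALSE)

# Parse the 24-hour strings, then print them on a 12-hour clock
parsed <- as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")
formatted <- format(parsed, "%m/%d/%Y %I:%M:%S %p")

# Remove leading zeros of month, day, and hour; minutes and seconds follow ":"
# so they keep their padding
df$Date <- gsub("(^|/| )0", "\\1", formatted)
print(df)
```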
Assuming you want to actually convert a string in one format to a string in another format rather than having it as a (more useful) actual date/time, you can use a little arithmetic and string chopping along with mapply:
splits <- strsplit(as.character(df$Date), " |:")
Hours <- as.numeric(sapply(splits, `[`, 2))
AMPM <- c(" AM", " PM")[Hours %/% 12 + 1]
Hours <- (Hours - 1) %% 12 + 1  # 0 -> 12, 12 -> 12, 13 -> 1, 16 -> 4
df$Date <- mapply(function(x, y, z) paste0(x[1], " ", y, ":", x[3], z), splits, Hours, AMPM)
df
#> Date Name
#> 1 1/2/2020 4:46 PM A
#> 2 1/2/2020 4:51 PM B
Created on 2020-02-26 by the reprex package (v0.3.0)
Under the same assumptions as Allan's answer above, here is another way to convert from the 24-hour clock to the 12-hour clock.
library(tidyverse)
library(lubridate)
df <- tibble(
date = c(ymd_hms("2020/01/02 16:46:00", "2020/01/02 16:51:00", tz = "UTC")),
name = c("A", "B")
)
df <- df %>%
  mutate(date_hour = hour(date),
         # noon (12) and later is PM
         am_pm = if_else(date_hour >= 12, "PM", "AM"),
         # map 0-23 onto 1-12 (0 -> 12, 13 -> 1)
         date_hour = (date_hour - 1) %% 12 + 1,
         newdatetime = paste0(date(date), " ", date_hour, ":",
                              sprintf("%02d", minute(date)), " ", am_pm)) %>%
  select(-c(date_hour, am_pm))
df
# A tibble: 2 x 3
date name newdatetime
<dttm> <chr> <chr>
1 2020-01-02 16:46:00 A 2020-01-02 4:46 PM
2 2020-01-02 16:51:00 B 2020-01-02 4:51 PM
Hope this helps!
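The string-based answers above hinge on mapping 0-23 hours onto a 12-hour clock, where midnight (0) and noon (12) are the easy cases to get wrong. A small sketch with those edge cases spelled out; `to_12h` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: split 0-23 hours into a 12-hour value and an AM/PM label
to_12h <- function(h) {
  list(
    hour = (h - 1) %% 12 + 1,            # 0 -> 12, 12 -> 12, 13 -> 1, 23 -> 11
    am_pm = ifelse(h < 12, "AM", "PM")   # noon (12) counts as PM
  )
}

hours <- c(0, 1, 11, 12, 13, 16, 23)
res <- to_12h(hours)
print(paste0(res$hour, " ", res$am_pm))
```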
I have a text file that looks like this:
wd <- read.table("C:\\Users\\value.txt", sep ='' , header =TRUE)
head(wd) # hourly values
# Year day hour mint valu1
# 1 2002 1 7 30 0.5
# 2 2002 1 8 0 0.3
# 3 2002 1 8 30 0.4
I want to add another column with the date and time in this format:
"2002-01-01 07:30:00 UTC"
Thanks for your help
Try this. No packages are used:
transform(wd,
Date = as.POSIXct(paste(Year, day, hour, mint), format = "%Y %j %H %M", tz = "UTC")
)
## Year day hour mint valu1 Date
## 1 2002 1 7 30 0.5 2002-01-01 07:30:00
## 2 2002 1 8 0 0.3 2002-01-01 08:00:00
## 3 2002 1 8 30 0.4 2002-01-01 08:30:00
Note: Input is:
wd <- structure(list(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L,
1L), hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L), valu1 = c(0.5,
0.3, 0.4)), .Names = c("Year", "day", "hour", "mint", "valu1"
), class = "data.frame", row.names = c(NA, -3L))
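As an alternative that avoids parsing `%j` at all, a sketch can compute the timestamp arithmetically: start from midnight UTC on 1 January and add the elapsed days, hours, and minutes as seconds. This is a swapped-in technique rather than either answer's method; `doy_to_posix` is a hypothetical helper name, and adding fixed 86400-second days is only safe because UTC has no daylight-saving shifts.

```r
# Hypothetical helper: build a UTC POSIXct from year, day of year, hour, minute
doy_to_posix <- function(year, doy, hour, minute) {
  # Midnight UTC on 1 January of each year
  jan1 <- as.POSIXct(paste0(year, "-01-01"), tz = "UTC")
  # Add the elapsed whole days plus the time of day, all in seconds
  jan1 + (doy - 1) * 86400 + hour * 3600 + minute * 60
}

wd <- data.frame(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L, 1L),
                 hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L),
                 valu1 = c(0.5, 0.3, 0.4))
wd$Date <- with(wd, doy_to_posix(Year, day, hour, mint))
print(wd)
```

Because day 60 of a non-leap year is 1 March, `doy_to_posix(2002, 60, 0, 0)` should land on 2002-03-01.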
You might be able to simplify this with a package like lubridate, but to illustrate the approach, this base R version should work for you. Next time, it would save answerers time if you provided code to create the sample data, as I've done here.
d <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Year day hour mint valu1
2002 1 7 30 0.5
2002 1 8 0 0.3
2002 1 8 30 0.4
")
library(stringr)
# Build strings like "2002-001 07:30" (%j is the day of the year) and parse them as UTC
d$datetime <- as.POSIXct(strptime(
  paste0(
    d$Year, "-",
    str_pad(d$day, 3, pad = "0"), " ",
    str_pad(d$hour, 2, pad = "0"),
    ":",
    str_pad(d$mint, 2, pad = "0")
  ),
  format = "%Y-%j %H:%M", tz = "UTC"
))