Combining separate date time fields to one date_time field - r

I used the following R code to create a POSIXct date time field from a separate date and time field both in character format using lubridate and dplyr.
library(dplyr)
library(lubridate)
c_cycle_work <- tibble(
StartDate = c("1/28/2011", "2/26/2011", "4/2/2011", "4/11/2011"),
StartTime = c("10:58", "6:02", "6:00", "9:47")
)
c_cycle_work %>%
mutate(start_dt = paste0(StartDate, StartTime, sep = " ", collapse = NULL)) %>%
mutate(start_dt = mdy_hms(start_dt))
# 1 1/28/2011 10:58 2020-01-28 11:10:58
# 2 2/26/2011 6:02 2020-02-26 11:06:02
# 3 4/2/2011 6:00 2020-04-02 11:06:00
# 4 4/11/2011 9:47 2020-04-11 11:09:47
The start_dt field I created is in Y m d format even though I used mdy_hms based on the data. Also, all years have been changed to 2020.
Went over this several times, used paste vs. paste0, etc. but still stumped.

Your problem is paste0(), which doesn't have a sep= argument; it treats sep = " " as one more string to paste, so the space lands at the end instead of between the date and the time. When you paste the date and time you get "1/28/201110:58 ", and mdy_hms() splits that into 1/28/20 11:10:58 (though it seemed to work differently with my version, lubridate_1.6.0). Also, you were using "hms" but your times don't have seconds. This should work with your data:
c_cycle_work %>%
mutate(start_dt = paste(StartDate, StartTime, sep=" ")) %>%
mutate(start_dt = mdy_hm(start_dt))
# StartDate StartTime start_dt
# <chr> <chr> <dttm>
# 1 1/28/2011 10:58 2011-01-28 10:58:00
# 2 2/26/2011 6:02 2011-02-26 06:02:00
# 3 4/2/2011 6:00 2011-04-02 06:00:00
# 4 4/11/2011 9:47 2011-04-11 09:47:00
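To make the difference between paste0() and paste() visible, here is a minimal base R sketch using two of the rows from the question. The variable names glued, spaced and parsed are made up for illustration, and tz = "UTC" is an assumption added only to keep the result independent of the local time zone.

```r
StartDate <- c("1/28/2011", "2/26/2011")
StartTime <- c("10:58", "6:02")

# paste0() has no sep argument: sep = " " is pasted as a third string,
# so the space ends up at the end of each element.
glued <- paste0(StartDate, StartTime, sep = " ")

# paste() does use sep, so the space lands between date and time.
spaced <- paste(StartDate, StartTime, sep = " ")

# Parse month/day/year plus hours and minutes (no seconds in the data).
parsed <- as.POSIXct(spaced, format = "%m/%d/%Y %H:%M", tz = "UTC")

print(glued)
print(spaced)
print(parsed)
```

Printing glued next to spaced shows why the original parse went wrong: the digits of the year and the hour run together.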

Related

How to convert week numbers into date format using R

I am trying to convert a column in my dataset that contains week numbers into weekly Dates. I was trying to use the lubridate package but could not find a solution. The dataset looks like the one below:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. (2020-03-07, 2020-03-14).
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think this gets the right days; I looked at a 2020 calendar and counted. But if something is off, it's a matter of playing with the (week - 1) * 7 - 1 component to return what you want.
This just grabs the first day of the year, adds x weeks worth of days, and then uses ceiling_date() to find the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
(week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
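The same arithmetic can be sketched in base R without lubridate, which may help when checking the offsets by hand. This is only an illustration under the answer's own convention (first day of the year, plus whole weeks, then forward to a Sunday); the names week_code, start and week_date are made up here, and the sketch assumes the intermediate date never falls exactly on a Sunday, which holds for these 2020 values.

```r
week_code <- c("202009", "202010", "202011", "202012", "202013", "202014")

# Split "YYYYWW" into its year and week parts.
year <- as.integer(substr(week_code, 1, 4))
week <- as.integer(substr(week_code, 5, 6))

# First day of the year plus (week - 1) weeks, minus one day.
start <- as.Date(paste0(year, "-01-01")) + (week - 1) * 7 - 1

# Day of week with 0 = Sunday; move forward to the following Sunday.
wday <- as.POSIXlt(start)$wday
week_date <- start + (7 - wday)

print(week_date)
```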

Is there a way to select rows based on a loose distinct?

I have a dataset with a lot of replicated rows, and I want to make a dataset with no replications. Date and time are the main ways of distinguishing between distinct and similar rows, but sometimes the times are a bit off. I want to reduce my dataset so that if 2 rows are within 1 hour of each other on the same day the second instance does not show up.
input_date<-c("4/20/2014", "5/15/2002", "3/12/2019", "3/12/2019", "3/12/2019", "3/12/2019")
input_time<-c("4:30", "4:30", "9:00", "9:55", "12:00", "12:00")
input<-cbind(input_date, input_time)
colnames(input)<-c("date", "time")
#use distinct to remove duplicate values--this removes final row, but I want it to also remove row 4.
output<-distinct(input, date, time)
Is there any easy way to tell R to get rid of rows with values that are close to each other but not exactly the same?
Here is an approach that rounds times to make groups based on the hour.
Then, use {dplyr} group_by / slice to get the first row of each group.
input_date <- c("4/20/2014", "5/15/2002", "3/12/2019", "3/12/2019", "3/12/2019", "3/12/2019")
input_time <- c("4:30", "4:30", "9:00", "9:55", "12:00", "12:00")
# make a data.frame
input <- data.frame(date =input_date, time = input_time)
# use dplyr for data manipulation of groups
library(dplyr, warn.conflicts = FALSE)
# take the 1st slice index from each group
input %>%
mutate(datetime = as.POSIXct(sprintf("%s %s", date, time),
format = "%m/%d/%Y %H:%M"),
hour = round(datetime, "hours")) %>%
group_by(hour) %>%
slice(1)
#> # A tibble: 5 x 4
#> # Groups: hour [5]
#> date time datetime hour
#> <chr> <chr> <dttm> <dttm>
#> 1 5/15/2002 4:30 2002-05-15 04:30:00 2002-05-15 05:00:00
#> 2 4/20/2014 4:30 2014-04-20 04:30:00 2014-04-20 05:00:00
#> 3 3/12/2019 9:00 2019-03-12 09:00:00 2019-03-12 09:00:00
#> 4 3/12/2019 9:55 2019-03-12 09:55:00 2019-03-12 10:00:00
#> 5 3/12/2019 12:00 2019-03-12 12:00:00 2019-03-12 12:00:00
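Note that rounding to the hour still keeps 9:00 and 9:55 as separate groups in the output above, because they round to different hours. If the goal is to drop any row that falls within one hour of the previously kept row on the same day, a gap-based pass is one possible alternative. The sketch below is base R and is not the answerer's method; the names keep, last_kept and is_dup are made up for illustration, and tz = "UTC" is an assumption.

```r
input <- data.frame(
  date = c("4/20/2014", "5/15/2002", "3/12/2019", "3/12/2019", "3/12/2019", "3/12/2019"),
  time = c("4:30", "4:30", "9:00", "9:55", "12:00", "12:00"),
  stringsAsFactors = FALSE
)
input$datetime <- as.POSIXct(paste(input$date, input$time),
                             format = "%m/%d/%Y %H:%M", tz = "UTC")

# Sort chronologically so each row is compared with the last kept row.
input <- input[order(input$datetime), ]

keep <- logical(nrow(input))
last_kept <- 0L
for (i in seq_len(nrow(input))) {
  if (last_kept == 0L) {
    is_dup <- FALSE
  } else {
    gap_hours <- as.numeric(difftime(input$datetime[i],
                                     input$datetime[last_kept],
                                     units = "hours"))
    # Duplicate: same calendar day and at most one hour after the kept row.
    is_dup <- input$date[i] == input$date[last_kept] && gap_hours <= 1
  }
  if (!is_dup) {
    keep[i] <- TRUE
    last_kept <- i
  }
}
output <- input[keep, ]
print(output)
```

Comparing against the last kept row, rather than the immediately preceding row, avoids chaining: a long run of rows each 50 minutes apart would otherwise collapse into a single row.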

Assigning values to all rows within a specific hour range using monthly data

I have a dataframe in the following format:
temp:
id time date
1 06:22:30 2018-01-01
2 08:58:00 2018-01-15
3 09:30:21 2018-01-30
The actual data set continues on for 9000 rows, with observations for times throughout the month of January. I want to write code that assigns each row a new value depending on which hour range the time variable belongs to.
A couple of example hour ranges would be:
Morning peak: 06:00:00 - 08:59:00
Morning: 09:00:00 - 11:59:00
The desired output would look like this:
id time date time_of_day
1 06:22:30 2018-01-01 MorningPeak
2 08:58:00 2018-01-15 MorningPeak
3 09:30:21 2018-01-30 Morning
I have tried playing around with time objects using the chron package using the following code to specify different time ranges:
MorningPeak <- temp[temp$Time >= "06:00:00" & temp$Time <= "08:59:59",]
MorningPeak$time_of_day <- "MorningPeak"
Morning <- temp[temp$Time >= "09:00:00" & temp$Time <= "11:59:59",]
Morning$time_of_day <- "Morning"
The results could then be merged and then manipulated to get everything in the same column. Is there a way to do this such that the desired result is generated and no extra data manipulation is required? I am interested in learning how to make my code more efficient.
You are comparing character strings, not time/datetime objects; you need to convert the values to date-times before comparing them. It seems you can compare the hour of the day to get the appropriate labels.
library(dplyr)
df %>%
mutate(hour = as.integer(format(as.POSIXct(time, format = "%T"), "%H")),
time_of_day = case_when(hour >= 6 & hour < 9 ~ "MorningPeak",
hour >= 9 & hour < 12 ~ "Morning",
TRUE ~ "Rest of the day"))
# id time date hour time_of_day
#1 1 06:22:30 2018-01-01 6 MorningPeak
#2 2 08:58:00 2018-01-15 8 MorningPeak
#3 3 09:30:21 2018-01-30 9 Morning
You can add more hourly criteria if needed.
We can also use cut
cut(as.integer(format(as.POSIXct(df$time, format = "%T"), "%H")),
breaks = c(-Inf, 6, 9, 12, Inf), right = FALSE,
labels = c("Rest of the day", "MorningPeak", "Morning", "Rest of the day"))
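For a version that runs on its own, here is a base R sketch of the same hour-bucket idea using the three sample rows from the question. It uses findInterval() with a lookup vector instead of case_when() or cut(); that swap, and the names bucket_labels and hour, are choices made for this illustration only.

```r
temp <- data.frame(
  id = 1:3,
  time = c("06:22:30", "08:58:00", "09:30:21"),
  date = as.Date(c("2018-01-01", "2018-01-15", "2018-01-30")),
  stringsAsFactors = FALSE
)

# The hour is the first two characters of a zero-padded "HH:MM:SS" string.
hour <- as.integer(substr(temp$time, 1, 2))

# findInterval() returns 0 for hours before 6, 1 for [6, 9),
# 2 for [9, 12) and 3 for 12 or later; add 1 to index the labels.
bucket_labels <- c("Rest of the day", "MorningPeak", "Morning", "Rest of the day")
temp$time_of_day <- bucket_labels[findInterval(hour, c(6, 9, 12)) + 1]

print(temp)
```

Adding another range only needs one more break point and one more label, which keeps the ranges and their names in one place.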

Build datetime column in R

I have 2 columns.
One is date:
2011-04-13
2013-07-29
2010-11-23
The other is time (the hour):
3
22
15
I want to make a new column that contains the date and time.
It should look like this:
2011-04-13 3:00:00
2013-07-29 22:00:00
2010-11-23 15:00:00
I managed to combine them as a string,
but when I convert them to datetime I get only the date; the time disappears.
Any idea how to get the date and time in one column?
My script:
data <- read.csv("d:\\__r\\hour.csv")
data$date <- as.POSIXct(paste(data$dteday , paste(data$hr, ":00:00", sep=""), sep=" "))
As an example, you can use the ymd_hm() function from lubridate:
a <- c("2014-09-08", "2014-09-08", "2014-09-08")
b <- c(3, 4, 5)
library(lubridate)
library(tidyverse)
tibble(a, b) %>%
mutate(time = paste0(a, " ", b, "-0"),
time = ymd_hm(time))
The output would be:
# A tibble: 3 x 3
a b time
<chr> <dbl> <dttm>
1 2014-09-08 3 2014-09-08 03:00:00
2 2014-09-08 4 2014-09-08 04:00:00
3 2014-09-08 5 2014-09-08 05:00:00
I found that this fixed the problem:
data$date <- as.POSIXct(strptime(paste(data$dteday , paste(data$hr, ":00:00", sep=""), sep=" "), "%Y-%m-%d %H:%M:%S"))
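Another way to sketch this in base R is to parse the date once and add the hour as seconds, which avoids building a time string at all. The example uses the three sample values from the question in place of the CSV file; the column name datetime and tz = "UTC" are assumptions made for the illustration (a fixed zone also sidesteps daylight-saving shifts when adding seconds).

```r
data <- data.frame(
  dteday = c("2011-04-13", "2013-07-29", "2010-11-23"),
  hr = c(3, 22, 15),
  stringsAsFactors = FALSE
)

# Midnight of each date, then add the hour converted to seconds.
data$datetime <- as.POSIXct(data$dteday, format = "%Y-%m-%d", tz = "UTC") +
  data$hr * 3600

print(format(data$datetime, "%Y-%m-%d %H:%M:%S"))
```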

R: extract hour from variable format timestamp

My dataframe has timestamps with and without seconds, and inconsistent leading zeros in front of months and hours, e.g. 01 or 1
library(tidyverse)
df <- data_frame(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 01:03', '12/30/2015 23:04:25'))
cust timestamp
A 5/31/2016 1:03:12
A 05/25/2016 01:06
B 6/16/2016 01:03
B 12/30/2015 23:04:25
How to extract hours into a separate column? The desired output:
cust timestamp hours
A 5/31/2016 1:03:12 1
A 05/25/2016 01:06 1
B 6/16/2016 9:03 9
B 12/30/2015 23:04:25 23
I prefer the answer with tidyverse and mutate, but my attempt fails to extract hours correctly:
df %>% mutate(hours=strptime(timestamp, '%H') %>% as.character() )
# A tibble: 4 × 3
cust timestamp hours
<chr> <chr> <chr>
1 A 5/31/2016 1:03:12 2016-10-31 05:00:00
2 A 05/25/2016 01:06 2016-10-31 05:00:00
3 B 6/16/2016 01:03 2016-10-31 06:00:00
4 B 12/30/2015 23:04:25 2016-10-31 12:00:00
Try this:
library(lubridate)
df <- data.frame(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 09:03', '12/30/2015 23:04:25'))
df %>% mutate(hours=hour(strptime(timestamp, '%m/%d/%Y %H:%M')) %>% as.character() )
cust timestamp hours
1 A 5/31/2016 1:03:12 1
2 A 05/25/2016 01:06 1
3 B 6/16/2016 09:03 9
4 B 12/30/2015 23:04:25 23
Here is a solution that appends :00 for the seconds when they are missing, then converts to a date-time using lubridate and extracts the hour using format(). Note that if you don't want the :00:00 after the hour, you can just remove it from the output format passed to format():
df %>%
mutate(
cleanTime = ifelse(grepl(":[0-9][0-9]:", timestamp)
, timestamp
, paste0(timestamp, ":00")) %>% mdy_hms
, hour = format(cleanTime, "%H:00:00")
)
returns:
cust timestamp cleanTime hour
<chr> <chr> <dttm> <chr>
1 A 5/31/2016 1:03:12 2016-05-31 01:03:12 01:00:00
2 A 05/25/2016 01:06 2016-05-25 01:06:00 01:00:00
3 B 6/16/2016 01:03 2016-06-16 01:03:00 01:00:00
4 B 12/30/2015 23:04:25 2015-12-30 23:04:25 23:00:00
Your timestamp is a character string (<chr>); you need to convert it to a date-time before you can extract components such as the hour. Note that strptime() itself takes a character string plus a format, so it is the conversion step rather than something to apply afterwards.
You will have to do some string manipulation to get consistently formatted data before converting. Append :00 to timestamps with missing seconds, and prepend a zero to single-digit months if you want a uniform format; strsplit() and the regex functions can help. Afterwards, convert with as.POSIXct(df$timestamp, format = '%m/%d/%Y %H:%M:%S') rather than as.Date(), which would drop the time of day, and then extract the hour from the result.
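The pad-then-parse idea can be sketched in base R as follows, using the timestamps from the first answer's data. The names has_seconds, padded, parsed and hours are made up for illustration, and tz = "UTC" is an assumption.

```r
timestamp <- c("5/31/2016 1:03:12", "05/25/2016 01:06",
               "6/16/2016 09:03", "12/30/2015 23:04:25")

# A timestamp has seconds if it ends in ":MM:SS".
has_seconds <- grepl(":[0-9]{2}:[0-9]{2}$", timestamp)

# Append ":00" where the seconds are missing so one format fits all rows.
padded <- ifelse(has_seconds, timestamp, paste0(timestamp, ":00"))

parsed <- as.POSIXct(padded, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
hours <- as.integer(format(parsed, "%H"))

print(data.frame(timestamp, hours))
```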

Resources