How to extract the hour component in R

I have the following code. As you can see, it calculates the difference correctly, but I would like to keep only the first 5 characters (e.g. 3.449) and discard the 'hours' part.
When I try to convert the value to character, it doesn't work.
Any help will be appreciated.
library(dplyr)
# difftime(time1, time2) computes time1 - time2, so End comes first for a positive difference
df %>%
  mutate(difference = difftime(DateTime_End, DateTime_Start, units = 'hours'))
# DateTime_Start DateTime_End difference
#1 2021-02-02 16:42:11 2021-02-02 16:43:15 0.000000 hours
#2 2021-02-02 20:10:14 2021-02-02 20:11:55 3.449754 hours
I have tried converting the value to character so that I can use substr() to extract the digits, but the conversion to character fails.

Assuming the result has been saved as df$difference, you can convert it to numeric and round:
round(as.numeric(df$difference), 3)
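A minimal, self-contained sketch of the same idea in base R follows. The two-row data frame is hypothetical sample data built from the timestamps in the question. It stores the difftime in the data frame, then drops the unit label with as.numeric(); as.numeric() also accepts a units argument for difftime objects if you want the number in another unit.

```r
# Hypothetical sample data shaped like the question's columns
df <- data.frame(
  DateTime_Start = as.POSIXct(c("2021-02-02 16:42:11", "2021-02-02 20:10:14"), tz = "UTC"),
  DateTime_End   = as.POSIXct(c("2021-02-02 16:43:15", "2021-02-02 20:11:55"), tz = "UTC")
)

# difftime(time1, time2) computes time1 - time2; keep the result in the data frame
df$difference <- difftime(df$DateTime_End, df$DateTime_Start, units = "hours")

# as.numeric() drops the "hours" label; round() keeps three decimal places
df$difference_num <- round(as.numeric(df$difference), 3)

# The units argument converts while dropping the label, e.g. to minutes
df$difference_min <- as.numeric(df$difference, units = "mins")

print(df[, c("difference_num", "difference_min")])
```

Rounding a number is more robust than cutting the first five characters of a string, because substr() would give the wrong result as soon as the integer part has more than one digit.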

Related

I have a ton of dates (stored as characters in a column) in a .csv file. How do I convert them to dates without having to list all of them?

I'm quite new to R, and especially new to working with dates. I've got a .csv file with a column titled "Date". Dates are listed as "9_12_22", etc. I see that a lot of solutions use strings to work with dates, like:
strDates <- c("01/05/1965", "08/16/1975")
dates <- as.Date(strDates, "%m/%d/%Y")
Or you use lubridate functions to do the same. This makes sense. But I have well over a hundred dates (big psych project), and I need to convert them seamlessly and automatically (I'm not going to type out a hundred-element vector!)
What's more, I'd like to convert these dates to the standard R date format rather than the awkward underscore format. There's not much I can do to change the .csv file (the output data is set in stone).
#Script example
Data_Dates <- read.csv("myfilename.csv")
I've tried using gsub to get rid of the underscores:
gsub("_", "/", Data_Dates, fixed=TRUE)
This did not work.
I've also tried manually reading in the dates:
Data_Dates <- read.csv("filename.csv") %>%
select(Date) %>%
mdy("01_14_2022","01_21_2022","02_03_2022","02_07_2022","02_09_2022","02_16_2022","02_17_2022","02_21_2022","02_24_2022","03_03_2022","03_16_2022","03_21_2022","03_23_2022","03_24_2022","03_25_2022","03_31_2022","04_04_2022","04_06_2022",etc.)
THIS TAKES FOREVER! I quit after a while because it seems futile.
Your problems come from confusion about syntax, not from anything about the dates themselves.
Let's say your data frame has a column named Date with dates in the format m_d_y. Here's some sample data:
Data_Dates = data.frame(Date =
c("01_14_2022","01_21_2022","02_03_2022","02_07_2022","02_09_2022",
"02_16_2022","02_17_2022","02_21_2022","02_24_2022","03_03_2022",
"03_16_2022","03_21_2022","03_23_2022","03_24_2022","03_25_2022",
"03_31_2022","04_04_2022","04_06_2022"
))
What you want to do is apply the lubridate::mdy function to that column. With dplyr, we use mutate to create new columns or edit existing columns.
library(dplyr)
library(lubridate)
Data_Dates %>%
mutate(Nice_Date = mdy(Date))
# Date Nice_Date
# 1 01_14_2022 2022-01-14
# 2 01_21_2022 2022-01-21
# 3 02_03_2022 2022-02-03
# 4 02_07_2022 2022-02-07
# 5 02_09_2022 2022-02-09
# 6 02_16_2022 2022-02-16
# 7 02_17_2022 2022-02-17
# 8 02_21_2022 2022-02-21
# 9 02_24_2022 2022-02-24
# 10 03_03_2022 2022-03-03
# 11 03_16_2022 2022-03-16
# 12 03_21_2022 2022-03-21
# 13 03_23_2022 2022-03-23
# 14 03_24_2022 2022-03-24
# 15 03_25_2022 2022-03-25
# 16 03_31_2022 2022-03-31
# 17 04_04_2022 2022-04-04
# 18 04_06_2022 2022-04-06
This works fine and very quickly.
Do note that I did not assign the result with <- or =, so the result prints but is not saved. If you want to assign the result to a new object, give it a name, e.g., Nice_Data_Dates <- Data_Dates %>% mutate..., or you can assign it to the same name to modify the existing object, Data_Dates <- Data_Dates %>% mutate....
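You probably do not want select() here: it keeps only the Date column and drops all the other columns. Also, piping into mdy() passes the whole data frame as its first argument, not the column. Instead, use mutate() to convert the column, either into a new column as I have here or in place with mutate(Date = mdy(Date)).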
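If you would rather not load lubridate, base R can do the same conversion, because literal characters such as the underscore can go straight into the as.Date() format string, so no gsub() step is needed. This is a sketch assuming the column is named Date as in the question; the three rows are a truncated copy of the sample data. Two-digit years like the "9_12_22" mentioned in the question need %y instead of %Y.

```r
# Truncated sample of the question's m_d_Y dates
Data_Dates <- data.frame(Date = c("01_14_2022", "01_21_2022", "02_03_2022"))

# Underscores are matched literally by the format string; no gsub() needed
Data_Dates$Date_base <- as.Date(Data_Dates$Date, format = "%m_%d_%Y")

# Two-digit years use %y; leading zeros are optional when parsing
short_year <- as.Date("9_12_22", format = "%m_%d_%y")

print(Data_Dates)
print(short_year)
```

The whole column is converted in one vectorised call, which is why there is no need to list the dates by hand.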

How to convert time to standard format and calculate time difference

newdf=data.frame(date=as.Date(c("2021-01-04","2021-01-05","2021-01-06","2021-01-07")),
time=c("10:32:29","11:25","12:18:42","09:58"))
This is my data frame. I want to calculate the time difference, in hours, between consecutive days. Could you suggest a method? Note that some time values do not contain seconds, so they first have to be converted to a standard form. Could you give me a method that solves both problems? I'm working entirely in R.
Paste date and time together into one column, use parse_date_time() to parse the result into a standard date-time (POSIXct), and use difftime() with lag() to calculate the difference between consecutive times in hours.
library(dplyr)
library(tidyr)
library(lubridate)
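newdf %>%
  unite(datetime, date, time, sep = ' ') %>%
  # try HH:MM:SS first, then fall back to HH:MM for values without seconds
  mutate(datetime = parse_date_time(datetime, c('Ymd HMS', 'Ymd HM')),
         # units must be named: difftime's third positional argument is tz
         difference_in_hours = round(as.numeric(difftime(datetime, lag(datetime),
                                                         units = 'hours')), 2))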
# datetime difference_in_hours
#1 2021-01-04 10:32:29 NA
#2 2021-01-05 11:25:00 24.88
#3 2021-01-06 12:18:42 24.90
#4 2021-01-07 09:58:00 21.66
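For comparison, here is a base R sketch (a swapped-in alternative, not the answer's lubridate approach). It assumes every time is either HH:MM or HH:MM:SS, pads the short values with ":00", parses the combined string with as.POSIXct(), and takes consecutive differences with diff().

```r
newdf <- data.frame(date = as.Date(c("2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07")),
                    time = c("10:32:29", "11:25", "12:18:42", "09:58"))

# Assumption: a 5-character value is HH:MM, so append ":00" for the seconds
time_full <- ifelse(nchar(newdf$time) == 5, paste0(newdf$time, ":00"), newdf$time)

# paste() turns the Date into "YYYY-MM-DD", giving one parseable string per row
newdf$datetime <- as.POSIXct(paste(newdf$date, time_full),
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# diff() returns a difftime; convert it to hours and prepend NA for the first row
newdf$difference_in_hours <- c(NA, round(as.numeric(diff(newdf$datetime), units = "hours"), 2))

print(newdf)
```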

Extract date and time from datetime field in R

I have a dataset that looks like this; the readingdate column is in POSIXct format. I want to extract the date into one field and the time into another field in R. I'm trying to avoid base R as much as possible, so a lubridate solution would be great. I want the newly extracted fields to be in the right format, because my ultimate goal is to plot time (x) against total items sold (y) to determine what time of day the highest sales are made. Thanks for your help.
If I understand correctly, R reads your dates and times correctly when you import the data (because they are in POSIXct format), but you cannot extract the date and the time in the right format from the date-time column.
Suppose you have a data frame like this:
date_time Sold
1 2020-01-01 03:16:01 2
2 2020-01-02 02:15:12 2
3 2020-01-03 08:26:11 3
4 2020-01-04 09:29:14 2
5 2020-01-05 12:06:06 1
6 2020-01-06 08:08:11 3
Lubridate does not offer a function to extract the time component as a single value, so you have to extract it piece by piece with the hour(), minute() and second() functions and then join the pieces with paste(). For the date, you can use as.Date() (lubridate's date() also works) and then format() to display it the way you want.
library(lubridate)
library(dplyr)
library(magrittr)
tab <- tab %>%
  mutate(
    date = as.Date(date_time),
    hour = hour(date_time),
    minute = minute(date_time),
    second = second(date_time)
  ) %>%
  mutate(
    format_date = format(date, "%m/%d/%Y"),
    format_hour = paste(hour, minute, second, sep = ":")
  )
Resulting this:
tab %>% select(format_date, format_hour) %>% head()
format_date format_hour
1 01/01/2020 12:4:23
2 01/02/2020 3:19:13
3 01/03/2020 8:6:24
4 01/04/2020 6:28:2
5 01/05/2020 2:16:20
6 01/06/2020 12:8:28
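As the output shows, paste() does not zero-pad the pieces (e.g. 8:6:24). A sketch of an alternative, assuming a POSIXct column named date_time as above: base format() with "%H:%M:%S" returns padded time strings in one call, and for plotting sales by time of day an integer hour from lubridate's hour() is usually easier to aggregate. The three rows are hypothetical sample data copied from the table above, and sales_by_hour is an illustrative name.

```r
library(lubridate)

# Hypothetical sample: first three rows of the table above
tab <- data.frame(
  date_time = as.POSIXct(c("2020-01-01 03:16:01", "2020-01-02 02:15:12",
                           "2020-01-03 08:26:11"), tz = "UTC"),
  Sold = c(2, 2, 3)
)

tab$format_date <- format(tab$date_time, "%m/%d/%Y")
# format() zero-pads every field, unlike paste(hour, minute, second)
tab$format_hour <- format(tab$date_time, "%H:%M:%S")

# Integer hour of day, convenient as an x-axis for "sales by time of day"
tab$hour <- hour(tab$date_time)
sales_by_hour <- aggregate(Sold ~ hour, data = tab, FUN = sum)

print(tab)
print(sales_by_hour)
```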

A way to make as.Date a bit more careful with formats

I just found out that R's as.Date is a very forgiving function.
Here is a simple dataset of start/end days:
df <- data.frame(start = c("02-03-2020","04-05-2020", "06-01-2002", "13-09-2020"),
end = c("12-07-2020","04-06-2020", "26-02-2020", "11-10-2020"))
I had code that parsed it as follows:
df %>% mutate_all(function(x) as.Date(x = x, format = "%d-%m-%y"))
# start end
#1 2020-03-02 2020-07-12
#2 2020-05-04 2020-06-04
#3 2020-01-06 2020-02-26
#4 2020-09-13 2020-10-11
No warnings at all, and the data look fine, but as you can see there is a typo in the format argument. It should use the four-digit year %Y, i.e. %d-%m-%Y. Nonetheless, as.Date doesn't flag a problem and simply takes the first two digits of the year.
My question is: do you know of any way to prevent such mistakes? Is there a way to force as.Date to check that the input fully matches the specified format, so that I notice a typo like this at the moment the function runs?
Thanks in advance!
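The silent success happens because strptime(), which as.Date uses for character input, processes each string only as far as the format requires and ignores trailing characters. One possible safeguard is a round-trip check: parse, format the result back with the same format string, and stop if it does not reproduce the input. strict_as_date below is a hypothetical helper written for this sketch, not a base R or package function. It assumes the inputs are zero-padded, because format() always pads, so a value like "2-3-2020" would be rejected too.

```r
# Hypothetical helper (not part of base R): parse with as.Date(), then
# require that formatting the result reproduces the input exactly.
strict_as_date <- function(x, format) {
  parsed <- as.Date(x, format = format)
  round_trip <- format(parsed, format = format)
  # Flag values that failed to parse or lost characters during parsing
  bad <- !is.na(x) & (is.na(parsed) | round_trip != x)
  if (any(bad)) {
    stop("Values do not match format '", format, "': ",
         paste(x[bad], collapse = ", "))
  }
  parsed
}

df <- data.frame(start = c("02-03-2020", "04-05-2020", "06-01-2002", "13-09-2020"),
                 end   = c("12-07-2020", "04-06-2020", "26-02-2020", "11-10-2020"))

# Correct format: every column converts; df[] keeps the data frame structure
df[] <- lapply(df, strict_as_date, format = "%d-%m-%Y")
print(df)

# The typo'd format now stops with an error instead of guessing
res <- tryCatch(strict_as_date("02-03-2020", "%d-%m-%y"), error = function(e) "error")
print(res)
```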

Subset dataframe in r for a specific month and date

I have a dataframe that looks like this:
V1 V2 V3 Month_nr Date
1 2 3 1 2017-01-01
3 5 6 1 2017-01-02
6 8 9 2 2017-02-01
6 8 9 8 2017-08-01
and I want to keep all variables for the rows that have Month_nr = 1 (January) and dates from 2017-01-01 to 2017-01-31 (the end of January), which means I want to filter on the dates as well. I could create a column with days, but I have multiple observations per day and that would be even more confusing. I tried this:
df<- filter(df,df$Month_nr == 1, df$Date > 2017-01-01 && df$Date < 2017-01-31)
but it did not work. I would really appreciate your help! I am desperate at this point. My dataset has measurements for an entire year (months 1 to 12), which is why I filter by month.
The problem is that you didn't put quotation marks around 2017-01-01. Without quotes, R evaluates 2017-01-01 as a subtraction and gets the number 2015, so you end up comparing your Date column to a number. You can compare strings to strings: ISO dates such as "2017-01-01" sort correctly as text, so comparing dates as strings would work. Also, && compares only single values (and raises an error on longer vectors in recent versions of R); inside filter() use & or separate the conditions with commas. BTW, you don't need to write df$ inside filter(); with the tidyverse you can write the column names directly, without quoting.
Why do you need to have the month as well as dates in the filter? Just the filter on the dates would work fine. However, you will have to convert the date column into a date object. You can do that as follows:
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df_new <- subset(df, Date >= as.Date("2017-01-01") & Date <= as.Date("2017-01-31"))
