R aggregate a dataframe by hours from a date with time field - r

I'm relatively new to R but I am very familiar with Excel and T-SQL.
I have a simple dataset that has a date with time and a numeric value associated it. What I'd like to do is summarize the numeric values by-hour of the day. I've found a couple resources for working with time-types in R but I was hoping to find a solution similar to is offered excel (where I can call a function and pass-in my date/time data and have it return the hour of the day).
Any suggestions would be appreciated - thanks!

library(readr)
library(dplyr)
library(lubridate)
df <- read_delim('DateTime|Value
3/14/2015 12:00:00|23
3/14/2015 13:00:00|24
3/15/2015 12:00:00|22
3/15/2015 13:00:00|40',"|")
df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S")))) %>%
group_by(hour_of_day) %>%
summarise(meanValue = mean(Value))
breakdown:
Convert column of DateTime (character) into formatted time then use hour() from lubridate to pull out just that hour value and put it into new column named hour_of_day.
> df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S"))))
Source: local data frame [4 x 3]
DateTime Value hour_of_day
1 3/14/2015 12:00:00 23 12
2 3/14/2015 13:00:00 24 13
3 3/15/2015 12:00:00 22 12
4 3/15/2015 13:00:00 40 13
The group_by(hour_of_day) sets the groups upon which mean(Value) is computed in the via the summarise(...) call.
this gives the result:
hour_of_day meanValue
1 12 22.5
2 13 32.0

Related

How to use group_by without ordering alphabetically?

I'm trying to visualize some bird data, however after grouping by month, the resulting output is out of order from the original data. It is in order for December, January, February, and March in the original, but after manipulating it results in December, February, January, March.
Any ideas how I can fix this or sort the rows?
This is the code:
BirdDataTimeClean <- BirdDataTimes %>%
group_by(Date) %>%
summarise(Gulls=sum(Gulls), Terns=sum(Terns), Sandpipers=sum(Sandpipers),
Plovers=sum(Plovers), Pelicans=sum(Pelicans), Oystercatchers=sum(Oystercatchers),
Egrets=sum(Egrets), PeregrineFalcon=sum(Peregrine_Falcon), BlackPhoebe=sum(Black_Phoebe),
Raven=sum(Common_Raven))
BirdDataTimeClean2 <- BirdDataTimeClean %>%
pivot_longer(!Date, names_to = "Species", values_to = "Count")
You haven't shared any workable data but i face this many times when reading from csv and hence all dates and data are in character.
as suggested, please convert the date data to "date" format using lubridate package or base as.Date() and then arrange() in dplyr will work or even group_by
example :toy data created
birds <- data.table(dates = c("2020-Feb-20","2020-Jan-20","2020-Dec-20","2020-Apr-20"),
species = c('Gulls','Turns','Gulls','Sandpiper'),
Counts = c(20,30,40,50)
str(birds) will show date is character (and I have not kept order)
using lubridate convert dates
birds$dates%>%lubridate::ymd() will change to date data-type
birds$dates%>%ymd()%>%str()
Date[1:4], format: "2020-02-20" "2020-01-20" "2020-12-20" "2020-04-20"
save it with birds$dates <- ymd(birds$dates) or do it in your pipeline as follows
now simply so the dplyr analysis:
birds%>%group_by(Months= ymd(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
will give
# A tibble: 4 x 3
Months N Species_Count
<date> <int> <dbl>
1 2020-01-20 1 30
2 2020-02-20 1 20
3 2020-04-20 1 50
However, if you want Apr , Jan instead of numbers and apply as.Date() with format etc, the dates become "character" again. I woudl suggest you keep your data that way and while representing in output for others -> format it there with as.Date or if using DT or other datatables -> check the output formatting options. That way your original data remains and users see what they want.
this will make it character
birds%>%group_by(Months= as.character.Date(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
A tibble: 4 x 3
Months N Species_Count
<chr> <int> <dbl>
1 2020-Apr-20 1 50
2 2020-Dec-20 1 40
3 2020-Feb-20 1 20
4 2020-Jan-20 1 30

How can I split this date?

I am working with this dataset and I am trying to separate the 'Date' column into the day, month, and year but have run into a problem doing it because it has the month as a character value. Any help would be great.
Here's an image: Dataset
You can convert your Date column using as.Date(), specifying the format for the date; in this case, one option is "%d%B%y"
library(lubridate)
dataset = data.frame(Date=c("19MAY19","31MAY19"))
dataset %>% mutate(Date = as.Date(Date,"%d%B%y"),
y = year(Date),m=month(Date),d = day(Date))
Output:
Date y m d
1 2019-05-19 2019 5 19
2 2019-05-31 2019 5 31

Finding difference in time between two data frames using R

I have two data frame ,one is the in time of employees and the other is the out time of employees.The data in both the data frames have timestamps for about 4000 employees in the last one year(excludes weekend/public holiday dates).Each data frame has 4000 rows and 250 columns.I would like to find the number of hours spent by an employee each day at work basically my approach would be to find the difference in time between the two data frames using difftime() function.i used the below code and expected a resulting data frame containing 4000 rows and 250 columns with difference in time,however the data was returned in one single column.How should I deal with this problem so that I can get the difference in time between two data frames in the data frame format with 4000 rows and 250 columns?
hours_spent <- as.data.frame(as.matrix(difftime(as.matrix(out_time_data_hrs),as.matrix(in_time_data_hrs),unit='hour')))
Input data looks like below ,
In_time data frame
Out_time data frame
Expected output
Here's a small and simple example based on the data you posted and a possible solution:
# example data in_times
df1 = data.frame(`2018-08-01` = c("2018-08-01 10:30:00", "2018-08-01 10:25:00"),
`2018-08-02` = c("2018-08-02 10:20:00", "2018-08-02 10:45:00"))
# example data out_times
df2 = data.frame(`2018-08-01` = c("2018-08-01 17:33:00", "2018-08-01 18:06:00"),
`2018-08-02` = c("2018-08-02 17:11:00", "2018-08-02 17:45:00"))
library(tidyverse)
# reshape datasets
df1_resh = df1 %>%
mutate(empl_id = row_number()) %>% # add an employee id (using the row number)
gather(day, in_time, -empl_id) # reshape dataset
df2_resh = df2 %>%
mutate(empl_id = row_number()) %>%
gather(day, out_time, -empl_id)
# join datasets and calculate hours spent
left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
mutate(hours_spent = difftime(out_time, in_time))
# empl_id day in_time out_time hours_spent
# 1 1 X2018.08.01 2018-08-01 10:30:00 2018-08-01 17:33:00 7.050000 hours
# 2 2 X2018.08.01 2018-08-01 10:25:00 2018-08-01 18:06:00 7.683333 hours
# 3 1 X2018.08.02 2018-08-02 10:20:00 2018-08-02 17:11:00 6.850000 hours
# 4 2 X2018.08.02 2018-08-02 10:45:00 2018-08-02 17:45:00 7.000000 hours
You can use this as the final piece of code if you want to reshape back to your initial format:
left_join(df1_resh, df2_resh, by=c("empl_id","day")) %>%
mutate(hours_spent = difftime(out_time, in_time)) %>%
select(empl_id, day, hours_spent) %>%
spread(day, hours_spent)
# empl_id X2018.08.01 X2018.08.02
# 1 1 7.050000 hours 6.85 hours
# 2 2 7.683333 hours 7.00 hours
my requirement is satisfied by just doing the below, pretty straight forward
employee_hrs_df <- out_time_data - in_time_data

Sum variable by month/year in r

I receive historical demand data from an excel spreadsheet in the following format:
Part Number Requested Date Quantity
123 01/24/2013 12:53 1
122 02/07/2013 09:57 1
122 02/14/2013 09:58 7
124 11/21/2012 12:46 1
I typically provide my management charts from excel by part like this but I want to be more proficient in R---doing hundreds of these at time
We can try with as.yearmon to create the grouping variable and then get the sum of 'Quantity' in summarise
library(zoo)
library(dplyr)
df1 %>%
group_by(PartNumber, yearMon = as.yearmon(RequestedDate, "%m/%d/%Y %H:%M")) %>%
summarise(Quantity = sum(Quantity))

Split Date to Day, Month and Year for ffdf Data in R

I'm using R's ff package with ffdf objects named MyData, (dim=c(10819740,16)). I'm trying to split the variable Date into Day, Month and Year and add these 3 variables into ffdf existing data MyData.
For instance: My Date column named SalesReportDate with VirtualVmode and PhysicalVmode = double after I've changed SalesReportDate to as.date(,format="%m/%d/%Y").
Example of SalesReportDate are as follow:
> B
SalesReportDate
1 2013-02-01
2 2013-05-02
3 2013-05-04
4 2013-10-06
5 2013-15-10
6 2013-11-01
7 2013-11-03
8 2013-30-02
9 2013-12-12
10 2014-01-01
I've refer to Split date into different columns for year, month and day and try to apply it but keep getting error warning.
So, is there any way for me to do this? Thanks in advance.
Credit to #jwijffels for this great solution:
require(ffbase)
MyData$SalesReportDateYear <- with(MyData["SalesReportDate"], format(SalesReportDate, "%Y"), by = 250000)
MyData$SalesReportDateMonth <- with(MyData["SalesReportDate"], format(SalesReportDate, "%m"), by = 250000)
MyData$SalesReportDateDay <- with(MyData["SalesReportDate"], format(SalesReportDate, "%d"), by = 250000)

Resources