How to group by timestamp in UTC by day in R

So I have a sample of UTC timestamps along with a bunch of other data. I would like to group the data by date, which means I do not need hours/mins/secs, and end up with a new df that shows the number of actions on each date.
I tried using lubridate to pull out the date, but I can't get the origin right.
DATA
hw0 <- read.table(text =
'ID timestamp action
4f.. 20160305195246 visitPage
75.. 20160305195302 visitPage
77.. 20160305195312 checkin
42.. 20160305195322 checkin
8f.. 20160305195332 searchResultPage
29.. 20160305195342 checkin', header = T)
Here's what I tried
library(dplyr)
library(lubridate) #this will allow us to extract the date
daily <- hw0 %>%
  mutate(date = date(as.POSIXct(timestamp), origin = '1970-01-01'))
daily <- daily %>%
  group_by(date)
I am unsure what to use as an origin, and my error says the value is incorrect. Ultimately, I expect the code to return a new df with a variable (date) listing the unique dates and how many of each action there are on each day.

Assuming the numbers at the end are a 24-hour time, you can use:
daily = hw0 %>%
  mutate(date = as.POSIXct(as.character(timestamp), format = '%Y%m%d%H%M%S'))
You can use as.Date instead if you want to drop the time of day. You only need to supply an origin when you give a numeric argument, which as.Date interprets as the number of days since that origin (and as.POSIXct as the number of seconds). In your case you should just give it a character vector and supply the date format.
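To go from there to the counts the question ultimately asks for, here is a minimal sketch (assuming the hw0 data frame from the question; substr() just pulls out the YYYYMMDD part, and count() is shorthand for group_by() plus tally()):
library(dplyr)

hw0 %>%
  mutate(date = as.Date(substr(timestamp, 1, 8), format = "%Y%m%d")) %>%  # keep only the date part
  count(date, action)                                                     # number of each action per day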

lubridate also has the ymd_hms() function, which can parse the timestamp, and the floor_date() function, which truncates it to the start of the day.
library(tidyverse)
library(lubridate)  # ymd_hms() and floor_date() come from lubridate
daily <- hw0 %>%
  mutate(time = ymd_hms(timestamp, tz = 'UTC'),
         date = floor_date(time, unit = 'day'))
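If you then want the number of each action per day (what the question ultimately asks for), a minimal sketch continuing from the daily object above:
daily %>%
  group_by(date, action) %>%
  summarise(n_actions = n(), .groups = "drop")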

lubridate also has parse_date_time which seems to be a nice mix of the above two solutions.
library(tidyverse)
library(lubridate)
hw0 %>%
  mutate(timestamp = parse_date_time(timestamp, orders = "%Y%m%d%H%M%S"))
ID timestamp action
1 4f.. 2016-03-05 19:52:46 visitPage
2 75.. 2016-03-05 19:53:02 visitPage
3 77.. 2016-03-05 19:53:12 checkin
4 42.. 2016-03-05 19:53:22 checkin
5 8f.. 2016-03-05 19:53:32 searchResultPage
6 29.. 2016-03-05 19:53:42 checkin

Related

R: Using the lubridate as.Dates function to convert YYYYMMDD to dates

Currently I am attempting to convert dates in the YYYYMMDD format to separate columns for year, month, and day. I know that using the as.Date function I can convert YYYYMMDD to YYYY-MM-DD, and work from there, however R is misinterpreting the dates and I'm not sure what to do. The function is converting the values into dates, but not correctly.
For example: R is converting '19030106' to '2019-03-01', when it should be '1903-01-06'. I'm not sure how to fix this, but this is the code I am using.
library(lubridate)
PrecipAll$Date <- as.Date(as.character(PrecipAll$YYYYMMDD), format = "%y%m%d")
YYYYMMDD is currently numeric, and I needed to include as.character in order for it to output a date at all, but if there are better solutions please help.
Additionally, if you have any tips on separating the corrected dates into separate Year, Month, and Date columns that would be greatly appreciated.
With {lubridate}, try ymd() to parse the YYYYMMDD variable, regardless of whether it is in numeric or character form. Also use {lubridate}'s year(), month(), and day() functions to get those components as numeric columns.
library(lubridate)
PrecipAll <- data.frame(YYYYMMDD = c(19030106, 19100207, 20001130))
mutate(.data = PrecipAll,
       date = lubridate::ymd(YYYYMMDD),
       year = year(date),
       month_n = month(date),
       day_n = day(date))
YYYYMMDD date year month_n day_n
1 19030106 1903-01-06 1903 1 6
2 19100207 1910-02-07 1910 2 7
3 20001130 2000-11-30 2000 11 30
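For completeness: the original as.Date() attempt fails because %y matches only a two-digit year, so "19030106" is read as year 19 (i.e. 2019), month 03, day 01, with the trailing 06 ignored; a four-digit year needs a capital %Y. A base-R sketch of that fix (same idea, without lubridate):
PrecipAll$Date <- as.Date(as.character(PrecipAll$YYYYMMDD), format = "%Y%m%d")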

Sort date column and convert csv file to a time-series

I need your help. I am new to R. I have this csv file shorturl.at/chDK9 with the "All Share Index" from the Nigerian stock exchange, formatted as a matrix with the years as rows and the months as columns.
I am trying to do 4 things:
Reshape the data, to four columns for Date, Month, Year, ASI
The period should be a date column in the format 01-2013 for January 2013 and so on.
Arrange the data by the date, oldest to newest
Convert the data to a time-series type for analysis (xts preferably)
So far I have solved 1 & 2 above.
Please see my code below
rASI <- ASI_conv_to_USD_2003_2018
gathered.rASI <- gather(rASI, Month, ASI, -Year)
gathered.rASI$Date <- format(as.Date(paste0(gathered.rASI$Month, gathered.rASI$Year, "01"), format="%b%Y%d"), "%m-%Y")
ASI <- select(gathered.rASI, Date, Month, Year, ASI)
Created on 2020-06-04 by the reprex package (v0.3.0)
I do not know what I am doing wrong, but the date column still shows as a chr. How do I make the date column function as a proper date?
Any help would be greatly appreciated.
Data:
Year,January,February,March,April,May,June,July,August,September,October,November,December
2003,104.904946,108.036674,106.6532671,106.1211644,110.6369777,114.3109402,109.7382693,120.7042254,129.0513061,141.9747008,140.2999274,147.4647619
2004,168.4931751,184.3675093,171.8948949,194.2243976,209.6846881,218.4302457,204.5201028,179.6591854,171.788925,176.3957704,175.7856172,180.1624481
2005,174.3600786,165.874575,156.2704949,165.9616111,162.3373385,162.9130468,165.5409489,177.6973735,190.975969,200.5254592,189.5253288,187.4381323
2006,184.2754864,187.0039216,184.1151874,183.9374803,195.3248086,207.753217,220.2425152,261.5902624,257.3486166,257.9713924,257.9644269,262.3660079
2007,290.763576,321.9563671,344.0977116,373.70341,397.1224052,408.8450816,422.9554882,404.1068702,405.3995157,413.592025,462.4500768,498.6259673
2008,465.9093801,564.6059512,542.1712123,511.539673,507.3090565,481.7790407,457.4977173,411.7628813,398.2089436,312.9651073,284.4105236,240.5413384
2009,151.4739254,160.8334365,136.7210055,147.8068088,203.1480164,183.6687179,169.4245226,152.975866,150.2860646,146.6946313,142.143901,141.1054878
2010,152.3225241,155.1887111,175.6850474,178.6050908,176.5795117,171.5144595,174.5049291,163.103972,154.3394041,169.2037838,167.046543,166.6141118
2011,179.0501835,173.3762495,163.0327771,164.2939247,168.9634855,165.1146804,158.8889704,141.5247531,132.2063595,139.799399,128.3830306,132.7185019
2012,133.3492814,129.4949163,132.8047714,142.0467784,142.1346216,138.9576042,148.4574482,152.9350934,167.5144256,170.2584385,170.6456267,180.8386037
2013,205.1867431,213.04438,216.0144928,215.3981965,243.4601263,232.9424155,244.1989566,233.469857,235.6526892,242.2584675,250.74636,266.2963273
2014,261.3308857,254.8076651,249.6006828,247.9683695,267.1803131,273.6744186,271.1943568,267.5533724,265.4434783,241.8539225,209.9881459,206.9083582
2015,176.4899701,152.4243544,161.5512468,176.6316031,174.6074809,170.307101,153.589313,151.067888,158.9094935,148.4871247,140.5468193,145.7620865
2016,121.710687,125.041883,128.7848346,127.5440712,140.7794402,104.7709381,89.631776,90.34052373,92.97916325,89.3927422,83.19668309,88.25819376
2017,85.43474979,83.04616393,83.42762792,84.35732766,96.74749098,108.4396857,120.8084876,116.2751597,116.1014906,120.1450704,124.20491,125.1822913
2018,145.2937418,141.8812705,136.0134688,135.2162844,124.7488623,125.4006552,121.2306533,114.014232,107.1321563,106.06426,100.7971596,102.5464927
Here might be a way out, gathering your data (i.e., changing them from wide to long), creating a date variable and only then translating the result to xts.
## This assumes that you already have written the data frame (as in your example)
myxts <- ASI_conv_to_USD_2003_2018 %>%
  ## gather changes the data from wide to long
  tidyr::gather("month", "value", -Year) %>%
  ## dmy creates the date variable
  mutate(dat = paste0("01 ", month, " ", Year) %>% lubridate::dmy()) %>%
  ## keep only the date and the value
  select(dat, value) %>%
  ## sort by date (not compulsory)
  arrange(dat) %>%
  ## convert to xts (note that as_xts() is deprecated, hence timetk::tk_xts())
  timetk::tk_xts(select = value, date_var = dat)
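If you also want the 01-2013 style period from point 2 of the question, one hedged option (object and column names here are just illustrative) is to keep dat as a real Date for sorting and for xts, and create the month-year label only for display:
library(dplyr)

gathered <- ASI_conv_to_USD_2003_2018 %>%
  tidyr::gather("month", "value", -Year) %>%
  mutate(dat = lubridate::dmy(paste0("01 ", month, " ", Year)),
         period = format(dat, "%m-%Y")) %>%  # character label like "01-2013", for display only
  arrange(dat)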

Can I match a character string containing m-d with a date vector in R?

All, I've seen that date conversion questions get downvoted a lot, but I couldn't find any information online or in the help files...
I have a df with a date formatted as ymd_hm() and then some data in other columns. Then I have another df with 366 rows, one for each day, and a column containing some values relevant for that day (some climatological stuff that is essentially the same every year, so the year doesn't matter). The dfs might look something like this:
df1 <- tibble(Date=seq(ymd_hm('2010-05-01 00:00'),ymd_hm('2010-05-03 00:00'), by = 'hour'), Data=c(1:length(Date)))
df2 <- tibble(MonthDay=c("04-30", "05-01", "05-02","05-03","05-04"), OtherData=c(20,30,40,50, 60))
Now, is it possible to do some lookup sort of thing and match Date and MonthDay and then write whatever OtherData is into df1? I'm struggling since I can't convert MonthDay to a date.
So, all the 2010-05-01 dates should have 30 next to them, all 2010-05-02 dates should have 40 in the next column, and so on and so forth...
Thanks y'all!
We extract the 'MonthDay' with format and use that as the common joining column in left_join:
library(dplyr)
df1 %>%
  mutate(MonthDay = format(Date, "%m-%d")) %>%
  left_join(df2) %>%
  select(-MonthDay)
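A small hedged variant: left_join() will print a message saying which column it joined by; you can make the key explicit (same result) with by = "MonthDay":
library(dplyr)

df1 %>%
  mutate(MonthDay = format(Date, "%m-%d")) %>%
  left_join(df2, by = "MonthDay") %>%
  select(-MonthDay)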

How do I manipulate a datetime variable imported from Excel into R

I am importing multiple Excel sheets to R using readxl. Each of these sheets contains observations of transactions which include DateOfEvent and TimeOfEvent fields.
When I import the time field, R converts it to a POSIXct object based on the date being from Excel Day 0 - i.e. 1899-12-31 0:0:0
e.g. dat <- data.frame(date=Sys.Date()+0:1, time=as.POSIXct(c(10,11), origin="1899-12-31"))
With the data in a data frame, using a dplyr step to clean my data, how would I -
Use lubridate to recode the date part of the variable using the DateOfEvent value?
Keep the times but make them independent of date so that I can compare events occurring in time buckets across different days (i.e. drop the 1899 date but format the date so that I can perform cross day comparisons)?
Use update() to replace the year, month, and day of time with those from date.
Use hms::as.hms() if you want to extract just the time object from time (this will convert to UTC):
library(tidyverse)
library(lubridate)  # for update(), year(), month(), day()
dat %>%
  mutate(time = update(time,
                       year = year(date),
                       month = month(date),
                       day = day(date)),
         hms = hms::as.hms(time))
date time hms
1 2018-06-02 2018-06-02 16:00:10 23:00:10
2 2018-06-03 2018-06-03 16:00:11 23:00:11
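One caveat, hedged: in newer releases of the hms package, as.hms() has been deprecated in favour of as_hms(), so the same pipeline would be written roughly as below (time-zone handling may differ slightly, so check the output against your data):
library(tidyverse)
library(lubridate)

dat %>%
  mutate(time = update(time, year = year(date), month = month(date), day = day(date)),
         hms = hms::as_hms(time))  # as_hms() is the current spelling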

Calculate average daily value from large data set with R standard format date/times?

I have a dataframe of approximately 10 million rows spanning about 570 days. After using strptime to convert the dates and times, the data looks like this:
date X1
1 2004-01-01 07:43:00 1.2587
2 2004-01-01 07:47:52 1.2585
3 2004-01-01 17:46:14 1.2586
4 2004-01-01 17:56:08 1.2585
5 2004-01-01 17:56:15 1.2585
I would like to compute the average value on each day (as in days of the year, not days of the week) and then plot them. E.g. get all rows which have the day "2004-01-01", compute the average price, then do the same for "2004-01-02", and so on.
Similarly I would be interested in finding the average monthly value, or hourly price, but I imagine I can work these out once I know how to get average daily price.
My biggest difficulty here is extracting the day of the year from the date variable automatically. How can I cycle through all 365 days and compute the average value for each day, storing it in a list?
I was able to find the average value for day of the week using the weekdays() function, but I couldn't find anything similar for this.
Here's a solution using dplyr and lubridate. First, round each timestamp down to the start of its day using floor_date() (hat tip to thelatemail's comment for the day unit), then group_by the date and calculate the mean value using summarize:
library(dplyr)
library(lubridate)
df %>%
  mutate(date = floor_date(date, unit = "day")) %>%
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))
Using the lubridate package, you can use a similar method to get the average by month, week, or hour. For example, to calculate the average by month:
df %>%
  mutate(date = month(date)) %>%
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))
And by hour:
df %>%
  mutate(date = hour(date)) %>%
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))
Day of year in lubridate is yday(), as in:
lubridate::yday(Sys.time())
Because the data set is large, I recommend a data.table approach:
library(lubridate)
library(data.table)
df$ydate=yday(df$date)
df=data.table(df)
df[,mean(X1),ydate]
If you want different days in different years to stay separate (e.g. 1 Jan 2004 vs 1 Jan 2005):
library(lubridate)
library(data.table)
df$ydate = as_date(df$date)
df = data.table(df)
df[, mean(X1), ydate]
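As a hedged readability tweak (assuming df$date is already POSIXct, as in the printed data), data.table also lets you name the output column and compute the grouping key inline:
library(data.table)
library(lubridate)

dt <- as.data.table(df)
dt[, .(mean_X1 = mean(X1)), by = .(day = as_date(date))]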
Note: instead of using strptime to convert the dates, you could just use the ymd_hms() function from lubridate.
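For example, a minimal sketch of that parsing step (the raw_timestamp column name is hypothetical, standing in for whatever string column holds values like "2004-01-01 07:43:00"):
library(lubridate)

df$date <- ymd_hms(df$raw_timestamp, tz = "UTC")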
Just to contribute, here is a solution for doing this across multiple columns in your data frame. It uses the same method as George's answer, with summarise(across()) added:
new_df <- df %>%
  mutate(date = hour(date)) %>%
  group_by(date) %>%
  summarise(across(.cols = where(is.numeric), .fns = ~ mean(.x, na.rm = TRUE)))
Here, .cols specifies that the operation is applied to every numeric column (you can restrict it to specific columns), and .fns takes the function you want to apply (mean, sd, etc.), including options such as na.rm.
Greetings!
