How to count the number of days that pass between two dates in a dataset column in R - r

I am working with a Fitbit dataset. It contains 33 unique Ids, each repeated once for every day, within a 30-day period, on which that user provided data from their Fitbit. I am trying to count the number of days on which each user logged data, using the ActivityDay column, and group the result by Id, so that I can see how many of the 30 days each person used their Fitbit.
The activity date column was originally POSIXct and I converted it to Date. How can I count the dates as a number of days and group the count by each individual Id?
I tried using count inside dplyr::summarise to get the Id and the number of days while grouping the data by Id, but that failed.
I also thought of using case_when, but I didn't think it would work, because it wouldn't count all the way up to the end dates I specify; anything between the two dates would just get the outputs I specified. I also tried count_date_between(min(user_device_activity), max(user_device_activity), by 'day'), but R said the function doesn't exist, and when I tried installing it, I was told it doesn't exist within R.

library(dplyr)
user_device_activity %>%
distinct(Id, ActivityDate) %>% # in case duplicates possible in data
count(Id, month = lubridate::floor_date(ActivityDate, "month"))
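
If a single total per Id is wanted rather than a count per month, a minimal sketch could look like the following. The toy data frame is hypothetical and only stands in for user_device_activity; the output column names days_logged and span_days are likewise made up for illustration. n_distinct() counts the distinct dates on which an Id appears, while subtracting the earliest date from the latest gives the calendar span between them.

```r
library(dplyr)

# Hypothetical toy data standing in for user_device_activity
user_device_activity <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13",
                           "2016-04-12", "2016-05-01"))
)

days_used <- user_device_activity %>%
  group_by(Id) %>%
  summarise(
    # number of distinct days with at least one record
    days_logged = n_distinct(ActivityDate),
    # calendar days from first to last record, inclusive
    span_days = as.integer(max(ActivityDate) - min(ActivityDate)) + 1L
  )

print(days_used)
```

days_logged is probably the number the question is after; span_days is shown only to contrast "days with data" against "days elapsed between the first and last date".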

Related

Creating new datasets from unique dates in R

I have a dataset of 2015 with every day of the year. In this dataset, there are actions that happen on any given day. Some days have more actions than others, therefore some days have many more entries than others.
I am trying to create a function that will create an individual dataset per day of the year without having to code 365 of these:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(); however, I do not want a summary per day. It is important that I get to see every whole observation on any given day, for graphing purposes.
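
One approach that might fit here is base R's split(), which returns a named list with one data frame per distinct value and keeps every row intact. This is only a sketch: the toy dataset and its action column are hypothetical, and it assumes the date column is of class Date (or character).

```r
# Hypothetical toy data standing in for the 2015 dataset
dataset <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02")),
  action = c("a", "b", "c")
)

# One data frame per day, named by the date
per_day <- split(dataset, dataset$date)

# All observations for a single day
print(per_day[["2015-01-01"]])
```

Keeping the pieces in one list, instead of 365 separate variables, makes it possible to loop over them with lapply() when plotting.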

Average after 2 group_by's in R

I am new to R and can't find the right syntax for a specific average I need. I have a large Fitbit dataset of heart rate per second for 30 people, for a month each. I want an average heart rate per day per person, to make the data easier to manage and to join with other Fitbit data.
The columns I have are Id (person Id#), Time (Date-Time), and Value (Heartrate). I already separated Time into two columns, one for date and one for time only. My idea is to group the information by person, then by date and get one average number per person per day. But, my code is not doing that.
hr_avg <- hr_per_second %>% group_by(Id) %>% group_by(Date) %>% summarize(mean(Value))
As a result I get an average by date only. I can't do this manually because the dataset is so big, Excel can't open it. And I can't upload it to BigQuery either, the database I learned to use during my data analysis course. Thanks.
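
A likely cause is that a second group_by() call replaces the existing grouping by default, so only Date is left as a grouping variable. A minimal sketch of grouping by both columns in one call is shown below; the toy data is hypothetical, and the column name avg_hr is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for hr_per_second
hr_per_second <- data.frame(
  Id = c(1, 1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-12")),
  Value = c(60, 80, 70, 90)
)

hr_avg <- hr_per_second %>%
  group_by(Id, Date) %>%                               # both keys in one call
  summarise(avg_hr = mean(Value), .groups = "drop")    # one row per Id and Date

print(hr_avg)
```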

Difficulty in generating time series in R for my data set

I am trying to generate a time series for my dataset in R but am having difficulty doing so. My dataset has two columns: one for the date and the other for the price of a material. There are many dates which don't have a price and hence are not in the dataset. The data covers roughly a year. I am having difficulty setting the frequency and start time for the time series. Is there any way to set the start according to the dataset and have the time series automatically incorporate the missing data points?
The following is for a data frame df with two columns called "date" and "price".
This will create the missing dates and fill the missing prices for those dates with the preceding price. You can change fill('price') to fill with other specified values.
library(tidyverse)
df<-df %>%
complete(date = seq.Date(min(date), max(date), by="day")) %>%
fill('price')

Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle Gregorian time frames. I haven't used it personally, but it looks straightforward. There is a shift argument in some methods that allows you to set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, month, day, hour, minute, and second as input and returns a POSIXct object which represents the time. You have to split the third column into separate hour and minute columns, and convert the day of year into a month and day of month, before you can use the function.
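
Because the day is logged as a day of year, another option is the %j format code understood by strptime() and as.POSIXct(), which parses a day of year directly. The following is a sketch under stated assumptions: the vectors year, doy and hhmm are hypothetical stand-ins for the three columns, the time column is assumed to be stored as a number such as 1610, and UTC is assumed as the time zone.

```r
# Hypothetical stand-ins for the three logged columns
year <- c(2012, 2012)
doy  <- c(261, 1)      # day of year
hhmm <- c(1610, 5)     # hours and minutes, e.g. 1610 = 16:10, 5 = 00:05

stamp <- as.POSIXct(
  paste(year,
        sprintf("%03d", as.integer(doy)),    # zero-pad to three digits
        sprintf("%04d", as.integer(hhmm))),  # zero-pad to four digits
  format = "%Y %j %H%M",
  tz = "UTC"                                 # assumed time zone
)

print(format(stamp, "%Y-%m-%d %H:%M"))
# "2012-09-17 16:10" "2012-01-01 00:05"
```

Zero-padding the time matters because a value like 5 would otherwise not match the four-digit %H%M pattern.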

R - fill in values for all dates

I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But I'm not sure what to do next. If this were Python I'd loop through one of the dates arrays, setting or getting the related quantity. But this being R, I suspect there's a better way.
Make a data frame with all_dates as a column, then merge it with quantitybydate, using the grouping column of quantitybydate as by.y and setting all.x=TRUE. Then replace the NAs with 0.
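
The merge described above could be sketched as follows. The toy dates and quantity vectors are hypothetical, and the name full is made up for illustration; Group.1 and x are the default column names that aggregate() produces when given a plain vector and an unnamed by list.

```r
# Hypothetical toy data: two sales on Jan 1, one on Jan 4
dates    <- as.Date(c("2015-01-01", "2015-01-01", "2015-01-04"))
quantity <- c(2, 3, 5)

all_dates      <- seq(min(dates), max(dates), by = 1)
quantitybydate <- aggregate(quantity, by = list(dates), sum)

# Left join: keep every date, matching on the aggregate's Group.1 column
full <- merge(data.frame(date = all_dates), quantitybydate,
              by.x = "date", by.y = "Group.1", all.x = TRUE)

# Dates without sales come back as NA; set them to 0
full$x[is.na(full$x)] <- 0

print(full)
```

The same pattern can be repeated for totalprice, or both columns can be aggregated together first and merged once.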
