Average after 2 group_by's in R - r

I am new to R and can't find the right syntax for a specific average I need. I have a large Fitbit dataset of heart rate per second for 30 people, for a month each. I want the average heart rate per day per person, to make the data easier to manage and join with other Fitbit data.
First few lines of Data
The columns I have are Id (person Id#), Time (Date-Time), and Value (Heartrate). I already separated Time into two columns, one for date and one for time only. My idea is to group the information by person, then by date and get one average number per person per day. But, my code is not doing that.
hr_avg <- hr_per_second %>% group_by(Id) %>% group_by(Date) %>% summarize(mean(Value))
As a result, I get an average by date only. I can't do this manually because the dataset is so big that Excel can't open it, and I can't upload it to BigQuery (the database I learned to use during my data analysis course) either. Thanks.
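In dplyr, a second group_by() call replaces the existing grouping rather than adding to it (unless .add = TRUE is passed), which would explain why only Date survives here. A minimal sketch of grouping by both columns in one call follows; the toy values and the avg_hr column name are made up for illustration, while Id, Date and Value follow the column names described in the question.

```r
library(dplyr)

# Toy stand-in for hr_per_second (values are invented for illustration).
hr_per_second <- tibble(
  Id    = c(1, 1, 1, 2, 2, 2),
  Date  = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13",
                    "2016-04-12", "2016-04-13", "2016-04-13")),
  Value = c(60, 70, 80, 90, 100, 110)
)

# Group by person AND date in a single group_by(), then average.
hr_avg <- hr_per_second %>%
  group_by(Id, Date) %>%
  summarize(avg_hr = mean(Value), .groups = "drop")

print(hr_avg)  # one row per Id/Date combination
```

Writing group_by(Id) %>% group_by(Date, .add = TRUE) should give the same grouping, but the single call is easier to read.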

Related

Creating new datasets from unique dates in R

I have a dataset of 2015 with every day of the year. In this dataset, there are actions that happen on any given day. Some days have more actions than others, therefore some days have many more entries than others.
I am trying to create a function that will create an individual dataset per day of the year without having to code 365 of these:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(); however, I do not want a summary per day. It is important that I get to see every observation on any given day for graphing purposes.
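One possible approach, sketched under assumptions, is base R's split(), which returns a named list holding one data frame per unique date without summarising anything. The toy data and the action column are hypothetical; dataset and date are the names used in the question.

```r
# Toy stand-in for the 2015 dataset; `action` is a hypothetical column.
dataset <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02")),
  action = c("a", "b", "c")
)

# One data frame per unique date, named by the date.
by_day <- split(dataset, dataset$date)

# All rows observed on a given day, equivalent to the subset() call above.
by_day[["2015-01-01"]]
```

Each element can then be passed to a plotting function, for example with lapply(by_day, ...).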

How to count the number of days that pass between two dates in a dataset column in R

I am working with a dataset in which 33 unique Ids are repeated for each day they provided data from their Fitbit, within a 30-day period. I am trying to count the number of days they input data, using the ActivityDay column, and group the result by Id, so that I can see how many days in total each person used their Fitbit out of the 30 days.
The activity date column was originally of type POSIXct and I converted it to Date. How can I count the dates as a number of days and group the count by each individual Id?
I tried using count within a dplyr::summarise to get the Id and the number of days counted while grouping the data by Id. That failed.
I also thought of using a case_when; however, I thought that wouldn't work because it wouldn't count all the way up to the end dates I specify, so anything between the two dates would get the outputs I specified. I also tried count_date_between(min(user_device_activity), max(user_device_activity), by 'day'), but I was told that the function doesn't exist, and when I tried installing it, I was told it didn't exist within R.
library(dplyr)
user_device_activity %>%
distinct(Id, ActivityDate) %>% # in case duplicates possible in data
count(Id, month = lubridate::floor_date(ActivityDate, "month"))
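If a single total per Id is wanted rather than a count per month, a sketch along the following lines may help. It assumes the column is called ActivityDate, as in the code above; the toy values and the days_used name are made up.

```r
library(dplyr)

# Toy stand-in for user_device_activity (values invented for illustration).
user_device_activity <- tibble(
  Id = c(1, 1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13",
                           "2016-04-12", "2016-05-01"))
)

# Number of distinct dates with data, per person.
days_used <- user_device_activity %>%
  group_by(Id) %>%
  summarise(days_used = n_distinct(ActivityDate))

print(days_used)
```

n_distinct() ignores repeated dates, so duplicate rows for the same day are only counted once.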

Apply cumsum function using condition

I am trying to calculate the maximum number of aircraft on the ground simultaneously throughout the year, per station. I have more than 300 stations, and the data are recorded per day and hour for one year.
So I thought of this solution: find the maximum per day and per station, then extract the maximum per station.
My data are in this format: station, aircraft, time, type (arrival at the station or departure from the station) and value, which is 1 for an arrival and -1 for a departure. I created this column to make counting easier; the idea is to apply cumsum once the data are sorted by time for each station.
I need to create a function which groups the data by day and by station and computes the cumulative sum, but some planes stayed overnight at the station, so I need to delete them (the yellow lines in the screenshot).
The trick to detect these planes:
The Aircraft variable lets us track a plane: it generally appears twice a day, once when it arrives and once when it leaves.
To detect these planes, I have to look at the Aircraft and Type variables: if the type is departure and the aircraft on that line appears only once that day (meaning there is no arrival for this flight), then I should not count it.
I was thinking of creating a function that groups by station and time and then applies cumsum, skipping the lines that meet the condition above (the type is departure and the aircraft appears only once that day).
Any Help??
I solved this by creating a new table, tab, with the table() function to count how often each aircraft appears per day. I then joined this table to my original data and deleted the rows with Freq equal to 1 and TYPE equal to "Departure".
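library(dplyr)
# frequency of each aircraft per day
tab <- as.data.frame(table(df$day, df$AIRCRAFT, useNA = "always"))
df_new <- df %>% left_join(tab, by = c("day" = "Var1", "AIRCRAFT" = "Var2"))
# drop departures of aircraft that appear only once that day, then accumulate in time order
df_final <- df_new %>% filter(!(TYPE == "Departure" & Freq == 1)) %>% arrange(TIMESTAMP) %>% group_by(day) %>%
mutate(ground = cumsum(Freq))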
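The approach described in the question (drop departures with no matching arrival that day, then take a running sum of the +1/-1 column) can be sketched as follows. This is only an illustration on invented toy data: the lowercase column names, the n_seen helper column and the numeric time column are assumptions, not the asker's actual schema.

```r
library(dplyr)

# Toy data: P1 departs with no arrival that day (it stayed overnight).
flights <- tibble(
  station  = "A",
  aircraft = c("P1", "P2", "P3", "P2", "P3"),
  day      = as.Date("2021-01-01"),
  time     = c(6, 8, 9, 10, 12),   # hour of day, for simplicity
  type     = c("Departure", "Arrival", "Arrival", "Departure", "Departure"),
  value    = c(-1, 1, 1, -1, -1)   # +1 arrival, -1 departure
)

ground <- flights %>%
  # how many times each aircraft appears per station and day
  add_count(station, day, aircraft, name = "n_seen") %>%
  # drop departures of aircraft seen only once that day
  filter(!(type == "Departure" & n_seen == 1)) %>%
  arrange(station, day, time) %>%
  group_by(station, day) %>%
  mutate(on_ground = cumsum(value)) %>%
  ungroup()

# Maximum simultaneous aircraft per station over the whole period.
max_per_station <- ground %>%
  group_by(station) %>%
  summarise(max_on_ground = max(on_ground))

print(max_per_station)
```

Note that the running sum restarts at zero each day here, so aircraft that arrive and stay overnight are not carried into the next day; whether that is acceptable depends on the data.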

Creating unique time measurement ID for each user based on date of measurement (varying per user)

I am having a problem with my analyses in R that I am hoping you guys can help me with. I have a dataset with multiple daily measurements per participant (DiaryEating); all participants have a unique ID (UserID), and all measurements are timestamped with the date (DateVariable). However, the measurement dates differ between participants (i.e. some got measured on the 1st, others on the 3rd of the month). For my analysis, instead of the DateVariable, I need a Time variable that indicates whether this is the 1st, 2nd or 3rd measurement PER participant.
This sounds very simple, but has proven very difficult for me (Yes, I do indeed suck very hard at this).
The data look something like:
UserID (1,1,1,2,2,2,3,3,3,3)
DateVariable( 2018/10/30, 2018/10/31, 2018/11/01, 2018/10/30, 2018/10/31, 2018/11/03, 2018/10/30, 2018/10/31, 2018/11/02, 2018/11/04)
DiaryEating (3,3,4,2,3,3,4,4,4,4,5)
TimeMeas(WHAT I WANT TO CREATE WOULD BE): 1,2,3,1,2,3,1,2,3,4)
You can use dplyr::group_by as:
library(dplyr)
df %>%
group_by(UserID) %>%
mutate(Time = row_number())
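row_number() numbers rows in their current order within each group, so the result is only a measurement index if the rows are already sorted by date. A minimal sketch that sorts first, using the UserID and DateVariable values from the question (DiaryEating is left out because the listed values do not match the number of rows):

```r
library(dplyr)

df <- tibble(
  UserID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  DateVariable = as.Date(
    c("2018/10/30", "2018/10/31", "2018/11/01",
      "2018/10/30", "2018/10/31", "2018/11/03",
      "2018/10/30", "2018/10/31", "2018/11/02", "2018/11/04"),
    format = "%Y/%m/%d"
  )
)

df_time <- df %>%
  arrange(UserID, DateVariable) %>%  # make sure dates are in order per user
  group_by(UserID) %>%
  mutate(Time = row_number()) %>%    # 1st, 2nd, 3rd ... measurement per user
  ungroup()

print(df_time)
```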

Summarizing data depending on time and date

I have a workspace containing hourly weather measurements for the past year (temperature, CO2 and so on).
What I need to do is split the whole workspace by date (because I have several rows for 2009-01-01, etc.) and, in the next step, summarize the data for each day separately (I'm looking for a summary of every variable for every day separately).
I was searching for some kind of function and found one which is almost good. Separating by day works quite well, but the summary is really bad.
df <- data.frame(date=rep(seq.POSIXt(as.POSIXct("2009-01-01"), by="day", length.out=31), each=1))
summary(split(df, as.Date(df$date),AM19))
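Calling summary() on the list returned by split() describes the list itself rather than each day's data. One way around this, sketched here on invented hourly data, is to apply summary() to each piece with lapply(). The temperature and co2 columns and their values are hypothetical, and the time zone is fixed to UTC so that as.Date() does not shift any hour into a neighbouring day.

```r
# Toy hourly data for two days (values invented for illustration).
weather <- data.frame(
  date = seq(as.POSIXct("2009-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 48),
  temperature = rep(c(1, 3), each = 24),
  co2 = rep(c(400, 410), each = 24)
)

# Split the measurement columns by calendar day, then summarise each piece.
by_day <- split(weather[c("temperature", "co2")], as.Date(weather$date))
daily_summary <- lapply(by_day, summary)
daily_summary[["2009-01-01"]]

# Alternative: one row per day with the mean of every variable.
daily_means <- aggregate(
  cbind(temperature, co2) ~ day,
  data = transform(weather, day = as.Date(date)),
  FUN = mean
)
print(daily_means)
```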

Resources