I am working with a dataset This is the dataset. In the dataset there are 33 unique Ids that are repeated for each day they provided data, within 30 days, from their fitbit. I am trying to count the number of days they input data through the ActivityDay column and group it to the Id, so that I can see how many total days they used their fitbit out of the 30 days.
the Activity date data type was originally POSIXct and I converted it to Date type. How can I count the dates as number or days and group it to each indvidual ID?
I tried using count within a dplyr::summarise to get the ID and number of days counted while grouping the data to the ID. that failed.
I also thought of using a case_when, however, I thought that wouldn't work because it wouldn't count all the way up to the end dates I specify, so anything between the two dates would get the ouputs I specified. I also tried count_date_between(min(user_device_activity), max(user_device_activity), by 'day') but it said that the function doesn't exist and when I tried installing it. It said it didn't exist within R.
library(dplyr)
user_device_activity %>%
distinct(Id, ActivityDate) %>% # in case duplicates possible in data
count(Id, month = lubridate::floor_date(ActivityDate, "month"))
I am new to R can't find the right syntax for a specific average I need. I have a large fitbit dataset of heartrate per second for 30 people, for a month each. I want an average of heartrate per day per person to make the data easier to manage and join with other fitbit data.
First few lines of Data
The columns I have are Id (person Id#), Time (Date-Time), and Value (Heartrate). I already separated Time into two columns, one for date and one for time only. My idea is to group the information by person, then by date and get one average number per person per day. But, my code is not doing that.
hr_avg <- hr_per_second %>% group_by(Id) %>% group_by(Date) %>% summarize(mean(Value))
As a result I get an average by date only. I can't do this manually because the dataset is so big, Excel can't open it. And I can't upload it to BigQuery either, the database I learned to use during my data analysis course. Thanks.
I'm trying to plot a line graph with R using the dataset that can be found here . I'm looking specifically at how to plot the number of cases in each region i.e. north east, north west etc against the period of time.
However, as the date is a period of a week rather than a standard date, how can I convert it to make the line graph actually possible? For example, right now it has the dates as 01/09/2020 - 07/09/2020. How can I use this for a line graph?
Sorry if my explanation isn't clear, here is a picture below.
I assume you're trying to plot a time series? You could just trim the dates to the beginning of the week and label the time axis as "Week beginning on date". You could do this with substr() in base r and keep the first 10 characters.
substr(data$column,1,10)
You may also want to format it as a date, easiest with the lubridate package, something like dmy() (day month year).
Here is the full code you would want:
library(tidyverse)
#Read in data
data <- read.csv("/Users/sabrinaxie/Downloads/covid19casesbysociodemographiccharacteristicengland1sep2020to10dec20213.csv")
#Modify data and remove extraneous top rows
data <- data %>%
rename(Period=Table.9..Weekly.estimates.of.age.standardised.COVID.19.case.rates..per.100.000.person.weeks..by.region..England..1.September.2020.to.6.December.20211.2.3) %>%
slice(3:n())
#Keep first 10 characters of Period column and assign to old column to replace
data$Period <- substr(data$Period,1,10)
#Parse as date
data$Period <- dmy(data$Period)
I am fairly new to R and DPLYR and I am stuck on a this issue:
I have two tables:
(1) Repairs done on cars
(2) Amount owed on each car over time
What I would like to do is create three extra columns on the repair table that gives me:
(1) the amount owed on the car when the repair was done,
(2) 3months down the road and
(3) finally last payment record on file.
And if the case where the repair date does not match with any payment record, I need to use the closest amount owed on record.
So something like:
Any ideas how I can do that?
Here are the data frames:
Repairs done on cars:
df_repair <- data.frame(unique_id =
c("A1","A2","A3","A4","A5","A6","A7","A8"),
car_number = c(1,1,1,2,2,2,3,3),
repair_done = c("Front Fender","Front
Lights","Rear Lights","Front Fender", "Rear Fender","Rear Lights","Front
Lights","Front Fender"),
YearMonth = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))
df_owed <- data.frame(car_number = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
YearMonth = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),
amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
Using zoo for year-months, and tidyverse, you could try the following. Using left_join add all the df_owed data to your df_repair data, by the car_number. You can convert your year-month columns to yearmon objects with zoo. Then, sort your rows by the year-month column from df_owed.
For each unique_id (using group_by) you can create your three columns of interest. The first will use the latest amount_owed where the owed date is prior to the service date. Then second (3 months) will use the first amount_owed value where the owed date follows the service date by 3 months (3/12). Finally, the most recent take just the last value from amount_owed.
Using the example data, the results differ a bit, possibly due to the data frames not matching the images in the post.
library(tidyverse)
library(zoo)
df_repair %>%
left_join(df_owed, by = "car_number") %>%
mutate_at(c("YearMonth.x", "YearMonth.y"), as.yearmon) %>%
arrange(YearMonth.y) %>%
group_by(unique_id, car_number) %>%
summarise(
owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
owed_most_recent = last(amount_owed)
)
I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But not sure what to do next. If this were python I'd loop through one of the dates arrays, setting or getting the related quantity. But this being R I suspect there's a better way.
Make a dataframe with the all_dates as a column, then merge with quantitybydate using the by variable columns as the by.y, and all.x=TRUE. Then replace the NA's by 0.