Earliest date by country using R

I have a data frame (df) listing countries with the cumulative number of cases (csum column) by date.
I am trying to group by country and pull out the earliest date per country.
I tried the following code, but the dates it returns are incorrect:
df_2 = aggregate(df$date, by = list(df$country), FUN = "min")
Would anyone be able to see where I'm going wrong? (P.S. I need to avoid using any libraries.)
Thanks :)
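One possible cause, assuming the date column is stored as text rather than as class Date, is that min() compares the strings alphabetically instead of chronologically. The base R sketch below converts the column first and then aggregates. The sample data and the "%d/%m/%Y" format are assumptions, not taken from the question, so the format string would need to match the real data.

```r
# Hypothetical sample data; the "%d/%m/%Y" date format is an assumption
df <- data.frame(
  country = c("A", "A", "B", "B"),
  date    = c("10/02/2020", "09/03/2020", "25/01/2020", "03/02/2020"),
  csum    = c(5, 1, 1, 7),
  stringsAsFactors = FALSE
)

# As text, "09/03/2020" sorts before "10/02/2020", which is chronologically wrong.
# Converting to Date makes min() compare real calendar dates.
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Earliest date per country, base R only
earliest <- aggregate(date ~ country, data = df, FUN = min)
print(earliest)
```

Running str(df) before aggregating shows whether the column is already of class Date, in which case the conversion step is unnecessary.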

Related

How to count the number of days that pass between two dates in a dataset column in R

I am working with a dataset in which 33 unique Ids are repeated for each day they provided data from their Fitbit, within a 30-day period. I am trying to count the number of days they logged data, using the ActivityDay column, and group it by Id, so that I can see how many of the 30 days each person used their Fitbit.
The activity date column was originally POSIXct and I converted it to Date. How can I count the dates as a number of days and group them by each individual Id?
I tried using count within dplyr::summarise to get the Id and the number of days while grouping by Id, but that failed.
I also thought of using case_when, but I thought that wouldn't work because it wouldn't count all the way up to the end dates I specify, so anything between the two dates would get the outputs I specified. I also tried count_date_between(min(user_device_activity), max(user_device_activity), by 'day'), but R said that the function doesn't exist, and when I tried installing it, it said it didn't exist within R.
library(dplyr)
user_device_activity %>%
distinct(Id, ActivityDate) %>% # in case duplicates possible in data
count(Id, month = lubridate::floor_date(ActivityDate, "month"))
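If the goal is a single total per Id rather than a count per month, the same idea can be sketched in base R: remove duplicate Id/date pairs, then count the remaining rows per Id. The sample data and the days_used column name are made up for illustration; only Id and ActivityDate come from the code above.

```r
# Hypothetical sample data using the column names from the answer above
user_device_activity <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13",
                           "2016-04-12", "2016-04-13", "2016-04-14"))
)

# Drop duplicate Id/date pairs so each day is counted once per Id
unique_days <- unique(user_device_activity[, c("Id", "ActivityDate")])

# Count the remaining rows (distinct days) for each Id
days_used <- aggregate(ActivityDate ~ Id, data = unique_days, FUN = length)
names(days_used)[2] <- "days_used"  # illustrative column name
print(days_used)
```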

How can I show the quarter as Q1, without the year, in R

I am studying R, and an exercise requires me to create a Quarter column whose values look like Q4.
I used the zoo and lubridate libraries, but the result I got was a year plus the quarter (Q1).
I have a column InvoiceDate that holds a complete date-time.
What I need:
The original column, named InvoiceDate, holds date-times like this: 2010-12-01 08:26:00. I need to take this column and create other columns with specific values, such as quarter, year, month, etc.
What I achieved:
How do I achieve my goal?
You can use the inbuilt functions to extract the information that you need.
library(dplyr)
df %>%
mutate(quarter = quarters(InvoiceDate),
Month = format(InvoiceDate, '%b'),
weekday = weekdays(InvoiceDate))
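The same extraction can be sketched without any packages, using the example value from the question. The UTC time zone and the new column names are assumptions, and the month is taken as a number here because '%b' and weekdays() return locale-dependent text.

```r
# Example value from the question; the UTC time zone is an assumption
InvoiceDate <- as.POSIXct("2010-12-01 08:26:00", tz = "UTC")

parts <- data.frame(
  InvoiceDate = InvoiceDate,
  quarter = quarters(InvoiceDate),       # quarter label without the year
  year    = format(InvoiceDate, "%Y"),   # four-digit year as text
  month   = format(InvoiceDate, "%m"),   # two-digit month as text
  stringsAsFactors = FALSE
)
print(parts)
```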

How to create a "for loop" to download 5 consecutive months of data?

For an assignment, we are supposed to use a for loop to build a data frame covering 5 consecutive months.
The data covers crimes, including the type of crime, location, month, street name, etc.
How do we go about this?
We use the package 'ukpolice' to obtain data for a specific month and location of choice.
The ukpolice syntax is as follows:
data <- as.data.frame(ukc_crime_location(lat = , lng = , date = ""))
Thank you in advance!
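A minimal sketch of one way to structure such a loop: build the five month strings, fetch one data frame per month into a list, and stack the list at the end. So that it runs offline, the download is replaced by fetch_month, a hypothetical stand-in. In practice its body would be the ukc_crime_location call from the question. The "YYYY-MM" date format, the start month, and the coordinates are all assumptions.

```r
# Hypothetical stand-in for the real download so the sketch runs offline.
# In practice the body would be the call from the question:
#   as.data.frame(ukc_crime_location(lat = lat, lng = lng, date = month))
fetch_month <- function(lat, lng, month) {
  data.frame(month = month, lat = lat, lng = lng, stringsAsFactors = FALSE)
}

# Five consecutive months as "YYYY-MM" strings (format and start are assumptions)
months <- format(seq(as.Date("2020-01-01"), by = "month", length.out = 5), "%Y-%m")

# Collect one data frame per month in a pre-allocated list
results <- vector("list", length(months))
for (i in seq_along(months)) {
  results[[i]] <- fetch_month(lat = 51.5, lng = -0.12, month = months[i])
}

# Stack the monthly data frames into a single data frame
crimes <- do.call(rbind, results)
print(crimes)
```

Filling a list and binding once at the end avoids repeatedly growing a data frame inside the loop.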

Column operators regarding only specific columns (specific dates and code i.e.) in R

I am trying to calculate the average relative humidity (avg_relative_humidity) of the city of Seoul for the dates 2020-01-01 to 2020-01-31.
I have this data:
and I've tried this already but don't really know what's missing.
Seoul_weather_dt <- Corona_relevant_weather_dt[, avg_relative_humidity_seoul := mean(avg_relative_humidity[code =="2020-01-01":"2020-01-01"]), by = c("province", "date", "avg_temp", "avg_relative_humidity"]
Can someone help me?
Something like this?
# select only Seoul and the relevant dates (both ends inclusive)
Seoul_weather_dt <- Corona_relevant_weather_dt[province == "Seoul" & date >= as.Date("2020-01-01") & date <= as.Date("2020-01-31")]
# calculate average humidity for each unique date
aggregate(Seoul_weather_dt$avg_relative_humidity, by = list(Seoul_weather_dt$date), FUN = mean)
The line of code you provided is pretty long. I would suggest splitting it into multiple lines with fewer functions per line to keep an overview (this also makes errors easier to trace). Also:
Is date of class "Date"? You can check that using str(Seoul_weather_dt).
code =="2020-01-01":"2020-01-01" only selects one day.
Using by = c("province", "date", "avg_temp", "avg_relative_humidity") is strange. You would then calculate a mean value for each observation of avg_relative_humidity as well, which is not what you want.
Why create average values for each province when you are only interested in Seoul?
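If a single average over the whole period is wanted, the points above can be sketched in base R as a filter followed by one mean(). The sample data is made up. Only the column names province, date, and avg_relative_humidity come from the question, and date is assumed to be of class Date.

```r
# Hypothetical sample data using the column names from the question
weather <- data.frame(
  province = c("Seoul", "Seoul", "Seoul", "Seoul", "Busan"),
  date = as.Date(c("2020-01-01", "2020-01-15", "2020-01-31",
                   "2020-02-01", "2020-01-15")),
  avg_relative_humidity = c(60, 70, 80, 50, 90),
  stringsAsFactors = FALSE
)

# Rows that fall inside January 2020, both ends inclusive
in_january <- weather$date >= as.Date("2020-01-01") &
  weather$date <= as.Date("2020-01-31")

# Keep only Seoul within the date range, then average in one step
seoul_january <- weather[weather$province == "Seoul" & in_january, ]
avg_relative_humidity_seoul <- mean(seoul_january$avg_relative_humidity)
print(avg_relative_humidity_seoul)
```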

Different age calculation for different rows

I'm an absolute R beginner here working on a Master's project.
I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). The data is organised so that every row contains information for a specific year the horse competed, plus a first row for each horse labelled "Total" that summarises every variable over its whole competing life. It looks like this:
I created a new variable with their age using the age_calc function in the eeptools package:
travdata$Age<-age_calc(as.Date(travdata$Birth.date), enddate=as.Date("2016-12-31"),
units="years")
This worked with no problems. What I'm trying to figure out is whether there is any way to calculate the age of the horses for each specific year I have info on them. That is, the "Total" row would have their age up until 2016-12-31, the row for 2015 would have their age at that time, and so on. I've been trying to include if statements in age_calc, but it won't work and I'm really at a loss on how best to do this.
Any literature or help you could point me to would be much, much appreciated.
MWE
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
The trick is to filter out the "Total" rows and specify a format for the as.Date() function:
library(eeptools)
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
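# Age of every horse at the end of 2016 (applies to all rows, including "Total")
travdata$Age <- age_calc(as.Date(travdata$Birth.date),
enddate = as.Date("2016-12-31"), units = "years")
# Keep only the per-year rows
competitions <- travdata[travdata$Competition.year != "Total", ]
# Note: as.Date() with format = "%Y" fills in the current month and day,
# so the reference date within each year depends on when the code is run
competitions$Competition.age <- age_calc(
as.Date(competitions$Birth.date),
enddate = as.Date(competitions$Competition.year, format = "%Y"),
units = "years", precise = FALSE)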
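As an alternative, here is a minimal base R sketch that keeps the "Total" rows and does not use eeptools. It is a different technique from age_calc: it counts completed years as of 31 December of a reference year, which is 2016 for "Total" rows and the competition year otherwise. Choosing 31 December as the reference date is an assumption, and the small horses data frame is a cut-down, hypothetical version of the MWE.

```r
# Cut-down, hypothetical version of the MWE with only the columns needed
horses <- data.frame(
  Birth.date = c("1998-07-01", "1998-07-01", "1997-07-14"),
  Competition.year = c("Total", "2005", "2003"),
  stringsAsFactors = FALSE
)

# Reference year: 2016 for "Total" rows, the competition year otherwise
ref_year <- ifelse(horses$Competition.year == "Total", "2016",
                   horses$Competition.year)
ref_date <- as.Date(paste0(ref_year, "-12-31"))  # 31 December is an assumption
birth <- as.Date(horses$Birth.date)

# Completed years: year difference, minus one if the birthday
# has not yet occurred by the reference date
had_birthday <- format(ref_date, "%m%d") >= format(birth, "%m%d")
horses$Age <- as.integer(format(ref_date, "%Y")) -
  as.integer(format(birth, "%Y")) -
  ifelse(had_birthday, 0L, 1L)
print(horses)
```

Building the reference date from an explicit "-12-31" suffix avoids the run-date dependence of as.Date(..., format = "%Y").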
