Aggregated sum month by month until variable changed - r

I'm working with invoices. I want to calculate the pool of money claimable each month (an invoice's amount becomes claimable once it is past its expiration date). The catch is that invoices can be canceled or paid. So I would like to aggregate the value of the invoices month by month, counting each invoice from the month containing the day after its expiration date through the month in which it was paid or canceled, inclusive.
Here is an example of my matrix
Client.Code. Invoice Expiration.Date Amount Payment.date Out.Of.Process
1: 1004773 21506000409 2016-09-28 6993.80 <NA> Current
2: 1004773 21506000670 2016-08-29 30034.62 <NA> Current
3: 1004773 21507000583 2017-10-29 3872.00 <NA> Current
4: 1005109 21601000237 2016-04-30 3594.31 <NA> Current
5: 1005109 21606000480 2016-08-29 6301.68 <NA> Current
6: 1004737 20170500125 2016-07-24 142818.72 2017-06-19 Paid
For example, invoice 1 should be counted in every monthly aggregate from September 2016 onward, and invoice 6 should be counted in every aggregate from July 2016 to June 2017. Invoice 4 should ideally be counted in each month from May 2016 (the day after its expiration date falls in May).
Is there a way to get the aggregate sum of claimable invoice amounts per month that I'm looking for?

Here is one solution that aggregates Amount by the month of the expiration date. I don't know if it is exactly what you wanted, but I hope it's close.
library(dplyr)
library(lubridate) # for floor_date()
your_df %>% mutate(exp_date = as.Date(Expiration.Date), # you might not need this if Expiration.Date is already a Date
monthly_date = floor_date(exp_date, "month")) %>%
select(Amount, monthly_date) %>%
group_by(monthly_date) %>%
summarise(aggregate_amt = sum(Amount))
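The question also asks for each invoice to stay in the pool from the month after expiry until the month it is paid. A minimal base R sketch of that idea follows, using three rows from the example data. The names invoices, report_end and first_of_month are made up for illustration, and the cut-off date for still-open invoices is an assumption you would replace with your own reporting horizon.

```r
# Sample rows taken from the question (invoices 1, 4 and 6)
invoices <- data.frame(
  Invoice = c("21506000409", "21601000237", "20170500125"),
  Expiration.Date = as.Date(c("2016-09-28", "2016-04-30", "2016-07-24")),
  Amount = c(6993.80, 3594.31, 142818.72),
  Payment.date = as.Date(c(NA, NA, "2017-06-19"))
)

# Assumption: open invoices are counted up to and including this month
report_end <- as.Date("2017-07-01")

# Hypothetical helper: first day of the month a date falls in
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# Build one row per invoice per month in which the invoice is claimable
claimable <- do.call(rbind, lapply(seq_len(nrow(invoices)), function(i) {
  start <- first_of_month(invoices$Expiration.Date[i] + 1)  # month of the day after expiry
  end <- if (is.na(invoices$Payment.date[i])) report_end else first_of_month(invoices$Payment.date[i])
  if (end < start) return(NULL)  # paid before it ever became claimable
  data.frame(month = seq(start, end, by = "month"), Amount = invoices$Amount[i])
}))

# Sum the claimable amounts per month
monthly_pool <- aggregate(Amount ~ month, data = claimable, FUN = sum)
print(monthly_pool)
```

Canceled invoices could be handled the same way if a cancellation date column is available; the example data does not show one, so it is left out here.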

Related

Convert from character to date in a "YYYY-WW" format in R

I have a hard time converting character to date in R.
I have a file where the dates are given as "2014-01", where the first is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
This is not what I want. I want the output in the same format, i.e. "2014-01", but as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have
occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system,
which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows
same rules as isoweek() but starts on Sunday. In other parts of the
world the convention is to start epidemiological weeks on Monday,
which is the same as isoweek.
Lubridate has functions to extract these from a date, but I don't know of a built-in way to go in the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week-1))
Year Week Date
1 2014 1 2014-01-01
2 2014 3 2014-01-15
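If the weeks follow the %W convention from the question (weeks start on Monday), base R can also do the conversion, as long as a weekday is supplied. The sketch below appends weekday 1 (Monday) to each label; as far as I know, the week number is only used on input when a weekday is given too, and otherwise the missing month and day fall back to the current date, which would explain the output in the question. A Date always denotes a single day, so the "2014-01" form can only be recovered when formatting the result.

```r
yearweek <- c("2014-01", "2014-03")

# Append weekday 1 (Monday) so that %W can be resolved to a concrete day
monday <- as.Date(paste0(yearweek, "-1"), format = "%Y-%W-%u")

# Formatting the Date with the same specification gives back the original label
label <- format(monday, "%Y-%W")

print(data.frame(yearweek, monday, label))
```

Choosing Monday as the representative day is an arbitrary choice; any weekday from 1 to 7 would work the same way.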

Sum up ride_length for weekdays vs weekends and compare for casual and annual members

Hi, I'm a beginner in R, so I don't know many of the functions needed for this operation. I know in my head what I want to do, I just don't know how to do it.
I have ride length data that I want to sum up for weekdays vs. weekends and compare between annual and casual members.
I have used wday() to convert the dates to '1' to '7'. Now I want to filter for '2' to '6' (weekdays) and sum the ride_length, filter for '1' and '7' (weekends) and sum that ride_length, and then use aggregate() to compare casual and annual members' usage.
That is the approach I have decided on.
member_type ride_length date month day year day_of_week weekday_num
casual 5280 2020-07-01 Jul 01 2020 Wednesday 4
casual 9840 2020-07-01 Jul 01 2020 Wednesday 4
Any other path to this would be welcome too.
Unfortunately I cannot test the code, since the question has no complete input or desired output. But you should be able to make these lines work for you:
library(dplyr)
# your data.frame/tibble
df %>%
# create variable to indicate weekend or not (check the weekend day names)
dplyr::mutate(day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "WEEKEND","WEEK")) %>%
# build grouping by member type and day type
dplyr::group_by(member_type, day_type) %>%
# summarise total ride length
dplyr::summarize(total_ride_length = sum(ride_length, na.rm = TRUE))
One piece of advice: there may be holidays you should consider, since they can fall on a working day but show the behaviour of a weekend day (most people have free time to rent and ride bikes, or the opposite if people predominantly rent to get to and from work).
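For the route described in the question (the numeric weekday from wday() plus aggregate()), a minimal base R sketch could look like this. The rides data frame is made-up sample data, and it assumes the default wday() numbering where 1 is Sunday and 7 is Saturday.

```r
# Made-up sample data in the shape shown in the question
rides <- data.frame(
  member_type = c("casual", "casual", "member", "member", "casual"),
  ride_length = c(5280, 9840, 600, 900, 1200),
  weekday_num = c(4, 4, 1, 7, 7)  # assumed: 1 = Sunday ... 7 = Saturday
)

# Label each ride as weekend (Sunday or Saturday) or weekday
rides$day_type <- ifelse(rides$weekday_num %in% c(1, 7), "weekend", "weekday")

# Total ride length for every combination of member type and day type
totals <- aggregate(ride_length ~ member_type + day_type, data = rides, FUN = sum)
print(totals)
```

Combinations that never occur in the data (here, member rides on weekdays) simply do not appear in the result.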

Building User Activity Cohorts

Thank you for your help - I am trying to build cohorts.
And I do get what I am looking for with ...
cohort3 <- transactions %>%
group_by(userId) %>%
mutate(first_transaction = min(createDate)) %>%
group_by(first_transaction, createDate) %>%
summarize(clients = n())
BUT ... as you can see by the result, I get data back for every single day.
We had 7 users that transacted on 2017-01-03 the first time.
2 of these users transacted on 2017-01-04.
4 of these users transacted on 2017-01-05 and so forth.
This is great - but it's too granular.
How do I modify the above code to summarize by month or better quarter?
Like:
Jan-2017 - 25 users transacted the first time.
Feb-2017 - 12 users from that cohort transacted again ... and so on.
Even better.
Q1 2017 - 78 users transacted.
Q2 2017 - 35 users of that Q1 2017 cohort transacted. etc
Thank you.
The lubridate package includes a quarter() function for determining what quarter of the year a given date falls into. Something along these lines should do what you want:
library(dplyr)
library(lubridate)
cohort3 <-
transactions %>%
group_by(userId) %>%
mutate(first_transaction = min(createDate),
quarter = quarter(first_transaction, with_year = TRUE)) %>% # e.g. 2017.1
group_by(quarter) %>%
summarize(clients = n_distinct(userId)) # count users, not transactions
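To also see how many users of each cohort come back in later quarters, the activity quarter has to be part of the grouping as well. The following is a minimal base R sketch of that, not a drop-in replacement; the transactions sample data and the year_quarter helper are made up for illustration.

```r
# Made-up sample data with the column names from the question
transactions <- data.frame(
  userId = c(1, 1, 2, 2, 3, 3, 3),
  createDate = as.Date(c("2017-01-03", "2017-04-10", "2017-02-11", "2017-02-20",
                         "2017-05-02", "2017-05-09", "2017-08-15"))
)

# Hypothetical helper: label a date as "YYYY Qn"
year_quarter <- function(d) paste(format(d, "%Y"), quarters(d))

# First transaction date per user, repeated on every row of that user
first_date <- ave(as.numeric(transactions$createDate), transactions$userId, FUN = min)
transactions$cohort <- year_quarter(as.Date(first_date, origin = "1970-01-01"))
transactions$active <- year_quarter(transactions$createDate)

# Keep one row per user and activity quarter so each user is counted once
user_quarters <- unique(transactions[c("userId", "cohort", "active")])

# Number of distinct users per cohort and activity quarter
cohort_counts <- aggregate(userId ~ cohort + active, data = user_quarters, FUN = length)
names(cohort_counts)[3] <- "clients"
print(cohort_counts)
```

Switching to monthly cohorts only requires changing the helper, for example to format(d, "%Y-%m").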

R create graphs of average in time difference

I have a big data.table that contains the following cols:
timestamp, value, house
The value is a cumulative value of eg energy of that one house. So here is a small sample:
time value house
2014-10-27 11:40:00 100 2
2014-10-27 15:40:00 150 2
2014-10-27 19:40:30 160 2
2014-10-28 00:00:01 170 2
2014-10-28 20:20:20 180 2
2014-10-27 11:40:00 200 3
2014-10-27 15:40:00 300 3
2014-10-27 19:40:30 400 3
2014-10-28 00:00:01 500 3
2014-10-28 20:20:20 600 3
I want to get 3 bar charts: one with the average per house usage per hour of a day, one with the average per house usage per day of a week, and the average per house usage per month of a year.
To get the value of one hour of one day, I guess I should do something like
max(data$value) - min(data$value)
, but that per time interval of an hour and also per house. I know cut(data$time, breaks="hour") splits it up in intervals, but of course does not take the difference of the maximum and minimum value and also doesn't consider the house it is from. On top of that I would also need the average of course.
How can I do this?
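First, I'd split the time variable into hours, days and months. A convenient and quick way is to use regular expressions (str_extract comes from the stringr package), for example, assuming the timestamps are in data$time:
library(stringr)
hour <- str_extract(data$time, ' [[:digit:]]{2}')
hour <- substring(hour, 2)
day <- str_extract(data$time, '-[[:digit:]]{2} ')
day <- substring(day, 2, 3)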
Then we need to deal with value being cumulative: diff (from base R) reverses the cumsum. Keep in mind that diff returns one element fewer than its input and has to be applied within each house:
value <- diff(value)
The aggregated data for one of the bar plots can then be computed with data.table syntax
data[ , .(avg = mean(value)), by=.(house, day)]
or with aggregate() from base R, which reads more easily
aggregate(value ~ house + day, data = data, FUN = mean)
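Putting these steps together, here is a minimal base R sketch for the hour-of-day chart. It uses a few rows from the question as sample data and makes one simplifying assumption: the consumption between two consecutive readings is attributed entirely to the hour of the later reading. The readings and avg_by_hour names are made up for illustration.

```r
# A few rows from the question, for two houses
readings <- data.frame(
  time = as.POSIXct(c("2014-10-27 11:40:00", "2014-10-27 15:40:00", "2014-10-27 19:40:30",
                      "2014-10-27 11:40:00", "2014-10-27 15:40:00", "2014-10-27 19:40:30"),
                    tz = "UTC"),
  value = c(100, 150, 160, 200, 300, 400),
  house = c(2, 2, 2, 3, 3, 3)
)

# Sort so that diff() runs in time order within each house
readings <- readings[order(readings$house, readings$time), ]

# Undo the cumulative sum per house; the first reading has no predecessor
readings$usage <- ave(readings$value, readings$house, FUN = function(v) c(NA, diff(v)))

# Hour of day as a two-digit string
readings$hour <- format(readings$time, "%H")

# Average usage per house and hour of day (rows with NA usage are dropped)
avg_by_hour <- aggregate(usage ~ house + hour, data = readings, FUN = mean)
print(avg_by_hour)
```

The result can be passed to barplot() or ggplot2, and the same pattern works for day of week or month by changing the format string to "%u" or "%m".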

Calculating the number of weeks for each year based on dates using R

I have a dataset with dates of 2 different years (2009 and 2010) and would like to have the corresponding week number for each date.
My dataset is similar to this:
anim <- c(012,023,045,098,067)
dob <- c("01-09-2009","12-09-2009","22-09-2009","10-10-2010","28-10-2010")
mydf <- data.frame(anim,dob)
mydf
anim dob
1 12 01-09-2009
2 23 12-09-2009
3 45 22-09-2009
4 98 10-10-2010
5 67 28-10-2010
I would like to have variable "week" in the third column with the corresponding week numbers for each date.
EDIT:
Note: Week one begins on January 1st, week two begins on January 8th for each year
Any help would be highly appreciated.
Baz
Your definition of "week of year"
EDIT: Note: Week one begins on January 1st, week two begins on January 8th for each year
differs from the standard ones supported by strftime:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1
of the week (and typically with the first Sunday of the year as day 1 of
week 1). The US convention.
%W
Week of the year as decimal number (00–53) using Monday as the first day
of week (and typically with the first Monday of the year as day 1 of week
1). The UK convention.
So you need to compute it based on the day-of-year number.
# Subtract 1 from the day of year so that days 1-7 fall in week 1 and day 8 starts week 2
mydf$week <- ((as.numeric(strftime(as.POSIXct(mydf$dob,
format="%d-%m-%Y"),
format="%j")) - 1) %/% 7) + 1
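As a quick check of the day-of-year arithmetic, here is a small base R sketch on the dates from the question. It subtracts 1 from the day of year before the integer division so that January 7 still falls in week one and January 8 starts week two; the variable names are made up for illustration.

```r
dob <- c("01-09-2009", "12-09-2009", "22-09-2009", "10-10-2010", "28-10-2010")
dates <- as.Date(dob, format = "%d-%m-%Y")

# Day of the year (1-366), then 7-day blocks counted from January 1st
day_of_year <- as.numeric(format(dates, "%j"))
week <- (day_of_year - 1) %/% 7 + 1
print(data.frame(dob, week))

# Boundary check: January 7th is the last day of week 1, January 8th starts week 2
boundary <- as.Date(c("2009-01-07", "2009-01-08"))
boundary_week <- (as.numeric(format(boundary, "%j")) - 1) %/% 7 + 1
print(boundary_week)
```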
Post 2011 Answer
library(lubridate)
mydf$week <- week(dmy(mydf$dob)) # dmy() parses the "dd-mm-yyyy" strings into dates
The lubridate package is straightforward for day-to-day tasks like this.
If you want to know how many weeks (7-day periods) have passed between your date of interest and the first day of its year, regardless of which weekday the year started on, the following is a solution (using floor_date from lubridate).
dob <- dmy(mydf$dob) # parse the "dd-mm-yyyy" strings first
mydf$weeks <- difftime(dob, floor_date(dob, "year"), units = "weeks")
