I am fairly new to R and dplyr and I am stuck on this issue:
I have two tables:
(1) Repairs done on cars
(2) Amount owed on each car over time
What I would like to do is create three extra columns on the repair table that give me:
(1) the amount owed on the car when the repair was done,
(2) the amount owed 3 months down the road, and
(3) the last payment record on file.
If the repair date does not match any payment record, I need to use the closest amount owed on record.
So something like:
Any ideas how I can do that?
Here are the data frames:
Repairs done on cars:
df_repair <- data.frame(
  unique_id = c("A1","A2","A3","A4","A5","A6","A7","A8"),
  car_number = c(1,1,1,2,2,2,3,3),
  repair_done = c("Front Fender","Front Lights","Rear Lights","Front Fender",
                  "Rear Fender","Rear Lights","Front Lights","Front Fender"),
  YearMonth = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))
df_owed <- data.frame(car_number = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
YearMonth = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),
amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
Using zoo for year-months, and tidyverse, you could try the following. Use left_join to add all the df_owed data to your df_repair data, joining by car_number. You can convert your year-month columns to yearmon objects with zoo. Then, sort your rows by the year-month column from df_owed.
For each unique_id (using group_by) you can create your three columns of interest. The first uses the latest amount_owed where the owed date is on or before the repair date. The second (3 months) uses the first amount_owed value where the owed date is at least 3 months after the repair date (yearmon values are stored as fractional years, so 3 months is 3/12). Finally, the most recent just takes the last value of amount_owed.
Using the example data, the results differ a bit, possibly due to the data frames not matching the images in the post.
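library(tidyverse)
library(zoo)
df_repair %>%
  # attach every payment record for the same car to each repair
  left_join(df_owed, by = "car_number") %>%
  # YearMonth.x is the repair month, YearMonth.y is the payment month
  mutate_at(c("YearMonth.x", "YearMonth.y"), as.yearmon) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    # latest record on or before the repair month
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    # first record at least 3 months after the repair month
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    # last record on file for the car
    owed_most_recent = last(amount_owed)
  )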
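To make the three lookups easier to follow, here is a minimal base R sketch for a single repair, without zoo. The helper month_index is hypothetical (it is not part of any package); it turns "YYYY-MM" into a count of months so that "3 months later" is just + 3. The payment rows are the first four records for car 1 from the question.

```r
# Hypothetical helper: convert "YYYY-MM" strings to a running month count.
month_index <- function(ym) {
  parts <- strsplit(ym, "-", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 12L + as.integer(p[2]), integer(1))
}

# Payment records for one car, already sorted by date.
owed <- data.frame(
  YearMonth   = c("2014-02", "2014-05", "2014-06", "2014-08"),
  amount_owed = c(20000, 18000, 17500, 16000)
)
repair_ym <- "2014-03"

owed_idx   <- month_index(owed$YearMonth)
repair_idx <- month_index(repair_ym)

# Latest record on or before the repair month.
owed_repair_done <- tail(owed$amount_owed[owed_idx <= repair_idx], 1)
# First record at least 3 months after the repair month.
owed_3_months <- head(owed$amount_owed[owed_idx >= repair_idx + 3L], 1)
# Last record on file.
owed_most_recent <- tail(owed$amount_owed, 1)
```

Note that tail() and head() return a zero-length vector when no record qualifies, whereas dplyr's last() and first() return NA, so a real implementation would need to handle that case.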
Can anyone suggest how to construct a tsibble?
I have a dataset with four original columns: product, market, price, date.
I want to construct a tsibble object from it,
with key = id(product, market) and the year-week as the index,
so that I can predict a baseline price for each product in each market.
As an example, with the nycflights13::weather data set I can use key = id(origin, year) (there is no market there, so year stands in for the market).
The index would be week(year + month + day): I can combine the year, month and day columns into a date, calculate week(), then join year and week together as 'weekyear' and use that as the index, taking median(temp) for each 'weekyear'.
After reshaping the data set this way, I would have a tsibble that can be used to predict temp for the next 2-4 weeks.
It looks like you want to convert the date variable to the yearweek class first. Try:
library(tsibble)
library(dplyr)
data %>%
  mutate(index = yearweek(date)) %>%
  # recent tsibble versions take the key as c(...); older releases used id(...)
  as_tsibble(key = c(product, market), index = index)
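The weekly aggregation described in the question (combine year, month and day into a date, derive a year-week label, take the median per week) could be sketched in base R roughly as follows. The temperature values are made up for illustration, and the label uses ISO 8601 weeks via the "%G" and "%V" format codes, which may number weeks differently from lubridate's week().

```r
# Made-up daily observations in the shape of nycflights13::weather.
obs <- data.frame(
  year  = c(2013, 2013, 2013, 2013),
  month = c(1, 1, 1, 1),
  day   = c(1, 2, 8, 9),
  temp  = c(39, 41, 30, 34)
)

# Combine the three columns into a Date.
obs$date <- as.Date(ISOdate(obs$year, obs$month, obs$day))

# ISO year and ISO week number, e.g. "2013-W01".
obs$yearweek <- format(obs$date, "%G-W%V")

# One row per week with the median temperature.
weekly <- aggregate(temp ~ yearweek, data = obs, FUN = median)
```

With several stations, adding the station column to the formula (for example temp ~ origin + yearweek) would give one row per key and week, which is the shape as_tsibble expects.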
All, I've seen that date conversion questions get downvoted a lot, but I couldn't find any information online or in the help files...
I have a df with a date formatted as ymd_hm() and then some data in other columns. Then I have another df with 366 rows, one for each day, and a column containing some values relevant for that day (some climatological stuff that is essentially the same every year, so the year doesn't matter). The dfs might look something like this:
library(tibble)
library(lubridate)
df1 <- tibble(Date = seq(ymd_hm('2010-05-01 00:00'), ymd_hm('2010-05-03 00:00'), by = 'hour'), Data = seq_along(Date))
df2 <- tibble(MonthDay = c("04-30", "05-01", "05-02", "05-03", "05-04"), OtherData = c(20, 30, 40, 50, 60))
Now, is it possible to do some lookup sort of thing and match Date and MonthDay and then write whatever OtherData is into df1? I'm struggling since I can't convert MonthDay to a date.
So, all the 2010-05-01 dates should have 30 next to them, all 2010-05-02 dates should have 40 in the next column, and so on and so forth...
Thanks y'all!
We extract the 'MonthDay' from Date with format, and use that as the common joining column in left_join:
library(dplyr)
df1 %>%
  # "%m-%d" gives the same "MM-DD" text used in df2$MonthDay
  mutate(MonthDay = format(Date, "%m-%d")) %>%
  left_join(df2, by = "MonthDay") %>%
  select(-MonthDay)
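The same lookup can also be written without dplyr. This is only a sketch of an alternative, using base R's match() to find, for each timestamp, the row of df2 with the same month-day string. The data mirrors the question's example, with plain data frames instead of tibbles.

```r
df1 <- data.frame(
  Date = seq(as.POSIXct("2010-05-01 00:00", tz = "UTC"),
             as.POSIXct("2010-05-03 00:00", tz = "UTC"),
             by = "hour")
)
df1$Data <- seq_along(df1$Date)

df2 <- data.frame(
  MonthDay  = c("04-30", "05-01", "05-02", "05-03", "05-04"),
  OtherData = c(20, 30, 40, 50, 60)
)

# Row of df2 that corresponds to each timestamp's month and day.
idx <- match(format(df1$Date, "%m-%d"), df2$MonthDay)
df1$OtherData <- df2$OtherData[idx]
```

Timestamps whose month-day is missing from df2 would get NA, the same as with left_join.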
The lubridate package lets us break a y-m-d date down into month, year, week, etc. I have done this with my data set. I have the months as numbers, but I want a separate column with month abbreviations. I can convert them, but I want to have both the numerical and the word month in the data frame. Is there another way to do this besides manually adding a column vector?
lubridate::month returns the numerical month. Adding the argument label = TRUE returns the month abbreviation instead. You can use dplyr::mutate to add the new column.
For example:
library(dplyr)
library(lubridate)
data.frame(Date = as_date("2001-10-11")) %>%
  mutate(Month = month(Date),
         MonthAbb = month(Date, label = TRUE))
        Date Month MonthAbb
1 2001-10-11    10      Oct
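If lubridate is not available, a base R sketch of the same idea is to index the built-in constant month.abb (English abbreviations) by the numeric month. The data frame name df and the second date are made up for illustration.

```r
df <- data.frame(Date = as.Date(c("2001-10-11", "2002-03-05")))

# Numeric month taken from the date.
df$Month <- as.integer(format(df$Date, "%m"))

# month.abb is a built-in character vector: "Jan", "Feb", ..., "Dec".
df$MonthAbb <- month.abb[df$Month]
```

Unlike month(Date, label = TRUE), this gives a plain character column rather than an ordered factor, so it will sort alphabetically unless converted.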
I have a data frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
where the month part is written in the form "M01", "M02" and so on.
Now I want to convert this column to a date format. Is there a way to do it in R with lubridate?
I also want to select columns that contain one certain month from each year, like only the March columns from all these years. What is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
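If the goal is every March row across all years, as the question asks, one possible approach is to compare only the month part of the string. A minimal sketch in base R, using a few rows from the question's data:

```r
df <- data.frame(
  month  = c("2012M01", "2012M03", "2013M01", "2013M03"),
  values = c(99904, 99530, 98533, 96431)
)

# Characters 6-7 of "2012M03" are the two-digit month.
march <- df[substr(df$month, 6, 7) == "03", ]
```

The same condition should also work inside dplyr's filter(), for example filter(df, substr(month, 6, 7) == "03").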
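Something like this will get it:
raw <- "2012M01"
# append a day, since strptime needs one whenever a month is given
dt <- strptime(paste0(raw, "01"), format = "%YM%m%d")
dt will be a POSIXlt object. Because strptime needs a day of the month whenever a month is specified, the code appends "01" so that the result is a complete date (the first of the month).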