Developing a custom function of loops for forecasting any loan - r

Dear all,
I have written a simple function that takes inputs of a typical loan and forecasts the loan until the end of the Term, and saves all data as Time Series in a dataframe.
Inputs: R (interest rate), start (start of the loan), CMP (monthly payment), Term (how long is the loan contract), Bal0 (as default loan amount), frequency (how often is interest calculated, if 365 then daily), and months (how many months in a year).
Typical behaviour of a loan is as follows: Take out a loan, Bal0 = £1000 on start= 2022.01.20 with interest of R = 10%, CMP= £100, Term= 1 year. Daily Interest (R/365) will start counting on the following day (2022.01.21) until end of Term, calculated on previous day's closing balance.
After a precise calendar month on the 2022.02.20, the Loan Balance of £1000 will get added the cumulative daily interest over the calendar month and get the monthly payment deducted. Such that on that day the Closing Balance = £1000 + (the sum of all interests over the past month= sum(Closing_Balance*R/365)=) £0.2739 - the monthly payment (CMP) £100=
1000+0.2739-100= £900.27.
Then the daily interest starts counting on now the lower balance of £900.27. Until, again, in a precise calendar month, 2022.03.20. the same will happen: Closing Balance = £900.27 + (sum of all interests)-£100.
#Loading all packages.#
library(zoo)
library(lubridate)
library(timetk)
#Importing Data.#
data<- structure(list(Key = c(281575989162.667, 281576178390.667, 281576180153.333,
281576278448, 281576311669.333, 281576312741.333, 281576399669.333, 281576403989.333,
281576418453.333, 281576513364, 281576518421.333, 281576520128, 294909322496, 294909511724,
294909513486.667, 294909611781.333, 294909645002.667, 294909646074.667, 294909733002.667, 294909737322.667,
294909751786.667, 294909846697.333, 294909851754.667, 294909853461.333), start = structure(c(1637539200, 1637539200,
1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200,
1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200, 1637539200,
1637539200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), CMP = c(94.32, 117.7, 246.5, 94.32, 117.7, 246.5, 94.32, 117.7, 246.5, 94.32,
117.7, 246.5, 94.32, 117.7, 246.5, 94.32, 117.7, 246.5, 94.32, 117.7, 246.5, 94.32, 117.7, 246.5), R = c(0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485,
0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485, 0.23497, 0.21294, 0.1485),
Term = c(12, 18, 36, 12, 18, 36, 12, 18, 36, 12, 18, 36, 12, 18, 36, 12, 18, 36, 12, 18, 36, 12, 18, 36), Bal0 = c(1000, 1800, 7126, 1000, 1800, 7126, 1000,
1800, 7126, 1000, 1800, 7126, 1000, 1800, 7126, 1000, 1800, 7126, 1000, 1800, 7126, 1000, 1800, 7126)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -24L))
#Writing loan amortization function.#
loan_amort <- function (R, start, CMP, Term, Bal0, frequency = 365,months = 12) {
answer = df() #Save all answers in a dataframe#
periods = Term * frequency #How many interest counting periods we have#
daily_interest = R / frequency #Daily interest for each loan#
nmonths = Term * months #For how many months loan x is for#
#For each day, function calculates the closing_balance. For the first day, closing_balance = original loan amount. Monthly_CMP is applied only after one calendar month after the start of the loan for all remaining months, and the monthly_interest is the sum of all interests each single day in that calendar month.#
for (p in 1:periods) {
closing_balance = (opening_balance – monthly_CMP + monthly_interest)
if (Date == start) {
opening_balance = Bal0
} else {
opening_balance = lag(closing_balance)
}
}
#The monthly_CMP is CMP, but gets applied only once per calendar month after start.#
for(m in 1:nmonths){
if (Date == (ymd(start) %m+% months(m)) {
monthly_CMP = CMP
} else {
monthly_CMP = 0
}
#The monthly_interest is the sum of all daily interests (lag(closing_balance) * R/365, but gets applied only once per calendar month after start.#
if (Date == (ymd(start) %m+% months(m)) {
monthly_interest = summarise_by_time(interest, .by = "month", .type = "round")
} else {
monthly_interest = 0
}
}
#If the Date is a calendar month away from start, apply the sum of all daily interests that are calculated on each days' closing_balance. If it's not a calendar month away, keep calculating the interest, but the monthly_interests remains 0 until that monthly settlement date.#
interest = daily_interest * lag(closing_balance)
#Then save closing_balance as a zooreg timeseries, where we are saving the closing_balance starting on the start date of the loan and it's saved in daily frequency.#
closing_balance = zooreg(closing_balance, start, frequency = frequency)
}
#Save all columns to a dataframe with all of these aspects#
answer$closing_balance = closing_balance
answer$opening_balance = opening_balance
answer$monthly_CMP = monthly_CMP
answer$monthly_interest = monthly_interest
answer$interest = interest
return(answer)
}
Could anyone help me out with this function, please? The output says it cannot find any of my variables in the function, including: Date, unexpected errors "m in 1:nmonths", object CMP not found, error in recognising loops on "for (p in 1:periods)", daily_interest not found, closing_balance not found, opening_balance not found, object "answer" not found, interest not found.
Hence, would anyone have better ideas on how to specify this function or write loops and saving output as a timeseries (in my case zooreg), or is it better to completely rewrite the whole script?
Any advice would be very appreciated,
thank you.

Related

r circlize: missing value where TRUE/FALSE needed

I am trying to plot (for the first time) a chord diagram in the package circlize in R Studio. I am going through the manual chapters (Circular Visualization in R). The first step is to allocate the sectors on a circle by using the circos.initialize command. However, when I get to this step, I get an error stating missing values where TRUE/FALSE needed.
A reproducible example
library(circlize)
Types <- data.frame(Types = c("OOP", "UVA", "MAT", "OIC", "FIN", "WSE"))
stack.df <- data.frame(Year = c(rep(2019, 1), rep(2020, 4), rep(2021, 7), rep(2022, 11), rep(2023, 11)), Invoice = c(paste0("2019.", "10", ".INV"),
paste0("2020.", seq(from = 20, to = 23, by = 1), ".INV"),
paste0("2021.", seq(from = 30, to = 36, by = 1), ".INV"),
paste0("2022.", seq(from = 40, to = 50, by = 1), ".INV"),
paste0("2023.", seq(from = 50, to = 60, by = 1), ".INV")))
stack.df <- cbind(stack.df, Org_1 = Types[sample(nrow(Types), nrow(stack.df), replace = TRUE), ], Org_2 = Types[sample(nrow(Types), nrow(stack.df), replace = TRUE), ])
Making Chord Diagram
My overall objective: Make a chord diagram where the sectors are the stack.df$Year and track 1 is the stack.df$Invoice, with the circos.links from stack.df$Org_1 to stack.df$Org_2.
Initialize
circos.initialize(sectors = stack.df$Year, x = stack.df$Invoice)
Error in if (sector.range[i] == 0) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In circos.initialize(sectors = stack.df$Year, x = stack.df$Invoice) :
NAs introduced by coercion
What am I am missing? My sector.range !== 0 as stack.df$Year is from 2019-2023. Any help in overcoming this error is greatly appreciated.

offset lexis grid by one

I have a dataset that has age cohorts from 0 - 80, and years 1900-2021. I am making a lexis plot with this data using the lexis_grid function from the LexisPlotR package. I am using the following code to get started:
library(LexisPlotR)
lexis_grid(year_start = 1900,
year_end = 2021,
age_start = 0,
age_end = 80,
delta = 10
)
However, what this does is sets the upper righthand end of the diaganal lines at 2021, and so they don't like up with vertical decade lines and it just looks wrong:
Any advice on how to index the diagonal lines to start at 1900, rather than end at 2021?
Change year_end to 2020 or 2030, so it increments in 10s to match with age increment - delta:
lexis_grid(year_start = 1900,
year_end = 2030,
age_start = 0,
age_end = 80,
delta = 10)

Calculate new values in dataframe from values from a separate dataframe with same field prefix (dplyover/dyplyr)

Hi I had asked a question earlier about calculating new sums for a column based on field name patterns in R and Tim TeaFan had gave me a solution with his dylpyover package.
Now I need to do some more calculations using a similar method but have an added complication that some of those calculations need to grab a cell value from a different dataframe but has a matching prefix to the cut_names from the dataframe being altered.
Here is the basic equation I need done, but the "{.x}_LB_SUM" and "{.x}_UB_SUM" don't live exist in the ClusterFinal dataframe but within AOIsummary, a single row dataframe with a different field structure than the ClusterFinal dataframe. I've included the first couple rows & columns of both dataframes below.
FinalALL <- ClusterFinal %>%
mutate(
over(cut_names("_EST$"),
list(LB_LQ = ~ round((.("{.x}_LB")/(.("{.x}_LB_SUM"))))/(ClusterFinal$TOTPOPRACE_LB/SUM_TOTPOP_LB), digits = 2),
UB_LQ = ~ round((.("{.x}_UB")/(.("{.x}_UB_SUM"))))/(ClusterFinal$TOTPOPRACE_UB/SUM_TOTPOP_UB), digits = 2),
)
)
> dput(head(st_drop_geometry(AOIsummary[1:20])))
structure(list(TOTPOPRACE_LB_SUM = 151945, TOTPOPRACE_EST_SUM = 155886,
TOTPOPRACE_UB_SUM = 159827, TOTPOPRACE_LB_PCT_SUM = 100,
TOTPOPRACE_UB_PCT_SUM = 100, HSPBIPOC_LB_SUM = 25871, HSPBIPOC_EST_SUM = 28037,
HSPBIPOC_UB_SUM = 30203, HSPBIPOC_LB_PCT_SUM = 17.0265556615881,
HSPBIPOC_UB_PCT_SUM = 18.8973077139657, NHSPWHTALN_LB_SUM = 124238,
NHSPWHTALN_EST_SUM = 127849, NHSPWHTALN_UB_SUM = 131460,
NHSPWHTALN_LB_PCT_SUM = 81.7651123761888, NHSPWHTALN_UB_PCT_SUM = 82.2514343634054,
NHSPBLKALN_LB_SUM = 805, NHSPBLKALN_EST_SUM = 1113, NHSPBLKALN_UB_SUM = 1421,
NHSPBLKALN_LB_PCT_SUM = 0.529796966007437, NHSPBLKALN_UB_PCT_SUM = 0.889086324588461), row.names = 1L, class = "data.frame")
> dput(head(st_drop_geometry(ClusterFinal[5:32])))
structure(list(TOTPOPRACE_EST = c(1134, 13396, 35275, 20924,
14351), TOTPOPRACE_CV = c(0.283044659944356, 0.0549543862201148,
0.0307605467404691, 0.041019248793624, 0.0462602994322089), TOTPOPRACE_SE = c(320.9726443769,
736.168957804657, 1085.07828627005, 858.286761757789, 663.88155715163
), TOTPOPRACE_SESQD = c(103023.4384383, 541944.734435195, 1177394.88733474,
736656.165408671, 440738.721926072), TOTPOPRACE_MOE = c(528,
1211, 1785, 1412, 1092), TOTPOPRACE_LB = c(606, 12185, 33490,
19512, 13259), TOTPOPRACE_UB = c(1662, 14607, 37060, 22336, 15443
), HSPBIPOC_EST = c(468, 3744, 10016, 5907, 4109), HSPBIPOC_CV = c(0.385961515530025,
0.118166392157128, 0.0833061826764167, 0.109830626279308, 0.127695904080848
), HSPBIPOC_SE = c(180.629989268052, 442.414972236288, 834.394725686989,
648.769509431874, 524.702469868202), HSPBIPOC_SESQD = c(32627.1930229765,
195731.007658835, 696214.558254266, 420901.876368474, 275312.681885792
), HSPBIPOC_MOE = c(297, 728, 1373, 1067, 863), HSPBIPOC_LB = c(171,
3016, 8643, 4840, 3246), HSPBIPOC_UB = c(765, 4472, 11389, 6974,
4972), NHSPWHTALN_EST = c(666, 9652, 25259, 15017, 10242), NHSPWHTALN_CV = c(0.411657858466369,
0.0689992419410154, 0.0363261839688132, 0.0491849956570817, 0.0531540683299724
), NHSPWHTALN_SE = c(274.164133738602, 665.98068321468, 917.563080868252,
738.611079782396, 544.403967835577), NHSPWHTALN_SESQD = c(75165.972228638,
443530.270415092, 841922.007372437, 545546.327177317, 296375.68019512
), NHSPWHTALN_MOE = c(451, 1096, 1509, 1215, 896), NHSPWHTALN_LB = c(215,
8556, 23750, 13802, 9346), NHSPWHTALN_UB = c(1117, 10748, 26768,
16232, 11138), NHSPBLKALN_EST = c(170, 180, 200, 118, 82), NHSPBLKALN_CV = c(0.557840157339532,
0.438442946192193, 0.334277417565133, 0.402329064204202, 0.574051766108857
), NHSPBLKALN_SE = c(94.8328267477204, 78.9197303145948, 66.8554835130266,
47.4748295760959, 47.0722448209263), NHSPBLKALN_SESQD = c(8993.26502896315,
6228.32383292837, 4469.65567576057, 2253.85944327935, 2215.79623248122
), NHSPBLKALN_MOE = c(156, 130, 110, 78, 77), NHSPBLKALN_LB = c(14,
50, 90, 40, 5), NHSPBLKALN_UB = c(326, 310, 310, 196, 159)), row.names = c(NA,
5L), class = "data.frame")
So far I've tried piping to both dataframes and then doing the cut_names and I've also tried creating a new dataframe with both datasets in it as a tibble and then passing that into my FinalAll code (see below). That didn't work. I'm wondering now about finding a way to grab the cut_name and use that with grep somehow to grab the specific column I need for the calculation. Any help would be much appreciated. Thanks!
CombDF <- tibble(d = paste0("d", 1:2), data = list(as_tibble(ClusterFinal),as_tibble(AOIsummary)))

Joining multiple tables using dplyr

I am working on healthcare data. For the sake of simplicity, I am providing data on only one patient ID. Every patient has a unique ID and over a period of time, the doctors monitor the BCR_ABL value as shown in the table below.
structure(list(PatientId = c("Hospital1_124", "Hospital1_124",
"Hospital1_124", "Hospital1_124", "Hospital1_124", "Hospital1_124",
"Hospital1_124"), TestDate = c("2007-11-13", "2008-09-01", "2011-02-24",
"2013-05-01", "2016-02-16", "2017-05-12", "2017-08-29"), BCR_ABL = c(0.029,
0, 0, 0, 0, 100, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .Names = c("PatientId", "TestDate",
"BCR_ABL"))
At the start of the treatment, each patient has a BCR_ABL value of 100 and ideally post treatment, this value should drop down to 0. The patients undergo tests for BCR_ABL at various stages as shown in the TestDate column.
The patients also visit the hospital for follow up visits and this is recorded in another table which contains the followup date as well as the date of starting of the medication. The table looks like this:
structure(list(PatientId = c("Hospital1_124", "Hospital1_124",
"Hospital1_124", "Hospital1_124"), FollowupDate = structure(c(11323,
17298, 17407, 17553), class = "Date"), dateofStarting = structure(c(11323,
17318, 17318, 17318), class = "Date"), nameTKI = c("Imatinib",
"Imatinib", "Imatinib", "Imatinib"), brandTKI = c("Glivec", "Glivec",
"Glivec", "Glivec"), dailydose = c("100", "400", "400", "400"
)), class = "data.frame", row.names = c(NA, -4L), .internal.selfref = <pointer: 0x0>, .Names = c("PatientId",
"FollowupDate", "dateofStarting", "nameTKI", "brandTKI", "dailydose"
))
Now the aim of the analysis is to find out the efficacy of the drug (nameTKI) being prescribed. To my mind, the best representation would be a line graph with Date on the x-axis and BCR_ABL on the y-axis. However, I am stuck on how do I go about combining the dates. I am looking at a new table which has the following variables: PatientId, Date, BCR_ABL, nameTKI, brandTKI and dailydose. I don't think the follow up date has too much of a significance. So negelecting it, the Date variable needs to be a combination of TestDate from the first table and dateofStarting from the second table, arranged chronologically for all the individual patients (I could use group_by() for that). The value for BCR_ABL would start off as 100, till the value obtained after the first test and then follow those values for all the Date entries.
I have been trying various joins from dplyr without any success. Would appreciate some help please.
A bit hard to follow your code there, but you could join the tables together using the PatientId as the primary key. However, you should think carefully about the structure of the data as well. If the first table is at the patient/test level and the second is supposed to just be at the patient level; why are there multiple dateofStarting values for a single PatientId?
library(tidyverse)
t1 <- data.frame(PatientId = rep("Hospital1_124", 7),
TestDate = as.Date(c("2007-11-13", "2008-09-01", "2011-02-24", "2013-05-01",
"2016-02-16", "2017-05-12", "2017-08-29")),
BCR_ABL = c(0.029, 0, 0, 0, 0, 100, 0),
stringsAsFactors = FALSE)
t2 <- data.frame(PatientId = rep("Hospital1_124", 4),
FollowupDate = as.Date(c(11323, 17298, 17407, 17553), origin = "1970-01-01"),
dateofStarting = as.Date(c(11323, 17318, 17318, 17318), origin = "1970-01-01"),
nameTKI = rep("Imatinib", 4),
brandTKI = rep("Glivec", 4),
dailydose = c(100, 400, 400, 400),
stringsAsFactors = FALSE)
data <- t2 %>%
select(-FollowupDate) %>%
inner_join(t1, by = c("PatientId" = "PatientId"))

R: Smoothing Time Series Data by Item

I have a data series that displays purchases over time by item ID. Purchasing habits are irregular, so I would like to smooth this data out over time and by item ID.
If items had orders placed more regularly (i.e. Every day) we could better plot/evaluate our ordering and set stocking levels. However, some people will purchase excess of an item so they don't have to restock. This then is skewing our par level data (Since a 1 day total could really be a week's worth of product since they could only be ordering once per week.
Reproducible Example:
POData <- structure(list(a = structure(c(1499918400, 1499918400, 1499918400,
1499918400, 1499918400, 1499918400, 1496376000, 1497412800, 1497412800,
1497412800, 1497412800, 1497412800, 1497240000, 1497412800, 1497412800,
1497412800, 1501214400, 1496376000, 1496376000, 1496376000, 1496289600,
1496289600, 1496289600, 1496289600, 1496289600, 1496289600, 1501214400,
1501214400, 1501214400, 1501214400), class = c("POSIXct", "POSIXt"
), tzone = ""), b = c(446032L, 101612L, 37740L, 482207L, 152360L,
4483L, 482207L, 141729L, 81192L, 482207L, 85273L, 142955L, 460003L,
142955L, 17752L, 29763L, 309189L, 361905L, 17396L, 410762L, 437420L,
17752L, 18002L, 150698L, 163342L, 433332L, 150587L, 44159L, 433332L,
446032L), c = c(4, 1, 25, 1, 1, 1, 3, 12, 12, 1, 1, 1, 300, 1,
1, 2, 6, 6, 2, 1, 1, 1, 1, 1, 1, 1, 40, 2, 1, 2)), .Names = c("PO Date",
"PS Item ID", "PO Qty"), row.names = c(NA, 30L), class = "data.frame")
This is probably a simple question, but I hope someone has a simple way to do this.
You could use something like this
require(zoo)
require(dply)
df2 = POData %>%
arrange(`PS Item ID`,`PO Date`)%>%
group_by(`PS Item ID`)%>%
mutate(temp_lag1 = lag( `PO Qty`))%>%
mutate(temp.5.previous = rollapply(data = temp_lag1,
width = 2,
FUN = mean,
align = "left",
fill = `PO Qty`,
na.rm = T))
It essentially groups by PS Item ID and arranges by PS Item ID and PO Date. The width argument in mutate specifies how far you would like to go back for a moving average. As of now its set to 1 because your data is not that extensive by product ID.

Resources