Dynamic linear regression loop for different order summation in R

I've been trying hard to recreate this model in R:
[model equation from FARHANI 2012]
I've tried many things, such as building the formula with paste and cumsum; however, that would not work, as I could not get the strings assigned to the right variables and R kept treating L as a function. I then tried to write the models out manually (I am only looking at p, q = 1, 2, 3, 4, 5), but after starting I realized how inefficient this is.
This is essentially what I am trying to do:
model5 <- vector("list",20)
#p=1-5, q=0
model5[[1]] <- dynlm(DLUSGDP~L(DLUSGDP,1))
model5[[2]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2))
model5[[3]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3))
model5[[4]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4))
model5[[5]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4)+L(DLUSGDP,5))
I'm also trying to do this when regressing DLUSGDP on DLWTI (my oil variable) for p = 0, q = 1-5, and for p = 1-5, q = 1-5. cumsum would not work because it summed the variables rather than treating them as independent regressors.
My goal is to run these models and then use information criteria (IC) to determine which should be analyzed further.
I hope you understand my problem; any help would be greatly appreciated.

I think this is what you are looking for:
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
where n is some order you want to try. For example,
n <- 3
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
# DLUSGDP ~ L(DLUSGDP, 1) + L(DLUSGDP, 2) + L(DLUSGDP, 3)
Then you can construct your model fitting by
model5 <- vector("list", 20)
for (i in 1:20) {
  form <- reformulate(paste0("L(DLUSGDP,", 1:i, ")"), "DLUSGDP")
  model5[[i]] <- dynlm(form)
}
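The question also asks for lags of DLWTI. A minimal sketch of one way to extend the same reformulate() idea to the full (p, q) grid (the helper name fit_pq and the expand.grid/Map scaffolding are my own, not from the answer; it assumes DLUSGDP and DLWTI are time series available in your workspace):
library(dynlm)

fit_pq <- function(p, q) {
  # build "L(DLUSGDP, 1..p)" and "L(DLWTI, 1..q)" terms, dropping an empty side
  terms <- c(if (p > 0) paste0("L(DLUSGDP,", 1:p, ")"),
             if (q > 0) paste0("L(DLWTI,", 1:q, ")"))
  dynlm(reformulate(terms, "DLUSGDP"))
}

# all combinations of p, q in 0..5 except (0, 0); covers p = 1-5 with q = 0,
# p = 0 with q = 1-5, and p = 1-5 with q = 1-5
grid <- subset(expand.grid(p = 0:5, q = 0:5), p + q > 0)
models_pq <- Map(fit_pq, grid$p, grid$q)
sapply(models_pq, AIC)   # compare by information criterion, as intended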

Related

Monte Carlo Simulation for DCF Model in R

I am trying to create a function where Monte Carlo simulation is applied to two of the variables in a DCF model in RStudio. It is supposed to take a first value FCF_0 and apply a specific growth rate to it, FCF_0*(1 + growth), which is the first input variable, up to period 6; each period takes the last FCF and grows it again. After that I would like to discount each value as well to get its present value, FCFn*(1/((1+WACC)^n)), where WACC is the second variable to simulate.
So far I have the function below to calculate the FCFs, but with a vector of specific values of growth.
What I have tried so far to create the simulation function follows it, but I think it is bad.
Could you please help me understand how to create both simulations, and whether I need two functions or can do everything in one? I would expect the function to give the sum of all present values, with each sum being an element in a vector of 10,000 simulations. I am new at this, and even though I have been reading for almost two weeks, I don't get how to create these simulations.
Thank you very much!
revfunc <- function(hist, growth) {
  rval <- c()
  help <- c(hist)
  for (i in growth) {       # grow the running value by each rate in turn
    help <- help * (1 + i)
    rval <- c(rval, help)
  }
  return(rval)
}
Monte Carlo Simulations
pvffcf_function <- function(fcf0, growth, wacc) {
  rval1 <- c()
  help <- c(fcf0)
  pvs <- rval1 * (1 / ((1 + wacc)^n))   # n is undefined, and rval1 is still empty here
  random_growth <- rnorm(n = 10000, mean(fcfgrowth), sd(fcfgrowth))   # fcfgrowth is undefined
  wacc <- rnorm(n = 10000, 0.03804, 0.007711)
  pvffcf <- sum(freecashflows)          # freecashflows is undefined
  for (i in growth) {
    help <- help * (1 + i)
    rval1 <- c(rval1, help)
  }
  return(freecashflows)
}
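Since the question is unanswered here, a minimal sketch of one way to do the whole thing in a single function (my own; it assumes one growth draw and one WACC draw per simulated path, the growth mean and sd are placeholders to replace with your own estimates, and the WACC parameters are the ones from the attempt above):

simulate_pv <- function(n_sims = 10000, fcf0 = 100,
                        growth_mean = 0.05, growth_sd = 0.01,
                        wacc_mean = 0.03804, wacc_sd = 0.007711,
                        n_periods = 6) {
  growth <- rnorm(n_sims, growth_mean, growth_sd)   # one draw per simulation
  wacc <- rnorm(n_sims, wacc_mean, wacc_sd)
  sapply(seq_len(n_sims), function(i) {
    fcf <- fcf0 * (1 + growth[i])^(1:n_periods)     # FCF path, periods 1..6
    sum(fcf / (1 + wacc[i])^(1:n_periods))          # discount and sum the PVs
  })
}

pv_dist <- simulate_pv()   # vector of 10,000 simulated sums of present values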

Get model.avg to return top model when subset size = 1

I'm a biologist looking at the habitat association of many bird species. As such, I have a list of global models for each species, and have used dredge on each of them. I now want to use model.avg to get the averaged coefficients for the top models (delta < 2).
However, for some species there is only one top model - the next best model has delta > 2. That's fine by me, but it means model.avg throws an error. I would like it to simply return the coefficients of that one top model (in the same way that mean(5) returns 5).
I could use an if clause to find such species and treat them separately, but then the output of, say, get.models is very different from the output of model.avg, so I have to go on treating them differently right the way through, which is a messy faff.
Is there a simple workaround to get model.avg (or a model.avg-like output) to work with a single top model?
My workaround:
tmp <- get.models(x, subset = delta < 2)
if (length(tmp) == 1) {
  tmp2 <- c(tmp, tmp)   # duplicate the single model so model.avg will run
  mod.avg.results <- model.avg(tmp2)
} else {
  mod.avg.results <- model.avg(x, subset = delta < 2)
}
(this is in an llply, so x is each species' model.selection object)
Use coefTable or coef as a common interface for the component and averaged models. For instance:
coefTable(if (nrow(x) == 1) get.models(x, 1)[[1]]
          else model.avg(x))
where x is your "model.selection" table.
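A minimal sketch of how this can slot into the per-species workflow (the helper name avg_or_top and the delta < 2 subsetting step are my own additions; it assumes the MuMIn package and a dredge() selection table as input):

library(MuMIn)

avg_or_top <- function(x) {
  top <- subset(x, delta < 2)   # keep the top model set
  coefTable(if (nrow(top) == 1) get.models(top, 1)[[1]]
            else model.avg(top))
}

This returns a coefficient table of the same shape whether one model or several fall below delta = 2, so the rest of the llply pipeline can stay identical.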

How to create a loop for Regression

I just started using R for statistical purposes and I appreciate any kind of help.
My task is to make calculations on one index and 20 stocks from the index. The data contains 22 columns (DATE, INDEX, S1 .... S20) and about 4000 rows (one row per day).
First I imported the .csv file, called it "dataset", and calculated log returns this way, doing it for all stocks S1-S20 plus the INDEX:
n <- nrow(dataset)
S1 <- dataset$S1
S1_logret <- log(S1[2:n])-log(S1[1:(n-1)])
Secondly, I stored the data in a data.frame:
logret_data <- data.frame(INDEX_logret, S1_logret, S2_logret, S3_logret, S4_logret, S5_logret, S6_logret, S7_logret, S8_logret, S9_logret, S10_logret, S11_logret, S12_logret, S13_logret, S14_logret, S15_logret, S16_logret, S17_logret, S18_logret, S19_logret, S20_logret)
Then I ran the regression (S1 to S20) using the log returns:
S1_Reg1 <- lm(S1_logret~INDEX_logret)
I couldn't figure out how to write the code in a more efficient way and use some function for repetition.
In a further step I have to run a cross-sectional regression for each day in a selected interval. It is impossible to do manually, and R should provide a quick solution, but I am quite unsure how to do this part. I would also like to use some kind of loop for the previous calculations, yet I lack the necessary R coding knowledge.
Any help to the point, or advice on literature or tutorials, is highly appreciated. Thank you!
You could provide all the separate dependent variables in a matrix to run your regressions. Something like this:
#example data
Y1 <- rnorm(100)
Y2 <- rnorm(100)
X <- rnorm(100)
df <- data.frame(Y1, Y2, X)
#run all models at once
lm(as.matrix(df[c('Y1', 'Y2')]) ~ df$X)
Out:
Call:
lm(formula = as.matrix(df[c("Y1", "Y2")]) ~ df$X)

Coefficients:
                   Y1        Y2
(Intercept)  -0.15490  -0.08384
df$X         -0.15026  -0.02471
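As an aside, the repeated log-return step in the question can be vectorized the same way; a minimal sketch (my own, assuming dataset's first column is DATE, as described):

# log returns for every column except DATE; diff(log(p)) equals
# log(p[2:n]) - log(p[1:(n-1)])
logret_data <- as.data.frame(lapply(dataset[-1], function(p) diff(log(p))))

# all 20 stock regressions against the index in one call
fits <- lm(as.matrix(logret_data[paste0("S", 1:20)]) ~ INDEX, data = logret_data)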

R: Finding solutions for new x values with nlmrt

Good day,
I have tried to figure this out, but I really can't! I'll supply an example of my data in R:
x <- c(36,71,106,142,175,210,246,288,357)
y <- c(19.6,20.9,19.8,21.2,17.6,23.6,20.4,18.9,17.2)
table <- data.frame(x,y)
library(nlmrt)
curve <- "y~ a + b*exp(-0.01*x) + (c*x)"
ones <- list(a=1, b=1, c=1)
Then I use wrapnls to fit the curve and to find a solution:
solve <- wrapnls(curve, data=table, start=ones, trace=FALSE)
This is all fine and works for me. Then, using the following, I obtain a prediction of y for each of the x values:
predict(solve)
But how do I find the prediction of y for new x values? For instance:
new_x <- c(10, 30, 50, 70)
I have tried:
predict(solve, new_x)
predict(solve, 10)
It just gives the same output as:
predict(solve)
I really hope someone can help! I know that if I took the values of 'solve' for parameters a, b, and c and substituted them into the curve formula with the desired x value, I would be able to do this, but I'm wondering if there is a simpler option, ideally one that doesn't require plotting the data first.
Predict requires the new data to be a data.frame with column names that match the variable names used in your model (whether your model has one or many variables). All you need to do is use
predict(solve, data.frame(x=new_x))
# [1] 18.30066 19.21600 19.88409 20.34973
And that will give you predictions for just those 4 values. It's somewhat unfortunate that any mistake in specifying the new data results in the fitted values for the original model being returned silently. An error message would probably have been more useful, but oh well.
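If you want to double-check, the same numbers can be computed by hand from the fitted coefficients using the question's own curve formula (shown only as a sanity check):

with(as.list(coef(solve)), a + b * exp(-0.01 * new_x) + c * new_x)
# should match predict(solve, data.frame(x = new_x))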

How to find significant correlations in a large dataset

I'm using R.
My dataset has about 40 different variables/vectors, each with about 80 entries. I'm trying to find significant correlations; that means I want to pick one variable and let R calculate all the correlations of that variable with the other 39 variables.
I tried to do this using a linear model with one explanatory variable, i.e. Y = a*X + b.
The lm() command then gives me an estimate of a and the p-value of that estimate. I would then go on and use one of my other variables for X and try again, until I find a p-value that's really small.
I'm sure this is a common problem; is there some sort of package or function that can try all these possibilities (brute force), show them, and then maybe even sort them by p-value?
You can use the function rcorr from the package Hmisc.
Using the same demo data from Richie:
m <- 40
n <- 80
the_data <- as.data.frame(replicate(m, runif(n), simplify = FALSE))
colnames(the_data) <- c("y", paste0("x", seq_len(m - 1)))
Then:
library(Hmisc)
correlations <- rcorr(as.matrix(the_data))
To access the p-values:
correlations$P
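To get the sorted-by-p-value list the question asks for, one option (my own addition, relying on sort() dropping the NA diagonal entry) is:

sort(correlations$P["y", ])   # p-values of y against every other variable, smallest first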
To visualize you can use the package corrgram
library(corrgram)
corrgram(the_data)
This will produce a correlogram of all the pairwise associations in the_data.
In order to print a list of the significant correlations (p < 0.05), you can use the following.
Using the same demo data from Richie's answer:
m <- 40
n <- 80
the_data <- as.data.frame(replicate(m, runif(n), simplify = FALSE))
colnames(the_data) <- c("y", paste0("x", seq_len(m - 1)))
Install Hmisc:
install.packages("Hmisc")
Load the library and find the correlations, as in the rcorr answer above:
library(Hmisc)
correlations <- rcorr(as.matrix(the_data))
Loop over the values, printing the significant correlations:
for (i in 1:m) {
  for (j in 1:m) {
    if (!is.na(correlations$P[i, j])) {
      if (correlations$P[i, j] < 0.05) {
        print(paste(rownames(correlations$P)[i], "-",
                    colnames(correlations$P)[j], ":",
                    correlations$P[i, j]))
      }
    }
  }
}
Warning
You should not use this to draw any serious conclusions; it is only useful for exploratory analysis and formulating hypotheses. If you run enough tests, you increase the probability of finding some significant p-values by random chance: https://www.xkcd.com/882/. There are statistical methods that are more suitable for this and that make adjustments to compensate for running multiple tests, e.g. the Bonferroni correction: https://en.wikipedia.org/wiki/Bonferroni_correction.
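In base R, stats::p.adjust can apply such corrections directly; a minimal sketch on the rcorr p-values from above (the 0.05 threshold is the same one used in the loop):

praw <- correlations$P[upper.tri(correlations$P)]   # unique off-diagonal p-values
p_adj <- p.adjust(praw, method = "bonferroni")
sum(p_adj < 0.05)   # how many pairs survive the correction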
Here's some sample data for reproducibility.
m <- 40
n <- 80
the_data <- as.data.frame(replicate(m, runif(n), simplify = FALSE))
colnames(the_data) <- c("y", paste0("x", seq_len(m - 1)))
You can calculate the correlation between two columns using cor. This code loops over all columns except the first one (which contains our response), and calculates the correlation between that column and the first column.
correlations <- vapply(
  the_data[, -1],
  function(x) cor(the_data[, 1], x),
  numeric(1)
)
You can then find the column with the largest magnitude of correlation with y using:
correlations[which.max(abs(correlations))]
Knowing which variables are correlated with which other variables can be interesting, but please don't draw any big conclusions from this knowledge. You need to have a proper think about what you are trying to understand, and which techniques you need to use. The folks over at Cross Validated can help.
If you are trying to predict y using only one variable, then you should take the one most strongly correlated with y.
To find it, just use the command which.max(abs(cor(x, y))). If you want to use more than one variable in your model, then you should consider something like the lasso estimator, as sketched below.
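A minimal lasso sketch using the glmnet package (my own addition; it assumes the_data from the other answers, with the response y in the first column):

library(glmnet)

# cross-validated lasso: picks a penalty and zeroes out weak predictors
fit <- cv.glmnet(as.matrix(the_data[, -1]), the_data$y)
coef(fit, s = "lambda.min")   # nonzero rows are the selected variables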
One option is to run a correlation matrix:
cor_result <- cor(data)
write.csv(cor_result, file = "cor_result.csv")
This correlates all the variables in the file against each other and outputs a matrix.
