Recursive regression using R - r

I would like to run a recursive regression on my variables residential_ddiff and interest_diff in order to test the stability of the coefficient on interest_diff.
The problem is that I want the regression to use the first lag of interest_diff within the window of observations [i:(i+10)], but I keep getting the same error:
Error in merge.zoo(residential_ddiff[i], L(interest_diff, 1)[i:(i + 10)], :
all(sapply(args, function(x) is.zoo(x) || !is.plain(x) || (is.plain(x) && .... is not TRUE
Both time series have n = 91 observations and are stored as ts objects. I've tried using both ts and numeric objects in my loop.
I've attached a screenshot of my code.
Grateful for any help, thank you.
I've tried lots of different options, including coercing with as.zoo() and explicitly defining the current observation residential_ddiff[i], but the same error keeps occurring.
Reproducible example:
library(dynlm)
library(dplyr) # provides mutate_all(), used below
# Creating two datasets
data_1 <- c(3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832)
data_2 <- c(1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73)
# Storing data in a dataframe
df <- data.frame(data_1, data_2)
# Making sure the data frame columns are numeric
df1 <- mutate_all(df, function(x) as.numeric(as.character(x)))
# Creating a matrix to store the coefficients and their confidence intervals
estimate.store <- matrix(ncol = 6, nrow = nrow(df1) - 3)
# Loop begins
for (i in 1:(nrow(df1) - 3)) {
  # Creating a dynamic linear regression of data_1 on the first lag of data_2 over the recursive window i:(i+3). Here the main issue arises, I think.
  estimation.store <- dynlm(df1$data_1[i] ~ L(df1$data_2, 1)[i:(i+3)], as.zoo(df1))
  estimate.store[i, ] <- c(estimation.store$coef[1], confint(estimation.store)[1, ], estimation.store$coef[2], confint(estimation.store)[2, ])
}
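One possible way to get window-by-window estimates is to avoid dynlm inside the loop, build the lag by hand, and fit a plain lm() on each window. The sketch below is only an illustration under assumptions: it uses the toy data from the reproducible example, a fixed-width (rolling) window of i:(i + 10), and names such as x_lag1, width and estimates that are not part of the original code. It does not diagnose the merge.zoo error itself.

```r
# Toy data taken from the reproducible example (each pattern repeated 4 times)
y <- rep(c(3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832), 4)
x <- rep(c(1.41, 0.33, 0.32, 1.53, -1.55, 0.73), 4)

# First lag of x, aligned with y (the first observation has no predecessor)
x_lag1 <- c(NA, head(x, -1))

width  <- 10                      # each window covers i:(i + width)
starts <- 2:(length(y) - width)   # start at 2 because x_lag1[1] is NA

# One row per window: intercept, slope, and the slope's 95% confidence bounds
estimates <- t(sapply(starts, function(i) {
  idx <- i:(i + width)
  fit <- lm(y[idx] ~ x_lag1[idx])
  c(coef(fit), confint(fit)[2, ])
}))
colnames(estimates) <- c("intercept", "slope", "slope_lwr", "slope_upr")
```

Plotting the slope column together with its bounds against starts gives a quick visual check of coefficient stability. For an expanding (truly recursive) window, idx could be changed to 2:(i + width).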

Related

Find indices of slope changes in a vector in R

I have a data frame with two columns: (1) datetimes and (2) streamflow values. I would like to create a 3rd column with indicator values to find sudden increases (usually a 0 but it is a 1 when the streamflow shows a big increase).
datetime <- as.POSIXct(c(1557439200, 1557440100, 1557441000, 1557441900,1557442800,
1557443700, 1557444600, 1557445500, 1557446400, 1557447300, 1557448200, 1557449100, 1557450000, 1557450900,
1557451800, 1557452700, 1557453600, 1557454500, 1557455400, 1557456300, 1557457200, 1557458100, 1557459000), origin = "1970-01-01")
streamflow <- c(0.35, 0.35, 0.36, 0.54, 1.0, 2.7, 8.4, 9.3, 6.2, 3.8, 4.7,
2.91, 2.01, 1.65, 1.41, 1.12, 0.95, 0.62, 0.52, 0.53, 0.53, 0.44, 0.35)
library(data.table)
data <- data.table(as.POSIXct(datetime), as.numeric(streamflow))
I am trying to create a function that would identify the datetime of where it jumps from 0.5 to 1 because that is when the event starts. It would then stop indicating it is an event when the streamflow goes below a certain threshold.
My current idea is a function that compares the local slope between two consecutive streamflow points to the slope over all streamflow values within some window, but I don't really know how to write that. Or maybe there is a better way to do what I am trying to do.
# shift() is data.table's lag; stats::lag() does not shift the values of a plain vector
data[, delta := (V2 - shift(V2)) / shift(V2)][
  , ind_jump := delta > 0.5
]
indices <- data[ind_jump == TRUE, V1]
Unrelated to this, note that because decimal fractions such as 0.54 and 0.36 are not stored exactly in binary floating point, R gives
(0.54 - 0.36)/0.36 > 0.5
[1] TRUE
while
0.18/0.36 > 0.5
[1] FALSE
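A minimal base-R sketch of the whole idea follows: flag a jump when the relative increase exceeds 0.5, then keep the event flag on until the streamflow falls below an end threshold. The tolerance tol, the value end_threshold = 0.6 and the names prev, active and in_event are assumptions made for illustration, not part of the original post. The tolerance is there so that the borderline step from 0.36 to 0.54 (a relative increase of 0.5 up to floating-point rounding) is not counted as a jump.

```r
streamflow <- c(0.35, 0.35, 0.36, 0.54, 1.0, 2.7, 8.4, 9.3, 6.2, 3.8, 4.7,
                2.91, 2.01, 1.65, 1.41, 1.12, 0.95, 0.62, 0.52, 0.53, 0.53,
                0.44, 0.35)

# Relative change with respect to the previous observation
prev  <- c(NA, head(streamflow, -1))
delta <- (streamflow - prev) / prev

# Compare with a small tolerance instead of testing delta > 0.5 directly
tol      <- 1e-8
ind_jump <- !is.na(delta) & delta > 0.5 + tol

# Simple state machine: a jump switches the event on,
# dropping below end_threshold switches it off again
end_threshold <- 0.6   # hypothetical value, chosen only for this example
in_event <- logical(length(streamflow))
active   <- FALSE
for (k in seq_along(streamflow)) {
  if (!active && ind_jump[k]) {
    active <- TRUE
  } else if (active && streamflow[k] < end_threshold) {
    active <- FALSE
  }
  in_event[k] <- active
}

which(in_event)
# [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18
```

With these settings the event starts at the step from 0.54 to 1.0 and ends when the flow drops to 0.52. The same in_event vector could be added to the data.table as an indicator column.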

Continuously calculate the value of an initial investment from its return vector

I need some help creating a vector that contains the value of an investment at every point in time.
Imagine I have the returns (in %) of a single stock for 10 consecutive months. I start with an initial value of $100 and successively multiply the value of the investment in period t-1 by one plus the return of period t. The output must be a vector because I want to plot the results.
Unfortunately, I have no idea how to write the code. Is it probably a for loop?
The monthly returns:
c(-0.09, -0.11, -0.2, -0.45, -0.11, 0.2, -0.27, -0.15, -0.24,
0.16)
Value of the investment, respectively:
100*(1+(-0.09)) = 91
91*(1+(-0.11)) = 80.99
...
Desired output vector:
c(91, 80.99, 64.792, …)
I'm not quite sure how to compute this vector with a loop, function or other method.
I'd be very glad for any help! Cheers!
r <- c(-0.09, -0.11, -0.2, -0.45, -0.11, 0.2, -0.27, -0.15, -0.24,
0.16)
100*cumprod(1 + r)
# [1] 91.00000 80.99000 64.79200 35.63560 31.71568 38.05882 27.78294 23.61550 17.94778 20.81942
or
Reduce(function(x, r) x*(1 + r), r, init = 100, accumulate = TRUE)
# [1] 100.00000 91.00000 80.99000 64.79200 35.63560 31.71568 38.05882 27.78294 23.61550
# [10] 17.94778 20.81942
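Since the question asks about a for loop, here is a sketch of the explicit loop that the cumprod() call replaces. The names value and current are illustrative; the result should agree with 100*cumprod(1 + r).

```r
r <- c(-0.09, -0.11, -0.2, -0.45, -0.11, 0.2, -0.27, -0.15, -0.24, 0.16)

value   <- numeric(length(r))  # value of the investment after each month
current <- 100                 # initial investment

for (k in seq_along(r)) {
  current  <- current * (1 + r[k])  # apply this month's return
  value[k] <- current
}

# The loop and the vectorised version give the same result
all.equal(value, 100 * cumprod(1 + r))
# [1] TRUE
```

The vectorised cumprod() form is shorter and is usually preferred in R; the loop is shown only to make the recursion explicit.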

r - The confusing details of findCorrelation() (caret package) when setting exact = TRUE

Following the findCorrelation() documentation, I ran the official example shown below:
Code:
library(caret)
R1 <- structure(c(1, 0.86, 0.56, 0.32, 0.85, 0.86, 1, 0.01, 0.74, 0.32,
0.56, 0.01, 1, 0.65, 0.91, 0.32, 0.74, 0.65, 1, 0.36,
0.85, 0.32, 0.91, 0.36, 1),
.Dim = c(5L, 5L))
colnames(R1) <- rownames(R1) <- paste0("x", 1:ncol(R1))
findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE
,verbose = TRUE)
Result:
> findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE, verbose = TRUE)
## Compare row 1 and column 5 with corr 0.85
## Means: 0.648 vs 0.545 so flagging column 1
## Compare row 5 and column 3 with corr 0.91
## Means: 0.53 vs 0.49 so flagging column 5
## Compare row 3 and column 4 with corr 0.65
## Means: 0.33 vs 0.352 so flagging column 4
## All correlations <= 0.6
## [1] "x1" "x5" "x4"
I have no idea how the computation works, i.e. why row 1 and column 5 are compared first and how the means are calculated, even after reading the source file.
I hope that someone can explain the algorithm using my example.
First, it determines the average absolute correlation for each variable. Columns x1 and x5 have the highest averages (mean(c(0.85, 0.56, 0.32, 0.86)) and mean(c(0.85, 0.91, 0.36, 0.32)), respectively), so it looks to remove one of these in the first step. It finds x1 to be the most globally offensive, so it removes it.
After that, it recomputes and compares x5 and x3 using the same process.
It stops after removing three columns since all pairwise correlations are below your threshold.
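The first step can be reproduced by hand in base R. The sketch below is an interpretation inferred from matching the printed numbers, not a statement of caret's documented behaviour: the first mean is the average absolute correlation in the row of x1, and the second mean appears to be taken over all off-diagonal entries except the row of the other variable in the pair (x5). The names A, avg, mn1 and mn2 are illustrative.

```r
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85, 0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91, 0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1), nrow = 5)
colnames(R1) <- rownames(R1) <- paste0("x", 1:5)

# Absolute correlations with the diagonal blanked out
A <- abs(R1)
diag(A) <- NA

# Average absolute correlation per variable, and the resulting ordering
avg <- colMeans(A, na.rm = TRUE)
names(avg)[order(avg, decreasing = TRUE)]
# [1] "x1" "x5" "x3" "x4" "x2"

# The two means printed for the first comparison (x1 vs x5, corr 0.85)
mn1 <- mean(A["x1", ], na.rm = TRUE)                 # 0.6475, printed as 0.648
mn2 <- mean(A[rownames(A) != "x5", ], na.rm = TRUE)  # 0.545
```

Because mn1 is larger than mn2, x1 is flagged. This ordering also explains why row 1 and column 5 are compared first: x1 and x5 have the two highest averages and their correlation (0.85) exceeds the cutoff.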

Sequentially re-ordering sections of a vector around NA values

I have a large set of data that I want to reorder in groups of twelve using the sample() function in R, to generate randomised data sets with which I can carry out a permutation test. However, this data has NA values where data could not be collected, and I would like them to stay in their original positions when the data is shuffled.
With help on a previous question I have managed to shuffle the data around the NA values for a single vector of 12 values with the code:
example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42)
example.data[!is.na(example.data)] <- sample(example.data[!is.na(example.data)], replace = F, prob = NULL)
[1] 0.64 0.83 NA 0.33 0.47 0.90 0.25 0.12 NA NA 0.42 1.00
Extending from this, if I have a set of data with a length of 24 how would I go about re-ordering the first and second set of 12 values as individual cases in a loop?
For example, a vector extending from the first example:
example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98, NA, 0.05)
Where example.data[1:12] and example.data[13:24] are shuffled separately within their own respective groups around their NA values.
The code I am trying to work this solution into is as follows:
shuffle.data = function(input.data,nr,ns){
simdata <- input.data
for(i in 1:nr){
start.row <- (ns*(i-1))+1
end.row <- start.row + actual.length[i] - 1
newdata = sample(input.data[start.row:end.row], size=actual.length[i], replace=F)
simdata[start.row:end.row] <- newdata
}
return(simdata)}
Where input.data is the raw input data (example.data); nr is the number of groups (2); ns is the size of each sample (12); and actual.length is a vector holding the length of each group excluding NAs (actual.length <- c(9, 8) for the example above).
Would anyone know how to go about achieving this?
Thank you again for your help!
I agree with Gregor's comment that it may be a better approach to work with the data in another form. However, what you need to accomplish can still be done easily enough even if all the data is in one vector.
First make a function that shuffles only non-NA values of an entire vector:
shuffle_real <- function(data){
# Sample from only the non-NA values,
# and store the result only in indices of non-NA values
data[!is.na(data)] <- sample(data[!is.na(data)])
# Then return the shuffled data
return(data)
}
Now write a function that takes in a larger vector, and applies this function to each group in the vector:
shuffle_groups <- function(data, groupsize){
# It will be convenient to store the length of the data vector
N <- length(data)
# Do a sanity check to make sure there's a match between N and groupsize
if ( N %% groupsize != 0 ) {
stop('The length of the data is not a multiple of the group size.',
call.=FALSE)
}
# Get the index of every first element of a new group
starts <- seq(from=1, to=N, by=groupsize)
# and for every segment of the data of group 'groupsize',
# apply shuffle_real to it;
# note the use of c() -- otherwise a matrix would be returned,
# where each column is one group of length 'groupsize'
# (which I note because that may be more convenient)
return(c(sapply(starts, function(x) shuffle_real(data[x:(x+groupsize-1)]))))
}
For example,
example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00,
0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98,
NA, 0.05)
set.seed(1234)
shuffle_groups(example.data, 12)
which results in
> shuffle_groups(example.data, 12)
[1] 0.12 0.83 NA 1.00 0.47 0.64 0.25 0.33 NA NA 0.90 0.42 0.47 NA
[15] 0.05 1.00 0.56 0.62 NA 0.73 NA 0.98 NA 0.12
or try shuffle_groups(example.data[1:23], 12), which results in Error: The length of the data is not a multiple of the group size.
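As a possible alternative, the grouping can be delegated to base R's ave(), which applies a function within each group and puts the results back in their original positions. The sketch below is illustrative: shuffle_real_safe and groups are hypothetical names, and the helper indexes with sample.int() because sample(x) treats a single number x >= 1 as "sample from 1:x", which could matter if a group ever contains exactly one non-NA value.

```r
# Shuffle only the non-NA values, leaving NA positions untouched
shuffle_real_safe <- function(x) {
  keep <- !is.na(x)
  vals <- x[keep]
  x[keep] <- vals[sample.int(length(vals))]  # permute by index, not by value
  x
}

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00,
                  0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98,
                  NA, 0.05)

# Group label for each element: 1 for positions 1-12, 2 for positions 13-24
groups <- ceiling(seq_along(example.data) / 12)

set.seed(1234)
shuffled <- ave(example.data, groups, FUN = shuffle_real_safe)

# NA positions are preserved
identical(is.na(shuffled), is.na(example.data))
# [1] TRUE
```

Unlike shuffle_groups(), this version does not require the length to be a multiple of the group size; the last group is simply shorter.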

Convert R code to MATLAB [closed]

I am working to translate the following R code (an algorithm run on fictitious data) to MATLAB, but I am struggling with the calculation that runs inside the loop. Any help will be appreciated.
data <- c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72,
4.28, 4.92, 5.53, 0.06, 0.48, 1.01, 1.68, 1.80,
3.25, 4.12, 4.60, 5.28, 6.22)
pi <- 0.546
sigmas1 <- 0.87
sigmas2 <- 0.77
mu1 <- numeric(0)
mu2 <- numeric(0)
r <- numeric(0)
R1 <- matrix (0 ,20 ,100)
mu1[1] <- 4.62
mu2[1] <- 1.06
for(j in 1:100){
for ( i in 1:20){
r [i] <- pi * dnorm (data[i] , mu2[j], sigmas2^(1/2))/((1- pi)*dnorm(data[i],
mu1[j], sigmas1^(1/2))+ pi*dnorm(data[i], mu2[j], sigmas2^(1/2)))
R1[i, j] <- r[i]
}
r
mu1[j+1] <- sum((1-r)*data)/sum(1-r)
mu2[j+1] <- sum(r*data)/sum(r)
Muu1 <- mu1[j+1]
Muu2 <- mu2[j+1]
}
Muu1
Muu2
x11()
layout(matrix(c(1, 2)))
plot(mu1, type="l", main="", xlab="EM Iteration for the Fictitious Data")
plot(mu2, type="l", main="", xlab='EM Iteration for the Fictitious Data')
The MATLAB equivalent of the dnorm function of R is normpdf. The arguments are the same as in R:
normpdf(X,mu,sigma)
With that the for loop can easily be adapted. As the normpdf function allows vectors as inputs, you can dump the inner for loop and use a vectorized approach instead. Always keep in mind, that * and / are the matrix multiplication and division in MATLAB. To get element-wise operators, use .* and ./ instead.
Note that in MATLAB it is better to preallocate all variables. As j runs from 1 to 100 and each step sets mu1(j+1) and mu2(j+1), mu1 and mu2 will have size 1x101. For r and R1 the size is clear, I think.
All together, this would give the following code:
data = [-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72,...
4.28, 4.92, 5.53, 0.06, 0.48, 1.01, 1.68, 1.80,...
3.25, 4.12, 4.60, 5.28, 6.22];
pi=0.546;
sigmas1 = 0.87;
sigmas2 = 0.77;
mu1 = zeros(1,101);
mu2 = zeros(1,101);
r = zeros(1,20);
R1 = zeros(20,100);
mu1(1) = 4.62;
mu2(1) = 1.06;
for j=1:100
r= pi*normpdf(data,mu2(j),sigmas2^(1/2)) ./ ...
((1-pi)*normpdf(data,mu1(j),sigmas1^(1/2)) + ...
pi*normpdf(data,mu2(j),sigmas2^(1/2)));
R1(:,j) = r;
mu1(j+1) = sum((1-r).*data)/sum(1-r);
mu2(j+1) = sum(r.*data)/sum(r);
end
figure;
subplot(1,2,1);
plot(mu1);
subplot(1,2,2);
plot(mu2);
If this doesn't work correctly for you, or you have any questions on the code, feel free to comment.
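For comparison, the same vectorization can be applied on the R side, since dnorm() also accepts vectors. The sketch below mirrors the MATLAB code above. It renames pi to p_mix (an illustrative name) so that R's built-in constant is not masked, and precomputes the standard deviations as sd1 and sd2.

```r
data <- c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72,
          4.28, 4.92, 5.53, 0.06, 0.48, 1.01, 1.68, 1.80,
          3.25, 4.12, 4.60, 5.28, 6.22)

p_mix  <- 0.546       # called pi in the original code
sd1    <- sqrt(0.87)  # sigmas1^(1/2)
sd2    <- sqrt(0.77)  # sigmas2^(1/2)
n_iter <- 100

# Preallocate, as in the MATLAB version
mu1 <- numeric(n_iter + 1)
mu2 <- numeric(n_iter + 1)
R1  <- matrix(0, length(data), n_iter)
mu1[1] <- 4.62
mu2[1] <- 1.06

for (j in seq_len(n_iter)) {
  # Responsibilities for all 20 observations at once (replaces the inner loop)
  num <- p_mix * dnorm(data, mu2[j], sd2)
  r   <- num / ((1 - p_mix) * dnorm(data, mu1[j], sd1) + num)
  R1[, j] <- r
  # Weighted-mean updates of the two component means
  mu1[j + 1] <- sum((1 - r) * data) / sum(1 - r)
  mu2[j + 1] <- sum(r * data) / sum(r)
}
```

Comparing mu1 and mu2 from this version with the MATLAB output is a simple way to check the translation step by step.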
