Hand-calculated variogram in R

I am trying to calculate a variogram in R by "hand" to create a vector of several values at different lags, k.
Essentially I am trying to calculate this:
G_k = Var(y_{t+k} − y_t) / Var(y_{t+1} − y_t),   k = 1, 2, …, 12
where y_t denotes the time series observations.
My attempt is below (diff.df is a vector of differenced data of length 72):
k <- seq(1,12,1)
x <- seq(1,length(diff.df)-length(k),1)
vario <- var(diff.df[x+k,] - diff.df[x,]) / var(diff.df[x+1,] - diff.df[x])
is producing the error "incorrect number of dimensions".
I'm unsure what I am doing wrong here. Any help would be appreciated.
Thanks.

I realized that I was indexing the vector as if it were a data frame ([x, ] instead of simply [x]).
vario <- numeric(70)              # holds the variogram value for each lag
for (i in 1:70) {
  x <- seq(1, 71 - i, 1)          # valid time indices for lag i
  vario[i] <- var(diff2.df[x + i] - diff2.df[x]) / var(diff2.df[x + 1] - diff2.df[x])
}
vario
plot(vario, type = 'l', xlab = 'Lag', ylab = 'Variogram')
This is the final working code, for anyone who cares.
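An equivalent vectorized sketch of the formula as stated (G_k for k = 1 to 12), assuming diff2.df is the differenced series used above, avoids the explicit loop:
# For each lag k, compute Var(y[t+k] - y[t]) over all valid t,
# then scale by the k = 1 value to get G_k.
lag.var <- sapply(1:12, function(k) {
  t <- 1:(length(diff2.df) - k)
  var(diff2.df[t + k] - diff2.df[t])
})
G <- lag.var / lag.var[1]
plot(1:12, G, type = 'l', xlab = 'Lag', ylab = 'Variogram')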


Mclust() - NAs in model selection

I recently tried to fit a Gaussian mixture model (GMM) in R on a multivariate matrix (400 observations of 196 variables) whose elements belong to known categories. The Mclust() function (from the mclust package) gave very poor results: around 30% of individuals were correctly classified, whereas with k-means the result exceeds 90%.
Here is my code:
library(mclust)
X <- read.csv("X.csv", sep = ",", header = TRUE)
y <- read.csv("y.csv", sep = ",")
gmm <- Mclust(X, G = 5)             # I want 5 clusters
cl_gmm <- gmm$classification
cl_gmm_lab <- cl_gmm
nclusters <- 5                      # number of clusters requested above
for (k in 1:nclusters){
  ii <- which(cl_gmm == k)          # individuals of group k
  counts <- table(y[ii])            # number of occurrences for each label
  imax <- which.max(counts)         # majority label
  maj_lab <- attributes(counts)$dimnames[[1]][imax]
  print(paste("Group ", k, ", majority label = ", maj_lab))
  cl_gmm_lab[ii] <- maj_lab
}
conf_mat_gmm <- table(y, cl_gmm_lab) # confusion matrix
The problem seems to come from the fact that every model other than "EII" (spherical, equal volume) is NA when looking at gmm$BIC.
So far I have not found a solution to this problem. Are you familiar with this issue?
Here is the link for the data: https://drive.google.com/file/d/1j6lpqwQhUyv2qTpm7KbiMRO-0lXC3aKt/view?usp=sharing
Here is the link for the labels: https://docs.google.com/spreadsheets/d/1AVGgjS6h7v6diLFx4CxzxsvsiEm3EHG7/edit?usp=sharing&ouid=103045667565084056710&rtpof=true&sd=true
I finally found the answer. GMMs simply cannot fit every covariance model when too many explanatory variables are involved. The right thing to do is to first reduce the dimensionality, choosing a number of dimensions that makes it possible to properly fit GMMs while preserving as much information as possible about the data.
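A minimal sketch of that approach (assuming X is the 400 x 196 matrix read above; keeping 10 principal components is purely illustrative and should be tuned, e.g. by looking at the explained variance):
library(mclust)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
X_red <- pca$x[, 1:10]             # keep the first 10 principal components
gmm_red <- Mclust(X_red, G = 5)    # the other covariance models can now be estimated
gmm_red$BIC                        # check that models other than EII are no longer NA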

Calculating Cook's Distance in R manually...running into issues with the for loop

I have been trying to calculate Cook's distance manually for a multiple linear regression dataset, but I am running into problems with the for loop. What I have been doing is this:
This is the original linear model and its associated fitted values, of length n.
fitted = lm10$fitted.values
This is the new n x n blank matrix I created to hold the new fitted values.
lev.mat <- matrix(rep(0, nrow(X.des)^2), nrow = nrow(X.des))
I wanted to save time, so I filled in the first column of the matrix manually.
newData = as.data.frame(X.des[-1, ])
newModel = lm(fev ~ ., data = newData - 1)
newFitted = newModel$fitted.values
newDist = c(fitted[1], newFitted)
lev.mat[, 1] = newDist
I then tried to fill in the rest of the columns of the lev.mat similarly, using the for loop.
for(i in 2:nrow(lev.mat)){
  newData = as.data.frame(X.des[-i, ])
  newModel = lm(fev ~ ., data = newData - 1)
  newFitted = newModel$fitted.values
  newDist = c(newFitted[1:(i-1)], fitted[i], newFitted[i:length(newFitted)])
  lev.mat[, i] = newDist
}
But I keep getting this error repeatedly:
Error in lev.mat[, i] <- newDist :
  number of items to replace is not a multiple of replacement length
I have been at this for three hours now, and it's getting frustrating. Can anybody point out the error and help me move along? My next steps are to calculate the difference between the original fitted values and each column of values in the new fitted-values matrix, sum the differences, and divide by the product of the number of predictors and the MSE.
Thanks!
Thanks a lot to @Harlan Nelson for providing me with a wonderful link! I used the background provided in the link to complete my work. Here is the rest of my code:
Hmat = hatvalues(lm10)                           # leverages h_i from the hat matrix
Leverage = Hmat / (1 - Hmat)                     # h_i / (1 - h_i)
mse = (lm10$residuals)^2 / var(lm10$residuals)   # squared residuals scaled by their variance
CooksD <- (1/6) * mse * Leverage                 # 1/6 because the model has 6 predictors
lm10 was the name of my linear model, and I had 6 predictors in the model. This helped me calculate Cook's Distance for the model. Thanks again!
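For comparison, here is a sketch of the textbook form of Cook's distance on the same model (assuming lm10 is the fitted model from above), cross-checked against R's built-in cooks.distance():
h <- hatvalues(lm10)                      # leverages
r <- rstandard(lm10)                      # internally studentized residuals
p <- length(coef(lm10))                   # number of estimated coefficients
D_manual <- (r^2 / p) * (h / (1 - h))     # D_i = r_i^2/p * h_i/(1 - h_i)
all.equal(unname(D_manual), unname(cooks.distance(lm10)))   # should be TRUE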

Creating a matrix in R

Imagine that I have two observed variables, interest and M2, both of length 1000.
I am trying to apply a function to each pair of observations of interest and M2 to generate a matrix of values called PCoCo. The matrix will have dimensions 1000 x 1000.
The purpose is to generate a 3D surface plot with x = interest, y = M2, z = PCoCo.
I have the following code:
#Creating a matrix from 2 variables; interest, M2, and a function CoCo.Price.
interest = seq(0,0.1, length = 1000)
M2 = seq(0,10, length = 1000)
#PCoCo = price of the coco, should be a matrix
PCoCo = matrix(nrow = length(interest), ncol = length(M2))
f = function(interest, M2){
  for(i in 1:length(interest)){
    for(j in 1:length(M2)){
      PCoCo[j,i] = CoCo.Price(C.p, c.r, m, N, q, interest[i], S, S.Trigger,
                              sigma, M2[j])
    }
  }
}
z = outer(interest, M2, f)
Before I pre-allocated PCoCo as a matrix, I used to get the following error when running f:
Error in PCoCo[j, i] <- matrix(CoCo.Price(C.p, c.r, m, N, q, interest[i], :
incorrect number of subscripts on matrix
Now I do not, but R is taking ages to compute the matrix.
I have tried reducing the dimensions to 100 x 100, but it is still taking very long.
The issue might be with the CoCo.Price function, as it is a roughly 100-line function which also has some for loops in it.
Any advice?
Thanks.
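For what it's worth, one reason outer(interest, M2, f) misbehaves is that outer expects a function that is vectorized over both arguments and returns one value per pair, whereas f above fills PCoCo by side effect and returns nothing. A sketch of the usual pattern (CoCo.Price and its other arguments C.p, c.r, m, N, q, S, S.Trigger, sigma are the user's own and assumed to be in scope):
price_one <- function(i, m2) {            # scalar wrapper around the pricing function
  CoCo.Price(C.p, c.r, m, N, q, i, S, S.Trigger, sigma, m2)
}
PCoCo <- outer(interest, M2, Vectorize(price_one))   # rows = interest, columns = M2
persp(interest, M2, PCoCo, xlab = "interest", ylab = "M2", zlab = "PCoCo")
If CoCo.Price itself is slow, this alone will not make a 1000 x 1000 grid fast; profiling CoCo.Price or reducing the grid remains the main lever.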

How to work with binary constraints in linear optimization?

I have two input matrices, dt (10 x 3) and wt (3 x 3), that I need to use to find the optimal decision matrix of the same dimension, Par (10 x 3), so as to maximize an objective function. The R code below gives some direction on the problem (sample inputs are used here):
#Input Matrices
dt <- matrix(runif(300),100,3)
wt <- matrix(c(1,0,0,0,2,0,0,0,1),3,3) #weights
#objective function
Obj <- function(Par) {
P = matrix(Par, nrow = 10, byrow=F) # Reshape
X = t((dt%*%wt)[,1])%*%P[,1]
Y = t((dt%*%wt)[,2])%*%P[,2]
Z = t((dt%*%wt)[,3])%*%P[,3]
as.numeric(X+Y+Z) #maximize
}
Now I am struggling to apply the following constraints to the problem :
1) Matrix, Par can only have binary values (0 or 1)
2) rowSums(Par) = 1 (basically, each row can have a 1 in only one of the three columns)
3) sum(Par[,1]) <= 5, sum(Par[,2]) <= 6, and sum(Par[,3]) <= 4
4) X/(X+Y+Z) < 0.35 and Y/(X+Y+Z) < 0.4 (X, Y, Z are defined in the objective function)
I tried coding the constraints with constrOptim, but I am not sure how to input binary and integer constraints. I am reading up on lpSolve but have not been able to figure it out. Any help is much appreciated. Thanks!
I believe this is indeed a MIP, so there are no issues with convexity. If I am correct, the model can look like this:
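A sketch of such a MIP, reconstructed from the constraints stated in the question (with a_ij denoting (dt %*% wt)[i,j], binary decision variables x_ij, and column-sum limits L_1, L_2, L_3):
maximize    sum_{i,j} a_ij * x_ij
subject to  sum_j x_ij = 1                                    for every row i
            sum_i x_ij <= L_j                                 for every column j
            sum_i a_i1 * x_i1 <= 0.35 * sum_{i,j} a_ij * x_ij
            sum_i a_i2 * x_i2 <= 0.40 * sum_{i,j} a_ij * x_ij
            x_ij in {0, 1}
The last two rows are the ratio constraints X/(X+Y+Z) < 0.35 and Y/(X+Y+Z) < 0.4 rewritten in linear form, with the strict inequalities relaxed to <=.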
This model can be easily transcribed into R. Note that LP/MIP solvers do not use functions for the objective and constraints (as opposed to NLP solvers); in R, one typically builds up matrices of LP coefficients.
Note: I had to make the limits on the column sums much larger (I used 50,60,40).
Based on Erwin's response, I am able to formulate the model using lpSolve in R. However, I am still struggling to add the final constraint to the model (the 4th constraint in my question above). Here is what I have been able to code so far:
#input dimension
r <- 10
c <- 3
#input matrices
dt <- matrix(runif(r*c),r,c)
wt <- matrix(c(1,0,0,0,2,0,0,0,1),3,3) #weights
#column controller
c.limit <- c(60,50,70)
#create structure for the solver (make.lp and friends come from the lpSolveAPI package)
library(lpSolveAPI)
ncol <- r*c
lp.create <- make.lp(ncol = ncol)
set.type(lp.create, columns = 1:ncol, type = "binary")
#create objective values
obj.vals <- as.vector(t(dt%*%wt))
set.objfn(lp.create, obj.vals)
lp.control(lp.create,sense='max')
#Add constraints to ensure sum of parameters for every row (rowSum) <= 1
for (i in 1:r){
  add.constraint(lp.create, xt = c(1,1,1),
                 indices = c(3*i-2, 3*i-1, 3*i), rhs = 1, type = "<=")
}
#Add constraints to ensure sum of parameters for every column (colSum) <= column limit (defined above)
for (i in 1:c){
  add.constraint(lp.create, xt = rep(1, r),
                 indices = seq(i, ncol, by = c), rhs = c.limit[i], type = "<=")
}
#Add constraints to ensure each column's objective contribution (t((dt%*%wt)[,i]) %*% P[,i]) stays within the limits defined in the problem
#NOT SURE HOW TO APPLY A CONSTRAINT THAT IS DEPENDENT ON THE OBJECTIVE FUNCTION
solve(lp.create)
get.objective(lp.create) #20
final.par <- matrix(get.variables(lp.create), ncol = c, byrow=T) # Reshape
Any help that can get me to the finish line is much appreciated :)
Thanks
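For completeness, the 4th constraint can be linearized and then added with add.constraint like the others: X <= 0.35*(X+Y+Z) is equivalent to 0.65*X - 0.35*Y - 0.35*Z <= 0, and similarly for Y. A sketch, assuming the same row-major variable ordering as obj.vals above and relaxing the strict inequalities to <=; these constraints would be added before solve(lp.create):
A <- dt %*% wt                                                 # per-variable objective coefficients
coef.X <- as.vector(t(cbind( 0.65*A[,1], -0.35*A[,2], -0.35*A[,3])))
coef.Y <- as.vector(t(cbind(-0.40*A[,1],  0.60*A[,2], -0.40*A[,3])))
add.constraint(lp.create, xt = coef.X, type = "<=", rhs = 0)   # X share at most 0.35
add.constraint(lp.create, xt = coef.Y, type = "<=", rhs = 0)   # Y share at most 0.40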

In R, how do I find the optimal variable to minimise the correlation between two datasets [duplicate]

Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has gotten too large. In Excel, I would use Solver.
I have 5 variables and I want to recreate a weighted average of these 5 variables so that they have the lowest correlation to a 6th variable.
Column A, B, C, D, E = random numbers
Column F = random number (to which I want to minimise the correlation)
Column G = A*w1 + B*w2 + C*w3 + D*w4 + E*w5
where w1 to w5 are the coefficients produced by Solver. In a separate cell, I would have correl(F, G).
This is all achieved with the following constraints in mind:
1. w1 to w5 have to be between 0 and 1
2. w1 + w2 + w3 + w4 + w5 = 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower argument (used with method "L-BFGS-B") that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w) cor(X %*% w / sum(w), Y)
# initial guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)
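A short check of the result, using the objects defined above: the normalized weights are non-negative and sum to one, and the achieved correlation can be read off directly.
round(optim.w, 3)            # optimal weights, scaled to sum to 1
sum(optim.w)                 # should be 1
cor(X %*% optim.w, Y)        # correlation attained with these weights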
