Please have a look at the factor scores returned by the robCompositions package in this example:
data(expenditures)
x <- expenditures
res.rob <- pfa(x, factors=1, score="regression")
According to the pfa help, since the covariance is not specified,
it is estimated internally from the isometric log-ratio (ilr) transformed data, but the data used for the factor analysis are back-transformed to the "clr" space.
So the clr-transformed data can be obtained as follows:
# ilr transformation
ilr <- function(x){
x.ilr=matrix(NA,nrow=nrow(x),ncol=ncol(x)-1)
for (i in 1:ncol(x.ilr)){
x.ilr[,i]=sqrt((i)/(i+1))*
log(((apply(as.matrix(x[,1:i]), 1, prod))^(1/i))/(x[,i+1]))
}
return(x.ilr)
}
#construct orthonormal basis:
#(matrix with ncol(x) rows and ncol(x)-1 columns)
V=matrix(0,nrow=ncol(x),ncol=ncol(x)-1)
for (i in 1:ncol(V)){
V[1:i,i] <- 1/i
V[i+1,i] <- (-1)
V[,i] <- V[,i]*sqrt(i/(i+1))
}
z=ilr(x) #ilr transformed data
y=z%*%t(V) #clr transformed data
Now the factor scores using the regression method might be calculated as follows:
loa<-c(0.970,0.830,0.986,0.876,0.977) # loadings taken from the res.rob object
facscores<- y%*%loa
head(facscores)
-0.009485110
0.009680645
0.008426665
-0.015401000
-0.003610644
-0.004584145
But calling res.rob$scores returns:
head(res.rob$scores)
Factor1
-755.2681
705.5309
4196.5652
-778.6955
-628.2141
-663.4534
So could you please check: am I doing something wrong, or is there possibly a bug in the pfa command?
Yours,
Hamid
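For comparison, here is a minimal sketch of how regression-method factor scores are computed by base R's factanal, using simulated data rather than the expenditures data. It illustrates that such scores are the standardized data multiplied by the inverse correlation matrix times the loadings, not the data multiplied by the loadings directly. Whether pfa applies the same formula, and with which scaling and robust covariance estimate, is an assumption that would need to be checked against the pfa source.

```r
# Simulated one-factor data (hypothetical; not the expenditures data)
set.seed(123)
n <- 200
f <- rnorm(n)
lambda <- c(0.9, 0.8, 0.7, 0.6, 0.5)
x <- sapply(lambda, function(l) l * f + rnorm(n, sd = sqrt(1 - l^2)))
colnames(x) <- paste0("v", 1:5)

fa <- factanal(x, factors = 1, scores = "regression")
L  <- unclass(fa$loadings)   # 5 x 1 loading matrix
z  <- scale(x)               # centred and scaled data
R  <- cor(x)                 # correlation matrix of the data

# Regression (Thomson) scores: z %*% R^{-1} %*% L
scores_manual <- z %*% solve(R, L)

# Naive product of data and loadings, analogous to y %*% loa in the question
scores_naive <- z %*% L

# Compare both versions with the scores factanal reports
all.equal(unname(scores_manual), unname(fa$scores))
all.equal(unname(scores_naive), unname(fa$scores))
```

If pfa follows a similar scheme, a difference in scale between y %*% loa and res.rob$scores would be expected, although that alone may not explain values in the hundreds or thousands.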
I am trying to reproduce the SPSS output for the significance of a linear trend among means when equal variances are not assumed.
I have gratefully used code from http://www-personal.umich.edu/~gonzo/coursenotes/file3.pdf to create a function for calculating separate variances, which, based on my searching, I understand to be the “equal variances not assumed” output in SPSS.
My problem/goal:
I am only assessing polynomial orthogonal trends (mostly linear). I want to adapt the code creating the function so that the contrast argument can take pre-made contrast matrices rather than manually specifying the coefficients each time (room for typos!).
Code for creating the function from the notes:
sepvarcontrast <- function(dv, group, contrast) {
means <- c(by(dv, group, mean))
vars <- c(by(dv, group, var))
ns <- c(by(dv, group, length))
ihat <- contrast %*% means
t.denominator <- sqrt(contrast^2 %*% (vars/ns))
t.welch <- ihat/ t.denominator
num.contrast <- ifelse(is.null(dim(contrast)),1,dim(contrast)[1])
df.welch <- rep(0, num.contrast)
if (is.null(dim(contrast))) contrast <- t(as.matrix(contrast))
for (i in 1:num.contrast) {
num <- (contrast[i,]^2 %*% (vars))^2
den <- sum((contrast[i,]^2 * vars)^2 / (ns-1))
df.welch[i] <- num/den
}
p.welch <- 2*(1- pt(abs(t.welch), df.welch))
result <- list(ihat = ihat, se.ihat = t.denominator, t.welch = t.welch,
df.welch = df.welch, p.welch = p.welch)
return(result)
}
I would like to be able to use the function like this:
# Create a polynomial contrast matrix for 5 groups, then save
contr.mat5 <- contr.poly(5)
# Calculate separate variance
sepvarcontrast(dv, group, contrast = contr.mat5)
I have tried those exact commands to see if they would work but receive Error in contrast %*% means : non-conformable arguments.
All suggestions are appreciated! I am still learning how to create a reprex...
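contr.poly(5) returns a 5 x 4 matrix with one contrast per column, whereas sepvarcontrast indexes contrasts by row (contrast[i,]), which is presumably why contrast %*% means is non-conformable. Below is a minimal sketch of a variant that accepts column-wise contrasts. The name sepvar_contrast_cols and the simulated data are hypothetical, and the degrees of freedom use the general Welch-Satterthwaite formula, which coincides with the loop in the function above only when all group sizes are equal.

```r
# Hypothetical helper: separate-variance (Welch-type) tests for contrasts
# supplied as COLUMNS, the layout that contr.poly() returns.
sepvar_contrast_cols <- function(dv, group, contrast) {
  cmat  <- t(as.matrix(contrast))                 # one contrast per row
  means <- as.numeric(tapply(dv, group, mean))
  vars  <- as.numeric(tapply(dv, group, var))
  ns    <- as.numeric(tapply(dv, group, length))
  stopifnot(ncol(cmat) == length(means))          # one coefficient per group

  vn   <- vars / ns                               # variance of each group mean
  ihat <- drop(cmat %*% means)                    # contrast estimates
  se   <- sqrt(drop(cmat^2 %*% vn))               # separate-variance SEs
  tval <- ihat / se
  # Welch-Satterthwaite degrees of freedom, one per contrast
  df   <- drop(cmat^2 %*% vn)^2 / drop(cmat^4 %*% (vn^2 / (ns - 1)))
  p    <- 2 * pt(abs(tval), df, lower.tail = FALSE)
  data.frame(ihat = ihat, se = se, t = tval, df = df, p = p)
}

# Made-up data: 5 groups of 8 with a linear trend in the means
set.seed(2)
group <- factor(rep(1:5, each = 8))
dv    <- rnorm(40, mean = as.integer(group))
res   <- sepvar_contrast_cols(dv, group, contr.poly(5))
res
```

With the original function unchanged, passing the transposed matrix, sepvarcontrast(dv, group, contrast = t(contr.mat5)), should address the conformability error in the same way.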
I'm trying to implement my own linear regression likelihood ratio test.
The test takes the residual sum of squares of a reduced model and of a full model, and compares the resulting statistic to a critical value of the F distribution.
However, I am having some trouble implementing the function, especially when dealing with dummy variables.
This is the dataset I am working with and testing the function on.
Here is the code so far:
The function inputs are the setup matrix mat, the response matrix (which has just one column), the indices (variables) being tested, and the significance level alpha of the test.
linear_regression_likelihood <- function(mat, response, indices, alpha) {
mat <- as.matrix(mat)
reduced <- mat[,c(1, indices)]
q <- 1 #set q = 1 just to test on data
p <- dim(mat)[2]
n <- dim(mat)[1]
f_stat <- qf(1-alpha, df1 = p-q, df2 = n-(p+1))
beta_hat_full <- qr.solve(t(mat)%*%mat)%*%t(mat)%*%response
y_hat_full <- mat%*%beta_hat_full
SSRes_full <- t(response - y_hat_full)%*%(response-y_hat_full)
beta_hat_red <- qr.solve(t(reduced)%*%reduced)%*%t(reduced)%*%response
y_hat_red <- reduced%*%beta_hat_red
SSRes_red <- t(response - y_hat_red)%*%(response-y_hat_red)
s_2 <- (t(response - mat%*%beta_hat_full)%*%(response - mat%*%beta_hat_full))/(n-p+1)
critical_value <- ((SSRes_red - SSRes_full)/(p-q))/s_2
print(critical_value)
if (critical_value > f_stat) {
return ("Reject H0")
}
else {
return ("Fail to Reject H0")
}
}
Here is the setup code, where I set up the matrix in the correct format. data is the CSV file after it has been read in.
data <- data[, 2:5]
mat <- data[, 2:4]
response <- data[, 1]
library(ade4)
df <-data.frame(mat$x3)
dummy <- acm.disjonctif(df)
dummy
mat <- cbind(1, mat[1:2], dummy)
linear_regression_likelihood(mat, response, 2:3, 0.05)
This is the error I keep getting.
Error in solve.default(as.matrix(c)) : system is computationally singular: reciprocal condition number = 1.63035e-18
I know it has to do with taking the inverse of the matrix after it is multiplied, which the function is unable to do. I thought it might be due to the dummy variables having values that are too small, but I am not sure of any other way to include the dummy variables.
The test I am doing is to check whether the factor variable x3 has any effect on the response y. The actual answer, which I verified using the built-in functions, is that we fail to reject the null hypothesis.
The error originates from the line
beta_hat_full <- qr.solve(t(mat)%*%mat)%*%t(mat)%*%response
If you go through your function step by step, you will see an error:
Error in qr.solve(t(mat) %*% mat) : singular matrix 'a' in solve
The problem here is that your model matrix does not have full column rank, which translates to your regression coefficients not being unique. This is a result of the way you "dummyfied" x3. In order to ensure full rank, you need to remove one dummy column (or manually remove the intercept).
In the following example I remove the A column from dummy which means that resulting x3 coefficients measure the effect of a unit-change in B, C, and D against A.
# Read data
data <- read.csv("data_hw5.csv")
data <- data[, 2:5]
# Extract predictor and response data
mat <- data[, 2:4]
response <- data[, 1]
# Dummify categorical predictor x3
library(ade4)
df <-data.frame(mat$x3)
dummy <- acm.disjonctif(df)
dummy <- dummy[, -1] # Remove A to have A as baseline
mat <- cbind(1, mat[1:2], dummy)
# Apply linear_regression_likelihood
linear_regression_likelihood(mat, response, 2:3, 0.05);
# [,1]
#[1,] 8.291975
#[1] "Reject H0"
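A hand-computed statistic like the one printed above can be cross-checked against anova() on nested lm fits. The sketch below does this on simulated data (data_hw5.csv is not available here, so the variables and coefficients are made up). It assumes the usual partial F-test, with numerator degrees of freedom equal to the number of dropped columns and denominator degrees of freedom equal to n minus the number of columns of the full design matrix, intercept included.

```r
# Simulated data (hypothetical stand-in for data_hw5.csv)
set.seed(1)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                x3 = factor(rep(LETTERS[1:4], each = 10)))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

X_full <- model.matrix(y ~ x1 + x2 + x3, data = d)  # intercept + 2 + 3 dummies
X_red  <- model.matrix(y ~ x1 + x2, data = d)       # x3 dropped

# Residual sum of squares from a least-squares fit via QR
rss <- function(X, y) sum(qr.resid(qr(X), y)^2)

r   <- ncol(X_full) - ncol(X_red)   # number of coefficients being tested
df2 <- n - ncol(X_full)             # residual df of the full model

F_hand <- ((rss(X_red, d$y) - rss(X_full, d$y)) / r) / (rss(X_full, d$y) / df2)
p_hand <- pf(F_hand, r, df2, lower.tail = FALSE)

# Reference: built-in comparison of the nested models
tab <- anova(lm(y ~ x1 + x2, data = d), lm(y ~ x1 + x2 + x3, data = d))
all.equal(F_hand, tab$F[2])
```

If the two disagree for the posted function, the degrees of freedom used in qf and in s_2, and which columns end up in reduced, would be the first things to compare.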
A note
The error could have been avoided if you had used base R's function model.matrix which ensures full rank when "dummyfying" categorical variables (model.matrix is also implicitly called in lm and glm to deal with categorical, i.e. factor variables).
Take a look at
mm <- model.matrix(y ~ x1 + x2 + x3, data = data)
which by default omits the first level of factor variable x3. mm is identical to mat after (correct) "dummification".
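The rank deficiency can be seen directly on a small simulated example (the data below are made up; only the column names mirror the question). With an intercept plus a full set of dummies the design matrix has one more column than its rank, while the model.matrix version has full column rank.

```r
# Hypothetical data with a 4-level factor x3
set.seed(42)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20),
                x3 = factor(rep(c("A", "B", "C", "D"), 5)))

# Intercept + one dummy per level: the dummies sum to the intercept column
all_dummies <- model.matrix(~ x3 - 1, data = d)
X_bad <- cbind(1, d$x1, d$x2, all_dummies)
c(columns = ncol(X_bad), rank = qr(X_bad)$rank)

# model.matrix drops the first level (treatment contrasts by default)
X_ok <- model.matrix(~ x1 + x2 + x3, data = d)
c(columns = ncol(X_ok), rank = qr(X_ok)$rank)
colnames(X_ok)
```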
I'm trying to reproduce this vector (time series) calculation code:
gamma.parameters<- fitdistr(may_baseline_3months[may_baseline_3months>0],"gamma")
into a raster calculation code.
The original code fits a gamma distribution by maximum likelihood estimation to a vector (time series), may_baseline_3months.
And what I want to do is to calculate the same thing but with a raster stack.
I tried doing this with the calc() function:
f1<-function(x)
{
library(MASS)
return(fitdistr(x,"gamma"))
}
gamma.parameters<- calc(x = may_baseline_3months,fun = f1)
Error in .calcTest(x[1:5], fun, na.rm, forcefun, forceapply) :
cannot use this function
but it didn't work.
Note: My raster stack has only 4 layers.
EDIT
You can download an example dataset here: spi
The fitdistr step is part of the procedure for my main goal: I'm trying to calculate the Standardized Precipitation Index. I have already done it with a time series of 30 years of monthly precipitation.
Here is the code for the time series, up to the line where I'm stuck:
data<-read.csv("guatemala_spi.csv",header = T,sep=";")
dates<-data[,1]
rain_1month<-data[,2]
rain_3months<-0
#Setting the first 2 elements to NA because I'm going to calculate the accumulated rainfall over 3 months
for (i in c(1:2)) {
rain_3months[i]<-NA
}
#Accumulating the rainfall for the rest of the data
number_of_months<-length(rain_1month)
for (j in c(3:number_of_months))
{
rain_3months[j]<-0.0
for (i in c(0:2))
{
rain_3months[j] = rain_3months[j] + rain_1month[j-i]
}
}
#Extracting a time-series for the month of interest (May)
may_rain_3months<-rain_3months[substr(dates,5,6)=="05"]
dates_may<-dates[substr(dates,5,6)=="05"]
number_of_years<-length(dates_may)
#Fitting the gamma distribution by maximum likelihood estimation
start_year<-1971
end_year<-2010
start_index<-which(substr(dates_may,1,4)==start_year)
end_index<-which(substr(dates_may,1,4)==end_year)
may_baseline_3months<-may_rain_3months[start_index:end_index]
library(MASS)
gamma.parameters<-fitdistr(may_baseline_3months[may_baseline_3months>0],"gamma")
That last line is the one that I'm having trouble computing for a raster stack.
Here's what I have so far in raster form:
Example multi-layer raster here (Monthly precipitation 2001 to 2004, 48 layers in total)
#Initiating a dates vector
dates<-c("200101","200102","200103","200104","200105","200106","200107","200108","200109","200110","200111","200112",
"200201","200202","200203","200204","200205","200206","200207","200208","200209","200210","200211","200212",
"200301","200302","200303","200304","200305","200306","200307","200308","200309","200310","200311","200312",
"200401","200402","200403","200404","200405","200406","200407","200408","200409","200410","200411","200412")
#Initiating a NA raster
rain_3months_1layer<-raster(nrow=1600, ncol=1673,extent(-118.4539, -34.80395, -50, 30),res=c(0.05,0.05))
values(rain_3months_1layer)<-NA
#Creating a raster stack NA of 48 layers
rain_3months<-stack(mget(rep( "rain_3months_1layer" , 48 )))
#Reading the data
rain_1month <- stack("chirps_rain_1month.tif")
#Accumulating the rainfall
number_of_months<-nlayers(rain_1month)
for (j in c(3:number_of_months))
{
rain_3months[[j]]<-0.0
for (i in c(0:2))
{
rain_3months[[j]] = rain_3months[[j]] + rain_1month[[j-i]]
}
}
#Extracting the raster for the month of interest (May)
may_rain_3months<-stack(rain_3months[[which(substr(dates,5,6)=="05", arr.ind = T)]])
dates_may<-dates[substr(dates,5,6)=="05"]
number_of_years<-length(dates_may)
#Fitting the gamma distribution by maximum likelihood estimation
start_year<-2001
end_year<-2004
start_index<-which(substr(dates_may,1,4)==start_year)
end_index<-which(substr(dates_may,1,4)==end_year)
may_baseline_3months<-stack(may_rain_3months[[start_index:end_index]])
library(MASS)
f1<-function(x)
{
library(MASS)
return(fitdistr(x,"gamma"))
}
gamma.parameters<- calc(x = may_baseline_3months,fun = f1)
I can't get calc() to apply fitdistr() to the raster stack.
You need to make a function that calc can use. Your function f1 returns an object of class fitdistr. The calc function does not know what to do with that:
library(MASS)
set.seed(0)
x <- runif(10)
f1 <- function(x) {
return(fitdistr(x,"gamma"))
}
a <- f1(x)
class(a)
# [1] "fitdistr"
a
# shape rate
# 4.401575 6.931571
# (1.898550) (3.167113)
You need a function that returns numbers. Like f2:
f2 <- function(x) {
fitdistr(x,"gamma")$estimate
}
b <- f2(x)
class(b)
#[1] "numeric"
b
# shape rate
#4.401575 6.931571
Test f2 with calc:
library(raster)
r <- raster(nrows=10, ncols=10) # small template raster
s <- stack(lapply(1:12, function(i) setValues(r, runif(ncell(r)))))
r <- calc(s, f2)
I assume that this answers your question. I cannot be sure because your question is way too complex. The first thing you need to do with a problem like this is to create a simple example like I have done above.
Next question
Error in stats::optim(x = c(7, 7, 7, 7), par = list(shape = Inf, rate
= Inf), : non-finite value supplied by optim.
That is a different issue: you are providing fitdistr with values it cannot deal with. You can add a try clause to skip over those. You could also identify the cells in which this happens, and what their values are, to see if there is something else you should do.
f3 <- function(x) {
x <- try (fitdistr(x,"gamma")$estimate, silent=TRUE )
if (inherits(x, 'try-error')) { c(-9999, -9999) } else { x }
}
x[1] <- NA
f2(x)
#Error in fitdistr(x, "gamma") : 'x' contains missing or infinite values
f3(x)
#[1] -9999 -9999
Note that the number of values returned by f3 must always be the same, in this case two. Here I use -9999 so that you can identify the cells. You can also use NA.
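Putting these pieces together, a per-cell function might look like the sketch below. It keeps the x > 0 filter from the time-series code in the question and always returns two values. The name fit_gamma_cell, the min_n threshold, and the test matrix are hypothetical, and the apply() call over a cells-by-layers matrix is only a stand-in for calc() so that the example runs without the raster package.

```r
library(MASS)

# Hypothetical helper: always returns c(shape, rate); NA when no fit is possible
fit_gamma_cell <- function(x, min_n = 3) {
  out <- c(shape = NA_real_, rate = NA_real_)
  x <- x[is.finite(x) & x > 0]          # drop NA/Inf and non-positive values
  # Too few values, or all values identical (zero variance): skip the fit
  if (length(x) < min_n || length(unique(x)) < 2) return(out)
  est <- tryCatch(suppressWarnings(fitdistr(x, "gamma")$estimate),
                  error = function(e) NULL)
  if (!is.null(est)) out[] <- est
  out
}

fit_gamma_cell(c(7, 7, 7, 7))           # constant cell, as in the optim error

# Stand-in for a raster stack: rows are cells, columns are layers
set.seed(0)
m <- matrix(rgamma(5 * 30, shape = 2, rate = 1), nrow = 5)
m[1, ] <- 7                             # a constant cell
m[2, 1] <- NA                           # a cell with one missing value
params <- t(apply(m, 1, fit_gamma_cell))
params
```

With the raster package, the same function could presumably be passed as calc(may_baseline_3months, fit_gamma_cell); note that with only 4 layers per cell the estimates will be very unstable.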
This is a follow-up question to a previous post (How to modify slots lme4 >1.0). I have a similar pairwise data structure and want the random effect to consider both "pops" in the pair. I have a functional random intercept model using the code previously suggested:
dat <- data.frame(pop1 = c(2,1,1,1,1,3,2,2,2,3,5,3,5,4,6),
pop2 = c(1,3,4,5,6,2,4,5,6,4,3,6,4,6,5),
X = c(20,25,18,40,36,70,68,72,78,76,97,100,115,110,108),
Y = c(18,16,15,40,22,18,18,18,18,45,10,47,67,5,6))
#build random effects matrix
Zl<-lapply(c("pop1","pop2"),function(nm)Matrix:::fac2sparse(dat[[nm]],"d",drop=FALSE))
ZZ<-Reduce("+",Zl[-1],Zl[[1]])
#specify model structure
mod<-lFormula(Y~X+(1|pop1),data=dat,REML=TRUE)
#replace slot
mod$reTrms$Zt <- ZZ
#fit model
dfun<-do.call(mkLmerDevfun,mod)
opt<-optimizeLmer(dfun)
mkMerMod(environment(dfun),opt,mod$reTrms,fr=mod$fr)
However, when attempting to add a random slope variable:
mod2<-lFormula(Y~X+(1+X|pop1),data=dat,REML=TRUE)
mod2$reTrms$Zt <- ZZ
dfun<-do.call(mkLmerDevfun,mod2)
This results in the same error identified in the previous post (where the issue was calling the wrong data frame):
"Error in Lambdat %*% Ut :
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82"
Viewing the lm fit for each pop:
plot(1,type="n",xlim=c(0,150),ylim=c(0,75),ylab = "Y",xlab="X")
for(i in 1:length(unique(c(dat$pop1,dat$pop2)))){
subdat<-dat[which(dat$pop1==i | dat$pop2==i),]
out<-summary(lm(subdat$Y~subdat$X))
x=subdat$X
y=x*out$coefficients[2,1]+out$coefficients[1,1]
lines(x,y,col=i)
}
legend(125,60,1:6,col=1:6,lty=1,title="Pop")
dat <- data.frame(pop1 = c(2,1,1,1,1,3,2,2,2,3,5,3,5,4,6),
pop2 = c(1,3,4,5,6,2,4,5,6,4,3,6,4,6,5),
X = c(20,25,18,40,36,70,68,72,78,76,97,100,115,110,108),
Y = c(18,16,15,32,22,29,32,38,44,45,51,47,67,59,61))
It helps to try to understand what the original code is actually doing:
## build random effects matrix
## 1. sparse dummy-variable matrices for each population ID
Zl <- lapply(dat[c("pop1","pop2")],
Matrix::fac2sparse,to="d",drop.unused.levels=FALSE)
## 2. take the sum of all components of the list of dummy-variable matrices ...
ZZ <- Reduce("+",Zl[-1],Zl[[1]])
The Reduce form is convenient in general if we have a long list, but it helps to see that in this case it's just Zl[[1]]+Zl[[2]] ...
all.equal(Zl[[1]]+Zl[[2]],ZZ) ## TRUE
What does this RE structure look like?
library(gridExtra)
grid.arrange(
image(t(Zl[[1]]),main="pop 1",sub="",xlab="pop",ylab="obs"),
image(t(Zl[[2]]),main="pop 2",sub="",xlab="pop",ylab="obs"),
image(t(ZZ),main="combined",sub="",xlab="RE",ylab="obs"),
nrow=1)
For the random slope, I think we want to take each filled element of ZZ and replace it with the X value observed for the corresponding observation/row of dat. The indexing here is a bit obscure: in this case it boils down to there being 2 filled values in each row of Z/column of Zt (the @p slot of the sparse matrix gives a zero-indexed pointer to the first non-zero element in each column ...)
vals <- dat$X[rep(1:(length(ZZ@p)-1),diff(ZZ@p))]
ZZX <- ZZ
ZZX@x <- vals
image(t(ZZX))
library(lme4)
mod <- lFormula(Y~X+(X|pop1),data=dat,REML=TRUE)
## replace slot
mod$reTrms$Zt <- rbind(ZZ,ZZX)
## fit model
dfun <- do.call(mkLmerDevfun,mod)
opt <- optimizeLmer(dfun)
m1 <- mkMerMod(environment(dfun),opt,mod$reTrms,fr=mod$fr)
This seems to work, but you should certainly check it with your own knowledge of what's supposed to be going on here ...
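One way to check the slot-based indexing is to build the same slope matrix more transparently, by scaling each column (observation) of ZZ by its X value, and comparing the two. The sketch below uses only the Matrix package and the data from the question; the names ZZX_slot and ZZX_diag are hypothetical.

```r
library(Matrix)

dat <- data.frame(pop1 = c(2,1,1,1,1,3,2,2,2,3,5,3,5,4,6),
                  pop2 = c(1,3,4,5,6,2,4,5,6,4,3,6,4,6,5),
                  X = c(20,25,18,40,36,70,68,72,78,76,97,100,115,110,108))

# Indicator matrices (levels x observations) and their sum
Zl <- lapply(dat[c("pop1", "pop2")], fac2sparse,
             to = "d", drop.unused.levels = FALSE)
ZZ <- Zl[[1]] + Zl[[2]]

# Slot-based construction, as in the answer
ZZX_slot <- ZZ
ZZX_slot@x <- dat$X[rep(seq_len(ncol(ZZ)), diff(ZZ@p))]

# Transparent alternative: multiply column j of ZZ by X[j]
ZZX_diag <- ZZ %*% Diagonal(x = dat$X)

all.equal(as.matrix(ZZX_slot), as.matrix(ZZX_diag), check.attributes = FALSE)
dim(rbind(ZZ, ZZX_slot))   # 12 rows (6 intercepts + 6 slopes) by 15 observations
```

It may also be worth comparing the row order of the stacked matrix with the Zt that lFormula builds for (X|pop1): as far as I understand mkReTrms, lme4 groups the intercept and slope rows of each level together rather than stacking all intercepts before all slopes, so this is an assumption to verify before trusting the fitted variance components.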