I ran a model for a multi-environment analysis using the mmer() function of the sommer package, but when I try to get the BLUPs for the random effects, the following error is shown:
predict.mmer(object = mix, classify = "Local")
fixed-effect model matrix is rank deficient so dropping 5 columns / coefficients
iteration LogLik wall cpu(sec) restrained
1 -175.248 18:50:45 2 1
2 -175.248 18:50:47 4 1
3 -175.248 18:50:48 5 1
4 -175.248 18:50:50 7 1
fixed-effect model matrix is rank deficient so dropping 5 columns / coefficients
Error in modelForMatrices$Beta[unlist(betas0[fToUse]), 1] : subscript out of bounds
In addition: Warning message:
In x[...] <- m :number of items to replace is not a multiple of replacement length
The model I fitted with the sommer package was:
mix <- mmer(Peso ~ Local:Test + Local,
            random = ~ vs(us(Local), Genotipo) + Local:Bloco,
            rcov = ~ units, data = dados, tolparinv = 0.7)
I have three environments (Local), about 250 genotypes tested in each environment (Genotipo), four blocks in each environment (Bloco), and about 20 check treatments repeated in all environments (Test). The response variable is cassava root yield (Peso).
Since I cannot see the dimensions of the matrices/tables used inside the function, what should I use to obtain the predictions?
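A minimal sketch (not part of the original post) of how the fitted object itself can be inspected instead of going through predict.mmer(); the element names of mix$U depend on the sommer version and on how the random terms were specified:
# Fixed-effect estimates and random-effect BLUPs are stored in the fit:
mix$Beta                      # fixed-effect coefficients
names(mix$U)                  # one element per random term
str(mix$U, max.level = 2)     # BLUPs, stored per term (and per trait)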
Best
Helcio
I am training a multinomial regression model using cv.glmnet, and the number of features and number of classes I have been using has been increasing. On previous versions of my training set, where I had fewer features and classes, my model converged for all lambdas after increasing the value of maxit.
However, with the training data I am using now, I get the following errors even when I increase maxit to 10^7.
Warning messages:
1: from glmnet C++ code (error code -13); Convergence for 13th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: from glmnet C++ code (error code -14); Convergence for 14th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
3: from glmnet C++ code (error code -13);
Convergence for 13th lambda value not reached after maxit=100000
iterations; solutions for larger lambdas returned
...
Here is code that recreates these warnings:
load(url("https://github.com/DylanDijk/RepoA/blob/main/reprod_features.rda?raw=true"))
load(url("https://github.com/DylanDijk/RepoA/blob/main/reprod_response.rda?raw=true"))
# Training the model:
model_multinom_cv = glmnet::cv.glmnet(x = reprod_features, y = reprod_response,
family = "multinomial", alpha = 1)
I was wondering if anyone had any advice on getting a model to converge for all lambda values in the path.
Some options I have been thinking of trying:
1) Change some of the internal parameters listed in the glmnet vignette.
2) Select a lambda sequence myself and then increase maxit further (a rough sketch of this is shown after this list). I have tried maxit = 10^8 without defining a lambda sequence, but this did not finish training after multiple hours.
3) Choose a subset of the features. I have trained the model with a small subset of the features and the model converged for more lambda values, but I would rather use all of the features, so I want to explore whether there are other options first.
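A rough sketch of option 2) (not part of the original post): a shorter, user-supplied lambda sequence plus a looser convergence threshold can be passed to cv.glmnet(), which forwards lambda, thresh and maxit to glmnet(). The values below are placeholders, chosen to stop the path just before the lambdas that failed to converge.
# Assumes reprod_features and reprod_response have been loaded as above.
lambda_seq <- exp(seq(log(0.18), log(0.06), length.out = 30))
model_multinom_cv <- glmnet::cv.glmnet(
  x = reprod_features, y = reprod_response,
  family = "multinomial", alpha = 1,
  lambda = lambda_seq,   # stop before the small, hard-to-fit lambdas
  thresh = 1e-5,         # looser convergence tolerance (default 1e-7)
  maxit = 1e6            # iteration cap per fit
)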
Lambda path returned
Below is the lambda path returned after training my model:
> model_multinom_cv$glmnet.fit
Call: glmnet(x = train_sparse, y = train_res, trace.it = 1,
family = "multinomial", alpha = 1)
Df %Dev Lambda
1 0 0.00 0.17730
2 1 1.10 0.16150
3 2 1.88 0.14720
4 5 4.72 0.13410
5 8 8.52 0.12220
6 14 13.49 0.11130
7 21 19.90 0.10150
8 27 25.83 0.09244
9 31 30.63 0.08423
10 36 34.56 0.07674
11 41 38.61 0.06993
12 45 41.89 0.06371
I have a series of hazard rates at two points (a low and a high point) on the curve, with corresponding standard errors. I calculate the hazard ratio by dividing the high-point hazard rate by the low-point hazard rate; this is the hratio column. In the next column I would like to show the p-value from a Wald test of whether the ratio is different from 1.
I have tried doing this with wald.test() from the aods3 package, but I keep getting an error message. It seems that the code only allows for the comparison of two related regression models.
How would you go about doing this?
> wald
fit.low se.low fit.high se.high hratio
1 0.09387638 0.002597817 0.09530283 0.002800329 0.9850324
2 0.10941588 0.002870383 0.10831292 0.003061924 1.0101831
3 0.02549611 0.001054303 0.02857411 0.001368525 0.8922802
4 0.02818208 0.000917136 0.02871669 0.000936373 0.9813833
5 0.04857652 0.000554676 0.04897211 0.000568229 0.9919222
6 0.05121328 0.000565592 0.05142951 0.000554893 0.9957956
> library(aods3)
> wald$pv <- wald.test(b=wald$hratio)
Error in wald.test(b = wald$hratio) :
One of the arguments Terms or L must be used.
The arguments are defined as L = NULL, Terms = NULL, Sigma = vcov(b).
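For what it is worth, one common way to get a Wald p-value here (not what the wald.test() call above does) is a z-test on the log of the ratio, with a delta-method standard error, assuming the two rates are estimated independently. A sketch using the wald data frame above:
# SE(log HR)^2 is approximately (se.low/fit.low)^2 + (se.high/fit.high)^2
se_log_hr <- sqrt((wald$se.low / wald$fit.low)^2 + (wald$se.high / wald$fit.high)^2)
z <- log(wald$hratio) / se_log_hr
wald$pv <- 2 * pnorm(-abs(z))   # two-sided p-value for H0: ratio = 1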
My data frame looks like:
head(bush_status)
distance status count
0 endemic 844
1 exotic 8
5 native 3
10 endemic 5
15 endemic 4
20 endemic 3
The count data are not normally distributed. I'm trying to fit a generalized additive model to my data in two ways so I can use anova() to see whether the p-value supports m2.
m1 <- gam(count ~ s(distance) + status, data=bush_status, family="nb")
m2 <- gam(count ~ s(distance, by=status) + status, data=bush_status, family="nb")
m1 works fine, but m2 sends the error message:
"Error in smoothCon(split$smooth.spec[[i]], data, knots, absorb.cons,
scale.penalty = scale.penalty, :
Can't find by variable"
This is pretty beyond me so if anyone could offer any advice that would be much appreciated!
From your comments it became clear that you passed a character variable to by in the smoother. You must pass a factor variable there. This has been a frequent gotcha for me too and I consider it a design flaw (because base R regression functions deal with character variables just fine).
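A minimal sketch of that fix, assuming bush_status is the data frame from the question and mgcv is loaded:
library(mgcv)
# Convert the character column to a factor before using it in by =
bush_status$status <- factor(bush_status$status)
m2 <- gam(count ~ s(distance, by = status) + status,
          data = bush_status, family = "nb")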
I'm kind of new to R and machine learning in general, so apologies if this seems stupid!
I'm using the e1071 package to tune the parameters of various models. My dataset is very unbalanced and I would like the error criterion to be the balanced error rate, NOT the overall classification error. However, I'm stumped as to how to achieve this.
Here is my code:
#Find optimal value 'k' value for k-NN model (feature subset).
c <- data_train_sub[1:13]
d <- data_train_sub[,14]
knn2 <- tune.knn(c, d, k = 1:10,
                 tunecontrol = tune.control(sampling = "cross", performances = TRUE,
                                            sampling.aggregate = mean))
summary(knn2)
plot(knn2)
Which returns this:
Parameter tuning of ‘knn.wrapper’:
- sampling method: 10-fold cross validation
- best parameters:
k
1
- best performance: 0.001190476
- Detailed performance results:
k error dispersion
1 1 0.001190476 0.003764616
2 2 0.005952381 0.006274360
3 3 0.003557423 0.005728122
4 4 0.005924370 0.008352124
5 5 0.005938375 0.008407043
6 6 0.005938375 0.008407043
7 7 0.007128852 0.008315090
8 8 0.009495798 0.009343555
9 9 0.008305322 0.009751997
10 10 0.008319328 0.009795292
Has anyone any experience of altering the error being assessed in this function?
Look at the class.weights argument of the svm() function:
a named vector of weights for the different classes, used for asymmetric class sizes...
The weights can easily be calculated, for example, as:
class.weights = table(Xcal$species)/sum(table(Xcal$species))
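A short sketch of how that vector could then be passed to svm(); Xcal and its species column come from the snippet above, everything else is illustrative:
library(e1071)
# Named numeric vector of per-class weights (names must match the factor levels)
wts <- table(Xcal$species) / sum(table(Xcal$species))
wts <- setNames(as.numeric(wts), names(wts))
svm_fit <- svm(species ~ ., data = Xcal, class.weights = wts)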
I am trying to build a for() loop to manually conduct leave-one-out cross-validation for a GLMM fit with the lmer() function from the lme4 package. I need to remove an individual, fit the model, use the beta coefficients to predict a response for the individual that was withheld, and repeat the process for all individuals.
I have created some test data to tackle the first step of simply leaving an individual out, fitting the model and repeating for all individuals in a for() loop.
The data have a binary (0,1) Response, an IndID that classifies 4 individuals, a Time variable, and a Binary variable. There are N=100 observations. The IndID is fit as a random effect.
require(lme4)
#Make data
Response <- round(runif(100, 0, 1))
IndID <- as.character(rep(c("AAA", "BBB", "CCC", "DDD"),25))
Time <- round(runif(100, 2,50))
Binary <- round(runif(100, 0, 1))
#Make data.frame
Data <- data.frame(Response, IndID, Time, Binary)
Data <- Data[with(Data, order(IndID)), ] #**Edit**: Added code to sort by IndID
#Look at head()
head(Data)
Response IndID Time Binary
1 0 AAA 31 1
2 1 BBB 34 1
3 1 CCC 6 1
4 0 DDD 48 1
5 1 AAA 36 1
6 0 BBB 46 1
#Build model with all IndID's
fit <- lmer(Response ~ Time + Binary + (1 | IndID), data = Data,
            family = binomial)
summary(fit)
As stated above, my hope is to get four model fits, one with each IndID left out, in a for() loop. This is a new type of application of the for() command for me, and I quickly reached the limits of my coding abilities. My attempt is below.
fit <- list()
for (i in Data$IndID){
fit[[i]] <- lmer(Response ~ Time + Binary + (1|IndID), data = Data[-i],
family=binomial)
}
I am not sure storing the model fits as a list is the best option, but I had seen it on a few other help pages. The above attempt results in the error:
Error in -i : invalid argument to unary operator
If I remove the [-i] subscript from the data = Data argument, the code runs four fits, but the data for each individual are not removed.
Just as an FYI, I will need to further expand the loop to:
1) extract the beta coefficients, 2) apply them to the X matrix of the individual that was withheld, and 3) compare the predicted values (after a logit transformation) to the observed values. As all steps are needed for each IndID, I hope to build them into the loop. I am providing the extra details in case my planned future steps inform the more immediate question of leave-one-out model fits.
Thanks as always!
The problem you are having is because Data[-i] is expecting i to be an integer index. Instead, i is either AAA, BBB, CCC or DDD. To fix the loop, set
data = Data[Data$IndID != i, ]
in your model fit.
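Putting that together, a sketch of the corrected loop, iterating over the unique IDs and using glmer() (the current lme4 function for a binomial GLMM):
fit <- list()
for (i in unique(Data$IndID)) {
  # Drop every row belonging to individual i, then fit on the rest
  fit[[i]] <- glmer(Response ~ Time + Binary + (1 | IndID),
                    data = Data[Data$IndID != i, ], family = binomial)
}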