I have this data frame, to which I applied the multinom function:
df = data.frame(x = c('a','a','b','b','c','c','d','d','d','e','e','f','f',
                      'f','f','g','g','g','h','h','h','h','i','i','j','j'),
                y = c(1,2,1,3,1,2,1,4,5,1,2,2,3,4,5,1,1,2,1,2,2,3,2,2,3,4))
df$y = factor(df$y, ordered = TRUE)
nnet::multinom(y ~ x, data = df)
When I check the output, all the variables appear with their coefficients (meaning everything is fine):
Coefficients:
(Intercept) xb xc xd xe xf
2 -6.045294e-05 -31.83512 3.800915e-05 -36.67053 3.800915e-05 25.00515
3 -1.613311e+01 16.13310 -1.725649e+01 -21.06832 -1.725649e+01 41.13825
4 -1.692352e+01 -14.71119 -1.428100e+01 16.92351 -1.428100e+01 41.92865
5 -2.129358e+01 -10.49359 -1.002518e+01 21.29353 -1.002518e+01 46.29867
xg xh xi xj
2 -0.6931261 0.6932295 40.499799 -25.311410
3 -24.0387863 16.1332150 -8.876562 45.191730
4 -20.2673490 -16.0884760 -6.394423 45.982129
5 -15.1755064 -11.8589447 -4.563793 -6.953942
But with my original data frame (I will share only the output), which is coded in the same way as df (the dependent and independent variables are ordinal factors) and where the analysis runs fine, I get this output when it comes to interpretation:
Coefficients:
(Intercept) FIES_R.L FIES_R.Q FIES_R.C FIES_R^4 FIES_R^5
2 -0.09594409 -1.303256 0.03325169 -0.1753022 -0.46026668 -0.282463422
3 -0.18587599 -1.469957 0.42005569 -0.2977628 0.00508412 0.003068678
4 -0.58189239 -2.875183 0.33128994 -0.6787992 0.11145099 0.239368520
5 -2.68727952 -10.178604 -5.12515249 -5.8454920 -3.13775961 -1.820629143
FIES_R^6 FIES_R^7 FIES_R^8
2 -0.2179067 -0.1000471 -0.1489342
3 0.1915476 -0.5483707 -0.2565626
4 0.2585801 0.3821566 -0.2679774
5 -0.5562958 -0.6335412 -0.7205215
I don't want FIES_R.L, FIES_R.Q and FIES_R.C. I want them as: FIES_R_1, FIES_R_2, FIES_R_3, FIES_R_4, FIES_R_5, FIES_R_6, FIES_R_7, FIES_R_8.
Why do I have such an output, given that both data frames contain ordinal categorical variables, and both the x variable and the FIES variable have many categories? Thanks.
I just figured it out: it is because the independent variable is an ordinal factor, i.e. FIES in my dataset is an ordinal factor, and R labels the polynomial contrasts of an ordered factor .L, .Q, .C, ^4, and so on. When I used the argument ordered = FALSE the problem was solved.
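To see why, here is a minimal sketch with a generic ordered factor (not the actual FIES_R data): R assigns polynomial contrasts (.L, .Q, .C, ^4, ...) to ordered factors and treatment (dummy) contrasts to unordered ones.
f_ord <- factor(c(1, 2, 3, 4), ordered = TRUE)
contrasts(f_ord)    # columns .L, .Q, .C: polynomial contrasts
f_unord <- factor(c(1, 2, 3, 4), ordered = FALSE)
contrasts(f_unord)  # treatment contrasts: one dummy column per non-reference level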
You can change the coefnames "by hand":
mod = nnet::multinom(y ~ x, data = df)
# drop the intercept before renaming; substr() keeps the first 6 characters of
# each name ("FIES_R" in your data) and the suffix numbers the coefficients
mod$vcoefnames = c("(Intercept)",
                   paste0(substr(mod$vcoefnames[-1], 1, 6), "_",
                          seq_along(mod$vcoefnames[-1])))
I run a CV Lasso with the cv.gamlr function in R. I can get the coefficients for the lambdas that correspond to the “1se” or “min” criterion.
set.seed(123)
lasso <- cv.gamlr(x = X, y = Y, family = 'binomial')
coef(lasso, select = "1se")
coef(lasso, select = "min")
But what if I want to obtain the coefficients for a specific lambda, stored in the lasso$gamlr$lambda vector? Is it possible to obtain them?
For example, to get the coefficients for the first lambda in the model... Something like this:
lambda_100 <- lasso$gamlr$lambda[100]
coef(lasso, select = lambda_100)
Of course, this throws the following error:
Error in match.arg(select) : 'arg' must be NULL or a character vector
Thanks :)
The coefficients are stored under lasso$gamlr$beta; in your example, you can access them like this:
library(gamlr)
x <- matrix(runif(500), ncol = 5)
y <- rnorm(100)
cvfit <- cv.gamlr(x, y, gamma = 1)
dim(cvfit$gamlr$beta)
[1] 5 100
length(cvfit$gamlr$lambda)
[1] 100
cvfit$gamlr$lambda[100]
seg100
0.00125315
cvfit$gamlr$beta[, 100, drop = FALSE]
5 x 1 sparse Matrix of class "dgCMatrix"
seg100
1 0.12960060
2 -0.16406246
3 -0.46566731
4 0.08197053
5 -0.54170494
Or if you prefer it in a vector:
cvfit$gamlr$beta[,100]
1 2 3 4 5
0.12960060 -0.16406246 -0.46566731 0.08197053 -0.54170494
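If you start from a lambda value rather than a position in the path, one way (a sketch, assuming the value is close to one actually stored in the path) is to look up the nearest index first:
target <- 0.005  # hypothetical lambda value of interest
idx <- which.min(abs(cvfit$gamlr$lambda - target))  # index of the nearest stored lambda
cvfit$gamlr$beta[, idx]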
I must do a Lasso regression with the package glmnet and I have problems generating my x model.matrix.
My data.frame: 108 observations, Y response variable, 24 predictors, here is an overview:
CONVENTIONAL_HUmin CONVENTIONAL_HUmean CONVENTIONAL_HUstd CONVENTIONAL_HUmax
1 37.9400539686119 63.4903779286635 11.7592095845857 85.2375439991287
2 23.8400539686119 80.5903779286635 15.0592095845857 125.837543999129
3 19.3035945249441 73.2764716205565 12.8816244173147 130.24141901586
CONVENTIONAL_HUQ1 CONVENTIONAL_HUQ2 CONVENTIONAL_HUQ3 HISTO_Skewness HISTO_Kurtosis
1 54.9938390994964 65.4873070322704 72.8863025473031 -0.203420585259268 2.25208159159488
2 70.8938390994964 80.3873070322704 91.4863025473031 -0.117420585259268 2.91208159159488
3 64.4689755423307 73.8666609177099 81.7351818199415 -0.0908104900456161 2.8751327713366
HISTO_ExcessKurtosis HISTO_Entropy_log10 HISTO_Entropy_log2 HISTO_Energy...Uniformity.
1 -0.751917020142877 0.701345471328916 2.32782599847774 0.219781577333287
2 -0.0887170201428774 0.793345471328916 2.63782599847774 0.184781577333287
3 -0.127231561113029 0.738530858918985 2.45445652190669 0.206887426065656
GLZLM_SZE GLZLM_LZE GLZLM_LGZE GLZLM_HGZE GLZLM_SZLGE
1 0.366581916604228 35.7249100350856 8.7285612359045e-05 11497.6407737833 3.22615226279017e-05
2 0.693581916604228 984.424910035086 8.5685612359045e-05 11697.6407737833 5.98615226279017e-05
3 0.622711792823853 1103.10288991619 8.5573088970709e-05 11571.7421733917 5.33303855950858e-05
GLZLM_SZHGE GLZLM_LZLGE GLZLM_LZHGE GLZLM_GLNU GLZLM_ZLNU
1 4164.91570215061 0.00314512237564268 405585.990838764 2.66964898745512 2.47759091065361
2 8064.91570215061 0.0835651223756427 11581585.9908388 12.9796489874551 38.5375909106536
3 7295.45317481887 0.0949686480587339 12926109.9421091 15.0930512668698 37.6083347285291
GLZLM_ZP Y
1 0.219643444043173 1
2 0.112643444043173 0
3 0.104031438564764 0
My code for the model.matrix:
x <- model.matrix(Y ~ ., data = data.det)
It generates a very large model.matrix with 244728 elements! It seems to have duplicated each of the 24 predictors about a hundred times!
Here's an overview of the model.matrix:
(Intercept) CONVENTIONAL_HUmin-10.5599460313881
CONVENTIONAL_HUmin-117.359946031388 CONVENTIONAL_HUmin-13.0599460313881
CONVENTIONAL_HUmin-154.359946031388 CONVENTIONAL_HUmin-17.6599460313881
CONVENTIONAL_HUmin-18.3599460313881 CONVENTIONAL_HUmin-2.87994603138811
CONVENTIONAL_HUmin-21.281710504529 CONVENTIONAL_HUmin-28.3599460313881
CONVENTIONAL_HUmin-3.44994603138811 CONVENTIONAL_HUmin-3.89640547505594
CONVENTIONAL_HUmin-67.0599460313881 CONVENTIONAL_HUmin-682.359946031388
CONVENTIONAL_HUmin-9.08171050452898 CONVENTIONAL_HUmin1.04428949547101
CONVENTIONAL_HUmin1.63928949547101 CONVENTIONAL_HUmin10.8400539686119
CONVENTIONAL_HUmin10.968289495471 CONVENTIONAL_HUmin11.5400539686119
CONVENTIONAL_HUmin11.618289495471 CONVENTIONAL_HUmin11.6400539686119
CONVENTIONAL_HUmin12.518289495471 CONVENTIONAL_HUmin12.5400539686119
CONVENTIONAL_HUmin13.4400539686119 CONVENTIONAL_HUmin13.6400539686119
CONVENTIONAL_HUmin13.7400539686119 CONVENTIONAL_HUmin13.818289495471
CONVENTIONAL_HUmin14.5400539686119 CONVENTIONAL_HUmin14.6693017607572
CONVENTIONAL_HUmin14.8400539686119 CONVENTIONAL_HUmin16.9400539686119
CONVENTIONAL_HUmin17.0400539686119 CONVENTIONAL_HUmin17.618289495471
CONVENTIONAL_HUmin18.2400539686119 CONVENTIONAL_HUmin18.8400539686119
CONVENTIONAL_HUmin19.3035945249441 CONVENTIONAL_HUmin20.0400539686119
CONVENTIONAL_HUmin20.818289495471 CONVENTIONAL_HUmin21.0400539686119
CONVENTIONAL_HUmin21.118289495471 CONVENTIONAL_HUmin21.3400539686119
CONVENTIONAL_HUmin21.5400539686119 CONVENTIONAL_HUmin21.9400539686119
...
attr(,"contrasts")$CONVENTIONAL_HUmin
[1] "contr.treatment"
This is not convenient at all, because I end up with far more predictors in the input x for the Lasso regression, which makes the selection of predictors even more hazardous.
Do you have any idea of the source of the problem? Any suggestion to fix it?
Try this; you want a matrix, not a model matrix:
library(glmnet)
# make a matrix of your predictors minus your outcome
x <- as.matrix(data.det[-25])
# put the y column in a vector
y <- data.det$Y
# run it
fit.lasso <- glmnet(x, y, family = "binomial", alpha = 1)
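One caveat (an assumption, since the raw data isn't shown): the dummy-coded column names in your model matrix suggest the numeric predictors were read in as factors, in which case as.matrix() alone would give a character matrix. A sketch of coercing every predictor to numeric first:
# inspect the column types; factor/character columns explain the dummy expansion
str(data.det)
# coerce each predictor column to numeric (via character, to keep factor labels)
x <- sapply(data.det[-25], function(col) as.numeric(as.character(col)))
y <- data.det$Y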
I have the following data frame:
lm mean resids sd resids resid 1 resid 2 resid 3 intercept beta
1 0.000000e+00 6.2806844 -3.6261548 7.2523096 -3.6261548 103.62615 24.989340
2 -2.960595e-16 8.7515899 -5.0527328 10.1054656 -5.0527328 141.96786 -1.047323
3 -2.960595e-16 5.9138984 -3.4143908 6.8287817 -3.4143908 206.29046 -26.448694
4 3.700743e-17 0.5110845 0.2950748 -0.5901495 0.2950748 240.89801 -35.806642
5 7.401487e-16 6.6260504 3.8255520 -7.6511040 3.8255520 187.03479 -23.444762
6 5.921189e-16 8.7217431 5.0355007 -10.0710014 5.0355007 41.43239 3.138396
7 0.000000e+00 5.5269434 3.1909823 -6.3819645 3.1909823 -119.90628 27.817845
8 -1.480297e-16 1.0204260 -0.5891432 1.1782864 -0.5891432 -180.33773 35.623363
9 -5.921189e-16 6.9488186 -4.0119023 8.0238046 -4.0119023 -64.72245 21.820226
10 -8.881784e-16 8.6621512 -5.0010953 10.0021906 -5.0010953 191.65339 -5.218767
Each row represents an estimated linear model with window length 3. I used rollapply on a separate data frame with the function lm(y ~ t) to extract the coefficients and intercepts into a new data frame, which I have combined with the residuals from the same model and their corresponding means and standard deviations.
Since the window length is 3, there are 3 residuals per model, shown in resid 1, resid 2 and resid 3. Their mean and sd are included accordingly.
I am seeking to predict the next observation, in essence, k+1, where k is the window length, using the intercept and beta.
Recall that lm1 takes observations 1,2,3 to estimate the intercept and the beta, and lm2 takes 2,3,4, lm3 takes 3,4,5, etc. The function for the prediction should be:
predict_lm1 = intercept_lm1 + beta_lm1*(k+1)
Where k+1 = 4. For lm2:
predict_lm2 = intercept_lm2 + beta_lm2*(k+1)
Where k+1 = 5.
Clearly, k increases by 1 every time I move down one row in the dataset. This is because the explanatory variable is time, t, which is a sequence increasing by one per observation.
Should I use a for loop, or an apply function here?
How can I make a function that iterates down the rows and calculates the predictions accordingly with the information found in that row?
Thanks.
EDIT:
I managed to find a possible solution by writing the following:
n=nrow(dataset)
for(i in n){
predictions = dataset$Intercept + dataset$beta*(k+1)
}
However, k does not increase by 1 per iteration. Thus, k+1 is always = 4.
How can I make sure k increases by 1 accordingly?
EDIT 2
I managed to add 1 to k by writing the following:
n=nrow(dataset)
for(i in n){
x = 0
x[i] = k + 1
preds = dataset$`(Intercept)` + dataset$t*(x[i])
}
However, the first prediction is overestimated. It should be 203, whereas it is estimated as 228, implying that it sets the explanatory variable as 1 too high.
Yet, the second prediction is correct. I am not sure what I am doing wrong. Any advice?
EDIT 3
I managed to find a solution as follows:
n=nrow(dataset)
for(i in n){
x = k + 1
preds = dataset$`(Intercept)` + dataset$t*(x)
x = x + 1
}
Your loop is not iterating:
dataset <- read.table(text="lm meanresids sdresids resid1 resid2 resid3 intercept beta
1 0.000000e+00 6.2806844 -3.6261548 7.2523096 -3.6261548 103.62615 24.989340
2 -2.960595e-16 8.7515899 -5.0527328 10.1054656 -5.0527328 141.96786 -1.047323
3 -2.960595e-16 5.9138984 -3.4143908 6.8287817 -3.4143908 206.29046 -26.448694
4 3.700743e-17 0.5110845 0.2950748 -0.5901495 0.2950748 240.89801 -35.806642
5 7.401487e-16 6.6260504 3.8255520 -7.6511040 3.8255520 187.03479 -23.444762
6 5.921189e-16 8.7217431 5.0355007 -10.0710014 5.0355007 41.43239 3.138396
7 0.000000e+00 5.5269434 3.1909823 -6.3819645 3.1909823 -119.90628 27.817845
8 -1.480297e-16 1.0204260 -0.5891432 1.1782864 -0.5891432 -180.33773 35.623363
9 -5.921189e-16 6.9488186 -4.0119023 8.0238046 -4.0119023 -64.72245 21.820226
10 -8.881784e-16 8.6621512 -5.0010953 10.0021906 -5.0010953 191.65339 -5.218767", header=T)
n <- nrow(dataset)
predictions <- data.frame()
for(i in 1:n){
  k <- i  ## not sure where k is coming from, but put it here
  predictions <- rbind(predictions, dataset$intercept[i] + dataset$beta[i] * (k + 1))
}
predictions
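If, as the question's formula implies, model i (window length 3) predicts observation i + 3, a vectorized alternative (a sketch; adjust the offset if your indexing differs) avoids the loop entirely:
# row i was fit on observations i, i+1, i+2, so it predicts time i + 3
predictions <- dataset$intercept + dataset$beta * (seq_len(nrow(dataset)) + 3)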
I am trying to perform lm() and segmented() in R using the same independent variable (x) and multiple dependent response variables (Curve1, Curve2, etc.) one by one. I wish to extract the estimated break point and model coefficients for each response variable. I include an example of my data below.
x Curve1 Curve2 Curve3
1 -0.236422 98.8169 95.6828 101.7910
2 -0.198083 98.3260 95.4185 101.5170
3 -0.121406 97.3442 94.8899 100.9690
4 0.875399 84.5815 88.0176 93.8424
5 0.913738 84.1139 87.7533 93.5683
6 1.795530 73.3582 78.1278 82.9956
7 1.833870 72.8905 77.7093 82.7039
8 1.872200 72.4229 77.3505 82.4123
9 2.907350 59.2070 67.6652 74.5374
10 3.865810 46.4807 58.5158 65.0220
11 3.904150 45.9716 58.1498 64.7121
12 3.942490 45.4626 57.8099 64.4022
13 4.939300 33.3040 48.9742 56.3451
14 4.977640 32.9641 48.6344 56.0352
15 5.936100 24.4682 36.4758 47.0485
16 5.936100 24.4682 36.4758 47.0485
17 6.012780 23.7885 35.9667 46.5002
18 6.971250 20.7387 29.6035 39.6476
19 7.009580 20.6167 29.3490 39.3930
20 8.006390 18.7209 22.7313 32.7753
21 8.121410 18.5022 22.3914 32.1292
22 9.041530 16.4722 19.6728 26.9604
23 9.079870 16.3877 19.5595 26.7450
I am able to do this one curve at a time using the below code. However, my full data set has over 1000 curves, so I would like to be able to repeat this code over every column somehow. I have not been at all successful trying to loop it over every column, so if anyone could show me how to do something like that and create a summary data frame similar to that generated by the below code, but with every column included, I would be extremely grateful. Thanks!
library(segmented)
model <- lm(Curve1 ~ x, dat)                         # Linear model
seg_model <- segmented(model, seg.Z = ~x)            # Segmented model
breakpoint <- as.matrix(seg_model$psi.history[[5]])  # Extract breakpoint
coefficients <- as.matrix(seg_model$coefficients)    # Extract coefficients
summary_curve1 <- as.data.frame(rbind(breakpoint, coefficients))  # Combine breakpoint and coefficients
colnames(summary_curve1) <- "Curve_1"                # Header name
summary_curve1                                       # Display summary
Here's an approach using tidyverse and broom to return a data frame containing the results for each Curve column:
library(broom)
library(tidyverse)
library(segmented)
model.results = setNames(names(dat[,-1]), names(dat[,-1])) %>%
  map(~ lm(paste0(.x, " ~ x"), data = dat) %>%
        segmented(seg.Z = ~x) %>%
        list(model = tidy(.),
             psi = data.frame(term = "breakpoint", estimate = .[["psi.history"]][[5]]))) %>%
  map_df(~ .[2:3] %>% bind_rows, .id = "Curve")
model.results
Curve term estimate std.error statistic p.value
1 Curve1 (Intercept) 95.866127 0.14972382 640.286416 1.212599e-42
2 Curve1 x -12.691455 0.05220412 -243.112130 1.184191e-34
3 Curve1 U1.x 10.185816 0.11080880 91.922447 1.233602e-26
4 Curve1 psi1.x 0.000000 0.02821843 0.000000 1.000000e+00
5 Curve1 breakpoint 5.595706 NA NA NA
6 Curve2 (Intercept) 94.826309 0.45750667 207.267599 2.450058e-33
7 Curve2 x -9.489342 0.11156425 -85.057193 5.372730e-26
8 Curve2 U1.x 6.532312 1.17332640 5.567344 2.275438e-05
9 Curve2 psi1.x 0.000000 0.23845241 0.000000 1.000000e+00
10 Curve2 breakpoint 7.412087 NA NA NA
11 Curve3 (Intercept) 100.027990 0.29453941 339.608175 2.069087e-37
12 Curve3 x -8.931163 0.08154534 -109.523900 4.447569e-28
13 Curve3 U1.x 2.807215 0.36046013 7.787865 2.492325e-07
14 Curve3 psi1.x 0.000000 0.26319757 0.000000 1.000000e+00
15 Curve3 breakpoint 6.362132 NA NA NA
You can wrap the whole thing in a function, taking as the arguments the column name and the data, and use lapply on the column names, like this:
library(segmented)
run_mod <- function(varname, data){
  data$Y <- data[, varname]
  model <- lm(Y ~ x, data)                            # Linear model
  seg_model <- segmented(model, seg.Z = ~x)           # Segmented model
  breakpoint <- as.matrix(seg_model$psi.history[[5]]) # Extract breakpoint
  coefficients <- as.matrix(seg_model$coefficients)   # Extract coefficients
  summary_curve1 <- as.data.frame(rbind(breakpoint, coefficients))
  colnames(summary_curve1) <- varname
  return(summary_curve1)
}
lapply(names(dat)[2:ncol(dat)], function(x) run_mod(x, dat))
Which gives the summary for each fitted curve (not sure which output you actually want).
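To get the single summary data frame the question asks for, one option (a sketch, assuming every curve yields the same set of rows) is to bind the per-curve results column-wise:
res_list <- lapply(names(dat)[2:ncol(dat)], function(x) run_mod(x, dat))
summary_all <- do.call(cbind, res_list)  # one column per curve
summary_all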
I had the same issue and I'm trying to adapt the suggested answer, but I get the following error:
Error in model.frame.default(formula = Y ~ Prof, data = data, drop.unused.levels = TRUE) :
invalid type (list) for variable 'Y'
I ran this code:
run_mod <- function(varname, data){
  data$Y <- data[, varname]
  model <- lm(Y ~ Prof, data)                         # Linear model
  seg_model <- segmented(model, seg.Z = ~Prof)        # Segmented model
  breakpoint <- as.matrix(seg_model$psi.history[[5]]) # Extract breakpoint
  coefficients <- as.matrix(seg_model$coefficients)   # Extract coefficients
  summary_curve1 <- as.data.frame(rbind(breakpoint, coefficients))
  colnames(summary_curve1) <- varname
  return(summary_curve1)
}
lapply(names(DATApiv)[3:ncol(DATApiv)], function(Prof) run_mod(Prof, DATApiv))
lapply(names(DATApiv)[3:ncol(DATApiv)], function(Prof)run_mod(Prof, DATApiv))
NOTE: Prof is the column in my data frame that corresponds to the independent variable (like the x column in this example). DATApiv is my data frame.
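A likely cause (an assumption, since DATApiv isn't shown): if DATApiv is a tibble, data[, varname] returns a one-column tibble, i.e. a list, rather than a vector, which is exactly what the "invalid type (list)" error complains about. A sketch of a fix is to use double-bracket indexing inside run_mod, which always returns a vector:
# inside run_mod, extract the column as a vector rather than a one-column tibble
data$Y <- data[[varname]]   # instead of data$Y <- data[, varname]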
I have a 2-way repeated measures design (3 x 2), and I would like to figure out how to calculate effect sizes (partial eta squared).
I have a matrix with data in it (called a), like so (repeated measures):
A.a A.b B.a B.b C.a C.b
1 514.0479 483.4246 541.1342 516.4149 595.5404 588.8000
2 569.0741 550.0809 569.7574 599.1509 621.4725 656.8136
3 738.2037 660.3058 812.2970 735.8543 767.0683 738.7920
4 627.1101 638.1338 641.2478 682.7028 694.3569 761.6241
5 599.3417 637.2846 599.4951 632.5684 626.4102 677.2634
6 655.1394 600.9598 729.3096 669.4189 728.8995 716.4605
idata =
Caps Lower
A a
A b
B a
B b
C a
C b
I know how to do a repeated measures ANOVA with the car package (type 3 SS is standard in my field, although I know that it results in a logical error... if somebody wants to explain that to me like I'm 5, I would love to understand it):
library(car)
summary(Anova(lm(a ~ 1),
              idata = idata, type = 3,
              idesign = ~Caps*Lower),
        multivariate = FALSE)
I think what I want to do is take this part of the summary print out:
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
SS num Df Error SS den Df F Pr(>F)
(Intercept) 14920141 1 153687 5 485.4072 3.577e-06 ***
Caps 33782 2 8770 10 19.2589 0.000372 ***
Lower 195 1 13887 5 0.0703 0.801451
Caps:Lower 2481 2 907 10 13.6740 0.001376 **
And use it to calculate partial eta squared. So, if I'm not mistaken, I need to take the SS from the first column and divide it by (itself + the Error SS for that row) for each effect. Is this the correct way to go about it? If so, how do I do it? I can't figure out how to reference values from the summary printout.
The partial eta-squared can be calculated with the etasq function in the heplots package:
library(car)
mod <- Anova(lm(a ~ 1),
             idata = idata,
             type = 3,
             idesign = ~Caps*Lower)
mod

library(heplots)
etasq(mod, anova = TRUE)
Since you are asking about the calculations:
From ?etasq: "For univariate linear models, classical η^2 = SSH / SST and partial η^2 = SSH / (SSH + SSE). These are identical in one-way designs."
If you wish to inspect the code for the calculations of η^2 for a model with a class as in the example, you may use getS3method(f = "etasq", class = "Anova.mlm").
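For reference, here is the manual calculation the question describes (partial eta squared = SS / (SS + Error SS)), a sketch using the values from the printed ANOVA table above:
# values copied from the Univariate Type III table above
ss     <- c(Caps = 33782, Lower = 195,   `Caps:Lower` = 2481)
ss_err <- c(Caps = 8770,  Lower = 13887, `Caps:Lower` = 907)
ss / (ss + ss_err)  # partial eta squared for each effect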