Mixed effects model, lmer - r

I'm trying to create a mixed effects model with lmer. SubPlot should be nested within Plot, and Treatment should be nested within SubPlot. There are 3 to 7 Treatments in each SubPlot and always 3 SubPlots in a Plot. I created the following model:
model <- lmer(Depth ~ Mass + (1|Plot:SubPlot:Treatment), data=mydata)
But this gives me an error:
Error: number of levels of each grouping factor must be < number of observations (problems: Plot:SubPlot:Treatment)
'data.frame': 147 obs. of 6 variables:
$ Plot : int 1 1 1 1 1 1 1 1 1 1 ...
$ SubPlot : int 1 1 1 1 1 1 1 2 2 2 ...
$ Treatment : int 1 2 3 4 5 6 7 1 2 3 ...
$ Depth : num 0 4 4.5 5.5 6 6 6 3 4.5 6.5 ...
$ Mass : int 21 50 78 103 128 147 172 21 49 77 ...
Here's some data:
Plot SubPlot Treatment Depth Mass
1 1 1 0 21
1 1 2 4 50
1 1 3 4.5 78
1 1 4 5.5 103
1 1 5 6 128
1 1 6 6 147
1 1 7 6 172
1 2 1 3 21
1 2 2 4.5 49
1 2 3 6.5 77
1 2 4 7 102
1 2 5 8 127
1 2 6 9 146
1 2 7 10.5 171
1 3 1 3 21
1 3 2 1.5 49
1 3 3 1.5 77
1 3 4 1.5 102
1 3 5 1.5 127
1 3 6 1.5 146
1 3 7 1.5 171
2 1 1 3 21
2 1 2 5 50
2 1 3 5 78
2 1 4 7 103
2 1 5 9 128
2 1 6 9.5 146
2 1 7 10 171
2 2 1 1.5 21
2 2 2 4 50
2 2 3 5 78
2 2 4 9 103
2 2 5 10 128
2 2 6 10.5 146
2 2 7 10.5 171
2 3 1 0 21
2 3 2 0 50
2 3 3 0 78
2 3 4 0 103
2 3 5 0 128
2 3 6 0 146
2 3 7 0 171
Any ideas how to proceed?
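A minimal base R sketch of where the error likely comes from, using only the first two SubPlots of Plot 1 from the posted data. Every Plot:SubPlot:Treatment combination occurs exactly once, so that grouping factor has as many levels as there are observations, which is what lmer complains about. The commented-out model at the end is an assumption about the intended design (random intercepts for Plot and for SubPlot within Plot), not something the question confirms.

```r
# First two SubPlots of Plot 1, copied from the data in the question
mydata <- data.frame(
  Plot      = rep(1, 14),
  SubPlot   = rep(1:2, each = 7),
  Treatment = rep(1:7, times = 2),
  Depth     = c(0, 4, 4.5, 5.5, 6, 6, 6, 3, 4.5, 6.5, 7, 8, 9, 10.5),
  Mass      = c(21, 50, 78, 103, 128, 147, 172, 21, 49, 77, 102, 127, 146, 171)
)

# Compare the number of observations with the number of grouping levels
n_obs <- nrow(mydata)
n_cells <- nlevels(interaction(mydata$Plot, mydata$SubPlot, mydata$Treatment, drop = TRUE))
n_subplots <- nlevels(interaction(mydata$Plot, mydata$SubPlot, drop = TRUE))
c(observations = n_obs, cells = n_cells, subplots = n_subplots)

# Assumed alternative (not run here): group at the SubPlot level instead
# library(lme4)
# model <- lmer(Depth ~ Mass + (1 | Plot/SubPlot), data = mydata)
```

In lme4 formula syntax, (1 | Plot/SubPlot) expands to (1 | Plot) + (1 | Plot:SubPlot), so each grouping factor has fewer levels than observations.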

How to get p values for odds ratios from an ordinal regression in r

I am trying to get the p values for my odds ratios from an ordinal regression using R.
I previously computed the p values on the log odds like this:
scm <- polr(finaloutcome ~ Size_no + Hegemony + Committee, data = data3, Hess = TRUE)
(ctable <- coef(summary(scm)))
## calculate and store p value
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
## combined table
(ctable <- cbind(ctable, "p value" = p))
I created my odds ratios like this:
ci <- confint.default(scm)
exp(coef(scm))
## OR and CI
exp(cbind(OR = coef(scm), ci))
However, I am now unsure how to compute the p values for the odds ratios. Applying the previous method, I tried:
(ctable1 <- exp(coef(scm)))
p1 <- pnorm(abs(ctable1[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p1))
However, I get the error: Error in ctable1[, "t value"] : incorrect number of dimensions
Odds ratio output sample:
Size Hegem Committee
9.992240e-01 6.957805e-02 1.204437e-01
Data sample:
finaloutcome Size_no Committee Hegemony
1 3 54 2 0
2 2 127 3 0
3 2 127 3 0
4 2 22 1 1
5 2 193 4 1
6 2 54 2 0
7 NA 11 1 1
8 3 54 2 0
9 3 22 1 1
10 2 53 3 1
11 2 53 3 1
12 2 53 3 1
13 2 53 3 1
14 2 53 3 1
15 2 53 3 1
16 2 120 3 0
17 2 120 3 0
18 1 22 1 1
19 1 22 1 1
20 2 193 4 1
21 2 193 4 1
22 2 193 4 1
23 2 12 4 1
24 2 35 1 1
25 1 193 4 1
26 1 164 4 1
27 1 12 4 1
28 2 12 4 1
29 2 193 4 1
30 2 54 2 0
31 2 193 4 1
32 2 193 4 1
33 2 54 2 0
34 2 12 4 1
35 2 22 1 1
36 4 53 3 1
37 2 35 1 1
38 1 193 4 1
39 5 54 2 0
40 7 164 4 1
41 5 54 2 0
42 1 12 4 1
43 7 193 4 1
44 2 193 4 1
45 2 193 4 1
46 2 193 4 1
47 2 193 4 1
48 2 193 4 1
49 2 12 4 1
50 2 22 1 1
51 2 12 4 1
52 2 12 4 1
53 6 13 1 1
54 6 13 1 1
55 6 13 1 1
56 6 12 4 1
57 2 193 4 1
58 3 12 4 1
59 1 12 4 1
60 1 12 4 1
61 8 35 1 1
62 2 193 4 1
63 8 35 1 1
64 6 30 2 1
65 8 12 4 1
66 4 12 4 1
67 5 30 2 1
68 5 54 2 0
69 7 12 4 1
70 5 12 4 1
71 5 54 2 0
72 5 193 4 1
73 5 193 4 1
74 5 54 2 0
75 5 54 2 0
76 1 11 1 1
77 3 22 1 1
78 3 12 4 1
79 6 12 4 1
80 2 22 1 1
81 8 193 4 1
82 8 193 4 1
83 4 193 4 1
84 2 193 4 1
85 2 193 4 1
86 2 193 4 1
87 2 193 4 1
88 2 193 4 1
89 2 193 4 1
90 2 193 4 1
91 2 193 4 1
92 2 193 4 1
93 8 193 4 1
94 6 12 4 1
95 5 12 4 1
96 5 12 4 1
97 5 12 4 1
98 5 12 4 1
99 5 12 4 1
100 5 12 4 1
I usually fit my models with lm or glm (mdl <- lm(…) or mdl <- glm(…)) and then call summary() on the fitted object to see these values. Beyond that, you can use the yardstick and broom packages. I recommend the book R for Data Science; it has a great explanation of modeling and of using the tidymodels packages.
I ran into the same difficulty.
I finally used the tidy function from the broom package: https://broom.tidymodels.org/reference/tidy.polr.html
library(broom)
tidy(scm, p.values = TRUE)
This does not yet work if you have categorical variables with more than two levels, or missing values.
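For reference, the error in the question arises because exp(coef(scm)) returns a plain named vector, which has no "t value" column to index. The test of a coefficient being zero on the log-odds scale is the same as the test of its odds ratio being one, so the p values from the original coefficient table can simply be attached to the exponentiated estimates. Below is a minimal sketch of that idea; since the asker's data3 is not available in full, it assumes the housing data set that ships with MASS as a stand-in, and the object names fit and or_table are made up for illustration.

```r
library(MASS)  # polr() and the built-in 'housing' data (stand-in for data3)

# Ordinal regression; Hess = TRUE keeps the Hessian so standard errors are available
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# Coefficient table on the log-odds scale (slopes first, then intercepts)
ctable <- coef(summary(fit))
slopes <- names(coef(fit))

# Two-sided Wald p values for the slopes, computed on the log-odds scale
p <- 2 * pnorm(abs(ctable[slopes, "t value"]), lower.tail = FALSE)

# Odds ratios with Wald confidence limits, plus the same p values
ci <- confint.default(fit)
or_table <- cbind(OR = exp(coef(fit)), exp(ci), "p value" = p)
or_table
```

The p values are computed before exponentiating, and only the estimates and confidence limits are transformed.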

How can I find the average number of entries for a column in a set of long data

I am new to R and I surprisingly couldn't find an answer to this using the search function. Assuming I have a set of data as follows:
Plot Rate Rep Plant Tuber Weight
1 101 1 1 1 1 179.4
2 101 1 1 1 2 99.4
3 101 1 1 1 3 72.4
4 101 1 1 1 4 111.5
5 101 1 1 1 5 44.9
6 101 1 1 1 6 55.3
7 101 1 1 1 7 12.6
8 101 1 1 1 8 106.7
9 101 1 1 1 9 96.7
10 101 1 1 1 10 52.5
11 101 1 1 2 1 151.1
12 101 1 1 2 2 171.7
13 101 1 1 2 3 93.0
14 101 1 1 2 4 82.4
15 101 1 1 2 5 143.9
16 101 1 1 2 6 115.6
17 101 1 1 2 7 141.3
18 101 1 1 2 8 72.6
19 101 1 1 2 9 97.2
20 101 1 1 2 10 146.8
21 101 1 1 2 11 104.0
22 101 1 1 2 12 121.6
23 101 1 1 3 1 150.9
24 101 1 1 3 2 47.1
25 101 1 1 3 3 59.6
26 101 1 1 3 4 94.2
27 101 1 1 3 5 167.4
28 101 1 1 3 6 55.2
29 101 1 1 3 7 21.8
30 101 1 1 3 8 79.6
31 101 1 1 3 9 92.2
32 101 1 1 3 10 78.0
33 101 1 1 3 11 61.8
34 101 1 1 3 12 9.5
35 101 1 1 3 13 2.7
36 101 1 1 3 14 3.8
37 101 1 1 3 15 1.1
38 106 1 2 1 1 50.7
39 106 1 2 1 2 148.8
40 106 1 2 1 3 50.6
41 106 1 2 1 4 129.6
42 106 1 2 1 5 69.7
43 106 1 2 1 6 83.4
44 106 1 2 1 7 49.1
45 106 1 2 1 8 100.4
46 106 1 2 1 9 33.0
47 106 1 2 1 10 0.8
Here, there is a weight entry for each tuber collected from treatment combinations of Rate, Rep, and Plant.
How can I find the overall average number of tubers found in the Rate/Rep/Plant combos? For example, there are 10 tubers in 1/1/1 and 12 tubers in 1/1/2. I am looking for the average number of tubers found in a plant. The way that the tubers are expressed one at a time in a column makes this difficult for me. Any help would be hugely appreciated. Thanks in advance.
I'm going to add a little to what @akrun said here. You can use dplyr::group_by and then find the number of tubers in a plant by taking the maximum value of Tuber within each Rate/Rep/Plant group. Finding the average number of tubers per plant is then easy:
df <- data.table::fread(
"Row Plot Rate Rep Plant Tuber Weight
1 101 1 1 1 1 179.4
2 101 1 1 1 2 99.4
3 101 1 1 1 3 72.4
4 101 1 1 1 4 111.5
5 101 1 1 1 5 44.9
6 101 1 1 1 6 55.3
7 101 1 1 1 7 12.6
8 101 1 1 1 8 106.7
9 101 1 1 1 9 96.7
10 101 1 1 1 10 52.5
11 101 1 1 2 1 151.1
12 101 1 1 2 2 171.7
13 101 1 1 2 3 93.0
14 101 1 1 2 4 82.4
15 101 1 1 2 5 143.9
16 101 1 1 2 6 115.6
17 101 1 1 2 7 141.3
18 101 1 1 2 8 72.6
19 101 1 1 2 9 97.2
20 101 1 1 2 10 146.8
21 101 1 1 2 11 104.0
22 101 1 1 2 12 121.6
23 101 1 1 3 1 150.9
24 101 1 1 3 2 47.1
25 101 1 1 3 3 59.6
26 101 1 1 3 4 94.2
27 101 1 1 3 5 167.4
28 101 1 1 3 6 55.2
29 101 1 1 3 7 21.8
30 101 1 1 3 8 79.6
31 101 1 1 3 9 92.2
32 101 1 1 3 10 78.0
33 101 1 1 3 11 61.8
34 101 1 1 3 12 9.5
35 101 1 1 3 13 2.7
36 101 1 1 3 14 3.8
37 101 1 1 3 15 1.1
38 106 1 2 1 1 50.7
39 106 1 2 1 2 148.8
40 106 1 2 1 3 50.6
41 106 1 2 1 4 129.6
42 106 1 2 1 5 69.7
43 106 1 2 1 6 83.4
44 106 1 2 1 7 49.1
45 106 1 2 1 8 100.4
46 106 1 2 1 9 33.0
47 106 1 2 1 10 0.8"
)
library(tidyverse)
tubers_per_plant <- df %>%
  group_by(Rate, Rep, Plant) %>%
  summarize(num_Tubers = max(Tuber))
tubers_per_plant
# A tibble: 4 × 4
# Groups: Rate, Rep [2]
#    Rate   Rep Plant num_Tubers
#   <int> <int> <int>      <int>
# 1     1     1     1         10
# 2     1     1     2         12
# 3     1     1     3         15
# 4     1     2     1         10
mean(tubers_per_plant$num_Tubers)
# [1] 11.75
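Taking max(Tuber) relies on the tubers being numbered consecutively from 1 within each plant. A base R sketch that counts rows per Rate/Rep/Plant group instead, which does not depend on that numbering, might look like this. The data frame below rebuilds only the grouping columns of the 47 posted rows, and the name counts is made up for illustration.

```r
# Grouping columns of the 47 posted rows (Weight omitted, it is not needed for counting)
df <- data.frame(
  Rate  = 1,
  Rep   = rep(c(1, 2), times = c(37, 10)),
  Plant = rep(c(1, 2, 3, 1), times = c(10, 12, 15, 10)),
  Tuber = c(1:10, 1:12, 1:15, 1:10)
)

# Count the rows (tubers) in each Rate/Rep/Plant combination
counts <- aggregate(Tuber ~ Rate + Rep + Plant, data = df, FUN = length)
counts

# Average number of tubers per plant
mean(counts$Tuber)
```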

How to create a matrix in simple correspondence analysis?

I am trying to create a matrix in order to apply a simple correspondence analysis on it; I have 2 categorical variables: exp and conexinternet with 3 levels each.
obs conexinternet exp
1 1 2
2 1 1
3 2 2
4 1 1
5 1 1
6 2 1
7 1 2
8 1 2
9 1 2
10 2 1
11 1 1
12 2 1
13 2 2
14 2 1
15 1 1
16 2 2
17 1 1
18 2 2
19 2 2
20 2 2
21 2 2
22 1 1
23 2 3
24 1 1
25 2 1
26 2 1
27 1 1
28 2 2
29 2 1
30 1 2
31 1 2
32 2 3
33 2 1
34 2 1
35 2 1
36 3 2
37 2 1
38 3 2
39 2 3
40 2 3
41 2 2
42 2 3
43 2 2
44 2 2
45 2 1
46 2 2
47 2 3
48 1 3
49 2 3
50 3 2
51 2 2
52 2 2
53 2 1
54 1 2
55 1 1
56 2 3
57 3 2
58 3 1
59 3 1
60 1 2
61 2 3
62 2 2
63 3 1
64 3 2
65 3 2
66 1 2
67 3 2
68 3 2
69 3 3
70 2 1
71 3 3
72 3 2
73 3 2
74 3 2
75 3 1
76 3 2
77 3 1
I want to make a vector to categorize the observations as 11, 12, 13, 21, 22, 23, 31, 32, 33, how can I do it?
Is this what you want?
d <- read.table(text="obs conexinternet exp
1 1 2
...
77 3 1", header=T)
(tab <- xtabs(~conexinternet+exp, d))
# exp
# conexinternet 1 2 3
# 1 10 9 1
# 2 14 15 9
# 3 5 12 2
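If the combined category codes themselves (11, 12, ..., 33) are needed as a vector, one possible base R sketch is to paste the two columns together. It uses only the first six observations from the question, and the column name combo is made up for illustration.

```r
# First six observations from the question
d <- data.frame(
  conexinternet = c(1, 1, 2, 1, 1, 2),
  exp           = c(2, 1, 2, 1, 1, 1)
)

# One code per observation: first digit is conexinternet, second digit is exp
d$combo <- paste0(d$conexinternet, d$exp)
d$combo

# The two-way contingency table is what a simple correspondence analysis takes as input
tab <- table(conexinternet = d$conexinternet, exp = d$exp)
tab
```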

Error for tune with package e1071

I'm trying to tune the SVM model for regression in R using package e1071 with the method used in this tutorial. Here is the data
> head(gps_rg5,16)
Weather sex age occupation income weekday weekend age group rg
1 6 2 57 3 1 7 1 3 0.035725277
2 6 2 32 1 5 6 1 2 1.693898548
3 1 2 63 3 1 4 0 4 0.009012839
4 6 2 65 3 2 6 1 4 0.014902879
5 6 2 57 3 2 7 1 3 0.045594146
6 6 2 76 3 1 4 0 5 0.003531616
7 6 1 65 3 2 4 0 4 0.001575542
8 4 2 57 3 3 6 1 3 0.009384690
9 4 2 52 3 2 6 1 3 0.033322905
10 4 2 56 3 2 6 1 3 0.011879944
11 4 2 56 3 2 7 1 3 0.008266786
12 4 1 63 3 2 6 1 4 3.055594036
13 1 2 42 1 2 1 0 2 0.029010174
14 4 2 42 1 2 6 1 2 0.000933115
15 1 2 66 3 2 5 0 4 2.342416927
16 6 1 79 3 2 4 0 5 2.891190912
And this is the code for tuning:
svr1 <- tune(svm, rg ~ ., data = train, ranges = list(cost = 2^(2:9), epsilon = seq(0.01, 10, 0.1)))
And the code returns an error saying
Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!
This is the structure of the training dataset:
Any answers would be appreciated!!!
Many thanks!!
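One possible cause, offered as a guess rather than a confirmed diagnosis: svm() scales the response by default, and the epsilon grid above runs up to almost 10. Once epsilon reaches half the range of the scaled response, a constant prediction fits every point inside the epsilon tube, so the fit needs no support vectors at all. The base R sketch below checks how much of the grid falls in that region, assuming the 16 rg values shown above are representative of train; it does not call e1071.

```r
# rg values from the 16 rows printed in the question
rg <- c(0.035725277, 1.693898548, 0.009012839, 0.014902879,
        0.045594146, 0.003531616, 0.001575542, 0.009384690,
        0.033322905, 0.011879944, 0.008266786, 3.055594036,
        0.029010174, 0.000933115, 2.342416927, 2.891190912)

# Same epsilon grid as in the tune() call
eps_grid <- seq(0.01, 10, 0.1)

# Half the range of the standardized response: at or above this epsilon,
# a constant prediction already lies within the tube for every observation
rg_scaled <- as.numeric(scale(rg))
half_range <- diff(range(rg_scaled)) / 2

# Number of grid values that can yield a model with no support vectors
sum(eps_grid >= half_range)
```

If that count is large, narrowing the grid to small values of epsilon is a reasonable first thing to try.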

Replace a dot "." with NA in a dataframe in R

I have the following data frame:
obs zip age bed bath size lot exter garage fp price
1 1 1 3 21 3 3.0 951 64904 other 0 0 30000
2 2 2 3 21 3 2.0 1036 217800 frame 0 0 39900
3 3 3 4 7 1 1.0 676 54450 other 2 0 46500
4 4 4 3 6 3 2.0 1456 51836 other 0 1 48600
5 5 5 1 51 3 1.0 1186 10857 other 1 0 51500
6 6 6 2 19 3 2.0 1456 40075 frame 0 0 56990
7 7 7 3 8 3 2.0 1368 . frame 0 0 59900
8 8 8 4 27 3 1.0 994 11016 frame 1 0 62500
9 9 9 1 51 2 1.0 1176 6259 frame 1 1 65500
10 10 10 3 1 3 2.0 1216 11348 other 0 0 69000
11 11 11 4 32 3 2.0 1410 25450 brick 0 0 76900
12 12 12 3 2 3 2.0 1344 . other 0 1 79000
13 13 13 3 25 2 2.0 1064 218671 other 0 0 79900
14 14 14 1 31 3 1.5 1770 19602 brick 0 1 79950
15 15 15 4 29 3 2.0 1524 12720 brick 2 1 82900
16 16 16 3 16 3 2.0 1750 130680 frame 0 0 84900
17 17 17 3 20 3 2.0 1152 104544 other 2 0 85000
18 18 18 3 18 4 2.0 1770 10640 other 0 0 87900
19 19 19 4 28 3 2.0 1624 12700 brick 2 1 89900
20 20 20 2 27 3 2.0 1540 5679 brick 2 1 89900
with the following structure:
str(df)
'data.frame': 69 obs. of 12 variables:
$ Obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ zip : int 3 3 4 3 1 2 3 4 1 3 ...
$ age : int 21 21 7 6 51 19 8 27 51 1 ...
$ bed : int 3 3 1 3 3 3 3 3 2 3 ...
$ bath : num 3 2 1 2 1 2 2 1 1 2 ...
$ size : Factor w/ 66 levels ".","1036","1064",..: 65 2 64 14 6 14 10 66 5 7 ...
$ lot : Factor w/ 60 levels ".","10295","10400",..: 47 28 43 39 9 35 1 11 46 13 ...
$ exter : Factor w/ 3 levels "brick","frame",..: 3 2 3 3 3 2 2 2 2 3 ...
$ garage: int 0 0 2 0 1 0 0 1 1 0 ...
$ fp : int 0 0 0 1 0 0 0 0 1 0 ...
$ price : int 30000 39900 46500 48600 51500 56990 59900 62500 65500 69000 ...
As can be seen, the "lot" variable appears as a factor. I have the following questions about this data:
Why does R read this variable "lot" as a factor?
When I tried df$lot[df$lot == "."] <- NA, all dots (.) were replaced with <NA> rather than with NA, as I wanted.
I then tried df$lot <- as.numeric(df$lot), but the numerical values of this variable changed completely, with the dots (.) being replaced by 1. What happened when I changed the variable's type?
How may I replace all dots (.) with NA?
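A short base R sketch touching on these three points, using four lot values copied from the table above. A column that contains a non-numeric entry such as "." is read as text, which older versions of R turn into a factor by default. as.numeric() on a factor returns the internal level codes rather than the printed values, and <NA> is just how a missing value is printed for a factor. The object names lot, lot_num and df2 are made up for illustration.

```r
# A factor like the one read.table produced for 'lot'
lot <- factor(c("64904", "217800", ".", "11016"))

# as.numeric() on a factor gives the level codes, not the original numbers
as.numeric(lot)

# Convert via character, then blank out the dots before making it numeric
lot_chr <- as.character(lot)
lot_chr[lot_chr == "."] <- NA
lot_num <- as.numeric(lot_chr)
lot_num

# Alternatively, declare "." as the missing-value marker when reading the data
df2 <- read.table(text = "lot price
64904 30000
. 59900", header = TRUE, na.strings = ".")
str(df2)
```

With na.strings = "." the column is read as a number from the start, so no conversion is needed afterwards.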
