I'm trying to tune an SVM regression model in R with the e1071 package, following the method used in this tutorial. Here is the data:
> head(gps_rg5,16)
Weather sex age occupation income weekday weekend age group rg
1 6 2 57 3 1 7 1 3 0.035725277
2 6 2 32 1 5 6 1 2 1.693898548
3 1 2 63 3 1 4 0 4 0.009012839
4 6 2 65 3 2 6 1 4 0.014902879
5 6 2 57 3 2 7 1 3 0.045594146
6 6 2 76 3 1 4 0 5 0.003531616
7 6 1 65 3 2 4 0 4 0.001575542
8 4 2 57 3 3 6 1 3 0.009384690
9 4 2 52 3 2 6 1 3 0.033322905
10 4 2 56 3 2 6 1 3 0.011879944
11 4 2 56 3 2 7 1 3 0.008266786
12 4 1 63 3 2 6 1 4 3.055594036
13 1 2 42 1 2 1 0 2 0.029010174
14 4 2 42 1 2 6 1 2 0.000933115
15 1 2 66 3 2 5 0 4 2.342416927
16 6 1 79 3 2 4 0 5 2.891190912
And this is the code for tuning:
svr1<-tune(svm,rg~.,data=train,ranges=list(cost=2^(2:9),epsilon=seq(0.01,10,0.1)))
And the code returns an error saying
Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!
This is the structure of the training dataset:
Any answers would be appreciated!!!
Many thanks!!
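One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the epsilon grid runs up to almost 10, while rg in the rows shown only spans roughly 0 to 3. When epsilon is wider than the spread of the response (which svm() standardises by default), every point can fall inside the epsilon tube, leaving no support vectors, and that would be consistent with an empty model. A minimal sketch using only the 16 posted rg values:

```r
# The 16 rg values shown in head(gps_rg5, 16), retyped by hand.
rg <- c(0.035725277, 1.693898548, 0.009012839, 0.014902879,
        0.045594146, 0.003531616, 0.001575542, 0.009384690,
        0.033322905, 0.011879944, 0.008266786, 3.055594036,
        0.029010174, 0.000933115, 2.342416927, 2.891190912)

# The epsilon grid from the tune() call.
eps_grid <- seq(0.01, 10, 0.1)

# svm() standardises the response by default, so compare on that scale.
spread <- diff(range(scale(rg)))

# How many candidate epsilons are wider than the whole response spread?
n_too_wide <- sum(eps_grid > spread)
n_too_wide
```

If many grid values exceed the spread, a narrower grid such as epsilon = seq(0, 1, 0.1) may avoid the empty fits; that range is only a suggestion, not something taken from the tutorial.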
I'm trying to create a mixed effects model with lmer. SubPlot should be nested within Plot, and Treatment should be nested within SubPlot. There are 3 to 7 Treatments in each SubPlot and always 3 SubPlots in a Plot. I created the following model:
model <- lmer(Depth ~ Mass + (1|Plot:SubPlot:Treatment), data=mydata)
But this gives me an error:
Error: number of levels of each grouping factor must be < number of observations (problems: Plot:SubPlot:Treatment)
'data.frame': 147 obs. of 6 variables:
$ Plot : int 1 1 1 1 1 1 1 1 1 1 ...
$ SubPlot : int 1 1 1 1 1 1 1 2 2 2 ...
$ Treatment : int 1 2 3 4 5 6 7 1 2 3 ...
$ Depth : num 0 4 4.5 5.5 6 6 6 3 4.5 6.5 ...
$ Mass : int 21 50 78 103 128 147 172 21 49 77 ...
Here's some data:
Plot SubPlot Treatment Depth Mass
1 1 1 0 21
1 1 2 4 50
1 1 3 4.5 78
1 1 4 5.5 103
1 1 5 6 128
1 1 6 6 147
1 1 7 6 172
1 2 1 3 21
1 2 2 4.5 49
1 2 3 6.5 77
1 2 4 7 102
1 2 5 8 127
1 2 6 9 146
1 2 7 10.5 171
1 3 1 3 21
1 3 2 1.5 49
1 3 3 1.5 77
1 3 4 1.5 102
1 3 5 1.5 127
1 3 6 1.5 146
1 3 7 1.5 171
2 1 1 3 21
2 1 2 5 50
2 1 3 5 78
2 1 4 7 103
2 1 5 9 128
2 1 6 9.5 146
2 1 7 10 171
2 2 1 1.5 21
2 2 2 4 50
2 2 3 5 78
2 2 4 9 103
2 2 5 10 128
2 2 6 10.5 146
2 2 7 10.5 171
2 3 1 0 21
2 3 2 0 50
2 3 3 0 78
2 3 4 0 103
2 3 5 0 128
2 3 6 0 146
2 3 7 0 171
Any ideas how to proceed?
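A quick diagnostic that might help here: the error says the grouping factor has as many levels as there are observations, which can be checked by counting the distinct Plot:SubPlot:Treatment cells. The sketch below uses only the first 14 posted rows, retyped by hand, and base R.

```r
# First 14 rows of the posted data (Plot 1, SubPlots 1 and 2).
mydata <- data.frame(
  Plot      = rep(1, 14),
  SubPlot   = rep(1:2, each = 7),
  Treatment = rep(1:7, times = 2),
  Depth     = c(0, 4, 4.5, 5.5, 6, 6, 6, 3, 4.5, 6.5, 7, 8, 9, 10.5),
  Mass      = c(21, 50, 78, 103, 128, 147, 172, 21, 49, 77, 102, 127, 146, 171)
)

# Distinct Plot:SubPlot:Treatment cells versus number of rows.
cells   <- interaction(mydata$Plot, mydata$SubPlot, mydata$Treatment, drop = TRUE)
n_cells <- nlevels(cells)
n_obs   <- nrow(mydata)

# TRUE means every cell holds a single observation, which is the
# situation the lmer error message describes.
n_cells == n_obs
```

If that is the case for the full data as well, one option to consider is stopping the nesting one level higher, for example (1 | Plot/SubPlot), and treating Treatment differently; whether that matches the experimental design is something only the author can judge.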
I am new to Principal Components Analysis in R. I'm currently working my way through a PCA with my own data, following this example: PCA by Coreysparks
I'm stuck at the section which requires me to create a survey design object. This is my code:
options(survey.lonely.psu = "adjust")
scwb<-scwb[complete.cases(scwb),]
des<-svydesign(ids=~psu, strata=~ststr, weights=~cntywt, data=scwb , nest=T)
I then receive the following message:
Error in eval(predvars, data, env) : object 'psu' not found
I tried substituting psu with 1 , since I'm pretty sure I'm working with Simple Random Sampling. But then the error message points out object 'ststr' not found and I assume it's the same for cntywt.
I've been working on this for a couple of hours and I simply can't figure out (even after quite some research) how to properly fill out this function so that I can continue with the PCA.
Any suggestions on how to approach this problem?
EDIT: Here is a sample of my data:
IDN
Jobsat_1 Jobsat_2 Jobsat_3 Member_1 Member_2 Member_4 LingInt Belong_2 Belong_3 Grpor_14 Ethnic10 MEDIAb Trust_G1 Trust_G2
1 10 10 8 3 1 1 5 10 10 2 1 2 1 3
2 7 7 5 4 1 1 5 10 10 4 4 2 3 3
3 7 7 7 1 1 1 5 9 10 4 3 2 1 1
4 7 7 7 2 1 1 5 10 10 3 3 2 3 3
5 7 8 8 3 3 3 5 10 8 3 4 2 3 3
6 7 7 7 2 1 2 5 10 8 4 3 2 5 3
Grpor_3 Grpor_4 Grpor_12 Emp_1 Volas_10 Mstatus1 Child_1 Health_1 YRBIRTH Rgender Brthcoun YR_IMM Relig_2 Educ_R INCOM_14
1 1 5 2 3 15 0 0 2 31 1 1 100 3 4 73
2 7 6 9 2 15 1 5 1 40 2 1 100 7 8 105
3 5 5 7 1 10 1 3 3 60 1 1 100 4 3 40
4 8 8 8 3 4 1 4 2 63 1 1 100 2 3 30
5 5 4 8 2 20 1 0 2 37 2 1 100 3 8 60
6 7 7 10 2 20 1 1 2 37 2 1 100 6 8 73
Emp_1.1 pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 pc10
1 3 1.9148465 4.9846483 -0.8072519 -3.3183158 -3.32593626 -0.8836492 0.4113892 1.26298570 -0.32539801 1.4829219
2 2 -0.8358479 -0.3853229 -2.4203323 0.2514915 -0.08298205 -0.8393284 0.2384652 -0.04452874 -0.76024480 3.0219680
3 1 0.5800140 1.3872784 -3.8136358 -1.4466660 0.49100339 1.0119573 0.3198218 -0.27307314 -1.46013793 -0.7989148
4 3 -1.0482772 0.4416652 -2.7239398 -2.0870089 -1.23500720 -1.6997003 0.2737917 0.76019756 -1.41174450 0.2476696
5 2 0.1772093 0.7945494 0.3299359 0.7466387 -0.37083214 -0.2998434 0.4494944 -0.04000732 -0.09798538 -0.1694823
6 2 -1.5970861 -0.3793862 -2.0577713 -1.2681369 0.02748217 -0.8220763 0.1409177 -0.15252729 -1.15142836 -0.2362450
pc11 pc12 pc13
1 -0.06980251 0.6393237 -0.66798274
2 -0.39774429 -0.1281088 -0.06633185
3 0.33557364 0.4008058 -1.13155741
4 0.07448897 -0.5383089 -0.02582100
5 -0.04507346 -0.1805617 1.42605349
6 -0.06659864 1.1336526 0.04854108
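Judging from the sample above, the data has no columns named psu, ststr, or cntywt; those names come from the example being followed. A minimal sketch, assuming simple random sampling with equal weights (as the question suggests) and using a small hypothetical stand-in for scwb, of checking for the design columns and building a design without them:

```r
# Hypothetical stand-in for scwb with a few of the posted columns.
scwb <- data.frame(
  Jobsat_1 = c(10, 7, 7),
  Jobsat_2 = c(10, 7, 7),
  pc1      = c(1.9148465, -0.8358479, 0.5800140)
)

# Which of the design variables from the tutorial are absent here?
design_vars  <- c("psu", "ststr", "cntywt")
missing_vars <- setdiff(design_vars, names(scwb))
missing_vars

# Without cluster, stratum, or weight columns, a simple random sampling
# design only needs ids = ~1 (this part requires the survey package).
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, data = scwb)
}
```

svydesign() will likely warn that no weights were supplied and that equal probability is assumed, which is what simple random sampling implies.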
How can I compare values within a variable dependent on another variable with dplyr?
The df is based on choice data (long format) from a survey. It has one variable that indicates a participant's id, another that indicates the choice instance, and one that indicates which alternative was chosen.
In my data I have the feeling that a lot of people tend to get bored of the task and therefore stick to one alternative for every instance. I would therefore like to identify people who always selected the same option from a certain instance onwards till the end.
Here is an example df:
library(tibble)
set.seed(0)
df <- tibble(
  id = rep(1:5, each = 12),
  inst = rep(1:12, 5),
  alt = sample(1:3, size = 60, replace = TRUE)
)
That looks like the following:
id inst alt
1 1 1 3
2 1 2 1
3 1 3 2
4 1 4 2
5 1 5 3
6 1 6 1
7 1 7 3
8 1 8 3
9 1 9 2
10 1 10 2
11 1 11 1 <-
12 1 12 1 <-
13 2 1 1
14 2 2 3
...
I would like to create two new variables count and count_alt. The new variable count should indicate how often the same value appeared in alt based on id and inst, only counting values from the end of id. So for participant (id==1) the count variable should be 2, since alternative 1 was chosen in the last two instances (11 & 12). The count_alt would take the value 1 (always the same as inst == 12)
The new df should look like the following:
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
...
I would prefer to solve this with dplyr and not with a loop, since I want to incorporate it into further data wrangling steps.
See if that solves it:
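library(dplyr)
df %>%
  group_by(id) %>%
  mutate(
    # run id: increases by one each time alt changes within a participant;
    # recent dplyr versions require default to have the same type as alt
    count = cumsum(alt != lag(alt, default = first(alt))),
    # length of the final run of identical choices
    count = sum(count == max(count)),
    # alternative chosen in the last instance
    count_alt = alt[n()]
  )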
Output:
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
13 2 1 1 1 2
14 2 2 3 1 2
15 2 3 2 1 2
16 2 4 3 1 2
17 2 5 2 1 2
18 2 6 3 1 2
19 2 7 3 1 2
20 2 8 2 1 2
21 2 9 3 1 2
22 2 10 3 1 2
23 2 11 1 1 2
24 2 12 2 1 2
25 3 1 1 1 3
26 3 2 1 1 3
27 3 3 2 1 3
28 3 4 1 1 3
29 3 5 2 1 3
30 3 6 3 1 3
31 3 7 2 1 3
32 3 8 2 1 3
33 3 9 2 1 3
34 3 10 2 1 3
35 3 11 1 1 3
36 3 12 3 1 3
37 4 1 3 1 1
38 4 2 3 1 1
39 4 3 1 1 1
40 4 4 3 1 1
41 4 5 2 1 1
42 4 6 3 1 1
43 4 7 2 1 1
44 4 8 3 1 1
45 4 9 2 1 1
46 4 10 2 1 1
47 4 11 3 1 1
48 4 12 1 1 1
49 5 1 2 2 2
50 5 2 3 2 2
51 5 3 3 2 2
52 5 4 2 2 2
53 5 5 3 2 2
54 5 6 2 2 2
55 5 7 1 2 2
56 5 8 1 2 2
57 5 9 1 2 2
58 5 10 1 2 2
59 5 11 2 2 2
60 5 12 2 2 2
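As a cross-check on the dplyr result above, the same two values can be read off a run-length encoding in base R. This is only a sketch for one participant, using the alt values shown for id 1 in the question:

```r
# Choices of participant id == 1, ordered by instance (from the post).
alt <- c(3, 1, 2, 2, 3, 1, 3, 3, 2, 2, 1, 1)

# rle() splits the vector into runs of identical consecutive values.
runs <- rle(alt)

# The last run gives both quantities:
count     <- runs$lengths[length(runs$lengths)]  # length of the final run
count_alt <- runs$values[length(runs$values)]    # alternative in that run
```

For this participant, count is 2 and count_alt is 1, matching rows 1 to 12 of the output.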
I designed a choice experiment (CE) using the package support.CEs. I generated a CE design with 3 attributes and 4 levels per attribute. The questionnaire had 4 alternatives and 4 blocks:
des1 <- rotation.design(attribute.names = list(
Qualitat = c("Aigua potable", "Cosetes.blanques.flotant", "Aigua.pou", "Aigua.marro"),
Disponibilitat.acces = c("Aixeta.24h", "Aixeta.10h", "Diposit.comunitari", "Pou.a.20"),
Preu = c("No.problemes.€", "Esforç.economic", "No.pagues.acces", "No.pagues.no.acces")),
nalternatives = 4, nblocks = 4, row.renames = FALSE,
randomize = TRUE, seed = 987)
The questionnaire was answered by 15 people (ID 1-15), giving 60 rows (15 people responding to 4 blocks each):
ID BLOCK q1 q2 q3 q4
1 1 1 1 2 3 3
2 1 2 1 3 3 4
3 1 3 5 1 3 5
4 1 4 5 2 2 5
5 2 1 1 2 4 3
6 2 2 1 4 3 4
7 2 3 3 1 3 2
8 2 4 1 2 2 2
9 3 1 1 2 2 2
10 3 2 1 4 3 4
11 3 3 3 1 3 4
12 3 4 3 2 1 4
13 4 1 1 5 4 3
14 4 2 1 4 5 4
15 4 3 5 5 3 2
16 4 4 5 2 5 5
17 5 1 1 2 4 2
18 5 2 3 2 3 2
19 5 3 3 1 3 4
20 5 4 3 2 1 4
21 6 1 1 5 5 5
22 6 2 1 3 3 4
23 6 3 3 1 3 4
24 6 4 1 2 2 2
25 7 1 1 2 4 3
26 7 2 4 2 3 4
27 7 3 3 1 3 3
28 7 4 3 4 5 5
29 8 1 1 3 2 3
30 8 2 1 4 3 4
31 8 3 3 1 3 4
32 8 4 1 2 2 1
33 9 1 1 2 3 3
34 9 2 1 3 3 4
35 9 3 5 1 3 5
36 9 4 5 2 2 5
37 15 1 1 5 5 5
38 15 2 4 4 5 4
39 15 3 5 5 3 5
40 15 4 4 3 5 5
41 11 1 1 5 5 5
42 11 2 4 4 5 4
43 11 3 5 5 3 5
44 11 4 5 3 5 5
45 12 1 1 2 4 3
46 12 2 4 2 3 4
47 12 3 3 1 3 3
48 12 4 3 4 5 5
49 13 1 1 2 2 2
50 13 2 1 4 3 4
51 13 3 3 1 3 2
52 13 4 1 2 2 2
53 14 1 1 1 3 3
54 14 2 1 4 1 4
55 14 3 4 1 3 2
56 14 4 3 2 1 2
57 15 1 1 1 3 2
58 15 2 5 2 1 4
59 15 3 4 4 3 1
60 15 4 3 4 1 4
The problem is that, when I merge the questions and answers matrix with
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
R shows a warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge
I expected the matrix desmat1 to have 4800 observations (80 possible combinations and 60 outputs). Instead I have only 1200 observations. The matrix dataset1 only shows one set of alternatives instead of all 4.
For example, for ID 1, Block 1, Question 1, only alternative 1 appears. It matches the answer selected by that person, but in other cases it does not match, and that information is lost in R, so the results when clogit is applied are wrong.
I hope the problem is clear.
Regards,
Edit:
I found my problem. When I make the dataset from the respondent.dataset that I generated in .csv format, R detects only the q1 response instead of q1-q4. The call
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
detects q1-q4 as new columns. But the key is that q1-q4 have to fill the QES column in dataset1. I did another CE before with 1 block, and the dataset was built correctly when reading the respondent.dataset. So the key point is that I am now using 4 blocks, but I do not know how to make R interpret q1-q4 as the QES column for each block.
The res1 matrix (respondent.dataset): the complete matrix has 60 rows = 15 respondents (ID 1-15) * 4 questions (QES column in make.dataset).
Kind regards,
I am trying to apply SVM to my data in order to predict future data.
I ran into the following error:
All arguments must be the same length
> svmmodele1<-svm(data$note ~ AppCache+TCP+DNS,data=data,scale = FALSE,kernel="linear",cost= 0.08,gamma=0.06)
> svm.video.pred1<-predict(svmmodele1,data)
> svm.video.pred1
1 3 4 5 6 7 10 11 12 13 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3 4 5
> svm.video.table1<-table(pred=svm.video.pred1, true= data$note)
Error in table(pred = svm.video.pred1, true = data$note) :
All arguments must be the same length
data$note
[1] 2 2 2 3 3 3 2 2 2 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 4 4 4 3 3 3 4 4 4 4 4 4 5 5
[39] 5 5 5 5 5 5 5 3 3 3 1 1 1 1 1 1
Levels: 1 2 3 4 5
For anyone stuck on the same problem: the reason for the error was that some of my variables had negative values.
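Whatever the underlying cause, the printed predictions skip some row numbers (2, 8, 9, 14, 15), so svm.video.pred1 has 49 entries while data$note has 54, and table() refuses vectors of different lengths. A minimal sketch, using toy vectors rather than the real data, of lining the two up by row name before tabulating:

```r
# Toy stand-ins: 6 true labels, but predictions for only 4 rows
# (rows 2 and 5 are missing, as in the skipped row numbers above).
note <- factor(c(2, 2, 3, 3, 1, 1), levels = 1:5)
pred <- factor(c(3, 3, 3, 3), levels = 1:5)
names(pred) <- c("1", "3", "4", "6")

# Different lengths: table(pred, note) would raise the same error.
length(pred) == length(note)

# Keep only the true labels whose row numbers received a prediction.
kept <- as.integer(names(pred))
tab  <- table(pred = pred, true = note[kept])
tab
```

This only makes the confusion table computable; it does not explain why the rows were dropped, which is still worth investigating in the input data.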