All arguments must be the same length using svm - r

I am trying to apply an SVM to my data in order to predict future data.
I get the following error:
All arguments must be the same length
> svmmodele1<-svm(data$note ~ AppCache+TCP+DNS,data=data,scale = FALSE,kernel="linear",cost= 0.08,gamma=0.06)
> svm.video.pred1<-predict(svmmodele1,data)
> svm.video.pred1
1 3 4 5 6 7 10 11 12 13 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3 4 5
> svm.video.table1<-table(pred=svm.video.pred1, true= data$note)
Error in table(pred = svm.video.pred1, true = data$note) :
All arguments must be the same length
data$note
[1] 2 2 2 3 3 3 2 2 2 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 4 4 4 3 3 3 4 4 4 4 4 4 5 5
[39] 5 5 5 5 5 5 5 3 3 3 1 1 1 1 1 1
Levels: 1 2 3 4 5

For anyone stuck on the same problem: the reason for that error was that I had some negative variables.
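A note on the output above: the names of the predictions skip 2, 8, 9, 14 and 15, so predict() returned 49 values while data$note has 54, and that length mismatch is what table() reports. Below is a minimal sketch of one way to line the two vectors up. It uses toy vectors in place of the real model output; the values and the dropped rows are made up for illustration.

```r
# Toy stand-ins: 6 true labels, named by row number
note <- factor(c(2, 2, 3, 1, 3, 4), levels = 1:5)
names(note) <- seq_along(note)

# Pretend predict() dropped rows 2 and 5 (assumed here, e.g. rows with
# unusable predictor values), so only 4 predictions come back, named by row
pred <- factor(c(3, 3, 3, 3), levels = 1:5)
names(pred) <- c("1", "3", "4", "6")

length(pred) == length(note)   # FALSE, so table(pred, note) would fail

# Keep only the true labels whose row names survived in the predictions
truth <- note[names(pred)]
tab <- table(pred = pred, true = truth)
print(tab)
```

With the real objects the same idea would presumably read data$note[as.integer(names(svm.video.pred1))], assuming data still has its default row names 1 to 54.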

Related

Imputation with categorical variables with mix package in R

I'm trying to impute missing values in a data set that contains categorical variables (7-point Likert scales) using the mix package in R. Here is what I'm doing:
1. Loading the data:
data <- read.csv("test.csv", header=TRUE, row.names="ID")
2. Here's what the data looks like:
The first column is my ID column, the next three columns are categorical variables (7-point Likert scales - these are the ones where I am interested in imputing the missing values). Then I have three auxiliary variables: aux_cat is another categorical variable (unordered ranging from 1 to 9, no missing data), aux_one is an integer (no missing data), aux_two is numerical (contains missing data).
var_one var_two var_three aux_cat aux_one aux_two
1 2 1 2 6 26 0.0
2 3 2 3 7 45 32906.5
3 6 2 3 3 31 1237.5
4 7 NA NA 8 11 277.0
5 4 3 1 5 145 78201.0
6 NA NA NA 6 30 48550.0
7 7 6 3 3 48 11568.0
8 6 6 4 2 15 4482.0
9 7 6 5 5 61 NA
10 5 6 7 3 2 NA
11 5 6 5 3 11 78663.0
12 6 2 2 3 16 1235.0
13 7 2 5 3 13 5781.0
14 6 5 4 6 16 5062.0
15 5 5 3 3 43 400.0
16 7 7 5 2 114 7968.0
17 6 5 4 3 99 247.5
18 7 7 7 6 114 1877.0
19 5 5 4 5 3 5881.5
20 4 4 2 3 65 1786.0
21 4 3 6 5 9 14117.5
22 3 3 2 3 35 2093.0
23 3 4 4 5 62 23071.5
24 5 3 5 3 22 2707.5
25 3 1 2 6 128 942.0
26 5 3 6 4 57 101379.0
27 5 5 4 6 76 1398.0
28 1 3 4 3 17 1024.5
29 4 3 2 1 143 10657.0
30 7 1 4 8 14 167.5
31 7 3 7 3 22 4344.0
32 3 3 3 6 27 1582.0
33 7 1 3 2 29 66.5
34 5 5 4 2 108 513.5
35 7 6 6 7 24 936.5
36 4 5 4 7 40 5950.5
37 NA NA NA 8 15 99.5
38 2 2 2 6 21 123.5
39 6 4 5 2 61 477.5
40 6 5 5 2 16 28921.0
41 6 2 2 2 11 1063.5
42 6 2 5 3 116 97798.5
43 4 4 2 8 11 9159.5
44 6 6 6 6 4 1098.5
45 6 4 5 7 21 236.5
46 4 6 4 5 43 219.5
47 3 2 3 3 28 85.5
48 5 5 5 2 71 13483.5
49 5 5 6 8 98 18400.0
50 5 6 6 3 27 357.0
51 5 7 6 7 14 145.5
52 4 5 5 3 93 427.5
53 3 4 5 2 40 412.0
54 6 6 3 2 8 2418.0
55 5 6 5 5 8 4923.5
56 4 5 2 7 32 4135.0
57 7 7 2 6 83 1408.5
58 7 2 3 2 12 5595.0
59 7 2 1 2 32 2280.5
60 7 4 5 3 11 638.5
61 7 5 3 3 24 225.5
62 4 3 3 9 44 570.0
3. Performing preliminary manipulations
I try to run prelim.mix(x, p) where x is the data matrix containing missing values and p is the number of categorical variables in x. The categorical variables must be in the first p columns of x, and they must be coded with consecutive positive integers starting with 1. For example, a binary variable must be coded as 1,2 rather than 0,1.
In my case p should be 4 since I have three Likert-scale variables where I want imputed values and one other categorical variable among my auxiliary variables.
s <- prelim.mix(data,4)
This step seems to work fine.
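The coding requirement described above (consecutive positive integers starting with 1) can be checked or enforced before calling prelim.mix. The following is a minimal sketch in base R; recode_consecutive is a hypothetical helper written for this illustration and is not part of the mix package, and the toy data frame only mimics two of the columns.

```r
# Hypothetical helper: map the observed values of a column to 1, 2, 3, ...
# in sorted order; missing values stay NA
recode_consecutive <- function(x) {
  as.integer(factor(x))
}

# Toy data with gaps in the coding (no value 1 in either column)
toy <- data.frame(var_one = c(2, 3, NA, 7),
                  aux_cat = c(6, 7, 3, 8))

toy[] <- lapply(toy, recode_consecutive)
print(toy)
```

Note that this collapses unused categories: if a Likert level never occurs in the data, the remaining levels are renumbered, so imputed values would need to be mapped back to the original scale afterwards.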
4. Finding the maximum likelihood (ML) estimate:
thetahat <- em.mix(s)
This is where I encounter the following error:
Steps of EM:
1...2...3...Error in em.mix(s) : NA/NaN/Inf in foreign function call (arg 6)
I think this must have something to do with my auxiliary variables, but I'm not sure. Any help would be much appreciated.

Clogit function in CEDesign not converge

I designed a choice experiment (CE) using the package support.CEs. I generated a CE design with 3 attributes and 4 levels per attribute. The questionnaire had 4 alternatives and 4 blocks:
des1 <- rotation.design(attribute.names = list(
Qualitat = c("Aigua potable", "Cosetes.blanques.flotant", "Aigua.pou", "Aigua.marro"),
Disponibilitat.acces = c("Aixeta.24h", "Aixeta.10h", "Diposit.comunitari", "Pou.a.20"),
Preu = c("No.problemes.€", "Esforç.economic", "No.pagues.acces", "No.pagues.no.acces")),
nalternatives = 4, nblocks = 4, row.renames = FALSE,
randomize = TRUE, seed = 987)
The questionnaire was answered by 15 people (ID 1-15), giving 60 rows of output (15 respondents × 4 blocks):
ID BLOCK q1 q2 q3 q4
1 1 1 1 2 3 3
2 1 2 1 3 3 4
3 1 3 5 1 3 5
4 1 4 5 2 2 5
5 2 1 1 2 4 3
6 2 2 1 4 3 4
7 2 3 3 1 3 2
8 2 4 1 2 2 2
9 3 1 1 2 2 2
10 3 2 1 4 3 4
11 3 3 3 1 3 4
12 3 4 3 2 1 4
13 4 1 1 5 4 3
14 4 2 1 4 5 4
15 4 3 5 5 3 2
16 4 4 5 2 5 5
17 5 1 1 2 4 2
18 5 2 3 2 3 2
19 5 3 3 1 3 4
20 5 4 3 2 1 4
21 6 1 1 5 5 5
22 6 2 1 3 3 4
23 6 3 3 1 3 4
24 6 4 1 2 2 2
25 7 1 1 2 4 3
26 7 2 4 2 3 4
27 7 3 3 1 3 3
28 7 4 3 4 5 5
29 8 1 1 3 2 3
30 8 2 1 4 3 4
31 8 3 3 1 3 4
32 8 4 1 2 2 1
33 9 1 1 2 3 3
34 9 2 1 3 3 4
35 9 3 5 1 3 5
36 9 4 5 2 2 5
37 15 1 1 5 5 5
38 15 2 4 4 5 4
39 15 3 5 5 3 5
40 15 4 4 3 5 5
41 11 1 1 5 5 5
42 11 2 4 4 5 4
43 11 3 5 5 3 5
44 11 4 5 3 5 5
45 12 1 1 2 4 3
46 12 2 4 2 3 4
47 12 3 3 1 3 3
48 12 4 3 4 5 5
49 13 1 1 2 2 2
50 13 2 1 4 3 4
51 13 3 3 1 3 2
52 13 4 1 2 2 2
53 14 1 1 1 3 3
54 14 2 1 4 1 4
55 14 3 4 1 3 2
56 14 4 3 2 1 2
57 15 1 1 1 3 2
58 15 2 5 2 1 4
59 15 3 4 4 3 1
60 15 4 3 4 1 4
The problem is that when I merge the questions and answers matrix with the call
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
R shows a warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge
I expected the generated matrix desmat1 to have 4800 observations (80 possible combinations and 60 outputs). Instead I have only 1200 observations. The matrix dataset1 shows only 1 set of alternatives instead of the 4.
For example, for ID 1, Block 1, Question 1, only alternative 1 appears. It matches the answer selected by that person, but in other cases it does not match, and that information is lost in R, so the results when clogit is applied are wrong.
I hope the problem is clear.
Regards,
Edit:
I found my problem. When I make the dataset from the respondent.dataset that I generated in .csv format, R detects only the q1 response instead of q1-q4. The call
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
detects q1-q4 as new columns. But the key is that q1-q4 have to fill the QES column in dataset1. I did another CE before with 1 block, and the dataset was built correctly on reading the respondent.dataset. So the key point is that I am now using 4 blocks, but I do not know how to make R interpret q1-q4 as the QES column for each block.
res1 matrix (respondent.dataset): the complete matrix has 60 rows = 15 respondents (ID 1-15) * 4 questions (the QES column in make.dataset).
Kind regards,
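To make the wide-versus-long distinction in the edit above concrete, here is a minimal sketch that stacks q1-q4 into one response column with base R's reshape(). This is a generic reshaping illustration, not the format that make.dataset is documented to require; the toy data and the column names QES and RES are chosen for illustration only.

```r
# Toy respondent table in the same wide layout as res1 (values made up)
res_toy <- data.frame(ID    = c(1, 1, 2),
                      BLOCK = c(1, 2, 1),
                      q1 = c(1, 1, 1), q2 = c(2, 3, 2),
                      q3 = c(3, 3, 4), q4 = c(3, 4, 3))

# Stack q1..q4 into a single response column, one row per question
long <- reshape(res_toy, direction = "long",
                varying = c("q1", "q2", "q3", "q4"),
                v.names = "RES", timevar = "QES", times = 1:4,
                idvar = c("ID", "BLOCK"))
long <- long[order(long$ID, long$BLOCK, long$QES), ]
print(long)
```

Each (ID, BLOCK) pair now contributes four rows, one per question, which makes it easy to check by eye whether every answer is carried through.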

How to create a matrix in simple correspondence analysis?

I am trying to create a matrix in order to apply a simple correspondence analysis on it; I have 2 categorical variables: exp and conexinternet with 3 levels each.
obs conexinternet exp
1 1 2
2 1 1
3 2 2
4 1 1
5 1 1
6 2 1
7 1 2
8 1 2
9 1 2
10 2 1
11 1 1
12 2 1
13 2 2
14 2 1
15 1 1
16 2 2
17 1 1
18 2 2
19 2 2
20 2 2
21 2 2
22 1 1
23 2 3
24 1 1
25 2 1
26 2 1
27 1 1
28 2 2
29 2 1
30 1 2
31 1 2
32 2 3
33 2 1
34 2 1
35 2 1
36 3 2
37 2 1
38 3 2
39 2 3
40 2 3
41 2 2
42 2 3
43 2 2
44 2 2
45 2 1
46 2 2
47 2 3
48 1 3
49 2 3
50 3 2
51 2 2
52 2 2
53 2 1
54 1 2
55 1 1
56 2 3
57 3 2
58 3 1
59 3 1
60 1 2
61 2 3
62 2 2
63 3 1
64 3 2
65 3 2
66 1 2
67 3 2
68 3 2
69 3 3
70 2 1
71 3 3
72 3 2
73 3 2
74 3 2
75 3 1
76 3 2
77 3 1
I want to make a vector that categorizes the observations as 11, 12, 13, 21, 22, 23, 31, 32, 33. How can I do it?
Is this what you want?
d <- read.table(text="obs conexinternet exp
1 1 2
...
77 3 1", header=TRUE)
(tab <- xtabs(~conexinternet+exp, d))
#              exp
# conexinternet  1  2  3
#             1 10  9  1
#             2 14 15  9
#             3  5 12  2
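If the combined code per observation (11, 12, ..., 33) is really what is needed, it can be built by pasting the two columns together. A minimal sketch on a four-row toy data frame (the rows are made up; the column names follow the question):

```r
# Toy data with the same two columns as in the question
d_toy <- data.frame(conexinternet = c(1, 1, 2, 3),
                    exp           = c(2, 1, 2, 3))

# Concatenate the two level codes into one label per observation
d_toy$combo <- paste0(d_toy$conexinternet, d_toy$exp)
d_toy$combo

# Frequency of each combined category
table(d_toy$combo)
```

For the correspondence analysis itself, the two-way table from xtabs is usually the more convenient input, since the combined vector carries the same information in a flattened form.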

Error for tune with package e1071

I'm trying to tune the SVM model for regression in R using package e1071 with the method used in this tutorial. Here is the data
> head(gps_rg5,16)
Weather sex age occupation income weekday weekend age group rg
1 6 2 57 3 1 7 1 3 0.035725277
2 6 2 32 1 5 6 1 2 1.693898548
3 1 2 63 3 1 4 0 4 0.009012839
4 6 2 65 3 2 6 1 4 0.014902879
5 6 2 57 3 2 7 1 3 0.045594146
6 6 2 76 3 1 4 0 5 0.003531616
7 6 1 65 3 2 4 0 4 0.001575542
8 4 2 57 3 3 6 1 3 0.009384690
9 4 2 52 3 2 6 1 3 0.033322905
10 4 2 56 3 2 6 1 3 0.011879944
11 4 2 56 3 2 7 1 3 0.008266786
12 4 1 63 3 2 6 1 4 3.055594036
13 1 2 42 1 2 1 0 2 0.029010174
14 4 2 42 1 2 6 1 2 0.000933115
15 1 2 66 3 2 5 0 4 2.342416927
16 6 1 79 3 2 4 0 5 2.891190912
And this is the code for tuning:
svr1<-tune(svm,rg~.,data=train,ranges=list(cost=2^(2:9),epsilon=seq(0.01,10,0.1)))
And the code returns an error saying
Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!
This is the structure of the training dataset:
Any answers would be appreciated!!!
Many thanks!!
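One possible cause, offered as an assumption rather than a confirmed diagnosis: the epsilon grid runs up to almost 10, which is far wider than the spread of rg once the response is standardised (svm() scales by default). When epsilon is at least half the range of the scaled response, a constant prediction already lies within epsilon of every point, no support vectors are needed, and the fitted model can end up empty. The sketch below checks this on the 16 rg values printed in the question, using base R only.

```r
# First 16 values of rg as printed by head(gps_rg5, 16) in the question
rg <- c(0.035725277, 1.693898548, 0.009012839, 0.014902879, 0.045594146,
        0.003531616, 0.001575542, 0.009384690, 0.033322905, 0.011879944,
        0.008266786, 3.055594036, 0.029010174, 0.000933115, 2.342416927,
        2.891190912)

eps_grid <- seq(0.01, 10, 0.1)   # the epsilon values passed to tune()
range(eps_grid)

# Assumption: the response is standardised before fitting (scale = TRUE)
rg_scaled <- as.numeric(scale(rg))
half_range <- diff(range(rg_scaled)) / 2

# Grid values at or above half_range leave every point inside the
# epsilon tube of a constant fit, so nothing becomes a support vector
n_too_large <- sum(eps_grid >= half_range)
n_too_large
```

If the count is large for the real training set as well, a much narrower grid such as epsilon = seq(0, 1, 0.1) may be worth trying.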

R: which argument gives non-uniform bins when using "plot" to build an informative histogram?

I am new to R. I am trying to plot a cumulative frequency histogram (non-uniform bins) for a huge amount of data (a few million positive numbers with a minimum value of 1 and a maximum that varies between data sets, for instance 1*10^6 or 1*10^5). I used the simple code below to generate a histogram from the data.
For example, sample data:
[89601] 10 2 2 4 3 12 3 25 25 2
[89611] 5 5 5 2 23 22 14 8 13 10
[89621] 13 19 157 2 3 2 4 2 3 33
[89631] 22 2 14 9 2 3 3 3 8 2
[89641] 8 3 2 127 8 2 18 2 4 2
[89651] 2 13 3 34 8 2 6 10 3 7
[89661] 3 9 7 3 36 9 5 2 10 15
[89671] 7 2 23 2 2 2 2 7 6 25
[89681] 3 3 2 6 37 49 28 11 3 35
[89691] 2 2 8 3 3 2 2 4 3 12
[89701] 3 5 2 7 3 2 15 6 3 14
[89711] 13 5 3 2 2 8 34 4 4 65
[89721] 5 9 12 2 11 2 2 79 9 13
[89731] 2 66 2 9 10 22 11 2 6 3
[89741] 12 2 11 5 4 4 2 4 3 4
[89751] 2 8 9 3 2 2 84 7 11 10
[89761] 8 30 16 3 63 2 2 24 13 2
[89771] 11 37 2 9 21 21 10 2 2 49
[89781] 3 3 8 5 2 19 9 6 5 4
[89791] 4 2 9 2 10 33 5 4 2 2
[89801] 4 2 2 4 9 3 11 2 5 142
[89811] 17 2 11 4 2 8 26 2 9 8
[89821] 10 2 4 2 5 2 20 7 145 11
[89831] 22 19 8 14 18 39 3 2 3 3
[89841] 2 11 10 3 2 3 3 5 6 12
[89851] 17 5 3 8 2 2 2 2 2 5
[89861] 4 2 13 3 2 2 2 2 3 2
[89871] 4 3 21 2 6 2 8 9 7 14
[89881] 2 582 3 15 11 3 20 16 9 8
[89891] 6 2 6 7 3 20 17 2 9 5
[89901] 5 11 2 12 7 2 46 2 144 9
[89911] 2 3 36 25 3 2 16 2 2 119
[89921] 5 5 10 6 2 2 6 84 13 2
[89931] 2 6 6 2 17 3 7 4 102 48
data <- read.table("sample.txt", header=FALSE)
data <- hist(data$V1, breaks=length(data$V1), xlim=c(0,4000000))
plot(data)
When I did this, I got a histogram with all the data (positive numbers) on the x-axis and counts on the y-axis. Then I changed the x limit to cover only the area of interest:
plot(data, xlim=c(0,200000))
As before, a histogram is plotted, but using "plot" I couldn't define the number of bins, so the histogram is neither clear (it does not show the bars I want) nor informative.
As I am new to this forum, I don't know how to upload images, so I couldn't provide the histogram.
Any suggestions would be very helpful.
To plot a histogram you can use the hist() function like this:
hist(data$V1, xlim=c(0,200000), breaks=100)
The breaks parameter sets how many bars will be plotted (given as a single number, R treats it as a suggestion). But this number applies to the whole data range, not to the xlim you specified. So R first builds a histogram with the given number of breaks and then cuts out the part of the plot you asked for.
But there is another way to plot the bars:
data <- read.table("sample.txt", header=FALSE)
data.hist <- hist(data$V1, breaks=length(data$V1), xlim=c(0,4000000))
plot(data.hist$counts, type='h')
The hist function returns an object that holds the histogram's parameters.
I assume you are interested in the "counts" field.
You can plot it in a histogram-like way by setting type='h'.
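Since the question asks for non-uniform bins and cumulative frequencies, here is a minimal sketch of one way to get both: pass an explicit vector of break points to hist() and accumulate the counts it returns. The data are simulated stand-ins for data$V1, and the powers-of-two breaks are an arbitrary choice made for illustration.

```r
set.seed(1)
# Toy data standing in for data$V1: positive integers with a long right tail
x <- round(rexp(1000, rate = 1 / 20)) + 1

# Non-uniform breaks: 0, 1, 2, 4, 8, ... up to the first power of 2 >= max(x)
brks <- c(0, 2^(0:ceiling(log2(max(x)))))

h <- hist(x, breaks = brks, plot = FALSE)   # bin counts only, no drawing
cum_counts <- cumsum(h$counts)              # cumulative frequency per bin

# One bar per bin, labelled with the bin's upper edge
barplot(cum_counts, names.arg = brks[-1],
        xlab = "upper bin edge", ylab = "cumulative count")
```

Drawing with barplot() gives bars of equal width regardless of bin size. Plotting the hist object directly with unequal breaks shows densities rather than counts by default, which is easy to misread.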
