Predicted values from the xgboost package in R are numeric instead of class names

Package used: ‘xgboost’ version 0.4-4.
I am building the model with the xgboost() function, using this code:
fit <- xgboost(data =sparse_matrix , label = trainSet$OutputClass,
max.depth = 4,eta = 1, nthread = 2, nround = 10,
eval_metric = "merror",objective = "multi:softmax",num_class = 45)
When I call the prediction function:
Prediction<- predict(fit,sparse_matrixtestSet)
The code above gives the output below. Instead of class names, it returns their numeric equivalents, even though "label = trainSet$OutputClass" contains class names.
Output:
[1] 1 1 1 1 1 35 3 3 3 4 31 7 7 7 3 3 9 9 9 9 9 9 9 10 10 11
[27] 11 11 11 11 11 11 11 11 11 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 10 10
[53] 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 18 18 18
[79] 18 18 18 18 35 35 35 18 21 21 21 21 32 1 1 25 25 25 25 26 27 27 27 27 27 27
[105] 27 27 29 29 29 29 29 30 30 30 30 30 30 30 30 30 30 35 35 32 32 32 43 43 32 32
[131] 32 32 32 32 32 32 43 32 32 32 32 32 33
I have also set stringsAsFactors = FALSE when reading the data set.
Can someone please help me get the predicted values as class names instead of numeric values?
Thanks in advance.
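One way to get names back is to keep the mapping between class names and numeric labels yourself. The sketch below does not call xgboost at all: class_names stands in for trainSet$OutputClass and pred stands in for the output of predict(), both hypothetical. It assumes the labels were encoded as 0-based integers taken from a factor, which is the encoding multi:softmax works with.

```r
# Hypothetical stand-in for trainSet$OutputClass (character class names)
class_names <- c("setosa", "virginica", "setosa", "versicolor")

# Keep the factor: its levels are the lookup table for decoding later
class_factor <- factor(class_names)

# 0-based integer labels to pass as `label` when training
train_labels <- as.integer(class_factor) - 1L

# Hypothetical stand-in for predict(fit, sparse_matrixtestSet)
pred <- c(0, 2, 1)

# Decode: shift back to 1-based and index into the factor levels
predicted_names <- levels(class_factor)[pred + 1]
print(predicted_names)
```

If the labels were encoded differently (for example 1-based), the offset in the last step has to be adjusted to match.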

Related

Generating a vector with n repetitions of x, then y, then z, with a fixed upper bound

I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : argument 'times' incorrect
What formulation do I need to use to properly generate the vector I want?
sort(rep(1:36, 3))
Or, even better, as @Wimpel mentioned in the comments, use the each argument of the rep function.
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
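For comparison, here is a small sketch of how the each and times arguments of rep() differ, using 1:4 instead of 1:36 to keep it short. The variable names are only illustrative.

```r
# each: every element is repeated in place
by_each <- rep(1:4, each = 3)

# times as a vector: one count per element of x (same length as x)
by_times <- rep(1:4, times = rep(3, 4))

# times as a single number: the whole vector is cycled
cycled <- rep(1:4, times = 3)

print(by_each)
print(cycled)
```

The call in the question, rep(3, seq(1:36)), fails because x has length 1 while times has length 36; a vector of counts must have the same length as x.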
This one should work, although it is probably not the most elegant.
reps = c()
n = 36
for(i in 1:n){
  reps = append(reps, rep(i, 3))
}
reps
Alternatively, use the rep function properly (see the documentation, ?rep, for the argument each):
rep(1:36, each = 3)
The rep approach is preferable (see the existing answers).
Here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36

Sequence of Number, through Iteration

I have the following vector of indices
V_ind = cumsum(c(10,9,8,7,6,5,4,3,2,1))
[1] 10 19 27 34 40 45 49 52 54 55
And I created the following FOR LOOP
k = 1
for(ind in V_ind){
  if(ind <= 10){
    print("ok")
  }else{
    print(c(V_ind[1:k]))
    k = k + 1
  }
}
Which gives as a result
[1] "ok"
[1] 10
[1] 10 19
[1] 10 19 27
[1] 10 19 27 34
[1] 10 19 27 34 40
[1] 10 19 27 34 40 45
[1] 10 19 27 34 40 45 49
[1] 10 19 27 34 40 45 49 52
[1] 10 19 27 34 40 45 49 52 54
However, what I am trying to achieve is the following result
[1] "ok"
[1] 10
[1] 9 10 19
[1] 8 9 10 18 19 27
[1] 7 8 9 10 17 18 19 26 27 34
[1] 6 7 8 9 10 16 17 18 19 25 26 27 33 34 40
[1] 5 6 7 8 9 10 15 16 17 18 19 24 25 26 27 32 33 34 39 40 45
[1] 4 5 6 7 8 9 10 14 15 16 17 18 19 23 24 25 26 27 31 32 33 34 38 39 40 44 45 49
[1] 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 22 23 24 25 26 27 30 31 32 33 34 37 38 39 40 43 44 45 48 49 52
[1] 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 29 30 31 32 33 34 36 37 38 39 40 42 43 44 45 47 48 49 51 52 54
The result is built as follows:
In the first iteration, we just print "ok".
In the second iteration, we extract the first element of the vector V_ind.
In the third iteration, we extract the first and second elements of V_ind, together with the first element minus 1, that is, 9.
In the fourth iteration, we extract the first, second and third elements of V_ind, together with the first element minus 1 and minus 2 (9 and 8) and the second element minus 1 (18).
In the fifth iteration, we extract the first, second, third and fourth elements of V_ind, together with the first element minus 1, 2 and 3 (9, 8 and 7), the second element minus 1 and 2 (18 and 17), and the third element minus 1 (26).
This procedure continues until the end of the for loop. Can this be done in R in a generic way?
One option using purrr could be:
library(purrr)
map(.x = accumulate(V_ind, c),
    ~ map2(.x,
           rev(seq_along(.x) - 1),
           function(y, z) seq(y - z, y, 1)) %>%
      reduce(c))
[[1]]
[1] 10
[[2]]
[1] 9 10 19
[[3]]
[1] 8 9 10 18 19 27
[[4]]
[1] 7 8 9 10 17 18 19 26 27 34
[[5]]
[1] 6 7 8 9 10 16 17 18 19 25 26 27 33 34 40
[[6]]
[1] 5 6 7 8 9 10 15 16 17 18 19 24 25 26 27 32 33 34 39 40 45
[[7]]
[1] 4 5 6 7 8 9 10 14 15 16 17 18 19 23 24 25 26 27 31 32 33 34 38 39 40 44 45 49
[[8]]
[1] 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 22 23 24 25 26 27 30 31 32 33 34 37 38 39 40 43 44 45 48 49 52
[[9]]
[1] 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 29 30 31 32 33 34 36 37 38 39 40 42 43 44 45 47 48
[42] 49 51 52 54
[[10]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
[42] 42 43 44 45 46 47 48 49 50 51 52 53 54 55
And if it's important, you can simply add the "OK" iteration afterwards:
append("OK",
       map(.x = accumulate(V_ind, c),
           ~ map2(.x,
                  rev(seq_along(.x) - 1),
                  function(y, z) seq(y - z, y, 1)) %>%
             reduce(c)))
Likewise, if you need to leave out the last number from the original vector:
append("OK",
       map(.x = accumulate(head(V_ind, -1), c),
           ~ map2(.x,
                  rev(seq_along(.x) - 1),
                  function(y, z) seq(y - z, y, 1)) %>%
             reduce(c)))
for (i in seq_along(V_ind)) {
  if (i == 1) {
    print("ok")
  } else if (i == 2) {
    print(V_ind[1])
  } else {
    max_minus <- i - 2
    # which element of V_ind each output position is taken from
    minus_indices <- rep(seq(max_minus), rev(seq(max_minus)) + 1)
    # how much to subtract at each output position
    minus_vector <- c()
    for (j in rev(seq(max_minus))) {
      minus_vector <- c(minus_vector, rev(seq(0, j)))
    }
    out_vector <- numeric(length(minus_vector))
    for (k in seq_along(out_vector)) {
      out_vector[k] <- V_ind[minus_indices[k]] - minus_vector[k]
    }
    # append the last element, from which nothing is subtracted
    print(c(out_vector, V_ind[i - 1]))
  }
}
[1] "ok"
[1] 10
[1] 9 10 19
[1] 8 9 10 18 19 27
[1] 7 8 9 10 17 18 19 26 27 34
[1] 6 7 8 9 10 16 17 18 19 25 26 27 33 34 40
[1] 5 6 7 8 9 10 15 16 17 18 19 24 25 26 27 32 33 34 39 40 45
[1] 4 5 6 7 8 9 10 14 15 16 17 18 19 23 24 25 26 27 31 32 33 34 38 39 40 44 45 49
[1] 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 22 23 24 25 26 27 30 31 32 33 34 37 38 39 40 43 44 45 48 49 52
[1] 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 29 30 31 32 33 34 36 37 38 39 40 42 43 44 45 47 48 49
[43] 51 52 54
You can explicitly define the indices at which to subtract and how much to subtract (1 is added to the repetition counts so that 0 is also subtracted, which keeps the element itself). Then you just have to append the last item (V_ind[i - 1]), from which nothing is subtracted, to the vector.
Another option where sequence plays the key role:
lapply(seq_along(x), function(n){
  x[rep(1:n, n:1)] - rev(sequence(1:n) - 1)
})
# [[1]]
# [1] 10
#
# [[2]]
# [1] 9 10 19
#
# [[3]]
# [1] 8 9 10 18 19 27
#
# [[4]]
# [1] 7 8 9 10 17 18 19 26 27 34
Here x is a subset of your vector:
x = cumsum(10:7)
If desired, just prepend "ok" to the result above with c().
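The rule described in the question can also be written directly in base R: among the first k elements, element j contributes itself plus the k - j integers just below it. The sketch below is one possible formulation, not taken from the answers above; expand_prefix is a hypothetical helper name.

```r
V_ind <- cumsum(10:1)

# Hypothetical helper: for the first k elements of v, element j contributes
# the run (v[j] - (k - j)) ... v[j]
expand_prefix <- function(v, k) {
  unlist(lapply(seq_len(k), function(j) seq(v[j] - (k - j), v[j])))
}

# One result per prefix length, as a list
result <- lapply(seq_along(V_ind), function(k) expand_prefix(V_ind, k))

print(result[[3]])
```

Here result[[k]] corresponds to iteration k + 1 of the loop in the question, since the first iteration only prints "ok".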

Display vector in R with a defined viewport

I want to display a vector consistently across different R environments.
For example, a vector like this
c(1:30)
should display 24 values per row
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30
and not
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The closest thing to what you are looking for is to use options() to configure the width of the results window:
options(width = 75)
c(1:30)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30
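To check how many values a given width produces, the printed lines can be captured and counted. The sketch below assumes the default print method for 1:30, where the index label takes 4 characters and each value takes 3, so a width of 76 should fit 24 values on the first row. The variable names are illustrative.

```r
# Set the width temporarily and remember the previous setting
old <- options(width = 76)

# Capture what print() would write to the console
lines <- capture.output(print(1:30))

# Restore the previous width
options(old)

# Count the values on the first row (tokens minus the "[1]" label)
per_row <- length(strsplit(trimws(lines[1]), " +")[[1]]) - 1
print(per_row)
```

The right width depends on the number of digits in the values and in the index label, so it has to be recomputed for other vectors.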

split a list and increment for loop by 10

How do I split a list in R?
I want to split a list incrementally.
For example:
x <- 1:50
n <- 5
spt <- split(x,cut(x,quantile(x,(0:n)/n), include.lowest=TRUE, labels=FALSE))
we get
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 41 42 43 44 45 46 47 48 49 50
I don't want this output. I want output like the following:
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
Any ideas?
I would also like to know how to increment a for loop by 10 in R.
Thanks.
We can use seq:
lapply(seq(10,50, by=10), function(i) x[1:i])
Or, as @RichardScriven mentioned in the comments, the seq(10,50, by=10) can be replaced by 1:5 * 10L
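The same seq() call also covers the second part of the question, stepping a for loop by 10. A minimal sketch, with illustrative variable names:

```r
x <- 1:50

# Cumulative pieces: the first 10, the first 20, ..., the first 50 elements
prefixes <- lapply(seq(10, 50, by = 10), function(i) x[seq_len(i)])
names(prefixes) <- seq_along(prefixes)

# A for loop that advances in steps of 10
visited <- c()
for (i in seq(10, 50, by = 10)) {
  visited <- c(visited, i)
}
print(visited)
```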

Splitting and iterative simple regression in r

I am fairly new to R, and below is a dummy example of a bigger table. I want to split the table by id (a, b, c, d) and run a simple linear regression for every subset:
x is my x variable and columns 1:6 are the y variables, so the output should cover each id and each of the columns 1:6. It would also be great if I could output the p-values of the slopes into a new data frame.
   id  x  1  2  3  4  5  6
1   a 74 18 19 NA 23 29  1
2   a 77 16 19 17 22 29  2
3   a 79 16 NA 19 23 29  3
4   a 81 17 20 18 23 29  4
5   b 74 19 20 19 23 28 11
6   b 76 15 19 18 26 28 12
7   b 79 19 21 20 24 28 NA
8   b 81 19 21 20 23 28 14
9   c 68 19 20 20 23 29  8
10  c 70 17 22 22 27 29  9
11  c 73 18 22 21 23 29 10
12  c 75 19 20 19 23 29 11
13  d 65 18 18 19 22 28  5
14  d 68 18 NA 18 20 29  6
15  d 70 18 19 18 23 28  7
16  d 72 19 17 19 22 28  8
I tried to use the plyr package, but it didn't work:
regression = NULL
for ( i in 3:ncol(dumm)){
  regression[i] <- dlply(dumm, .(id), function(z) lm(dumm[,i]~dumm$x, z))
}
coefs <- ldply(regression, coef)
Thanks in advance!
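One possible approach in base R, sketched under some assumptions: the data frame below rebuilds only columns 1 and 6 of the example table and renames them y1 and y6 (hypothetical names), and slope_p is a hypothetical helper. Rows with NA are dropped by lm() by default.

```r
# Part of the example table; y1 and y6 stand for columns 1 and 6
dumm <- data.frame(
  id = rep(c("a", "b", "c", "d"), each = 4),
  x  = c(74, 77, 79, 81, 74, 76, 79, 81, 68, 70, 73, 75, 65, 68, 70, 72),
  y1 = c(18, 16, 16, 17, 19, 15, 19, 19, 19, 17, 18, 19, 18, 18, 18, 19),
  y6 = c(1, 2, 3, 4, 11, 12, NA, 14, 8, 9, 10, 11, 5, 6, 7, 8)
)
y_cols <- c("y1", "y6")

# Hypothetical helper: fit y ~ x on one subset and return the slope p-value
slope_p <- function(d, y) {
  fit <- lm(reformulate("x", response = y), data = d)
  coefs <- summary(fit)$coefficients
  if ("x" %in% rownames(coefs)) coefs["x", "Pr(>|t|)"] else NA_real_
}

# Split by id, then loop over the y columns inside each subset
groups <- split(dumm, dumm$id)
p_values <- do.call(rbind, lapply(names(groups), function(g) {
  data.frame(
    id = g,
    response = y_cols,
    p_value = vapply(y_cols, function(y) slope_p(groups[[g]], y), numeric(1)),
    row.names = NULL
  )
}))
print(p_values)
```

The key difference from the attempt above is that the model is fitted on the subset passed to the function, not on the full dumm columns. A y column that is constant within a group (such as column 5 for id a) has no meaningful slope p-value and would need separate handling.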
