which.min() returning two numbers - r

I need the position of the smallest value in my vector (node degrees in a graph, obtained from the function degree()), so I use which.min().
However, because the vector is "annotated" with names, I get two values: the name of the node and its position in the vector (I have no idea why the names are not in order). Here node "23" has the smallest degree and sits at the 40th position in the vector. The two values are printed on top of each other and I cannot figure out how to separate them.
I need just the name of the node for further processing. I couldn't find any existing question about this issue.
> degs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 24 25 26 27 28 29 30 31 32 34 35 36 38 39 40 41 33 23 37 42 43
14 25 31 17 25 11 26 21 23 25 24 17 13 20 12 15 7 15 28 18 9 17 8 7 7 7 14 19 12 17 19 10 19 20 19 10 7 11 12 6 8 12 13
> which.min(degs)
23
40

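The top number is just the name of the element and the bottom number is its position: which.min() returns a named vector of length one. If you only need the position, you can ignore the name. Compare: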
> c("23" = 40)
23
40

If you want just the name of the node, you can use
names(which.min(degs))
Output will be "23".
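As a minimal sketch of separating the two pieces, assuming a shortened, made-up version of the degree vector (the values below are hypothetical, not the asker's full data): names() gives the node label and unname() gives the bare position. The names are labels attached to the elements, while the position simply counts elements, so the two need not agree.

```r
# Hypothetical, shortened version of the degree vector; names are node labels.
degs <- c("1" = 14, "2" = 25, "23" = 6, "37" = 8)

idx  <- which.min(degs)   # named integer: name "23", value 3
node <- names(idx)        # node label as a character string
pos  <- unname(idx)       # position in the vector, without the name

node
pos
degs[[node]]              # look the degree up again by node label
```

Indexing by the label (degs[[node]]) keeps working even if the vector is later reordered, which the bare position would not.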

Related

Generating a vector with n repetitions of x, then y, then z, with a fixed upper bound

I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : argument 'times' incorrect
What formulation do I need to use to properly generate the vector I want?
sort(rep(1:36, 3))
Or even better, as @Wimpel mentioned in the comments, use the each argument of the rep function:
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
This one should work, although it is probably not the most elegant:
reps = c()
n = 36
for(i in 1:n){
reps = append(reps, rep(i, 3))
}
reps
Alternatively, use the rep function properly (see ?rep for the each argument):
rep(1:36, each = 3)
The rep approach is preferable (see the existing answers).
Here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
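For what it is worth, the failing call passes 3 as the value to repeat and 1:36 as times, and times has to be either a single number or one count per element of the first argument. A small sketch that checks the approaches listed above against each other (the variable names are only for illustration):

```r
by_each  <- rep(1:36, each = 3)
by_sort  <- sort(rep(1:36, times = 3))
by_kron  <- as.vector(kronecker(1:36, rep(1, 3)))  # drop the dim attribute
by_outer <- c(outer(rep(1, 3), 1:36))              # 3 x 36 matrix, flattened column by column
by_times <- rep(1:36, times = rep(3, 36))          # 'times' as a vector: one count per element

length(by_each)
all(by_each == by_sort,
    by_each == by_kron,
    by_each == by_outer,
    by_each == by_times)
```

All five produce the same 108 values; rep(1:36, each = 3) is the shortest way to say it.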

Display vector in R with a defined viewport

I want to display a vector consistently across different R environments.
For example, a vector like this
c(1:30)
should display 24 values per row
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30
and not
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
The closest thing to what you are looking for is to use options() to configure the width of the results window:
options(width = 75)
c(1:30)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30
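A small sketch of the same idea that also restores the previous setting afterwards. The width is measured in characters, not in values per row, so the number of values that fit depends on how wide each printed value and the index prefix are; the width of 75 is taken from the answer above, and the variable names are only for illustration.

```r
old <- options(width = 75)          # options() returns the previous settings
out <- capture.output(print(1:30))  # the printed lines as character strings
options(old)                        # restore the earlier width

out
nchar(out)                          # no line is wider than the width that was set
```

Setting the width at the top of a script should make the wrapping the same wherever the script runs, as long as nothing else changes the option later.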

regarding predicted value from xgboost package in R

Package used:
‘xgboost’ version 0.4-4
I am building the model with the xgboost() function, using this code:
fit <- xgboost(data =sparse_matrix , label = trainSet$OutputClass,
max.depth = 4,eta = 1, nthread = 2, nround = 10,
eval_metric = "merror",objective = "multi:softmax",num_class = 45)
When I use the prediction function:
Prediction<- predict(fit,sparse_matrixtestSet)
the code above gives the output below: instead of class names it returns their numerical equivalents, even though "label = trainSet$OutputClass" contains class names.
Output:
[1] 1 1 1 1 1 35 3 3 3 4 31 7 7 7 3 3 9 9 9 9 9 9 9 10 10 11
[27] 11 11 11 11 11 11 11 11 11 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 10 10
[53] 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 18 18 18
[79] 18 18 18 18 35 35 35 18 21 21 21 21 32 1 1 25 25 25 25 26 27 27 27 27 27 27
[105] 27 27 29 29 29 29 29 30 30 30 30 30 30 30 30 30 30 35 35 32 32 32 43 43 32 32
[131] 32 32 32 32 32 32 43 32 32 32 32 32 33
I have also set stringsAsFactors = FALSE while reading the data set.
Can someone please help me get the predicted values as class names instead of numerical values?
Thanks in advance
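One possible approach, sketched without calling xgboost at all: as far as I know, the multi-class objectives expect integer labels from 0 to num_class - 1, so the class names can be encoded through a factor before training and decoded with the same factor levels afterwards. The class names and the pred vector below are made up for illustration; in practice pred would come from predict(fit, ...).

```r
# Hypothetical class names standing in for trainSet$OutputClass.
classes <- c("setosa", "virginica", "setosa", "versicolor")

f     <- factor(classes)       # levels are sorted alphabetically
label <- as.integer(f) - 1L    # 0-based integer codes to pass as 'label'

# Stand-in for the numeric output of predict(); assumed to be 0-based codes.
pred <- c(0, 2, 1)

pred_names <- levels(f)[pred + 1]  # shift back to 1-based to index the levels
pred_names
```

The important part is that the same factor levels are used for encoding and decoding; if the labels were encoded some other way, the mapping back has to follow that encoding instead.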

Splitting and iterative simple regression in r

I am fairly new to R and have a dummy example of a bigger table below. I want to split the table by id (a, b, c, d) and run a simple linear regression for every subset:
x is my x variable and columns 1:6 are the y variables, so that I get one model for each id and each of the columns 1:6. It would also be great if I could collect the p-values of the slopes in a new data frame.
id x 1 2 3 4 5 6
1 a 74 18 19 NA 23 29 1
2 a 77 16 19 17 22 29 2
3 a 79 16 NA 19 23 29 3
4 a 81 17 20 18 23 29 4
5 b 74 19 20 19 23 28 11
6 b 76 15 19 18 26 28 12
7 b 79 19 21 20 24 28 NA
8 b 81 19 21 20 23 28 14
9 c 68 19 20 20 23 29 8
10 c 70 17 22 22 27 29 9
11 c 73 18 22 21 23 29 10
12 c 75 19 20 19 23 29 11
13 d 65 18 18 19 22 28 5
14 d 68 18 NA 18 20 29 6
15 d 70 18 19 18 23 28 7
16 d 72 19 17 19 22 28 8
I tried to use the plyr package but it didn't work out:
regression = NULL
for ( i in 3:ncol(dumm)){
regression[i] <- dlply(dumm, .(id), function(z) lm(dumm[,i]~dumm$x, z))
}
coefs <- ldply(regression, coef)
Thanks in advance!
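A minimal sketch of one way to do this in base R rather than plyr, assuming a cut-down copy of the table with the response columns renamed y1 and y2 (hypothetical names; the original columns are called 1 to 6). One likely problem in the attempt above is that the formula refers to dumm itself instead of the subset passed to the function, so every model would be fitted on the full table.

```r
# Cut-down, hypothetical copy of the table: two ids, two response columns.
dumm <- data.frame(
  id = rep(c("a", "b"), each = 4),
  x  = c(74, 77, 79, 81, 74, 76, 79, 81),
  y1 = c(18, 16, 16, 17, 19, 15, 19, 19),
  y2 = c(19, 19, NA, 20, 20, 19, 21, 21)
)

y_cols <- setdiff(names(dumm), c("id", "x"))

# For every id subset and every response column, fit y ~ x on that subset only.
results <- do.call(rbind, lapply(split(dumm, dumm$id), function(sub) {
  do.call(rbind, lapply(y_cols, function(col) {
    fit <- lm(reformulate("x", response = col), data = sub)  # rows with NA are dropped
    co  <- summary(fit)$coefficients
    data.frame(id       = sub$id[1],
               response = col,
               slope    = co["x", "Estimate"],
               p_value  = co["x", "Pr(>|t|)"])
  }))
}))
rownames(results) <- NULL
results
```

With only four rows per id the p-values are not very informative, and a response that is constant within an id (such as column 5 for id a in the table above) gives a degenerate fit, so such cases may need to be handled separately.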

Clean bad data automatically [duplicate]

I am building an App using shiny and openair to analyze wind data.
Right now the data needs to be “cleaned” before uploading by the user.
I am interested in doing this automatically.
Some of the data is empty and some of it is not numeric, so it is not possible to build a wind rose.
I want to:
1. Estimate how much of the data is not numeric
2. Cut it out and leave only numeric data
In my data, the "NO2.mg" column is read as a factor rather than an integer because it does not consist only of numbers.
Here is a reproducible example:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2
[1] 5 4 c1 54 c5 1 2 3 4 5 6 7 8 9 10 11 12 13 14
[20] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[39] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
52 Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 ... c5
> as.numeric(no2)
[1] 45 34 51 46 52 1 12 23 34 45 47 48 49 50 2 3 4 5 6
[20] 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 27
[39] 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44
Worst R haiku ever:
Some of the data is empty,
some of is not numeric,
so it is not possible to build a wind rose.
To convert a factor to numeric, you need to convert to character first:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2_num <- as.numeric(as.character(no2))
#Warning message:
# NAs introduced by coercion
no2_clean <- na.omit(no2_num) #remove NAs resulting from the bad data
# [1] 5 4 54 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
# [40] 37 38 39 40 41 42 43 44 45 46 47 48 49
# attr(,"na.action")
# [1] 3 5
# attr(,"class")
# [1] "omit"
length(attr(no2_clean,"na.action"))/length(no2)*100
#[1] 3.703704
This is how I did it. I am sure someone has a better way, and I'd love it if you shared it with me.
This is my data:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
To count the bad data:
sum(is.na((as.numeric(as.vector(no2)))))
And to estimate the percentage of bad data:
sum(is.na((as.numeric(as.vector(no2)))))/length(no2)*100
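Putting the pieces from both answers together, a short sketch that counts, measures and removes the bad values in one pass. It uses 1:49 in place of seq(2:50), which yields the same values, and silences the expected coercion warning with suppressWarnings(); the variable names are only for illustration.

```r
no2 <- factor(c(5, 4, "c1", 54, "c5", 1:49))

# Go through character first; as.numeric() on a factor returns the level codes.
no2_num <- suppressWarnings(as.numeric(as.character(no2)))

bad       <- is.na(no2_num)    # TRUE where the value could not be parsed
n_bad     <- sum(bad)          # number of bad values
pct_bad   <- mean(bad) * 100   # percentage of bad values
no2_clean <- no2_num[!bad]     # numeric data only

n_bad
pct_bad
length(no2_clean)
```

Values that were already missing or empty in the input are counted as bad as well, which seems to be what the question asks for.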
