Select columns by rule and stack them in R

Area RA RI WA WI NA NI
3 3 1 4 2 2 1
2 2 1 3 1 2 1
3 2 1 3 2 2 1
2 2 1 3 1 1 1
2 2 1 3 2 1 1
2 2 1 2 1 2 1
2 3 1 2 1 2 1
3 1 1 2 2 1 1
2 2 1 1 1 2 1
2 2 1 2 1 2 1
3 1 1 3 1 1 1
I want to retain the first column and stack every two columns, like this:
Area Column 1 Column 2
3 3 1
2 2 1
3 2 1
2 2 1
2 2 1
2 2 1
2 3 1
3 1 1
2 2 1
2 2 1
3 1 1
3 4 2
2 3 1
3 3 2
2 3 1
2 3 2
2 2 1
2 2 1
3 2 2
2 1 1
2 2 1
3 3 1
3 2 1
2 2 1
3 2 1
2 1 1
2 1 1
2 2 1
2 2 1
3 1 1
2 2 1
2 2 1
3 1 1
Your suggestions are highly appreciated!

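We select the alternating columns with a recycled logical index, c(TRUE, FALSE), after dropping the first column (df1[-1]), unlist each selection, and combine the results with the first column. Because the stacked columns are a whole multiple of the original number of rows, data.frame() recycles Area to match.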
d1 <- data.frame(Area = df1[,1], column1 = unlist(df1[-1][ c(TRUE, FALSE)]),
column2 = unlist(df1[-1][c(FALSE, TRUE)]))
row.names(d1) <- NULL
head(d1)
# Area column1 column2
#1 3 3 1
#2 2 2 1
#3 3 2 1
#4 2 2 1
#5 2 2 1
#6 2 2 1
tail(d1)
# Area column1 column2
#28 2 2 1
#29 2 2 1
#30 3 1 1
#31 2 2 1
#32 2 2 1
#33 3 1 1
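The answer above assumes df1 already exists. As a minimal self-contained sketch, the same idea can be tried on the first three rows of the question's data. The column name N_A is an assumption made here to avoid clashing with R's NA constant, and rep() is used to make the recycling of Area explicit.

```r
# First three rows of the question's data (column "NA" renamed to N_A, an
# assumption for this sketch only)
df1 <- data.frame(Area = c(3, 2, 3),
                  RA = c(3, 2, 2), RI = c(1, 1, 1),
                  WA = c(4, 3, 3), WI = c(2, 1, 2),
                  N_A = c(2, 2, 2), NI = c(1, 1, 1))

vals <- df1[-1]                 # everything except Area
odd  <- vals[c(TRUE, FALSE)]    # RA, WA, N_A (logical index is recycled)
even <- vals[c(FALSE, TRUE)]    # RI, WI, NI

d1 <- data.frame(Area    = rep(df1$Area, times = ncol(odd)),
                 column1 = unlist(odd,  use.names = FALSE),
                 column2 = unlist(even, use.names = FALSE))
print(d1)
```

With three input rows and three column pairs, d1 has nine rows: the Area values repeated once per pair, next to the stacked pair values.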

Related

When I run complete(), I get an error: Error in (function (classes, fdef, mtable)

I have a dataset which looks like this (A-J are column names):
A B C D E F G H I J
1 2 2 3 2 1 1 1 1
2 1 1 1 1 1 1 1 1 1
2 1 2 2 2 2 2 2 1 1
2 1 2 1 1 1 1 1 1 1
2 1 3 3 3 2 2 2 2
2 1 3 2 2 3 1 1 1 1
1 3 2 1 2 2 2 1 2
2 1 2 2 2 2 2 2 1 1
1 2 2 2 2 1 1 1 1
2 1 2 1 1 1 2 1 1 1
2 1 1 1 1 1 2 2 1 1
2 1 2 1 1 1 1 1 1 2
2 1 1 1 1 1 1 1
2 1 3 3 3 3 1 1 1 2
1 2 2 1 2 1 1 1 1
1 2 2 2 2 2 2 1 1
2 2 4 1 1 1 2 2 1 1
1 1 3 3 3 3
2 1 3 3 1 2 2 2 2 3
I am getting the error below:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘complete’ for signature ‘"mids", "numeric"’
My data has a lot of missing (NULL) values, and I am trying to impute them using the code below:
imp_data<-mice(data = data_NA, m = 5, method = "rf", maxit = 5, seed = 500)
I get the error when I run:
complete(imp_data,1)
Please suggest where I am going wrong.
It seems that the NA values are not properly assigned in the data_NA data.frame, which is causing the problem.
With the modified data (blanks replaced by NA, see Data below), the imputation with mice worked for me:
library(mice)
imp_data <- mice(data = data_NA, m = 5, method = "rf", maxit = 5, seed = 500)
complete(imp_data, 1)
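EDIT: The OP resolved the error by qualifying the call with the package name:
mice::complete(imp_data, 1)
Most likely mice::complete was masked by a function of the same name from another attached package; the "unable to find an inherited method" wording is what a failed S4 method dispatch reports.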
#Result
# A B C D E F G H I J
# 1 1 2 2 3 2 1 1 1 1 2
# 2 2 1 1 1 1 1 1 1 1 1
# 3 2 1 2 2 2 2 2 2 1 1
# 4 2 1 2 1 1 1 1 1 1 1
# 5 2 1 1 3 3 3 2 2 2 2
# 6 2 1 3 2 2 3 1 1 1 1
# 7 1 3 2 1 2 2 2 1 2 1
# 8 2 1 2 2 2 2 2 2 1 1
# 9 1 2 2 2 2 1 1 1 1 1
# 10 2 1 2 1 1 1 2 1 1 1
# 11 2 1 1 1 1 1 2 2 1 1
# 12 2 1 2 1 1 1 1 1 1 2
# 13 2 1 1 1 1 1 1 1 1 1
# 14 2 1 3 3 3 3 1 1 1 2
# 15 1 2 2 1 2 1 1 1 1 1
# 16 1 2 2 2 2 2 2 1 1 1
# 17 2 2 4 1 1 1 2 2 1 1
# 18 1 1 3 3 3 3 2 1 2 1
# 19 2 1 3 3 1 2 2 2 2 3
#
Data
data_NA<- read.table(text =
"A B C D E F G H I J
1 2 2 3 2 1 1 1 1 NA
2 1 1 1 1 1 1 1 1 1
2 1 2 2 2 2 2 2 1 1
2 1 2 1 1 1 1 1 1 1
2 1 NA 3 3 3 2 2 2 2
2 1 3 2 2 3 1 1 1 1
1 3 2 1 2 2 2 1 2 NA
2 1 2 2 2 2 2 2 1 1
1 2 2 2 2 1 1 1 1 NA
2 1 2 1 1 1 2 1 1 1
2 1 1 1 1 1 2 2 1 1
2 1 2 1 1 1 1 1 1 2
2 1 1 1 NA NA 1 1 1 1
2 1 3 3 3 3 1 1 1 2
1 2 2 1 2 1 1 1 1 NA
1 2 2 2 2 2 2 1 1 NA
2 2 4 1 1 1 2 2 1 1
1 1 3 3 3 3 NA NA NA NA
2 1 3 3 1 2 2 2 2 3",header = TRUE)
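The masking explanation in the EDIT can be illustrated without mice at all. The sketch below is only an analogy: it deliberately defines a function named sd that shadows stats::sd, then shows that the namespace-qualified call still reaches the original, just as mice::complete() does.

```r
# A user-defined function that shadows stats::sd (illustrative only)
sd <- function(x) "masked version"

# The unqualified call finds the masking definition first
unqualified <- sd(c(1, 2, 3))

# The qualified call always uses the function from the stats namespace
qualified <- stats::sd(c(1, 2, 3))

print(unqualified)
print(qualified)
```

In a real session, conflicts(detail = TRUE) or find("complete") can help show which attached packages define the same name.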

Accuracy.meas function in ROSE package of R

I am using the accuracy.meas function of the ROSE package in R. I got the error "Response must have two levels", so I checked
both arguments, response and predicted1, but both are numeric. Are there limitations on the use of the accuracy.meas function?
Note: the predictions themselves are inaccurate, but that has nothing to do with the error.
accuracy.meas(test$Walc,predicted1,threshold = 0.5)
Error in accuracy.meas(response=test$Walc,predicted= predicted1, threshold = 0.5) :
Response must have two levels.
>test$Walc
[1] 1 1 1 3 3 3 1 1 2 2 1 2 1 1 3 3 1 1 1 1 3 1 1 4 2 1 1 1 1 4 4 4 5 1 1 1 1 3 1 2 3
[42] 1 5 1 4 4 1 2 2 2 1 2 2 3 2 3 1 2 1 5 1 1 3 2 2 1 1 1 1 1 1 1 2 1 1 3 3 3 2 3 1 2
[83] 2 2 1 1 3 1 1 1 2 3 3 1 1 3 1 2 1 5 2 2 1 2 1 1 2 2 1 1 3 1 2 1 1 1 3 1 1 1 1 1 1
[124] 3 3 3 4 1 1 1 1 4 1 1 1 1 3 2 1 3 3 1 1 1 1 1 1 1 1 5 1 1 1 3 1 1 1 3 4 1 3 2 4 5
[165] 2 1 1 2 1 1 2 3 1 4 1 2 1 4 4 5 1 1 5 3 5 4 5 2 4 2 2 4 1 5 5 4 2 2 1 4 4 4 2 3 4
[206] 2 3 4 4 5 2 3 4 5 5 3 2 4 4 1 5 5 5 3 2 2 4 1 5 5 2 1 1 1 2 3 3 2 1 1 3 4 1 1 1 4
[247] 1 3 1 2 2 3 3 2 2 2 2 1 2 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 3 1 1 4 3 5 2 2 4 3 4 2 3
[288] 5 5 3 1 1 3 4 4 4 3 4 5 3 3 3 3 3 4 4 3 1 3 3 4 3
> predicted1
[1] 2 2 1 2 2 2 1 1 1 2 2 2 1 1 4 4 1 1 1 1 3 2 2 3 2 2 1 2 2 2 2 2 5 3 3 2 2 2 1 1 2
[42] 1 3 2 3 3 2 2 2 2 2 2 2 3 1 3 2 1 2 4 2 3 2 3 3 1 2 2 2 1 1 2 2 1 1 2 2 3 1 2 2 2
[83] 2 2 1 1 3 2 2 1 1 3 3 1 2 2 2 3 1 3 3 3 1 2 1 2 1 2 3 1 3 2 2 2 2 2 2 2 2 2 2 1 2
[124] 4 1 4 4 2 1 1 2 1 1 2 1 1 2 2 2 3 3 1 1 1 1 2 1 1 1 4 2 1 1 2 2 1 2 2 3 1 2 2 3 4
[165] 2 2 2 3 2 1 2 2 2 4 1 2 2 4 4 5 1 1 5 2 5 4 4 2 4 3 2 2 1 4 4 2 2 2 1 4 2 3 2 3 4
[206] 3 2 4 4 5 2 2 4 4 5 4 3 3 3 2 4 4 4 3 1 2 2 2 4 4 1 1 2 2 2 3 3 1 2 1 2 2 1 1 3 2
[247] 2 2 1 4 2 2 4 2 2 2 2 2 2 2 1 1 3 2 1 2 2 2 2 1 1 2 2 2 4 4 2 3 3 5 2 2 3 3 3 3 3
[288] 3 5 4 2 2 4 4 5 4 3 4 5 3 4 4 3 3 3 3 3 2 4 4 2 3
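The error message states that the response must have two levels, so one thing worth checking is how many distinct values the response actually takes. A minimal base-R sketch, using only the first 41 values of test$Walc printed above (the variable names here are illustrative):

```r
# First 41 values of test$Walc, copied from the printed output above
walc_head <- c(1, 1, 1, 3, 3, 3, 1, 1, 2, 2, 1, 2, 1, 1, 3, 3, 1, 1, 1, 1,
               3, 1, 1, 4, 2, 1, 1, 1, 1, 4, 4, 4, 5, 1, 1, 1, 1, 3, 1, 2, 3)

# Distinct values present in the response
levels_present <- sort(unique(walc_head))
n_levels <- length(levels_present)

print(levels_present)
print(n_levels)
```

Even this excerpt takes more than two distinct values, which is consistent with the error. How to reduce the response to two classes, if that is appropriate at all, is a modelling decision the question does not settle.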

What does this R expression do?

sp_full_in is a matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 0 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 2
2 1 0 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
3 2 2 0 2 2 2 2 2 2 1 1 2 2 2 1 2 1 1 1 2 1
4 1 2 1 0 2 2 2 1 2 1 1 1 2 2 1 2 1 1 2 2 1
5 2 2 2 2 0 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 2
6 2 1 1 1 1 0 1 1 1 2 2 2 2 2 1 2 1 2 2 1 1
7 2 1 1 2 1 1 0 1 1 2 1 1 2 1 1 2 1 1 1 2 1
8 1 2 1 1 1 2 2 0 1 1 1 2 2 2 1 2 1 1 2 1 1
9 2 2 1 2 1 1 2 2 0 1 1 2 1 2 1 2 1 1 2 2 2
10 2 2 1 1 1 2 2 1 1 0 2 2 2 2 1 1 1 1 1 2 2
11 2 2 1 1 1 2 1 1 1 1 0 2 1 2 1 2 1 1 1 1 2
12 1 2 1 1 2 1 1 2 1 1 1 0 2 2 1 2 1 2 1 1 1
13 2 2 2 2 1 3 2 2 2 1 1 3 0 2 1 2 2 1 2 2 2
14 2 2 1 2 1 2 1 2 1 2 2 2 1 0 1 2 1 1 1 1 1
15 2 2 2 2 2 2 2 2 2 1 1 2 2 1 0 2 1 1 1 1 2
16 1 2 2 1 1 2 2 2 1 1 2 2 2 2 1 0 1 1 2 1 2
17 2 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 0 2 2 1 1
18 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 0 1 1 1
19 2 2 1 2 1 2 2 2 2 1 1 2 2 2 1 2 1 1 0 2 2
20 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 1 1 0 1
21 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 0
mean(sp_full_in[which(sp_full_in != Inf)])
produces the result [1] 1.38322
I'm not quite sure I understand what this does, but the way I read it is: for every cell in sp_full_in, check if it is not infinite; if so, return the output 1; then average all the outputs. Is that correct? If not, how should it be read?
which(sp_full_in != Inf) returns a vector of integers (and only one of them is 1). That vector of integers is then handed to "[" as indices into sp_full_in and returns all the values of sp_full_in as a vector passed to the mean function.
It is a good idea to learn to read R expressions from the "inside out". Find the innermost function call and mentally evaluate it, in this case sp_full_in != Inf. That returns a logical matrix (here all TRUE) that gets passed to which(), and since there is no 'arr.ind' argument, which() returns an atomic vector of indices.
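The inside-out reading can be traced step by step on a small matrix. This is a sketch with made-up values (m and the step names are not from the question), with one Inf included so that the filtering is visible:

```r
# A 2 x 2 matrix filled column by column; the last cell is Inf
m <- matrix(c(0, 1, 2, Inf), nrow = 2)

step1 <- m != Inf       # innermost call: logical matrix, FALSE only at the Inf cell
step2 <- which(step1)   # positions of the TRUE cells, counted column by column
step3 <- m[step2]       # values at those positions
result <- mean(step3)   # average of the remaining values

print(step2)
print(step3)
print(result)
```

Here which() returns the positions 1, 2 and 3, the subsetting returns 0, 1 and 2, and their mean is 1.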
The other answers are good at explaining why you get the mean of all the finite entries in the matrix, but it's worth noting that in this case the which does nothing. I used to have the bad habit of over-using which as well.
> a <- matrix(rnorm(4), nrow = 2)
> a
[,1] [,2]
[1,] 0.5049551 -0.7844590
[2,] -1.7170087 -0.8509076
> a[which(a != Inf)]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[a != Inf]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[1] <- Inf
> a
[,1] [,2]
[1,] Inf -0.7844590
[2,] -1.717009 -0.8509076
> a[which(a != Inf)]
[1] -1.7170087 -0.7844590 -0.8509076
## Similarly if there was an Infinite value
> a[a != Inf]
[1] -1.7170087 -0.7844590 -0.8509076
And, while we're at it, we should also mention the function is.finite which is often preferable to != Inf. is.finite will return FALSE on Inf, -Inf, NA and NaN.
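A short sketch of that difference, using an illustrative vector v that contains every special value: comparing with != Inf keeps -Inf and turns NA and NaN into NA entries, while is.finite drops all of them.

```r
v <- c(1, Inf, -Inf, NA, NaN, 4)

# Comparison keeps -Inf and yields NA for the NA and NaN elements
kept_by_compare <- v[v != Inf]

# is.finite is FALSE for Inf, -Inf, NA and NaN alike
finite_flags <- is.finite(v)
kept_by_finite <- v[finite_flags]

print(kept_by_compare)
print(kept_by_finite)
print(mean(kept_by_finite))
```

The mean of the is.finite subset is 2.5, whereas the mean of the != Inf subset is not a usable number because NA values remain in it.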
No, but you are close. The comparison checks every cell of the matrix against the condition (here, that it is not Inf), and which returns the indices of all cells that satisfy it. Your code then extracts the values of the cells at those indices and finally calculates their mean.

Conditional counting in R

I have a question I hope some of you might help me with. I am writing a thesis on pharmaceuticals and the effect of parallel imports. I am working on this in R, with a panel dataset.
I need a variable that counts, for a given original product, how many parallel importers there are in a given time period.
Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3
Ideally, what I want here is a new column giving the number of PI products (PI=1) for an original (PI=0) at time t. So the output would look like:
Product_ID PI t nPIcomp
1 0 1 2
1 1 1
1 1 1
1 0 2 4
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1 1
2 1 1
2 0 2 1
2 1 2
2 0 3 3
2 1 3
2 1 3
2 1 3
I hope I have made my issue clear :)
Thanks in advance,
Henrik
Something like this?
x <- read.table(text = "Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3", header = TRUE)
find.count <- rle(x$PI)
count <- find.count$lengths[find.count$values == 1]
x[x$PI == 0, "nPIcomp"] <- count
Product_ID PI t nPIcomp
1 1 0 1 2
2 1 1 1 NA
3 1 1 1 NA
4 1 0 2 4
5 1 1 2 NA
6 1 1 2 NA
7 1 1 2 NA
8 1 1 2 NA
9 2 0 1 1
10 2 1 1 NA
11 2 0 2 1
12 2 1 2 NA
13 2 0 3 3
14 2 1 3 NA
15 2 1 3 NA
16 2 1 3 NA
I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:
dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
FUN = function(z) {
n <- sum(z)
c(n, rep(NA, n))
}))
The same idea can be used with the data.table package if your data is large and speed is a concern.
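As a self-contained variant of the grouping idea, the sketch below computes the group sums with ave and then keeps them only on the original rows (PI == 0). This is a slight departure from the answer's function: it does not rely on the original product being the first row of each group. The data frame is retyped from the question.

```r
dat <- data.frame(
  Product_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  PI         = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1),
  t          = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3)
)

# Number of parallel importers per (Product_ID, t) group, repeated on every row
n_pi <- ave(dat$PI, dat$Product_ID, dat$t, FUN = sum)

# Keep the count on the original product's row only, NA elsewhere
dat$nPIcomp <- ifelse(dat$PI == 0, n_pi, NA)
print(dat)
```

The rows with PI == 0 receive 2, 4, 1, 1 and 3, matching the output requested in the question.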
Roman's answer gives exactly what you want. In case you want to summarise the data instead, this would be handy, using the plyr package (df is what I have called your data.frame):
library(plyr)
ddply( df , .(Product_ID , t ) , summarise , nPIcomp = sum(PI) )
# Product_ID t nPIcomp
#1 1 1 2
#2 1 2 4
#3 2 1 1
#4 2 2 1
#5 2 3 3

Off-diagonal and Diagonal symmetry check, Getting off-diagonal and diagonal element(s) without repetition of a Matrix

Suppose I have this matrix
8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3
3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1
1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2 1
1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2 2
2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1 2
2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1 1
1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1 1
1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1 1
1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1 1
1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1 1
1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2 1
1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2 2
2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1 2
2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1 1
1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3 1
1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8 3
3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 3 8
I want to check:
1. Are the off-diagonals symmetric? (In the matrix above, they are.)
2. Which elements occur in the off-diagonals, without repetition? (In the matrix above, these are 1, 2, 3.)
3. Are the diagonal elements symmetric? If yes, print the element (8 in the matrix above).
# 1
all(mat == t(mat))
[1] TRUE
# 2
unique(mat[upper.tri(mat) | lower.tri(mat)])
[1] 3 1 2
# 3
if(length(unique(diag(mat))) == 1) print(diag(mat)[1])
[1] 8
mat <- as.matrix(read.table('abbas.txt'))
isSymmetric(unname(mat))
# From ?isSymmetric: 'Note that a matrix is only symmetric if its 'rownames' and 'colnames' are identical.'
unique(mat[lower.tri(mat)])
all(diag(mat) == rev(diag(mat)))
# I assume you mean the diagonal is symmetric when its reverse is the same with itself.
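Both answers assume mat has already been read in. A minimal sketch on a 3 x 3 matrix with the same banded pattern (values chosen here for illustration) runs all three checks in one place:

```r
mat <- matrix(c(8, 3, 1,
                3, 8, 3,
                1, 3, 8), nrow = 3, byrow = TRUE)

# 1. Symmetry of the whole matrix (no dimnames here, so unname() is not needed)
is_sym <- isSymmetric(mat)

# 2. Distinct off-diagonal elements: cells whose row and column indices differ
off_diag <- unique(mat[row(mat) != col(mat)])

# 3. Distinct diagonal elements; a single value means a constant diagonal
diag_vals <- unique(diag(mat))

print(is_sym)
print(sort(off_diag))
print(diag_vals)
```

For this matrix the checks give TRUE, the off-diagonal values 1 and 3, and the single diagonal value 8.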
