I'm working with a 3-d array and trying to replace specific values when three conditions are met. The solution needs to cope with NA values and to perform a new random evaluation each time the conditions are met. I read somewhere that subsetting multi-dimensional arrays in R is not a good idea. Any comments or suggestions to overcome these issues are most welcome, whether they use base R, the tidyverse, or anything else. Thx. J
Essentially, I'm trying to replace the 7's in the final row of the third "sheet", with 0, 50% of the time when the other conditions are met.
This is my attempt using ifelse, but it fails; I think it might need a more sophisticated apply statement or something. Here is my MWE:
library(abind)
arr1 <- as.array (matrix (c (1, 0, 1, 1, 1, 0, 0, NA, 1, 0, 1, 1,
1, 1, NA, 0, 1, 0, 1, 1, 0, 1, 1, 1), ncol = 8))
arr2 <- as.array (matrix (c (7, 5, 9, 1, 8, 4, 2, 3, 0, 8, 7, NA,
2, 4, 6, 3, 2, 8, 8, 3, 3, 6, 8, 5), ncol = 8))
arr3 <- as.array (matrix (c (7, 5, 7, 1, 8, 7, 2, 3, 6, 8, 7, 7,
2, 4, 5, 3, 2, NA, 8, 3, 4, 6, 8, 7), ncol = 8))
arrtot <- abind (arr1, arr2, arr3, along = 3)
## get the array to be changed
arrtot[dim(arrtot)[1], ,3]
arrtot[dim(arrtot)[1], ,3] <- ifelse(arrtot[dim(arrtot)[1], ,3] == 7 &
arrtot[dim(arrtot)[1], ,1] == 1 &
arrtot[dim(arrtot)[1], ,2] >= 5,
function (x) rbinom(1,1,0.5) * 7,
"original value unchanged")
Because of the random function there are several possible results. One correct result would look like this, where the two 7's that meet the conditions have been changed to zeros.
, , 1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 0 0 1 0 1 1
[2,] 0 1 NA 1 1 1 1 1
[3,] 1 0 1 1 NA 0 0 1
, , 2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 7 1 2 8 2 3 8 6
[2,] 5 8 3 7 4 2 3 8
[3,] 9 4 0 NA 6 8 3 5
, , 3
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 7 1 2 8 2 3 8 6
[2,] 5 8 3 7 4 2 3 8
[3,] 0 7 6 7 5 NA 4 0
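One way to approach this, offered as a minimal sketch rather than a definitive answer, is to compute the positions that satisfy all three conditions with which() and then assign one random draw per position. The array is rebuilt here with base array() instead of abind (an assumption made only to keep the example self-contained), and the names last and hit are hypothetical.

```r
# Rebuild the question's data with base R only (array() stands in for abind()).
arr1 <- matrix(c(1, 0, 1, 1, 1, 0, 0, NA, 1, 0, 1, 1,
                 1, 1, NA, 0, 1, 0, 1, 1, 0, 1, 1, 1), ncol = 8)
arr2 <- matrix(c(7, 5, 9, 1, 8, 4, 2, 3, 0, 8, 7, NA,
                 2, 4, 6, 3, 2, 8, 8, 3, 3, 6, 8, 5), ncol = 8)
arr3 <- matrix(c(7, 5, 7, 1, 8, 7, 2, 3, 6, 8, 7, 7,
                 2, 4, 5, 3, 2, NA, 8, 3, 4, 6, 8, 7), ncol = 8)
arrtot <- array(c(arr1, arr2, arr3), dim = c(3, 8, 3))

last <- dim(arrtot)[1]  # index of the final row

# which() returns only the positions where the test is TRUE,
# so comparisons that evaluate to NA are skipped.
hit <- which(arrtot[last, , 3] == 7 &
             arrtot[last, , 1] == 1 &
             arrtot[last, , 2] >= 5)

# One independent 50/50 draw per qualifying cell: the 7 either stays 7 or becomes 0.
arrtot[last, hit, 3] <- 7 * rbinom(length(hit), 1, 0.5)

arrtot[last, , 3]
```

With this data the qualifying columns are 1 and 8; column 4 is left alone because the value in the second sheet is NA. Wrapping the test in which() is a design choice: a logical index containing NA is not allowed in a subscripted assignment of more than one value.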
Related
I am trying to calculate the sum of the differences of all possible pairs of an array (difference first/second + difference first/third, then difference second/first + difference second/third, then difference third/first + difference third/second, all added up). The original data frame consists of multiple columns like the one below, but the colSums function doesn't work on a single column and fails with:
"Error in colSums(FC_TS1test1, dims = 1) : 'x' must be an array of at least two dimensions".
I appreciate your help!
array <- c(5, 3, 1)
The calculation behind it should be:
|(5-3)|+|(5-1)|=2+4=6
|(3-5)|+|(3-1)|=2+2=4
|(1-5)|+|(1-3)|=4+2=6
6+4+6=16
We can use sum + dist
> sum(dist(array))
[1] 840
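One detail worth checking on the small example from the question: dist() counts each unordered pair once, while the hand calculation in the question counts every pair in both directions. The sketch below, which assumes the three-element vector c(5, 3, 1) and uses the hypothetical names once and twice, shows the two totals side by side.

```r
x <- c(5, 3, 1)

# Each unordered pair counted once: |5-3| + |5-1| + |3-1|
once <- sum(dist(x))

# Each ordered pair counted, as in the question's hand calculation
twice <- sum(abs(outer(x, x, "-")))

once   # 8
twice  # 16
```

So if the total from the question (16) is what is wanted, the dist-based result needs to be doubled.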
We need combinations without repetition.
sum(combn(a, 2, diff))
# [1] -94
Or, with some packages:
colSums(matrixStats::rowDiffs(RcppAlgos::comboGeneral(a, 2, repetition=F)))
# [1] -94
or
sum(unlist(RcppAlgos::comboGeneral(a, 2, repetition=F, FUN=diff)))
# [1] -94
Two smaller examples demonstrate what the core of the code does.
combn(b, 2)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 3 3 3 4 4 2
# [2,] 4 2 5 2 5 5
RcppAlgos::comboGeneral(b, 2, repetition=F)
# [,1] [,2]
# [1,] 3 4
# [2,] 3 2
# [3,] 3 5
# [4,] 4 2
# [5,] 4 5
# [6,] 2 5
Edit
For the absolute differences according to your recent edit we may define an anonymous function:
sum(combn(a, 2, function(x) abs(diff(x))))
# [1] 840
Data:
a <- c(3, 4, 2, 5, 4, 4, 1, 5, 5, 4, 1, 4, 7, 2, 1, 3, 5, 2, 4, 2,
7, 4, 4, 1, 4, 3, 4, 4, 2, 4, 1)
b <- c(3, 4, 2, 5)
I have a matrix consisting of 10 rows, and I would like to build combinations of these rows using R. For example:
M= matrix(c(
1,2,3,4,
5,6,7,3,
5,5,4,8,
5,2,7,8,
4,8,7,8,
2,6,7,9,
5,6,7,4,
5,6,7,2,
5,6,7,3,
5,6,7,0),nrow=10, byrow=TRUE)
First step
Take every combination of 3 rows out of the 10 rows.
This gives choose(10, 3) = 120 new matrices, each of size 3 x 4, all built from rows of M.
Second step
Take every combination of 6 rows out of the 10 rows.
This gives choose(10, 6) = 210 new matrices, each of size 6 x 4, again built from rows of M.
You can split the matrix into a list of rows with apply, then use the combn function as below:
M <- structure(c(1, 5, 5, 5, 4, 2, 5, 5, 5, 5, 2, 6, 5, 2, 8, 6, 6,
6, 6, 6, 3, 7, 4, 7, 7, 7, 7, 7, 7, 7, 4, 3, 8, 8, 8, 9, 4, 2,
3, 0), .Dim = c(10L, 4L))
x <- apply(M, 1, list)
# combinations for three rows
cmbs3 <- combn(x, 3)
ncol(cmbs3)
# 120
cmbs3[, 2]
# second combination
# [[1]]
# [[1]][[1]]
# [1] 1 2 3 4
#
#
# [[2]]
# [[2]][[1]]
# [1] 5 6 7 3
#
#
# [[3]]
# [[3]][[1]]
# [1] 5 2 7 8
# combinations for six rows
cmbs6 <- combn(x, 6)
ncol(cmbs6)
# 210
EDIT:
Or use the elegant solution provided by nicola: subset by the row indices generated by combn (I like it much more :):
lapply(combn(10, 3, simplify = FALSE), function(x) M[x, ])
Output:
[[1]]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 3
[3,] 5 5 4 8
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 3
[3,] 5 2 7 8
...
[[119]]
[,1] [,2] [,3] [,4]
[1,] 5 6 7 4
[2,] 5 6 7 3
[3,] 5 6 7 0
[[120]]
[,1] [,2] [,3] [,4]
[1,] 5 6 7 2
[2,] 5 6 7 3
[3,] 5 6 7 0
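The same index-based idea covers the six-row step from the question. This is a small sketch under the assumption that M is the 10 x 4 matrix defined above; the helper name pick_rows is hypothetical.

```r
M <- matrix(c(1, 2, 3, 4,
              5, 6, 7, 3,
              5, 5, 4, 8,
              5, 2, 7, 8,
              4, 8, 7, 8,
              2, 6, 7, 9,
              5, 6, 7, 4,
              5, 6, 7, 2,
              5, 6, 7, 3,
              5, 6, 7, 0), nrow = 10, byrow = TRUE)

# Hypothetical helper: every k-row sub-matrix of m, one list element per combination.
pick_rows <- function(m, k) {
  lapply(combn(nrow(m), k, simplify = FALSE),
         function(i) m[i, , drop = FALSE])  # drop = FALSE keeps a matrix even when k = 1
}

sub3 <- pick_rows(M, 3)
sub6 <- pick_rows(M, 6)

length(sub3)    # 120
length(sub6)    # 210
dim(sub6[[1]])  # 6 4
```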
I have a vector, say vec1, and another vector named vec2 as follows:
vec1 = c(4,1)
# [1] 4 1
vec2 = c(5,3,2)
# [1] 5 3 2
What I'm looking for is all possible combinations of vec1 and vec2 while the order of the vectors' elements is kept. That is, the resultant matrix should be like this:
> res
[,1] [,2] [,3] [,4] [,5]
[1,] 4 1 5 3 2
[2,] 4 5 1 3 2
[3,] 4 5 3 1 2
[4,] 4 5 3 2 1
[5,] 5 4 1 3 2
[6,] 5 4 3 1 2
[7,] 5 4 3 2 1
[8,] 5 3 4 1 2
[9,] 5 3 4 2 1
[10,] 5 3 2 4 1
# res=structure(c(4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 1, 5, 5, 5, 4, 4, 4,
# 3, 3, 3, 5, 1, 3, 3, 1, 3, 3, 4, 4, 2, 3, 3, 1, 2, 3, 1, 2, 1,
# 2, 4, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1), .Dim = c(10L, 5L))
There is no repetition allowed for two vectors. That is, all rows of the resultant matrix have unique elements.
I'm actually looking for the most efficient way. One way to tackle this problem is to generate all possible permutations of length n which grows factorially (n=5 here) and then apply filtering. But it's time-consuming as n grows.
Is there an efficient way to do that?
Try this one:
nv1 <- length(vec1)
nv2 <- length(vec2)
n <- nv1 + nv2
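# put vec1 (in order) at positions v and vec2 (in order) everywhere else
result <- combn(n, nv1, function(v) {z <- integer(n); z[v] <- vec1; z[-v] <- vec2; z})
# combn() returns one combination per column; use t(result) for the row layout in the question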
The idea is to produce all combinations of indices at which to put the elements of vec1.
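To make the index idea concrete, here is a sketch of a single step, assuming the vec1 and vec2 from the question. The names v, z and interleavings are hypothetical. Choosing positions 1 and 3 for vec1 leaves positions 2, 4 and 5 for vec2, and both vectors keep their internal order.

```r
vec1 <- c(4, 1)
vec2 <- c(5, 3, 2)
n <- length(vec1) + length(vec2)

# One particular choice of positions for the elements of vec1
v <- c(1, 3)
z <- numeric(n)
z[v]  <- vec1   # vec1 goes to the chosen positions, in order
z[-v] <- vec2   # vec2 fills the remaining positions, in order
z               # 4 5 1 3 2

# All choices at once; transpose so that each row is one interleaving
interleavings <- t(combn(n, length(vec1), function(v) {
  z <- numeric(n)
  z[v]  <- vec1
  z[-v] <- vec2
  z
}))
dim(interleavings)  # 10 5
```

Only choose(n, length(vec1)) index sets are generated, which is the reason this avoids enumerating and filtering all n! permutations.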
Not as elegant as Marat Talipov's solution, but you can do:
# get the ordering per vector
cc <- c(order(vec1,decreasing = T), order(vec2, decreasing = T)+length(vec1))
cc
[1] 1 2 3 4 5
# permutation to get all "order-combinations"
library(combinat)
m <- do.call(rbind, permn(cc))
# remove unsorted per vector, only if both vectors are correct set TRUE for both:
gr <- apply(m, 1, function(x){
!is.unsorted(x[x < (length(vec1)+1)]) & !is.unsorted(x[x > (length(vec1))])
})
# result, exchange the order index with the vector elements:
t(apply(m[gr, ], 1, function(x, y) y[x], c(vec1, vec2)))
[,1] [,2] [,3] [,4] [,5]
[1,] 4 1 5 3 2
[2,] 4 5 3 1 2
[3,] 4 5 3 2 1
[4,] 4 5 1 3 2
[5,] 5 4 1 3 2
[6,] 5 4 3 2 1
[7,] 5 4 3 1 2
[8,] 5 3 4 1 2
[9,] 5 3 4 2 1
[10,] 5 3 2 4 1
In MATLAB and numpy, you can index a vector by an array of indices and get a result of the same shape out, e.g.
A = [1 1 2 3 5 8 13];
B = [1 2; 2 6; 7 1; 4 4];
A(B)
## ans =
##
## 1 1
## 1 8
## 13 1
## 3 3
or
import numpy as np
a = np.array([1, 1, 2, 3, 5, 8, 13])
b = np.reshape(np.array([0, 1, 1, 5, 6, 0, 3, 3]), (4, 2))
a[b]
## array([[ 1, 1],
## [ 1, 8],
## [13, 1],
## [ 3, 3]])
However, in R, indexing a vector by an array of indices returns a vector:
a <- c(1, 1, 2, 3, 5, 8, 13)
b <- matrix(c(1, 2, 7, 4, 2, 6, 1, 4), nrow = 4)
a[b]
## [1] 1 1 13 3 1 8 1 3
Is there an idiomatic way in R to perform vectorized lookup that preserves array shape?
You can't specify dimensions through subsetting alone in R (AFAIK). Here is a workaround:
`dim<-`(a[b], dim(b))
Produces:
[,1] [,2]
[1,] 1 1
[2,] 1 8
[3,] 13 1
[4,] 3 3
`dim<-`(...) just lets us use the dimension-setting function `dim<-` for its return value rather than for its side effect, as is normally the case.
You can also do stuff like:
t(apply(b, 1, function(idx) a[idx]))
but that will be slow.
This is not very elegant, but it works
matrix(a[b],nrow=nrow(b))
Option 1: if we do not need to keep the original values in b, we can simply assign the looked-up values back into it.
Caveat: the values in b will be overwritten.
b[] = a[b]
b
# [,1] [,2]
# [1,] 1 1
# [2,] 1 8
# [3,] 13 1
# [4,] 3 3
Option 2: if we want to retain the values in b, an easy workaround is to work on a copy:
c = b # copy b to c
c[] = a[c]
c
# [,1] [,2]
# [1,] 1 1
# [2,] 1 8
# [3,] 13 1
# [4,] 3 3
Personally, I find Option 2 easy to follow and clean.
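The workarounds above can also be wrapped in a small function that leaves the index untouched. This is only a sketch: the name lookup_keep_shape is hypothetical, and it assumes that values is a plain vector and that idx is a matrix or array of valid positions.

```r
# Hypothetical helper: look up values[idx] and give the result the shape of idx.
lookup_keep_shape <- function(values, idx) {
  array(values[idx], dim = dim(idx))
}

a <- c(1, 1, 2, 3, 5, 8, 13)
b <- matrix(c(1, 2, 7, 4, 2, 6, 1, 4), nrow = 4)

out <- lookup_keep_shape(a, b)
out
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    8
# [3,]   13    1
# [4,]    3    3
```

Using array() rather than matrix() means the same function also works when idx has three or more dimensions.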
Right now I have a matrix called closest.labels that has the following data in it:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 2 2 2 2 2 2 2 2 2 2
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 9 9 9 9 9 9 9 7 7 4
What I would like to do is return the row data as well as the index of that row where there are more than two unique values. In the above example this would only be the third row. So far I have been partially successful using apply and a function that I created. See below:
colCountFx <- function(col){
result <- subset(list(index=col,count=length(unique(col))),length(unique(col))>2)
return(result)
}
apply(closest.labels,1, colCountFx)
My issue is that this returns what appears to be an empty row for the first two records as well. Output:
[[1]]
named list()
[[2]]
named list()
[[3]]
[[3]]$index
[1] 9 9 9 9 9 9 9 7 7 4
[[3]]$count
[1] 3
What would I need to change to have nothing returned for the rows that are currently returning named list()? Also, I am fairly new to R so if you think there is a better way to go at this I am open to that as well.
If it is a list you're going for, you can try something like this. Personally, though, I find nested lists somewhat cumbersome.
First, some data (I've added an extra row for clarity):
closest.labels <- structure(c(2, 0, 9, 8, 2, 0, 9, 8, 2, 0, 9, 8, 2, 0, 9, 8, 2,
0, 9, 8, 2, 0, 9, 5, 2, 0, 7, 6, 2, 0, 7, 7, 2, 0,
4, 8, 2, 0, 4, 9), .Dim = c(4L, 10L))
Next, a modified function:
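colCountFx <- function(data) {
  # number of distinct values in each row
  temp <- apply(data, 1, function(x) length(unique(x)))
  # rows with more than two distinct values
  result <- which(temp > 2)
  out <- vector("list", length(result))
  # seq_along() is safe when no row qualifies (1:length(result) would give 1:0)
  for (i in seq_along(result)) {
    out[[i]] <- list(index = data[result[i], ], count = temp[result[i]])
  }
  # sprintf() returns character(0) for an empty result, so naming still works
  names(out) <- sprintf("row_%d", result)
  out
}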
Let's test it:
colCountFx(closest.labels)
# $row_3
# $row_3$index
# [1] 9 9 9 9 9 9 7 7 4 4
#
# $row_3$count
# [1] 3
#
#
# $row_4
# $row_4$index
# [1] 8 8 8 8 8 5 6 7 8 9
#
# $row_4$count
# [1] 5
You can get the index with the length of unique items applied across rows. mat will be used as the name for the matrix containing the items.
nUnique <- apply( mat, 1, function(x) length(unique(x)) )
ind <- which(nUnique > 2)
You can just select rows now based on that index.
mat[ind,]
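Applied to the three-row matrix from the question, the index-then-subset approach looks like the sketch below. The names n_unique and rows are hypothetical, and drop = FALSE is added (it is not in the answer above) so that a single matching row is still returned as a matrix rather than collapsing to a vector.

```r
closest.labels <- rbind(rep(2, 10),
                        rep(0, 10),
                        c(9, 9, 9, 9, 9, 9, 9, 7, 7, 4))

# number of distinct values per row
n_unique <- apply(closest.labels, 1, function(x) length(unique(x)))

# row numbers with more than two distinct values
ind <- which(n_unique > 2)

# the matching rows, kept as a matrix even if only one row qualifies
rows <- closest.labels[ind, , drop = FALSE]

n_unique  # 1 1 3
ind       # 3
rows
```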
You could trim off the empty lists by using another index. Say:
remaining <- apply(closest.labels,1, colCountFx)
remaining.ind <- sapply(remaining,length) != 0
remaining[remaining.ind]
Or, expanding on Patrick Li's answer:
ind <- apply(closest.labels, 1, function(x) length(unique(x)))
which(ind > 2) #indices of rows that have more than 2 unique values
closest.labels[which(ind > 2),] #rows that have more than 2 unique values
> ind <- apply(x, 1, function(x) length(unique(x)))
> ind
[1] 1 1 3