Conditional collapse of matrix rows with custom function - r

I want to collapse rows in a matrix so that no value of a particular column ever falls below 20. And I want to apply a custom function to the rows to collapse/sum them...
Here is an example matrix:
d <- matrix(data = c(0,105,1,21,2,11,4,5,5,15,7,21,9,1),
ncol = 2,
byrow = TRUE
)
colnames(d) <- c('val','freq')
Looking like this:
d
val freq
[1,] 0 105
[2,] 1 21
[3,] 2 11
[4,] 4 5
[5,] 5 15
[6,] 7 21
[7,] 9 1
The column whose cells must be 20 or above is "freq". Rows 1 and 2 are fine, but rows 3:5 need to be collapsed and replaced with the single row returned by this function:
library(reshape)
replacement <- function(x){
mat <- d[x, ]
mat.res <- untable(mat[ ,c(1, 2)],
num = mat[ ,2]
)
res <- c(mean(mat.res[ ,1]), length(mat.res[ ,1]))
return(res)
}
The function call:
replacement(3:5)
[1] 3.774194 31.000000
Going through the matrix; row 6 is fine, but since row 7 would be left with freq=1 this row needs to be collapsed with row 6. The function call again:
replacement(6:7)
[1] 7.090909 22.000000
The resulting matrix should be:
val freq
[1,] 0 105
[2,] 1 21
[3,] 3.774194 31.000000
[4,] 7.090909 22.000000
The final row numbering is not important.
I have a feeling that the window functions of dplyr might hold the solution, but I need help understanding exactly how. It does not have to be dplyr. I take whatever works ;-)

For future reference, this is not very elegant, but it works...
rows <- dim(d)[1]
tmp <- NULL
inc <- 1
tmpSum <- 0
for(i in 1:rows){
if(d[i, 2] > 19){
tmp <- rbind(tmp, c(d[i, ], inc))
inc <- inc + 1
tmpSum <- 0
} else {
tmp <- rbind(tmp, c(d[i, ], inc))
tmpSum <- d[i,2] + tmpSum
if(tmpSum > 19){
inc <- inc + 1
tmpSum <- 0 # group is complete, so restart the running total for the next group
}
}
}
if(sum(tmp[tmp[ ,3] == max(tmp[ ,3]), 2]) < 20){
tmp[tmp[ ,3] == max(tmp[ ,3]), 3] <- tmp[tmp[ ,3] == max(tmp[ ,3]), 3]-1
}
res <- NULL
for(i in 1:max(tmp[ ,3])){
val <- mean(rep(tmp[tmp[ ,3] == i, 1], tmp[tmp[ ,3] == i, 2]))
freq <- length(rep(tmp[tmp[ ,3] == i, 1], tmp[tmp[ ,3] == i, 2]))
res <- rbind(res, c(val, freq))
}
res
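The same grouping idea can be written more compactly. The following is only a sketch, assuming a greedy left-to-right rule: rows are accumulated until the running freq reaches the threshold, and a trailing group that stays below it is merged into the previous group. The function name collapse_rows and its min_freq argument are made up for illustration; weighted.mean() replaces the untable()/mean() combination, which gives the same value.

```r
# Example matrix from the question
d <- matrix(c(0,105, 1,21, 2,11, 4,5, 5,15, 7,21, 9,1), ncol = 2, byrow = TRUE)
colnames(d) <- c("val", "freq")

# Hypothetical helper: greedy grouping, then one summary row per group
collapse_rows <- function(d, min_freq = 20) {
  grp <- integer(nrow(d))
  g <- 1L
  running <- 0
  for (i in seq_len(nrow(d))) {
    grp[i] <- g
    running <- running + d[i, "freq"]
    if (running >= min_freq) {  # group is large enough: close it
      g <- g + 1L
      running <- 0
    }
  }
  # Rows still assigned to the open group g never reached min_freq:
  # merge them into the previous group
  if (g > 1L && any(grp == g)) grp[grp == g] <- g - 1L
  idx  <- seq_len(nrow(d))
  val  <- tapply(idx, grp, function(i) weighted.mean(d[i, "val"], d[i, "freq"]))
  freq <- tapply(d[, "freq"], grp, sum)
  cbind(val = as.vector(val), freq = as.vector(freq))
}

res <- collapse_rows(d)
res
```

On the example matrix this yields the four rows asked for in the question (freq 105, 21, 31, 22).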

Related

Conditionally update rast values from another raster using terra

I am using the lapp function of {terra} in R and I want to update rast_a with values from rast_b or rast_c (plus some other math), depending on the value in each cell of rast_a.
Sample data
library(terra)
rast_a <- rast(ncol = 2, nrow = 2)
values(rast_a) <- 1:4
rast_b <- rast(ncol = 2, nrow = 2)
values(rast_b) <- c(2,2,2,2)
rast_c <- rast(ncol = 2, nrow = 2)
values(rast_c) <- c(3,3,3,3)
Problem
This is my (wrong) attempt.
my_update_formula <- function(a, b, c) {
a[a == 1] <- b[a == 1] + 10 + 20 - 30
a[a == 2] <- c[a == 2] + 10 + 50 - 50
return(a)
}
result <- lapp(c(rast_a, rast_b, rast_c),
fun = my_update_formula)
values(result)
lyr1
[1,] 3
[2,] 3
[3,] 3
[4,] 4
The actual result should be 2,3,3,4. But because of the operations inside the formula, the first value gets updated twice. First it is changed from 1 to 2 (correctly) but then it fulfills the condition of the second line of code also, and is changed again (I don't want that to happen).
How can I solve this please?
You can change your formula to
f1 <- function(a, b, c) {
d <- a
d[a == 1] <- b[a == 1]
d[a == 2] <- c[a == 2] + 10
d
}
#or
f2 <- function(a, b, c) {
i <- a == 1
j <- a == 2
a[i] <- b[i]
a[j] <- c[j] + 10
return(a)
}
lapp(c(rast_a, rast_b, rast_c), fun = f1) |> values()
# lyr1
#[1,] 2
#[2,] 13
#[3,] 3
#[4,] 4
lapp(c(rast_a, rast_b, rast_c), fun = f2) |> values()
# lyr1
#[1,] 2
#[2,] 13
#[3,] 3
#[4,] 4
You can get the same result with
x <- ifel(rast_a==1, rast_b,
ifel(rast_a == 2, rast_c + 10, rast_a))
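The difference between the two styles can be seen without terra at all, because lapp hands the cell values to the function as plain vectors. Below is a minimal sketch on ordinary numeric vectors; the names va, vb, vc, sequential and masked are invented for this illustration.

```r
va <- 1:4
vb <- rep(2, 4)
vc <- rep(3, 4)

# Conditions are re-evaluated after each assignment, so a cell
# changed by the first line can match the second line as well
sequential <- function(a, b, c) {
  a[a == 1] <- b[a == 1]
  a[a == 2] <- c[a == 2] + 10
  a
}

# Conditions are evaluated once on the original values, before any update
masked <- function(a, b, c) {
  is_one <- a == 1
  is_two <- a == 2
  a[is_one] <- b[is_one]
  a[is_two] <- c[is_two] + 10
  a
}

sequential(va, vb, vc)  # 13 13 3 4: first cell updated twice
masked(va, vb, vc)      # 2 13 3 4: each cell updated at most once
```

Computing the logical masks up front is what both f1 (by reading from the untouched a) and f2 (by storing i and j) do.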

Psych's reverse.code function producing NAs in R

When using reverse.code in R, the values in my ID column (which are not meant to be reversed) turn into NA once the ID value exceeds 999 (I have 10,110 observations).
Does anyone know if there is anything I can do to fix this?
Is there another function I can use to reverse these items without losing data?
Here is my code:
library(psych)
keys <- c(1,-1,-1,-1) #Where column 1 = ID and the rest are my variables to be reversed
rev_dat2 <- reverse.code(keys, rev_dat)
Thanks!
Here is the relevant line of the source code of reverse.code(), where new is the object holding the reverse-coded data:
new[abs(new) > 999] <- NA
As you can see, setting values larger than 999 in absolute value to missing is hard-coded into the routine. You could write a new version of the function that doesn't do that. For example, in the function below, we just use a much larger threshold:
my.reverse.code <- function (keys, items, mini = NULL, maxi = NULL)
{
if (is.vector(items)) {
nvar <- 1
}
else {
nvar <- dim(items)[2]
}
items <- as.matrix(items)
if (is.null(maxi)) {
colMax <- apply(items, 2, max, na.rm = TRUE)
}
else {
colMax <- maxi
}
if (is.null(mini)) {
colMin <- apply(items, 2, min, na.rm = TRUE)
}
else {
colMin <- mini
}
colAdj <- colMax + colMin
if (length(keys) < nvar) {
temp <- keys
if (is.character(temp))
temp <- match(temp, colnames(items))
keys <- rep(1, nvar)
keys[temp] <- -1
}
if (is.list(keys) | is.character(keys)) {
keys <- make.keys(items, keys)
keys <- diag(keys)
}
keys.d <- diag(keys, nvar, nvar)
items[is.na(items)] <- -99999999999
reversed <- items %*% keys.d
adj <- abs(keys * colAdj)
adj[keys > 0] <- 0
new <- t(adj + t(reversed))
new[abs(new) > 99999999999] <- NA
colnames(new) <- colnames(items)
colnames(new)[keys < 0] <- paste(colnames(new)[keys < 0],
"-", sep = "")
return(new)
}
The reason they used a numeric value threshold is that for the recoding they do to work, they needed all values to be numeric. So, they set missing values to -999 and then later turn them back into missing values. The same is done above, but with a lot bigger number.
keys <- c(1,-1,-1,-1) #Where column 1 = ID and the rest are my variables to be reversed
rev_dat <- data.frame(
id = 9998:10002,
x = 1:5,
y = 5:1,
z = 1:5
)
library(psych)
reverse.code(keys, rev_dat)
# id x- y- z-
# [1,] NA 5 1 5
# [2,] NA 4 2 4
# [3,] NA 3 3 3
# [4,] NA 2 4 2
# [5,] NA 1 5 1
my.reverse.code(keys, rev_dat)
# id x- y- z-
# [1,] 9998 5 1 5
# [2,] 9999 4 2 4
# [3,] 10000 3 3 3
# [4,] 10001 2 4 2
# [5,] 10002 1 5 1
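If the ID column never needs to pass through the reversal at all, another option is to reverse only the item columns in base R. This is a sketch rather than a drop-in replacement: the helper name reverse_items is made up, and it uses the same (max + min) - x rule that the colAdj line above computes, with the observed column range as the scale range.

```r
rev_dat <- data.frame(
  id = 9998:10002,
  x = 1:5,
  y = 5:1,
  z = 1:5
)

# Hypothetical helper: reverse-score the named columns, leave the rest untouched
reverse_items <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) {
    max(x, na.rm = TRUE) + min(x, na.rm = TRUE) - x  # NA values stay NA
  })
  df
}

out <- reverse_items(rev_dat, c("x", "y", "z"))
out
```

Because no sentinel value is involved, IDs of any size survive. If an item's observed range does not cover the full response scale, the theoretical minimum and maximum would have to be supplied instead of the observed ones.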

Manipulating sub matrices in R

Nh<-matrix(c(17,26,30,17,23, 17 ,24, 23), nrow=2, ncol=4); Nh
Sh<-matrix(c(8.290133, 6.241174, 6.096808, 7.4449672, 6.894924, 7.692115,
4.540521, 7.409122), nrow=2, ncol=4); Sh
NhSh<-as.matrix(Nh*Sh); NhSh
rh<-c( 0.70710678, 0.40824829, 0.28867513, 0.22360680, 0.18257419,
0.15430335, 0.13363062, 0.11785113, 0.10540926, 0.09534626); rh
pv <- c()
for (j in 1:2) {
for (i in 1:4) {
pv <- rbind(pv, NhSh[j,i]*rh)
}
}
pv
row.names(pv) <- rep(c(1:2), each = 4)
lst<-lapply(split(seq_len(nrow(pv)), as.numeric(row.names(pv))), function(i)
pv[i,])
data<-40
nlargest <- function(x, data)
{
res <- order(x)[seq_len(data)];
pos <- arrayInd(res, dim(x), useNames = TRUE);
list(values = pv[res], position = pos)
}
out <- lapply(lst, nlargest, data = 40)
Continuing from the code above, is there a concise way to repeat the following steps for each out$'k'$position, for k in 1:2?
s1<-c(1,1,1,1); ch<-c(5,7,10,5); C<-150; a<-out$'1'$position
for (j in a[40:1, "row"] )
{
s1[j] <- s1[j]+1;
cost1 <- sum(ch*s1);
if (cost1>=C) break
}
s1; cost1
#Output [1] 5 6 6 5
# [1] 152
I have to get 2 values for 's' and 'cost' for out$k$position. I tried
mat = replicate (2,{x = matrix(data = rep(NA, 80), ncol = 2)}); mat
for (k in 1:2)
{
mat[,,k]<-out$'k'$position
}
mat
Error in mat[, , k] <- out$k$position :number of items to replace is not a multiple of replacement length
for (k in 1:2)
{
for (j in mat[,,k][40:1] ) {
s[j] <- s[j]+1
cost <- sum(ch*s)
if (cost>=C) break
}
}
s; cost
Error : Error in s[j] <- s[j] + 1 : NAs are not allowed in subscripted assignments
Could anyone please help me resolve these errors?
We could apply the function directly by looping over the list. Note that each element of the list is a matrix
sapply(lst, is.matrix)
# 1 2
#TRUE TRUE
so, there is no need to unlist and create a matrix
out <- lapply(lst, nlargest, data = 40)
Checking against the OP's results:
out1 <- nlargest(sub1, 40)
identical(out[[1]], out1)
#[1] TRUE
Update2
Based on the second update, we need to initialize 'cost' and 'sl' with the same length as 'k' elements. Here, we initialize 'sl' as a list of vectors
sl <- rep(list(c(1, 1, 1, 1)), 2)
C <- 150
cost <- numeric(2)
for (k in 1:2){
for (j in mat[,,k][40:1, 1] ) {
sl[[k]][j] <- sl[[k]][j]+1
cost[k] <- sum(ch*sl[[k]])
if (cost[k] >=C) break
}
}
sl
#[[1]]
#[1] 5 7 6 4
#[[2]]
#[1] 6 5 5 7
cost
#[1] 154 150
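An alternative to indexing into a 3-d array is to wrap the inner loop in a function and lapply it over a list of row-index vectors. The sketch below is self-contained and uses made-up index vectors (row_orders); with the objects above, the vector for group k would be something like out[[k]]$position[40:1, "row"]. The function name allocate is also hypothetical.

```r
# Hypothetical helper: bump s[j] along `rows` until the total cost reaches C
allocate <- function(rows, ch, C, start = rep(1, length(ch))) {
  s <- start
  cost <- sum(ch * s)
  for (j in rows) {
    s[j] <- s[j] + 1
    cost <- sum(ch * s)
    if (cost >= C) break
  }
  list(s = s, cost = cost)
}

ch <- c(5, 7, 10, 5)

# Made-up row orders standing in for the two position matrices
row_orders <- list(`1` = rep(1:4, 10), `2` = rep(4:1, 10))

res <- lapply(row_orders, allocate, ch = ch, C = 150)
res$`1`  # s = 6 6 6 5, cost = 157
res$`2`  # s = 5 5 6 6, cost = 150
```

Each list element carries its own s and cost, so nothing has to be pre-allocated per group.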

compare the information between two matrices R

I have two matrices, one is generated out of the other by deleting some rows. For example:
m = matrix(1:18, 6, 3)
m1 = m[c(-1, -3, -6),]
Suppose I do not know which rows in m were eliminated to create m1, how should I find it out by comparing the two matrices? The result I want looks like this:
1, 3, 6
The actual matrix I am dealing with is very big. I was wondering if there is any efficient way of conducting it.
Here are some approaches:
1) If we can assume that there are no duplicated rows in m -- this is the case in the example in the question -- then:
which(tail(!duplicated(rbind(m1, m)), nrow(m)))
## [1] 1 3 6
2) Transpose m and m1 giving tm and tm1 since it is more efficient to work on columns than rows.
Define match_indexes(i) which returns a vector r such that each row in m[r, ] matches m1[i, ].
Apply that to each i in 1:n1 and remove the result from 1:n.
n <- nrow(m); n1 <- nrow(m1)
tm <- t(m); tm1 <- t(m1)
match_indexes <- function(i) which(colSums(tm1[, i] == tm) == nrow(tm)) # a full match needs all nrow(tm) (= ncol(m)) entries to agree
setdiff(1:n, unlist(lapply(1:n1, match_indexes)))
## [1] 1 3 6
3) Calculate an interaction vector for each matrix and then use setdiff and finally match to get the indexes:
i <- interaction(as.data.frame(m))
i1 <- interaction(as.data.frame(m1))
match(setdiff(i, i1), i)
## [1] 1 3 6
Added: If there can be duplicates in m, then (1) and (3) return only the first occurrence of any repeated row of m that is not in m1.
m <- matrix(1:18, 6, 3)
m1 <- m[c(2, 4, 5),]
m <- rbind(m, m[1:2, ])
# 1
which(tail(!duplicated(rbind(m1, m)), nrow(m)))
## 1 3 6
# 2
n <- nrow(m); n1 <- nrow(m1)
tm <- t(m); tm1 <- t(m1)
match_indexes <- function(i) which(colSums(tm1[, i] == tm) == nrow(tm)) # a full match needs all nrow(tm) (= ncol(m)) entries to agree
setdiff(1:n, unlist(lapply(1:n1, match_indexes)))
## 1 3 6 7
# 3
i <- interaction(as.data.frame(m))
i1 <- interaction(as.data.frame(m1))
match(setdiff(i, i1), i)
## 1 3 6
A possible way is to represent each row as a string:
x1 <- apply(m, 1, paste0, collapse = ';')
x2 <- apply(m1, 1, paste0, collapse = ';')
which(!x1 %in% x2)
# [1] 1 3 6
A benchmark on a large matrix, comparing my solution with G. Grothendieck's solutions:
set.seed(123)
m <- matrix(rnorm(20000 * 5000), nrow = 20000)
m1 <- m[-sample.int(20000, 1000), ]
system.time({
which(tail(!duplicated(rbind(m1, m)), nrow(m)))
})
# user system elapsed
# 339.888 2.368 342.204
system.time({
x1 <- apply(m, 1, paste0, collapse = ';')
x2 <- apply(m1, 1, paste0, collapse = ';')
which(!x1 %in% x2)
})
# user system elapsed
# 395.428 0.568 395.955
system.time({
n <- nrow(m); n1 <- nrow(m1)
tm <- t(m); tm1 <- t(m1)
match_indexes <- function(i) which(colSums(tm1[, i] == tm) == nrow(tm))
setdiff(1:n, unlist(lapply(1:n1, match_indexes)))
})
# > 15 min, did not finish
system.time({
i <- interaction(as.data.frame(m))
i1 <- interaction(as.data.frame(m1))
match(setdiff(i, i1), i)
})
# ran out of memory; my machine with 32 GB of RAM crashed.
We can also use do.call
which(!do.call(paste, as.data.frame(m)) %in% do.call(paste, as.data.frame(m1)))
#[1] 1 3 6
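The string-key idea can be wrapped in a small function so it is easy to check on cases where the number of kept rows differs from the number of columns. This is a sketch under assumptions: the name removed_rows and the sep argument are made up, and the separator is assumed not to occur in the data.

```r
# Hypothetical helper: indices of rows of m that do not occur in m1
removed_rows <- function(m, m1, sep = "\r") {
  # One string key per row; pasting column-wise avoids apply() over rows
  key  <- do.call(paste, c(as.data.frame(m),  sep = sep))
  key1 <- do.call(paste, c(as.data.frame(m1), sep = sep))
  which(!key %in% key1)
}

m <- matrix(1:18, 6, 3)
removed_rows(m, m[-c(1, 3, 6), ])  # 1 3 6
removed_rows(m, m[-2, ])           # 2
```

If m contains duplicated rows, every copy of a removed row is reported, and a removed row that still has an identical copy in m1 is not detected.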

Binary coding of pairwise comparisons

I'm working on a questionnaire where there are always three statements presented at a time and participants have to rank order these according to their preferences (3 = most preferred, 1 = least preferred).
For further analyses I have to transform these rankings into pairwise comparisons within each block of three. Below is a code doing this for the first six items (2 blocks) of the questionnaire.
data <- matrix(c(1,2,3,1,2,3,2,1,3,3,1,2),2,6)
i1i2 <- ifelse(data[,1] > data[,2], 1, 0)
i1i3 <- ifelse(data[,1] > data[,3], 1, 0)
i2i3 <- ifelse(data[,2] > data[,3], 1, 0)
i4i5 <- ifelse(data[,4] > data[,5], 1, 0)
i4i6 <- ifelse(data[,4] > data[,6], 1, 0)
i5i6 <- ifelse(data[,5] > data[,6], 1, 0)
result <- cbind(i1i2, i1i3, i2i3, i4i5, i4i6, i5i6)
print(result)
I extended this code to fit a 45 item questionnaire and it works fine. Now, I'd like to write a function which automatically does this job for n items. I experimented with while and for loops but couldn't succeed.
Can anyone please give me a hint/ reference to the relevant functions I need/ an example on how to do this?
Related: Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460–502.
First off, drop the ifelse calls from the individual comparisons and apply ifelse once at the end instead:
i1i2 <- data[,1] > data[,2]
i1i3 <- data[,1] > data[,3]
i2i3 <- data[,2] > data[,3]
# … (i4i5, i4i6 and i5i6 are defined likewise)
result <- ifelse(cbind(i1i2, i1i3, i2i3, i4i5, i4i6, i5i6), 1, 0)
Next, avoid unnecessary repetition.
three_way_compare = function (data, index) {
cbind(data[, index + 0] > data[, index + 1],
data[, index + 0] > data[, index + 2],
data[, index + 1] > data[, index + 2])
}
result = ifelse(do.call(cbind, lapply(seq(1, ncol(data), by = 3),
three_way_compare, data = data)), 1, 0)
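The same pattern extends to blocks of any size by letting combn() enumerate the pairs within a block. The following is a sketch, assuming the columns are ordered block by block and that ncol(data) is a multiple of the block size; the function name pairwise_binary and its block_size argument are invented for illustration.

```r
# Hypothetical helper: binary outcomes for all within-block pairs
pairwise_binary <- function(data, block_size = 3) {
  starts <- seq(1, ncol(data), by = block_size)
  pairs  <- combn(block_size, 2)  # for size 3: (1,2), (1,3), (2,3)
  out <- lapply(starts, function(s) {
    left  <- s + pairs[1, ] - 1   # column indices of the first item in each pair
    right <- s + pairs[2, ] - 1   # column indices of the second item
    res <- (data[, left, drop = FALSE] > data[, right, drop = FALSE]) * 1L
    colnames(res) <- paste0("i", left, "i", right)
    res
  })
  do.call(cbind, out)
}

data <- matrix(c(1,2,3,1,2,3,2,1,3,3,1,2), 2, 6)
result <- pairwise_binary(data)
result
#      i1i2 i1i3 i2i3 i4i5 i4i6 i5i6
# [1,]    0    0    1    0    1    1
# [2,]    1    0    0    0    0    1
```

The column names follow the i1i2, i1i3, ... convention from the question, so the output can replace the hand-written cbind directly.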
While there are probably more efficient alternatives, you could convert your matrix to a list of vectors of length 3 and apply the ifelse statements to them through a function.
Update:
If you have multiple rows in your matrix, you need to use t(data) inside split() to get the correct values.
# Put data in lists of 3
blocks <- split(t(data), ceiling(seq_along(data)/3))
# Define function
comparison <-function(x) {
i1 <- ifelse(x[1] > x[2], 1, 0)
i2 <- ifelse(x[1] > x[3], 1, 0)
i3 <- ifelse(x[2] > x[3], 1, 0)
return(cbind(i1,i2,i3))
}
# Apply function to list
lapply(blocks,comparison)
# $`1`
# i1 i2 i3
# [1,] 0 0 1
#
# $`2`
# i1 i2 i3
# [1,] 0 1 1
#
# $`3`
# i1 i2 i3
# [1,] 1 0 0
#
# $`4`
# i1 i2 i3
# [1,] 0 0 1
# Or unlist to get vector
unlist(lapply(blocks,comparison))
# 11 12 13 21 22 23 31 32 33 41 42 43
# 0 0 1 0 1 1 1 0 0 0 0 1
