Frequency/Contingency table along 1 dimension in R

I'm trying to count co-occurrences along a single dimension. It's somewhat similar to win/loss or dominance matrices and frequency tables (and spectrograms/raster plots), but without directionality and along a single variable.
Here's an example of the data:
  person response
1      a        1
2      a        2
3      a        4
4      b        1
5      b        2
6      c        2
7      c        4
8      d        4
9      d        3
The goal is to get an n x n matrix like the one below (the NAs on the diagonal could also hold the number of occurrences of each response):
     [,1] [,2] [,3] [,4]
[1,]   NA    2    0    1
[2,]    -   NA    0    2
[3,]    -    -   NA    1
[4,]    -    -    -   NA
How can I convert the long data into this matrix in R without counting by hand?
What is this type of metric called? It's not a typical contingency table.
After the table is created, what's the best way to plot the resulting matrix with colors denoting the count/frequency?

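Try this (the first line rebuilds the example data from the question as df1):
df1 <- data.frame(person = c("a","a","a","b","b","c","c","d","d"), response = c(1,2,4,1,2,2,4,4,3))
r1 <- sort(unique(df1$response))       # distinct responses, in order
r2 <- split(df1$response, df1$person)  # list of responses per person
# column i counts, for every response, the persons who gave it together with r1[i]
ans <- sapply(seq_along(r1), function(i)
  rowSums(sapply(r2, function(x) (r1[i] %in% x) * (r1 %in% x))))
diag(ans) <- NA
ans
#      [,1] [,2] [,3] [,4]
# [1,]   NA    2    0    1
# [2,]    2   NA    0    2
# [3,]    0    0   NA    1
# [4,]    1    2    1   NA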
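The same counts can also be obtained from the cross-product of a person x response incidence table, a common idiom for co-occurrence matrices, and base R's image() can then colour-code the result, which addresses the plotting question. This is a sketch beyond the original answer: the data frame is rebuilt from the question's example, and the palette and axis handling are arbitrary choices.

```r
# Example data rebuilt from the question
df1 <- data.frame(person   = c("a", "a", "a", "b", "b", "c", "c", "d", "d"),
                  response = c(1, 2, 4, 1, 2, 2, 4, 4, 3))

# Person x response incidence table; cap counts at 1 in case a person
# gives the same response more than once
tab <- table(df1$person, df1$response)
tab[tab > 1] <- 1

# crossprod(tab) is t(tab) %*% tab: entry [i, j] counts the persons who gave
# both response i and response j; the diagonal counts persons per response
co <- crossprod(tab)
co

# Colour-coded plot (light = low count, dark = high count); rows are reversed
# so that row 1 is drawn at the top, as in the printed matrix
labs <- colnames(tab)
k <- seq_along(labs)
image(k, k, t(co[rev(k), ]), col = hcl.colors(12, "YlOrRd", rev = TRUE),
      axes = FALSE, xlab = "response", ylab = "response")
axis(1, at = k, labels = labs)
axis(2, at = k, labels = rev(labs))
```

Unlike the sapply version, the diagonal here holds how many persons gave each response (2, 3, 1, 3 for this data); set diag(co) <- NA if you want the NAs back.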

Related

Individual shift of each column in a matrix

I'm looking for R code that transforms a matrix as follows (a: the original matrix, b: the desired output). Example:
a <- matrix(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6, ncol = 4)
b <- matrix(c(1,2,3,4,5,6,2,3,4,5,6,0,3,4,5,6,0,0,4,5,6,0,0,0), nrow = 6, ncol = 4)
a
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3
[4,]    4    4    4    4
[5,]    5    5    5    5
[6,]    6    6    6    6
b
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    3    4    5
[3,]    3    4    5    6
[4,]    4    5    6    0
[5,]    5    6    0    0
[6,]    6    0    0    0
Thus, the first column is not shifted, the second column is shifted up one step, the third column shifted up two steps, and so on. The shifted columns are padded with zeros.
The following links didn't help me, and neither did a double for loop, a function with different variables, or the functions diag and kronecker:
R: Shift values in single column of dataframe UP
r matrix individual shift operations of elements
Rotate a Matrix in R
Do you have any ideas? Thanks.
This seems to work with data.table, and it should perform well with a large matrix:
library(data.table)
# One way
dt[, shift(.SD, 0:3, 0, "lead", FALSE), .SDcols = 1]
# Alternatively
dt[, shift(dt, 0:3, 0, "lead", FALSE)][, 1:4]
Both return:
   V1 V2 V3 V4
1:  1  2  3  4
2:  2  3  4  5
3:  3  4  5  6
4:  4  5  6  0
5:  5  6  0  0
6:  6  0  0  0
Using the following data:
a <- matrix(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6, ncol = 4)
dt <- setDT(as.data.frame(a))
Here is a rough solution using sapply. Each iteration of sapply shifts one column and pads it with zeros; sapply then collects all the results, which you can feed to matrix() with the dimensions of the original matrix:
matrix(sapply(1:dim(a)[2], function(x){c(a[x:dim(a)[1], x], rep(0, (x - 1) ))}), ncol = dim(a)[2], nrow = dim(a)[1])
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    3    4    5
[3,]    3    4    5    6
[4,]    4    5    6    0
[5,]    5    6    0    0
[6,]    6    0    0    0
You can shift the columns by filling a matrix that has one more row than a with the values from a (R warns about the recycling; that is expected here). Then keep the original number of rows and replace the lower-right triangle with zeros.
nr <- nrow(a)
a2 <- matrix(a, ncol = ncol(a), nrow = nr + 1)[1:nr, ]
a2[col(a2) + row(a2) > nr + 1] <- 0
a2
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    2    3    4    5
# [3,]    3    4    5    6
# [4,]    4    5    6    0
# [5,]    5    6    0    0
# [6,]    6    0    0    0
Building on tyluRp's answer, which almost worked for me, I suggest looping through all columns and calling shift on each one individually. Let's start with a matrix of random numbers:
a <- matrix(floor(10*runif(24)), ncol=4)
a
     [,1] [,2] [,3] [,4]
[1,]    8    4    8    3
[2,]    0    6    9    0
[3,]    1    6    0    7
[4,]    0    3    9    7
[5,]    2    4    2    9
[6,]    4    8    5    6
library(data.table)
dt <- setDT(as.data.frame(a))
Now the loop that does the job...
for (i in seq_len(ncol(dt))) set(dt, j = i, value = shift(dt[[i]], n = i - 1, fill = 0, type = "lead"))
...by replacing each column with its shifted version (column i moves up i - 1 rows).
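The original answers replaced all columns with shifted copies of the first column, thus losing data. This is because they only shift the first column (via .SDcols = 1, or by keeping only the first four shifted columns), which went unnoticed in the question's example because all of its columns are identical.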
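For a plain base R version that shifts every column by its own amount (column j moves up j - 1 rows) and also copes with more columns than rows, here is a minimal sketch; the function name shift_cols_up and its fill argument are hypothetical, not from any package.

```r
# Hypothetical helper: shift column j of m up by j - 1 rows, padding with `fill`
shift_cols_up <- function(m, fill = 0) {
  nr <- nrow(m)
  out <- matrix(fill, nrow = nr, ncol = ncol(m))
  for (j in seq_len(ncol(m))) {
    k <- j - 1                                # how far column j moves up
    if (k < nr) {                             # columns shifted past the end stay all `fill`
      out[seq_len(nr - k), j] <- m[(k + 1):nr, j]
    }
  }
  out
}

a <- matrix(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6, ncol = 4)
shift_cols_up(a)
```

Because each column is read from its own data, this also works on random matrices like the one above; note that it does not carry over any dimnames.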

Extracting unique rows in a 3+ column matrix

Using R, I am trying to extract unique rows from a matrix, where two rows count as the same if they contain the same values regardless of order.
For example if I had this data set:
x = matrix(c(1,1,1,2,2,5,1,2,2,1,2,1,5,3,5,2,1,1),6,3)
Rows 1 & 6, and rows 4 & 5 are duplicated since (1,1,5) = (5,1,1) and (2,1,2) = (2,2,1).
Ultimately, I'm trying to end up with something like:
y = matrix(c(1,1,1,2,1,2,2,1,5,3,5,2),4,3)
or
z = matrix(c(1,1,2,5,2,2,2,1,3,5,1,1),4,3)
The order doesn't matter as long as only one row from each group of equivalent rows remains. I've searched online, but functions such as unique() and duplicated() only work for exactly matching rows.
Thanks in advance for any help you provide.
Another answer: use sets. Here is a slightly modified matrix:
library(sets)
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1),7,3)
x
     [,1] [,2] [,3]
[1,]    1    1    5
[2,]    1    2    3
[3,]    1    2    5
[4,]    2    1    2
[5,]    2    2    1
[6,]    5    1    1
[7,]    5    5    1
If (5,1,1) should count as equal to (5,5,1), you can use ordinary sets:
a <- sapply(1:nrow(x), function(i) as.set(x[i,]))
x[!duplicated(a),]
     [,1] [,2] [,3]
[1,]    1    1    5
[2,]    1    2    3
[3,]    1    2    5
[4,]    2    1    2
Note: rows 6 and 7 are both gone.
If (5,1,1) should count as different from (5,5,1), use generalized sets:
b <- sapply(1:nrow(x), function(i) as.gset(x[i,]))
x[!duplicated(b),]
     [,1] [,2] [,3]
[1,]    1    1    5
[2,]    1    2    3
[3,]    1    2    5
[4,]    2    1    2
[5,]    5    5    1
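If you'd rather avoid the sets package, sorting each row gives a base R sketch of the same two behaviours, assuming the values are plain numbers that compare exactly (checked only on this small example): sorted rows act like generalized sets (multiplicities matter), and sorted distinct values act like ordinary sets.

```r
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1), 7, 3)

# Multiset version (like as.gset): rows match if they hold the same values
# with the same multiplicities, in any order
sorted_rows <- t(apply(x, 1, sort))
x[!duplicated(sorted_rows), ]

# Set version (like as.set): repeats are ignored too, by keying each row
# on its sorted distinct values
keys <- apply(x, 1, function(r) paste(sort(unique(r)), collapse = ","))
x[!duplicated(keys), ]
```

duplicated() on a matrix compares whole rows, which is why sorting each row first is enough for the multiset case.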

Creating a matrix of increasing concentric rings of numbers in R

I need to write a function in R that creates a matrix of increasing concentric rings of numbers. The function's argument is the number of layers. For example, if x = 3, the matrix looks like this:
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
I have no idea how to do it. I would really appreciate any suggestions.
1) Try this:
x <- 3 # input
n <- 2*x-1
m <- diag(n)
x - pmax(abs(row(m) - x), abs(col(m) - x))
giving:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    2    2    2    1
[3,]    1    2    3    2    1
[4,]    1    2    2    2    1
[5,]    1    1    1    1    1
2) A second approach is:
x <- 3 # input
n <- 2*x-1
mid <- pmin(1:n, n:1) # middle row/column
outer(mid, mid, pmin)
giving the same result as before.
3) Yet another approach, with some similarities to the prior two, is:
x <- 3 # input
n <- 2*x-1
Dist <- abs(seq_len(n) - x)
x - outer(Dist, Dist, pmax)
Note: the above gives the sample matrix shown in the question, but the question's title says the rings should be increasing, which may mean increasing from the center outward. If that is what is wanted, try any of the following, where m, mid and Dist are as before:
pmax(abs(row(m) - x), abs(col(m) - x)) + 1
or
x - outer(mid, mid, pmin) + 1
or
outer(Dist, Dist, pmax) + 1
Any of these give:
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    3    3    3    3
[2,]    3    2    2    2    3
[3,]    3    2    1    2    3
[4,]    3    2    2    2    3
[5,]    3    3    3    3    3
Try this:
x<-3
res<-matrix(nrow=2*x-1,ncol=2*x-1)
for (i in 1:x) res[i:(2*x-i),i:(2*x-i)]<-i
res
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    1
# [2,]    1    2    2    2    1
# [3,]    1    2    3    2    1
# [4,]    1    2    2    2    1
# [5,]    1    1    1    1    1
A recursive solution, just for kicks (odd n only; here n is the matrix size 2*x-1, not the number of layers):
f <- function(n) if (n == 1) 1 else `[<-`(matrix(1,n,n), 2:(n-1), 2:(n-1), 1+Recall(n-2))
f(5)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    1
# [2,]    1    2    2    2    1
# [3,]    1    2    3    2    1
# [4,]    1    2    2    2    1
# [5,]    1    1    1    1    1
Here's the logic; implement it yourself in R.
Create a matrix with 2*x-1 rows and 2*x-1 columns, fill it with zeros, and traverse it from cell (0,0) to cell (2*x-2,2*x-2), using 0-based indices.
At each cell (i,j), calculate the 'level' of the cell: its distance from the nearest of the four borders of the matrix, i.e. min(i,j,2*x-2-i,2*x-2-j).
Put level + 1 in the cell, since with 0-based indices the outermost ring has level 0 but should hold 1.
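A direct translation of that logic into R might look like the sketch below (rings is a hypothetical name). With R's 1-based indices the distance to the nearest border is already 1 on the outer ring, so no extra + 1 is needed.

```r
# Hypothetical helper: each cell holds its distance to the nearest border
rings <- function(x) {
  n <- 2 * x - 1
  res <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # 1-based distances to the top, left, bottom and right borders
      res[i, j] <- min(i, j, n + 1 - i, n + 1 - j)
    }
  }
  res
}

rings(3)
```

The double loop mirrors the description cell by cell; the vectorised answers above compute the same quantity in one step.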

Subtracting two columns to give a new column in R

Hello, I'm trying to subtract column B from column A in a matrix, dat, to create a column C (A - B):
My input:
A B
1 2
2 2
3 2
4 2
My expected output:
A B C
1 2 -1
2 2 0
3 2 1
4 2 2
I have tried dat$C <- (dat$A - dat$B), but I get the error: $ operator is invalid for atomic vectors
Cheers.
As @Bryan Hanson said in the comment above, your syntax and data organization fit a data frame better than a matrix. I would treat your data as a data frame and simply use the syntax you already tried:
> data <- data.frame(A = c(1,2,3,4), B = c(2,2,2,2))
> data$C <- (data$A - data$B)
> data
A B C
1 1 2 -1
2 2 2 0
3 3 2 1
4 4 2 2
If you really do mean a matrix, see this example:
> x <- matrix(data=1:3,nrow=4,ncol=3)
> x
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 1
[3,] 3 1 2
[4,] 1 2 3
> x[,3] = x[,1]-x[,2]
> x
     [,1] [,2] [,3]
[1,]    1    2   -1
[2,]    2    3   -1
[3,]    3    1    2
[4,]    1    2   -1
>
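However, note that matrix operations in R don't follow the closure you might expect from linear algebra: subtracting two extracted columns gives a plain vector, not a column matrix. For example, with the original x (before column 3 was overwritten):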
> x[,3]-x[,1]
[1] 2 -1 -1 2
> is.matrix(x[,3]-x[,1])
[1] FALSE
If you expected a column vector, you can get one by applying the transpose twice (odd in itself): the first t() turns the vector into a 1 x 4 row matrix, and the second turns that into a 4 x 1 column.
> t(t(x[,3]-x[,1]))
     [,1]
[1,]    2
[2,]   -1
[3,]   -1
[4,]    2
Or use drop = FALSE:
> x[,3,drop=FALSE] - x[,1,drop=FALSE]
     [,1]
[1,]    2
[2,]   -1
[3,]   -1
[4,]    2
Two-dimensional slices of a 3D array behave as expected, though.
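Coming back to the original question: if dat is a matrix with column names A and B, $ is not available, but you can index the columns by name and append C with cbind(). A minimal sketch, with dat rebuilt from the question's input:

```r
# Matrix rebuilt from the question's input (B is recycled to 2, 2, 2, 2)
dat <- cbind(A = 1:4, B = 2)

# $ only works on lists/data frames; for a matrix, index columns by name
dat <- cbind(dat, C = dat[, "A"] - dat[, "B"])
dat
```

cbind() returns a new matrix, so the result has to be assigned back to dat.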

R Sum complete cases of two columns

How can I sum the number of complete cases of two columns?
With c equal to:
      a  b
[1,] NA NA
[2,]  1  1
[3,]  1  1
[4,] NA  1
Applying something like
rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)
I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).
Am I over thinking it?
Thanks!
Did you try sum(complete.cases(x))?!
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
#      [,1] [,2] [,3]
# [1,]    1   NA    5
# [2,]    4    3    2
# [3,]    2    5    4
# [4,]    5    3    3
# [5,]    5    2   NA
sum(complete.cases(x))
#[1] 3
To count the complete cases in the first two columns:
sum(complete.cases(x[,1:2]))
#[1] 4
And to apply this to each pair of columns across the whole matrix, you could do this:
# Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1   NA    5    5    5    4    5    2   NA    NA
# [2,]    4    3    2    1    4    3    5    4    2     1
# [3,]    2    5    4   NA    3    3    4    1    2     2
# [4,]    5    3    3    1    5    1    4    1    2     1
# [5,]    5    2   NA    5    3   NA   NA    1   NA     5
# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
id
#[1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
#[1] 4 3 4 4 3
complete.cases() works row-wise across the whole data.frame or matrix, returning TRUE for rows that are not missing any data. As a minor aside, "c" is a bad variable name because c() is one of the most commonly used functions.
You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:
m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
#      [,1] [,2] [,3] [,4]
# [1,]   NA    1    1    1
# [2,]    1   NA    1    1
library(zoo)
rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
This should work for both a matrix and a data.frame:
> sum(apply(c, 1, function(x)all(!is.na(x))))
[1] 2
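and you could simply iterate through a large matrix M, storing one count per pair of adjacent columns:
agreement <- numeric(ncol(M) - 1)
for (i in 1:(ncol(M) - 1)) {
  pair <- M[, c(i, i + 1)]
  agreement[i] <- sum(apply(pair, 1, function(x) all(!is.na(x))))
}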
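Without zoo or an explicit loop, the same adjacent-pair counts can be computed in one vectorised step by comparing the missingness matrix with itself shifted by one column. A sketch; adjacent_complete is a hypothetical name:

```r
# Example matrix from the rollapply answer above
m <- matrix(c(NA, 1, 1, NA, 1, 1, 1, 1), ncol = 4)

# Hypothetical helper: complete cases for each pair of adjacent columns (k, k + 1)
adjacent_complete <- function(M) {
  ok <- !is.na(M)   # logical matrix: TRUE where a value is present
  both <- ok[, -ncol(ok), drop = FALSE] & ok[, -1, drop = FALSE]
  colSums(both)     # per pair: rows where both columns are present
}

adjacent_complete(m)
```

On m this should give 0 1 2, matching the rollapply result, and on the question's two-column data it gives 2. drop = FALSE keeps the two-column case a matrix so colSums() still applies, and is.na() on a data.frame also returns a logical matrix.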
