Populate elements of a matrix using values from a column - r

I am trying to populate a matrix using the values from a specific column (Dependent). In the example below in row 1 the Dependent value is 3 which will indicate a 1 in the 3rd column. Row 4 has a Dependent value of 2 so a 1 is put in column 2. I have considered using a for loop but was interested if there is a more elegant way of solving the problem.
Project  Dependent  1  2  3  4
1        3                1
2
3
4        2             1
5        4                   1
Thanks in advance!

Given
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)
df <- data.frame(Project, Dependent)
create a matrix
m = matrix(nrow = max(df$Project), ncol = max(df$Dependent))
and populate it, using the data frame as a two-column matrix of (row, column) indexes:
m[as.matrix(df)] = 1
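A minimal sketch of the same matrix-indexing idea, assuming you want a 0/1 indicator matrix rather than NA/1. The name `idx` is only illustrative, and rows with Dependent equal to 0 are filtered out explicitly so the intent does not rely on how zero indices are treated:

```r
Project   <- 1:5
Dependent <- c(3, 0, 0, 2, 4)
df <- data.frame(Project, Dependent)

# Start from zeros so the result is a 0/1 indicator matrix
m <- matrix(0, nrow = nrow(df), ncol = max(df$Dependent))

# Keep only the rows that actually name a dependent column
idx <- as.matrix(df[df$Dependent > 0, c("Project", "Dependent")])

# Each row of idx is a (row, column) pair; set those cells to 1
m[idx] <- 1
m
```

Each row of `idx` addresses exactly one cell, so no loop is needed.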

Here is what you described, using a for loop. Hope it helps.
Project<-1:5
Dependent<-c(3,0,0,2,4)
df<-data.frame(Project,Dependent)
df
Project Dependent
1 1 3
2 2 0
3 3 0
4 4 2
5 5 4
s<-matrix(NA, nrow = nrow(df), ncol = nrow(df))
for (i in seq_len(nrow(df))) {
  # only rows with a real dependency (Dependent > 0) get a 1
  if (df$Dependent[i] > 0) s[i, df$Dependent[i]] <- 1
}
s
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA 1 NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA 1 NA NA NA
[5,] NA NA NA 1 NA

Related

How do I loop correctly?

Here is the data below. I'm not sure which type of looping I should be using, but here is what I am looking to do: If, for row 1, there is a 6 present, then for column 7 we have "Yes", if there is no 6 present, then column 7 has "No". Ignore columns 8 & 9.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 6 1 1 6 1 NA NA NA
[2,] 5 5 5 5 5 5 NA NA NA
[3,] 1 1 6 1 1 6 NA NA NA
[4,] 5 5 5 5 5 5 NA NA NA
[5,] 6 1 1 6 1 1 NA NA NA
[6,] 5 5 5 5 5 5 NA NA NA
[7,] 1 6 1 1 6 1 NA NA NA
[8,] 5 5 5 5 5 5 NA NA NA
[9,] 1 1 6 1 1 6 NA NA NA
[10,] 5 5 5 5 5 5 NA NA NA
Here is the code that I have.
b <- 10
n <- 6
data.matrix <- matrix(data=NA,nrow = b, ncol = n+3)
for (i in 1:b)
{
data.matrix[,1:n] <- sample(6,n,replace=T)
}
Side Note: I keep getting this error
"the condition has length > 1 and only the first element will be used"
Here is a solution using apply:
a[,7] <- apply(a, 1, function(x) ifelse(max(x,na.rm = T) == 6,"YES","NO"))
where a is the input data.frame/tibble. As commented above, if you have a matrix, convert it to a data.frame first and then perform this operation.
Here is a solution with apply and which:
res <- apply(data.matrix, 1, function(x) {
x[[7]] <- length(which(x == 6)) > 0
x
})
res <- t(res)
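A loop-free sketch of the same check, under a few assumptions: the data is rebuilt here with a hypothetical seed, and the names `rolls`, `has_six` and `result` are illustrative only. The "condition has length > 1" message appears when `if ()` is handed a whole vector, since `if` expects a single TRUE or FALSE; a vectorised test such as `rowSums()` with `ifelse()` avoids that.

```r
set.seed(1)  # hypothetical seed, only to make the sketch repeatable
b <- 10
n <- 6

# b rows of n dice rolls each
rolls <- matrix(sample(6, b * n, replace = TRUE), nrow = b, ncol = n)

# TRUE for every row that contains at least one 6
has_six <- rowSums(rolls == 6) > 0

# A data.frame can hold the numeric rolls next to a character column
result <- data.frame(rolls, has_six = ifelse(has_six, "Yes", "No"))
result
```

Storing the answer in a data.frame rather than in column 7 of the matrix keeps the rolls numeric; writing "Yes"/"No" into a numeric matrix would coerce every cell to character.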

Frequency/Contingency table along 1 dimension in R

I'm trying to count co-occurrences along a single dimension. It's somewhat similar to win/loss, dominance matrices, or frequency tables, (and spectrograms/raster plots) but without directionality and along 1 variable.
Here's an example of the data:
person response
1 a 1
2 a 2
3 a 4
4 b 1
5 b 2
6 c 2
7 c 4
8 d 4
9 d 3
The goal would be to get an n x n matrix like the one shown below (the NAs on the diagonal could instead hold the total number of occurrences of each response):
[,1] [,2] [,3] [,4]
[1,] NA 2 0 1
[2,] - NA 0 2
[3,] - - NA 1
[4,] - - - NA
How can I convert the long data into a matrix in R? (without manual counting).
What is this type of metric called? It's not a typical 'contingency' table.
After the table is created, what's the best way to plot the resulting matrix with colors denoting the count/frequency?
Try this:
r1 = sort(unique(df1$response))
r2 = split(df1$response, df1$person)
ans = sapply(seq_along(r1), function(i)
rowSums(sapply(r2, function(x) (r1[i] %in% x) * (r1 %in% x))))
diag(ans) = NA
ans
# [,1] [,2] [,3] [,4]
#[1,] NA 2 0 1
#[2,] 2 NA 0 2
#[3,] 0 0 NA 1
#[4,] 1 2 1 NA
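An alternative sketch, assuming each person gives each response at most once: build a person-by-response incidence table and take its cross-product. The names `inc` and `co` are illustrative, and `df1` is reconstructed from the example data in the question.

```r
df1 <- data.frame(
  person   = c("a", "a", "a", "b", "b", "c", "c", "d", "d"),
  response = c(1, 2, 4, 1, 2, 2, 4, 4, 3)
)

# Rows are persons, columns are responses, cells are 0/1
inc <- table(df1$person, df1$response)

# t(inc) %*% inc: cell (i, j) counts persons who gave both response i and j
co <- crossprod(inc)

# The diagonal holds the total count per response; blank it out if unwanted
diag(co) <- NA
co
```

For the plotting part of the question, base graphics functions such as `image()` or `heatmap()` accept a numeric matrix like `co`, though the diagonal NAs may need handling first.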

Multiplication of matrices with NA values

If I have 2 square Matrices with random NA values, for example:
Matrix A:
1 2 3
1 5 NA 7
2 NA 3 8
3 NA 4 5
Matrix B:
1 2 3
1 NA 8 NA
2 2 5 9
3 NA 4 3
What is the best way to multiply them? Would changing NA values to 0 give a different result of the dot product?
NAs are not ignored; they propagate through both element-wise and matrix multiplication:
## Dummy matrices
mat1 <- matrix(sample(1:9, 9), 3, 3)
mat2 <- matrix(sample(1:9, 9), 3, 3)
## Adding NAs
mat1[sample(1:9, 4)] <- NA
mat2[sample(1:9, 4)] <- NA
mat1
# [,1] [,2] [,3]
#[1,] 9 NA 3
#[2,] 2 NA NA
#[3,] NA 1 8
mat2
# [,1] [,2] [,3]
#[1,] NA NA 4
#[2,] NA 9 3
#[3,] NA 7 1
mat1 * mat2
# [,1] [,2] [,3]
#[1,] NA NA 12
#[2,] NA NA NA
#[3,] NA 7 8
mat1 %*% mat2
# [,1] [,2] [,3]
#[1,] NA NA NA
#[2,] NA NA NA
#[3,] NA NA NA
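In this case the matrix product contains only NAs because every row-by-column sum involves at least one NA. Different matrices can lead to different results. Replacing NAs with 0 does change the result: those terms then contribute 0 to each sum instead of turning it into NA.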
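To make the comparison concrete, here is a small sketch using the two matrices from the question. The names `A0` and `B0` are illustrative, and treating a missing value as 0 is an assumption about the data, not something R does for you:

```r
A <- matrix(c( 5, NA, 7,
              NA,  3, 8,
              NA,  4, 5), nrow = 3, byrow = TRUE)
B <- matrix(c(NA, 8, NA,
               2, 5,  9,
              NA, 4,  3), nrow = 3, byrow = TRUE)

# With NAs left in place, any sum that touches an NA becomes NA
with_na <- A %*% B

# Copy the matrices and replace NA by 0 before multiplying
A0 <- A; A0[is.na(A0)] <- 0
B0 <- B; B0[is.na(B0)] <- 0
zero_filled <- A0 %*% B0

with_na
zero_filled
```

Every row of A contains an NA, so `with_na` has no numeric entries at all, while `zero_filled` is fully numeric. Whether the zero-filled result is meaningful depends on what a missing value stands for in your data.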

R: Error: new columns would leave holes after existing columns

When running this code, I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , i, value = list(x = 0.0654882985934691, :
new columns would leave holes after existing columns
I am trying to populate a data.frame with i columns; with the output of the posted for loop, it should look something like this (Excel example for convenience only):
The aim is to store the output of the loop in such a way that I can get the average of each column at a later stage.
What can be done to achieve this?
library(plyr)
library(forecast)
library(vars)
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
# Forecast Accuracy
j = 12 #Forecast horizon
k = nrow(dx)-j #length of minimum training set
prediction <- data.frame()
for (i in 1:j) {
trainingset <- window(dx, end = k+i-1)
testset <- window(dx, start = k+i, end = k+j)
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
fcastmean <- do.call('cbind', fcast[['mean']])
fcastmean <- as.data.frame(fcastmean)
prediction[,i] <- rbind(fcastmean[,1])
}
Edit
As per the comment below, I have edited the above code to specify the first variable of fcastmean.
The error I get has however changed as a result, now being:
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(-0.316529962287372, :
replacement has 1 row, data has 0
Edit 2
Below is the minimum replicable version without any packages as requested in the comments. I believe that should be equivalent in terms of the question posed.
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
j = 12
k = nrow(dx)-j
prediction <- matrix(NA,j,j)
for (i in 1:j) {
fcast <- as.matrix(1:(j-i+1))
fcastmean <- fcast
prediction[,i] <- (fcastmean)
}
For your new example, try
sapply(1:j, function(i) `length<-`(1:(j-i+1), j))
The result is
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 NA
[3,] 3 3 3 3 3 3 3 3 3 3 NA NA
[4,] 4 4 4 4 4 4 4 4 4 NA NA NA
[5,] 5 5 5 5 5 5 5 5 NA NA NA NA
[6,] 6 6 6 6 6 6 6 NA NA NA NA NA
[7,] 7 7 7 7 7 7 NA NA NA NA NA NA
[8,] 8 8 8 8 8 NA NA NA NA NA NA NA
[9,] 9 9 9 9 NA NA NA NA NA NA NA NA
[10,] 10 10 10 NA NA NA NA NA NA NA NA NA
[11,] 11 11 NA NA NA NA NA NA NA NA NA NA
[12,] 12 NA NA NA NA NA NA NA NA NA NA NA
`length<-`(x, j) pads x with NA until it reaches a length of j.
You can replace 1:(j-i+1) with whatever function of i you want. In the OP's original example, I am guessing something like this will work (untested):
sapply(1:j, function(i){
trainingset <- window(dx, end = k+i-1)
# testset <- window(dx, start = k+i, end = k+j)
# ^ this isn't actually used...
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
`length<-`(fcast$mean, j)
})
function(i){...} is called an anonymous function and can be written like any other.
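Since the stated aim is to average each column later, here is a sketch of how the NA-padded matrix could be used for that. It builds on the package-free example (the forecast values are stand-ins, as in Edit 2), and `col_means` is an illustrative name:

```r
j <- 12

# Column i holds j - i + 1 values, padded with NA up to length j
prediction <- sapply(seq_len(j), function(i) {
  `length<-`(seq_len(j - i + 1), j)
})

# na.rm = TRUE skips the padding, so each mean uses only the real values
col_means <- colMeans(prediction, na.rm = TRUE)
col_means
```

Without `na.rm = TRUE`, every column except the first would average to NA.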

R: Convert obscure table into matrix

I have a table that looks like this:
Row Col Value
1 1 31
1 2 56
1 8 13
2 1 83
2 2 51
2 9 16
3 2 53
I need to convert this table into a matrix (the Row column gives the row index and the Col column gives the column index). The output should look like this:
   1  2  3  4  5  6  7  8  9
1 31 56 NA NA NA NA NA 13 NA
2 83 51 NA NA NA NA NA NA 16
3 NA 53 NA NA NA NA NA NA NA
I believe that there is quick way to do what I want as my solution would be looping for every row/column combination and cbind everything.
Reproducible example:
require(data.table)
myTable <- data.table(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))
Straightforward:
dat <- data.frame(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))
m = matrix(NA, nrow = max(dat$Row), ncol = max(dat$Col))
m[cbind(dat$Row, dat$Col)] = dat$Value
m
Sparse matrix. You probably want a sparse matrix
require(Matrix) # doesn't require installation
mySmat <- with(myTable,sparseMatrix(Row,Col,x=Value))
which gives
3 x 9 sparse Matrix of class "dgCMatrix"
[1,] 31 56 . . . . . 13 .
[2,] 83 51 . . . . . . 16
[3,] 53 . . . . . . . .
Matrix. If you really need a matrix-class object with NAs, there's
myMat <- as.matrix(mySmat)
myMat[myMat==0] <- NA
which gives
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 31 56 NA NA NA NA NA 13 NA
[2,] 83 51 NA NA NA NA NA NA 16
[3,] 53 NA NA NA NA NA NA NA NA
Efficiency considerations. For shorter code:
myMat <- with(myTable,as.matrix(sparseMatrix(Row,Col,x=Value)))
myMat[myMat==0] <- NA
For faster speed (but slower than creating a sparse matrix), initialize to NA and then fill, as @jimmyb and @bgoldst do:
myMat <- with(myTable,matrix(,max(Row),max(Col)))
myMat[cbind(myTable$Row,myTable$Col)] <- myTable$Value
This workaround is only necessary if you insist on NAs over zeros. A sparse matrix is almost certainly what you should use. Creating and working with it should be faster; and storing it should be less memory-intensive.
I believe the most concise and performant way to achieve this is to preallocate the matrix with NAs, and then assign a vector slice by manually computing the linear indexes from Row and Col:
df <- data.frame(Row=c(1,1,1,2,2,2,3), Col=c(1,2,8,1,2,9,2), Value=c(31,56,13,83,51,16,53) );
m <- matrix(NA,max(df$Row),max(df$Col));
m[(df$Col-1)*nrow(m)+df$Row] <- df$Value;
m;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 31 56 NA NA NA NA NA 13 NA
## [2,] 83 51 NA NA NA NA NA NA 16
## [3,] NA 53 NA NA NA NA NA NA NA
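As a quick check on the linear-index formula above: R stores matrices column by column, so cell (Row, Col) sits at position (Col - 1) * nrow + Row. The sketch below compares that with indexing by a two-column (row, column) matrix; `m_lin` and `m_idx` are illustrative names.

```r
df <- data.frame(Row   = c(1, 1, 1, 2, 2, 2, 3),
                 Col   = c(1, 2, 8, 1, 2, 9, 2),
                 Value = c(31, 56, 13, 83, 51, 16, 53))

# Fill via manually computed linear (column-major) positions
m_lin <- matrix(NA_real_, max(df$Row), max(df$Col))
m_lin[(df$Col - 1) * nrow(m_lin) + df$Row] <- df$Value

# Fill via a two-column index matrix of (row, column) pairs
m_idx <- matrix(NA_real_, max(df$Row), max(df$Col))
m_idx[cbind(df$Row, df$Col)] <- df$Value

# Both approaches address the same cells
identical(m_lin, m_idx)
```

The two-column form is usually easier to read; the linear form just spells out the arithmetic that locates each cell.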
xtabs in base R is perfect for this if you can live with "0" where you have NA.
This would be the basic approach:
xtabs(Value ~ Row + Col, myTable)
# Col
# Row 1 2 8 9
# 1 31 56 13 0
# 2 83 51 0 16
# 3 53 0 0 0
However, that doesn't fill in the gaps, because not all factor levels are available. You can do this separately, or on-the-fly, like this:
xtabs(Value ~ factor(Row, sequence(max(Row))) +
factor(Col, sequence(max(Col))), myTable)
# factor(Col, sequence(max(Col)))
# factor(Row, sequence(max(Row))) 1 2 3 4 5 6 7 8 9
# 1 31 56 0 0 0 0 0 13 0
# 2 83 51 0 0 0 0 0 0 16
# 3 53 0 0 0 0 0 0 0 0
By extension, this means that if the "Row" and "Col" values are factors, dcast.data.table should also work:
dcast.data.table(myTable, Row ~ Col, value.var = "Value", drop = FALSE)
(But it doesn't in my test for some reason. I had to do library(reshape2); dcast(myTable, Row ~ Col, value.var = "Value", drop = FALSE) to get it to work, thus not taking advantage of "data.table" speed.)
