I am trying to populate a matrix using the values from a specific column (Dependent). In the example below, row 1 has a Dependent value of 3, which puts a 1 in the 3rd column. Row 4 has a Dependent value of 2, so a 1 is put in column 2. I have considered using a for loop but wondered whether there is a more elegant way of solving the problem.
Project  Dependent  1  2  3  4
1        3                1
2
3
4        2             1
5        4                   1
Thanks in advance!
For this data:
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)
df <- data.frame(Project, Dependent)
Create a matrix
m = matrix(nrow = max(df$Project), ncol = max(df$Dependent))
and populate it using a 2-column matrix of row and column vectors as indexes
m[as.matrix(df)] = 1
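As a rough sketch of what that indexing does, assuming the same Project and Dependent vectors: rows of the index matrix that contain a zero are skipped, so projects without a dependency are left untouched. Starting from 0L instead of NA is my own choice here, so that the result reads like the 0/1 layout in the question.

```r
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)

# Two-column index matrix: first column = row, second column = column
idx <- cbind(Project, Dependent)

# Assumption: fill with 0 instead of NA so empty cells read as "no dependency"
m <- matrix(0L, nrow = max(Project), ncol = max(Dependent))

# Index rows containing a 0 (projects 2 and 3) are ignored by the assignment
m[idx] <- 1L
m
```

Only cells (1, 3), (4, 2) and (5, 4) end up set, without an explicit loop.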
Here is what you described. Hope it helps.
Project<-1:5
Dependent<-c(3,0,0,2,4)
df<-data.frame(Project,Dependent)
df
Project Dependent
1 1 3
2 2 0
3 3 0
4 4 2
5 5 4
s<-matrix(NA, nrow = nrow(df), ncol = nrow(df))
for (i in seq_along(df$Dependent)) {
# a Dependent of 0 means "no dependency", so only fill rows with a positive value
if (df$Dependent[i] > 0) s[i, df$Dependent[i]] <- 1
}
s
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA 1 NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA 1 NA NA NA
[5,] NA NA NA 1 NA
Related
Here is the data below. I'm not sure which type of looping I should be using, but here is what I am looking to do: If, for row 1, there is a 6 present, then for column 7 we have "Yes", if there is no 6 present, then column 7 has "No". Ignore columns 8 & 9.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 6 1 1 6 1 NA NA NA
[2,] 5 5 5 5 5 5 NA NA NA
[3,] 1 1 6 1 1 6 NA NA NA
[4,] 5 5 5 5 5 5 NA NA NA
[5,] 6 1 1 6 1 1 NA NA NA
[6,] 5 5 5 5 5 5 NA NA NA
[7,] 1 6 1 1 6 1 NA NA NA
[8,] 5 5 5 5 5 5 NA NA NA
[9,] 1 1 6 1 1 6 NA NA NA
[10,] 5 5 5 5 5 5 NA NA NA
Here is the code that I have.
b <- 10
n <- 6
# b and n must be defined before they are used to size the matrix
data.matrix <- matrix(data = NA, nrow = b, ncol = n + 3)
for (i in 1:b)
{
data.matrix[,1:n] <- sample(6,n,replace=T)
}
Side Note: I keep getting this error
"the condition has length > 1 and only the first element will be used"
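Here is a solution using apply:
a[,7] <- apply(a, 1, function(x) ifelse(max(x, na.rm = TRUE) == 6, "Yes", "No"))
where a is the input data.frame/tibble. This works because 6 is the largest value sample(6, ...) can produce, so a row contains a 6 exactly when its maximum is 6. As commented above, if you have a matrix, convert it to a data.frame first and then perform this operation.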
Here is a solution with apply and which:
res <- apply(data.matrix, 1, function(x) {
x[[7]] <- length(which(x == 6)) > 0
x
})
res <- t(res)
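A loop-free variant, as a sketch only: the message in the side note comes from if(), which expects a single TRUE/FALSE, so a vectorised test over whole rows avoids it. The small matrix a below is a hypothetical stand-in shaped like the first three rows of the question, and the column name V7 is my own.

```r
# Hypothetical stand-in for the first three rows of the question's data
a <- rbind(c(1, 6, 1, 1, 6, 1),
           c(5, 5, 5, 5, 5, 5),
           c(1, 1, 6, 1, 1, 6))

# TRUE for every row that contains at least one 6
has_six <- rowSums(a == 6) > 0

# A data frame can hold the numeric columns next to a character column
res <- data.frame(a, V7 = ifelse(has_six, "Yes", "No"))
res
```

Keeping the result in a data frame avoids turning the whole numeric matrix into character strings.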
I'm trying to count co-occurrences along a single dimension. It's somewhat similar to win/loss, dominance matrices, or frequency tables, (and spectrograms/raster plots) but without directionality and along 1 variable.
Here's an example of the data:
person response
1 a 1
2 a 2
3 a 4
4 b 1
5 b 2
6 c 2
7 c 4
8 d 4
9 d 3
The goal would be to get an n x n matrix like the one shown below (the NAs on the diagonal could instead hold the total number of occurrences of each response):
[,1] [,2] [,3] [,4]
[1,] NA 2 0 1
[2,] - NA 0 2
[3,] - - NA 1
[4,] - - - NA
How can I convert the long data into a matrix in R? (without manual counting).
What is this type of metric called? It's not a typical 'contingency' table.
After the table is created, what's the best way to plot the resulting matrix with colors denoting the count/frequency?
Test this
r1 = sort(unique(df1$response))
r2 = split(df1$response, df1$person)
ans = sapply(seq_along(r1), function(i)
rowSums(sapply(r2, function(x) (r1[i] %in% x) * (r1 %in% x))))
diag(ans) = NA
ans
# [,1] [,2] [,3] [,4]
#[1,] NA 2 0 1
#[2,] 2 NA 0 2
#[3,] 0 0 NA 1
#[4,] 1 2 1 NA
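Another way to get the same counts, sketched here under the assumption that df1 has the person and response columns shown in the question: build a person-by-response incidence matrix and take its cross-product. The names inc and co are my own.

```r
df1 <- data.frame(
  person   = c("a", "a", "a", "b", "b", "c", "c", "d", "d"),
  response = c(1, 2, 4, 1, 2, 2, 4, 4, 3)
)

# person x response incidence matrix (1 if the person gave that response)
inc <- (unclass(table(df1$person, df1$response)) > 0) * 1

# t(inc) %*% inc: entry [i, j] counts the persons who gave both responses i and j
co <- crossprod(inc)
diag(co) <- NA
co
```

Before the diagonal is blanked it holds the number of persons per response, which is the alternative the question mentions. For a quick colour plot, base graphics' image(co) draws the matrix as a grid of coloured cells.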
If I have 2 square Matrices with random NA values, for example:
Matrix A:
1 2 3
1 5 NA 7
2 NA 3 8
3 NA 4 5
Matrix B:
1 2 3
1 NA 8 NA
2 2 5 9
3 NA 4 3
What is the best way to multiply them? Would changing NA values to 0 give a different result of the dot product?
NAs are not ignored; they propagate through both element-wise and matrix multiplication:
## Dummy matrices
mat1 <- matrix(sample(1:9, 9), 3, 3)
mat2 <- matrix(sample(1:9, 9), 3, 3)
## Adding NAs
mat1[sample(1:9, 4)] <- NA
mat2[sample(1:9, 4)] <- NA
mat1
# [,1] [,2] [,3]
#[1,] 9 NA 3
#[2,] 2 NA NA
#[3,] NA 1 8
mat2
# [,1] [,2] [,3]
#[1,] NA NA 4
#[2,] NA 9 3
#[3,] NA 7 1
mat1 * mat2
# [,1] [,2] [,3]
#[1,] NA NA 12
#[2,] NA NA NA
#[3,] NA 7 8
mat1 %*% mat2
# [,1] [,2] [,3]
#[1,] NA NA NA
#[2,] NA NA NA
#[3,] NA NA NA
In this case the matrix product (%*%) contains only NAs because every row-by-column sum involves at least one NA. Different matrices can lead to different results.
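To address the second part of the question, here is a sketch using the two matrices from the question. Replacing NA with 0 is an assumption about what a missing cell means (that it contributes nothing to the sum), so it does change the result.

```r
A <- rbind(c(5, NA, 7),
           c(NA, 3, 8),
           c(NA, 4, 5))
B <- rbind(c(NA, 8, NA),
           c(2, 5, 9),
           c(NA, 4, 3))

# Every row of A holds an NA, so every entry of the product is NA
A %*% B

# Replacing NA by 0 treats the missing cells as contributing nothing
A0 <- replace(A, is.na(A), 0)
B0 <- replace(B, is.na(B), 0)
A0 %*% B0
```

Whether zero is a sensible stand-in depends on the data; the two products answer different questions.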
When running this code, I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , i, value = list(x = 0.0654882985934691, :
new columns would leave holes after existing columns
I am trying to populate a data.frame with i columns, so that the output of the for loop posted below looks something like this (Excel example for convenience only):
The aim is to store the output of the loop in such a way that I can get the average of each column at a later stage.
What can be done to achieve this?
library(plyr)
library(forecast)
library(vars)
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
# Forecast Accuracy
j = 12 #Forecast horizon
k = nrow(dx)-j #length of minimum training set
prediction <- data.frame()
for (i in 1:j) {
trainingset <- window(dx, end = k+i-1)
testset <- window(dx, start = k+i, end = k+j)
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
fcastmean <- do.call('cbind', fcast[['mean']])
fcastmean <- as.data.frame(fcastmean)
prediction[,i] <- rbind(fcastmean[,1])
}
Edit
As per the comment below, I have edited the above code to specify the first variable of fcastmean.
The error I get has however changed as a result, now being:
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(-0.316529962287372, :
replacement has 1 row, data has 0
Edit 2
Below is the minimum replicable version without any packages as requested in the comments. I believe that should be equivalent in terms of the question posed.
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
j = 12
k = nrow(dx)-j
prediction <- matrix(NA,j,j)
for (i in 1:j) {
fcast <- as.matrix(1:(j-i+1))
fcastmean <- fcast
prediction[,i] <- (fcastmean)
}
For your new example, try
sapply(1:j, function(i) `length<-`(1:(j-i+1), j))
The result is
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 NA
[3,] 3 3 3 3 3 3 3 3 3 3 NA NA
[4,] 4 4 4 4 4 4 4 4 4 NA NA NA
[5,] 5 5 5 5 5 5 5 5 NA NA NA NA
[6,] 6 6 6 6 6 6 6 NA NA NA NA NA
[7,] 7 7 7 7 7 7 NA NA NA NA NA NA
[8,] 8 8 8 8 8 NA NA NA NA NA NA NA
[9,] 9 9 9 9 NA NA NA NA NA NA NA NA
[10,] 10 10 10 NA NA NA NA NA NA NA NA NA
[11,] 11 11 NA NA NA NA NA NA NA NA NA NA
[12,] 12 NA NA NA NA NA NA NA NA NA NA NA
`length<-`(x, j) pads x with NA until it reaches a length of j.
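A small sketch of the padding on its own, and of how it feeds the stated goal of averaging each column later. The horizon of 4 is a hypothetical value chosen to keep the output short.

```r
# `length<-` called as a function: extend 1:3 to length 5, padding with NA
`length<-`(1:3, 5)

# Hypothetical small case: horizon j = 4, column i holds j - i + 1 values
j <- 4
prediction <- sapply(1:j, function(i) `length<-`(1:(j - i + 1), j))
prediction

# Column averages, skipping the NA padding
colMeans(prediction, na.rm = TRUE)
```

Because every padded vector has the same length, sapply can bind them into a matrix, and na.rm = TRUE keeps the padding out of the averages.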
You can replace 1:(j-i+1) with whatever function of i you want. In the OP's original example, I am guessing something like this will work (untested):
sapply(1:j, function(i){
trainingset <- window(dx, end = k+i-1)
# testset <- window(dx, start = k+i, end = k+j)
# ^ this isn't actually used...
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
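# fcast[['mean']] is a list with one series per variable (see the question's do.call);
# take the first variable, as in the question's edit
`length<-`(do.call('cbind', fcast[['mean']])[, 1], j)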
})
function(i){...} is called an anonymous function and can be written like any other.
I have a table that looks like this:
Row Col Value
1 1 31
1 2 56
1 8 13
2 1 83
2 2 51
2 9 16
3 2 53
I need to convert this table into a matrix (the Row column gives the matrix rows and the Col column gives the matrix columns). The output should look like this:
   1  2  3  4  5  6  7  8  9
1 31 56 NA NA NA NA NA 13 NA
2 83 51 NA NA NA NA NA NA 16
3 NA 53 NA NA NA NA NA NA NA
I believe that there is a quick way to do what I want, as my own solution would be to loop over every row/column combination and cbind everything.
Reproducible example:
require(data.table)
myTable <- data.table(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))
Straightforward:
dat <- data.frame(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))
m = matrix(NA, nrow = max(dat$Row), ncol = max(dat$Col))
m[cbind(dat$Row, dat$Col)] = dat$Value
m
Sparse matrix. You probably want a sparse matrix:
require(Matrix) # doesn't require installation
mySmat <- with(myTable,sparseMatrix(Row,Col,x=Value))
which gives
3 x 9 sparse Matrix of class "dgCMatrix"
[1,] 31 56 . . . . . 13 .
[2,] 83 51 . . . . . . 16
[3,] 53 . . . . . . . .
Matrix. If you really need a matrix-class object with NAs, there's
myMat <- as.matrix(mySmat)
myMat[myMat==0] <- NA
which gives
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 31 56 NA NA NA NA NA 13 NA
[2,] 83 51 NA NA NA NA NA NA 16
[3,] 53 NA NA NA NA NA NA NA NA
Efficiency considerations. For shorter code:
myMat <- with(myTable,as.matrix(sparseMatrix(Row,Col,x=Value)))
myMat[myMat==0] <- NA
For more speed (though still slower than creating a sparse matrix), initialize to NA and then fill, as @jimmyb and @bgoldst do:
myMat <- with(myTable,matrix(,max(Row),max(Col)))
myMat[cbind(myTable$Row,myTable$Col)] <- myTable$Value
This workaround is only necessary if you insist on NAs over zeros. A sparse matrix is almost certainly what you should use. Creating and working with it should be faster; and storing it should be less memory-intensive.
I believe the most concise and performant way to achieve this is to preallocate the matrix with NAs, and then assign a vector slice by manually computing the linear indexes from Row and Col:
df <- data.frame(Row=c(1,1,1,2,2,2,3), Col=c(1,2,8,1,2,9,2), Value=c(31,56,13,83,51,16,53) );
m <- matrix(NA,max(df$Row),max(df$Col));
m[(df$Col-1)*nrow(m)+df$Row] <- df$Value;
m;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 31 56 NA NA NA NA NA 13 NA
## [2,] 83 51 NA NA NA NA NA NA 16
## [3,] NA 53 NA NA NA NA NA NA NA
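To see why the linear-index formula works, here is a sketch that fills the same cells both ways. R stores matrices column by column, so cell (row, col) sits at position (col - 1) * nrow + row. The names dat, m1, m2 and lin are my own.

```r
dat <- data.frame(Row   = c(1, 1, 1, 2, 2, 2, 3),
                  Col   = c(1, 2, 8, 1, 2, 9, 2),
                  Value = c(31, 56, 13, 83, 51, 16, 53))

m1 <- matrix(NA, max(dat$Row), max(dat$Col))
m2 <- m1

# Linear index into the column-major storage
lin <- (dat$Col - 1) * nrow(m1) + dat$Row
m1[lin] <- dat$Value

# Same cells addressed with a two-column (row, column) index matrix
m2[cbind(dat$Row, dat$Col)] <- dat$Value

identical(m1, m2)
```

Both forms address the same cells; the two-column index is arguably easier to read, while the linear index avoids building the intermediate index matrix.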
xtabs in base R is perfect for this if you can live with "0" where you have NA.
This would be the basic approach:
xtabs(Value ~ Row + Col, myTable)
# Col
# Row 1 2 8 9
# 1 31 56 13 0
# 2 83 51 0 16
# 3 53 0 0 0
However, that doesn't fill in the gaps, because not all factor levels are available. You can do this separately, or on-the-fly, like this:
xtabs(Value ~ factor(Row, sequence(max(Row))) +
factor(Col, sequence(max(Col))), myTable)
# factor(Col, sequence(max(Col)))
# factor(Row, sequence(max(Row))) 1 2 3 4 5 6 7 8 9
# 1 31 56 0 0 0 0 0 13 0
# 2 83 51 0 0 0 0 0 0 16
# 3 53 0 0 0 0 0 0 0 0
By extension, this means that if the "Row" and "Col" values are factors, dcast.data.table should also work:
dcast.data.table(myTable, Row ~ Col, value.var = "Value", drop = FALSE)
(But it doesn't in my test for some reason. I had to do library(reshape2); dcast(myTable, Row ~ Col, value.var = "Value", drop = FALSE) to get it to work, thus not taking advantage of "data.table" speed.)