R: Convert obscure table into matrix

I have a table that looks like this:
Row Col Value
1 1 31
1 2 56
1 8 13
2 1 83
2 2 51
2 9 16
3 2 53
I need to convert this table into a matrix (the Row column gives the row index and the Col column gives the column index). The output should look like this:
1 2 3 4 5 6 7 8 9
1 31 56 NA NA NA NA NA 13 NA
2 83 51 NA NA NA NA NA NA 16
3 NA 53 NA NA NA NA NA NA NA
I believe there is a quick way to do this; my own solution would be to loop over every row/column combination and cbind everything.
Reproducible example:
require(data.table)
myTable <- data.table(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))

Straightforward:
dat <- data.frame(
Row = c(1,1,1,2,2,2,3),
Col = c(1,2,8,1,2,9,1),
Value = c(31,56,13,83,51,16,53))
m = matrix(NA, nrow = max(dat$Row), ncol = max(dat$Col))
m[cbind(dat$Row, dat$Col)] = dat$Value
m

Sparse matrix. You probably want a sparse matrix
require(Matrix) # doesn't require installation
mySmat <- with(myTable,sparseMatrix(Row,Col,x=Value))
which gives
3 x 9 sparse Matrix of class "dgCMatrix"
[1,] 31 56 . . . . . 13 .
[2,] 83 51 . . . . . . 16
[3,] 53 . . . . . . . .
Matrix. If you really need a matrix-class object with NAs, there's
myMat <- as.matrix(mySmat)
myMat[myMat==0] <- NA
which gives
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 31 56 NA NA NA NA NA 13 NA
[2,] 83 51 NA NA NA NA NA NA 16
[3,] 53 NA NA NA NA NA NA NA NA
Efficiency considerations. For shorter code:
myMat <- with(myTable,as.matrix(sparseMatrix(Row,Col,x=Value)))
myMat[myMat==0] <- NA
For more speed (though still slower than creating a sparse matrix), initialize to NA and then fill, as @jimmyb and @bgoldst do:
myMat <- with(myTable,matrix(,max(Row),max(Col)))
myMat[cbind(myTable$Row,myTable$Col)] <- myTable$Value
This workaround is only necessary if you insist on NAs over zeros. A sparse matrix is almost certainly what you should use. Creating and working with it should be faster; and storing it should be less memory-intensive.
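One caveat with the myMat[myMat==0] <- NA step is that it cannot tell a stored zero from an empty cell. The following is a minimal sketch, using hypothetical data (tab, not from the question) that contains a genuine 0, of filling only the cells that are actually listed:

```r
# Hypothetical data: the second entry is a genuine zero.
tab <- data.frame(Row = c(1, 2, 2), Col = c(1, 1, 3), Value = c(5, 0, 7))

# Start from an all-NA matrix and fill only the listed cells,
# so a real 0 stays 0 and unlisted cells stay NA.
filled <- matrix(NA_real_, nrow = max(tab$Row), ncol = max(tab$Col))
filled[cbind(tab$Row, tab$Col)] <- tab$Value
filled
```

If the data can never contain a real zero, the shorter sparse-matrix route works just as well.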

I believe the most concise and performant way to achieve this is to preallocate the matrix with NAs, and then assign a vector slice by manually computing the linear indexes from Row and Col:
df <- data.frame(Row=c(1,1,1,2,2,2,3), Col=c(1,2,8,1,2,9,2), Value=c(31,56,13,83,51,16,53) );
m <- matrix(NA,max(df$Row),max(df$Col));
m[(df$Col-1)*nrow(m)+df$Row] <- df$Value;
m;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 31 56 NA NA NA NA NA 13 NA
## [2,] 83 51 NA NA NA NA NA NA 16
## [3,] NA 53 NA NA NA NA NA NA NA
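The linear-index formula works because R stores matrices column by column. As a small illustration (the matrix and the index vectors r and cc below are made up for the example), the computed positions address the same cells as a two-column index matrix:

```r
# Column-major storage: element (r, c) of a matrix with nrow rows
# sits at linear position (c - 1) * nrow + r.
m <- matrix(1:12, nrow = 3)       # 3 x 4 example matrix
r  <- c(1, 3, 2)                  # row indices
cc <- c(1, 2, 4)                  # column indices
lin <- (cc - 1) * nrow(m) + r     # linear positions 1, 6, 11

# Both forms pick out the same cells.
identical(m[lin], m[cbind(r, cc)])
```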

xtabs in base R is perfect for this if you can live with "0" where you have NA.
This would be the basic approach:
xtabs(Value ~ Row + Col, myTable)
# Col
# Row 1 2 8 9
# 1 31 56 13 0
# 2 83 51 0 16
# 3 53 0 0 0
However, that doesn't fill in the gaps, because not all factor levels are available. You can do this separately, or on-the-fly, like this:
xtabs(Value ~ factor(Row, sequence(max(Row))) +
factor(Col, sequence(max(Col))), myTable)
# factor(Col, sequence(max(Col)))
# factor(Row, sequence(max(Row))) 1 2 3 4 5 6 7 8 9
# 1 31 56 0 0 0 0 0 13 0
# 2 83 51 0 0 0 0 0 0 16
# 3 53 0 0 0 0 0 0 0 0
By extension, this means that if the "Row" and "Col" values are factors, dcast.data.table should also work:
dcast.data.table(myTable, Row ~ Col, value.var = "Value", drop = FALSE)
(But it doesn't in my test for some reason. I had to do library(reshape2); dcast(myTable, Row ~ Col, value.var = "Value", drop = FALSE) to get it to work, thus not taking advantage of "data.table" speed.)
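If NA rather than 0 is wanted in the empty cells, one base R possibility is tapply with the same full-range factors. This is only a sketch; it rebuilds the question's example as a plain data.frame under the assumed name dat:

```r
dat <- data.frame(
  Row   = c(1, 1, 1, 2, 2, 2, 3),
  Col   = c(1, 2, 8, 1, 2, 9, 1),
  Value = c(31, 56, 13, 83, 51, 16, 53))

# Factors carrying the full range of levels make tapply() keep the
# empty rows/columns; cells with no data come back as NA.
res <- with(dat, tapply(Value,
                        list(factor(Row, seq_len(max(Row))),
                             factor(Col, seq_len(max(Col)))),
                        sum))
res
```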

Related

How do I loop correctly?

Here is the data below. I'm not sure which type of looping I should be using, but here is what I am looking to do: If, for row 1, there is a 6 present, then for column 7 we have "Yes", if there is no 6 present, then column 7 has "No". Ignore columns 8 & 9.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 6 1 1 6 1 NA NA NA
[2,] 5 5 5 5 5 5 NA NA NA
[3,] 1 1 6 1 1 6 NA NA NA
[4,] 5 5 5 5 5 5 NA NA NA
[5,] 6 1 1 6 1 1 NA NA NA
[6,] 5 5 5 5 5 5 NA NA NA
[7,] 1 6 1 1 6 1 NA NA NA
[8,] 5 5 5 5 5 5 NA NA NA
[9,] 1 1 6 1 1 6 NA NA NA
[10,] 5 5 5 5 5 5 NA NA NA
Here is the code that I have.
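# b and n must exist before they are used to size the matrix.
b <- 10
n <- 6
data.matrix <- matrix(data=NA,nrow = b, ncol = n+3)
for (i in 1:b)
{
# Note: [,1:n] assigns to all rows at once, recycling the n sampled values.
data.matrix[,1:n] <- sample(6,n,replace=TRUE)
}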
Side Note: I keep getting this error
"the condition has length > 1 and only the first element will be used"
Here is a solution using apply:
a[,7] <- apply(a, 1, function(x) ifelse(max(x,na.rm = T) == 6,"YES","NO"))
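where a is the input data.frame/tibble. As commented above, if you have a matrix, convert it to a data.frame first, because writing "YES"/"NO" into a numeric matrix would coerce every entry to character.
Here is a solution with apply and which. Note that it stores the result in column 7 as 1/0 (TRUE/FALSE coerced to numeric) rather than "Yes"/"No":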
res <- apply(data.matrix, 1, function(x) {
x[[7]] <- length(which(x == 6)) > 0
x
})
res <- t(res)
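A loop-free variant is also possible. The sketch below assumes a 10 x 6 matrix of dice rolls (rolls and has_six are names made up here) and uses rowSums on a logical matrix to flag rows containing a 6:

```r
set.seed(1)   # arbitrary seed so the example is repeatable
rolls <- matrix(sample(6, 60, replace = TRUE), nrow = 10, ncol = 6)

# rolls == 6 is a logical matrix; a row contains a 6 if its row sum is > 0.
has_six <- rowSums(rolls == 6) > 0

# A data frame can hold the numeric rolls and a character column together.
out <- data.frame(rolls, HasSix = ifelse(has_six, "Yes", "No"))
out
```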

Populate elements of a matrix using values from a column

I am trying to populate a matrix using the values from a specific column (Dependent). In the example below in row 1 the Dependent value is 3 which will indicate a 1 in the 3rd column. Row 4 has a Dependent value of 2 so a 1 is put in column 2. I have considered using a for loop but was interested if there is a more elegant way of solving the problem.
Project  Dependent  1  2  3  4
1        3                1
2
3
4        2             1
5        4                   1
Thanks in advance!
For
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)
df <- data.frame(Project, Dependent)
Create a matrix
m = matrix(nrow = max(df$Project), ncol = max(df$Dependent))
and populate it using a 2-column matrix of row and column vectors as indexes
m[as.matrix(df)] = 1
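This works even though some Dependent values are 0, because R ignores any row of an index matrix that contains a zero. A small sketch with a hand-built index matrix (idx is a name assumed for the illustration):

```r
# The middle row of idx has a 0 as its column index.
idx <- cbind(c(1, 2, 4), c(3, 0, 2))
m <- matrix(NA, nrow = 4, ncol = 3)

# Rows of idx containing 0 are skipped, so only (1,3) and (4,2) are set.
m[idx] <- 1
m
```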
Here is what you described; I hope it helps:
Project<-1:5
Dependent<-c(3,0,0,2,4)
df<-data.frame(Project,Dependent)
df
Project Dependent
1 1 3
2 2 0
3 3 0
4 4 2
5 5 4
s<-matrix(NA, nrow = nrow(df), ncol = nrow(df))
for(i in seq_along(df$Dependent)) {
# Only rows with a positive Dependent value get a 1.
if (df$Dependent[i] > 0) s[i,df$Dependent[i]]<-1
}
s
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA 1 NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA 1 NA NA NA
[5,] NA NA NA 1 NA

R Apply() function instead of a loop requiring the index of the values

I use a for loop (which works well) to replace two randomly chosen values in each row of a dataset with NA (the indexes of these values change randomly from row to row).
Now I would like to use apply() to do exactly the same thing.
I tried this code (as many other things which return NA everywhere):
my_fun<-function(x){if (j %in% sample(1:ncol(y),2)) {x[j]<-NA}}
apply(y,1,my_fun)
But it doesn't work (it does not make any change to the initial dataset).
The problem is that the object j is not found. j should be the number of the column.
Does someone have an idea?
From your description, I gather that you want:
my_fun <- function(x) { x[sample(1:length(x), 2)] <- NA; x }
apply(y, 1, my_fun) # or
t(apply(y, 1, my_fun))
Testing the function:
set.seed(42)
y <- matrix(1:60, 10)
y
t(apply(y, 1, my_fun))
# > t(apply(y, 1, my_fun))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 11 21 31 NA NA
# [2,] 2 NA 22 32 NA 52
# [3,] 3 13 NA NA 43 53
# [4,] NA 14 24 34 NA 54
# [5,] 5 15 25 NA 45 NA
# [6,] 6 16 NA NA 46 56
# [7,] 7 NA 27 37 47 NA
# [8,] 8 18 NA 38 NA 58
# [9,] NA 19 29 39 49 NA
# [10,] 10 20 NA 40 50 NA
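The t() is needed because apply returns the result for each row as a column. As a quick sanity check (a sketch reusing the definitions above, without a fixed seed), every row should end up with exactly two NAs and the shape of y should be preserved:

```r
my_fun <- function(x) { x[sample(length(x), 2)] <- NA; x }
y <- matrix(1:60, 10)

# apply() over rows returns one column per row, so transpose back.
res <- t(apply(y, 1, my_fun))

rowSums(is.na(res))   # number of NAs in each row
dim(res)              # same shape as y
```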

R: Error: new columns would leave holes after existing columns

When running this code, I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , i, value = list(x = 0.0654882985934691, :
new columns would leave holes after existing columns
I am trying to populate a data.frame with i columns, which, given the output of the posted for loop, should look something like this (Excel example for convenience only):
The aim is to store the output of the loop in such a way that I can get the average of each column at a later stage.
What can be done to achieve this?
library(plyr)
library(forecast)
library(vars)
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
# Forecast Accuracy
j = 12 #Forecast horizon
k = nrow(dx)-j #length of minimum training set
prediction <- data.frame()
for (i in 1:j) {
trainingset <- window(dx, end = k+i-1)
testset <- window(dx, start = k+i, end = k+j)
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
fcastmean <- do.call('cbind', fcast[['mean']])
fcastmean <- as.data.frame(fcastmean)
prediction[,i] <- rbind(fcastmean[,1])
}
Edit
As per the comment below, I have edited the above code to specify the first variable of fcastmean.
The error I get has however changed as a result, now being:
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(-0.316529962287372, :
replacement has 1 row, data has 0
Edit 2
Below is the minimum replicable version without any packages as requested in the comments. I believe that should be equivalent in terms of the question posed.
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
j = 12
k = nrow(dx)-j
prediction <- matrix(NA,j,j)
for (i in 1:j) {
fcast <- as.matrix(1:(j-i+1))
fcastmean <- fcast
prediction[,i] <- (fcastmean)
}
For your new example, try
sapply(1:j, function(i) `length<-`(1:(j-i+1), j))
The result is
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 NA
[3,] 3 3 3 3 3 3 3 3 3 3 NA NA
[4,] 4 4 4 4 4 4 4 4 4 NA NA NA
[5,] 5 5 5 5 5 5 5 5 NA NA NA NA
[6,] 6 6 6 6 6 6 6 NA NA NA NA NA
[7,] 7 7 7 7 7 7 NA NA NA NA NA NA
[8,] 8 8 8 8 8 NA NA NA NA NA NA NA
[9,] 9 9 9 9 NA NA NA NA NA NA NA NA
[10,] 10 10 10 NA NA NA NA NA NA NA NA NA
[11,] 11 11 NA NA NA NA NA NA NA NA NA NA
[12,] 12 NA NA NA NA NA NA NA NA NA NA NA
`length<-`(x, j) pads x with NA until it reaches a length of j.
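To see the padding in isolation, here is a tiny sketch (x, x2 and padded are throwaway names) comparing the function-call form with the more familiar two-step assignment:

```r
x <- 1:3

# Function-call form: returns a padded copy, handy inside sapply().
padded <- `length<-`(x, 5)

# Equivalent two-step form using the replacement syntax.
x2 <- x
length(x2) <- 5

identical(padded, x2)
```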
You can replace 1:(j-i+1) with whatever function of i you want. In the OP's original example, I am guessing something like this will work (untested):
sapply(1:j, function(i){
trainingset <- window(dx, end = k+i-1)
# testset <- window(dx, start = k+i, end = k+j)
# ^ this isn't actually used...
fit <- VAR(trainingset, p = 2)
fcast <- forecast(fit, h = j-i+1)
`length<-`(fcast$mean, j)
})
function(i){...} is called an anonymous function and can be written like any other.

Loop columns of matrix with nested apply

I am trying to loop over the columns of a matrix and change certain predefined sequences within the columns, which are available in the form of vectors.
Let's say I have the following matrix:
m2 <- matrix(sample(1:36),9,4)
[,1] [,2] [,3] [,4]
[1,] 11 6 1 14
[2,] 22 16 27 3
[3,] 34 10 23 32
[4,] 21 19 31 35
[5,] 17 9 2 4
[6,] 28 18 29 5
[7,] 20 30 13 36
[8,] 26 33 24 15
[9,] 8 12 25 7
As an example my vector of sequence starts is a and my vector of sequence ends is b. Thus the first sequence to delete in all columns is a[1] to b[1], the 2nd a[2] to b[2] and so on.
My testing code is as follows:
testing <- function(x){
apply(x,2, function(y){
a <- c(1,5)
b <- c(2,8)
mapply(function(y){
y[a:b] <- NA; y
},a,b)
})
}
Expected outcome:
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] 34 10 23 32
[4,] 21 19 31 35
[5,] NA NA NA NA
[6,] NA NA NA NA
[7,] NA NA NA NA
[8,] NA NA NA NA
[9,] 8 12 25 7
Actual result:
Error in (function (y) : unused argument (dots[[2]][[1]])
What is wrong in the above code? I know I could just set the rows to NA, but I am trying to get the above output by using nested apply functions to learn more about them.
We get the sequence of corresponding elements of 'a', 'b' using Map, unlist to create a vector and assign the rows of 'm2' to NA based on that.
m2[unlist(Map(":", a, b)),] <- NA
m2
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] 34 10 23 32
# [4,] 21 19 31 35
# [5,] NA NA NA NA
# [6,] NA NA NA NA
# [7,] NA NA NA NA
# [8,] NA NA NA NA
# [9,] 8 12 25 7
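As for the error itself: mapply passes one element of a and one element of b on each call, so the inner function needs two parameters, whereas function(y) accepts only one. A possible nested version, sketched with a deterministic stand-in for m2 and a hypothetical helper name blank_ranges:

```r
m2 <- matrix(1:36, 9, 4)   # deterministic stand-in for the sampled matrix
a <- c(1, 5)               # sequence starts
b <- c(2, 8)               # sequence ends

blank_ranges <- function(x, starts, ends) {
  apply(x, 2, function(col) {
    # mapply() hands one start and one end to the function per call,
    # so the function takes two parameters.
    idx <- unlist(mapply(function(s, e) s:e, starts, ends, SIMPLIFY = FALSE))
    col[idx] <- NA
    col
  })
}

res <- blank_ranges(m2, a, b)
res
```

Rows 1 to 2 and 5 to 8 become NA in every column, while rows 3, 4 and 9 are left unchanged.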
